Teamwork, skill enable quick Cluster recovery

19/02/2010

For scientists and engineers who operate satellites, one of the worst experiences is when data unexpectedly stop arriving from space. That's precisely what happened a few days ago to one of ESA's Cluster spacecraft. Its recovery required brainstorming, skill and a great deal of dynamic teamwork.

ESA's Cluster mission comprises four spacecraft orbiting in formation around Earth, relaying the most detailed information yet on how the solar wind affects our planet. Sent aloft in two dual launches in July and August 2000, the four satellites have been operating extremely well for the past decade, and are now in routine operations in an elliptical orbit between 3000 and 130 000 km above Earth.

On 4 February, the first satellite, 'Rumba' (partnered with Tango, Salsa and Samba), failed to transmit data that were due to arrive during a post-eclipse ground station contact, starting at 11:52 CET. The Agency's 35 m deep space tracking station at New Norcia, Australia, reported that it was receiving only a carrier signal from Rumba - essentially an 'empty' radio signal.

The event triggered immediate action in the Cluster Dedicated Control Room (DCR) at ESOC, ESA's European Space Operations Centre, Darmstadt, Germany.

Taking quick action

Cluster Flight Control Team on console at ESA/ESOC

"When we receive a carrier signal but no data with it, we immediately implement a pre-planned 'no telemetry' procedure, checking that everything is configured as it should be on the ground and on board Cluster. The initial ground station contact period ended at 12:35 CET, so we decided to wait until the next scheduled pass over ESA's Maspalomas station at 13:09 CET to re-configure the spacecraft to use its back-up computer," says Silvia Sangiorgi, the Deputy Spacecraft Operations Manager (SOM) for Cluster.

However, the anomaly persisted during the next ground contact, and the entire Cluster Flight Control Team (FCT) began an intensive series of checks to troubleshoot the problem and narrow down the possible causes. The telemetry data that Cluster usually sends provides the team with a complete picture of what's happening on board. But with no telemetry being received, the team couldn't get a direct update from the satellite's systems, making the problem especially difficult to diagnose.

Rumba loses its voice

"We were able to confirm that the satellite was receiving certain types of simple, direct commands, but any commands which required the on-board computer to react were not being processed. Cluster could 'hear' us, could look at us, but couldn't 'speak' back - as though it had lost its voice after the eclipse," says Jürgen Volpp, Spacecraft Operations Manager for Cluster.

It clearly wasn't going to be a simple problem to solve.

The Cluster team worked until late on 4 February and throughout a long and frustrating 5 February, supported by an extensive group of technical experts at ESOC working on ground stations and software.

The satellite prime contractor, Astrium, sent the senior engineer responsible for the mission from Friedrichshafen, Germany, to ESOC, and the Italian maker of the on-board computer system provided in-depth support via lengthy telephone conferences. Engineers who had previously worked on Cluster and since moved to other missions also volunteered their time and experience to boost the brainstorming process.

After careful troubleshooting, it turned out that the software on both the prime and back-up on-board computers was not running correctly.

Eclipse power-down provides clue

One crucial clue lay in the fact that Rumba had just come out of a power-down eclipse when its ability to send telemetry was lost. Eclipses happen regularly, whenever Earth shadows the solar panels from the Sun. Normally, batteries take over from the solar panels to provide vital power for the spacecraft, but Cluster's batteries - after exceeding their expected lifetime by several years - are now non-operational and the spacecraft power falls to nil during eclipses.

The satellite basically has to 'reboot' itself once sunlight (and power) returns, and engineers began to suspect that perhaps something hadn't gone right in the boot-up process.

There were indications that the on-board memory in a device called a 'switch-over controller' did not properly initialise after the power down. Some of the stored bits might have been incorrectly set, causing the software on Rumba to abnormally block all basic functions, including the telemetry transmissions. But it was impossible to tell for sure with no response from the satellite.

Work continued again the following day, Saturday, and now it was a race against time. The four Cluster satellites would soon enter a new eclipse and no one knew if Rumba would be able to reboot itself given the onboard problem.

That's when Ignacio Clerigo, one of the team's On-Board Data Handling engineers, proposed an ingenious way to test where the on-board software was blocked. "As the manual selection of the backup on-board computer did not have any effect, the idea was to find a set of simple commands that would allow an autonomous switch-over to the back-up computer - and hopefully reinitialise the switch-over controller module," says Clerigo.

Bad news: Peak indicates an 'empty' radio signal - no data

The commands were radioed up and, at midday on 6 February, the satellite reset itself with full, normal communications re-enabled via the back-up computer. Regular telemetry data started flowing in at 12:16 CET, as the tired team of engineers cheered and clapped. It was time for a break.

"I am extremely proud of the entire Cluster team. They demonstrated excellent technical understanding of the spacecraft and the problem, and they devised a number of very clever trouble-shooting procedures, finally recovering the satellite. They did this, I should add, while correctly operating the three other satellites of this mission and under a lot of time pressure," says Paolo Ferri, Head of the Solar & Planetary Mission Division at ESOC.

The specific cause of the error is under investigation, but the team now have a tried-and-tested procedure in place to recover their satellite should it happen again.

Normal communications recovered

Video clip recorded in the Cluster Dedicated Control Room
6 February 2010