The Silicon Graybeard

In the World of the High Tech Redneck, the Graybeard is the old guy who earned his gray by making all the mistakes, and tries to keep the young 'uns from repeating them. Silicon Graybeard is my term for an old hardware engineer; a circuit designer. The focus of this blog is on doing things, from radio to home machine shops and making all kinds of things, along with comments from a retired radio engineer, that run from tech, science or space news to economics; from firearms to world events.

Friday, April 17, 2015

On the 45th Anniversary of Apollo 13

Apollo 13 lifted off at 1313 CST on April 11, 1970 for the start of a planned mission to the lunar Fra Mauro highlands. Of course, Apollo 13 suffered the worst inflight crisis that a US mission had ever faced, one that could have led to the loss of the vehicle and crew: two days en route to the moon, one of the liquid oxygen tanks exploded during a routine mixing operation. The mission returned 45 years ago today.

Yesterday, Ars Technica posted an article on the mission including technical details I had never heard of. The central question is why the tank exploded in the first place. The best answer appears to be a combination of bad luck, bad mistakes, and inadequate design of the monitors used to determine the health of the tank.

Apollo 13’s big problem centered around the second of the two oxygen tanks—called, appropriately enough, "tank no. 2." The spherical tank had been manufactured years earlier by Beech Aircraft under contract to North American Rockwell, and it was originally fitted to the Apollo 10 service module in 1969. Some time before Apollo 10’s launch, the tank was removed from the Apollo 10 service module for maintenance or modification, and it was dropped. It fell from a height of about two inches.

Rather than re-use a potentially damaged tank, another was fitted to Apollo 10. Meanwhile, the dropped tank was inspected and no damage was found. However, the external inspection missed one red flag. Internally, a fill line suffered slight damage.

NASA assigned the seemingly undamaged tank to fly in Apollo 13’s service module. Extensive testing took place again prior to launch, and during one test, the tank couldn’t be properly purged of liquid oxygen (this was done by feeding gaseous oxygen into the tank to push the liquid oxygen out; the damaged fill line made that impossible). The testing team decided to empty the tank by heating it up and forcing the liquid oxygen to boil off.

Here, a significant mistake occurred.

The tank’s heater—normally used to keep the tank’s temperature and pressure elevated to facilitate the flow of oxygen—had been designed to accept power from the spacecraft’s 28-volt DC system, but it was connected to the ground’s 65-volt DC system for eight hours. The high-voltage current welded the (28V) heater switches closed, preventing automated shut-off, and the temperature in the tank rose to more than 1,000 degrees Fahrenheit. The tank’s internal thermometer could display a maximum temperature of only 80 degrees Fahrenheit. Nothing external indicated a problem.

This overnight bake-in did the job of emptying the tank, but it also caused an unknown amount of damage to the tank’s internals. A NASA report suggests that "serious damage" was done to the Teflon insulation coating the tank’s internal wiring. [Bold and reference to the switches being rated for 28V added: SiG]
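A rough back-of-the-envelope sketch of two numbers in that quote; this is an illustration, not something from the Ars article or the NASA report. It assumes the heater behaves like an ideal fixed resistance (so power goes as voltage squared), and it models the temperature readout as a gauge that simply pegs at its 80 degree ceiling. The function names are made up for the example, and the squared-voltage figure describes heater power only; it says nothing about how the switch contacts welded.

```python
def power_ratio(applied_volts: float, rated_volts: float) -> float:
    """Power dissipated relative to the rated case, assuming an ideal
    fixed resistance (P = V^2 / R). Real heater elements change
    resistance with temperature, so this is only a rough figure."""
    return (applied_volts / rated_volts) ** 2


def displayed_temperature(actual_f: float, gauge_max_f: float = 80.0) -> float:
    """What a gauge that tops out at gauge_max_f would show.
    Anything hotter than the ceiling reads as the ceiling."""
    return min(actual_f, gauge_max_f)


ratio = power_ratio(65.0, 28.0)
print(f"Heater power at 65 V vs 28 V: about {ratio:.1f}x")

# A 1,000 degree tank and an 80 degree tank look the same on the gauge.
print(displayed_temperature(1000.0), displayed_temperature(80.0))
```

The second print is the point: once the reading pegs, the gauge carries no information about how far past the limit things have gone.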

The author, Lee Hutchinson, does a good job of looking into the major factors involved in the explosion and the recovery from the disaster. Ground controllers had to make a large number of extremely complex, extremely difficult choices on what actions needed to be taken to save the crew, and it was all accomplished very quickly: within six hours of the explosion.

I'm sure you've encountered the idea that children today, even those of high school age, can't imagine that we went to the moon; the technology is incomprehensibly primitive to today's kids, who have had a smartphone in their hands almost since they were in utero. That primitive technology contributed to the problems with determining how to save the crew.

The discrete steps that led to the Apollo 13 explosion were each minor mistakes—the tank drop, the lack of internal inspection, the botched propellant drain, and the disastrous high-temperature bake. Taken in total, the explosion caused so many simultaneous problems that the controllers on the ground at first had a hard time believing what they were seeing was real. (Apollo flight controller Sy) Liebergot explained that the error reporting mechanisms available at the time were relatively primitive. Systems that went out of their normal operating boundaries would trigger warning lights on flight controllers’ consoles, but those lights wouldn’t stay illuminated when problems passed. Playing back recent telemetry from tapes wasn’t an instantaneous affair, and tracking problems often required pencil and paper. Troubleshooting multi-part failures was extremely complicated.

The thing that saved Apollo 13 more than anything else was the fact that the controllers and the crew had both conducted hundreds—literally hundreds—of simulated missions. Each controller, plus that controller’s support staff, had finely detailed knowledge of the systems in their area of expertise, typically down to the circuit level. The Apollo crews, in addition to knowing their mission plans forward and backward, were all brilliant test pilots trained to remain calm in crisis (or "dynamic situations," as they’re called). They trained to carry out difficult procedures even while under extreme emotional and physical stress.

For Apollo 13, keeping calm and working the problems as they appeared allowed three astronauts to escape unharmed from a complex failure. The NASA mindset of simulate, simulate, simulate meant that when things did go wrong, even something of the magnitude of the Apollo 13 explosion, there was always some kind of contingency plan worked out in advance. Controllers had a good gut-level feel for the limits of the spacecraft’s systems when trying to work through emergency problems.
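Liebergot's point about warning lights that went dark again once a reading came back in range is the difference between a non-latching and a latching alarm. Here is a minimal sketch of that difference; the class name, the limits, and the sample values are all made up for illustration and are not taken from any Apollo console or telemetry.

```python
from dataclasses import dataclass


@dataclass
class LimitLight:
    """Hypothetical limit monitor with two indicators for one reading."""
    low: float
    high: float
    lit: bool = False      # non-latching: reflects only the latest sample
    latched: bool = False  # latching: stays set until someone acknowledges it

    def update(self, value: float) -> None:
        out_of_limits = not (self.low <= value <= self.high)
        self.lit = out_of_limits
        self.latched = self.latched or out_of_limits

    def acknowledge(self) -> None:
        """Clear the latched indicator after an operator has seen it."""
        self.latched = False


# Made-up samples: in range, a brief excursion, then back in range.
light = LimitLight(low=0.0, high=100.0)
for sample in (50.0, 140.0, 50.0):
    light.update(sample)

print(light.lit, light.latched)  # False True
```

With only the non-latching light, a controller who glanced at the console after the excursion passed would see nothing wrong; the latched flag keeps the evidence until it is acknowledged.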

Apollo 13 Service Module - NASA photo. The crew cut the SM loose five hours before reentry, as a normal mission profile would. That gave the crew their first and only opportunity to see the damage the explosion had caused. Three and a half hours later, the Lunar Module Aquarius was also cut loose. Aquarius had acted as a lifeboat for the crew, being pushed well beyond its design intentions to keep them alive for the three-day journey back to Earth. Even the steely-eyed missile men dropped their air of invincibility long enough to say, "Farewell, Aquarius, and we thank you."

For 16 years, Apollo 13 was the worst disaster to hit a NASA mission in flight, and it was actually a successful mission, at least in the sense that the crew survived and the mission ended with the same landing profile as was intended at liftoff. Astronauts were lost in that interim, but the Challenger explosion in 1986 easily eclipsed Apollo 13 as a space disaster by taking out the vehicle and its seven-person crew. An almost identical interval later, 17 years (to the week) after Challenger, the Shuttle Columbia broke up on re-entry in 2003, destroying the vehicle and killing all seven aboard. I'd say that any astronauts scheduled for a mission in 2019 or 2020 might want to be a bit more careful with checking things, but we don't fly manned missions any more.

6 comments:

Divemedic, April 18, 2015 at 6:51 AM
We don't fly in space any more, and we likely never will.
Back in those days, people were expected to provide for themselves. We used the might of our country to build things, and to elevate the human race to its potential. We were reaching for greatness, and great men were expected to achieve great things.
Today, people are pitied and made into perpetual victims when they do not provide for themselves. The might of our country is used to pay people not to accomplish anything. Those who do accomplish things are punished through taxes. We now reach for mediocrity and equality of outcome, and great men are accused of cheating the system and having some sort of unseen, innate "privilege." I swear to you, that living in this country today is like living in a novel that is cowritten by George Orwell and Ayn Rand.
SiGraybeard, April 18, 2015 at 11:27 AM
"I swear to you, that living in this country today is like living in a novel that is cowritten by George Orwell and Ayn Rand."

Well said, DiveMedic. Every newscast seems like a cut out of "Atlas Shrugged".

Anonymous, April 18, 2015 at 4:07 PM
I can send you the JSC Safety and Mission Assurance Qualification Training powerpoint of BA_603_Apollo_13_case_study_Nov_2004.

The proximate cause was a CM failure. The third tier contractor never got the change notice on the 65V change.
SiGraybeard, April 18, 2015 at 5:11 PM
Anon - that would be interesting. I'm sigraybeard at gmail dot com - with appropriate symbols.

At the bottom of that Ars Technica article there's a link to an article in IEEE Spectrum, the general reading journal of the Institute of Electrical and Electronics Engineers. The author of that Ars piece says, "If you've got an hour or so to spare and want to go a lot deeper, the 2005 IEEE Spectrum article "Apollo 13, We Have a Solution" is the most comprehensive account of the Apollo 13 accident and its aftermath available on the Web."

Jonathan H, April 20, 2015 at 2:22 PM
I investigate fatal and nonfatal industrial accidents for a living - most accidents are caused by a series of smaller factors that add up to a big problem. Typically there are maintenance and training components to every accident, and many accidents, though definitely not all, have a design component to them as well.
Weetabix, April 20, 2015 at 3:15 PM
Great post. I like the geekery.