Software failure’s banner week

A tweet posted by an upset United Airlines customer last week.

Wow, what a week for software failure! (No, we’re not cheering for failure. Read the exclamation point as a jeer.)

On the same day one of the world’s largest airlines was brought to a halt, the New York Stock Exchange stopped trading – all because of entirely avoidable systems issues. And that wasn’t all the failure we saw.

Here’s the worst from the week that was.

NYSE failure pinned on ‘software update’

The knee-jerk reaction these days is to blame faceless “hackers” anytime computers fail us, but the actual cause is often less malicious: simple human bumbling. So it appears to have gone in the case of the New York Stock Exchange.

A busy July trading day was brought to a screeching halt about 11:30 a.m. Eastern time because of what the market called a major computer “malfunction.” Actually, officials told brokers they were investigating a “reported issue with a gateway connection.” Pretty generic explanation. Trading was offline until 3:10 p.m. Eastern.

After 24 hours of forensic analysis, the exchange explained that the cause was the rollout of a software update – not an attack by bad guys. The software in question is used to properly determine timestamps on trades.

“It was determined that the NYSE … customer gateways were not loaded with the proper configuration compatible with the new release” once markets opened, the exchange said. That triggered security measures that eventually shut down the whole market and canceled outstanding orders.

This totally preventable problem – preventable through simple simulation testing – was disappointing to analysts. As Larry Tabb, CEO of TABB Group, told The Wall Street Journal, the outage “indicates more needs to be done.” Wrote the Journal’s Steve Rosenbush, “After years of investments in resilient and redundant networks, problems with financial technology are remarkably persistent.”
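To make the point concrete, here is a minimal sketch of the kind of pre-open check that could flag a gateway still carrying a configuration the new release does not accept. It is an illustration only, not the exchange's actual tooling: the `Gateway` class, the function names, the gateway names and the version strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gateway:
    """Hypothetical stand-in for a customer gateway."""
    name: str
    config_version: str  # version of the configuration loaded on this gateway


def incompatible_gateways(gateways, supported_versions):
    """Return the names of gateways whose loaded config the release does not support."""
    return [g.name for g in gateways if g.config_version not in supported_versions]


def safe_to_open(gateways, supported_versions):
    """True only when every gateway carries a config the new release accepts."""
    return not incompatible_gateways(gateways, supported_versions)


# Made-up example data: the release accepts only config "2.0",
# but one gateway was left on the older "1.9" config.
gateways = [
    Gateway("gw-a", "2.0"),
    Gateway("gw-b", "1.9"),
    Gateway("gw-c", "2.0"),
]
supported = {"2.0"}

print(incompatible_gateways(gateways, supported))
print(safe_to_open(gateways, supported))
```

Run against a staging copy of production before the opening bell, a check like this turns a mismatch into a blocked rollout rather than a halted market.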

United Airlines grounds flights because of swamped router

Remember the days of going out for coffee while you waited for your Fuzzball router to download a program you really needed at 56k? Yeah, those days are gone. Right?

Well, maybe not so much at United Airlines, which (on the same day as the NYSE crash!) inconvenienced passengers on 4,900 flights because of overloaded routers that feed United’s automated operations. That system handles everything from reservations to crew scheduling.

Because of security precautions – terrorist no-fly lists are incorporated into the reservations system – any outage can bring the system to a halt. The ground stop came at 8:26 a.m. Eastern time.

We’ve reported here frequently that airlines seem especially prone to such problems. As Michael Ibbitson, who leads operations at London’s Gatwick Airport, told CNN, “This incident reflects the fact that much of the airline industry’s computer and technology programs are old and have been cobbled together.”

That cobbling has accelerated in recent years because of numerous airline mergers.

U.S. State Department system fails; visas delayed

In yet another system failure that definitely cannot be blamed on cyber attackers, the U.S. State Department was unable to issue visas for several weeks because of an unspecified failure.

The failure brought down the biometric processing capability throughout the worldwide Consular Consolidated Database, which allows the United States to issue an estimated 50,000 travel visas every day.

According to Australia’s edition of Computerworld, the same database failed last year, ruining vacation plans for around 200,000 people. That failure was blamed on a buggy software patch.

The State Department is saying little about the latest snafu except that the database needed to be rebuilt. It did rule out a cyber attack, though. Which just goes to show: The problem most often isn’t with the bad guys. It’s with us and what we do – or fail to do – before going live.