Broken Glass, Sharp Tempers

Optical fiber is fast, but fragile. Without redundancy built into our networks, we're destined for more of last week's Internet outages.

The past week was, to say the least, a difficult time for the Internet. Gigaswitch Number 1 at MAE-West went offline due to a power failure - not once but twice. This switch is arguably one of the five most important pieces of hardware on the Internet, but apparently MFS, which operates MAE-West, wasn't providing adequate power protection.

Online, several network operators have been making jokes comparing MAE-West with the electrically challenged Mir space station. However, the effects of this downtime are no joke. When the MAE went down, Internet traffic was affected worldwide. The traffic that should have been routed through the MAE was correctly re-routed elsewhere, but other routers and connections quickly became overloaded, and parts of the network became nearly unusable.

But power issues were just the start. On two consecutive days, two separate backhoe operators broke underground fiber optic cables. Wednesday morning, a WorldCom trunk running from Los Angeles to Las Vegas was cut in the California desert. This severed almost 500 45-Mbps DS-3 connections, and brought much of the Internet to a crawl for several hours. This was not expected behavior. The Internet was designed to automatically reroute traffic around damaged connections. However, to route around a problem, a network provider must have access to an adequate amount of reserve bandwidth that doesn't rely on the damaged connection. WorldCom simply did not have this available. It had consolidated many of its connections onto a single cable, which was more efficient and economical, but had disastrous repercussions.

Nobody knows this better than Nathan Stratton, president of NetRail, an Internet service provider. When he purchased connectivity from WorldCom, he was assured that his three DS-3 lines were connected via divergent paths (so at no point should all three of his connections go across the same cable). On Wednesday, he discovered that was not the case. Because of the cut, NetRail's POP in Palo Alto, California, was completely cut off from the Internet for most of the day. Stratton was irate, and is now considering changing providers because of WorldCom's outage. "[Changing our routes so] all three of our connections go over the same fiber without telling us was a bit too much for me."
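
For readers who want the "divergent paths" promise spelled out, here is a minimal Python sketch of the check a customer might wish he could run. It assumes the carrier would disclose which physical cable segments each circuit rides on, which is exactly the information Stratton did not have. The circuit and segment names are hypothetical and do not describe WorldCom's actual routes.

```python
def shared_segments(circuits):
    """Return the cable segments used by more than one circuit.

    `circuits` maps a circuit name to the set of physical cable
    segments it traverses (hypothetical data, not a real inventory).
    """
    usage = {}
    for name, segments in circuits.items():
        for segment in segments:
            usage.setdefault(segment, set()).add(name)
    return {seg: names for seg, names in usage.items() if len(names) > 1}


def is_diverse(circuits):
    """True when no single cable cut can take down two circuits."""
    return not shared_segments(circuits)


# What the customer was promised: three circuits, no cable in common.
promised = {
    "ds3-a": {"seg-1", "seg-2"},
    "ds3-b": {"seg-3"},
    "ds3-c": {"seg-4", "seg-5"},
}

# What a quiet consolidation looks like: every circuit now crosses seg-9.
actual = {
    "ds3-a": {"seg-1", "seg-9"},
    "ds3-b": {"seg-3", "seg-9"},
    "ds3-c": {"seg-9"},
}

print(is_diverse(promised))
print(is_diverse(actual))
print(shared_segments(actual))
```

Any segment the second function reports is a single point of failure: one backhoe in the wrong place takes down every circuit listed against it.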

Thursday afternoon, during railroad construction in Laurel, Maryland, another optical cable operated by WorldCom was cut. This one affected some Internet connectivity on the East Coast, and some toll-free telephone lines operated by Cable and Wireless. However, unlike the cut in the desert, there were enough redundant paths for most of the traffic to be routed around the damaged piece of network. Rob Deker, a network operator at Digex, reports that his network was "essentially unaffected" shortly after the cut.

Steven Balbach, an operator at ClarkNet, an ISP partially affected by the outage, went out to look along the railroad tracks for the scene of the cut. Soon, he came across a number of vans, a backhoe and a crowd of people loitering near the tracks. "A piece of black cable about the width of a quarter lay strung along the tracks and one piece could be seen sticking out of the dirt, the white innards splayed out," he reported.

"It was a single cable laid in the dirt carrying 144 strands of fiber," he continued. "This cut was not if, but when. And more cuts like it. There was no protection for the cable, no markers of the cable location."

Balbach's discovery was one of those little details that are known, but usually overshadowed by the alluring rush of high technology. Despite its massive bandwidth, optical fiber is fragile, much more fragile than copper wire. Because of this, it is much more likely to get broken, and because a single fiber can carry the traffic of dozens of wires, the breaks are much more likely to cause a major outage.

WorldCom, for its part, reacted as quickly as it could. Spokeswoman Linda Laughlin said that the company was able to re-route the traffic from the Los Angeles cut in three and a half hours - fast, but not nearly fast enough for WorldCom customers.

There's really no way to predict when and where a construction crew might dig up a cable. But this unpredictability is exactly why, as carriers continue to consolidate their traffic - moving copper circuits to fiber and wrapping fibers into ever larger bundles - divergent paths must be built into their networks to maintain reliability.

And more than anything else, they need to protect their cables; they're far too vital to be so easily broken. Anyone who looks at the events of this week can see that redundancy is an expensive but necessary aspect of networking, and it is not something that any carrier can afford to overlook.