Major ISP Level 3 experiences downtime

Friday, October 21, 2005

On Thursday at about 23:30 PST, Level 3 Communications, one of the largest Internet service providers (ISPs), disconnected from most of the Internet. Level 3, based in Broomfield, Colorado, USA, is a tier 1 provider that connects smaller ISPs together in order to pass data between them.

There are various possible reasons for the outage. The official answer given on the phone was that an OSPF (Open Shortest Path First) failure in Chicago caused massive internal routing issues. This was followed by a network-wide failure of BGP (Border Gateway Protocol) – the protocol that allows the Internet to route between providers.

Other possible reasons are more nefarious. Level 3 reportedly demanded recently that its peers (other ISPs that connect to it on a mutually beneficial basis) pay it US$30,000. Some speculate that the quarterly report it released yesterday, and the subsequent drop in its stock price, led the company to make a display of power in order to convince other ISPs to pay up. If this was the case, it failed horribly: many other ISPs are now disconnecting from Level 3's network and, because of the payment demands, are considering not reconnecting at all. So far there is no reason to believe that this is anything more than an urban legend, as the outage affected all other peers and customers as well.

Larger network-based projects noticed the outage immediately. The Freenode IRC network set up a channel to share ongoing news and to figure out how to work around the problem. As reports came in from around the world, it became clear that this would not be a simple fix. Level 3 had lost connections to AT&T, Cogent, Internap, Qwest, Savvis, SBC, Sprint, UUNet, Verio, WilTel, XO, and more.

An hour into the outage, many network administrators had routed around Level 3 to avoid its problems, but Level 3 was still down. Two hours in, the response from Level 3 was: “we're having technical problems – no estimated time of completion yet”.

Finally, after about two and a half hours, Level 3 started routing packets correctly, but the network could hardly be considered fully functional. Pieces of it seemed to be going up and down at random, and Level 3 tech support said they would need more time to fix the problem.

At about 03:30 PST, Level 3's services returned to normal, and the company reconnected to the Internet.