Laurens Dassen, a new member of the Dutch parliament after the March elections, representing the pan-European party Volt, put several questions about the October 4th Facebook outage to the Dutch cabinet (administration). Yesterday, Minister Blok of the Ministry of Economic Affairs and Climate answered them. The fourth question was about BGP, among other things.
The Facebook outage was caused by installing a BGP configuration with an error in it. Which underlines what I've been saying for a long time: when all the important parts of your network are redundant and you're using BGP to reroute automatically when failures happen, the remaining outages are your fault. So quite a heavy responsibility. Redundancy wasn't an issue with this incident: Facebook has datacenters all over the world. But if you use automated tools to push out a broken configuration to all of them at once, then it's game over. Remote access also didn't work anymore, and I gather that even access to the buildings didn't work anymore. Probably not exactly what Zuckerberg had in mind with move fast and break things.
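To make the difference between a local failure and a global configuration push concrete, here is a minimal toy model in Python. It is only a sketch: the `Datacenter` class, the site names and the `push_config` helper are hypothetical and say nothing about how Facebook's actual tooling works. It only assumes that a site stops announcing its routes when either its hardware or its configuration is broken.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    """Hypothetical site: announces routes only if hardware and config are healthy."""
    name: str
    hardware_ok: bool = True
    config_ok: bool = True

    def announces_routes(self) -> bool:
        return self.hardware_ok and self.config_ok


def service_reachable(sites: list[Datacenter]) -> bool:
    # Redundancy: the service stays up as long as at least one site announces routes.
    return any(site.announces_routes() for site in sites)


def push_config(sites: list[Datacenter], config_ok: bool) -> None:
    # Automated tooling applying the same configuration to every site at once.
    for site in sites:
        site.config_ok = config_ok


sites = [Datacenter(name) for name in ("dc-a", "dc-b", "dc-c")]

# A hardware failure in one site: the other sites keep the service reachable.
sites[0].hardware_ok = False
after_hardware_failure = service_reachable(sites)

# A broken configuration pushed everywhere: no site is left to fail over to.
push_config(sites, config_ok=False)
after_bad_push = service_reachable(sites)

print(after_hardware_failure, after_bad_push)
```

In this toy model the single hardware failure is absorbed by the remaining sites, while the broken configuration takes out all of them together, so redundancy has nothing left to work with.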
The main points of the minister's answer: BGP worked correctly, and BGP is being developed by the IETF, which is an appropriate forum for that work.
They probably don't realize that BGP has been essentially the same for 27 years: in 1994 BGP version 4 was defined and that's the version we still use today, with relatively minor additions.
Further reading and listening:
- Presentation NL-ix BGP security update
- Geoff Huston's Survey on Securing Inter-Domain Routing, part one and part two (not beginner-friendly), and an "epic diatribe" in podcast form on this topic, also in two parts: part 1 and part 2