This week there is the RIPE 46 meeting, once again in Amsterdam. Note that if you can't attend, you can follow what's happening using the experimental streaming service. The streaming bandwidth is around 225 kbps for the highest quality, but you can fall back to lower quality or audio only, I think. It works with both Windows Media Player and VideoLAN Client on my Mac.
There are also archives of the streamed sessions. The presentation slides are also generally available.
Permalink - posted 2003-09-01
There are now VoIP phones that cost about 60 - 75 dollars/euros and there is the free Asterisk "Open Source Linux PBX" software.
Unfortunately, the room didn't seem all that receptive. Well, too bad. I guess the routing crowd isn't ready for IPv6 stuff in general and somewhat non-mainstream IPv6 stuff in particular.
After that there was an hour-by-hour account from a Cisco incident response guy (Damir Rajnovic) about the input queue lockup vulnerability that got out in June. It seems they spent a lot of time keeping this under wraps. I'm not sure I'm very happy about that. On the one hand, it gives them time to fix the bug; on the other hand, our systems are vulnerable without us knowing about it. They went public a day or so earlier than they wanted because there were all kinds of rumors floating around. It turned out that keeping it under wraps wasn't entirely a bad idea, as there was an exploit within an hour after the details got out. But I'm not a huge fan of them telling only tier-1 ISPs that "people should stay at the office" and then disclosing the information to them first. It seems to me that announcing to everyone that there will be an announcement at some time 12 or 24 hours in the future makes more sense. In the end they couldn't keep the fact that something was going on under wraps anyway, and large numbers of updated images with vague release notes also turned out to be a big clue to people who pay attention to such things. One thing was pretty good: it seems that Cisco itself is now working on reducing the huge number of different images and feature sets, because the current variety makes testing hell.
Permalink - posted 2003-09-01
The power outage in parts of Canada and the US a little over a week ago didn't cause too many problems network-wise. (I'm sure the people who were stuck in elevators, had to walk down 50 flights of stairs or walk from Manhattan to Brooklyn or Queens have a different take on the whole thing.) The phone network in general also experienced problems, mostly due to congestion. The cell phone networks were hardly usable.
There was a 2% or so decline in the number of routes in the global routing table. The interesting thing is that not all of the roughly 2500 routes dropped off the net immediately; some disappeared over the course of three hours, which would indicate backup power being depleted.
See the report by the Renesys Corporation for more details.
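As a quick consistency check on those figures: 2500 routes being "2% or so" implies a global table of roughly 125,000 routes at the time. A tiny sketch of the arithmetic, using only the numbers quoted above:

# What the reported figures imply about the size of the global routing table.
routes_lost = 2500        # routes that dropped out, per the report
fraction_lost = 0.02      # the "2% or so" decline

implied_table_size = routes_lost / fraction_lost
print(f"implied global table size: about {implied_table_size:,.0f} routes")
# about 125,000 routes, roughly in line with the table size in mid-2003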
Following the outage there was (as always) a long discussion on NANOG, where several people expressed surprise that such a large part of the power grid could be taken out by a single failure. The main reason is that interconnected parts of the grid must keep their AC frequencies synchronized, so a problem in one place can cascade to the rest of the grid. Quebec uses high voltage direct current (HVDC) technology to connect to the surrounding grids, and wasn't affected.
In the meantime the weather in Europe has been exceedingly hot. This gives rise to cooling problems for many power plants, as the river water that many of them use gets too warm. In Holland, power plants are allowed to increase the water temperature by 7 degrees Celsius, with a maximum outlet temperature of 30 degrees. But the input temperature got as high as 28 degrees in some places, which meant many plants couldn't run at full capacity (even with temporary 32-degree permits), while electricity demand was higher than usual, also as a result of the heat. (The high loads were also an important contributor to the problems in the US and Canada, as there was little reserve capacity in the distribution grids.) The plans for rolling blackouts were already on the table when the weather got cooler over the last week.
Moral of these stories: backup power is a necessity, and batteries alone don't cut it, as the outage lasted for three days in some areas.
Permalink - posted 2003-08-24
The worm situation seems to be getting worse, but fortunately worm makers fail to exploit the full potential of the vulnerabilities they use. For instance, we were all waiting to see what would happen to the Windows Update site when the "MS Blaster" worm was going to attack it on August 16th. But nothing happened. I read that Microsoft took the site offline to avoid problems, with no intention of bringing it back online again, but obviously this information was incorrect because they're (back?) online now.
However, Microsoft received help from another worm creator, who took it upon him/herself to fix the security hole that Blaster exploits by first exploiting the same vulnerability, then removing Blaster and finally downloading Microsoft's patch. But in order to help its scanning, the new worm (called "Nachi") first pings potential targets, leading to huge ICMP floods in some networks, although others didn't see much traffic generated by the new worm. Nachi uses an uncommon ICMP echo request packet size of 92 bytes, which makes it possible to filter the worm without having to block all ping traffic. See Cisco's recommendations. It seems TNT dial-up concentrators have a hard time handling this traffic and reboot periodically. The issue seems to be a lack of memory/CPU to cache all the destinations the worm tries to contact, just like what happened with the MS SQL worm earlier this year and others before it.
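To make the size-based filtering idea a bit more concrete, here's a minimal Python sketch (not Cisco's actual ACL, and the function name and constant are mine) that flags raw IPv4 packets resembling the worm's pings, assuming the 92 bytes refers to the total length of the IP datagram:

import struct

# Assumption for illustration: the 92 bytes mentioned above is the total
# length of the IPv4 datagram carrying the worm's echo request.
NACHI_TOTAL_LENGTH = 92

def looks_like_nachi_ping(packet: bytes) -> bool:
    """Return True if a raw IPv4 packet resembles a Nachi-style echo request."""
    if len(packet) < 20:                  # need at least the IPv4 header
        return False
    if packet[0] >> 4 != 4:               # IPv4 only
        return False
    ihl = (packet[0] & 0x0F) * 4          # IP header length in bytes
    if len(packet) < ihl + 8:             # plus the 8-byte ICMP header
        return False
    total_length, = struct.unpack("!H", packet[2:4])
    if packet[9] != 1:                    # IP protocol 1 = ICMP
        return False
    icmp_type = packet[ihl]               # first byte after the IP header
    # ICMP type 8 = echo request; match on the telltale 92-byte length
    return icmp_type == 8 and total_length == NACHI_TOTAL_LENGTH

On a real network this check belongs in the routers' access lists, as the Cisco document describes; the sketch just shows how little information is needed to single out the worm's traffic without blocking ping altogether.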
Then there's the Sobig worm. This one uses email to spread, which has two advantages: users actually see the worm so they can't ignore the issue, and the network impact is negligible as the spread speed is limited by mail server capacity. Unfortunately, this one forges the sender address using email addresses found on the infected computer, which means "innocent bystanders" receive the blame for sending out the worm. Interestingly enough, I haven't received a single copy of either this worm or its backscatter so far, even though it seems to be rather aggressive: many people report receiving lots (even thousands) of copies.
Then on Friday there was a new development, as it was discovered that Sobig would contact 20 IP addresses at 1900 UTC, presumably to receive new instructions or malicious code. Nineteen of the addresses were offline by this time, and the remaining one immediately became heavily congested. So far, nobody has been able to determine what was supposed to happen, but presumably, it didn't.
There was some discussion on the NANOG list about whether it's a good idea to block TCP/UDP ports to help stop or slow down worms. The majority of those who posted their opinion on the subject feel that the network shouldn't interfere with what users are doing unless the network itself is at risk. This means temporary filters when there is a really aggressive worm on the loose, but not permanently filtering every possibly vulnerable service. A sizable minority is in favor of permanent filtering, however. But apart from philosophical preferences, it makes little sense: the more you filter, the bigger the impact on legitimate users, and it has been well established that worms manage to bypass filters and firewalls anyway, presumably through VPNs or because people bring in infected laptops and connect them to the internal network.
Something to look forward to: with IPv6, there are so many addresses (even in a single subnet) that simply generating random addresses and seeing if there is a vulnerable host there isn't a usable approach. On the other hand, I've already seen scans on a virtual WWW server, which means the scanning happened using the DNS name rather than the server's IP address, so it's unlikely that the IPv6 internet will remain completely worm-free, even if things won't be as bad as they are now in IPv4.
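To get a feel for the numbers, here is a back-of-the-envelope calculation; the probing rate is an assumption I picked purely for illustration.

# Rough arithmetic behind "random scanning doesn't work in IPv6".
# The probing rate is an assumed figure, only there to make the point.

addresses_in_subnet = 2 ** 64        # one standard IPv6 /64 subnet
probes_per_second = 1_000_000        # assumption: a million probes per second

seconds = addresses_in_subnet / probes_per_second
years = seconds / (365 * 24 * 3600)

print(f"{addresses_in_subnet:,} addresses in a single /64")
print(f"about {years:,.0f} years to sweep it at {probes_per_second:,} probes/s")
# prints roughly 585,000 years; an IPv4 /24 takes well under a second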
Permalink - posted 2003-08-24
What's wrong with this picture?
Under normal circumstances, not much. But when several machines are infected with an aggressive worm or are participating in a denial of service attack after an attacker has compromised them, the switch will receive more traffic from the hosts connected to the 100 Mbps ports than it can transmit to the router, which also sits on a 100 Mbps port.
The result is that a good part of all traffic is dropped by the switch. This doesn't matter much to the abusive hosts, but the high packet loss makes it very hard or even impossible for the other hosts to communicate over the net. We've seen this happen with the MS SQL worm in January of this year, and very likely the same will happen on August 16th, when machines infected with the "Blaster" worm (who comes up with these silly names anyway?) start a distributed denial of service attack on the Windows Update website. Hopefully the impact will be mitigated by the advance warning.
So what can we do? Obviously, within our own networks we should make sure hosts and servers aren't vulnerable, aren't running and/or exposing vulnerable services, and that any and all infections are fixed quickly. However, for service or hosting providers it isn't as simple, as there will invariably be customers who don't follow best practices. Since it is very close to impossible to have a network without any places where traffic is funneled or aggregated, it's essential to have routers or switches that can handle the full load of all the hosts on the internal network sending at full speed, and then filter this traffic or apply quality of service measures such as rate limiting or priority queuing.
This is what multilayer switches and layer 3 switches, such as the Cisco 6500 series with a router module or the Foundry, Extreme and Riverstone router/switches, can do very well, but these are obviously significantly more expensive than regular switches. It should still be possible to use "dumb" aggregation switches, but only if the uplink capacity is equal to or higher than the combined inputs. So a 48-port 10/100 switch that connects to a filter-capable router/switch over gigabit ethernet could support 5 ports at 100 Mbps and the remaining 43 ports at 10 Mbps. In practice a switch with 24 or 48 100 Mbps ports and a gigabit uplink, or 24 or 48 10 Mbps ports with a fast ethernet uplink, will probably work OK, as it is unlikely that more than 40% or even 20% of all hosts are going to be infected at the same time. But the not uncommon practice of aggregating 24 100 Mbps ports into a 100 Mbps uplink is way too dangerous these days.
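As a quick sanity check of those port mixes, here is a small Python sketch that compares the worst-case combined input against the uplink capacity; the figures simply mirror the examples above, assuming every port sends at line rate at the same time.

# Compare worst-case combined input against the uplink capacity (all in Mbps).
# The port mixes below mirror the examples in the text.

def oversubscription(ports_mbps, uplink_mbps):
    total_in = sum(ports_mbps)
    return total_in, total_in / uplink_mbps

configs = {
    "5x100 + 43x10 into 1 Gbps": ([100] * 5 + [10] * 43, 1000),
    "48x100 into 1 Gbps":        ([100] * 48, 1000),
    "24x100 into 100 Mbps":      ([100] * 24, 100),
}

for name, (ports, uplink) in configs.items():
    total_in, ratio = oversubscription(ports, uplink)
    verdict = "fits" if ratio <= 1 else f"oversubscribed {ratio:.1f}:1"
    print(f"{name}: {total_in} Mbps in, {uplink} Mbps uplink -> {verdict}")

# 5x100 + 43x10 adds up to 930 Mbps, which fits in a gigabit uplink;
# 24x100 into a 100 Mbps uplink is 24:1 oversubscribed.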
Stay tuned for more worm news soon.
Permalink - posted 2003-08-14