New minimum allocation size at RIPE

The RIPE NCC has changed its policy regarding the initial allocation that new LIRs receive. The rule that efficient use of at least a /22 must be demonstrated is now off the table, and the minimum allocation is now a /21 rather than a /20. See the announcement. RIPE also maintains a list of minimum allocation and assignment sizes for its address blocks (linked from the announcement), but this is pretty much useless: for many address blocks, filtering on allocation size is too restrictive, while filtering on assignment size isn't restrictive enough. So be very careful when implementing prefix length filtering.

Without the jargon, please!

Right. Most of us get our IP addresses from our ISPs, and ISPs usually have one or more blocks of IP address space of their own. Having their own address space is important for ISPs because it makes them independent of their own upstream ISPs: they can switch upstreams without having to change addresses. (Obviously this is useful to end-users as well, but this changed policy applies to ISPs.) Until now, ISPs that wanted address space of their own needed to show that they and/or their customers would start using 1024 addresses (a /22) immediately. If they could, they would get a block of 4096 addresses (a /20). The advantage of having such a large block is that everyone in the world is prepared to store a pointer to it in their routers, making the addresses globally usable without limitations.
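The sizes in this paragraph follow from the fact that an IPv4 address has 32 bits, so a /n prefix covers 2^(32-n) addresses. A quick Python check of that arithmetic (the 10.0.0.0/20 network is only an example, not an actual allocation):

```python
import ipaddress

def addresses_in_prefix(prefix_len: int) -> int:
    """Number of IPv4 addresses covered by a prefix of the given length."""
    return 2 ** (32 - prefix_len)

# Old RIPE rule: demonstrate use of a /22, receive a /20. New minimum: a /21.
for length in (20, 21, 22):
    print(f"/{length}: {addresses_in_prefix(length)} addresses")

# Cross-check against the standard library (example network, not a real allocation).
assert ipaddress.ip_network("10.0.0.0/20").num_addresses == addresses_in_prefix(20)
```

Each step of one bit in the prefix length halves or doubles the block, which is why going from a /20 to a /21 halves the minimum allocation.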

Block size matters because some networks only accept routing information for address blocks at least as large as the smallest blocks that RIPE and the other Regional Internet Registries (ARIN, APNIC and LACNIC) give out to ISPs. Smaller address blocks aren't entirely useless, but they may not be globally reachable without depending on the ISP the addresses came from, which of course limits ISP independence.

Since RIPE is now giving out blocks of 2048 addresses (/21) from some of their address blocks, networks are expected (and pretty much forced) to accept these blocks. This is good news for small ISPs that want their own independent block: they no longer have to jump through hoops trying to show they need 1024 addresses, or make do with only semi-independent addresses.
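Per-block prefix length filtering of the kind discussed above could be sketched in Python like this. The table is entirely hypothetical: the parent blocks are private-address stand-ins and the lengths are illustrative, not RIPE's published minimums, so don't treat it as a router configuration.

```python
import ipaddress

# HYPOTHETICAL table: parent block -> longest prefix length accepted from it.
# Private ranges are stand-ins; these are NOT real RIPE blocks or sizes.
MAX_PREFIX_LEN = {
    ipaddress.ip_network("10.0.0.0/8"): 21,     # a block where /21s are handed out
    ipaddress.ip_network("172.16.0.0/12"): 20,  # a block still handed out as /20s
}

def accept_route(prefix: str) -> bool:
    """Accept an IPv4 route only if it is no more specific than its parent block allows."""
    net = ipaddress.ip_network(prefix)
    for block, max_len in MAX_PREFIX_LEN.items():
        if net.subnet_of(block):
            return net.prefixlen <= max_len
    return False  # unknown parent block: rejected in this sketch
```

With this table, accept_route("10.1.8.0/21") is accepted while accept_route("172.16.8.0/21") is not, which is the whole point: the right cutoff depends on which block the route comes from.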

Note that the other RIRs haven't changed their policies (or at least there are no announcements to be found). ARIN's policy, for instance, is even more restrictive than the old RIPE policy: multihomed networks must show efficient use of a /21 to get a /20, and single-homed ISPs must even show efficient use of a full /20 to get a /20. So for now the good news only applies to ISPs in the RIPE region, which is roughly Europe, the Middle East, Africa north of the Sahara and the former Soviet Union. For more info, see the RIR policy comparison matrix.

Permalink - posted 2004-01-10

IPv6 documentation prefix and IPv6 site/host list

There is now an official IPv6 prefix set aside for documentation purposes: 2001:0DB8::/32. (Leading zero courtesy of APNIC.)

The how and why is documented on a page at APNIC. Note that there is also a prefix set aside for documentation purposes in IPv4: 192.0.2.0/24. See RFC 3330 for more information and other special IPv4 prefixes.
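A minimal Python sketch for checking whether an address falls inside one of the two documentation prefixes, using the standard ipaddress module; is_documentation() is a made-up helper name.

```python
import ipaddress

# Documentation prefixes: IPv6 per the APNIC page, IPv4 per RFC 3330.
DOC_PREFIXES = [
    ipaddress.ip_network("2001:db8::/32"),
    ipaddress.ip_network("192.0.2.0/24"),
]

def is_documentation(addr: str) -> bool:
    """True if addr lies inside a documentation prefix of its own address family."""
    ip = ipaddress.ip_address(addr)
    return any(ip.version == net.version and ip in net for net in DOC_PREFIXES)
```

The leading zero in 2001:0DB8:: makes no difference to the parser: is_documentation("2001:0DB8::1") is true, just like is_documentation("192.0.2.55").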

At prik.net there is now a list of IPv6-enabled hosts or sites. I have no idea how complete the list is, but it has more than 3000 entries so it's better than the manually maintained stuff in some other places. If the link doesn't work, this is probably because your browser doesn't understand compressed content. In that case, use the uncompressed version. The compression ratio is about 1 : 6.

Permalink - posted 2004-01-04

no ip unreachables

In an article earlier this year I talked about problems with path MTU discovery (PMTUD). (You may want to look at the links.) Quick recap:

There are network segments with widely differing maximum packet sizes connected to the internet
If a router receives a packet that is too big to forward over the next link, the router must fragment the packet
Fragmentation is hard work and should be avoided where possible
RFC 1191 says: let hosts discover the MTU for the entire path and send packets that are small enough. This is done by setting the "don't fragment" bit in the IP header and listening for ICMP messages saying the packet was too big
Although nearly all links use the ethernet 1500 byte MTU or a larger one, tunneling, which decreases the MTU, has become more and more common
The designers and implementers chose to set the DF bit in ALL packets
Result: when using an MTU smaller than 1500 bytes (which is often hard to avoid when using some form of tunneling), unrecoverable reachability problems ensue
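The recap boils down to a simple feedback loop. The Python sketch below models it under simplifying assumptions: a static path, every packet sent with DF set, and a hypothetical Hop type standing in for a router plus its outgoing link. It is a toy, not how any particular TCP stack implements RFC 1191.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Hop:
    mtu: int                         # MTU of the link this router forwards onto
    sends_unreachables: bool = True  # False models "no ip unreachables"

def deliver(size: int, path: List[Hop]) -> Tuple[bool, Optional[int]]:
    """Send one packet with DF set; return (delivered, MTU reported via ICMP)."""
    for hop in path:
        if size > hop.mtu:
            # DF is set, so the router may not fragment: it drops the packet and,
            # unless unreachables are disabled, reports the next-hop MTU.
            return False, (hop.mtu if hop.sends_unreachables else None)
    return True, None

def pmtud(path: List[Hop], first_size: int = 1500, max_tries: int = 5) -> Optional[int]:
    """RFC 1191-style loop: shrink the packet size each time ICMP says it's too big."""
    size = first_size
    for _ in range(max_tries):
        delivered, reported_mtu = deliver(size, path)
        if delivered:
            return size
        if reported_mtu is None:
            # Silent drop: the sender never learns a smaller size (a PMTUD black hole).
            return None
        size = reported_mtu
    return None
```

With a 1476-byte link in the middle (for instance a GRE tunnel across 1500-byte links), pmtud([Hop(1500), Hop(1476), Hop(1500)]) settles on 1476. Mark that middle hop with sends_unreachables=False and it returns None: large packets simply vanish.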

One thing that many people may not realize is that the Cisco "no ip unreachables" directive turns off generation of ICMP "packet too big" or "fragmentation needed and DF bit set" messages for a router interface. ICMP messages have a type and a code. (IANA has the full list.) Type 3 covers different kinds of destination unreachable errors, including "packet too big", which is code 4 under type 3. Unfortunately, the Cisco documentation isn't particularly helpful. Under Configuring IP Services:

Enabling ICMP Protocol Unreachable Messages

If the Cisco IOS software receives a nonbroadcast packet destined for itself that uses an unknown protocol, it sends an ICMP protocol unreachable message back to the source. Similarly, if the software receives a packet that it is unable to deliver to the ultimate destination because it knows of no route to the destination address, it sends an ICMP host unreachable message to the source. This feature is enabled by default.

The fact that the ip unreachables command also affects sending of packet too big messages isn't even hinted at... The same goes for the description of the command in the command reference section. Only an ICMP Services Example explains a little more:

Disabling the unreachables messages will have a secondary effect—it also will disable IP Path MTU Discovery, because path discovery works by having the Cisco IOS software send Unreachables messages.

However, there is still no warning that disabling unreachables will make anything connected to links with reduced MTUs virtually unreachable, as nearly all hosts send all their packets with the DF bit set. And many people recommend "no ip unreachables": a Google search shows that this combination of words appears, ironically, a little more than 1500 times.
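For reference, this is roughly what the suppressed message looks like on the wire. The Python sketch below builds an ICMP type 3, code 4 message with the next-hop MTU that RFC 1191 places in the low-order 16 bits of the previously unused header field, followed by the original IP header and the first 64 bits of its data. It assumes the quoted header is 20 bytes without options; internet_checksum() and frag_needed() are names made up for this sketch.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def frag_needed(next_hop_mtu: int, original: bytes) -> bytes:
    """Hypothetical helper: ICMP type 3 (destination unreachable), code 4
    (fragmentation needed and DF set), carrying the next-hop MTU."""
    quoted = original[:28]  # original IP header (assumed 20 bytes) + 64 bits of data
    without_csum = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    csum = internet_checksum(without_csum + quoted)
    return struct.pack("!BBHHH", 3, 4, csum, 0, next_hop_mtu) + quoted

msg = frag_needed(1476, bytes(28))  # dummy all-zero "original packet" for illustration
print(msg[:8].hex())  # 0304f737000005c4
```

The last two bytes of the ICMP header (0x05c4 = 1476) are the only thing the sending host needs to lower its packet size, and that is exactly what disappears with "no ip unreachables".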

There is a good use for this command, however: when a range of addresses is routed to the null interface, and "no ip unreachables" is configured for the null interface, any packets to the destinations in question will be dropped at the CEF level. Note that on the null interface, "no ip unreachables" affects packets routed to that interface, which is different from the behavior on other interfaces, where the command determines whether unreachables are sent back in response to packets received on the interface.

Most link types have a fairly fixed MTU (such as ethernet with its 1500 bytes) or support negotiation of the MTU (such as PPP). However, some link types make it very easy to configure the two ends with different MTUs. This regularly happens with Cisco's HDLC on serial links. In this case, packets larger than the smaller of the two MTUs can't be received successfully at the end using that smaller MTU. Fortunately, debugging is easy: if the MTU is 1500, set it much higher; if it's larger than 1500, set it to 1500. In most cases this will clear up the problem. Or just switch to PPP... The same problem can happen on tunnels, but from what I've seen many systems just accept the too-large packets. This leads to strange path MTU discovery behavior, as the link then has a different MTU in each direction, but this shouldn't be much of a problem.

I think the moral of this story is that it's probably not worth the trouble to run path MTU discovery on systems that have a 1500 byte MTU. Since all systems behind links that don't support 1500 bytes need to implement ugly hacks such as clearing the DF bit or rewriting the TCP MSS option anyway, the resulting increase in fragmentation on the network will be negligible and it should save significantly on debugging.
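The "rewriting the TCP MSS option" hack boils down to simple arithmetic: the MSS is the link MTU minus the IPv4 and TCP headers. A Python sketch, assuming 20-byte IPv4 and TCP headers without options; the 1476-byte MTU is just an example (a GRE tunnel over a 1500-byte link, which adds 24 bytes of outer IPv4 and GRE headers).

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

def max_mss(link_mtu: int) -> int:
    """Largest TCP MSS whose segments fit in one IPv4 packet of link_mtu bytes."""
    return link_mtu - IPV4_HEADER - TCP_HEADER

def rewrite_mss(advertised_mss: int, link_mtu: int) -> int:
    """MSS a clamping box would write into a SYN: lower it if needed, never raise it."""
    return min(advertised_mss, max_mss(link_mtu))

print(max_mss(1500))            # 1460
print(rewrite_mss(1460, 1476))  # 1436
```

Because the rewrite happens in the SYN, both TCP endpoints send segments that fit through the tunnel without ever needing an ICMP message.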

Can't get enough of fragments? Have a look at a CAIDA analysis.

Permalink - posted 2003-12-30

NetworkWorldFusion on fortifying BGP and IPv6 being cheap and easy

I ran into two interesting articles at NetworkWorldFusion:

A while ago, they ran an article about BGP and BGP security under the title Fortifying BGP: No quick fix. The article doesn't go into much depth, but it has some interesting quotes. Fred Baker from Cisco says S-BGP is dead in the water, while S-BGP proponent Steve Kent at BBN has harsh words for Cisco's soBGP, saying it has options that are disastrous from a security standpoint as well as architectural problems. (The exact nature of the problems remains unclear, though.)

BGP Security at BGPexpert.com.

The other article, IPv6 fears seen unfounded, reports that early adopters find the transition to IPv6 both cheaper and easier than expected. Since the protocol has been in development for so long (8 to 10 years, depending on the IPv6 epoch of choice), most hardware and software vendors have already implemented it, so it's available (at no additional cost) now that people are starting to adopt IPv6. Turning on the protocol also turned out to be a fairly simple affair for the people quoted in the article.

Permalink - posted 2003-12-15

DNS and routing of IPv6 micro allocations

Currently, there are very few people who want to run an IPv6-only network. And that's a good thing too, as presently, there is no way to do this. One of the big hurdles is the DNS. Right now, very few, if any, top level domains accept IPv6 glue records. However, there are no technical reasons why those can't be added. Unfortunately, there is a technical reason why making the existing root nameservers perform their function over IPv6 is problematic. When a nameserver starts up, it looks at a local file for root servers. However, it only uses this list of root servers for a single query: the one that returns the list of current root servers. To avoid problems, it's important that the answer to this query contains all the addresses for the root servers as additional information. The problem is that the original DNS specifications limit DNS messages over UDP to 512 bytes. That is enough for the current 13 root servers and their IPv4 addresses, with little room to spare.
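To see why 512 bytes is tight, here is a back-of-the-envelope estimate of the priming response in Python. It assumes every root server is named x.root-servers.net, that the response uses DNS name compression as aggressively as possible, and that it contains nothing but the NS records and their glue; real servers may lay out the response differently.

```python
# Fixed sizes in bytes (RFC 1035 wire format).
HEADER = 12                 # DNS message header
QUESTION = 1 + 2 + 2        # root name "." + QTYPE + QCLASS
RR_FIXED = 2 + 2 + 4 + 2    # TYPE + CLASS + TTL + RDLENGTH of every resource record
FULL_NAME = 20              # "a.root-servers.net" as labels: 1+1 + 1+12 + 1+3 + 1

def priming_response_size(servers: int = 13, aaaa_records: int = 0) -> int:
    """Estimate the size of the answer to the ". NS" priming query (assumptions above)."""
    # First NS record: owner "." (1 byte) and the target name spelled out in full.
    first_ns = 1 + RR_FIXED + FULL_NAME
    # Other NS records: owner "." plus a one-letter label (2 bytes) and a
    # 2-byte compression pointer to "root-servers.net".
    other_ns = (servers - 1) * (1 + RR_FIXED + 2 + 2)
    # Glue records: owner name is a 2-byte pointer; A rdata is 4 bytes, AAAA 16 bytes.
    a_glue = servers * (2 + RR_FIXED + 4)
    aaaa_glue = aaaa_records * (2 + RR_FIXED + 16)
    return HEADER + QUESTION + first_ns + other_ns + a_glue + aaaa_glue

print(priming_response_size())                # 436
print(priming_response_size(aaaa_records=4))  # 548
```

Under these assumptions each AAAA glue record adds 28 bytes, so only two of them fit in the remaining 76 bytes; adding IPv6 addresses for four root servers would already push the response past 512 bytes.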

In the meantime, though, some root server operators are experimenting with making the root service available over IPv6. (See http://www.root-servers.org/ for more information.) At the time of this writing, four root servers have IPv6 addresses:

B: 2001:478:65::53
F: 2001:500::1035
H: 2001:500:1::803f:235, and
M: 2001:dc3::35

However, only B and M are reachable (for me). A closer look at the addresses used provides the following information:

B: 2001:478:65::53 -> 2001:478::/32 @ ARIN: EP.NET
F: 2001:500::1035 -> 2001:500::/48 @ ARIN -> Internet Software Consortium
H: 2001:500:1::803f:235 -> 2001:500:1::/48 @ ARIN: U.S. Army Research Laboratory
M: 2001:dc3::35 -> 2001:dc3::/32 @ APNIC: M-ROOT-DNS-IPv6-20030619
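The mapping above can be double-checked with Python's standard ipaddress module. This sketch only re-verifies the data listed here (covering_prefix_len() is a made-up helper); it doesn't query any RIR database.

```python
import ipaddress

# Root server addresses and the covering prefixes listed above.
ROOTS = {
    "B": ("2001:478:65::53", "2001:478::/32"),
    "F": ("2001:500::1035", "2001:500::/48"),
    "H": ("2001:500:1::803f:235", "2001:500:1::/48"),
    "M": ("2001:dc3::35", "2001:dc3::/32"),
}

def covering_prefix_len(name: str) -> int:
    """Check that the address really falls inside the listed prefix; return its length."""
    addr, prefix = ROOTS[name]
    net = ipaddress.ip_network(prefix)
    if ipaddress.ip_address(addr) not in net:
        raise ValueError(f"{addr} is not inside {prefix}")
    return net.prefixlen

for name in ROOTS:
    print(f"{name}: /{covering_prefix_len(name)}")
```

B and M sit in /32s; F and H sit in /48s, which a filter that only accepts RIR-sized allocations would drop.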

The plot thickens... Since everyone and their little sister can easily obtain a /48 worth of IPv6 address space (I have two of those for personal use), the global IPv6 routing table is expected to suffer a lot of pollution from /48s, much like what happens with /24s in the IPv4 routing table, only worse. So filtering on prefix length and refusing /48s is unavoidable.

(Additionally, it looks like the H /48 isn't announced at all: the route doesn't show up on the AMS-IX IPv6 looking glass, which does show the F /48 and other more specifics.)

When this issue came up on the IETF mailing list, Paul Vixie, operator of the F root server, indicated that he had simply followed ARIN guidelines and obtained a /48 "micro allocation" from ARIN. It turns out ARIN has set aside some address space for internet exchanges and "critical infrastructure". This address space is given out as /48s; see List of IPv6 Micro-allocations. (RIPE has a somewhat similar page at Smallest RIPE NCC Allocation / Assignment Sizes, but it doesn't mention micro allocations.) All of this seems perfectly reasonable, except for one thing:

the existence of micro allocations is never mentioned in the RIRs' IPv6 policy document.

This document, which is available in slightly different layouts and versions from LACNIC, APNIC, RIPE, ARIN and, for good measure, from IANA, says:

"4.3. Minimum Allocation

RIRs will apply a minimum size for IPv6 allocations, to facilitate prefix-based filtering.

The minimum allocation size for IPv6 address space is /32."

And this is exactly what many ISPs that offer IPv6 service do: they filter on a prefix length of 32 bits as indicated above, or 35 bits, the old allocation size. Obviously someone dropped the ball big time here, and this needs to be fixed one way or another. Watch this space for more information. In the meantime, be sure to selectively relax your filters if you do prefix-based filtering in IPv6. Gert Döring maintains a set of IPv6 BGP filter recommendations.
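Selectively relaxing an IPv6 prefix filter could look something like this Python sketch. It accepts prefixes up to /35 (to allow for the old allocation size) from 2001::/16 and makes explicit exceptions for the F and H root server /48s listed above. It is not Gert Döring's recommended filter, it ignores 6bone space (3ffe::/16), and a real configuration would follow ARIN's published micro-allocation list rather than hardcoding two entries.

```python
import ipaddress

RIR_SPACE = ipaddress.ip_network("2001::/16")
MAX_LEN = 35  # the old /35 allocation size; allocations are now /32

# Micro-allocations to accept anyway: only the F and H root server /48s from the
# list above. A real filter would follow ARIN's published micro-allocation list.
MICRO_ALLOCATIONS = {
    ipaddress.ip_network("2001:500::/48"),
    ipaddress.ip_network("2001:500:1::/48"),
}

def accept_v6(prefix: str) -> bool:
    """Prefix-length filter for 2001::/16 with explicit micro-allocation exceptions."""
    net = ipaddress.ip_network(prefix)
    if net in MICRO_ALLOCATIONS:
        return True  # selectively relaxed
    return net.version == 6 and net.subnet_of(RIR_SPACE) and net.prefixlen <= MAX_LEN
```

The exception list is checked first so that a known micro-allocation is never caught by the generic length rule, while any other /48 (such as 2001:db8:1::/48) is still rejected.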

Permalink - posted 2003-12-09

RIPE 46 Wednesday - Routing, IPv6

Wednesday

Routing

Wednesday brought sessions about my two favorite subjects: routing and IPv6. However, I didn't find most of the routing subjects very interesting: RIS Update, Verification of Zebra as a BGP Measurement Instrument, Comparative analysis of BGP update metrics. The last one sounds kind of interesting, but it comes down to a long analysis of what you get when you compare BGP updates gathered at different locations, such as the Amsterdam Internet Exchange looking glass and the Oregon Internet Exchange Route Views.

Yesterday's presentation in the routing wg about bidirectional forwarding detection (which I completely forgot about during all the train rerouting) was much more interesting. Daves Katz and Ward wrote an Internet Draft, draft-katz-ward-bfd-01.txt, for a new protocol that makes it possible for routers to check whether the other side is still forwarding. This goes beyond the link keepalives that many protocols employ, because it also tests whether any actual forwarding is happening. The protocol also works on unidirectional links and, to top it all off, it works at millisecond granularity. There is a lot of interest in this protocol, so there is considerable pressure to get it finished soon.
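The core idea of millisecond-scale failure detection can be illustrated with a simple liveness timer. This Python sketch is an assumption-laden illustration, not the draft's state machine: each side is assumed to expect a control packet every rx_interval_ms milliseconds and to declare the session down once detect_mult intervals pass without one. The function and parameter names are made up for the example.

```python
def detection_time_ms(rx_interval_ms: float, detect_mult: int) -> float:
    """How long the receiver waits before declaring the forwarding path down."""
    return rx_interval_ms * detect_mult

def session_up(last_rx_ms: float, now_ms: float,
               rx_interval_ms: float, detect_mult: int) -> bool:
    """True while a control packet has arrived within the detection time."""
    return now_ms - last_rx_ms < detection_time_ms(rx_interval_ms, detect_mult)
```

With packets every 50 ms and a multiplier of 3, a failure is noticed after 150 ms, much faster than the seconds-scale keepalives of most routing protocols. Since each side runs its own timer on what it receives, a failure in one direction shows up at the side that stops receiving.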

But Wednesday's routing session wasn't a complete write-off, as Pascal Gloor presented the Netlantis Project. This is a collection of BGP tools. Especially the Graphical AS Matrix Tool is pretty cool: it shows you the interconnections between ASes. I'm not exactly sure how it decides which ASes to include, but it still provides a nice overview.

IPv6

In the afternoon there was the IPv6 working group session which conflicted with the Technical Security working group session which I would also have liked to attend...

Kurtis Lindqvist presented an IETF multi6 wg update.

Gert Doering talked about the IPv6 routing table. Apart from the size, there are some notable differences with IPv4: IPv6 BGP interconnection doesn't reflect business relationships or anything close to physical topology, as people are still giving away free IPv6 transit and tunneling all over the place. This is getting better, though. (The problem with this is that you get lots of routes but no way to know in advance which are good. Nice to have free transit, not so nice when it's over a tunnel spanning the globe.) There are now nearly 500 entries in the global IPv6 table, nearly twice as many as two years ago. About half of those are /32s from the RIRs (2001::/16 space), and the rest are more or less equally distributed over /35s from the RIRs and /24s, /28s and /32s from 6bone space (3ffe::/16).

I'm not sure if it was Gert, but someone remarked during a presentation: "In Asia, they run IPv6 for production. In Europe, they run it for fun. In the US, they don't run it at all."

Jeroen Massar talked about "ghost busting". When the Regional Internet Registries started giving out IPv6 space, they assigned /35s to ISPs. Later they changed this to /32s. The assignments were done in such a way that an ISP could simply change its /35 announcement to a /32 announcement "in place". However, this is not entirely without problems, as the longest match rule dictates that a longer prefix (such as a /35 over a /32) is always preferred, regardless of the AS path length or other metrics. With everyone giving away free transit, there are huge numbers of potential longer paths that BGP will explore before the /35 finally disappears from the routing table and the /32 is used.
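A small Python sketch of why a lingering /35 keeps attracting traffic. It uses the documentation prefix and made-up AS numbers, and it collapses BGP path selection and forwarding into a single step (longest prefix first, AS path length only as a tie-breaker between equally long prefixes), which is a simplification of what real routers do.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Route:
    prefix: ipaddress.IPv6Network
    as_path: List[int]

def best_route(dest: str, table: List[Route]) -> Optional[Route]:
    """Longest matching prefix wins; a shorter AS path only breaks ties
    between prefixes of the same length."""
    ip = ipaddress.ip_address(dest)
    matches = [r for r in table if ip in r.prefix]
    return max(matches, key=lambda r: (r.prefix.prefixlen, -len(r.as_path)), default=None)

# Hypothetical table: the new /32 over a short path, a lingering /35 over a long one.
table = [
    Route(ipaddress.ip_network("2001:db8::/32"), [64512]),
    Route(ipaddress.ip_network("2001:db8::/35"), [64513, 64514, 64515, 64516, 64517]),
]
print(best_route("2001:db8::1", table).prefix)  # 2001:db8::/35, despite the long path
```

Only destinations outside the /35, such as 2001:db8:2000::1, use the /32; everything inside it follows the long path until the /35 is finally withdrawn everywhere.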

To add insult to injury, there appear to be bugs that make very long AS paths stay around when they should have disappeared. These are called "ghosts", hence the ghost busting. See the Ghost Route Hunter page for more information.

Permalink - posted 2003-09-16
