IPv6 documentation prefix and IPv6 site/host list

There is now an official IPv6 prefix set aside for documentation purposes: 2001:0DB8::/32. (Leading zero courtesy of APNIC.)

The how and why is documented at a page at APNIC. Note that there is also a prefix set aside for documentation purposes in IPv4: 192.0.2.0/24. See RFC 3330 for more information and other special IPv4 prefixes.

At prik.net there is now a list of IPv6-enabled hosts or sites. I have no idea how complete the list is, but it has more than 3000 entries so it's better than the manually maintained stuff in some other places. If the link doesn't work, this is probably because your browser doesn't understand compressed content. In that case, use the uncompressed version. The compression ratio is about 1 : 6.

Permalink - posted 2004-01-04

no ip unreachables

In an article earlier this year I talked about problems with path MTU discovery (PMTUD). (You may want to look at the links.) Quick recap:

There are network segments with widely differing maximum packet sizes connected to the internet
If a router receives a packet that is too big to forward over the next link, the router must fragment the packet
Fragmentation is hard work and should be avoided where possible
RFC 1191 says: let hosts discover the MTU for the entire path and send packets that are small enough. This is done by setting the "don't fragment" bit in the IP header and listening for ICMP messages saying the packet was too big
Although nearly all links use the ethernet 1500 byte MTU or a larger one, tunneling, which decreases the MTU, has become more and more common
The designers and implementers chose to set the DF bit in ALL packets
Result: when using an MTU smaller than 1500 bytes (which is often hard to avoid when using some form of tunneling), unrecoverable reachability problems ensue

One thing that many people may not realize is that the Cisco "no ip unreachables" directive turns off generation of ICMP "packet too big" or "fragmentation needed and DF bit set" messages for a router interface. ICMP messages have a type and a code. (IANA has the full list.) Type 3 covers different kinds of destination unreachable errors, including "packet too big" which is code 4 under type 3. Unfortunately, the Cisco documentation isn't particularly helpful. Under Configuring IP Services:

Enabling ICMP Protocol Unreachable Messages

If the Cisco IOS software receives a nonbroadcast packet destined for itself that uses an unknown protocol, it sends an ICMP protocol unreachable message back to the source. Similarly, if the software receives a packet that it is unable to deliver to the ultimate destination because it knows of no route to the destination address, it sends an ICMP host unreachable message to the source. This feature is enabled by default.

The fact that the ip unreachables command also affects sending of packet too big messages isn't even hinted at... The same goes for the description of the command in the command reference section. Only an ICMP Services Example explains a little more:

Disabling the unreachables messages will have a secondary effect—it also will disable IP Path MTU Discovery, because path discovery works by having the Cisco IOS software send Unreachables messages.

However, there still is no warning that disabling unreachables will make anything connected to links with reduced MTUs virtually unreachable, as nearly all hosts send all their packets with the DF bit set. And there are many people recommending "no ip unreachables": a Google search reveals this combination of words shows up, ironically, a little more than 1500 times.

There is a good use for this command, however: when a range of addresses is routed to the null interface, and "no ip unreachables" in configured for the null interface, any packets to the destinations in question will be dropped at the CEF level. Note that "no ip unreachables has affect on packets routed to the null interface, which is different from the behavior on other interfaces, where the command determines whether unreachables are sent back in response to packets received on the interface.

Most link types have a fairly fixed MTU (such as ethernet with its 1500 bytes) or support negotiation of the MTU (such as PPP). However, some link types make it very easy for both ends to set different MTUs. This regularly happens with Cisco's HDLC on serial links. In this case, the packets can't be received successfully at the end using the smaller MTU. Fortunately, debugging is easy: if the MTU is 1500, set it much higher, if it's larger than 1500, set it to 1500. In most cases this will clear up the problem. Or just switch to PPP... The same problem can happen on tunnels, but from what I've seen many systems just accept the too-large packets. This leads to strange path MTU discovery behavior as the link then has different MTUs in both directions, but this shouldn't be much of a problem.

I think the moral of this story is that it's probably not worth the trouble to run path MTU discovery on systems that have a 1500 byte MTU. Since all systems behind links that don't support 1500 bytes need to implement ugly hacks such as clearing the DF bit or rewriting the TCP MSS option anyway, the resulting increase in fragmentation on the network will be negligible and it should save significantly on debugging.

Can't get enough of fragments? Have a look at a CAIDA analysis.

Permalink - posted 2003-12-30

NetworkWorldFusion on fortifying BGP and IPv6 being cheap and easy

I ran into two interesting articles at NetworkWorldFusion:

A while ago, they ran an article about BGP and BGP security under the title Fortifying BGP: No quick fix. The article doesn't go into much depth, but it has some interesting quotes. Fred Baker from Cisco says S-BGP is dead in the water, while S-BGP proponent Steve Kent at BBN speaks harsh words about Cisco's soBGP, indicating there are options in soBGP that are disastrous from a security standpoint and architectural problems. (The exact nature of the problems remains unclear, though.)

BGP Security at BGPexpert.com.

The other article, IPv6 fears seen unfounded, reports that early adopters find the transition to IPv6 both cheaper and easier to do than expected. Since the protocol has been in development for so long (8 to 10 years, depending on the IPv6 epoch of choice), most hardware and software vendors have now implemented the protocol so it's available (at no additional cost) now that people start adopting IPv6. Turning on the protocol turned out to be a fairly simple affair as well for the people quoted in the article.

Permalink - posted 2003-12-15

DNS and routing of IPv6 micro allocations

Currently, there are very few people who want to run an IPv6-only network. And that's a good thing too, as presently, there is no way to do this. One of the big hurdles is the DNS. Right now, very few, if any, top level domains accept IPv6 glue records. However, there are no technical reasons why those can't be added. Unfortunately, there is a technical reason why making the existing root nameservers perform their function over IPv6 is problematic. When a nameserver starts up, it looks at a local file for root servers. However, it will only use this list of root servers for a single query: one that results in the list of current root servers. In order to avoid problems, it's important that the answer for this query contains all the addresses for the root servers as additional information. The problem is that the original DNS specifications allow a relatively short packet size (around 512 bytes). This allows for the current 13 root servers and their IPv4 addresses with little room to spare.

But in the mean time some root server operators are experimenting with making the root service available over IPv6. (See http://www.root-servers.org/ for more information.) At the time of this writing, four root servers have IPv6 addresses:

B: 2001:478:65::53
F: 2001:500::1035
H: 2001:500:1::803f:235, and
M: 2001:dc3::35

However, only B and M are reachable (for me). A closer look at the addresses used provides the following information:

B: 2001:478:65::53 -> 2001:478::/32 @ ARIN: EP.NET
F: 2001:500::1035 -> 2001:500::/48 @ ARIN -> Internet Software Consortium
H: 2001:500:1::803f:235 -> 2001:500:1::/48 @ ARIN: U.S. Army Research Laboratory
M: 2001:dc3::35 -> 2001:dc3::/32 @ APNIC: M-ROOT-DNS-IPv6-20030619

The plot thickens... Since everyone and their little sister can easily obtain a /48 worth of IPv6 address space (I have two of those for personal use), it's expected that the global IPv6 routing table will suffer a lot of pollution from /48s, much like what happens with /24s in the IPv4 routing table, only worse. So it's unavoidable to filter on prefix length and not accept /48s.

(Additionally, it looks like the H /48 isn't announced at all: the route doesn't show up on the AMS-IX IPv6 looking glass, which does show the F /48 and other more specifics.)

When this issue came up on the IETF mailinglist, Paul Vixie, operator of the F root server, indicated that he had simply followed ARIN guidelines and obtained a /48 "micro allocation" from ARIN. It turns out ARIN has set aside a some address space for internet exchanges and "critical infrastructure". This address space is given out as /48s, see List of IPv6 Micro-allocations. (RIPE has a somewhat similar page at Smallest RIPE NCC Allocation / Assignment Sizes but it doesn't mention micro allocations.) All of this seems perfectly reasonable, except for one thing:

the existence of micro allocations is never mentioned in the RIR's IPv6 policy document.

This document, which is available in slightly different layouts and versions from LACNIC, APNIC, RIPE, ARIN and, for good measure, from IANA, says:

"4.3. Minimum Allocation

RIRs will apply a minimum size for IPv6 allocations, to facilitate prefix-based filtering.

The minimum allocation size for IPv6 address space is /32."

And this is exactly what many ISPs that offer IPv6 service do: they filter on a prefix length of 32 bits as indicated above, or 35 bits, the old allocation size. Obviously someone dropped the ball big time here, and this needs to be fixed in one way or another. Watch this space for more information. In the mean time, be sure to selectively relax your filters if you do prefix based filtering in IPv6. Gert Döring maintains a set of IPv6 BGP filter recommendations.

Permalink - posted 2003-12-09

RIPE 46 Wednesday - Routing, IPv6

Wednesday

Routing

Wednesday brought sessions about my two favorite subjects: routing and IPv6. However, I didn't find most of the routing subjects very interesting: RIS Update, Verification of Zebra as a BGP Measurement Instrument, Comparative analysis of BGP update metrics. The last one sounds kind of interesting but it comes down to a long analysis of what you get when you compare BGP updates gathered at different locations such as the Amsterdam Internet Exchange looking glass and the Oregon Internet Exchange Route Views.

Yesterday's presentation in the routing wg about bidirectional forwarding detection (that I completely forgot about during all the train rerouting) was much more interesting. Daves Katz and Ward wrote an Internet Draft draft-katz-ward-bfd-01.txt for a new protocol that makes it possible for routers to check whether the other side is still forwarding. This goes beyond the link keepalives that many protocols employ, because it also tests if there is any actual forwarding happening. And the protocol works for unidrectional links and to top it all off, it works at millisecond granularity. There is a lot of interest in this protocol, so there is considerable pressure to get it finished soon.

But wednesday's routing session wasn't a complete write-off as Pascal Gloor presented the Netlantis Project. This is a collection of BGP tools. Especially the Graphical AS Matrix Tool is pretty cool: it shows you the interconnections between ASes. I'm not exactly sure how it decides which ASes to include, but it still provides a nice overview.

IPv6

In the afternoon there was the IPv6 working group session which conflicted with the Technical Security working group session which I would also have liked to attend...

Kurtis Lindqvist presented an IETF multi6 wg update.

Gert Doering talked about the IPv6 routing table. Apart from the size, there are some notable differences with IPv4: IPv6 BGP interconnection doesn't reflect business relationship or anything close to physical topology: people are still giving away free IPv6 transit and tunneling all over the place. This is getting better, though. (The problem with this is that you get lots of routes but no way to know in advance which are good. Nice to have free transit, not so nice when it's over a tunnel spanning the globe.) There are now nearly 500 entries in the global IPv6 table, which is nearly twice as much as two years ago. About half of those are /32s from the RIRs (2001::/16 space), and the rest more or less equally distributed over /35s from the RIRs and /24s, /28s and /32s from 6bone space (3ffe::/16).

I'm not sure if it was Gert, but someone remarked during a presentation: "In Asia, they run IPv6 for production. In Europe, they run it for fun. In the US, they don't run it at all."

Jeroen Massar talked about "ghost busting". When the Regional Internet Registries started giving out IPv6 space, they assigned /35s to ISPs. Later they changed this to /32s. The assignments were done in such a way that an ISP could simply change their /35 announcement to a /32 announcement "in place". However, this is not entirely without its problems as the BGP longest match first rule dictates that a longer prefix is always preferred (such as a /35 over a /32), regardless of the AS path length or other metrics. With everyone giving away free transit, there are huge amounts of potential longer paths that BGP will explore before the /35 finally disappears from the routing table and the /32 is used.

To add insult to injury, there appear to be bugs that make very long AS paths stay around when they should have disappeared. These are called "ghosts" so hence the ghost busting. See the Ghost Route Hunter page for more information.

Permalink - posted 2003-09-16

IPv4 Address Lifetime Expectancy Revisited

At the end of the thursday plenary at the RIPE 46 meeting, Geoff Huston presented IPv4 Address Lifetime Expectancy Revisited (PDF).

If looking at the slides leaves you puzzled, have a look at one of Geoff's columns from a few months ago that (as always) explains everything both in great detail and perfect clarity: IPv4 - How long have we got? (The full archive is available at http://www.potaroo.net/papers.html.)

Geoff looks at three steps in the address usage process:

Allocation of a /8 from IANA to a Regional Internet Registry
Assignment of address space from a RIR to someone who asks for it (usually an ISP)
The addresses showing up in the global BGP routing table

It looks like the free IANA space is going to run out in 2019. But the RIRs hold a lot of address space in pools of their own, if we include this the critical date becomes 2026. Initially, projections of BGP announcements would indicate that all the regularly available address space would be announced in 2027, and if we include the class E (240.0.0.0 and higher) address space a year later. However, Geoff didn't stop there. After massaging the data, his conclusion was that the growth in BGP announcements doesn't seem exponential after all, but linear. With the surprising result:

"Re-introducing the held unannounced space into the routing system over the coming years would extend this point by a further decade, prolonging the useable lifetime of the unallocated draw pool until 2038 - 2045."

Now of course there are lots of disclaimers: whatever happened in the past isn't guaranteed to happen in the future, that kind of thing. This goes double for the BGP data, as this extrapolation is only based on three years of data. But still, many people were pretty shocked. It was a good thing this was the last presentation of the day, because there were soon lines at the microphones.

So what gives? Ten years ago the projections indicated that the IPv4 address space would be depleted by 2005. Now, an internet boom and large scale adoption of always-on internet access later, it isn't going to be another couple of years, but four decades? Seems unlikely. Obviously CIDR, VLSM, NAT and ethernet switching that allows much larger subnets have all slowed down address consumption. But I think there are some other factors that have been overlooked so far. One of those is that some of the old assignments (such as entire class A networks) are being used up right now. For instance, AT&T Worldnet holds 12.0.0.0/8, but essentially this space is used much like a RIR block, ranges are further assigned to end-users. We're probably also seeing big blocks of address space disappearing from the global routing table because announcing such a block invites too much worm scanning traffic. On the other hand there are also reports from "ISPs" that assign private address space to their customers and use NAT. So I guess there is a margin of error in both directions.

The the same time, the argument can be made that for all intents and purposes the IPv4 address space has already run out: it's way too hard to get the address space you need (let alone want). This is in line with what Alain Durand and Christian Huitema explain in RFC 3194. They argue that the logarithm of the number of actually used addresses divided by the logarithm of the number of usable addresses (the HD ratio) represents a pain level: below a ratio of 80% there is little or no pain, trouble starts at 85% and 87% represents a practical maximum. For IPv4 that would be 211 million addresses used (note that in the RFC the number is 240 million, but this is based on the full 32 bits while a little over an eighth of that isn't usable).

According to the latest Internet Domain Survey we're now at 171 million. This is counting the number of hosts that have a name in the reverse DNS, so the real number is probably higher.

I think RFC 3194 is on the right track but rather than simply do a log over the size of the address space, what we should look at is the number and flexibility of aggregation boundaries. In the RFC phone numbers are cited. Those have one aggregation boundary: area code vs local number, with a factor 10 flexibility. This means wasting a factor 10 (worst case) once. Classful IPv4 also had a single boundary, but the jumps are 8 bits, so a waste of a factor 256. With classless IPv4 we have more boundaries, but they're only one bit most of the time: IANA->RIRs, RIR->ISPs, ISP->customers and subnet. That's four times a factor two, so a factor 16 in total. That means we can use 3.7 billion addresses / 16 = 231 million IPv4 addresses without pain. Hm... But we can collapse some boundaries to achieve better utilization.

Permalink - posted 2003-09-16

older posts - newer posts