Path MTU Discovery problems

If your network has a link with an MTU that's smaller than 1500 bytes in the middle, you're in trouble. It's not the first time this came up on the NANOG list and it won't be the last.

To avoid wasting resources, either by sending packets smaller than the maximum the network supports or by sending packets so large that they must be fragmented, hosts implement Path MTU Discovery (PMTUD). A host assumes a large packet size, transmits its packets with the don't fragment (DF) bit set (in IPv4; in IPv6, DF is implied) and listens for ICMP messages that say the packet is too big. This way, hosts can quickly determine the lowest Maximum Transmission Unit (MTU) that's in effect along a certain path.
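The discovery loop can be sketched as a small simulation. This is only an illustration, not how any particular TCP/IP stack implements it: the function name `discover_path_mtu` and the list-of-link-MTUs model are assumptions made for the example, and no real packets are sent.

```python
def discover_path_mtu(link_mtus, first_guess=1500):
    """Simulate PMTUD over a path given as the MTU of each link, in order.

    Hypothetical model: every probe has DF set, and the first link that is
    too small drops the probe and reports its own MTU back to the sender.
    """
    size = first_guess
    probes = []
    while True:
        probes.append(size)
        # Find the first link the probe does not fit through, if any.
        too_small = next((mtu for mtu in link_mtus if mtu < size), None)
        if too_small is None:
            return size, probes   # the probe reached the destination
        size = too_small          # retry at the MTU from the ICMP message


# Example path: Ethernet, a PPPoE link, a GRE tunnel, Ethernet again.
path_mtu, probes = discover_path_mtu([1500, 1492, 1476, 1500])
print(path_mtu, probes)
```

Each "packet too big" message costs one round trip, so the sender converges after at most one probe per distinct bottleneck.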

Most of the time, that is. RFC 1191 suggests that hosts react quickly to a changing path MTU, so implementers decided to simply set the DF bit on ALL packets. At the same time, many people are very suspicious of ICMP packets, since they can be used in denial of service attacks or to uncover information about a network. So to be on the safe side, a significant number of people filter all ICMP messages. Or routers are configured in such a way that the ICMP packet too big messages aren't generated or can't make it back to the source host. NAT really doesn't help in this regard either.

So what happens when all packets have DF set and there are no ICMP packet too big messages? Right: nothing. Since the first few packets in a session are typically small, sessions get set up without problems, but as soon as the data transfer starts the session times out. So what can we do?
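A toy model makes this failure mode concrete. The function `send_with_df` and the packet sizes below are illustrative assumptions, not measurements: small handshake packets fit through the narrow link, while a full-size data segment is dropped without the sender ever hearing about it.

```python
def send_with_df(packet_size, link_mtus, icmp_filtered):
    """Return what happens to one DF-marked packet on a simulated path."""
    for mtu in link_mtus:
        if packet_size > mtu:
            if icmp_filtered:
                return "dropped silently"          # sender just sees a timeout
            return f"dropped, ICMP says MTU {mtu}"  # sender can shrink packets
    return "delivered"


path = [1500, 1476, 1500]  # hypothetical path with a tunnel in the middle
for label, size in [("SYN", 60), ("ACK", 52), ("full-size data", 1500)]:
    print(label, size, send_with_df(size, path, icmp_filtered=True))
```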

The real solution would be for TCP implementers to clean up their act and stop depending on ICMP messages that may or may not be generated somewhere along the way. Unfortunately, it looks like they think they don't have to do anything since:
People who configure firewalls and routers should make sure the ICMP packet too big messages are generated with correct IP addresses and passed along to the source host.

Unfortunately, this doesn't really solve the problem for a user who is behind some kind of tunneling mechanism that brings the MTU down below 1500 bytes, such as PPPoE, PPTP or GRE. Fortunately, TCP has a Maximum Segment Size (MSS) option that is used to tell the other side the maximum size of the TCP segment (that is, the packet without IP and TCP headers) it should send. When a host itself knows about the smaller MTU, it adjusts the MSS so there is no problem. The trouble starts when the reduced MTU sits behind a router, so the end hosts only see the regular Ethernet MTU. Many routers implement a feature where the MSS option in TCP session establishment packets is manipulated to avoid this problem. On Cisco routers this is a fairly recent feature that is enabled with the interface command ip tcp adjust-mss .... Unfortunately, it is not entirely clear what impact this feature has on forwarding performance.
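The arithmetic behind MSS adjustment is simple enough to sketch. The snippet assumes IPv4 and TCP headers without options (20 bytes each) and uses the common MTU values of 1492 bytes for PPPoE and 1476 bytes for GRE over IPv4; the function names are made up for the example and say nothing about how a router implements the feature.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one unfragmented packet of this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def clamp_mss(advertised_mss, tunnel_mtu):
    """Lower the MSS seen in a SYN if it would not fit through the tunnel."""
    return min(advertised_mss, mss_for_mtu(tunnel_mtu))


print(mss_for_mtu(1500))      # what an Ethernet host advertises: 1460
print(clamp_mss(1460, 1492))  # rewritten for a PPPoE link: 1452
print(clamp_mss(1460, 1476))  # rewritten for a GRE tunnel: 1436
print(clamp_mss(536, 1492))   # already small enough, left alone: 536
```

Because only the SYN packets are touched, neither end host needs to know about the tunnel, but this also means the fix only helps TCP.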

RFC 2923 TCP Problems with Path MTU Discovery
The MSS Initiative
Cisco - Why Can't I Browse the Internet when Using a GRE Tunnel?

Permalink - posted 2003-05-19

RIPE DDoS

(Distributed) Denial of Service attacks continue to be a serious problem. (Well, for the victims, at least.) RIPE suffered an attack on February 27th, 2003 that almost wiped them off the net for two and a half hours.

Permalink - posted 2003-04-10

Bogon filtering

On August 7th, 2002, Rob Thomas announced on his bogon list and a number of mailing lists that IANA had delegated the 69.0.0.0/8 address block to ARIN for further distribution to ISPs and end-user organizations in the ARIN region. The next day, ARIN made an announcement of their own.

Exactly seven months later, someone who had gotten addresses in the new range posted a message to the NANOG list under the title "69/8...this sucks". He encountered problems reaching certain destinations from the new addresses and wrote a script to test this. It turned out a significant number of sites ("dozens") still filtered this range.

Further investigation uncovered that this wasn't so much a routing problem: many firewall administrators also use the bogon list to create filters, and then subsequently fail to keep those filters up to date.
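Checking a deployed filter against the current list is easy to automate. The sketch below uses Python's standard `ipaddress` module; both prefix lists are short, hypothetical stand-ins for illustration, not the actual bogon list.

```python
import ipaddress

# Hypothetical lists: a filter copied before 69/8 was delegated, and an
# up-to-date list from which 69/8 has been removed.
DEPLOYED = [ipaddress.ip_network(n)
            for n in ("10.0.0.0/8", "69.0.0.0/8", "192.168.0.0/16")]
CURRENT = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "192.168.0.0/16")]


def blocked_by(address, bogons):
    """Return the filter entry that matches the address, or None."""
    ip = ipaddress.ip_address(address)
    return next((net for net in bogons if ip in net), None)


def stale_entries(deployed, current):
    """Prefixes still being filtered that are no longer on the list."""
    return sorted(set(deployed) - set(current))


print(blocked_by("69.1.2.3", DEPLOYED))  # matched by the stale 69.0.0.0/8 entry
print(blocked_by("69.1.2.3", CURRENT))   # None: an up-to-date filter lets it through
print(stale_entries(DEPLOYED, CURRENT))
```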

This sucks indeed.

Permalink - posted 2003-04-10

RIPE MS SQL worm analysis

RIPE has another analysis of the MS SQL worm. RIPE monitors performance between 49 locations on the net. 40% of the monitored pairs of hosts suffered from congestion between them; in the other 60% of the cases there weren't any problems. The problems were cleared up after about 8 hours.

RIPE also monitors root server performance and BGP activity. Two of the root servers suffered a good deal of packet loss. The BGP stuff is the most spectacular: there were about 30 to 60 times more updates of different kinds.

This clearly shows the need for control and data plane separation: congestion in the actual traffic shouldn't be able to take down the routing protocols. On the other hand, having BGP and other routing protocols run "in-band" over the same circuits as the actual data makes sure there is a functioning path between two routers. There's also something to be said for that.

On NANOG there was some talk about UDP/1434 filters. I argued that they shouldn't be necessary any more by now, but the rate of reinfection (people bringing in new vulnerable boxes) remains significant. So places with Windows machines on the network will probably need to have these filters in place for the foreseeable future. This is annoying, because once in a while the filters also block legitimate UDP traffic, such as DNS replies to a client that happens to use port 1434 as its source port, and many routers take a performance hit when these types of filters are enabled.
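Why a port filter occasionally catches innocent traffic can be shown with a tiny model. It assumes a stateless filter that drops every UDP packet with destination port 1434; the function name and the port numbers other than 1434 and 53 are arbitrary choices for the example.

```python
BLOCKED_UDP_PORT = 1434  # the port the MS SQL worm targets


def filter_drops(src_port, dst_port):
    """Stateless filter: drop any UDP packet headed for the blocked port."""
    return dst_port == BLOCKED_UDP_PORT


# Worm probe from some random source port: dropped, as intended.
print(filter_drops(40000, 1434))

# A client that picked 1434 as its ephemeral port sends a DNS query...
print(filter_drops(1434, 53))   # the query itself gets through
# ...but the server's reply goes back to port 1434 and is dropped.
print(filter_drops(53, 1434))

# The same exchange from any other ephemeral port is unaffected.
print(filter_drops(53, 1435))
```

A stateless filter cannot tell the worm's probe from a reply to a legitimate query, since both are simply UDP packets to port 1434.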

Permalink - posted 2003-02-27

BST: BGP Scalable Transport

A company called Packet Design has developed BGP Scalable Transport (BST), a transport protocol for BGP that is intended to replace TCP as the carrier protocol for BGP routing information. Packet Design isn't afraid to make bold claims; their press release about the protocol is headed "Packet Design solves security, reliability problems of major internet routing protocol, BGP."

But BST really only addresses the problem that each BGP router in an organization is required to talk to every other BGP router. This gets out of hand very fast in larger networks. However, two solutions have been around for years: route reflectors and confederations. When a BGP router is configured to be a route reflector, it checks for routing loops so it can safely "reflect" routing information from one router to another, eliminating the need for every pair of routers to communicate directly. Confederations break up large Autonomous Systems (ASes) into smaller ones and accomplish the same thing in a different way. Packet Design solves the problem by flooding BGP updates throughout the internal network, much the same way protocols such as OSPF and IS-IS do.
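How fast the full mesh gets out of hand is easy to quantify. The sketch compares it with a simplified route reflector design in which every client peers with every reflector and the reflectors are fully meshed among themselves; real designs vary, and the function names are made up for the example.

```python
def full_mesh_sessions(routers):
    """iBGP sessions when every router peers with every other router."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(routers, reflectors=1):
    """Sessions in a simplified design: clients peer with each reflector,
    and the reflectors form a full mesh among themselves."""
    clients = routers - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


for n in (10, 100):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n),
          route_reflector_sessions(n, reflectors=2))
```

With 100 routers the full mesh needs 4950 sessions, against 99 with a single reflector or 197 with two redundant ones.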

Packet Design claims that this will help with convergence speed and reliability and even security. Their reasoning is that using IPsec on lots of individual TCP sessions uses too much CPU, so protecting BGP over TCP with IPsec is infeasible. But even on a Pentium II @ 450 MHz with SHA-1 authentication you get 17 Mbps (see performance tests). Exchanging a full routing table takes about 8 MB in 1 minute, or around 1 Mbps, so with fewer than 17 peers receiving a full table the crypto can keep up without trouble. And that's even assuming internal BGP sessions need this level of protection. External BGP sessions are a more natural candidate for this, but eBGP doesn't have the same scalability problem as iBGP. It also assumes the security problems BGP has can be fixed at the TCP level. However, the most important BGP vulnerability is that there is no scalable and reliable way to check whether the origin of a BGP route is allowed to send out the route in question to begin with and whether any information was changed en route (by an otherwise legitimate intermediate router). These issues are addressed by soBGP and S-BGP.
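The back-of-the-envelope numbers can be checked in a few lines. The snippet takes the figures from the paragraph above at face value and assumes decimal megabytes (1 MB = 1,000,000 bytes). Unrounded, the per-peer rate comes out slightly above 1 Mbps, so the break-even point lands a little under 17 peers.

```python
FULL_TABLE_BYTES = 8_000_000  # about 8 MB, assuming decimal megabytes
TRANSFER_SECONDS = 60         # one minute to send the full table
CRYPTO_MBPS = 17              # SHA-1 throughput quoted for the Pentium II

# Bandwidth one peer needs while receiving a full table.
per_peer_mbps = FULL_TABLE_BYTES * 8 / TRANSFER_SECONDS / 1_000_000

# Number of simultaneous full-table transfers the crypto can sustain.
max_peers = int(CRYPTO_MBPS // per_peer_mbps)

print(round(per_peer_mbps, 2), max_peers)
```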

Permalink - posted 2003-02-19

BIRD Internet Routing Daemon

The Faculty of Math and Physics of the Charles University in Prague has created BGP-capable routing software for Unix machines, released under the GNU General Public License. The BIRD Internet Routing Daemon supports multiple tables with BGP and RIP for both IPv4 and IPv6, and OSPF for IPv4.

Permalink - posted 2003-02-16
