topics: BGP / IPv6 / more · settings · b&w · my business: inet⁶ consult · Twitter · Mastodon · LinkedIn · email · 🇺🇸 🇳🇱

These are all posts about BGP, including those originally published on BGPexpert.com.

April fools day RFCs

► April fools day is coming up again! Don't let it catch you by surprise. Over the years, a number of RFCs have been published on April first, such as...

Full article / permalink - posted 2002-04-01

TCP high performance and maximum usable bandwidth

The second half of Februari saw two main topics on the NANOG list: DS3 performance and satellite latency. The long round trip times for satellite connections wreak havoc on TCP performance. In order to be able to utilize the available bandwidth, TCP needs to keep sending data without waiting for an acknowledgment for at least a full round trip time. Or in other words: TCP performance is limited to the window size multiplied by the round trip time. The TCP window (amount of data TCP will send before stopping and waiting for an acknowledgment) is limited by two factors: the send buffer on the sending system and the 16 bit window size field in the TCP header. So on a 600 ms RTT satellite link the maximum TCP performance is limited to 107 kilobytes per second (850 kbps) by the size of the header field, and if a sender uses a 16 kilobyte buffer (a fairly common size) this drops to as little as 27 kilobytes per second (215 kbps). Because of the TCP slow start mechanism, it takes several seconds to reach this speed as well. Fortunately, RFC 1323, TCP Extensions for High Performance introduces a "window scale" option to increase the TCP window to a maximum of 1 GB, if both ends of the connection allocate enough buffer space.

The other subject that received a lot of attention, the maximum usable bandwidth of a DS3/T3 line, is also related to TCP performance. When the line gets close to being fully utilized, short data bursts (which are very common in IP) will fill up the send queue. When the queue is full, additional incoming packets are discarded. This is called a "tail drop". If the TCP session which loses a packet doesn't support "fast retransmit", or if several packets from the same session are dropped, this TCP session will go into "slow start" and slow down a lot. This often happens to several TCP sessions at the same time, so those now all perform slow start at the same time. So they all reach the point where the line can't handle the traffic load at the same time, and another small burst will trigger another round of tail drops.

A possible solution is to use Random Early Detect (RED) queuing rather than First In, First Out (FIFO). RED will start dropping more and more packets as the queue fills up, to trigger TCP congestion avoidance and slow down the TCP sessions more gently. But this only works if there aren't (m)any tail drops, which is unlikely if there is only limited buffer space. Unfortunately, Cisco uses a default queue size of 40 packets. Queuing theory tells us this queue will be filled entirely (on average) at 97% line utilization. So at 97%, even a one packet burst will result in a tail drop. The solution is to increase the queue size, in addition to enabling RED. On a Cisco:

interface ATM0
random-detect
hold-queue 500 out

This gives RED the opportunity to start dropping individual packets long before the queue fills up entirely and tail drops occur. The price is a somewhat longer queuing delay. At 99% utilization, there will be an average of 98 packets in the queue, but at 45 Mbps this will only introduce a delay of 9 ms.

Permalink - posted 2002-03-31

BGP and geography

CAIDA, the Cooperative Association for Internet Data Analysis, has an interesting page with geographical, demographical and many BGP-related statistics per continent and country. As expected, South America, Africa and Asia hardly show up, but there are some surprises. For instance, the US uses more than three times as many address as the whole of Europe.

Permalink - posted 2002-03-31

Radware Peerdirector

It seems more and more products that try to optimize BGP are reaching the market. One is the Radware Peerdirector. There is a whitepaper (PDF) about how the Peerdirector works on the site. This product monitors bandwidth use and reconfigures routers to change prefix advertisements to different peers to optimize how the available bandwidth is used.

Of course, such a product doesn't do anything a network manager can't do manually. However, periodically changing information in BGP introduces instabilities into the global routing system, which is undesirable at a minimum, and may even pose real harm under certain conditions.

Permalink - posted 2002-03-30

Wide scale SNMP vulnerabilities

On February 12th, CERT published "CERTxae Advisory CA-2 002-03 Multiple Vulnerabilities in Many Implementations of the Simple Network Ma nagement Protocol (SNMP)".

Details haven't been published yet, but it seems it is possible to do all kinds of bad things by firing off non-spec SNMPv1 packets to boxes from many vendors.

Cisco has a security advisory about the problem. Cisco has a bad track record when SNMP security is concerned: in older IOS versions there were "hidden" SNMP communities that enabled pretty much anyone to manage the router. It seems this problem has resurfaced in another form: when you create a trap community, this automatically enables processing of incoming SNMP messages for this community, even though this community doesn't provide read or write access. However, this is enough to open the router for denial of service attacks. It is possible to apply an access list to the trap community, but this depends on the order in which the configuration is processed, so it will not survive a reboot.

The only way to be completely secure is to turn off SNMPv1 or filter incoming SNMP packets on the interfaces rather than at the time of SNMP processing. (Remember, this is UDP so the source addresses are easily spoofed.) Upgrading your IOS software image will also do the trick, as soon as they are available. Consult a certified Cisco IOS version specialist to help you find the right one (more than half of the advisory consists of a list of IOS versions).

Permalink - posted 2002-03-30

Verio route filtering policy + NAT&DNS failover

The weeks from September 23 to October 7 saw two heated discussions on the NANOG list. The first one was on filtering BGP routes. It all started with some remarks about Verio's peering filter policy but the discussion became more general after some days, including related topics such as "sub-basement multihoming".

The second discussion was about (ab)use of the Domain Name System for failover (in combination with a NAT box) and load balancing. This discussion seems to be somewhat religious: some people think there is no problem, others nearly start to foam at the mouth just thinking about it.

Permalink - posted 2001-12-31

older posts - newer posts

RSS feed

Archives: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2010, 2011, 2013, 2014, 2015, 2016, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025