daddy, the internet is slow today
February 3, 2012 8:09 AM

Sometimes, adding bandwidth can actually hurt rather than help. Most people have no idea what they can do about bufferbloat.
posted by DU (25 comments total) 26 users marked this as a favorite

You probably want to start by reading Bufferbloat: Dark Buffers in the Internet, Gettys' recent overview paper for CACM.
posted by Nelson at 8:12 AM on February 3, 2012

This is really cool, I do this stuff for a living, and this is the first I've heard of it.
posted by empath at 8:14 AM on February 3, 2012

I don't do this stuff for a living and it was the first I've heard it explained, but I'm not sure I totally got the concept.

My main takeaway (in itself very valuable) was probably on how to pronounce "Gettys". I've been saying "get-eez".
posted by DU at 8:32 AM on February 3, 2012

I'm not going to lie, I'm a bit jealous of anyone who can get the CTO of Comcast (Rich Woundy), not to mention Vint Cerf, working on their my-cable-modem-is-slow problem...

But fascinating results. I've noticed the same behavior that he describes (as has anyone who has started a big download or upload and suddenly lost all other connectivity), but wasn't sure of the root cause -- it seemed like it shouldn't work that way, but I was at a loss as to how to correct it, aside from limiting the offending transfer's bandwidth until it was below the apparent threshold where it blocked everything else. That the blocking is due to full buffers 'tail-dropping' other packets makes perfect sense.
posted by Kadin2048 at 8:38 AM on February 3, 2012

Okay, just watched the video and read the paper. The main problem is this, summarized in a few sentences:

TCP adjusts its speed based on available bandwidth. It does this by looking at packets as they come in to see if there are any missing in the stream. If packets are dropped, then it starts slowing transmission speed to compensate. When buffers get filled, it can take up to a second or more before the packets reach their destination, which means that it takes longer for tcp to find out that packets are being dropped and to adjust its speed accordingly.

In addition, since the buffers prevent packets being dropped, tcp can't determine path congestion, so the buffers themselves become part of the path the tcp protocol is trying to measure. The end result being that the buffers are filled more or less permanently, causing severe ongoing latency problems.
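To put rough numbers on that delay: a full first-in-first-out buffer adds latency equal to its size divided by the link rate. Here is a minimal Python sketch of that arithmetic. The 256 KiB buffer and 1 Mbps uplink are made-up illustrative values, not figures from the paper or the video.

```python
def queue_delay_seconds(buffer_bytes: int, link_bits_per_second: float) -> float:
    """Time for a completely full FIFO buffer to drain at the given link rate."""
    return buffer_bytes * 8 / link_bits_per_second


# Hypothetical values: a 256 KiB buffer sitting in front of a 1 Mbps uplink.
BUFFER_BYTES = 256 * 1024
UPLINK_BPS = 1_000_000

delay = queue_delay_seconds(BUFFER_BYTES, UPLINK_BPS)
print(f"worst-case queueing delay: {delay:.2f} s")
```

Under those assumptions every packet behind a full buffer waits roughly two seconds, which is also how long TCP has to wait before it can notice anything is wrong.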
posted by empath at 8:45 AM on February 3, 2012 [14 favorites]

Nice summary, empath. A related problem the CACM paper talks about is that none of the more advanced queue management algorithms for signaling congestion have ever really been deployed. Something like Random Early Detection (RED) allows a congested node to signal congestion even with a big buffer; basically you start dropping packets from your buffer early, you don't wait for the buffer to fill up. RED got a bad reputation early on and then everyone just sort of moved on. A lot of what Gettys is arguing is that it's time to reconsider that.
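For the curious, the core of RED is a drop probability that ramps up as the average queue grows. This is a simplified Python sketch, not a router implementation: it leaves out the moving-average calculation and the packet-count adjustment, and the thresholds (`min_th`, `max_th`, `max_p`) are illustrative assumptions.

```python
import random


def red_drop_probability(avg_queue: float, min_th: float, max_th: float,
                         max_p: float) -> float:
    """Simplified RED: no drops below min_th, always drop at or above max_th,
    and a linear ramp from 0 up to max_p in between."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def should_drop(avg_queue: float, min_th: float = 5, max_th: float = 15,
                max_p: float = 0.1, rng=random.random) -> bool:
    """Randomly decide whether to drop an arriving packet."""
    return rng() < red_drop_probability(avg_queue, min_th, max_th, max_p)


# Hypothetical average queue lengths, in packets.
for avg in (2, 10, 14, 20):
    print(avg, red_drop_probability(avg, min_th=5, max_th=15, max_p=0.1))
```

The point is that senders start seeing occasional losses while the queue is still short, so they back off before the buffer is full.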

Practical takeaway for casual users of the Internet: you might get better performance on your home Internet if your router had a smaller buffer. Sadly you don't really have a way to implement that right now, but maybe Gettys' work will get some attention to the problem.
posted by Nelson at 8:50 AM on February 3, 2012 [1 favorite]

I work for a big ISP and can make config changes on edge routers, so I sent him an email asking what we can/should do about it. It seems most of the problem is with CPE equipment not edge stuff, though...
posted by empath at 8:51 AM on February 3, 2012 [1 favorite]

Their ICSI Netalyzr is a good "WTF is up with my ISP" tool. Bufferbloat is like the inverse of the high-performance transfer problem, where you can have a 1Gbps dedicated circuit halfway around the world with >1s round trip time and not be able to fill up the pipe because the TCP buffers on the endpoints aren't big enough.
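The arithmetic behind that high-performance transfer problem is the bandwidth-delay product: the sender needs that many bytes in flight to keep the link busy. A small Python sketch, using the 1 Gbps and 1 s figures from above; the 64 KiB window is an assumed example of an untuned endpoint, not a measured value.

```python
def bandwidth_delay_product_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return link_bps * rtt_seconds / 8


def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Best case for one TCP flow: one full window delivered per round trip."""
    return window_bytes * 8 / rtt_seconds


LINK_BPS = 1_000_000_000   # 1 Gbps circuit
RTT_SECONDS = 1.0          # 1 s round trip time
WINDOW_BYTES = 64 * 1024   # assumed small window on the endpoints

print(bandwidth_delay_product_bytes(LINK_BPS, RTT_SECONDS))  # window needed
print(max_throughput_bps(WINDOW_BYTES, RTT_SECONDS))         # rate achieved
```

With those numbers the pipe holds 125 MB, while a 64 KiB window tops out at about half a megabit per second.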

"Buffering is suffering" has been brought up at every network conference I've been to for the past five years or so, so I hope something finally gets worked out.
posted by zengargoyle at 9:07 AM on February 3, 2012 [2 favorites]

Additionally, TCP connections won't get any data loss due to latency from this problem (or even dropped packets), because TCP is 'connection-oriented' and retransmits dropped packets. So that means that generally, web pages will load eventually, files will be transferred, etc.

However, UDP is 'connectionless' (or 'spray and pray'). It doesn't verify that all the packets get there at all, which is useless for transferring files, but ideal for real time applications like voice and games. UDP uses less bandwidth because it doesn't have the extra overhead to track dropped packets.

What this means is that the constantly-filled buffers cause huge problems with voice. If the buffer is consistently full, you get latency, which causes long delays while talking and you end up talking all over each other or having long uncomfortable silences, or it can cause severe echo.

If the buffer isn't consistently full, you'll get severe 'jitter', which means the packets get there in bursts or out of order, which means that you'll have voice being choppy or cutting in and out (an extremely common problem with home voip connections).

So what this means for the end user (especially if you're running voip for a small business):

1) Always enable QoS on your router, and cap it at about 25% below your average upload speed test result. (I used to recommend this to our voip customers all the time when I worked for a hosted PBX, because I knew it worked, but I was never quite sure why you had to set the bandwidth so low until I read this article.)

2) Make sure your wireless speed is faster than your download speed on your home connection.
posted by empath at 9:09 AM on February 3, 2012 [3 favorites]

Do consumer routers even have QoS options? I admit it's been years since I bought a router; I've got a carefully hoarded pile of WRT54GLs loaded with real router firmware just so I have some hope of decent networking.
posted by Nelson at 10:58 AM on February 3, 2012

On the firmware that came with my WRT54G, QoS options are under the Applications & Gaming tab. I've never touched them, but am wondering if I should. Multiple times per day I have to restart my router (this post got me to look up how to do it directly from a connected computer, which is nice. Haven't had to try it yet, but it's apparently for future reference) and this was the case in both my Time Warner apartments, but not in the Cablevision apartment I had for a few years in between. When it goes bad, sometimes a Skype call will still work but HTTP stalls out, which makes me think I might be able to deal with it through QoS settings? One day I'll form this into a sensible AskMe post.
posted by nobody at 12:01 PM on February 3, 2012

Yes, you absolutely should, if you do any voip or gaming, especially.

Just run a speed test a few times. Take your average upload speed result, reduce it by 20-25% and use that as your upload cap in the QoS settings.
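As a quick sketch of that calculation in Python (the speed test results are made-up sample values, and `headroom` is just a name for the 20-25% reduction):

```python
from statistics import mean


def qos_upload_cap_kbps(speed_tests_kbps: list[float], headroom: float = 0.25) -> float:
    """Average the upload results, then shave off a fraction as headroom."""
    if not speed_tests_kbps:
        raise ValueError("need at least one speed test result")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be a fraction between 0 and 1")
    return mean(speed_tests_kbps) * (1 - headroom)


# Hypothetical upload results from four runs, in kbps.
samples = [940, 1010, 990, 1060]
print(qos_upload_cap_kbps(samples))        # 25% headroom
print(qos_upload_cap_kbps(samples, 0.20))  # 20% headroom
```

Those samples average 1000 kbps, so the cap lands at 750 kbps with 25% headroom.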

It works miracles for ping times, voip and gaming.
posted by empath at 12:33 PM on February 3, 2012

Thanks! Just the kick in the pants I needed to actually do it.
posted by nobody at 12:48 PM on February 3, 2012

You can play around with the cap a bit. 75% is just a rough estimate...
posted by empath at 12:53 PM on February 3, 2012

If the fix is that simple why doesn't someone implement a 'run bandwidth test and automate optimal settings' button? Is there something that makes it hard, or just not a problem that most people think of?
posted by Canageek at 3:08 PM on February 3, 2012

Lots of routers do exactly that.
posted by empath at 3:10 PM on February 3, 2012

I analyze stuff like this for a living. I always called it "network engineering", but bufferbloat is a nice term for non-techies.

There are tools on the market to help identify and fix this very problem; look up Application Performance Monitoring. Tools like Extrahop, SuperAgent, Reporter Analyzer. It would be cool to have access to low-end home versions of those tools, huh?

Do home routers have spanning (port-mirroring) capabilities?
posted by roboton666 at 5:07 PM on February 3, 2012

Another problem is the actual data throughput of your wireless connection. Depending on how many devices associate (iPad, iPod, Wii, laptop, glasses, etc.), your actual throughput will go down. It's just due to the physics of 802.11. You may be connected at "54" Mbps, but only getting 128 Kbps per TCP stream (maybe even less).

The actual throughput depends, but once you get 5 or more devices on a home wireless router, things go downhill very fast.
posted by roboton666 at 5:11 PM on February 3, 2012

So, I'm designing a flow control algorithm right now, so this is interesting.

Our problem in a nutshell is we don't want to depend much on the intervening network, because bulk networking is optimized for doing things fast and not right. When we try to make intermediary networks do things right (like in multicast) it ends in tears.

The problem is that we've been using the lightest of light signals -- packet drops -- to inform us about the appropriate rate of transmission for packet flows. But packets aren't necessarily dropping; they're being absorbed in larger and larger buffers. It's common on 3G networks to see pings with 4000ms lag; in that time, stacks are retransmitting, which is filling networks with even more traffic, which is increasing lag even further...

It's ugly. It's going to be interesting to see how we fix it.
posted by effugas at 6:36 PM on February 3, 2012

(Why a new flow control algo? It's a much different game when you're flooding across the entire net.)
posted by effugas at 6:37 PM on February 3, 2012

Effugas, what do you think about WAN acceleration and TCP optimization?

As a network engineer I find myself working higher and higher on the stack with each passing year. I'm becoming more and more integrated into the developers groups and finding myself getting involved in projects extremely early on in the life cycle.

It will be interesting to see how it all shakes out.
posted by roboton666 at 6:56 PM on February 3, 2012

From the other end, application engineers at places like Google get lower and lower in the stack. Even for my small projects I find it's helpful to understand stuff all the way from the high end of application caching and database access down to the low level of TCP round trips.

Related: Firefox 11 includes SPDY, Google's improved HTTP that allows multiple requests to effectively share a single TCP link. That should help with a variety of bufferbloat and congestion related problems. Back in the early days nerds used to debate whether it was "fair" for a Web browser to open 2 sockets; now it's customary to have 8+ for a single web page, but it doesn't really work great with TCP.
posted by Nelson at 7:02 PM on February 3, 2012

Nelson: this is exactly what I mean when I talk about getting higher up the stack, considering how layer 7 interacts with layer 3, and how we can optimize our network queues and buffers to ensure optimal delivery.
posted by roboton666 at 7:19 PM on February 3, 2012

I've been working in networking for something like 18 years, and when I read Gettys' blog I feel like a stupid noob. I mean that in the best, most flattering way possible.
posted by rmd1023 at 3:27 PM on February 4, 2012

Google has a number of proposals for how to modify TCP, including increasing the initial congestion window to 10 segments, and implementing TCP Fast Open. By some accounts they've already been doing this with their own servers for some time.
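To see why the initial window matters for short transfers, here is an idealized Python sketch of slow start that counts round trips. It assumes the window doubles every round trip with no loss and no receiver limit; the 14,000-byte response, the 1460-byte segment size, and the older initial window of 3 segments are assumptions for illustration.

```python
import math


def round_trips_to_send(payload_bytes: int, initial_window_segments: int,
                        mss_bytes: int = 1460) -> int:
    """Round trips needed to deliver a payload under idealized slow start."""
    if initial_window_segments <= 0:
        raise ValueError("initial window must be at least one segment")
    segments_left = math.ceil(payload_bytes / mss_bytes)
    window = initial_window_segments
    rounds = 0
    while segments_left > 0:
        segments_left -= window  # send one full window this round trip
        window *= 2              # slow start doubles the window each round
        rounds += 1
    return rounds


# Hypothetical 14,000-byte response (10 segments at 1460 bytes each).
print(round_trips_to_send(14_000, initial_window_segments=3))
print(round_trips_to_send(14_000, initial_window_segments=10))
```

In this model the response needs three round trips starting from 3 segments and only one starting from 10, which is the whole appeal for small web requests.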
posted by Rhomboid at 3:26 AM on February 5, 2012

