Gokul Blog — A conversation on VoIP, IMS, Cisco and Just about Anything


Performance Tuning Part II – Network Strikes Back

Posted by tggokul on November 16, 2006

This is a continuation of my performance tuning experience. In the first part, The Beginning, I had stopped at the juncture where we were losing packets when I ran the test.

This is where sipp’s usability was a big plus. sipp has a very good reporting mechanism showing how many requests (INVITE, ACK, BYE) and responses (Trying, Ringing, 200 OK, 200 OK for the BYE) were being sent and received. We had a primitive CLI in the SIP proxy to see the number of calls, the status of calls and so on, and with the combination of both we were able to deduce that packets sent by any of the entities in this test (sipp caller/callee and the proxy) were sometimes getting lost. This was seen even at 2 CAPS (call attempts per second) after some time. So our initial benchmark was at best 2 CAPS 😦
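(If you want to reproduce this kind of test, sipp’s built-in scenarios are enough to get going. The addresses, ports and rate below are just placeholders, not our exact setup, and how the proxy routes the call on to the callee depends on your proxy configuration.)

    # callee side: answer calls with the built-in UAS scenario
    sipp -sn uas -i 192.168.1.20 -p 5070

    # caller side: built-in UAC scenario aimed at the proxy, 2 calls per second (-r 2),
    # zero holding time (-d 0), stop after 10000 calls (-m 10000), dump stats to a CSV
    sipp -sn uac 192.168.1.10:5060 -i 192.168.1.30 -r 2 -d 0 -m 10000 -trace_stat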

After endless hours of googling, I figured out it had something to do with txqueuelen, the transmit queue length that is configured per interface on Linux. This value was set to 10, which means that at any given time only 10 outgoing packets could be queued up and the rest would be dropped. Let me walk you through some quick math.
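(If you want to check this on your own box, the current value shows up in the interface settings; eth0 below is just an example interface name.)

    # look for "txqueuelen" (ifconfig) or "qlen" (ip) in the output
    ifconfig eth0
    ip link show eth0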

For one call to be set up and torn down, there are 7 messages sent by the proxy (forwarding the INVITE, sending 100 Trying, forwarding 180 Ringing, forwarding 200 OK, forwarding the ACK, forwarding the BYE, forwarding the 200 OK for the BYE). The call duration was zero (no holding time; I will delve further into the impact of this in the Garbage Collection section, which I hope to write soon). So per second there are 14 SIP messages (our testing was at 2 CAPS). Now don’t think this would lead to a loss of 4 packets every second. That will not happen, because the actual queuing and sending by the network stack takes milliseconds, and only sometimes (when there is thread contention) will all 14 packets pile up in the queue at once. It is at those moments that packets were getting lost. Voila, was I jubilant or what. These were the times I used to yelp in joy and get curious looks from my colleagues.

I went ahead and increased txqueuelen to 50000 and things started working like a charm. I increased the load to nearly 10 CAPS and still no problems. CPU utilization was high at around 40%, but I ran it for nearly an hour and all calls were successful (sipp reports if there are any failed calls). So I decided to up the ante and increased it to 30 CAPS. Duh!!! We were losing packets again. I did the same math explained above, and there was no way the send queue could ever hold more than 50,000 packets, so the drops had to be happening somewhere else.
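(The change itself is a one-liner; eth0 and the value are placeholders for your own interface and load, and it will not survive a reboot unless you put it in a network/init script.)

    # raise the transmit queue length on eth0 (either command works)
    ifconfig eth0 txqueuelen 50000
    ip link set dev eth0 txqueuelen 50000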

Even while I was working on the previous issue, I had a hunch that the receive queue might be getting filled up (this can also happen if your application is whacked up, which I will explain in the Multi-Threading section). But I needed some proof for that. Time for Mr. netstat to get into action.

The netstat command can be run to see how much data is queued on a socket at any time. For example, if I want to monitor port 5060, a netstat -an | grep 5060 shows the Recv-Q and Send-Q for that socket, i.e. how much is waiting to be read or sent (the -c option gives continuous output). If everything is working fine, this value should be zero, or 1 at most (as I said before, the receive/send path of the network stack is very fast). I looked at this output and saw the Recv-Q going above 65535, which meant the queue was maxed out. So the receive buffer was overflowing this time. I went ahead and increased the receive buffer size.
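(This is roughly what the monitoring and the buffer bump look like. The sysctl keys are the standard net.core ones; the values are only examples, not our final numbers, and an application can also request a bigger buffer for itself via setsockopt(SO_RCVBUF), up to rmem_max.)

    # watch the Recv-Q / Send-Q columns for sockets on 5060
    netstat -anu | grep 5060

    # bump the maximum and default socket receive buffers (example values)
    sysctl -w net.core.rmem_max=8388608
    sysctl -w net.core.rmem_default=8388608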

I started the test (I might have reached 1000 restarts by now) and there were no overflows in the send/receive buffers even at 50 CAPS. But the CPU utilization was dangerously close to 92%, which pretty much meant that the networking part was OK and the onus was now on our application and the way we had written it. In the next section I will talk about the threading mechanism we were using, the drawbacks of that model, and how we circumvented them.

If you guys want to know more about the actual UDP tuning configurations and the optimal values for each of the parameters, I can send you the sysctl.conf file. Just ask for it.
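As a rough idea of what goes into such a file, the usual suspects are the net.core buffer and backlog settings; the numbers below are illustrative placeholders, not the tuned values from our setup.

    # /etc/sysctl.conf (illustrative entries only)
    net.core.rmem_max = 8388608         # largest receive buffer an app may request
    net.core.wmem_max = 8388608         # largest send buffer an app may request
    net.core.rmem_default = 1048576     # default receive buffer for new sockets
    net.core.wmem_default = 1048576     # default send buffer for new sockets
    net.core.netdev_max_backlog = 5000  # packets queued off the NIC before the kernel drops them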

5 Responses to “Performance Tuning Part II – Network Strikes Back”

  1. Deepansh Agrawal said

    This is good stuff, I’m an undergrad, and am working on stuff similar to this. It was very informative for me. I would be grateful if you could send me the sysctl.conf file you used.

  2. Deepansh Agrawal said

    So .. when’s the next part coming out … or is it at all 😉

  3. tggokul said

    Well,

I have enough information for a couple more of these. But I just don’t get the time these days to write in detail. Will try, though.

    Gokul

  4. Mr.Anderson said

    I just came from google looking for information about tweaking the linux networking stack (especially in kernel 2.6 and specifically with Fedora). I am hoping you see this comment and are still willing to share your sysctl.conf optimizations because I am trying to do the same! Were they just buffer increases? Did you try prioritization with iptables (or the like)? Anyway, if you see this I would really appreciate any information you are willing to impart as I am having a hard time with this one. Thanks.

  5. tggokul said

    Hi Mr.Anderson,

I have sent that file to your email id. See whether it helps. And oh, if your name is a reference to The Matrix, I think you need to take the red pill before you can access the file 🙂

    Gokul
