This is a continuation of my Performance Tuning experience. In the first part, The Beginning, I stopped at the point where we were losing packets when I ran the test.
This is where sipp’s usability was a big plus. sipp has a very good reporting mechanism showing how many requests (INVITE, ACK, BYE) and responses (Trying, Ringing, 200 OK, 200 OK for BYE) were sent and received. We had a primitive CLI in the SIP Proxy to see the number of calls, the status of calls and so on, and by combining the two we were able to deduce that packets sent by any of the entities in this test (sipp caller/callee and proxy) were sometimes getting lost. This showed up even at 2 CAPS after some time. So our initial benchmark was at best 2 CAPS 😦
After endless hours of googling, I figured out it had something to do with the txqueuelen (transmit queue length), which is configured per interface on Linux. This value was set at 10, which means that at any given time only 10 packets could be queued up and the rest would be dropped. Let me do some quick math with you guys.
For one call to be set up and torn down, the Proxy sends 7 messages (forwarding the INVITE, sending 100 Trying, forwarding 180 Ringing, forwarding 200 OK, forwarding the ACK, forwarding the BYE, forwarding the 200 OK for BYE). The call duration was zero (no holding time; I will delve into the impact of this in the Garbage Collection section, which I hope to write soon). So at 2 CAPS the proxy sends 14 SIP messages per second. Now don’t think this would lead to a loss of 4 packets every second. That will not happen, because the actual queuing and sending by the TCP/IP stack takes milliseconds, and only sometimes (when there is thread contention) will there be 14 packets in the queue. It is at those times that packets were getting lost. Voila, was I jubilant or what. These were the times I used to yelp in joy and get curious looks from my colleagues.
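The quick math above can be written down as a small sketch. The function names are made up for illustration, and the worst case assumed here (a full second's worth of messages landing in the queue at once) is a simplification of what thread contention actually does.

```python
# Messages the proxy sends per call, as counted in the paragraph above.
MESSAGES_PER_CALL = 7


def proxy_messages_per_second(caps: int) -> int:
    """Average number of SIP messages the proxy sends each second."""
    return caps * MESSAGES_PER_CALL


def burst_overflows(burst_packets: int, queue_limit: int) -> bool:
    """True if a burst queued all at once would not fit in the queue."""
    return burst_packets > queue_limit


if __name__ == "__main__":
    # Assumed worst case: one second's worth of messages queued together.
    for caps in (2, 10, 30):
        rate = proxy_messages_per_second(caps)
        for limit in (10, 50000):
            verdict = "can overflow" if burst_overflows(rate, limit) else "fits"
            print(f"{caps} CAPS: {rate} msgs/s, queue limit {limit}: {verdict}")
```

With a queue limit of 10, even the 14 messages per second at 2 CAPS can overflow during a bad burst, which matches what we saw.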
I went ahead and increased the txqueuelen to 50000 and things started working like a charm. I increased the load to nearly 10 CAPS and still had no problems. CPU utilization was high at around 40%, but I ran it for nearly 1 hour and all calls were successful (sipp reports any failed calls). So I decided to up the ante and increased it to 30 CAPS. Duh!!! We were losing packets again. I did the same math explained above, and there was no way that the packets in the send queue could be more than 50,000.
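If you want to check what the transmit queue length currently is on your own box, here is a minimal sketch. It assumes a Linux system that exposes the value under /sys/class/net/<interface>/tx_queue_len, and the interface name eth0 is only a placeholder. It reads the value and does not change it.

```python
from pathlib import Path
from typing import Optional


def read_tx_queue_len(interface: str, sys_root: str = "/sys/class/net") -> Optional[int]:
    """Return the transmit queue length for an interface, or None if unavailable."""
    path = Path(sys_root) / interface / "tx_queue_len"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Interface missing, not Linux, or the file held something unexpected.
        return None


if __name__ == "__main__":
    # "eth0" is a placeholder; substitute the interface carrying the SIP traffic.
    value = read_tx_queue_len("eth0")
    if value is None:
        print("Could not read tx_queue_len for eth0")
    else:
        print(f"eth0 tx_queue_len = {value}")
```

Changing the value needs root and is usually done with something like ip link set dev eth0 txqueuelen 50000.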
Even while I was working on the previous issue, I had a hunch that the receive queue might be filling up (this can also happen if your application is whacked up, which I will explain in the Multi-Threading section). But I needed some proof for that. Time for Mr. netstat to get into action.
The netstat command can be run to see how much data is waiting to be read or sent at any port. For example, if I want to monitor port 5060, a netstat -an | grep 5060 shows the Recv-Q and Send-Q columns for that port, which are the receive and send queue depths (the -c option gives continuous output). If everything is working just fine, these values have to be zero or very close to it (as I said before, the recv/send functionality of the TCP/IP stack is very fast). I looked at this output and saw the receive queue going above 65535, which meant that there was an overflow (netstat reports these queues in bytes, and the queue could hold only about 65535). So the receive buffer was overflowing this time. I went ahead and increased the receive buffer size.
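Staring at netstat output gets tiring, so here is a rough sketch of pulling the Recv-Q and Send-Q numbers for one port out of the text. It assumes the usual Linux netstat -an column layout (Proto, Recv-Q, Send-Q, Local Address, ...). The sample output and the function name are made up for illustration and are not captured from our test.

```python
from typing import List, Tuple

# Made-up sample in the usual Linux "netstat -an" layout (not real test output).
SAMPLE_OUTPUT = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp    70312      0 0.0.0.0:5060            0.0.0.0:*
udp        0      0 0.0.0.0:15060           0.0.0.0:*
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN
"""


def queue_depths(netstat_output: str, port: int) -> List[Tuple[str, int, int]]:
    """Return (proto, recv_q, send_q) for each socket bound to the given local port."""
    suffix = f":{port}"
    results = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the header, unix-socket rows and anything else that is not tcp/udp.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        if not fields[3].endswith(suffix):
            continue
        results.append((fields[0], int(fields[1]), int(fields[2])))
    return results


if __name__ == "__main__":
    for proto, recv_q, send_q in queue_depths(SAMPLE_OUTPUT, 5060):
        print(f"{proto} port 5060: Recv-Q={recv_q} Send-Q={send_q}")
```

Matching on the ":5060" suffix of the local address avoids the false hits that a plain grep 5060 gives you for ports like 15060.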
Started the test (I might have reached 1000 restarts by now) and there were no overflows in the send/recv buffer even at 50 CAPS. But the CPU utilization was dangerously close to 92%, which pretty much meant that the networking part was OK and the onus was now on our application and the way we had written it. In the next section I will talk about the threading mechanism we were using, the drawbacks of that model and how we circumvented them.
If you guys want to know more about the actual UDP tuning configurations and the optimal values for each of the parameters, I can send you the sysctl.conf file. Just ask for it.
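In the meantime, a quick way to see how much receive buffer your own system will actually give a UDP socket is to ask for one and read back what was granted. This is only a sketch of the per-socket side using the standard SO_RCVBUF option. It is not the sysctl.conf tuning mentioned above, and the 1 MiB request is an arbitrary number picked for illustration.

```python
import socket


def request_receive_buffer(sock: socket.socket, size_bytes: int) -> int:
    """Ask for a receive buffer of size_bytes and return what the kernel granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        # 1 MiB is an arbitrary request, not a recommended value.
        after = request_receive_buffer(sock, 1 << 20)
        print(f"SO_RCVBUF before: {before} bytes, after requesting 1 MiB: {after} bytes")
```

On Linux the granted value is capped by the system-wide net.core.rmem_max setting, so if the number you get back is much smaller than what you asked for, that limit is the one to raise.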