Gokul Blog — A conversation on VoIP, IMS, Cisco and Just about Anything

Deeper analysis of VoIP

Archive for the ‘Performance’ Category

Performance Tuning Part II – Network Strikes Back

Posted by tggokul on November 16, 2006

This is in continuation of my Performance Tuning experience. In the first part, The Beginning, I stopped at the point where we were losing packets when I ran the test.

This is where sipp’s usability was a big plus. sipp has a very good reporting mechanism for how many requests (INVITE, ACK, BYE) and responses (Trying, Ringing, 200 OK, 200 OK for BYE) were being sent and received. We had a primitive CLI in the SIP Proxy to see the number of calls, the status of calls and so on, and with the combination of both we were able to deduce that packets sent by the entities in this test (sipp caller/callee and proxy) were sometimes getting lost. This was seen even at 2 CAPS (call attempts per second) after some time. So our initial benchmark was at best 2 CAPS 😦
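As an aside, sipp can also dump these counters to disk instead of just showing them on its runtime screen. Roughly, using its standard statistics options (treat this invocation as illustrative; the proxy address is a placeholder):

    # Dump per-scenario statistics to a CSV every 5 seconds and log
    # unexpected messages/failed calls to an error file
    sipp -sn uac proxy_host:5060 -r 2 -trace_stat -fd 5 -trace_err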

After endless hours of googling, I figured out it had something to do with txqueuelen, the transmit queue length that is configured per interface on Linux. This value was set at 10, which means that at any given time only 10 outgoing packets could be queued up and the rest would be dropped. Let me do some quick math with you guys.

For one call to be set up and torn down there are 7 messages sent by the Proxy (forwarding the INVITE, sending 100 Trying, forwarding 180 Ringing, forwarding 200 OK, forwarding the ACK, forwarding the BYE, forwarding the 200 OK for the BYE). The call duration was zero (no holding time; I will delve further into the impact of this in the Garbage Collection section, which I hope to write soon). So there are 14 SIP messages per second (our testing was at 2 CAPS). Now, don’t think this would lead to a loss of 4 packets every second. That will not happen, because the actual queuing and sending by the TCP/IP stack takes milliseconds, and only sometimes (when there is thread contention) will there be 14 packets in the queue. It was at those moments that packets were getting lost. Voila, was I jubilant or what. These were the times I used to yelp in joy and get curious looks from my colleagues.

I went ahead and increased txqueuelen to 50000 and things started working like a charm. I increased the load to nearly 10 CAPS and still no problems. CPU utilization was high at around 40%, but I ran it for nearly an hour and all calls were successful (sipp reports if there are any failed calls). So I decided to up the ante and increased it to 30 CAPS. Duh!!! We were losing packets again. I did the same math explained above, and there was no way the send queue could have had more than 50,000 packets in it.
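For anyone trying this at home, the transmit queue length is set per interface and can be inspected and changed on the fly. A minimal sketch, assuming the traffic goes out on eth0:

    # Check the current transmit queue length (look for "txqueuelen"/"qlen")
    ifconfig eth0 | grep -i txqueuelen
    # or: ip link show eth0

    # Raise it; takes effect immediately but does not survive a reboot
    ifconfig eth0 txqueuelen 50000
    # or: ip link set dev eth0 txqueuelen 50000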

Even while I was working on the previous issue, I had a hunch that the receive queue might be filling up (this can also happen if your application is whacked up, which I will explain in the Multi-Threading section). But I needed some proof for that. Time for Mr. netstat to get into action.

The netstat command can be run to see how much data is waiting to be read or sent on any port. For example, if I want to monitor port 5060, netstat -an | grep 5060 shows the Recv-Q and Send-Q for that socket (the -c option gives continuous output). If everything is working fine, these values should be zero or very close to it (as I said before, the recv/send path of the TCP/IP stack is very fast). I looked at this output and saw values going above 65535, which meant there was an overflow (the queue on that box could only hold about 65535 bytes). So the receive buffer was overflowing this time. I went ahead and increased the receive buffer size.
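To make the monitoring concrete, here is the kind of thing I mean, assuming the proxy listens on UDP port 5060; the Recv-Q/Send-Q columns are what you keep an eye on. The receive buffer increase itself is a sysctl change of the sort sketched at the end of this post.

    # Show UDP sockets with their Recv-Q / Send-Q, refreshed every second
    watch -n 1 'netstat -anu | grep 5060'
    # or let netstat loop by itself:
    netstat -anuc | grep 5060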

I started the test (I might have reached 1000 restarts by now) and there were no overflows in the send/recv buffers even at 50 CAPS. But CPU utilization was dangerously close to 92%, which pretty much meant the networking part was fine and the onus was now on our application and the way we had written it. In the next section I will talk about the threading mechanism we were using, the drawbacks of that model and how we circumvented them.

If you guys want to know more about the actual UDP tuning configurations and the optimal values for each of the parameters, I can send you the sysctl.conf file. Just ask for it.
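In the meantime, to give a rough idea, these are the kind of net.core entries such a file carries. The keys are standard Linux settings, but the values below are only placeholders, not the tuned ones from my file:

    # /etc/sysctl.conf -- illustrative values only
    net.core.rmem_default = 1048576
    net.core.rmem_max = 8388608
    net.core.wmem_default = 1048576
    net.core.wmem_max = 8388608
    net.core.netdev_max_backlog = 5000

    # Apply without a reboot: sysctl -p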

Posted in Performance | 5 Comments »

Performance Tuning. The Beginning – Part I

Posted by tggokul on November 16, 2006

As part of my last assignment I was asked to benchmark and performance-tune a SIP Proxy (written in core Java), and arguably this was the best learning experience I have had in years. It changed me so much that I had to do everything faster than I normally would. For example, if it was taking me 30 minutes to get to work, I was always looking for ways to do it in 25 minutes or less. Needless to say, it was driving everybody around me crazy.

When I got this assignment, we had never done any kind of significant load testing on the proxy, and all the information I had was ‘It has been running at a customer site for 10 days and has not crashed’. The customer in question must have had a BHCA (Busy Hour Call Attempts) of maybe 2000-3000.

Before I get into the specifics of the actual load testing, a small note on the SIP Proxy. As mentioned above, it was written in core Java, loosely based on the JAIN SIP stack developed by NIST. At the time I started this testing, JAIN SIP had been load tested to provide 90 CAPS (324000 BHCA).

So I started the mission without any huge expectations.

Luckily, though unknowingly, I had followed the first cardinal rule of performance benchmarking: you should always benchmark with an open mind and no expectations, because otherwise you might unwittingly skew the tests to get better results. This rule is even more important when you are a developer (and not a tester). Developers are known to screw up any kind of testing.

It was then that I came across the greatest open source tool for VoIP (ok, I know this is going to start a controversy, but I stand my ground on this): sipp, the testing tool (for protocol conformance as well as performance testing) developed by HP. I am sure everybody who has worked with SIP must have used sipp at some point or another. It is a fantastic tool: easy to run, gives the best damn results, and you can rest assured that if the testing fails, it is because of your SIP entity (Proxy/UA/Registrar) and not sipp’s fault. Setting up the test bed was easy (about 5 minutes). In an entirely different vein, once a test bed is set up and a bug is seen or reproduced, any developer worth his salt should be able to fix the problem in two hours; it is the reproduction of the bug that is always the hardest part. The testbed consisted of one sipp instance behaving as the caller, one sipp instance acting as the callee, and our SIP Proxy between them.

Now, the sipp callee can act either as an actual endpoint or as a gateway. So we didn’t have to register sipp as a client in our Registrar; the two instances could act as an internal gateway (for the caller) and a terminating gateway (for the callee). So this is how the test started: the sipp caller would generate 2 CAPS (7200 BHCA), we would see how it went, and take it from there.
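On the command line, that setup looks roughly like this, using sipp’s built-in UAC/UAS scenarios (hostnames and ports are placeholders, and exact flags can vary by sipp version):

    # Callee: built-in UAS scenario, acting as the terminating side
    sipp -sn uas -p 5070

    # Caller: built-in UAC scenario at 2 calls per second,
    # with requests sent via the proxy (-rsa) rather than straight to the callee
    sipp -sn uac callee_host:5070 -r 2 -rsa proxy_host:5060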

We used two machines: the sipp caller and callee ran on one machine, and the SIP Proxy ran on another. Both servers were single-processor 2.4 GHz machines with 1 GB of RAM, running Red Hat 8.0.

The first time I ran this test, CPU shot up to 46% for just 2 CAPS. Oops… I looked at the configuration, saw that the log level was set very high, turned it down to something lower and started the test. Oops again. It looked like even though we had a log level scheme, some of the developers (damn them) had not used it and were writing their logs at the lowest level, so they always got printed. So the first exercise I did (I was the only one on this assignment; later, when we started posting amazing numbers, two other people were assigned to the task) was to change all the Debug statements accordingly. It was then that I noticed some performance-degrading code. The Debug statements were Java static functions in the class Debug.java, which looked like the following:

/* Debug.java */
public class Debug {

    /* Current log level; this can be changed while the proxy is running */
    static int DEFAULT_LOG_LEVEL = 7;

    public static void print(int loglevel, String str) {
        if (loglevel < DEFAULT_LOG_LEVEL)
            System.out.println(str);
    }
}

So everywhere we wanted to print a log we were calling Debug.print(3, "hello"), and only inside that function were we checking whether the message needed to be printed at all. The call itself has overhead (setting up a stack frame), and when a log message is built by string concatenation, that argument gets constructed even if it is never printed. The better way is to check the level first and only then make the call, say like this:

    if (Debug.DEFAULT_LOG_LEVEL > 3)
        Debug.print(3, "hello");

But this was benchmarking time, not enhancement time, so I just noted this down as something that needed to be done and went ahead with the benchmarking.

After reducing the logging, CPU utilization went down, but a few minutes into the test (I must have restarted the test close to 5000 times over this whole process) I noticed that packets were getting lost (our SIP Proxy used UDP as its transport), and this was the next thing I needed to fix. What could the problem be?

The whole benchmarking and improvement effort took close to 4 months, and I can’t write up all my findings in one blog post. So this series will be divided into parts, and I shall write more in the coming days. In the subsequent parts I will write about network latency, the UDP/TCP stacks on different OSes, the threading model that improves/affects performance, Java performance techniques, and the greatest of them all, Java garbage collection. Trust me, garbage collection will be the most important factor when it comes to high-speed, high-throughput Java applications, and I hope to write up, step by step, all the pain I went through and the things you can avoid. Sounds interesting? Check in here regularly.

Posted in Performance | 5 Comments »