Gokul Blog — A conversation on VoIP, IMS, Cisco and Just about Anything

Deeper analysis of VoIP

Performance Tuning. The Beginning – Part I

Posted by tggokul on November 16, 2006

As part of my last assignment I was asked to benchmark and performance tune a SIP proxy (written in core Java), and it was arguably the best learning experience I have had in years. It changed me so much that I wanted to do everything faster than I used to. For example, if it took me 30 minutes to get to work, I was always looking for ways to do it in 25 minutes or less. Needless to say, it was driving everybody around me crazy.

When I got this assignment, we had never done any significant load testing on the proxy, and all the information I had was ‘It has been running at a customer site for 10 days and has not crashed’. The customer in question probably had a BHCA (Busy Hour Call Attempts) of 2000-3000.

Before I get into the specifics of the actual load testing, a small note on the SIP proxy. As mentioned above, it was written in core Java, loosely based on JAIN SIP developed by NIST. At the time I started this testing, JAIN SIP had been load tested to provide 90 caps (call attempts per second), which is 90 × 3600 = 324000 BHCA.

So I started the mission without any huge expectations.

Luckily, though unknowingly, I had followed the first cardinal rule of performance benchmarking: always benchmark with an open mind and no expectations, because otherwise you may unwittingly skew the tests to get better results. This rule is even more important when you are a developer (and not a tester). Developers are known to screw up any kind of testing.

It was then that I came across the greatest open source tool for VoIP (OK, I know this is going to start a controversy, but I stand my ground on this): sipp, the testing tool (for protocol conformance as well as performance testing) developed by HP. I am sure everybody who has worked with SIP has used sipp at some point. It is a fantastic tool: easy to run, it gives the best damn results, and you can rest assured that if a test fails, it is because of your SIP entity (proxy, UA or registrar) and not sipp. Setting up the test bed was easy, about 5 minutes. (In an entirely different vein: once a test bed is set up and a bug has been reproduced, any developer worth his salt should be able to fix the problem in two hours. It is always reproducing the bug that is the most difficult part.) The test bed consisted of one sipp acting as the caller, one sipp acting as the callee, and our SIP proxy between them.

A sipp instance can act either as an actual endpoint or as a gateway, so we did not have to register sipp as a client in our registrar: the two instances acted as an internal gateway (for the caller) and a termination gateway (for the callee). So this is how the test started: the sipp caller would generate 2 caps (7200 BHCA), and we would see how it went and take it from there.
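The caps and BHCA figures quoted so far are related by a factor of 3600 (seconds in an hour). A minimal sketch of that arithmetic follows; the class and method names are made up for illustration, and it assumes the per-second rate is sustained for the whole busy hour.

```java
/* Illustrative helper, not part of the proxy or of sipp. */
class CallRate {
    private static final long SECONDS_PER_HOUR = 3600;

    /* Call attempts per second, sustained for an hour, expressed as BHCA. */
    public static long capsToBhca(long caps) {
        return caps * SECONDS_PER_HOUR;
    }

    /* BHCA expressed as an average per-second rate. */
    public static double bhcaToCaps(long bhca) {
        return (double) bhca / SECONDS_PER_HOUR;
    }

    public static void main(String[] args) {
        System.out.println(capsToBhca(90));   // prints 324000 (the JAIN SIP figure)
        System.out.println(capsToBhca(2));    // prints 7200 (the starting rate for this test)
        System.out.println(bhcaToCaps(3000)); // upper end of the customer's estimated load, under 1 caps
    }
}
```

Seen this way, the customer's 2000-3000 BHCA works out to less than one call attempt per second on average, so even the 2 caps starting point was already above what the proxy had seen in the field.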

We used two machines: the sipp caller and callee ran on one machine and the SIP proxy ran on the other. Both were single-processor machines running Red Hat 8.0, with a 2.4 GHz CPU and 1 GB of RAM.

The first time I ran this test, CPU usage shot up to 46% for just 2 caps. Oops… I looked at the configuration, saw that the log level was set very high, turned it down and restarted the test. Oops again. It looked like even though we had a log level scheme, some of the developers (damn them) had not used it and were writing their logs at the lowest level, which always gets printed. So the first exercise I did (I was the only one on this assignment; later, when we started doing amazing numbers, two other people were assigned to the task) was to change all the Debug statements to use the appropriate levels. It was then that I noticed some performance-degrading code. The Debug statements were Java static functions in the class Debug.java, which looked like this:

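/* Debug.java */

public class Debug {

    /* This can be changed when the proxy is running */
    public static int DEFAULT_LOG_LEVEL = 7;

    /* Prints str only if its level is below the current log level */
    public static void print(int loglevel, String str) {
        if (loglevel < DEFAULT_LOG_LEVEL)
            System.out.println(str);
    }
}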

So everywhere we wanted to print a log we called Debug.print(3, "hello"), and only inside this function did we check whether the message needed to be printed. Function calls take time (because of stack pointer manipulation), and in Java any message string passed as an argument is built before the call, even if it is never printed. The better way is to check whether the message needs to be printed and only then call the function, like so:

    if (Debug.DEFAULT_LOG_LEVEL > 3)
        Debug.print(3, "hello");

But this was benchmarking time, not enhancement time. So I just noted this down as something to be done later and went ahead with the benchmarking.
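For readers who want to see what the guard buys, here is a minimal sketch, not the proxy's actual code. GuardedDebug, describe() and the messagesBuilt counter are hypothetical names introduced only for illustration, and the Supplier variant assumes Java 8 or later, which was not available when this proxy was written.

```java
import java.util.function.Supplier;

/* Hypothetical stand-in for the proxy's Debug class; all names are illustrative. */
class GuardedDebug {
    static volatile int logLevel = 7; // messages with level < logLevel are printed
    static int messagesBuilt = 0;     // counts how many log strings were constructed

    static boolean isEnabled(int level) {
        return level < logLevel;
    }

    /* Eager variant: the caller has already built str, printed or not. */
    static void print(int level, String str) {
        if (isEnabled(level)) System.out.println(str);
    }

    /* Lazy variant: the message is built only if it will be printed. */
    static void print(int level, Supplier<String> msg) {
        if (isEnabled(level)) System.out.println(msg.get());
    }

    /* Simulates a log message that costs something to build. */
    static String describe(String callId) {
        messagesBuilt++;
        return "Processing INVITE for call " + callId;
    }

    public static void main(String[] args) {
        logLevel = 2; // quiet: level 3 messages are suppressed

        print(3, describe("a1"));                   // string built, then discarded
        if (isEnabled(3)) print(3, describe("a2")); // guard: string never built
        print(3, () -> describe("a3"));             // supplier: string never built

        System.out.println("messages built: " + messagesBuilt); // prints "messages built: 1"
    }
}
```

Only the unguarded call pays for building a message that nobody will read. At a few calls per second that hardly matters, but it adds up when every SIP message passes through several such statements.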

After I reduced the logging, CPU utilization went down, but a few minutes into the test (I must have restarted the test close to 5000 times in this whole process) I noticed that packets were getting lost (our SIP proxy used UDP as transport). This was the next thing I needed to fix. What could be the problem?

The whole process of benchmarking and improving took close to 4 months, and I can’t fit all my findings into one post. So this series will be divided into parts, and I shall write more in the coming days. In the subsequent parts I will be writing about network latency, UDP/TCP stacks in different OSes, threading models that improve or hurt performance, Java performance techniques and, the greatest of them all, Java garbage collection. Trust me, garbage collection will be the most important factor when it comes to high-speed, high-throughput Java applications, and I hope to write up, step by step, all the pain I went through and the things you can avoid. Sounds interesting? Check in here regularly.

5 Responses to “Performance Tuning. The Beginning – Part I”

  1. Madhu said

    Hey,

    Didn’t know that it involved so long a process 🙂

    Will wait for the rest of it – I can also slip in some technical jargons to customers after this 😉

  2. […] This is in continuation of my Performance Tuning experience. In the first part The Beginning I had stopped with us losing packets when I ran the test. […]

  3. […] There was a complaint that part 1 and part 2 of my blogs on Performance were too long. Some of my usual readers said that a long article is fine as long as it isn’t too technical. If the article is technical, then the best bet would be to have a smaller sized blog. But then I couldn’t split my posts lest it affects continuity. […]

  4. […] 1) Performance ( blogs here and here ). This kind of makes me feel bad since I have still not got around to my subsequent experiences in Multi Threading and Java optimizations. I promise to do that in the next few days when I get some free time. 2) Cisco – Anything related to Cisco these days seems to raise a lot of interest. My blogs just rode that wave. 3) VoIP/IMS – Say SIP and people are ready to take a dip ( blogs here) […]

  5. madhusudhan said

    I want to know the key performance indicators (KPIs) for the sipp tool
