Jose Sandoval


"Why is this application so slow?"

-- DRAFT - REVISION PENDING --


Every Software Developer (or Engineer, whichever term you prefer) building large-scale software has come face to face with the question: "Why is it so slow?" This question could not be more vague. What does "slow" mean? Slow in comparison to what?

This entry is about performance tuning and how to solve this elusive "slow" problem.

Nowadays, performance tuning is even harder to do in our convoluted N-tier systems. We build N-tier applications, which run on more than two application servers to better serve our users. The abstraction that takes place to serve one single report into a user's web browser is nothing short of a work of art, in my opinion.

The ability to expand and load balance these servers is amazing. We can prepare for an increase in user load by splitting the resources across multiple servers in different tiers (all this without touching our code base).

Needless to say, extensibility comes with a price measured in complexity. Complexity grows in every aspect of the building process:

  • Complexity of the Information System
  • Complexity of the Software Architecture
  • Complexity of measuring performance

Following is an example where performance tuning paid off.

There is a famous saying among developers: "Build it so that it works first, and optimize later." Following this advice will get you out of trouble (almost) every time.

As I mentioned, our applications have grown in complexity. Just making the whole system work is a challenge in itself, and performance tuning is a different type of challenge. What do we need to optimize? Our systems are N-tier systems, and there is definitely something that can be fine-tuned in each of those Ns.

For example, this is a typical system for Business Intelligence analysis:
N-tier system generating reports to be distributed to thousands of users via thin client (Web browser).

This sentence alone abstracts the whole architecture behind the solution, which could be comprised of:
  • Oracle 9i database system
  • Brio On Demand Server
  • Brio Authentication Server
  • IIS Web Server
  • PlumTree portal Application
  • Brio Intelligence Report builder
  • IE web browser
  • Brio Intelligence Plugin
  • VBS scripting language

So, you are probably asking "All that to generate one report?"

Indeed, all that to generate one report. Of course, these reports are vital to any business, and this one report needs to be distributed to tens of thousands of people, sometimes hourly, daily, weekly, monthly, etc. And all these users view the same report at almost the same time. It is pretty common to have a set of reports ready to view at the beginning of each week (usually Monday mornings). As a tactical business head honcho, you are relying on the delivery of these reports so your co-workers make you more money. If one of these reports fails to generate, the bottom line suffers. So, yes. All that infrastructure is needed to serve "one" report.

Note that the power of an Information System is not in how many more machines and software components we can throw into the mix. The power is in the questions that are being answered by that "one" report.

So, now that the whole infrastructure is in place to serve the "one" report, one or two stakeholders say: "The system is great and everything, but why is it so slow?"

If you are trying to solve this "slow" problem, first you need to understand what "slow" really means.

Assume the system is brand new, so the user has no concept of slow or fast. The user has only attention span gaps. To a normal computer user, waiting 3 seconds for results is long enough. More than that, and our user is lost in la-la land. This is nothing new: there are studies supporting the statement I just made, and Jakob Nielsen has done some of them.

In our particular case, slow meant that downloading a report through the internal portal (100 Mbps network) took over 3 minutes. Obviously, this won't do in the current setting: thousands of users viewing and interacting with the same report (it was a Brio Scorecard application).

Note that in a different situation, 3 minutes would have been acceptable. For example, when downloading an image from a satellite in outer space, 3 minutes is not long at all.

Back to our "slow" report. The report in question was 9 MB in size. It was designed and implemented to work first, and the darn report kept growing and growing. No mistake was made in the design or the implementation. The system was built to work first, but now it was time to optimize. This is where I came into the picture for this particular issue.
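
As a rough sanity check on those numbers, the sketch below computes the best-case transfer time for a 9 MB file over a 100 Mbps link. It assumes 1 MB = 10^6 bytes and an idle link with no protocol overhead. The function name and the even-split model for concurrent users are illustrative assumptions, not measurements from the actual system.

```python
def ideal_transfer_seconds(size_mb: float, link_mbps: float, concurrent_users: int = 1) -> float:
    """Lower bound on download time if the link is split evenly among users."""
    size_megabits = size_mb * 8  # 1 byte = 8 bits; MB taken as 10^6 bytes
    return size_megabits * concurrent_users / link_mbps


single_user = ideal_transfer_seconds(9, 100)
print(f"best case for one user: {single_user:.2f} s")
print(f"observed 180 s is roughly {180 / single_user:.0f} times the best case")
```

For a single user the raw transfer takes well under a second, so the 3 minutes has to come from somewhere else: contention between concurrent users, report processing, or the client plug-in. The arithmetic alone cannot say which one.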

Ok, so what do we need to optimize?
As explained above, there are many components in the equation (remember all those N tiers). However, optimizing one variable will not solve the whole problem. For example, it is not viable to throw in a faster web server. We can't suggest changing the delivery pipes to a fiber optic channel. Nor can we tell the user to click on the "view report" button, go for lunch and come back 15 minutes later.

First of all, changes of infrastructure are very expensive, and if you (as a software solution provider) suggest this route after only some consideration, you are missing a big chunk of the existing picture. (Albeit, there are times when the only way to improve "perceived performance" is through a change of delivery infrastructure, i.e. a faster server, bigger pipes, or faster client machines.) Second, it is very unrealistic to change the work habits of workers. Software is more easily adapted to the needs of the user. Hence, there must be something else that can be done.

I had a specific problem, and I had many components to look at. I spent between 5 and 10 days looking at the current solution and understanding what each component did in that particular environment. Tools are tools, but they are used differently from place to place (I did know what a web server does :)

Anyway, I didn't design the system. I was a newcomer and I had to spend the time to understand what others had done with those tools. Let me tell you, if you are a technical manager who pays consultants to do this type of work, this is money and time well spent. To fix something, one must understand what is being fixed, and the only way to wrap your mind around a problem is by looking at it and tinkering with it.

I put the proposal for the solution together. I proposed to leave all components alone, pre-generate all the reports, and publish them to the web portal for user consumption. Nothing new here: this is batch computing.

My solution called for a Java application that reused the existing databases to automate the generation of all reports. Minimum maintenance: if an employee was added or removed, my solution would pick up the changes at run time. We would schedule the application to run monthly (via the preferred method: cron, Windows scheduler, Brio Scheduler, etc.), and all reports would be pre-generated.
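
To make the batch idea concrete, here is a minimal sketch of such a pre-generation job. It is written in Python rather than Java for brevity, and it is not the code that was proposed or delivered: the function names, the employee ids, the file naming scheme and the fake renderer are all hypothetical stand-ins.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def pregenerate_reports(
    employee_ids: Iterable[str],
    render: Callable[[str], bytes],
    out_dir: Path,
) -> List[Path]:
    """Render one report per employee and write it where the portal can serve it."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for employee_id in employee_ids:
        target = out_dir / f"report_{employee_id}.html"
        target.write_bytes(render(employee_id))
        written.append(target)
    return written


def fake_render(employee_id: str) -> bytes:
    # Hypothetical stand-in for the real (slow) report generation.
    return f"<html>Scorecard for {employee_id}</html>".encode("utf-8")


# The employee list is read at run time, so additions and removals
# are picked up on the next scheduled run. These ids are made up.
current_employees = ["e001", "e002", "e003"]

with tempfile.TemporaryDirectory() as tmp:
    paths = pregenerate_reports(current_employees, fake_render, Path(tmp) / "portal")
    print(f"pre-generated {len(paths)} reports")
```

A scheduler (cron, Windows scheduler, and so on) would run a script like this once a month, so the expensive work happens before any user clicks a link.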

That was the proposal, and the solution was accepted.

Of course, I spent the first week understanding their environment, and an "enhanced" proposal popped into my head: don't use a Java application; use VBScript and ActiveX to instantiate the executable that generates the reports, and have a master file (.BQY file) generate all reports from within itself.

What was the impact?
I had less design and implementation to do. No Java application was required. I was to use only existing technology to solve the problem, which, by the way, was delivering a 9 MB file to thousands of users who, potentially, could have been looking at the same report at the same time.

So, pre-generating the reports was clever, but all that performance gained in the user's eyes had to be paid for somewhere else. I was reminded of the law of conservation of energy: energy cannot be created or destroyed, only transformed.

In our case, it meant that the processing of these reports was to be done monthly in some hidden and dark computer room in a well-secured computing facility. After making the optimization work, it took around 8 hours to generate X reports (sorry, no details, but the number was in the thousands). It is not a long time if you think about it, but the stakeholder once again asked the question: "Why is it so slow?"

It is typical that after one optimizing step there is more optimization to do. This step solved the issue of serving the original 9 MB report in 3 minutes (each report was served in under 15 seconds once I was done), and yet we had more to optimize.

The question now becomes one of economics, and the primordial query is: "Is it good enough for our purposes?"

I'm of the point of view that anything and everything is possible. It's all a matter of how much money you have and how much time you have to spend. Obviously, these two resources are in short supply in some cases, so an executive decision has to be made: do we go through more optimizing steps, or not?

Since the environment was composed of many components and one of them was an RDBMS (Oracle 9i), my consulting services were out of the picture for the next step. I'm well versed in DB matters and query optimization, but there was a full-time DBA to handle the situation. The next optimization step was, of course, to look at the report generation: analyze the DB queries at each processing instance and investigate whether more gains could be made by optimizing the raw SQL. In this case, there was one such instance, and a minor change to the processing had to be made. A report can now be generated every 5 seconds. This is not bad at all: considering that these reports get generated monthly, spending 5 hours per month is a non-issue.

Of course, if the number of reports to be generated grows into the millions, the current solution could still be used with minor configuration changes, i.e. by distributing the load among different machines. The open-ended architecture of the optimization solution lends itself to distributed computing.
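
As a rough illustration of what distributing the load could look like, the sketch below splits a list of report ids across machines and estimates the wall-clock time of the batch. The 5 seconds per report comes from the figures above; the report counts, machine counts, function names and the round-robin split are assumptions made for the example, not details of the real system.

```python
import math
from typing import List, Sequence


def split_round_robin(report_ids: Sequence[int], machines: int) -> List[List[int]]:
    """Deal the report ids out to the machines like cards, one at a time."""
    return [list(report_ids[i::machines]) for i in range(machines)]


def batch_hours(report_count: int, seconds_per_report: float = 5.0, machines: int = 1) -> float:
    """Wall-clock hours if every machine works through its share in parallel."""
    per_machine = math.ceil(report_count / machines)
    return per_machine * seconds_per_report / 3600


print(split_round_robin(list(range(7)), 3))

# Hypothetical growth scenario: one million reports at 5 seconds each.
for machine_count in (1, 10, 100):
    hours = batch_hours(1_000_000, machines=machine_count)
    print(f"{machine_count:>3} machine(s): about {hours:,.1f} hours")
```

The estimate ignores shared bottlenecks such as the database, which every machine would still hit, so it is an optimistic lower bound rather than a forecast.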

As I mentioned, there could be many optimizing steps; however, this was the end of this cycle. The solution was cost effective, reuse of existing components was maximized, impact on concurrent development efforts was nonexistent, and, most importantly, my solution solved the problem of serving the "one" report to thousands of internal users.

In summary,

  • Optimizing is good whenever there is a "working" solution
  • There is no point in optimizing development code - Wait until the components are "production" quality and (sometimes) deployed into a real environment
  • Code with performance in mind, i.e. do not create more objects than you need. Use the most efficient algorithm you can design - Sometimes, the most elegant algorithm is more complex than it needs to be and doesn't yield optimal performance
  • If you are thinking of changing working code, first understand your domain (i.e. data inputs) and then decide on your methods
  • Set an attainable and measurable goal: "Serve a 1 MB file in 10 to 15 seconds after the user clicks the link." Don't use: "Sort array A of n elements in better than O(n log n) time." - With a comparison-based sort, this is not possible.
  • Never make performance promises you can't fulfil
  • Last, but not least, every situation is different. Your performance improving methodology will be different from case to case, i.e. sometimes fine-tuning an application server is enough. Sometimes source code needs to be opened and one particular algorithm needs to be revisited or changed entirely. Sometimes, the only way to increase performance is by putting new hardware in place, etc, etc, etc.
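
A measurable goal is only useful if something actually measures it. The sketch below times an action several times and checks the median against a limit. The helper names are made up for illustration, and the action is a trivial stand-in for "serve the report"; in a real setup it would be whatever the user actually waits for.

```python
import time
from statistics import median
from typing import Callable, List


def measure_seconds(action: Callable[[], object], runs: int = 5) -> List[float]:
    """Run the action several times and record each wall-clock duration."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return samples


def meets_goal(samples: List[float], limit_seconds: float) -> bool:
    """The goal is met when the median run stays within the limit."""
    return median(samples) <= limit_seconds


def serve_report_stand_in() -> int:
    # Hypothetical placeholder for the real work being timed.
    return sum(range(100_000))


baseline = measure_seconds(serve_report_stand_in)
print(f"median: {median(baseline):.4f} s, goal met: {meets_goal(baseline, 15.0)}")
```

Recording the samples before any change gives the baseline, and running the same measurement after each optimizing step shows whether the step paid off.
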

This is by no means a complete guide on how to tackle optimization problems. There are volumes of work on the performance topic at some web site or library near you. Google, for example, had this to say.

Finally, solving performance issues is fun. I, for one, don't cringe when I hear a stakeholder ask the question: "Why is it so slow?" I look them directly in the eyes and ask back: "What do you mean by slow? Relative to what baseline is the current solution slow? Aha! If you have no baseline, let's create one, measure our results and performance tune this baby - There are many steps to optimization. Let's start with ...blah...blah...blah..." I think you get the picture.


Copyright © Jose Sandoval 2005 - jose@josesandoval.com