Archive for September, 2017

Troubleshooting Performance in Complex Python Applications

At one of my jobs I helped build an application that processed tens of thousands of life insurance policy reports every night. Whenever you start running code at scale, you invariably run into performance bottlenecks. Below is an example of how I triaged and improved performance in a complicated Python application.

Often finding the place with the bottleneck is the hardest part. I started out using cProfile.

Our application is a Django application, written in Python, that takes PDFs and warehouses their data in PostgreSQL. To triage this, I took a report and ran it through our report processing pipeline with cProfile enabled. The command I ran was similar to this:

python -m cProfile -o myscript.cprof ./process_report.py
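A saved profile like this can also be read without an IDE, using the standard library's pstats module. Here is a minimal sketch of that idea; process_report below is a hypothetical stand-in for the real pipeline, and the temporary file name is only for illustration.

```python
import cProfile
import io
import os
import pstats
import tempfile


def process_report(n=2000):
    # Hypothetical stand-in for the real report pipeline.
    return sum(i * i for i in range(n))


def profile_to_file(path):
    """Profile process_report and dump the raw stats to `path`."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = process_report()
    profiler.disable()
    profiler.dump_stats(path)
    return result


def top_functions(path, limit=10):
    """Return a text report of the most expensive calls by cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(path, stream=buffer)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


# Write the profile to a temporary file, read it back, then clean up.
fd, prof_path = tempfile.mkstemp(suffix=".cprof")
os.close(fd)
profile_result = profile_to_file(prof_path)
report_text = top_functions(prof_path)
os.remove(prof_path)
print(report_text)
```

Sorting by cumulative time puts the functions whose whole call tree is slowest at the top, which is usually the quickest way to see where to look first.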

One thing I immediately noticed was that profiling had a huge impact on the performance of the application. A task that normally ran in a minute or so took over 24 hours, to the point where I canceled the job before it was complete. Fortunately, the profiler had collected enough information for me to start diagnosing the issue.

I grabbed the .cprof file created above and loaded it into IntelliJ for analysis (Tools->Open CProfile Snapshot), which gave a visualization of the code below:

This gives a sense of how complex our application is: each of those colored blocks is a distinct function call. We ran close to 50 distinct functions to extract, transform, and load this data, with about half of them being our code and the other half being part of the Django framework.

The most notable thing about the timing information is that SQL is taking ~99% of our processing time. This should come as no surprise to anyone familiar with databases. Queries have a number of qualities that make them very expensive, including network connections and the transaction time needed to commit data.
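To get a feel for the commit cost on its own, here is a rough sketch that times the same inserts with a commit after every row and with a single commit at the end. It uses the standard library's sqlite3 against a temporary file rather than PostgreSQL, so there is no network in the picture, and the policy table and row data are made up for illustration. The exact timings will vary by machine.

```python
import os
import sqlite3
import tempfile
import time


def timed_inserts(db_path, rows, commit_each_row):
    """Insert rows and return (elapsed seconds, final row count)."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS policy (id INTEGER, amount REAL)")
    conn.execute("DELETE FROM policy")  # start each run from an empty table
    conn.commit()

    start = time.perf_counter()
    for row in rows:
        conn.execute("INSERT INTO policy (id, amount) VALUES (?, ?)", row)
        if commit_each_row:
            conn.commit()  # one transaction per row
    conn.commit()  # single commit for the batched case
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM policy").fetchone()[0]
    conn.close()
    return elapsed, count


rows = [(i, i * 1.5) for i in range(200)]
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.sqlite3")
    per_row_time, per_row_count = timed_inserts(db_path, rows, True)
    batched_time, batched_count = timed_inserts(db_path, rows, False)

print(f"commit per row: {per_row_time:.4f}s for {per_row_count} rows")
print(f"single commit:  {batched_time:.4f}s for {batched_count} rows")
```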

While SQL being a substantial performance issue should come as no surprise, I noticed something odd in a key part of the application highlighted here:

In our application we have a function, populate, that is responsible for writing out a single record. I noticed that for every populate call we were calling save three times to write our data. No good! Taking this information as a hint, we were able to track down some unintended triplicate writes in our application, speeding up this process by ~30%.
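The same kind of ratio can be pulled out of the profile data directly. The sketch below is not our real code: Record, save, and this populate are hypothetical stand-ins with the triple write built in on purpose, and the call counts are read from the stats dictionary that pstats.Stats exposes.

```python
import cProfile
import pstats


class Record:
    """Hypothetical model; save() stands in for a database write."""

    def __init__(self):
        self.writes = 0

    def save(self):
        self.writes += 1


def populate(record):
    # Deliberately wasteful: three writes where one at the end would do.
    record.value = 1
    record.save()
    record.status = "parsed"
    record.save()
    record.checked = True
    record.save()


def call_counts(stats):
    """Map function name -> total number of calls in a pstats.Stats object."""
    counts = {}
    for (_file, _line, name), (_cc, ncalls, _tt, _ct, _callers) in stats.stats.items():
        counts[name] = counts.get(name, 0) + ncalls
    return counts


profiler = cProfile.Profile()
profiler.enable()
for _ in range(50):
    populate(Record())
profiler.disable()

counts = call_counts(pstats.Stats(profiler))
saves_per_populate = counts["save"] / counts["populate"]
print(f"save calls per populate call: {saves_per_populate}")
```

A ratio above 1 is not automatically a bug, but it is a cheap signal that a write path deserves a second look.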

Another aspect worth pointing out is that our application has some queries that are responsible for reading and validating the integrity of our data. This is visually represented by the get_gl_account function in the above screenshot, though there are many other functions like it in our application, depending on the data. It was often conjectured that these queries were happening at such a high frequency that they were negatively impacting performance. Taking a look at that function, we can see that despite running ~500 times, it took an infinitesimally small share (0%) of the time in this batch process. Using the power of cProfile, that conjecture was put to rest.
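A conjecture like this can be checked numerically by comparing a function's cumulative time with the total profiled time. This is only a sketch: the get_gl_account here is a hypothetical dictionary lookup, and slow_write is a made-up sleep standing in for the expensive database writes.

```python
import cProfile
import pstats
import time


def get_gl_account(cache, key):
    # Hypothetical cheap lookup standing in for the validation query.
    return cache.get(key)


def slow_write():
    # Hypothetical expensive step standing in for the database writes.
    time.sleep(0.002)


def run_batch():
    cache = {i: f"acct-{i}" for i in range(500)}
    for i in range(500):
        get_gl_account(cache, i)
    for _ in range(20):
        slow_write()


def time_share(stats, func_name):
    """Return (calls, fraction of total profiled time) for a function name."""
    calls, cumulative = 0, 0.0
    for (_file, _line, name), (_cc, ncalls, _tt, ct, _callers) in stats.stats.items():
        if name == func_name:
            calls += ncalls
            cumulative += ct
    return calls, cumulative / stats.total_tt


profiler = cProfile.Profile()
profiler.enable()
run_batch()
profiler.disable()

gl_calls, gl_share = time_share(pstats.Stats(profiler), "get_gl_account")
print(f"get_gl_account: {gl_calls} calls, {gl_share:.1%} of profiled time")
```

A high call count with a tiny share of the total time is exactly the pattern that let us rule these lookups out.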

In summary, while a little knowledge of the application was necessary to tune the code, cProfile is an incredibly powerful tool for optimizing Python code in the right hands.

September 29, 2017 • Posted in: Technology • No Comments

Tag Cloud of Review Comments

I was just poking around on my LinkedIn profile looking at my reviews, and I started wondering how I could better visualize them. So I threw together a quick tag cloud, and oh man was I happy about people's perception of me. Take a quick look below.

Read the rest of this post »

September 23, 2017 • Posted in: Opinions • No Comments