Troubleshooting Performance in Complex Python Applications

At one of my jobs I helped build an application that processed tens of thousands of reports on a nightly basis. Whenever you start running code at scale you invariably run into performance bottlenecks. Below is an example of how I triaged and improved performance in a complicated Python application.

Often, finding the bottleneck is the hardest part. I started out using cProfile.

Our application is a Django application, written in Python, that takes PDFs and warehouses their data in PostgreSQL. To triage this I took a report and ran it through our report processing pipeline with cProfile enabled. The command I ran was similar to this:

python -m cProfile -o myscript.cprof ./process_report.py

One thing I immediately noticed was that profiling had a huge impact on the performance of the application. This task, which normally runs in a minute or so, took over 24 hours, to the point where I canceled the job before it was complete. Fortunately the profiler had collected enough information for me to start diagnosing the issue.
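
If profiling the whole script is too slow, one option is to profile only the part of the pipeline you care about. This is a minimal sketch using the standard library's cProfile.Profile directly. It is not what I ran at the time, and process_report here is a hypothetical stand-in for the real pipeline entry point.

```python
import cProfile
import os
import tempfile


def process_report(n):
    # Hypothetical stand-in for the real report pipeline.
    return sum(i * i for i in range(n))


def profile_call(func, *args, out_path):
    """Profile a single call and write the stats to out_path."""
    profiler = cProfile.Profile()
    try:
        # runcall profiles only this call, not the whole script.
        return profiler.runcall(func, *args)
    finally:
        # Write whatever was collected, even if the call raised.
        profiler.dump_stats(out_path)


out_path = os.path.join(tempfile.gettempdir(), "process_report.cprof")
result = profile_call(process_report, 1000, out_path=out_path)
```

The resulting file is in the same format as the one produced by python -m cProfile -o, so it can be opened with the same tools.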

I grabbed the .cprof file created above and loaded it into IntelliJ for analysis (Tools->Open CProfile Snapshot), which gave a visualization of the code below:

This gives a sense of how complex our application is–each of those colored blocks is a distinct function call. We ran close to 50 distinct functions to extract, transform and load this data, with about half of them being our code, and the other half being part of the django framework.

The most notable thing about the timing information is that SQL is taking ~99% of our processing time. This should come as no surprise to someone familiar with databases. Queries have a number of qualities that make them expensive, including network round trips and the time it takes to commit a transaction.
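
If you don't have an IDE handy, the same .cprof file can be read with the standard library's pstats module. The sketch below profiles a toy workload and prints the functions sorted by cumulative time. The names slow_query and run_pipeline are made up for illustration, and the sleep is just a stand-in for a database round trip.

```python
import cProfile
import io
import os
import pstats
import tempfile
import time


def slow_query():
    # Hypothetical stand-in for a database round trip.
    time.sleep(0.01)


def run_pipeline():
    for _ in range(3):
        slow_query()


def cumulative_report(stats_path, limit=10):
    """Return the table pstats prints, sorted by cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(stats_path, stream=buffer)
    stats.strip_dirs().sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return buffer.getvalue()


# Produce a small stats file so the example is self-contained.
stats_path = os.path.join(tempfile.gettempdir(), "pipeline_demo.cprof")
profiler = cProfile.Profile()
profiler.runcall(run_pipeline)
profiler.dump_stats(stats_path)

report = cumulative_report(stats_path)
print(report)
```

The cumtime column is the one to watch: it includes time spent in everything a function calls, which is how a thin wrapper around a slow query shows up near the top.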

While SQL being a substantial performance issue should come as no surprise, I noticed something odd in a key part of the application highlighted here:

In our application we have a function, populate, that is responsible for writing out a single record. I noticed that for every populate call we were calling save three times! No good! Taking this as a hint, we were able to track down some unintended triplicate writes in our application, speeding up this process by ~30%.
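
The same check can be done without a visualizer. This is a rough sketch, not our production code: populate and save below are toy stand-ins, and the helper reads the raw stats dictionary that pstats keeps for cProfile data, whose layout I am assuming from CPython's implementation rather than from documented API.

```python
import cProfile
import pstats


def save(record):
    # Hypothetical stand-in for a model save that writes to the database.
    return dict(record)


def populate(record):
    # Deliberately writes three times to mimic the bug described above.
    save(record)
    save(record)
    return save(record)


def calls_per_caller(stats, callee_name, caller_name):
    """Average number of callee calls made by each call of the caller."""
    caller_calls = 0
    edge_calls = 0
    # Assumed layout: {(file, line, name): (cc, nc, tt, ct, callers)}.
    for (_file, _line, name), (_cc, ncalls, _tt, _ct, callers) in stats.stats.items():
        if name == caller_name:
            caller_calls += ncalls
        if name == callee_name:
            for (_cfile, _cline, cname), edge in callers.items():
                if cname == caller_name:
                    # For cProfile data the first field is the call count.
                    edge_calls += edge[0]
    return edge_calls / caller_calls if caller_calls else 0.0


profiler = cProfile.Profile()
for i in range(10):
    profiler.runcall(populate, {"id": i})

ratio = calls_per_caller(pstats.Stats(profiler), "save", "populate")
print(f"save calls per populate call: {ratio:.1f}")
```

A ratio above 1 for a function that should write once per record is the kind of hint that sent us looking for duplicate writes.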

Another aspect worth pointing out is that our application has some queries that are responsible for reading and validating the integrity of our data. This is visually represented by the get_gl_account function in the above screenshot, though there are many other functions like it in our application depending on the data. It was often conjectured that these queries were happening at such a high frequency that they were negatively impacting performance. Taking a look at that function we can see that despite running ~500 times, it was taking a negligible share (~0%) of the time in this batch process. Using the power of cProfile, that conjecture was put to rest.
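
To put a number on a conjecture like this, you can compare one function's cumulative time to the total profiled time. The sketch below uses Stats.get_stats_profile, which needs Python 3.9 or newer. Everything except the get_gl_account name is invented for the example, and the function body here is a trivial placeholder rather than a real query.

```python
import cProfile
import pstats
import time


def get_gl_account(code):
    # Hypothetical stand-in for a cheap validation lookup.
    return {"code": code}


def write_records():
    # Hypothetical stand-in for the expensive write path.
    time.sleep(0.05)


def run_batch():
    for i in range(500):
        get_gl_account(i)
    write_records()


def share_of_total(stats, func_name):
    """Return (ncalls, fraction of total profiled time) for one function."""
    profile = stats.get_stats_profile()  # available in Python 3.9+
    func = profile.func_profiles[func_name]
    fraction = func.cumtime / profile.total_tt if profile.total_tt else 0.0
    # ncalls is reported as a string, e.g. "500".
    return func.ncalls, fraction


profiler = cProfile.Profile()
profiler.runcall(run_batch)
ncalls, fraction = share_of_total(pstats.Stats(profiler), "get_gl_account")
print(f"get_gl_account: {ncalls} calls, {fraction:.1%} of profiled time")
```

Note that func_profiles is keyed by function name only, so two functions with the same name in different modules would collide. For a real application you would want to check the file name as well.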

In summary, while a little knowledge of the application was necessary to tune the code, cProfile is an incredibly powerful tool for optimizing Python code in the right hands.

September 29, 2017 • Posted in: Technology • No Comments

Tag Cloud of Review Comments

I was just poking around on my LinkedIn profile looking at my reviews and I started wondering how I could better visualize them. So I threw together a quick tag cloud, and oh man was I happy about people's perception of me. Take a quick look below.

Read the rest of this post »

September 23, 2017 • Posted in: Opinions • No Comments

How to Use the Python Memory Profiler

Recently one of my coworkers was having an issue where some of our code running over at Heroku was consuming a massive amount of memory. One of the tools I was looking at to help troubleshoot this was the Python memory profiler. While I was loading a newer copy of our data set my coworker identified the root cause of the issue, so we didn’t end up using this tool. That said, I found the information it provided while profiling code incredibly interesting, and potentially useful in the future. Read the rest of this post »

October 7, 2016 • Posted in: Technology • No Comments

Set Up a Multi User Install of WordPress

Let’s say you’ve been rolling around the country bartering some light web work for whatever favor fancies you. Several months later you find yourself in maintenance hell with a million WordPress installations that all need to be updated. Let me help you. Set up a multiuser install of WordPress and save yourself some headache. Read the rest of this post »

October 23, 2014 • Posted in: Technology • No Comments

How to Bulk Delete Your Twitter Followers

This post starts out with a little bit of an embarrassing story, followed by a script you can use to delete everyone you are following on Twitter. A couple years ago, I signed up for Twitter because hey, someone once squatted my liyanage@hotmail.com email address and I wanted to reserve my Twitter name. I had no interest in actually doing Twitter. I just wanted the virtual real estate (and why isn’t that virtual estate?). Fast forward a couple years, and I check out my Twitter account, and lo and behold I am following close to 2,000 people. Someone had hacked my account, and was selling my following off to the masses. Unfortunately, there is no bulk delete option on Twitter, and it would take an incredibly long time to delete that many people. Read the rest of this post »

August 27, 2014 • Posted in: Technology • No Comments

River Safari

This morning we milled around the house getting over our jet lag. In the afternoon we walked down to the river close to the house with Dhanarathna. It was incredibly hot. I think Sri Lanka is hotter now than when I was last here in December, or it’s also possible I’ve gotten more accustomed to AC living, or was almost exclusively at the beach last time.

Read the rest of this post »

July 18, 2014 • Posted in: Vacation • 2 Comments

Trip Planning Exercises

We spent most of the morning planning the rest of our trip. Emma brought the 2012 Sri Lanka Lonely Planet guide book. Read the rest of this post »

July 17, 2014 • Posted in: Vacation • No Comments

Touchdown in Sri Lanka

We landed in Colombo today. Read the rest of this post »

July 16, 2014 • Posted in: Vacation • One Comment

Thoughts on a Vistage SEO Presentation

Recently I saw Evan Bailyn from First Page Sage speak at our CEO’s local Vistage meeting about SEO. I found that his presentation was in line with what I have come to understand about SEO from working with various other SEO vendors: content is king; don’t deceive Google, or your customers; and get people to link to you. Read the rest of this post »

January 14, 2014 • Posted in: Opinions • No Comments

Engaging your Customer Base through Web Services

In 2010 I did a presentation at the Epicor Perspectives conference. In the presentation I talked a little bit about how we used their back end web services to build a front end website.

Included in the presentation are some slides:

You can download my Epicor Perspectives 2010 Presentation here. This presentation is a bit sparse without the talking points behind it, so feel free to contact me if you have any questions.

June 22, 2013 • Posted in: Technology • No Comments