Technology

Troubleshooting Performance in Complex Python Applications

At one of my jobs I helped build an application that processed tens of thousands of reports on a nightly basis. Whenever you start running any code at scale, you invariably run into performance bottlenecks. Below is an example of how I triaged and improved performance in a complicated Python application.

Often finding the place with the bottleneck is the hardest part. I started out using cProfile.

Our application is a Django application, written in Python, that takes PDFs and warehouses their data in PostgreSQL. To triage this, I took a report and ran it through our report processing pipeline with cProfile enabled. The command I ran was similar to this:

python -m cProfile -o myscript.cprof ./process_report.py
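The same kind of profile can also be collected from inside Python and summarized in a terminal with the standard library's cProfile and pstats modules. The sketch below is illustrative only: process_report is a hypothetical stand-in for the real pipeline, and the output path is arbitrary.

```python
import cProfile
import io
import os
import pstats
import tempfile


def process_report(n=1000):
    """Hypothetical stand-in for the real report pipeline."""
    return sum(i * i for i in range(n))


def profile_to_file(path):
    """Profile one call to process_report and dump the raw stats to path."""
    profiler = cProfile.Profile()
    profiler.enable()
    process_report()
    profiler.disable()
    profiler.dump_stats(path)


def summarize(path, limit=10):
    """Return the top entries of a saved profile, sorted by cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(path, stream=buffer)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    prof_path = os.path.join(tempfile.gettempdir(), "myscript.cprof")
    profile_to_file(prof_path)
    print(summarize(prof_path))
```

Enabling the profiler only around the code of interest, as profile_to_file does, is also one way to limit how much overhead the profiler adds.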

One thing I immediately noticed was that profiling had a huge impact on the performance of the application. A task that normally runs in a minute or so took over 24 hours, to the point where I canceled the job before it was complete. Fortunately, the profiler had collected enough information for me to start diagnosing the issue.

I grabbed the .cprof file created above and loaded it into IntelliJ for analysis (Tools->Open CProfile Snapshot), which gave a visualization of the code below:

This gives a sense of how complex our application is: each of those colored blocks is a distinct function call. We ran close to 50 distinct functions to extract, transform, and load this data, with about half of them being our code and the other half being part of the Django framework.

The most notable thing about the timing information is that SQL is taking ~99% of our processing time. This should come as no surprise to anyone familiar with databases. Queries have a number of qualities that make them expensive, including network round trips and the transaction time needed to commit data.
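To make the commit-cost point concrete, here is a minimal sketch that uses the standard library's sqlite3 module as a stand-in for PostgreSQL. That substitution is an assumption made for the sake of a runnable example (the original application went through Django), and the table and function names are hypothetical. It contrasts committing after every row with committing once per batch.

```python
import sqlite3

INSERT_SQL = "INSERT INTO report_line (account, amount) VALUES (?, ?)"


def make_db():
    """Create an in-memory database with a hypothetical report_line table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE report_line (account TEXT, amount REAL)")
    return conn


def insert_one_commit_per_row(conn, rows):
    """Commit after every insert: one transaction per row."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()


def insert_single_commit(conn, rows):
    """Insert the whole batch inside a single transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(INSERT_SQL, rows)
```

Both functions store the same rows; the difference is how many transactions are paid for along the way.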

While SQL being a substantial performance issue should come as no surprise, I noticed something odd in a key part of the application, highlighted here:

In our application we have a function, populate, that is responsible for writing out a single record. I noticed that for every populate call we were calling save to write our data 3 times! No good! Taking this information as a hint, we were able to track down some unintended triplicate writes in our application, speeding up this process by ~30%.
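The same ratio can be read out of a profile without a visualizer. The sketch below is not our actual code: Record, populate, and save are hypothetical stand-ins, with populate deliberately written to save three times. It compares call counts from pstats, relying on the stats dictionary that pstats.Stats exposes (an implementation detail rather than a documented API).

```python
import cProfile
import pstats


class Record:
    """Hypothetical model; save() stands in for a database write."""

    def __init__(self):
        self.writes = 0

    def save(self):
        self.writes += 1


def populate(record):
    """Deliberately buggy: writes the same record three times."""
    record.save()
    record.save()
    record.save()


def call_counts(stats):
    """Map each function name to its total number of recorded calls."""
    counts = {}
    # stats.stats maps (filename, lineno, funcname) -> (cc, nc, tt, ct, callers)
    for (_file, _line, name), (_cc, ncalls, _tt, _ct, _callers) in stats.stats.items():
        counts[name] = counts.get(name, 0) + ncalls
    return counts


def profile_populate(n_records=100):
    """Profile n_records calls to populate and return the call counts."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n_records):
        populate(Record())
    profiler.disable()
    return call_counts(pstats.Stats(profiler))


if __name__ == "__main__":
    counts = profile_populate()
    print("saves per populate:", counts["save"] / counts["populate"])
```

A ratio above 1 for a function that is supposed to write a single record is the kind of hint worth chasing.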

Another aspect worth pointing out is that in our application we have some queries that are responsible for reading and validating the integrity of our data. This is visually represented by the get_gl_account function in the above screenshot, though there are many other functions like it in our application, depending on the data. It was often conjectured that these queries were happening at such a high frequency that they were negatively impacting performance. Taking a look at that function, we can see that despite running ~500 times, it was taking an infinitesimally small share (~0%) of the time in this batch process. Using the power of cProfile, that conjecture was put to rest.
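A conjecture like that can be checked numerically by asking what fraction of the total profiled time a function accounts for. The sketch below is a toy version under stated assumptions: this get_gl_account is a hypothetical cheap lookup that only borrows the name, write_record is a hypothetical slow write simulated with a short sleep, and time_share again reads the stats dictionary of pstats.Stats.

```python
import cProfile
import pstats
import time


def get_gl_account(code):
    """Hypothetical cheap lookup that is called many times."""
    return {"code": code}


def write_record(code):
    """Hypothetical expensive write, simulated with a short sleep."""
    time.sleep(0.002)
    return get_gl_account(code)


def time_share(stats, func_name):
    """Fraction of total profiled time spent in func_name (cumulative time)."""
    total = stats.total_tt
    cumulative = sum(
        ct
        for (_file, _line, name), (_cc, _nc, _tt, ct, _callers) in stats.stats.items()
        if name == func_name
    )
    return cumulative / total if total else 0.0


def run_profile(n=50):
    """Profile n calls to write_record and return the collected stats."""
    profiler = cProfile.Profile()
    profiler.enable()
    for i in range(n):
        write_record(i)
    profiler.disable()
    return pstats.Stats(profiler)


if __name__ == "__main__":
    stats = run_profile()
    for name in ("write_record", "get_gl_account"):
        print(f"{name}: {time_share(stats, name):.1%} of profiled time")
```

A function that is called often but accounts for almost none of the time is not worth optimizing first.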

In summary, while a little knowledge of the application was necessary to tune the code, cProfile is an incredibly powerful tool for optimizing Python code in the right hands.

September 29, 2017 • Tags: , • Posted in: Technology • No Comments

How to Use the Python Memory Profiler

Recently one of my coworkers was having an issue where some of our code running on Heroku was consuming a massive amount of memory. One of the tools I was looking at to help troubleshoot this was the Python memory profiler. While I was loading a newer copy of our data set, my coworker identified the root cause of the issue, so we didn’t end up using this tool. That said, I found the information it provided while profiling code incredibly interesting, and potentially useful in the future. Read the rest of this post »

October 7, 2016 • Tags: , • Posted in: Technology • No Comments

Set Up a Multi User Install of WordPress

Let’s say you’ve been rolling around the country bartering some light web work for whatever favor fancies you. Several months later you find yourself in maintenance hell with a million WordPress installations that all need to be updated. Let me help you. Set up a multiuser install of WordPress and save yourself some headache. Read the rest of this post »

October 23, 2014 • Tags: , • Posted in: Technology • No Comments

How to Bulk Delete Your Twitter Followers

This post starts out with a slightly embarrassing story, followed by a script you can use to delete everyone you are following on Twitter. A couple years ago, I signed up for Twitter because hey, someone once squatted my liyanage@hotmail.com email address and I wanted to reserve my Twitter name. I had no interest in actually doing Twitter. I just wanted the virtual real estate (and why isn’t that virtual estate?). Fast forward a couple years, and I check out my Twitter account, and lo and behold I am following close to 2,000 people. Someone had hacked my account, and was selling my following off to the masses. Unfortunately, there is no bulk delete option on Twitter, and it would take an incredibly long time to delete that many people. Read the rest of this post »

August 27, 2014 • Tags: , , • Posted in: Technology • No Comments

Engaging your Customer Base through Web Services

In 2010 I gave a presentation at the Epicor Perspectives conference. In the presentation I talked a little bit about how we used their back end web services to build a front end website.

Included in the presentation are some slides:

You can download my Epicor Perspectives 2010 Presentation here. This presentation is a bit sparse without the talking points behind it, so feel free to contact me if you have any questions.

June 22, 2013 • Tags:  • Posted in: Technology • No Comments

Dynamically Creating a CSR & Private Key in .NET

This one was a bit tricky: it took me two days to figure this out, and when I figured it out I didn’t even realize I was close to the solution. When I initially started working on this, I was looking into using an OpenSSL port to Windows called OpenSSL.NET. The pure ASCII look of this page should be a good indication of how many other alternatives there are out there. Eventually I found The Legion of Bouncy Castle, and stumbled onto a solution. Initially I dismissed this option without looking at it too thoroughly because of the name, but given the lack of good alternatives out there it became a steady contender.

Read the rest of this post »

October 31, 2011 • Tags: , , , , • Posted in: Technology • 3 Comments

Enabling SSL in Epicor ITSM

We’ve recently granted access to Epicor to an outside company.  After opening up access over SSL for the company, we found that our setup was not quite right.  In addition to hitting a checkbox, there are a couple XML files you need to edit.

Read the rest of this post »

November 1, 2010 • Tags: , • Posted in: Technology • No Comments

Programmatically Accessing the Forms Authentication Timeout from the web.config

I’ve been working on creating a session timeout popup for our upcoming portal release. While I wanted to run the script through my own JavaScript, I had a little trouble finding the actual property. Eventually I found a dead page cached with the code that will retrieve the Forms Authentication Timeout variable. I’m reposting the code for posterity’s sake.

Read the rest of this post »

October 4, 2010 • Tags: , , , • Posted in: Technology • No Comments

Brilliant SEO Tactics

I went looking for some Credit Card logos the other day to display on some web forms, and I found this awesome scam of a site.  If you’re not paying attention, and you copy and paste their code you are secretly improving the SEO of some random law firm, http://www.criminalattorneys.com.

Read the rest of this post »

October 1, 2010 • Tags: , , • Posted in: Technology • No Comments

Getting AnkhSVN to Work with an Umbraco Solution

Recently I have been developing a .NET portal, using the Umbraco CMS. Having gotten a couple parts of the application set up, I decided that it would be a good idea to get SVN set up with my code. In order to get my Umbraco installation set up, I created several masterpages, a slightly modified version of the CSS Friendly web controls, a couple custom controls to be used in their embedded WYSIWYG editor, and a couple static images and CSS files. Suffice it to say, I hit a snag.

Read the rest of this post »

June 8, 2010 • Tags: , , , • Posted in: Technology • No Comments