A guide to analyzing Python performance

来源:http://www.huyng.com/posts/python-performance-analysis/

While it’s not always the case that every Python program you write will require a rigorous performance analysis, it is reassuring to know that there are a wide variety of tools in Python’s ecosystem that one can turn to when the time arises.

Analyzing a program’s performance boils down to answering 4 basic questions:

  1. How fast is it running?
  2. Where are the speed bottlenecks?
  3. How much memory is it using?
  4. Where is memory leaking?

Below, we’ll dive into the details of answering these questions using some awesome tools.

Coarse grain timing with time

Let’s begin by using a quick and dirty method of timing our code: the good old unix utility time.

$ time python yourprogram.py

real    0m1.028s
user    0m0.001s
sys     0m0.003s

The meaning between the three output measurements are detailed in thisstackoverflow article, but in short

  • real - refers to the actual elasped time
  • user - refers to the amount of cpu time spent outside of kernel
  • sys - refers to the amount of cpu time spent inside kernel specific functions

You can get a sense of how many cpu cycles your program used up regardless of other programs running on the system by adding together the sys and user times.

If the sum of sys and user times is much less than real time, then you can guess that
most your program’s performance issues are most likely related to IO waits.

Fine grain timing with a timing context manager

Our next technique involves direct instrumentation of the code to get access to finer grain timing information. Here’s a small snippet I’ve found invaluable for making ad-hoc timing measurements:

timer.py

import time

class Timer(object):
    def __init__(self, verbose=False):
        self.verbose = verbose

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.secs = self.end - self.start
        self.msecs = self.secs * 1000  # millisecs
        if self.verbose:
            print ‘elapsed time: %f ms‘ % self.msecs

In order to use it, wrap blocks of code that you want to time with Python’s withkeyword and this Timer context
manager. It will take care of starting the timer when your code block begins execution and stopping the timer when your code block ends.

Here’s an example use of the snippet:

from timer import Timer
from redis import Redis
rdb = Redis()

with Timer() as t:
    rdb.lpush("foo", "bar")
print "=> elasped lpush: %s s" % t.secs

with Timer() as t:
    rdb.lpop("foo")
print "=> elasped lpop: %s s" % t.secs

I’ll often log the outputs of these timers to a file in order to see how my program’s performance evolves over time.

Line-by-line timing and execution frequency with a profiler

Robert Kern has a nice project called line_profiler which I often use to see how fast and how often each line of code is running in my scripts.

To use it, you’ll need to install the python package via pip:

$ pip install line_profiler

Once installed you’ll have access to a new module called “line_profiler” as well as an executable script “kernprof.py”.

To use this tool, first modify your source code by decorating the function you want to measure with the @profile decorator. Don’t worry, you don’t have to import anyting
in order to use this decorator. The kernprof.py script automatically injects it into your script’s runtime during execution.

primes.py

@profile
def primes(n):
    if n==2:
        return [2]
    elif n<2:
        return []
    s=range(3,n+1,2)
    mroot = n ** 0.5
    half=(n+1)/2-1
    i=0
    m=3
    while m <= mroot:
        if s[i]:
            j=(m*m-3)/2
            s[j]=0
            while j<half:
                s[j]=0
                j+=m
        i=i+1
        m=2*i+3
    return [2]+[x for x in s if x]
primes(100)

Once you’ve gotten your code setup with the @profile decorator, usekernprof.py to
run your script.

$ kernprof.py -l -v fib.py

The -l option tells kernprof to inject the @profile decorator into your script’s
builtins, and -v tells kernprof to display timing information once you’re script finishes. Here’s one the output should look like for the above script:

Wrote profile results to primes.py.lprof
Timer unit: 1e-06 s

File: primes.py
Function: primes at line 2
Total time: 0.00019 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     2                                           @profile
     3                                           def primes(n):
     4         1            2      2.0      1.1      if n==2:
     5                                                   return [2]
     6         1            1      1.0      0.5      elif n<2:
     7                                                   return []
     8         1            4      4.0      2.1      s=range(3,n+1,2)
     9         1           10     10.0      5.3      mroot = n ** 0.5
    10         1            2      2.0      1.1      half=(n+1)/2-1
    11         1            1      1.0      0.5      i=0
    12         1            1      1.0      0.5      m=3
    13         5            7      1.4      3.7      while m <= mroot:
    14         4            4      1.0      2.1          if s[i]:
    15         3            4      1.3      2.1              j=(m*m-3)/2
    16         3            4      1.3      2.1              s[j]=0
    17        31           31      1.0     16.3              while j<half:
    18        28           28      1.0     14.7                  s[j]=0
    19        28           29      1.0     15.3                  j+=m
    20         4            4      1.0      2.1          i=i+1
    21         4            4      1.0      2.1          m=2*i+3
    22        50           54      1.1     28.4      return [2]+[x for x in s if x]

Look for lines with a high amount of hits or a high time interval. These are the areas where optimizations can yield the greatest improvements.

How much memory does it use?

Now that we have a good grasp on timing our code, let’s move on to figuring out how much memory our programs are using. Fortunately for us, Fabian Pedregosa has implemented a nice memory
profiler
 modeled after Robert Kern’s line_profiler.

First install it via pip:

$ pip install -U memory_profiler
$ pip install psutil

(Installing the psutil package here is recommended because it greatly improves the performance of the memory_profiler).

Like line_profiler, memory_profiler requires that you decorate your function of interest with an @profile decorator like so:

@profile
def primes(n):
    ...
    ...

To see how much memory your function uses run the following:

$ python -m memory_profiler primes.py

You should see output that looks like this once your program exits:

Filename: primes.py

Line #    Mem usage  Increment   Line Contents
==============================================
     2                           @profile
     3    7.9219 MB  0.0000 MB   def primes(n):
     4    7.9219 MB  0.0000 MB       if n==2:
     5                                   return [2]
     6    7.9219 MB  0.0000 MB       elif n<2:
     7                                   return []
     8    7.9219 MB  0.0000 MB       s=range(3,n+1,2)
     9    7.9258 MB  0.0039 MB       mroot = n ** 0.5
    10    7.9258 MB  0.0000 MB       half=(n+1)/2-1
    11    7.9258 MB  0.0000 MB       i=0
    12    7.9258 MB  0.0000 MB       m=3
    13    7.9297 MB  0.0039 MB       while m <= mroot:
    14    7.9297 MB  0.0000 MB           if s[i]:
    15    7.9297 MB  0.0000 MB               j=(m*m-3)/2
    16    7.9258 MB -0.0039 MB               s[j]=0
    17    7.9297 MB  0.0039 MB               while j<half:
    18    7.9297 MB  0.0000 MB                   s[j]=0
    19    7.9297 MB  0.0000 MB                   j+=m
    20    7.9297 MB  0.0000 MB           i=i+1
    21    7.9297 MB  0.0000 MB           m=2*i+3
    22    7.9297 MB  0.0000 MB       return [2]+[x for x in s if x]

IPython shortcuts for line_profiler and memory_profiler

A little known feature of line_profiler and memory_profiler is that both programs
have shortcut commands accessible from within IPython. All you have to do is type the following within an IPython session:

%load_ext memory_profiler
%load_ext line_profiler

Upon doing so you’ll have access to the magic commands %lprun and %mprunwhich behave
similarly to their command-line counterparts. The major difference here is that you won’t need to decorate your to-be-profiled functions with the@profile decorator. Just
go ahead and run the profiling directly within your IPython session like so:

In [1]: from primes import primes
In [2]: %mprun -f primes primes(1000)
In [3]: %lprun -f primes primes(1000)

This can save you a lot of time and effort since none of your source code needs to be modified in order to use these profiling commands.

Where’s the memory leak?

The cPython interpreter uses reference counting as it’s main method of keeping track of memory. This means that every object contains a counter, which is incremented when a reference to the object is stored somewhere, and decremented when a reference to it
is deleted. When the counter reaches zero, the cPython interpreter knows that the object is no longer in use so it deletes the object and deallocates the occupied memory.

A memory leak can often occur in your program if references to objects are held even though the object is no longer in use.

The quickest way to find these “memory leaks” is to use an awesome tool calledobjgraph written by Marius Gedminas. This tool allows you to see the number of objects in memory and also locate
all the different places in your code that hold references to these objects.

To get started, first install objgraph:

pip install objgraph

Once you have this tool installed, insert into your code a statement to invoke the debugger:

import pdb; pdb.set_trace()
Which objects are the most common?

At run time, you can inspect the top 20 most prevalent objects in your program by running:

(pdb) import objgraph
(pdb) objgraph.show_most_common_types()

MyBigFatObject             20000
tuple                      16938
function                   4310
dict                       2790
wrapper_descriptor         1181
builtin_function_or_method 934
weakref                    764
list                       634
method_descriptor          507
getset_descriptor          451
type                       439
Which objects have been added or deleted?

We can also see which objects have been added or deleted between two points in time:

(pdb) import objgraph
(pdb) objgraph.show_growth()
.
.
.
(pdb) objgraph.show_growth()   # this only shows objects that has been added or deleted since last show_growth() call

traceback                4        +2
KeyboardInterrupt        1        +1
frame                   24        +1
list                   667        +1
tuple                16969        +1
What is referencing this leaky object?

Continuing down this route, we can also see where references to any given object is being held. Let’s take as an example the simple program below:

x = [1]
y = [x, [x], {"a":x}]
import pdb; pdb.set_trace()

To see what is holding a reference to the variable x, run theobjgraph.show_backref() function:

(pdb) import objgraph
(pdb) objgraph.show_backref([x], filename="/tmp/backrefs.png")

The output of that command should be a PNG image stored at/tmp/backrefs.png and it should look something like this:

The box at the bottom with red lettering is our object of interest. We can see that it’s referenced by the symbol x once and by the list y three
times. If x is the object causing a memory leak, we can use this method to see why it’s not automatically being deallocated by tracking down all of its references.

So to review, objgraph allows us to:

  • show the top N objects occupying our python program’s memory
  • show what objects have been deleted or added over a period of time
  • show all references to a given object in our script

Effort vs precision

In this post, I’ve shown you how to use several tools to analyze a python program’s performance. Armed with these tools and techniques you should have all the information required to track down most memory leaks as well as identify speed bottlenecks in a Python
program.

As with many other topics, running a performance analysis means balancing the tradeoffs between effort and precision. When in doubt, implement the simplest solution that will suit your current needs.

Refrences

I build computer vision software at Flickr. You can get updates on new essays by subscribing to my rss feed. Occassionally, I
will send out interesting links on twitter so follow me if you like this kind stuff.

Related Posts

26 Mar 2014

Similarity search 101 - Part 2 (search with vp-trees)

19 Mar 2014

Similarity search 101 - Part 1 (overview)

14 Dec 2013

Faster numpy dot product for multi-dimensional arrays

时间: 2024-10-18 09:31:58

A guide to analyzing Python performance的相关文章

Python 性能分析入门指南

在岭南六少博客找到的好东西. 注: 本文的原作者是 Huy Nguyen ,原文地址为 A guide to analyzing Python performance 虽然并非你编写的每个 Python 程序都要求一个严格的性能分析,但是让人放心的是,当问题发生的时候,Python 生态圈有各种各样的工具可以处理这类问题. 分析程序的性能可以归结为回答四个基本问题: 正运行的多快 速度瓶颈在哪里 内存使用率是多少 内存泄露在哪里 下面,我们将用一些神奇的工具深入到这些问题的答案中去. 用 tim

网址大全(IT之类的)

Python : codes sources, forum, tutoriaux et astucesWriting Idiomatic Python BookPython for ArcGIS | ArcGIS Resource CenterPython Code Examples | YEI Technologypython - Code Golf: Finite-state machine! - Stack OverflowHow to Write a Spelling Corrector

python公司面试题集锦 python面试题大全

问题一:以下的代码的输出将是什么? 说出你的答案并解释. class Parent(object): x = 1 class Child1(Parent): pass class Child2(Parent): pass print Parent.x, Child1.x, Child2.x Child1.x = 2 print Parent.x, Child1.x, Child2.x Parent.x = 3 print Parent.x, Child1.x, Child2.x 答案 以上代码的

Python学习资料下载地址(转)

[转]Python学习资料和教程pdf 开发工具: Python语言集成开发环境 Wingware WingIDE Professional v3.2.12 Python语言集成开发环境 Wingware WingIDE Professional v3.2.9.1 高效Python/Django开发工具:JetBrains PyCharm v1.1.1 (附注册机) Python和Django开发工具:JetBrains PyCharm v1.1 学习资料: Python 3程序开发指南 (第二

python 装饰器语法糖(@classmethod @staticmethod @property @name.)原理剖析和运用场景

引用:http://blog.csdn.net/slvher/article/details/42497781 这篇文章系统的介绍这几者之间的关系和区别.有兴趣的朋友可以到上面的链接查看原文,这里我把原文拷贝如下(如有侵权,通知马上删除) ==================================================================== 在阅读一些开源Python库的源码时,经常会看到在某个类的成员函数前,有类似于@staticmethod或@classme

python属性查找(attribute lookup)

在Python中,属性查找(attribute lookup)是比较复杂的,特别是涉及到描述符descriptor的时候. 在上一文章末尾,给出了一段代码,就涉及到descriptor与attribute lookup的问题.而get系列函数(__get__, __getattr__, __getattribute__) 也很容易搞晕,本文就这些问题简单总结一下. 首先,我们知道: python中一切都是对象,"everything is object",包括类,类的实例,数字,模块

Python Foundation - part one

Useful Tips 在windows下安装python包出现错误,缺少setuptool, 此时先下载安装ez_setup.py, 然后就可安装所需包 Development Now, I changed to Linux, forget windows, it really sucks to do development not .NET on Windows, So I nuked up my windows and installed solely Ubuntu14.04, and p

Python descriptor

动态添加对象属性 一次偶然发现,Python的对象竟然可以在运行期动态添加类定义时没有的属性,这又颠覆了我对Python OO机制的理解.Google了一把,顺着__dict__属性一路找到descriptor,揭开了隐藏在Python对象之后的内幕. 本文主要记录Python的descriptor机制,以及其在Python对象的属性.方法绑定上的作用. 先从本文的始作俑者,运行期动态添加对象属性开始讲起. class A: def __init__(self): self.value = 'v

Nim连接到Python

原文:https://akehrer.github.io/nim/2015/01/24/connecting-nim-to-python.html 在以前的文章中在最后询问了关于Nim连接Python接口的代码,在经过一些实验后我能够做一些工作了.所以,让我们一起看一看. Compiling a library 首先我们要谈到的是Nim 编译器.在大多数情况下你把Nim代码编译成可执行文件要运行这个命令. nim c projectfile.nim 但是要想把代码写到一个库中,我们将需要看编译器