[daily][optimize] Going for noodles (performance optimization prompted by Python's type-conversion functions) (to be continued)










I reproduced a program, generator.py, that writes a uniform sample file to test against:

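#! /usr/bin/python2

import random

def main():
        # Note: '-' binds tighter than '<<', so this is 1 << 15 == 32768, not 65535.
        # It is left unchanged because the sums shown later were produced with it.
        maxv = 1<<16 - 1
        f = open('base.txt', 'w')
        for i in range(0, 600000):
                # a random printable-ASCII text field of 10 to 50 characters
                # (it may itself contain commas)
                length = random.randint(10, 50)
                a = ""
                for j in range(0, length):
                        v = random.randint(32, 126)
                        a += chr(v)
                # followed by 11 comma-separated random integers
                for j in range(0, 11):
                        v = random.randint(0, maxv)
                        a += ','
                        a += str(v)
                a += '\n'
                f.write(a)
        f.close()

if '__main__' == __name__:
        main()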



The program being measured, run.py:

#! /usr/bin/python2

def main():
        f = open('base.txt', 'r')
        result = ['HOST', 0,0,0,0,0,0,0,0,0,0,0]
        for line in f.readlines():
                l = line.strip().split(',')
                length = len(l)
                if length < 12 :
                        raise Exception, "HUGE_TONG: Wrong format"
                # the leading text field may contain commas, so count from the right
                skip = length - 12
                for i in range(1,12):
                        result[i] += int(l[i+skip])
        f.close()
        print result

if __name__ == '__main__':
        r = main()


[[email protected] chimian]$ time ./run.py
['HOST', 9829498952, 9833094366, 9827153757, 9835131709, 9843502266, 9836377986, 9825044768, 9833232152, 9841073437, 9833009489, 9833147503]

real    0m3.787s
user    0m3.747s
sys     0m0.037s

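It took 3.7 seconds in total. The star-student interviewer said that was far too slow and had to be optimized! Optimize what, exactly? I rambled through a few ideas, he said they were wrong, and I gave up, because if the performance is not good enough you can just use C! I had never imagined that Python called for such deep optimization games, and honestly this kind of rubbish interview question makes me angry. He said the time goes into the int() function, and that a generator or something along those lines could be used. Afterwards I grumbled about it to Teacher Zhang and Teacher Hu, and they turned out to be quite interested in the problem. So I decided to sit down and study it properly.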


To check the claim about int(), I replaced the conversion with a plain increment (run-md.py):

[[email protected] chimian]$ time ./run-md.py
['HOST', 600000, 600000, 600000, 600000, 600000, 600000, 600000, 600000, 600000, 600000, 600000]

real    0m1.208s
user    0m1.180s
sys     0m0.023s
[[email protected] chimian]$ diff run.py run-md.py
<                       result[i] += int(l[i+skip])
>               #       result[i] += int(l[i+skip])
>                       result[i] += 1
[[email protected] chimian]$ 
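
The same comparison can be separated from file reading and string splitting with a micro-benchmark. Below is a minimal sketch using the standard timeit module. The sample string '23298' and the repetition count are arbitrary choices for illustration, not values from the test above, and the absolute numbers will differ by machine and Python version. It is written to run under both Python 2.7 and Python 3.

```python
import timeit

N = 1000000  # arbitrary repetition count, chosen for illustration


def bench(stmt, setup):
    """Return the best-of-3 time in seconds for running stmt N times."""
    return min(timeit.repeat(stmt, setup=setup, repeat=3, number=N))


# Conversion from string plus the accumulation.
t_int = bench("total += int(s)", "s = '23298'; total = 0")
# Accumulation alone, with an integer that is already available.
t_add = bench("total += v", "v = 23298; total = 0")

print("int(s) + add: %.3f s per %d iterations" % (t_int, N))
print("add only:     %.3f s per %d iterations" % (t_add, N))
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes; the difference between the two figures is a rough estimate of the cost of int() on a short decimal string.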


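I read the chapter on debugging and profiling in the official documentation. However, the output only covers the user-level logic: even with the more advanced features I could not see statistics for calls inside Python itself, nor per-line statistics. Of course, that may simply be because I did not fully understand it.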

>>> import cProfile, pstats
>>> import run
>>> c=cProfile.Profile()
>>> c.enable()
>>> run.main()
['HOST', 9829498952, 9833094366, 9827153757, 9835131709, 9843502266, 9836377986, 9825044768, 9833232152, 9841073437, 9833009489, 9833147503]
>>> c.disable()
>>> ps = pstats.Stats(c)
>>> ps.sort_stats('cumulative').print_stats()
         2400010 function calls in 4.283 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.283    4.283 <stdin>:1(<module>)
        1    3.669    3.669    4.283    4.283 run.py:3(main)
   600000    0.318    0.000    0.318    0.000 {method 'split' of 'str' objects}
   600000    0.140    0.000    0.140    0.000 {range}
   600000    0.069    0.000    0.069    0.000 {method 'strip' of 'str' objects}
        1    0.053    0.053    0.053    0.053 {method 'readlines' of 'file' objects}
   600000    0.033    0.000    0.033    0.000 {len}
        2    0.000    0.000    0.000    0.000 /usr/lib/python2.7/encodings/utf_8.py:15(decode)
        2    0.000    0.000    0.000    0.000 {_codecs.utf_8_decode}
        1    0.000    0.000    0.000    0.000 {open}
        1    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

<pstats.Stats instance at 0x7f6ef2c8e6c8>
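
Note that int does not appear as a row in the table above, while almost all of the time (3.669 s) is charged to main itself. As far as I can tell this is because int is a type rather than a built-in function, and cProfile only records C calls to built-in functions and methods, so the conversion time is folded into the caller. The interactive session can also be written as a script. The sketch below does that for a small function of my own (sum_columns) on a made-up in-memory sample, so both names and the data are assumptions, not part of run.py.

```python
import cProfile
import pstats


def sum_columns(lines):
    """Sum the last 11 comma-separated integer fields of every line."""
    result = [0] * 11
    for line in lines:
        fields = line.strip().split(',')
        skip = len(fields) - 11
        for i in range(11):
            result[i] += int(fields[i + skip])
    return result


# Hypothetical in-memory sample standing in for base.txt.
lines = ["host,name," + ",".join(str(n) for n in range(11)) + "\n"] * 20000

profiler = cProfile.Profile()
profiler.enable()
totals = sum_columns(lines)
profiler.disable()

# Same report as in the interactive session, sorted by cumulative time.
pstats.Stats(profiler).sort_stats('cumulative').print_stats()
```

In the printed report, split, strip and len each get their own row, and the time spent converting strings shows up only in the tottime column of sum_columns.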


Turning to Google for help, I found line_profiler, which can collect per-line statistics.

[[email protected] ~]$ sudo pip2 install line_profiler


1. Add the decorator line @profile before the function to be analyzed

@profile
def main():
        f = open('base.txt', 'r')

2. Run the following command

[[email protected] chimian]$ kernprof -v -l run.py
['HOST', 9829498952, 9833094366, 9827153757, 9835131709, 9843502266, 9836377986, 9825044768, 9833232152, 9841073437, 9833009489, 9833147503]
Wrote profile results to run.py.lprof
Timer unit: 1e-06 s

Total time: 10.9936 s
File: run.py
Function: main at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
     3                                           @profile
     4                                           def main():
     5         1            8      8.0      0.0         f = open('base.txt', 'r')
     6         1            1      1.0      0.0         result = ['HOST', 0,0,0,0,0,0,0,0,0,0,0]
     7    600001       315209      0.5      2.9         for line in f.readlines():
     8    600000       723354      1.2      6.6                 l = line.strip().split(',')
     9    600000       273985      0.5      2.5                 length = len(l)
    10    600000       264674      0.4      2.4                 if length < 12 :
    11                                                                  raise Exception, "HUGE_TONG: Wrong format"
    12    600000       263102      0.4      2.4                 skip = length - 12
    13   7200000      3191710      0.4     29.0                 for i in range(1,12):
    14   6600000      5961474      0.9     54.2                         result[i] += int(l[i+skip])
    15         1           14     14.0      0.0         f.close()
    16         1           37     37.0      0.0         print result
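
One practical snag with this workflow: the name profile is only defined when the script runs under kernprof -l, which places it in the builtins, so running the decorated script directly fails with a NameError. A common workaround is a no-op fallback, sketched below. The function parse_line is a hypothetical stand-in of mine, not part of run.py.

```python
try:
    profile  # defined by kernprof when the script runs under `kernprof -l`
except NameError:
    def profile(func):
        """Fallback: return the function unchanged when not profiling."""
        return func


@profile
def parse_line(line):
    """Convert the last 11 comma-separated fields of a line to integers."""
    fields = line.strip().split(',')
    return [int(x) for x in fields[-11:]]


print(parse_line("some,host," + ",".join(str(n) for n in range(11))))
```

With this in place the same file can be run both as ./script.py and as kernprof -v -l script.py without editing the decorator in and out.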


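To see which part of that one line is expensive, I split it into separate statements and profiled again:

[[email protected] chimian]$ kernprof -v -l run.py
['HOST', 9829498952, 9833094366, 9827153757, 9835131709, 9843502266, 9836377986, 9825044768, 9833232152, 9841073437, 9833009489, 9833147503]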
Wrote profile results to run.py.lprof
Timer unit: 1e-06 s

Total time: 20.4831 s
File: run.py
Function: main at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
     3                                           @profile
     4                                           def main():
     5         1            7      7.0      0.0         f = open('base.txt', 'r')
     6         1            1      1.0      0.0         result = ['HOST', 0,0,0,0,0,0,0,0,0,0,0]
     7    600001       343923      0.6      1.7         for line in f.readlines():
     8    600000       772508      1.3      3.8                 l = line.strip().split(',')
     9    600000       305425      0.5      1.5                 length = len(l)
    10    600000       281454      0.5      1.4                 if length < 12 :
    11                                                                  raise Exception, "HUGE_TONG: Wrong format"
    12    600000       283245      0.5      1.4                 skip = length - 12
    13   7200000      3414854      0.5     16.7                 for i in range(1,12):
    14                                                          #       result[i] += int(l[i+skip])
    15   6600000      3044661      0.5     14.9                         index = i + skip
    16   6600000      3001742      0.5     14.7                         value_str = l[index]
    17   6600000      5589397      0.8     27.3                         value_int = int(value_str)
    18   6600000      3445814      0.5     16.8                         result[i] += value_int
    19         1           13     13.0      0.0         f.close()
    20         1           37     37.0      0.0         print result


For comparison, the same job in C (crun.c):

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

int main()
{
        FILE* fp = fopen("./base.txt", "r");
        if (fp == NULL) {
                perror("HUGE_TONG fopen(): ");
                return 1;
        }
        /* len must describe the buffer handed to getline() */
        size_t len = 1024;
        char* buf = malloc(len);
        int slen = 0;
        long array[11] = {0,0,0,0,0,0,0,0,0,0,0};

        while (1) {
                ssize_t r = getline(&buf, &len, fp);
                if (r < 0) {
                        /* end of file, or a read error */
                        if (errno != 0)
                                perror("HUGE_TONG read(): ");
                        break;
                }
                slen = strlen(buf);
                if (buf[slen-1] != '\n') {
                        printf("do not supporting format: %c \n", buf[slen-1]);
                        return 1;
                }
                char* str;
                buf[slen-1] = 0;
                int index = 10;
                /* scan from the right, cutting off one number per comma */
                for (int i = slen - 1; i >= 0; i--) {
                        if (buf[i] == ',') {
                                str = buf + (i+1);
                                buf[i] = 0;
                                int value = atoi(str);
                                array[index--] += value;
                                if (index < 0) {
                                        break;
                                }
                        }
                }
        }
        for (int i = 0; i < 11; i++) {
                printf("%ld\t", array[i]);
        }
        printf("\n");
        free(buf);
        fclose(fp);
        return 0;
}


[[email protected] chimian]$ gcc -Wall -O3 crun.c
[[email protected] chimian]$ time ./a.out
9829498952      9833094366      9827153757      9835131709      9843502266      9836377986      9825044768      9833232152      9841073437      9833009489      9833147503

real    0m0.288s
user    0m0.277s
sys     0m0.010s


Compiling to optimized bytecode makes no real difference, which is not surprising since -O mainly strips assert statements:

[[email protected] chimian]$ python2 -O3 -m py_compile run.py
[[email protected] chimian]$ time python2 run.pyo
['HOST', 9829498952, 9833094366, 9827153757, 9835131709, 9843502266, 9836377986, 9825044768, 9833232152, 9841073437, 9833009489, 9833147503]

real    0m3.865s
user    0m3.833s
sys     0m0.030s


Next, keeping the split-up statements, I replaced the int() call with a constant:

[[email protected] chimian]$ diff run.py run-md.py
<                       value_int = int(value_str)
>                       value_int = 23298
[[email protected] chimian]$ vim run-md.py
[[email protected] chimian]$ time ./run-md.py
['HOST', 13978800000, 13978800000, 13978800000, 13978800000, 13978800000, 13978800000, 13978800000, 13978800000, 13978800000, 13978800000, 13978800000]

real    0m1.449s
user    0m1.420s
sys     0m0.027s
[[email protected] chimian]$ vim run-md.py
[[email protected] chimian]$ kernprof -v -l run-md.py
['HOST', 13978800000, 13978800000, 13978800000, 13978800000, 13978800000, 13978800000, 13978800000, 13978800000, 13978800000, 13978800000, 13978800000]
Wrote profile results to run-md.py.lprof
Timer unit: 1e-06 s

Total time: 17.2632 s
File: run-md.py
Function: main at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
     3                                           @profile
     4                                           def main():
     5         1            7      7.0      0.0         f = open('base.txt', 'r')
     6         1            1      1.0      0.0         result = ['HOST', 0,0,0,0,0,0,0,0,0,0,0]
     7    600001       335628      0.6      1.9         for line in f.readlines():
     8    600000       728998      1.2      4.2                 l = line.strip().split(',')
     9    600000       293883      0.5      1.7                 length = len(l)
    10    600000       278300      0.5      1.6                 if length < 12 :
    11                                                                  raise Exception, "HUGE_TONG: Wrong format"
    12    600000       276842      0.5      1.6                 skip = length - 12
    13   7200000      3302175      0.5     19.1                 for i in range(1,12):
    14                                                          #       result[i] += int(l[i+skip])
    15   6600000      2965126      0.4     17.2                         index = i + skip
    16   6600000      2943092      0.4     17.0                         value_str = l[index]
    17   6600000      2808623      0.4     16.3                         value_int = 23298
    18   6600000      3330465      0.5     19.3                         result[i] += value_int
    19         1           12     12.0      0.0         f.close()
    20         1           39     39.0      0.0         print result
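
The two profiles and the earlier wall-clock timings give two independent rough estimates of what int() itself costs. The arithmetic is sketched below. Every figure is copied from the output above, and the variable names are my own. Treat the result as an estimate only, because line_profiler adds its own overhead to every line hit.

```python
# All figures below are copied from the timing and profiler output above.
calls = 6600000  # hits on the int() line

# line_profiler: time on the int() line vs. the same line assigning a constant
int_line_us = 5589397
const_line_us = 2808623
profiler_extra_us = int_line_us - const_line_us
profiler_per_call_us = profiler_extra_us / float(calls)

# wall clock: run.py vs. the run-md.py variant that used "result[i] += 1"
wall_extra_s = 3.787 - 1.208
wall_per_call_us = wall_extra_s * 1e6 / calls

print("profiler:   %.2f s extra, %.2f us per int() call"
      % (profiler_extra_us / 1e6, profiler_per_call_us))
print("wall clock: %.2f s extra, %.2f us per int() call"
      % (wall_extra_s, wall_per_call_us))
```

Both estimates land at roughly 0.4 microseconds per call, or about 2.6 to 2.8 seconds over the whole file, which is most of the 3.787 s that run.py takes.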


Then I commented out the whole loop body and left only an addition to a local variable:

[[email protected] chimian]$ time ./run-md.py
['HOST', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

real    0m0.960s
user    0m0.923s
sys     0m0.033s
[[email protected] chimian]$ cat run-md.py
#! /usr/bin/python2

def main():
        f = open('base.txt', 'r')
        result = ['HOST', 0,0,0,0,0,0,0,0,0,0,0]
        for line in f.readlines():
                l = line.strip().split(',')
                length = len(l)
                if length < 12 :
                        raise Exception, "HUGE_TONG: Wrong format"
                skip = length - 12
                a = 0
                for i in range(1,12):
        #               result[i] += int(l[i+skip])
        #               index = i + skip
        #               value_str = l[index]
        #               value_int = 23298
        #               result[i] += value_int
        #               result[i] += 23298
                        a += 23298
        f.close()
        print result

if __name__ == '__main__':
        r = main()
#       exit(r)
[[email protected] chimian]$ 


As a baseline, a bare loop doing the same number of additions (6,600,000), with no file I/O at all:

[[email protected] chimian]$ time ./run-test.py 

real    0m0.372s
user    0m0.293s
sys     0m0.077s
[[email protected] chimian]$ cat run-test.py
#! /usr/bin/python2

def main():
        a = 0
        for i in range(0, 6600000):
                a += 235678

if __name__ == '__main__':
        main()
[[email protected] chimian]$ 

I do not have many more ideas at this point. I did find an article on the subject that is very well written.
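
For what it is worth, here is one direction that could be tried next, as a sketch only: I have not timed it against run.py, so whether it is actually faster is an open question. It iterates over the file lazily instead of calling readlines(), splits from the right with rsplit so that commas in the text field do not matter, and pushes the conversions through map(int, ...) rather than an indexed inner loop. The function name sum_columns and the temporary sample file are assumptions for illustration.

```python
import os
import tempfile


def sum_columns(path):
    """Sum the last 11 comma-separated integer columns of every line in path."""
    totals = [0] * 11
    with open(path, 'r') as f:
        for line in f:  # iterate lazily instead of readlines()
            # Split from the right: the leading text field may contain commas.
            fields = line.rstrip('\n').rsplit(',', 11)
            if len(fields) < 12:
                raise ValueError("HUGE_TONG: Wrong format")
            values = map(int, fields[1:])
            totals = [t + v for t, v in zip(totals, values)]
    return totals


# Tiny hypothetical sample in the same layout as base.txt.
sample = ("ab,cd,1,2,3,4,5,6,7,8,9,10,11\n"
          "xyz,10,20,30,40,50,60,70,80,90,100,110\n")
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'w') as f:
    f.write(sample)

totals = sum_columns(path)
os.remove(path)
print(['HOST'] + totals)
```

The conversions themselves are still done by int(), so this can only trim the interpreter overhead around them (indexing, the explicit inner loop, the extra list from readlines()), not the conversion cost.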



Time: 2024-12-26 07:37:49
