Potential Pythonic Pitfalls

Monday, 11 May 2015

Python is a very expressive language. It provides us with a large standard library and many builtins to get the job done quickly. However, many programmers get lost in the power it provides: they fail to make full use of the standard library, value one-liners over clarity, and misunderstand its basic constructs. This is a non-exhaustive list of a few of the pitfalls that programmers new to Python fall into.

Not Knowing the Python Version

This is a recurring problem in Stack Overflow questions. Many people write code that works perfectly on one version of Python but have a different version installed on their system.[1] Make sure that you know the Python version you're working with. You can check via the following:

$ python --version
Python 2.7.9
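
The same check can be made from inside a program, which helps when the python on your PATH is not the interpreter that actually runs your code. A minimal sketch using the standard sys module (the variable names are only illustrative):

```python
import sys

# sys.version_info behaves like a tuple: (major, minor, micro, ...)
major, minor = sys.version_info[:2]
print("Running Python {}.{}".format(major, minor))

# Branch on the major version instead of assuming one.
is_python3 = major >= 3
if not is_python3:
    print("This interpreter is Python 2; print is a statement here by default.")
```

Comparing sys.version_info against a tuple, as in sys.version_info >= (3, 4), is another common way to write the same test.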

Not Using pyenv

pyenv is a great tool for managing different Python versions. Unfortunately, it only works on *nix systems. On Mac OS, one can simply install it via brew install pyenv and on Linux, there is an automatic installer.

Obsessing Over One-Liners

Some get a real kick out of one-liners. Many boast about their one-liner solutions even when they are less efficient than a multi-line solution.

In Python, this usually means convoluted comprehensions with multiple nested expressions. For example:

l = [m for a, b in zip(this, that) if b.method(a) != b for m in b if not m.method(a, b) and reduce(lambda x, y: a + y.method(), (m, a, b))]

To be perfectly honest, I made the above example up. But I've seen plenty of people write code like it. Code like this will make no sense in a week's time. If you're trying to do something a little more complex than simply adding an item to a list or set with a condition, then you're probably making a mistake.

One-liners are not achievements. Yes, they can seem very clever, but that does not make them achievements. It's like thinking that shoving everything into your closet is an actual attempt at cleaning your room. Good code is clean, easy to read and efficient.
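
To make the trade-off concrete, here is a small sketch that compares a dense comprehension with the same logic written as a loop. The data, the function name passing_scores and the threshold of 60 are all made up for illustration:

```python
# Hypothetical data: (name, scores) pairs.
records = [("ann", [90, 72]), ("bob", []), ("cy", [55, 61, 48])]

# Dense version: two loops and two conditions packed into one expression.
dense = [s for _, scores in records if scores for s in scores if s >= 60]


def passing_scores(records, threshold=60):
    """Collect every score at or above the threshold, one step at a time."""
    result = []
    for _name, scores in records:
        if not scores:  # skip people with no scores at all
            continue
        for score in scores:
            if score >= threshold:
                result.append(score)
    return result


# Both versions produce the same list; only the readability differs.
print(passing_scores(records) == dense)
```

The loop is longer, but each condition sits on its own line and can carry a comment, which is what makes it readable a week later.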

Initializing a set the Wrong Way

This is a more subtle problem that can catch you off guard. set comprehensions are a lot like list comprehensions.

>>> { n for n in range(10) if n % 2 == 0 }
{0, 8, 2, 4, 6}
>>> type({ n for n in range(10) if n % 2 == 0 })
<class 'set'>

The above is one example of a set comprehension. Sets are like lists in that they are containers. The difference is that a set cannot contain duplicate values and is unordered. Having seen set comprehensions, people often make the mistake of thinking that {} initializes an empty set. It does not; it initializes an empty dict.

>>> {}
{}
>>> type({})
<class 'dict'>

If we wish to initialize an empty set, then we simply call set().

>>> set()
set()
>>> type(set())
<class 'set'>

Note how an empty set is denoted as set() but a set containing something is denoted as items surrounded by curly braces.

>>> s = set()
>>> s
set()
>>> s.add(1)
>>> s
{1}
>>> s.add(2)
>>> s
{1, 2}

This is rather counterintuitive, since you'd expect something like set([1, 2]).

Misunderstanding the GIL

The GIL (Global Interpreter Lock) means that only one thread in a Python program can be executing Python bytecode at any one time. This implies that when we create a thread and expect it to run in parallel, it doesn't. What the Python interpreter actually does is switch quickly between the running threads. But this is an oversimplified version of what is actually happening. There are many instances in which things do run in parallel, like when using libraries that are essentially C extensions. But when running Python code, you don't get parallel execution most of the time. In other words, threads in Python are not like threads in Java or C++.

Many will try to defend Python by saying that these are real threads.[2] This is indeed true, but it does not change the fact that the way Python handles threads is different from what you'd generally expect. The same is true of a language like Ruby (which also has an interpreter lock).

The prescribed solution to this is the multiprocessing module. It provides the Process class, which is basically a nice wrapper around a fork. However, a fork is much more expensive than a thread, so you might not always see a performance benefit, since the different processes now have to do a lot of work to coordinate with each other.
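
As a rough sketch of what this looks like in practice, the following spreads CPU-bound work over worker processes with multiprocessing.Pool. The function names count_primes and count_primes_parallel and the choice of two workers are assumptions made for this example, not part of any library:

```python
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=2):
    """Run count_primes for each limit in its own worker process.

    Every worker is a separate interpreter with its own GIL, so the
    calls can run at the same time on different cores.
    """
    with Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)
```

Call count_primes_parallel from under an if __name__ == "__main__": guard, since some platforms start workers by re-importing the main module. Whether this beats a plain loop depends on how much work each call does: for small inputs, the cost of starting processes and passing data between them can outweigh the gain.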

However, this problem does not exist in every implementation of Python. PyPy-stm, for example, is an implementation of Python that tries to get rid of the GIL (it is not stable yet). Implementations built on top of other platforms like the JVM (Jython) or the CLR (IronPython) do not have GIL problems.

All in all, be careful when using the Thread class; what you get might not be what you expect.

Using Old Style Classes

In Python 2 there are two types of classes: the "old style" classes and the "new style" classes. If you're using Python 3, then you're using the "new style" classes by default. To make sure that you're using "new style" classes in Python 2, you need to inherit from object for any new class you create that isn't already inheriting from a builtin like int or list. In other words, your base class, the class that isn't inheriting from anything else, should always inherit from object.

class MyNewObject(object):
    # stuff here
    pass

These "new style" classes fix some very fundamental flaws in the old style classes that we really don't need to get into. However, anyone who is interested can find the details in the related documentation.
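
A quick way to see the Python 3 behaviour is to inspect a class that names no base at all. This sketch assumes Python 3, and the class names are illustrative:

```python
class Implicit:  # no base class named
    pass


class Explicit(object):  # the Python 2 compatible spelling
    pass


# In Python 3 both classes end up with object as their base.
print(Implicit.__mro__ == (Implicit, object))
print(Explicit.__mro__ == (Explicit, object))

# With new-style classes, type(instance) is the class itself.
print(type(Implicit()) is Implicit)
```

Under Python 2 the first class would be an old style class, and type(Implicit()) would report the generic instance type instead of Implicit.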

Iterating the Wrong Way

It's very common to see the following code from users who are relatively new to the language:

for name_index in range(len(names)):
    print(names[name_index])

There is no need to call len in the above example, since iterating over the list is actually much simpler:

for name in names:
    print(name)

Furthermore, there are a whole host of other tools at your disposal to make iteration easier. For example, zip can be used to iterate over two lists at once:[3]

for cat, dog in zip(cats, dogs):
    print(cat, dog)

If we want both the index and the value of each item in the list, we can use enumerate:[4]

for index, cat in enumerate(cats):
    print(cat, index)
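
Two details of these builtins are easy to miss: zip stops at the end of the shorter input, and enumerate accepts a start argument for the first index. A short sketch with made-up data:

```python
cats = ["Tom", "Felix", "Garfield"]  # illustrative data
dogs = ["Rex", "Spot"]               # deliberately one item shorter

# zip stops as soon as the shorter list runs out, so "Garfield" is dropped.
pairs = list(zip(cats, dogs))
print(pairs)

# start=1 makes the counter begin at 1 instead of the default 0.
numbered = ["{}. {}".format(position, cat)
            for position, cat in enumerate(cats, start=1)]
print(numbered)
```

If silently dropping the unmatched item is not what you want, itertools.zip_longest (izip_longest in Python 2) pads the shorter input instead.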

There are also many useful functions to choose from in itertools. Please note, however, that using itertools functions is not always the right choice. If one of the functions in itertools offers a very convenient solution to the problem you're trying to solve, like flattening a list or getting the permutations of the contents of a given list, then go for it. But don't try to fit it into some part of your code just because you want to.

The problem of itertools abuse comes up so often that one highly respected Python contributor on Stack Overflow has dedicated a significant part of their profile to it.[5]
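
For reference, the two convenient cases mentioned above (flattening a list and generating permutations) might look like this. The input values are only examples:

```python
from itertools import chain, permutations

nested = [[1, 2], [3], [4, 5]]

# chain.from_iterable flattens one level of nesting.
flat = list(chain.from_iterable(nested))
print(flat)

# permutations yields every ordering of the given length as a tuple.
orders = list(permutations("abc", 2))
print(orders)
```

Both calls replace loops you would otherwise have to write and test by hand, which is the kind of case where reaching for itertools pays off.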

Using Mutable Default Arguments

I've seen the following quite a lot:

def foo(a, b, c=[]):
    # append to c
    c.append(a)
    # do some more stuff

Never use mutable default arguments. Instead, use the following:

def foo(a, b, c=None):
    if c is None:
        c = []
    # append to c
    c.append(a)
    # do some more stuff

Instead of explaining what the problem is, it's better to show the effects of using mutable default arguments:

In [2]: def foo(a, b, c=[]):
   ...:     c.append(a)
   ...:     c.append(b)
   ...:     print(c)
   ...:
In [3]: foo(1, 1)
[1, 1]
In [4]: foo(1, 1)
[1, 1, 1, 1]
In [5]: foo(1, 1)
[1, 1, 1, 1, 1, 1]

The same c is being referenced again and again every time the function is called, because default values are evaluated only once, when the function is defined. This can have some very unwanted consequences.
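
The sharing can be observed directly, since a function stores its default values in its __defaults__ attribute. The sketch below uses two illustrative functions, append_shared and append_fresh, to contrast the pitfall with the None fix:

```python
def append_shared(item, bucket=[]):
    """Pitfall: the default list is created once and reused on every call."""
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    """Fix: use None as a sentinel and build a new list on each call."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)

# Both calls returned the very same list object, which is also the
# object stored as the function's default.
print(first is second)
print(append_shared.__defaults__[0] is first)

# The sentinel version starts from an empty list every time.
print(append_fresh(1), append_fresh(2))
```

The same reasoning applies to any mutable default, such as a dict or a set.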

Takeaway

These are just some of the problems that one might run into when relatively new to Python. Please note, however, that this is far from a comprehensive list. The other pitfalls largely have to do with people writing Python as if it were Java or C++, using it in the way they are already familiar with. So, as a continuation of this, try diving into things like Python's super function. Take a look at classmethod, staticmethod and __slots__.

Update

Last Updated on 12 May 2015 4:50 PM (GMT +6)


[1] Most people are taught Python using Python 2. However, when they go home and try things out themselves, they install Python 3 (installing the latest version is quite a natural thing to do).
[2] When people talk about real threads what they essentially mean is that these threads are real CPU threads, which are scheduled by the OS (Operating System).
[3] https://docs.python.org/3/library/functions.html#zip
[4] enumerate can be further configured to produce the kind of index you want. https://docs.python.org/3/library/functions.html#enumerate
[5] http://stackoverflow.com/users/908494/abarnert