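Preparing for Python 3

Before you start adding Python 3 support, you should prepare your code by changing the things that are hard for 2to3 to fix, so that the transition to Python 3 is as smooth as possible. You can do these things right now even if you don't plan to move to Python 3 yet, and in some cases they will even speed up your code under Python 2.

You may also want to read the chapter Improving your code with modern idioms, which contains many other improvements you can apply to your code.

Running under Python 2.7

The first step in the process is to get your code running under Python 2.6 or 2.7. It isn't very important which version you use here, but the last Python 2 version obviously makes the most sense, so if you can use Python 2.7, do so.

Most code will run directly without modification, but a couple of things changed from Python 2.5 to 2.6. In Python 2.6 as and with are keywords, so if you use them as variable names you need to change them. The easiest fix is to add an underscore to the end of the name.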

>>> with_ = True
>>> as_ = False
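If names are generated or read from configuration, the same renaming rule can be applied programmatically. The following is only a minimal sketch: safe_name is a hypothetical helper, and the result depends on the keyword list of the interpreter that runs it.

```python
import keyword


def safe_name(name):
    """Append an underscore when `name` is a reserved word (hypothetical helper)."""
    if keyword.iskeyword(name):
        return name + "_"
    return name


renamed = [safe_name(name) for name in ["with", "as", "total"]]
print(renamed)
```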

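You also need to get rid of string exceptions. Raising strings as exceptions has been discouraged for a very long time, mainly because they are very inflexible; you can't subclass them, for example.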

>>> raise "Something went wrong!"
Traceback (most recent call last):
...
Something went wrong!

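In Python 3 string exceptions are completely gone. In Python 2.6 you can't raise them, although you can still catch them for backwards compatibility. Either way, you should remove all use of string exceptions from your code and get it running under Python 2.6 before doing anything else.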

>>> raise Exception("Something went wrong!")
Traceback (most recent call last):
...
Exception: Something went wrong!
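Exception classes give you the flexibility that string exceptions lacked, because callers can catch a whole family of errors through a common base class. A minimal sketch, in which ConfigError, MissingOptionError and read_option are hypothetical names invented for illustration:

```python
class ConfigError(Exception):
    """Base class for a hypothetical family of configuration errors."""


class MissingOptionError(ConfigError):
    def __init__(self, option):
        ConfigError.__init__(self, "missing option: %s" % option)
        self.option = option


def read_option(options, name):
    try:
        return options[name]
    except KeyError:
        raise MissingOptionError(name)


try:
    read_option({"host": "localhost"}, "port")
except ConfigError as error:  # catches the subclass through its base class
    message = str(error)
    failed_option = error.option
```

The except ... as ... spelling used here is accepted by Python 2.6 and later as well as Python 3.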

The next step is to run your code under Python 2.6 or Python 2.7 with the -3 option. This will warn about things that are not supported in Python 3 and which 2to3 will not convert. These are mostly ways of doing things that have long been deprecated and have newer alternatives, or modules removed from the standard library. For example, support for Classic Mac OS has been removed in Python 3; only OS X is supported now, and for that reason the modules that support specific things about Classic Mac OS have been removed.

You will get warnings for many of the changes listed below (but not all), as well as for some of the library reorganization. The library reorganization changes are simple and need no explanation; the warnings will tell you the new name of the module.

Use // instead of / when dividing integers

In Python 2 dividing two integers returns an integer. That means five divided by two will return two.

>>> 5/2
2

However, under Python 3 it returns two and a half.

>>> 5/2
2.5

Many who use the division operator in Python 2 today rely on integer division always returning integers. But the automatic conversion with 2to3 will not know what types the operands are, and therefore it doesn’t know whether the division operator divides integers or not. Therefore it cannot do any conversion here. This means that if you are using the old integer division, your code may fail under Python 3.

Since this change has been planned since Python 2.2, that version and all later versions include a new operator, called floor division, written with two slashes. It always returns whole numbers, even with floats. Any place in your code where you really do want floor division that returns whole numbers, you should change the division operator to the floor division operator.

>>> 5//2
2
>>> 5.0//2.0
2.0
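Floor division rounds towards negative infinity, exactly as Python 2's integer division did. That matters when replacing old integer divisions, because truncating a float quotient with int() gives a different answer for negative operands. A small sketch, where old_style_divide is a hypothetical name used only for illustration:

```python
def old_style_divide(a, b):
    """Divide the way Python 2's / did for two integers: floor the quotient."""
    return a // b


pairs = [(5, 2), (-5, 2), (5, -2), (7, 7)]

# Floor division rounds down, so negative quotients move away from zero.
floored = [old_style_divide(a, b) for a, b in pairs]

# Truncating a float quotient rounds towards zero instead.
truncated = [int(float(a) / b) for a, b in pairs]

print(floored)
print(truncated)
```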

Often the Python 2 integer division behavior is unwanted. The most common way to get around that problem is to convert one of the integers to a float, or to add a decimal point to one of the numbers.

>>> 5/2.0
2.5
>>> a = 5
>>> b = 2
>>> float(a)/b
2.5

However, there is a neater way to do this and that is to enable the Python 3 behavior. This is done via a __future__ import also available since Python 2.2.

>>> from __future__ import division
>>> 5/2
2.5

Although converting one of the operands to a float before division works fine, it is unnecessary in Python 3, and by using the __future__ import you can avoid it.

Running Python 2.6 with the -3 option will warn you if you use the old integer division.

Use new-style classes

In Python 2 there are two types of classes, “old-style” and “new”. The “old-style” classes have been removed in Python 3, so all classes now subclass from object, even if they don’t do so explicitly.

There are many differences between new and old classes, but few of them will cause you any problems with Python 3. If you use multiple inheritance you are probably going to encounter problems because of the different method resolution orders.[4]

If you use multiple inheritance you should therefore switch to using new-style classes before adding Python 3 support. This is done by making sure all objects subclass from object, and you will probably have to change the order you list the super-classes in the class definitions.
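The effect of the method resolution order can be seen with a small diamond-shaped hierarchy. This is only an illustrative sketch with hypothetical class names. With new-style classes, lookup reaches Right before Base, whereas the depth-first lookup of old-style classes would have reached Base first.

```python
class Base(object):
    def describe(self):
        return "Base"


class Left(Base):
    pass  # deliberately does not override describe()


class Right(Base):
    def describe(self):
        return "Right"


class Child(Left, Right):
    pass


# The linearized lookup order used for new-style classes.
mro_names = [cls.__name__ for cls in Child.__mro__]
description = Child().describe()
```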

Separate binary data and strings

In Python 2, you use str objects to hold binary data and ASCII text, while text data that needs more characters than what is available in ASCII is held in unicode objects. In Python 3, instead of str and unicode objects, you use bytes objects for binary data and str objects for all kinds of text data, Unicode or not. The str type in Python 3 is more or less the same as the unicode type in Python 2, and the bytes type is quite similar to Python 2’s str type, even though there are significant differences.

The first step in preparing for this is to make sure you don’t use the same variable name for both binary and text data. In Python 2 this will not cause you much trouble, but in Python 3 it will, so try to keep binary data and text separated as much as possible.

In Python 2 the 't' and 'b' file mode flags change how newlines are treated on some platforms, for example Windows. But the flags make no difference on Unix, so many programs that are developed for Unix tend to ignore them and open binary files in text mode. However, in Python 3 the flags also determine whether you get bytes objects or Unicode str objects when you read from the file. So make sure you really use the text and binary flags when you open a file. Although the text flag is the default, add it anyway, as you then show that the text mode is intentional and not just because you forgot to add the flag.

Running Python 2.6 with the -3 option will not warn about this problem, as there simply is no way for Python 2 to know if the data is text or binary data.
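The difference between the two modes can be sketched as follows. The example assumes io.open(), which is available from Python 2.6 and behaves like the Python 3 open(); the temporary file and the UTF-8 payload are made up for illustration.

```python
import io
import os
import tempfile

payload = b"caf\xc3\xa9\n"  # UTF-8 encoded bytes for "café" plus a newline

handle, path = tempfile.mkstemp()
os.close(handle)
try:
    # Binary mode: bytes in, bytes out, nothing is decoded or translated.
    with io.open(path, "wb") as binary_file:
        binary_file.write(payload)
    with io.open(path, "rb") as binary_file:
        raw = binary_file.read()

    # Text mode, stated explicitly: the bytes are decoded to Unicode text.
    with io.open(path, "rt", encoding="utf-8") as text_file:
        text = text_file.read()
finally:
    os.remove(path)
```

Stating both the mode and the encoding keeps the intent visible to whoever later ports or reviews the code.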

When sorting, use key instead of cmp

In Python 2 sorting methods take a cmp parameter that should be a function that returns -1, 0 or 1 when comparing two values.

>>> def compare(a, b):
...     """Comparison that ignores the first letter"""
...     return cmp(a[1:], b[1:])
>>> names = ['Adam', 'Donald', 'John']
>>> names.sort(cmp=compare)
>>> names
['Adam', 'John', 'Donald']

Since Python 2.4 .sort() as well as the new sorted() function (see Use sorted() instead of .sort()) take a key parameter which should be a function that returns a sorting key.

>>> def keyfunction(item):
...     """Key for comparison that ignores the first letter"""
...     return item[1:]
>>> names = ['Adam', 'Donald', 'John']
>>> names.sort(key=keyfunction)
>>> names
['Adam', 'John', 'Donald']

This is easier to use and faster to run. When using the cmp parameter, the sorting compares pairs of values, so the compare-function is called multiple times for every item. The larger the set of data, the more times the compare-function is called per item. With the key function the sorting instead keeps the key value for each item and compares those, so the key function is only called once for every item. This results in much faster sorts for large sets of data.
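The difference in the number of calls can be checked by counting them. The sketch below assumes functools.cmp_to_key() (available in Python 2.7 and in Python 3.2 and later) as a stand-in for the cmp parameter so that it also runs under Python 3. The calls counter and the generated names are hypothetical.

```python
from functools import cmp_to_key

calls = {"key": 0, "cmp": 0}


def key_function(item):
    calls["key"] += 1
    return item[1:]


def compare(a, b):
    calls["cmp"] += 1
    left, right = a[1:], b[1:]
    return (left > right) - (left < right)  # -1, 0 or 1, like cmp()


# One hundred distinct names in a scrambled but repeatable order.
names = ["name%03d" % ((i * 37) % 100) for i in range(100)]

by_key = sorted(names, key=key_function)
by_cmp = sorted(names, key=cmp_to_key(compare))

print(calls)
```

The key function runs once per item, while the comparison function runs once per comparison the sort has to make, which grows faster than the number of items.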

The key function is often so simple that you can replace it with a lambda:

>>> names = ['Adam', 'Donald', 'John']
>>> names.sort(key=lambda x: x[1:])
>>> names
['Adam', 'John', 'Donald']

Python 2.4 also introduced a reverse parameter.

>>> names = ['Adam', 'Donald', 'John']
>>> names.sort(key=lambda x: x[1:], reverse=True)
>>> names
['Donald', 'John', 'Adam']

There is one case where using key is less obvious than using cmp and that’s when you need to sort on several values. Let’s say we want the result to be sorted with the longest names first and names of the same length should be sorted alphabetically. Doing this with a key function is not immediately obvious, but the solution is usually to sort twice, with the least important sorting first.

>>> names = ['Adam', 'Donald', 'John']
>>> # Alphabetical sort
>>> names.sort()
>>> # Long names should go first
>>> names.sort(key=lambda x: len(x), reverse=True)
>>> names
['Donald', 'Adam', 'John']

This works because since Python 2.3 the timsort sorting algorithm is used[1]. It’s a stable algorithm, meaning that if two items are sorted as equal it will preserve the order of those items.

You can also make a key function that returns a value that combines the two keys, and sort in one go. Surprisingly, this is not always faster. It depends on both the data and the key function, so you will have to test which solution is faster in your case.

>>> def keyfunction(item):
...     """Sorting on descending length and alphabetically"""
...     return -len(item), item
>>> names = ['Adam', 'Donald', 'John']
>>> names.sort(key=keyfunction)
>>> names
['Donald', 'Adam', 'John']
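The two approaches can be checked against each other on a slightly larger list. This is only a sketch with made-up names. It relies on the sort being stable, also with reverse=True, so the alphabetical order from the first pass survives among names of equal length.

```python
names = ["John", "Adam", "Donald", "Eve", "Yoko", "Daisy", "Al"]

# Two passes: least important criterion first, most important last.
two_pass = list(names)
two_pass.sort()
two_pass.sort(key=len, reverse=True)

# One pass: a tuple key that encodes both criteria.
one_pass = sorted(names, key=lambda name: (-len(name), name))

print(two_pass)
```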

The key parameter was introduced in Python 2.4, so if you need to support Python 2.3 you can’t use it. If you need to do a lot of sorting using the key function, the best thing is to implement a simple sorted() function for Python 2.3 and use that conditionally instead of the sorted() builtin in Python 2.4 and later.

>>> import sys
>>> if sys.version_info < (2, 4):
...     def sorted(data, key):
...         # Simple sketch: items whose keys are equal overwrite each other.
...         mapping = {}
...         for x in data:
...             mapping[key(x)] = x
...         keys = mapping.keys()
...         keys.sort()
...         return [mapping[x] for x in keys]
>>> data = ['ant', 'Aardvark', 'banana', 'Dingo']
>>> sorted(data, key=str.lower)
['Aardvark', 'ant', 'banana', 'Dingo']

Python 2.4 is over five years old now, so it is quite unlikely that you would need to support Python 2.3.

Warning

Running Python with the -3 option will only warn you if you use the cmp parameter explicitly:

>>> l.sort(cmp=cmpfunction)
__main__:1: DeprecationWarning: the cmp argument is not supported in 3.x

But it will not warn if you use it like this:

>>> l.sort(cmpfunction)

So this syntax may slip through. In these cases you get a TypeError: must use keyword argument for key function when running the code under Python 3.
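The failure is easy to reproduce. The sketch below assumes it is run under Python 3, where the positional form is rejected; the exact error message varies between Python 3 versions, so only the exception type is checked. compare_lengths is a hypothetical comparison function.

```python
def compare_lengths(a, b):
    return len(a) - len(b)


words = ["banana", "fig", "cherry"]

try:
    words.sort(compare_lengths)  # positional form that -3 does not flag
    positional_result = "accepted"
except TypeError:
    positional_result = "TypeError"

# The portable spelling: pass a key function by keyword.
words.sort(key=len)
```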

In Python 2.7 and Python 3.2 and later there is a function that will convert a comparison function to a key function via a wrapper class. It is very clever, but will make the compare function even slower, so use this only as a last resort.

>>> from functools import cmp_to_key
>>> def compare(a, b): return cmp(a[1:], b[1:])
>>> sorted(['Adam', 'Donald', 'John'], key=cmp_to_key(compare))
['Adam', 'John', 'Donald']

Use rich comparison operators

In Python 2 the most common way to support comparison and sorting of your objects is to implement a __cmp__() method that in turn uses the builtin cmp() function, like this class that will sort according to last name:

>>> class Orderable(object):
...
...     def __init__(self, firstname, lastname):
...         self.first = firstname
...         self.last = lastname
...
...     def __cmp__(self, other):
...         return cmp("%s, %s" % (self.last, self.first),
...                    "%s, %s" % (other.last, other.first))
...
...     def __repr__(self):
...         return "%s %s" % (self.first, self.last)
...
>>> sorted([Orderable('Donald', 'Duck'),
...         Orderable('Paul', 'Anka')])
[Paul Anka, Donald Duck]

However, you can have objects, for example colors, that are neither “less than” nor “greater than”, but still can be “equal” or “not equal”, so since Python 2.1 there has also been support for rich comparison methods where each method corresponds to one comparison operator. They are __lt__ for <, __le__ for <=, __eq__ for ==, __ne__ for !=, __gt__ for > and __ge__ for >=.

Having both the rich comparison methods and the __cmp__() method violates the principle that there should be only one obvious way to do it, so in Python 3 the support for __cmp__() has been removed. For Python 3 you therefore must implement all of the rich comparison operators if you want your objects to be comparable. You don’t have to do this before supporting Python 3 but doing so makes the experience a bit smoother.
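A class that supports equality but not ordering can be sketched like this. Color here is a hypothetical class written for illustration, and the outcome of the ordering attempt assumes Python 3, which raises TypeError when neither operand implements the comparison.

```python
class Color(object):
    def __init__(self, red, green, blue):
        self.rgb = (red, green, blue)

    def __eq__(self, other):
        if not isinstance(other, Color):
            return NotImplemented
        return self.rgb == other.rgb

    def __ne__(self, other):
        # Needed under Python 2; Python 3 derives != from == by default.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result


red = Color(255, 0, 0)
also_red = Color(255, 0, 0)
blue = Color(0, 0, 255)

try:
    red < blue
    ordering = "allowed"
except TypeError:
    ordering = "TypeError"
```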

Comparatively tricky

Making comparison methods can be quite tricky, since you also need to handle comparing different types. A comparison method should return the NotImplemented constant if it doesn’t know how to compare with the other object. Returning NotImplemented works as a flag for Python’s comparisons that makes Python try the reverse comparison. So if your __lt__() method returns NotImplemented, then Python will try to ask the other object’s __gt__() method instead.
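The fallback to the reflected method can be observed with two unrelated classes. This is a minimal sketch; Meters and Feet are hypothetical classes, and only Feet knows how to compare itself with Meters.

```python
class Meters(object):
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        if isinstance(other, Meters):
            return self.value < other.value
        return NotImplemented  # let Python ask the other operand


class Feet(object):
    def __init__(self, value):
        self.value = value

    def __gt__(self, other):
        # Reached as the reflection of Meters.__lt__ when that gave up.
        if isinstance(other, Meters):
            return self.value * 0.3048 > other.value
        return NotImplemented


one_meter_is_shorter = Meters(1) < Feet(10)    # 10 ft is about 3.05 m
five_meters_is_shorter = Meters(5) < Feet(10)
```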

Attention

This means that you should never in your rich comparison methods call the other object’s comparison operator! You’ll find several examples of rich comparison helpers that will convert a greater than call like self.__gt__(other) into return other < self. But then you are calling other.__lt__(self) and if it returns NotImplemented then Python will try self.__gt__(other) again and you get infinite recursion!

Implementing a good set of rich comparison operators that behave properly in all cases is not difficult once you understand all the cases, but getting to grips with that is not entirely trivial. You can do it in many different ways; my preferred way is this mixin, which works equally well in Python 2 and Python 3.

class ComparableMixin(object):
    def _compare(self, other, method):
        try:
            return method(self._cmpkey(), other._cmpkey())
        except (AttributeError, TypeError):
            # _cmpkey not implemented, or returns a different type,
            # so I can't compare with "other".
            return NotImplemented

    def __lt__(self, other):
        return self._compare(other, lambda s, o: s < o)

    def __le__(self, other):
        return self._compare(other, lambda s, o: s <= o)

    def __eq__(self, other):
        return self._compare(other, lambda s, o: s == o)

    def __ge__(self, other):
        return self._compare(other, lambda s, o: s >= o)

    def __gt__(self, other):
        return self._compare(other, lambda s, o: s > o)

    def __ne__(self, other):
        return self._compare(other, lambda s, o: s != o)

The previously mentioned functools.total_ordering() class decorator from Python 3.2 is a nice solution as well, and it can be copied and used in other Python versions. But since it uses class decorators it will not work in versions below Python 2.6.
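For comparison, a sketch of the decorator approach, assuming a Python version where functools.total_ordering() is available. Version is a hypothetical class; only __eq__() and __lt__() are written by hand and the decorator supplies the remaining ordering methods.

```python
from functools import total_ordering


@total_ordering
class Version(object):
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()


versions = [Version(3, 2), Version(2, 6), Version(2, 7)]
ordered = [(v.major, v.minor) for v in sorted(versions)]
```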

To use the mixin above you need to implement a _cmpkey() method that returns a key of objects that can be compared, similar to the key() function used when sorting. The implementation could look like this:

>>> from mixin import ComparableMixin
>>> class Orderable(ComparableMixin):
...
...     def __init__(self, firstname, lastname):
...         self.first = firstname
...         self.last = lastname
...
...     def _cmpkey(self):
...         return (self.last, self.first)
...
...     def __repr__(self):
...         return "%s %s" % (self.first, self.last)
...
>>> sorted([Orderable('Donald', 'Duck'),
...         Orderable('Paul', 'Anka')])
[Paul Anka, Donald Duck]

The above mixin will return NotImplemented if the object compared with does not implement a _cmpkey() method, or if that method returns something that isn’t comparable with the value that self._cmpkey() returns. This means that every object that has a _cmpkey() that returns a tuple will be comparable with all other objects that also have a _cmpkey() that returns a tuple. Most importantly, if it isn’t comparable, the operators will fall back to asking the other object if it knows how to compare the two objects. This way you have an object which has the maximum chance of meaningful comparisons.
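The fallback behavior can be tried out against an object that has no _cmpkey() at all, such as an integer. The sketch below uses a condensed variant of the mixin (operator functions instead of lambdas) so that it is self-contained. Person is a hypothetical class, and the TypeError for the ordering comparison assumes Python 3.

```python
import operator


class ComparableMixin(object):
    """Condensed variant of the mixin above; operator.* replaces the lambdas."""

    def _compare(self, other, method):
        try:
            return method(self._cmpkey(), other._cmpkey())
        except (AttributeError, TypeError):
            return NotImplemented

    def __lt__(self, other): return self._compare(other, operator.lt)
    def __le__(self, other): return self._compare(other, operator.le)
    def __eq__(self, other): return self._compare(other, operator.eq)
    def __ge__(self, other): return self._compare(other, operator.ge)
    def __gt__(self, other): return self._compare(other, operator.gt)
    def __ne__(self, other): return self._compare(other, operator.ne)


class Person(ComparableMixin):
    def __init__(self, first, last):
        self.first, self.last = first, last

    def _cmpkey(self):
        return (self.last, self.first)


anka = Person("Paul", "Anka")
duck = Person("Donald", "Duck")

sorts_first = anka < duck   # both sides have a _cmpkey()
equals_int = (anka == 5)    # neither side can compare: falls back to identity

try:
    anka < 5                # no side can order the pair
    ordering_with_int = "allowed"
except TypeError:
    ordering_with_int = "TypeError"
```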

Implementing __hash__()

In Python 2, if you implement __eq__() you should also override __hash__(). This is because two objects that compare equal should also have the same hash value. If the object is mutable, you should set __hash__ to None, to mark it as unhashable. This means you can’t use it as a key in dictionaries, for example, and that’s good: only immutable objects should be dictionary keys.

In Python 3, __hash__ will be set to None automatically if you define __eq__(), and the object will become unhashable, so for Python 3 you don’t need to override __hash__() unless it is an immutable object and you want to be able to use it as a key value.

The value returned by __hash__() needs to be an integer, and two objects that compare equal should have the same hash value. It must stay the same over the whole lifetime of the object, which is why mutable objects should set __hash__ = None to mark them as unhashable.
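A mutable class that compares by value can be marked unhashable explicitly, which behaves the same way in Python 2.6 and later and in Python 3. This is a minimal sketch with a hypothetical Account class:

```python
class Account(object):
    """Mutable and compared by value, so it should not be hashable."""

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    def __eq__(self, other):
        if not isinstance(other, Account):
            return NotImplemented
        return (self.owner, self.balance) == (other.owner, other.balance)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    __hash__ = None  # explicit for Python 2; Python 3 does this implicitly


try:
    {Account("Donald", 10): "savings"}
    usable_as_key = True
except TypeError:
    usable_as_key = False
```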

If you are using the _cmpkey() method of implementing comparison operators mentioned above, then implementing __hash__() is very easy:

>>> from mixin import ComparableMixin
>>> class Hashable(ComparableMixin):
...     def __init__(self, firstname, lastname):
...         self._first = firstname
...         self._last = lastname
...
...     def _cmpkey(self):
...         return (self._last, self._first)
...
...     def __repr__(self):
...         return "%s(%r, %r)" % (self.__class__.__name__,
...                                self._first, self._last)
...
...     def __hash__(self):
...         return hash(self._cmpkey())
...
>>> d = {Hashable('Donald', 'Duck'): 'Daisy Duck'}
>>> d
{Hashable('Donald', 'Duck'): 'Daisy Duck'}

The attributes of this class are marked as internal by the convention of using a leading underscore, but they are not strictly speaking immutable. If you want a truly immutable class in Python the easiest way is subclassing collections.namedtuple, but that is out of scope for this book.

Make sure you aren’t using any removed modules

Many of the modules in the standard library have been dropped from Python 3. Most of them are specific to old operating systems that aren’t supported any more and others have been supplanted by new modules with a better interface.

Running Python 2.6 with the -3 option will warn you if you use some of the more commonly used modules. It’s quite unlikely that you are using any of the modules that Python 2.6 will not warn about, but if you are and you are planning to support both Python 2 and Python 3, you should replace them with their modern counterparts, if any.

See Removed modules for a list of the removed modules.

Testing coverage and tox

Having a good set of tests is always valuable for any project. When you add Python 3 support, having tests is going to speed up the process a lot, because you will need to run the tests over and over and testing an application by hand takes a lot of time.

It’s always a good idea to increase the test coverage with more tests. The most popular Python tool for getting a report on the test coverage of your modules is Ned Batchelder’s coverage module.[2] Many test runner frameworks like zope.testing, nose and py.test include support for the coverage module, so you may have it installed already.

If you are developing a module that supports many versions of Python, running the tests for all these versions quickly becomes a chore. To solve this Holger Krekel has created a tool called tox[3] that will install a virtualenv for each version you want to support, and will run your tests with all these versions with one simple command. It seems like a small thing, and it is, but it makes the experience just a little bit more pleasant. If you plan to support both Python 2 and Python 3 you should try it out.

Optional: Use the iterator-methods on dictionaries

Since Python 2.2 the built-in Python dictionary type has had the methods iterkeys(), itervalues() and iteritems(). They yield the same data as keys(), values() and items() do, but instead of returning lists they return iterators, which saves memory and time when using large dictionaries.

>>> dict = {'Adam': 'Eve', 'John': 'Yoko', 'Donald': 'Daisy'}
>>> dict.keys()
['Donald', 'John', 'Adam']
>>> dict.iterkeys()
<dictionary-keyiterator object at 0x...>

In Python 3 the standard keys(), values() and items() return dictionary views, which are very similar to the iterators of Python 2. As there is no longer any need for the iterator variations of these methods they have been removed.
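One visible consequence of views is that they follow later changes to the dictionary, while an explicit list is a snapshot. The sketch below assumes Python 3; under Python 2, keys() returns a list and would not see the new entry. The dictionary contents are made up.

```python
couples = {"Adam": "Eve", "John": "Yoko"}

keys = couples.keys()            # a view in Python 3
snapshot = list(couples.keys())  # an independent list

couples["Donald"] = "Daisy"

view_sees_change = "Donald" in keys
snapshot_sees_change = "Donald" in snapshot
```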

2to3 will convert the usage of the iterator methods to the standard methods. By explicitly using the iterator methods you make it clear that you don’t need a list, which is helpful for the 2to3 conversion, which otherwise will replace your dict.values() call with a list(dict.values()) just to be safe.

Python 2.7 also has the new view iterators available on dictionaries as .viewitems(), .viewkeys() and .viewvalues(), but since they don’t exist in earlier Python versions they are only useful once you can drop support for Python 2.6 and earlier.

Also note that if your code is relying on lists being returned, then you are probably misusing the dictionary somehow. For example, in the code below, you can’t actually rely on the order of the keys being the same every time, with the result that you can’t predict exactly how the code will behave. This can lead to some troublesome debugging.

>>> dict = {'Adam': 'Eve', 'John': 'Yoko', 'Donald': 'Daisy'}
>>> dict.keys()[0]
'Donald'

Remember, if all you want to do is loop over the dictionary, use for x in dict and you will use iterators automatically in both Python 2 and Python 3.

>>> dict = {'Adam': 'Eve', 'John': 'Yoko', 'Donald': 'Daisy'}
>>> for x in dict:
...     print '%s + %s == True' % (x, dict[x])
Donald + Daisy == True
John + Yoko == True
Adam + Eve == True
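If the code really needs one particular key, it is more predictable to state the selection rule than to index into the key list. A small sketch with the same made-up data, which gives the same answer in Python 2 and Python 3 regardless of the dictionary's internal order:

```python
couples = {"Adam": "Eve", "John": "Yoko", "Donald": "Daisy"}

first_alphabetical = sorted(couples)[0]  # explicit rule: alphabetical order
longest_name = max(couples, key=len)     # explicit rule: longest key
```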

Footnotes

[1]	http://en.wikipedia.org/wiki/Timsort

[2]	https://pypi.python.org/pypi/coverage

[3]	http://testrun.org/tox/latest/

[4]	See http://www.python.org/download/releases/2.2.3/descrintro/#mro

Translator's note (在湖闻樟): the original text is at http://python3porting.com/preparing.html

Time: 2024-08-08 11:53:07
