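Open-source Python toolkit: scikit-learn is a machine learning package. Homepage: http://scikit-learn.org/stable/index.html
The package implements the classic machine learning algorithms in Python, which makes it good material for combining machine learning theory with practice. However, I ran into all sorts of odd problems while installing scikit-learn, so here is a summary.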
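To make later installs of Python packages easier, first install easy_install.
Download and install the Python installation tool
Download page: http://pypi.python.org/pypi/setuptools , where you can pick the right version. On 32-bit Windows 7 you can download setuptools-0.6c11.win32-py2.7.exe .
I installed it under D:\pytho27\Scripts . Add this directory to PATH so that easy_install can be called directly from cmd.
Then check that the installation succeeded.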
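One way to run this check from Python is to search the directories on PATH for the easy_install executable. The following is only a minimal sketch: the helper name `find_on_path` and the list of file extensions are assumptions made for illustration and are not part of setuptools.

```python
import os


def find_on_path(name, search_path=None, extensions=("", ".exe", ".bat", ".cmd")):
    """Return the full path of the first matching file on PATH, or None.

    `find_on_path` is a hypothetical helper; `extensions` is an assumed list
    of suffixes that a Windows command may carry.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("easy_install")
    print(location or "easy_install was not found on PATH")
```

If the script prints a path inside the Scripts directory, cmd should be able to find easy_install as well.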
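With easy_install in place, installing Python libraries is simple. Whenever you need one, run this on the command line:
easy_install <library name>, for example: easy_install numpy
scikit-learn requires the following packages and tools: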
- Python (>= 2.6 or >= 3.3),
- NumPy (>= 1.6.1),
- SciPy (>= 0.9).
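These minimum versions can be checked before installing scikit-learn. The sketch below compares version strings numerically rather than as text, so that "1.10" ranks above "1.6.1". The helpers `version_tuple` and `meets_minimum` are hypothetical names introduced here, and the parsing is deliberately simple: it keeps only the leading digits of each dotted part.

```python
import re


def version_tuple(version):
    """Turn '1.6.1' or '1.8.0rc1' into a tuple of ints, e.g. (1, 6, 1)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first part that has no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the required minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    # Minimums taken from the requirement list above.
    minimums = {"numpy": "1.6.1", "scipy": "0.9"}
    for name, minimum in minimums.items():
        try:
            module = __import__(name)
        except Exception as exc:  # a broken install may raise more than ImportError
            print("%s: import failed (%s)" % (name, exc))
            continue
        ok = meets_minimum(module.__version__, minimum)
        print("%s %s (needs >= %s): %s" % (name, module.__version__, minimum, "ok" if ok else "too old"))
```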
After installing, however, I ran into the following errors:
- I cannot import datetime from a python script
- ValueError: numpy.ufunc has the wrong size, try recompiling
- ImportError: cannot import name check_build
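To see which package in the stack is failing, it can help to import each one separately and record either its version or the error it raises. This is a rough diagnostic sketch, not an official tool: `import_report` is a hypothetical helper, and it catches every exception on purpose because a mismatched binary may raise ValueError rather than ImportError.

```python
import importlib


def import_report(module_names):
    """Try to import each module; return {name: version string or error text}."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # ImportError, ValueError, ...
            report[name] = "FAILED: %s: %s" % (type(exc).__name__, exc)
        else:
            report[name] = getattr(module, "__version__", "unknown version")
    return report


if __name__ == "__main__":
    for name, status in import_report(["numpy", "scipy", "sklearn"]).items():
        print("%-8s %s" % (name, status))
```

Importing the packages in dependency order (NumPy, then SciPy, then scikit-learn) shows where the first failure occurs.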
Later I found this answer: http://stackoverflow.com/questions/17709641/valueerror-numpy-dtype-has-the-wrong-size-try-recompiling
Numpy developers follow in general a policy of keeping a backward compatible binary interface (ABI). However, the ABI is not forward compatible.
What that means:
A package, that uses numpy in a compiled extension, is compiled against a specific version of numpy. Future version of numpy will be compatible with the compiled extension of the package (for exception see below). Distributers of those other packages do not need to recompile their package against a newer versions of numpy and users do not need to update these other packages, when users update to a newer version of numpy.
However, this does not go in the other direction. If a package is compiled against a specific numpy version, say 1.7, then there is no guarantee that the binaries of that package will work with older numpy versions, say 1.6, and very often or most of the time they will not.
The binary distribution of packages like pandas and statsmodels, that are compiled against a recent version of numpy, will not work when an older version of numpy is installed. Some packages, for example matplotlib, if I remember correctly, compile their extensions against the oldest numpy version that they support. In this case, users with the same old or any more recent version of numpy can use those binaries.
The error message in the question is a typical result of binary incompatibilities.
The solution is to get a binary compatible version, either by updating numpy to at least the version against which pandas or statsmodels were compiled, or to recompile pandas and statsmodels against the older version of numpy that is already installed.
Breaking the ABI backward compatibility:
Sometimes improvements or refactorings in numpy break ABI backward compatibility. This happened (unintentionally) with numpy 1.4.0. As a consequence, users that updated numpy to 1.4.0, had binary incompatibilities with all other compiled packages, that were compiled against a previous version of numpy. This requires that all packages with binary extensions that use numpy have to be recompiled to work with the ABI incompatible version.
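In short, my NumPy version did not match my scikit-learn version. I uninstalled NumPy and tried versions from 1.6 up to 1.8, and with 1.8 installed the conflict disappeared. This kind of installation is a real pain, so I recommend using a bundled distribution such as WinPython, which is simple to set up and matches the package versions for you.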
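The rule from the quoted answer can be written as a one-line check: a compiled extension is expected to load only when the installed NumPy is at least as new as the one it was built against. This is a simplified sketch with a hypothetical function name, and it ignores the exceptional ABI breaks (such as NumPy 1.4.0) that the answer mentions.

```python
def is_binary_compatible(built_against, installed):
    """True if an extension built against one NumPy version is expected to
    load with the installed one, under the simplified rule that the ABI is
    backward compatible but not forward compatible.

    Versions are tuples of ints such as (1, 7) or (1, 8, 0).
    """
    return tuple(installed) >= tuple(built_against)


if __name__ == "__main__":
    # An extension built against 1.7 with an older and a newer NumPy installed.
    print(is_binary_compatible((1, 7), (1, 6)))
    print(is_binary_compatible((1, 7), (1, 8)))
```

Under this rule, installing a newer NumPy fixes the mismatch while downgrading does not, which is consistent with the conflict disappearing at 1.8.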
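A simpler installation on Windows
Microsoft really is the hope of mankind. On Windows, installing scikit only takes one "all-in-one bundle" (Cocoa's name for it) to install every dependency. The steps are:
- Install Python 2.7.6: download link. If you have no particular requirements, Python 2 is fine. Just pay attention to the difference between the 64-bit and 32-bit builds.
- Install the all-in-one bundle: download link. It contains every library scikit needs, with separate builds for Python 2, Python 3, 64-bit, and 32-bit, which is really convenient.
- Install scikit: download link.
- Done.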