Scrapy depends on Python 2.7, lxml, OpenSSL, and pip.
So the first step is to check whether these four programs or libraries are already installed.
$ python
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
As the banner shows, Ubuntu 16.04 ships with Python 2.7 preinstalled.
Next, check from the interpreter whether lxml and OpenSSL are available:
>>> import lxml
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named lxml
>>>
>>> import OpenSSL
>>>
lxml is missing but OpenSSL is present, so install lxml first:
$ sudo apt-get install python-lxml
Then start the interpreter again and repeat the import:
$ python
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
>>>
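The interactive import checks above can also be scripted. The following is only a minimal sketch that relies on the standard-library importlib module; the helper name check_modules is hypothetical and not part of Scrapy or of the original steps.

```python
from __future__ import print_function  # keeps print() working on Python 2.7

import importlib


def check_modules(names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    # lxml and OpenSSL are the two libraries checked by hand above.
    for name, ok in sorted(check_modules(["lxml", "OpenSSL"]).items()):
        print("%-8s %s" % (name, "ok" if ok else "MISSING"))
```

Any module reported as MISSING would then be installed with apt-get or pip before continuing.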
This time the import raises no error. To make sure the rest of the installation goes smoothly, run the following commands in order:
$ sudo apt-get install python-dev
$ sudo apt-get install libevent-dev
$ sudo apt-get install python-pip
$ sudo pip install --upgrade pip
Finally, run:
$ pip install Scrapy
This completes the Scrapy installation. Running scrapy with no arguments lists the available commands:
$ scrapy
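The install can also be confirmed from Python rather than from the shell. This is a small sketch, assuming only that the scrapy package exposes its version as scrapy.__version__; the function name scrapy_version is hypothetical.

```python
from __future__ import print_function  # keeps print() working on Python 2.7


def scrapy_version():
    """Return the installed Scrapy version string, or None if it cannot be imported."""
    try:
        import scrapy
    except ImportError:
        return None
    return scrapy.__version__


if __name__ == "__main__":
    version = scrapy_version()
    if version:
        print("Scrapy %s is installed" % version)
    else:
        print("Scrapy is not importable; re-check the pip install step")
```

Run it with the same python interpreter that pip installed into, otherwise the import may fail even though the package is present elsewhere.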
Now a new project can be created:
$ scrapy startproject newproject
New Scrapy project 'newproject', using template directory '/usr/local/lib/python2.7/dist-packages/scrapy/templates/project', created in:
    /home/kuku/newproject

You can start your first spider with:
    cd newproject
    scrapy genspider example example.com
$ sudo apt install tree
Use tree to inspect the layout of the newproject directory:
$ tree newproject/
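If tree is not available, a similar listing can be produced with the standard library alone. The sketch below uses os.walk; the function name list_tree is hypothetical, and the indented output only approximates what tree prints.

```python
from __future__ import print_function  # keeps print() working on Python 2.7

import os


def list_tree(root):
    """Return indented lines describing the directories and files under root."""
    root = os.path.normpath(root)
    lines = [os.path.basename(root) + "/"]
    base_depth = root.count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort fixes the order os.walk descends in
        depth = dirpath.count(os.sep) - base_depth
        if depth > 0:
            lines.append("    " * depth + os.path.basename(dirpath) + "/")
        for filename in sorted(filenames):
            lines.append("    " * (depth + 1) + filename)
    return lines


if __name__ == "__main__":
    # "newproject" is the project directory created in the previous step.
    print("\n".join(list_tree("newproject")))
```

Files at each level are listed before that level's subdirectories, which differs slightly from the ordering tree uses.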
From here, edit the generated files as needed to fit your own requirements.
Time: 2024-10-21 09:28:40