[Python] Writing a Web Crawler from Scratch: Development Environment

  

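  I'm a Python beginner who has only skimmed the syntax, the kind who still fumbles with dictionaries and slices. I write Java for a living (not very well, to be honest), and after work I rarely feel like writing more Java. So my evenings tend to be spent dabbling in a bit of everything without much to show for it. Now I plan to slowly write a Python crawler for fun.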

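  1. Setting up the Python environment. I installed Python on Windows a long time ago, but the third-party libraries I installed there with pip kept throwing errors, so I rarely use that setup. Instead I use the Python environment managed by PyCharm.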

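   To install the packages you need in PyCharm, create a new project, then open File -> Settings from the top-left menu. In the settings dialog that pops up (a screenshot in the original post marks the spot with a red arrow), click the add button and search for the package. I don't recommend installing packages on Windows by hand; wrestling with the Windows environment isn't worth the time.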

  

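  2. On Linux. I rent an Alibaba Cloud server running CentOS 7. I won't cover how to install Python 3 on Linux. The main thing to remember is that CentOS ships with Python 2.7 and some system tools, such as yum, depend on that version, so don't uninstall it casually.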

   I installed Python 3 alongside it. Typing python2 at the console opens the Python 2.7 shell, and typing python3 opens the Python 3 shell, as shown below.

[[email protected] ~]# python2
Python 2.7.5 (default, Jul 13 2018, 13:06:57)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> print 'hello, world'
hello, world
>>>
[1]+  Stopped                 python2

[[email protected] ~]# python3
Python 3.6.2 (default, Jul  8 2018, 11:17:50)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> print('Hello, World')
Hello, World
>>> 

However, when I install a third-party library with pip, only Python 2 can use it. For example, here I install pandas.

[[email protected] ~]# pip install pandas
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting pandas
  Downloading http://mirrors.aliyun.com/pypi/packages/65/b2/8c3a7fc10f581d0ef196e54ba13248e09b25012ab3b213cda83f8f5e7678/pandas-0.23.3-cp27-cp27mu-manylinux1_x86_64.whl (8.9MB)
    100% |████████████████████████████████| 8.9MB 75.9MB/s
Collecting pytz>=2011k (from pandas)
  Downloading http://mirrors.aliyun.com/pypi/packages/30/4e/27c34b62430286c6d59177a0842ed90dc789ce5d1ed740887653b898779a/pytz-2018.5-py2.py3-none-any.whl (510kB)
    100% |████████████████████████████████| 512kB 81.3MB/s
Collecting numpy>=1.9.0 (from pandas)
  Downloading http://mirrors.aliyun.com/pypi/packages/85/51/ba4564ded90e093dbb6adfc3e21f99ae953d9ad56477e1b0d4a93bacf7d3/numpy-1.15.0-cp27-cp27mu-manylinux1_x86_64.whl (13.8MB)
    100% |████████████████████████████████| 13.8MB 75.1MB/s
Collecting python-dateutil>=2.5.0 (from pandas)
  Downloading http://mirrors.aliyun.com/pypi/packages/cf/f5/af2b09c957ace60dcfac112b669c45c8c97e32f94aa8b56da4c6d1682825/python_dateutil-2.7.3-py2.py3-none-any.whl (211kB)
    100% |████████████████████████████████| 215kB 85.7MB/s
Requirement already satisfied: six>=1.5 in /usr/lib/python2.7/site-packages (from python-dateutil>=2.5.0->pandas) (1.11.0)
Installing collected packages: pytz, numpy, python-dateutil, pandas
Successfully installed numpy-1.15.0 pandas-0.23.3 python-dateutil-2.7.3 pytz-2018.5

Then I try to use it from both python2 and python3. It works in Python 2 but not in Python 3.

[[email protected] ~]# python2
Python 2.7.5 (default, Jul 13 2018, 13:06:57)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pandas import DataFrame
/usr/lib64/python2.7/site-packages/pandas/_libs/__init__.py:4: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .tslib import iNaT, NaT, Timestamp, Timedelta, OutOfBoundsDatetime
/usr/lib64/python2.7/site-packages/pandas/__init__.py:26: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import (hashtable as _hashtable,
/usr/lib64/python2.7/site-packages/pandas/core/dtypes/common.py:6: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import algos, lib
/usr/lib64/python2.7/site-packages/pandas/core/util/hashing.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import hashing, tslib
/usr/lib64/python2.7/site-packages/pandas/core/indexes/base.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import (lib, index as libindex, tslib as libts,
/usr/lib64/python2.7/site-packages/pandas/tseries/offsets.py:21: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  import pandas._libs.tslibs.offsets as liboffsets
/usr/lib64/python2.7/site-packages/pandas/core/ops.py:16: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import algos as libalgos, ops as libops
/usr/lib64/python2.7/site-packages/pandas/core/indexes/interval.py:32: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs.interval import (
/usr/lib64/python2.7/site-packages/pandas/core/internals.py:14: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import internals as libinternals
/usr/lib64/python2.7/site-packages/pandas/core/sparse/array.py:33: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  import pandas._libs.sparse as splib
/usr/lib64/python2.7/site-packages/pandas/core/window.py:36: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  import pandas._libs.window as _window
/usr/lib64/python2.7/site-packages/pandas/core/groupby/groupby.py:68: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import (lib, reduction,
/usr/lib64/python2.7/site-packages/pandas/core/reshape/reshape.py:30: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import algos as _algos, reshape as _reshape
/usr/lib64/python2.7/site-packages/pandas/io/parsers.py:45: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  import pandas._libs.parsers as parsers
/usr/lib64/python2.7/site-packages/pandas/io/pytables.py:50: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import algos, lib, writers as libwriters
>>> data={}
>>> data['a'] = [1,2,3,4,5]
>>> data['b'] = [6,7,8,9,0]
>>> data['c'] = [11,12,13,14,15]
>>> df = DataFrame(data)
>>> print df
   a  b   c
0  1  6  11
1  2  7  12
2  3  8  13
3  4  9  14
4  5  0  15
>>> 

[8]+  Stopped                 python2
[[email protected] ~]# python3
Python 3.6.2 (default, Jul  8 2018, 11:17:50)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pandas import DataFrame
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas'
>>> 

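The reason is that on this machine the plain pip command belongs to Python 2, so it installs into Python 2's site-packages (note the /usr/lib/python2.7/site-packages path in the output above). To install a third-party library for Python 3, don't use plain pip; use pip3.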

[[email protected] ~]#
[[email protected] ~]# pip3 install pandas
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting pandas
  Downloading http://mirrors.aliyun.com/pypi/packages/f4/cb/a801eaf624e36fffaa6cf1f4597a1e4b0742c200ed928e689c58fb3cb811/pandas-0.23.3-cp36-cp36m-manylinux1_x86_64.whl (8.9MB)
    100% |████████████████████████████████| 8.9MB 73.6MB/s
Collecting pytz>=2011k (from pandas)
  Downloading http://mirrors.aliyun.com/pypi/packages/30/4e/27c34b62430286c6d59177a0842ed90dc789ce5d1ed740887653b898779a/pytz-2018.5-py2.py3-none-any.whl (510kB)
    100% |████████████████████████████████| 512kB 68.8MB/s
Collecting numpy>=1.9.0 (from pandas)
  Downloading http://mirrors.aliyun.com/pypi/packages/88/29/f4c845648ed23264e986cdc5fbab5f8eace1be5e62144ef69ccc7189461d/numpy-1.15.0-cp36-cp36m-manylinux1_x86_64.whl (13.9MB)
    100% |████████████████████████████████| 13.9MB 75.1MB/s
Collecting python-dateutil>=2.5.0 (from pandas)
  Downloading http://mirrors.aliyun.com/pypi/packages/cf/f5/af2b09c957ace60dcfac112b669c45c8c97e32f94aa8b56da4c6d1682825/python_dateutil-2.7.3-py2.py3-none-any.whl (211kB)
    100% |████████████████████████████████| 215kB 81.7MB/s
Requirement already satisfied: six>=1.5 in /usr/local/python3/lib/python3.6/site-packages (from python-dateutil>=2.5.0->pandas) (1.11.0)
Installing collected packages: pytz, numpy, python-dateutil, pandas
Successfully installed numpy-1.15.0 pandas-0.23.3 python-dateutil-2.7.3 pytz-2018.5
You are using pip version 10.0.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[[email protected] ~]# python3
Python 3.6.2 (default, Jul  8 2018, 11:17:50)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pandas import DataFrame
/usr/local/python3/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
>>> data={}
>>> data['b'] = [6,7,8,9,0]
>>> data['b'] = [6,7,8,9,0]
>>> data['c'] = [11,12,13,14,15]
>>> df = DataFrame(data)
>>> print(df)
   b   c
0  6  11
1  7  12
2  8  13
3  9  14
4  0  15
>>> 

Now it works.
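A way to sidestep the pip/pip3 confusion altogether is to run pip through the interpreter you actually want, as in `python3 -m pip install pandas`. The sketch below only illustrates that idea and is not part of the original session; the helper names `pip_install_command` and `is_installed` are invented for this example.

```python
import importlib.util
import sys


def pip_install_command(package, python=sys.executable):
    """Build a pip command that is tied to one specific interpreter.

    Running pip as "<python> -m pip" installs into that interpreter's
    site-packages, whatever the bare "pip" on PATH happens to point to.
    """
    return [python, "-m", "pip", "install", package]


def is_installed(module_name):
    """Return True if the running interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("pandas importable:", is_installed("pandas"))
    # Print the command instead of running it, so the sketch has no side effects.
    print("Install with:", " ".join(pip_install_command("pandas")))
```

Run the script with python2 and then with python3 to see which interpreter can import pandas and which command would install it for that interpreter.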

3. I installed a couple of packages to start with

  bs4, to parse HTML with BeautifulSoup

  PyMySQL, to store the data in a database
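To give a feel for what bs4 is for, here is a small sketch of parsing HTML with BeautifulSoup. It assumes bs4 is already installed for the interpreter you run it with; the sample HTML and the function name `extract_links` are made up for illustration. The PyMySQL side is left out because it needs a running database.

```python
from bs4 import BeautifulSoup

# A tiny, made-up page so the example does not need network access.
SAMPLE_HTML = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <a href="/post/1">First post</a>
    <a href="/post/2">Second post</a>
  </body>
</html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    # "html.parser" is the parser bundled with Python, so nothing extra is needed.
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"])
            for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(text, href)
```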

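4. The plan for now

  1. Use urllib to fetch the HTML

  2. Use BeautifulSoup to parse the HTML and pull out the information I want

  3. Use PyMySQL to store the data

  4. Once single pages work, consider a thread pool, then leave it running on the server for a day or two?

  5. After that, do a bit of data analysis... but that's for later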
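Steps 1 and 4 of the plan could look roughly like the sketch below, which uses only the standard library. It is an assumption about how the pieces might fit together, not the final crawler: the names `fetch_html` and `crawl`, the User-Agent string, the timeout, and the worker count are all placeholders chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download one page with urllib and return its body as text."""
    # Some sites reject urllib's default User-Agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(urls, fetch=fetch_html, max_workers=4):
    """Fetch several pages concurrently and return {url: html or None}."""
    urls = list(urls)

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # one bad page should not stop the whole run
            print("failed:", url, exc)
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Calling `crawl(["https://example.com/"])` (a placeholder URL) returns a dict that maps each URL to its HTML, or to None if the download failed. The `fetch` parameter lets a fake downloader stand in while testing a single page, before the real thing is left running on the server.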

Original post: https://www.cnblogs.com/yeyeck/p/9392418.html

Time: 2024-11-06 03:43:14
