Python Data Processing Study Notes (Part 2)

These notes draw on Paul Barry's book Head First Python; the reference code can be downloaded from http://python.itcarlow.ie/. Corrections are welcome if you spot any mistakes.

II. Code Modules

1. Getting Ready

(1) Reading the data

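with open('james.txt') as jaf:  # open the file
    data = jaf.readline()  # read one line of data
james = data.strip().split(',')  # convert the line into a list

Note: data.strip().split(',') is a method chain. strip() is applied first to the line held in data and removes whitespace from both ends of the string (not whitespace inside it). Its result is then handed to the second method, split(','), which splits the string at each comma and returns a list.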
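
To make the chain concrete, here is a minimal sketch that runs the two steps separately and then as one chain. The sample line is an assumption made up for illustration; it only mimics the shape of the training data.

```python
# A sample line in the same shape as the training data (assumed for illustration).
line = "2-34,3:21,2.34\n"

# Step by step: strip() trims both ends, split(',') breaks the string at commas.
stripped = line.strip()
parts = stripped.split(',')

# The same work written as a single method chain.
chained = line.strip().split(',')

# strip() only touches the ends; whitespace inside the string is kept.
inner = "  a b  ".strip()

print(parts)
print(chained == parts)
print(repr(inner))
```

Reading the chain from left to right gives the order in which the methods run.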

(2) Cleaning the data

Define a function sanitize() that converts every time in an athlete's list to the same mins.secs format.

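def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:  # elif, so a '-' match is not overridden by the else branch
        splitter = ':'
    else:
        return(time_string)  # already in mins.secs form
    (mins, secs) = time_string.split(splitter)
    return(mins + '.' + secs)

Note: split() is a built-in string method that breaks a string into a list of substrings at the given separator.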
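
The sketch below exercises sanitize() on one value for each of the three separators that appear in the data files. The sample values are chosen for illustration only.

```python
def sanitize(time_string):
    """Return a time such as '2-34' or '2:34' in mins.secs form."""
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string  # nothing to convert
    mins, secs = time_string.split(splitter)
    return mins + '.' + secs


dash_time = sanitize('2-34')
colon_time = sanitize('3:21')
dot_time = sanitize('2.34')

print(dash_time, colon_time, dot_time)
```

Note that the results are still strings, so later sorting compares them as text rather than as numbers.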

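(3) Transforming lists: list comprehensions

The ordinary way of transforming a list and the list-comprehension way are shown side by side:

clean_mikey = []  # create the list
for each_t in mikey:  # iterate
    clean_mikey.append(sanitize(each_t))  # transform and append

is equivalent to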

clean_mikey = [sanitize(each_t) for each_t in mikey]

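Note: sanitize() is the custom data-cleaning function defined above. The built-in function sorted() returns a new, sorted copy of the whole list and leaves the original list unchanged.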
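
As a small illustration, the sketch below builds the same list with a loop and with a comprehension, then sorts a copy. To keep it short, to_mins_secs() is a hypothetical stand-in for sanitize(), and the mikey list holds only a few sample values.

```python
def to_mins_secs(t):
    # Hypothetical stand-in for sanitize(): assumes one separator per time.
    return t.replace('-', '.').replace(':', '.')


mikey = ['2:22', '3.01', '3:01', '2-31']  # sample values, not the full data file

# Loop version: create, iterate, transform and append.
clean_loop = []
for each_t in mikey:
    clean_loop.append(to_mins_secs(each_t))

# Comprehension version: the same three steps in one expression.
clean_comp = [to_mins_secs(each_t) for each_t in mikey]

# sorted() returns a new list; clean_comp keeps its original order.
ordered = sorted(clean_comp)

print(clean_comp)
print(ordered)
```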

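(4) Removing duplicates with not in

Using a list:

unique_james = []
for each_t in james:
    if each_t not in unique_james:
        unique_james.append(each_t)

Using a set (the defining features of a Python set are that its items are unordered and that duplicates are not allowed):

Example: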

distances = set(james)
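
The two approaches can be compared on a short sample list. The values in james below are assumptions for illustration and are already in mins.secs form.

```python
james = ['2.01', '2.34', '2.01', '3.21', '2.34']  # sample values with repeats

# List approach: keeps the order in which each value first appears.
unique_james = []
for each_t in james:
    if each_t not in unique_james:
        unique_james.append(each_t)

# Set approach: duplicates vanish, but no order is guaranteed.
distances = set(james)

print(unique_james)
print(sorted(distances))  # sort when a predictable order is needed
```

The list version preserves first-seen order, while the set version is shorter and usually paired with sorted().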

(5) Slicing: accessing several list items at once

print(sorted(set([sanitize(t) for t in james]))[0:3])
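
A few slice forms on a plain list, as a minimal sketch; the times list is sample data assumed for illustration.

```python
times = ['2.01', '2.16', '2.22', '2.34', '2.45']  # sample, already sorted

top3 = times[0:3]     # items at index 0, 1 and 2 (the end index is excluded)
same = times[:3]      # a missing start index means "from the beginning"
last2 = times[-2:]    # negative indices count from the end
short = times[0:10]   # an end index past the list is not an error

print(top3)
print(last2)
```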

(6) Turning repeated code into a function

def get_coach_data(filename):
    try:
        with open(filename) as af:
            data = af.readline()  # read the single line of data
        return(data.strip().split(','))
    except IOError as ioerr:
        print('File error: ' + str(ioerr))
        return(None)
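
To try the function without the coach's files, the sketch below writes a throwaway file into a temporary directory and also asks for a file that does not exist. The file names and the sample line are assumptions for illustration.

```python
import os
import tempfile


def get_coach_data(filename):
    try:
        with open(filename) as af:
            data = af.readline()
        return data.strip().split(',')
    except IOError as ioerr:  # in Python 3, IOError is an alias of OSError
        print('File error: ' + str(ioerr))
        return None


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('2-34,3:21,2.34\n')

    found = get_coach_data(path)
    missing = get_coach_data(os.path.join(tmpdir, 'no_such_file.txt'))

print(found)
print(missing)
```

Because the function returns None on failure, callers should check the result before treating it as a list.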

2. Custom Data Objects

(1) The new data format. The files James2.txt, Julie2.txt, Mikey2.txt and Sarah2.txt contain the following when opened (full name, date of birth, training times):

James Lee,2002-3-14,2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22,2-01,2.01,2:16

Julie Jones,2002-8-17,2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21,3.01,3.02,2:59

Sarah Sweeney,2002-6-17,2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55,2:22,2-21,2.22

Mikey McManus,2002-2-24,2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38,2:40,2.22,2-31

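(2) Extracting the data (using Sarah as the example):

sarah = get_coach_data('sarah2.txt')
(sarah_name, sarah_dob) = sarah.pop(0), sarah.pop(0)

Each pop(0) call removes and returns the item at the front of the list. The two returned values are assigned to the name and date-of-birth variables, and only the times remain in sarah.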
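
A minimal sketch of the two pop(0) calls, using a shortened copy of Sarah's line in place of the list that get_coach_data() would return:

```python
# Shortened stand-in for the list read from sarah2.txt.
sarah = ['Sarah Sweeney', '2002-6-17', '2:58', '2.58', '2:39']

# The right-hand side is evaluated left to right, so the first pop(0)
# yields the name and the second yields the date of birth.
(sarah_name, sarah_dob) = sarah.pop(0), sarah.pop(0)

print(sarah_name)
print(sarah_dob)
print(sarah)  # only the times are left
```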

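(3) Associating data with a dictionary. A dictionary is a built-in data structure that lets you associate data with keys rather than with numeric indices, so the data in memory can mirror the structure of the real data.

For example, data associated with keys:

Name -> Sarah Sweeney

DOB -> 2002-6-17

Times -> 2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55,2:22,2-21,2.22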

Ways to create a dictionary:

With curly braces: cleese = {}

With the factory function: palin = dict()

Two ways to add data:

cleese['Name'] = 'John Cleese'

palin = {'Name': 'Michael Palin'}
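
The sketch below puts the creation and assignment forms together and adds two lookups. The 'DOB' key and the 'unknown' default are assumptions used only to show what happens when a key is absent.

```python
# Two equivalent ways to create an empty dictionary.
cleese = {}
palin = dict()
same_type = type(cleese) == type(palin)

# Add data one key at a time, or all at once with a literal.
cleese['Name'] = 'John Cleese'
palin = {'Name': 'Michael Palin'}

# Look up by key; use "in" or get() when a key may be missing.
has_dob = 'DOB' in cleese
dob = cleese.get('DOB', 'unknown')

print(cleese['Name'], palin['Name'])
print(has_dob, dob)
```

Indexing a missing key with cleese['DOB'] would raise KeyError, which is why get() with a default is used here.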

(4) Application:

sarah_data = {}
sarah_data['Name'] = sarah.pop(0)
sarah_data['DOB'] = sarah.pop(0)
sarah_data['Times'] = sarah
print(sarah_data['Name'] + "'s fastest times are: " + str(sorted(set([sanitize(t) for t in sarah_data['Times']]))[0:3]))

(5) Creating the dictionary in one step and returning it

def get_coach_data(filename):
    try:
        with open(filename) as f:
            data = f.readline()
        templ = data.strip().split(',')
        return({'Name': templ.pop(0),
                'DOB': templ.pop(0),
                'Times': str(sorted(set([sanitize(t) for t in templ]))[0:3])})
    except IOError as ioerr:
        print('File error: ' + str(ioerr))
        return(None)
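
The dictionary literal is evaluated left to right, so both pop(0) calls have already removed the name and date of birth by the time the 'Times' entry reads templ. The sketch below shows this on Sarah's line. build_record() is a hypothetical helper that takes the text directly instead of a file name, and it keeps the top three times as a list rather than converting them with str(), which is a deliberate deviation for easier checking.

```python
def sanitize(time_string):
    """Normalize '2-34' or '2:34' to '2.34'; leave other strings as they are."""
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    mins, secs = time_string.split(splitter)
    return mins + '.' + secs


def build_record(line):
    """Hypothetical helper: build the athlete dictionary from one line of text."""
    templ = line.strip().split(',')
    return {'Name': templ.pop(0),   # evaluated first
            'DOB': templ.pop(0),    # evaluated second
            'Times': sorted(set(sanitize(t) for t in templ))[0:3]}


line = ('Sarah Sweeney,2002-6-17,2:58,2.58,2:39,2-25,2-55,2:54,'
        '2.18,2:55,2:55,2:22,2-21,2.22\n')
record = build_record(line)

print(record['Name'], record['DOB'], record['Times'])
```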

(6) Bundling the code and its data in a class

class Athlete:
    def __init__(self, a_name, a_dob, a_times=[]):
        self.name = a_name
        self.dob = a_dob
        self.times = a_times

    def top3(self):
        return(sorted(set([sanitize(t) for t in self.times]))[0:3])

def get_coach_data(filename):
    try:
        with open(filename) as f:
            data = f.readline()
        templ = data.strip().split(',')
        return(Athlete(templ.pop(0), templ.pop(0), templ))
    except IOError as ioerr:
        print('File error: ' + str(ioerr))
        return(None)
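
One detail worth knowing about the class: a default of a_times=[] is created once and shared by every instance that relies on it. The sketch below is a variant, not the book's code, that uses None as the default and creates a fresh list per instance. It also assumes the times are already in mins.secs form, so sanitize() is left out, and the names and dates of the first two athletes are made up.

```python
class Athlete:
    def __init__(self, a_name, a_dob, a_times=None):
        self.name = a_name
        self.dob = a_dob
        # Assumption: a fresh list per instance instead of one shared default.
        self.times = [] if a_times is None else a_times

    def top3(self):
        # Simplification: times are assumed to be in mins.secs form already.
        return sorted(set(self.times))[0:3]


# Two athletes created without times do not share a list.
first = Athlete('First Runner', '2002-1-1')    # made-up sample values
second = Athlete('Second Runner', '2002-1-2')  # made-up sample values
first.times.append('2.01')

james = Athlete('James Lee', '2002-3-14', ['2.34', '2.01', '2.16', '2.01', '2.22'])
print(james.name + "'s fastest times are: " + str(james.top3()))
```

With the shared [] default, the append on first.times would also show up in second.times.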

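(7) Using the class and printing the results

james = get_coach_data('james2.txt')
print(james.name + "'s fastest times are: " + str(james.top3()))

Output:

James Lee's fastest times are: ['2.01', '2.16', '2.22']

The next lesson covers class inheritance.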

Time: 2024-10-10 17:33:07
