pandas-09 pd.groupby()的用法

在pandas中的groupby和在sql语句中的groupby有异曲同工之妙，不过也难怪，毕竟关系数据库中的存放数据的结构也是一张大表罢了，与dataframe的形式相似。

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

df = pd.read_csv('./city_weather.csv')
print(df)
'''
          date city  temperature  wind
0   03/01/2016   BJ            8     5
1   17/01/2016   BJ           12     2
2   31/01/2016   BJ           19     2
3   14/02/2016   BJ           -3     3
4   28/02/2016   BJ           19     2
5   13/03/2016   BJ            5     3
6   27/03/2016   SH           -4     4
7   10/04/2016   SH           19     3
8   24/04/2016   SH           20     3
9   08/05/2016   SH           17     3
10  22/05/2016   SH            4     2
11  05/06/2016   SH          -10     4
12  19/06/2016   SH            0     5
13  03/07/2016   SH           -9     5
14  17/07/2016   GZ           10     2
15  31/07/2016   GZ           -1     5
16  14/08/2016   GZ            1     5
17  28/08/2016   GZ           25     4
18  11/09/2016   SZ           20     1
19  25/09/2016   SZ          -10     4
'''

g = df.groupby(df['city'])
# <pandas.core.groupby.groupby.DataFrameGroupBy object at 0x7f10450e12e8>

print(g.groups)

# {'BJ': Int64Index([0, 1, 2, 3, 4, 5], dtype='int64'),
# 'GZ': Int64Index([14, 15, 16, 17], dtype='int64'),
# 'SZ': Int64Index([18, 19], dtype='int64'),
# 'SH': Int64Index([6, 7, 8, 9, 10, 11, 12, 13], dtype='int64')}

print(g.size()) # g.size() 可以统计每个组 成员的 数量
'''
city
BJ    6
GZ    4
SH    8
SZ    2
dtype: int64
'''

print(g.get_group('BJ')) # 得到 某个 分组
'''
         date city  temperature  wind
0  03/01/2016   BJ            8     5
1  17/01/2016   BJ           12     2
2  31/01/2016   BJ           19     2
3  14/02/2016   BJ           -3     3
4  28/02/2016   BJ           19     2
5  13/03/2016   BJ            5     3
'''

df_bj = g.get_group('BJ')
print(df_bj.mean()) # 对这个 分组 求平均
'''
temperature    10.000000
wind            2.833333
dtype: float64
'''

# 直接使用 g 对象，求平均值
print(g.mean()) # 对 每一个 分组， 都计算分组
'''
      temperature      wind
city
BJ         10.000  2.833333
GZ          8.750  4.000000
SH          4.625  3.625000
SZ          5.000  2.500000
'''

print(g.max())
'''
            date  temperature  wind
city
BJ    31/01/2016           19     5
GZ    31/07/2016           25     5
SH    27/03/2016           20     5
SZ    25/09/2016           20     4
'''

print(g.min())
'''
            date  temperature  wind
city
BJ    03/01/2016           -3     2
GZ    14/08/2016           -1     2
SH    03/07/2016          -10     2
SZ    11/09/2016          -10     1
'''

# g 对象还可以使用 for 进行循环遍历
for name, group in g:
    print(name)
    print(group)

# g 可以转化为 list类型， dict类型
print(list(g)) # 元组第一个元素是 分组的label，第二个是dataframe
'''
[('BJ',          date city  temperature  wind
0  03/01/2016   BJ            8     5
1  17/01/2016   BJ           12     2
2  31/01/2016   BJ           19     2
3  14/02/2016   BJ           -3     3
4  28/02/2016   BJ           19     2
5  13/03/2016   BJ            5     3),
('GZ',           date city  temperature  wind
14  17/07/2016   GZ           10     2
15  31/07/2016   GZ           -1     5
16  14/08/2016   GZ            1     5
17  28/08/2016   GZ           25     4),
('SH',           date city  temperature  wind
6   27/03/2016   SH           -4     4
7   10/04/2016   SH           19     3
8   24/04/2016   SH           20     3
9   08/05/2016   SH           17     3
10  22/05/2016   SH            4     2
11  05/06/2016   SH          -10     4
12  19/06/2016   SH            0     5
13  03/07/2016   SH           -9     5),
('SZ',           date city  temperature  wind
18  11/09/2016   SZ           20     1
19  25/09/2016   SZ          -10     4)]
'''
print(dict(list(g))) # 返回键值对，值的类型是 dataframe
'''
{'SH':           date city  temperature  wind
6   27/03/2016   SH           -4     4
7   10/04/2016   SH           19     3
8   24/04/2016   SH           20     3
9   08/05/2016   SH           17     3
10  22/05/2016   SH            4     2
11  05/06/2016   SH          -10     4
12  19/06/2016   SH            0     5
13  03/07/2016   SH           -9     5,
'SZ':           date city  temperature  wind
18  11/09/2016   SZ           20     1
19  25/09/2016   SZ          -10     4,
'GZ':           date city  temperature  wind
14  17/07/2016   GZ           10     2
15  31/07/2016   GZ           -1     5
16  14/08/2016   GZ            1     5
17  28/08/2016   GZ           25     4,
'BJ':          date city  temperature  wind
0  03/01/2016   BJ            8     5
1  17/01/2016   BJ           12     2
2  31/01/2016   BJ           19     2
3  14/02/2016   BJ           -3     3
4  28/02/2016   BJ           19     2
5  13/03/2016   BJ            5     3}
'''

原文地址：https://www.cnblogs.com/wenqiangit/p/11252765.html

时间： 2024-11-13 18:55:36

pandas-09 pd.groupby()的用法的相关文章

pandas-16 pd.merge()的用法

pandas-16 pd.merge()的用法使用过sql语言的话,一定对join,left join, right join等非常熟悉,在pandas中,merge的作用也非常类似. 如:pd.merge(df1, df2) 找到一个外键,然后将两条数据合并成一条. 直接上例子: import numpy as np import pandas as pd from pandas import Series, DataFrame df1 = DataFrame({'key':['X', 'Y

pandas.DataFrame的groupby()方法的基本使用

pandas.DataFrame的groupby()方法是一个特别常用和有用的方法.让我们快速掌握groupby()方法的基础使用,从此数据分析又多一法宝. 首先导入package: import pandas as pd import numpy as np groupby的最基本操作 df = pd.DataFrame({'A':[1,2,3,1],'B':[2,3,3,6],'C':[3,1,5,7]}) df 按照A列来进行分组(其实说白了就是将A列中重复的值和成同一个值,然后把A当成索

pandas中pd.read_excel()方法中的converters参数

最近用pandas的pd.read_excel()方法读取excel文件时,遇到某一列的数据前面包含0(如010101)的时候,pd.read_excel()方法返回的DataFrame会将这一列视为int类型,即010101变成10101. 这种情况下,如果想要保持数据的完整性,可以以str类型来读取这一列,具体的实现如下: 1 df = pd.read_excel ("test.xlsx" , converters={'类别编码':str}) 如上代码,即可将"test.

pandas中groupby的用法

原文地址:https://www.cnblogs.com/GouQ/p/12609809.html

GroupBy,Apply用法笔记

GroupBy针对DataFrame将其按照某个准则分组 1.常见的调用形式为: df['a'].GroupyBy(df['b']) df.GroupyBy(df['b','c'])#层次化的索引 df.GroupyBy(['b','c'])#直接将columns名称作为索引键进行索引以上可理解为将Series作为分组键,y此外还可以将任何适当长度的array作为分组键,目前未尝试过 2.常用的方法: df.GroupyBy(df['b']).mean()#非数值列数据直接跳过 df.Grou

总结2018.09.25 函数基本用法

# 储备知识:# 函数的使用应该分为两个明确的阶段# 1. 定义阶段:只检测语法,不执行函数体代码def func(): print('from func')# 2. 调用阶段:会触发函数体代码的执行# func() #先定义后调用# 示范一# def foo():# print('from foo')# bar()# foo() # # 示范二:# def bar():# print('from bar')## def foo():# print('from foo')# bar()## fo

【Python数据挖掘课程】六.Numpy、Pandas和Matplotlib包基础知识

前面几篇文章采用的案例的方法进行介绍的,这篇文章主要介绍Python常用的扩展包,同时结合数据挖掘相关知识介绍该包具体的用法,主要介绍Numpy.Pandas和Matplotlib三个包.目录: 一.Python常用扩展包二.Numpy科学计算包三.Pandas数据分析包四.Matplotlib绘图包前文推荐: [Python数据挖掘课程]一.安装Python及爬虫入门介绍 [Python数据挖掘课程]二.K

python之pandas用法大全

python之pandas用法大全更新时间:2018年03月13日 15:02:28 投稿:wdc 我要评论本文讲解了python的pandas基本用法,大家可以参考下一.生成数据表1.首先导入pandas库,一般都会用到numpy库,所以我们先导入备用:?12import numpy as npimport pandas as pd2.导入CSV或者xlsx文件:?12df = pd.DataFrame(pd.read_csv('name.csv',header=1))df = pd.D

Pandas 用法总结

Pandas 用法总结一.生成数据表 1.首先导入pandas库,一般都会用到numpy库,先导备用: ##### import numpy as np ? ##### import pandas as pd?#### 2.导入CSV或者xlsx文件: data = pd.DataFrame(pd.read_csv('name.csv',header=1)) data = pd.DataFrame(pd.read_exce('name.xlsx')) 3.用pandas创建数据表 df = p