pandas的学习6-合并concat

import  pandas as pd
import  numpy as  np

‘‘‘
pandas处理多组数据的时候往往会要用到数据的合并处理,使用 concat是一种基本的合并方式.
而且concat中有很多参数可以调整,合并成你想要的数据形式.
‘‘‘

# todo axis (合并方向)

# axis=0是预设值，因此未设定任何参数时，函数默认axis=0。

#定义资料集
df1 = pd.DataFrame(np.ones((3,4))*0, columns=[‘a‘,‘b‘,‘c‘,‘d‘])
df2 = pd.DataFrame(np.ones((3,4))*1, columns=[‘a‘,‘b‘,‘c‘,‘d‘])
df3 = pd.DataFrame(np.ones((3,4))*2, columns=[‘a‘,‘b‘,‘c‘,‘d‘])

#concat纵向合并
res = pd.concat([df1, df2, df3], axis=0)  #vertical stack

#打印结果
print(res)
#     a    b    c    d
# 0  0.0  0.0  0.0  0.0
# 1  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0
# 0  1.0  1.0  1.0  1.0
# 1  1.0  1.0  1.0  1.0
# 2  1.0  1.0  1.0  1.0
# 0  2.0  2.0  2.0  2.0
# 1  2.0  2.0  2.0  2.0
# 2  2.0  2.0  2.0  2.0

# todo 仔细观察会发现结果的index是0, 1, 2, 0, 1, 2, 0, 1, 2，若要将index重置，请看例子二。

# ignore_index (重置 index)

#承上一个例子，并将index_ignore设定为True
res = pd.concat([df1, df2, df3], axis=0, ignore_index=True)

#打印结果
print(res)
#     a    b    c    d
# 0  0.0  0.0  0.0  0.0
# 1  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0
# 3  1.0  1.0  1.0  1.0
# 4  1.0  1.0  1.0  1.0
# 5  1.0  1.0  1.0  1.0
# 6  2.0  2.0  2.0  2.0
# 7  2.0  2.0  2.0  2.0
# 8  2.0  2.0  2.0  2.0
# 结果的index变0, 1, 2, 3, 4, 5, 6, 7, 8

‘‘‘
join (合并方式)
join=‘outer‘为预设值，因此未设定任何参数时，函数默认join=‘outer‘。
此方式是依照column来做纵向合并，有相同的column上下合并在一起，其他独自的column个自成列，原本没有值的位置皆以NaN填充。
‘‘‘
#定义资料集
df1 = pd.DataFrame(np.ones((3,4))*0, columns=[‘a‘,‘b‘,‘c‘,‘d‘], index=[1,2,3])
df2 = pd.DataFrame(np.ones((3,4))*1, columns=[‘b‘,‘c‘,‘d‘,‘e‘], index=[2,3,4])

#纵向"外"合并df1与df2
res = pd.concat([df1, df2], axis=0, join=‘outer‘)

print(res)
#     a    b    c    d    e
# 1  0.0  0.0  0.0  0.0  NaN
# 2  0.0  0.0  0.0  0.0  NaN
# 3  0.0  0.0  0.0  0.0  NaN
# 2  NaN  1.0  1.0  1.0  1.0
# 3  NaN  1.0  1.0  1.0  1.0
# 4  NaN  1.0  1.0  1.0  1.0

#todo 原理同上个例子的说明，但只有相同的column合并在一起，其他的会被抛弃。

#承上一个例子

#纵向"内"合并df1与df2
res = pd.concat([df1, df2], axis=0, join=‘inner‘)

#打印结果
print(res)
#     b    c    d
# 1  0.0  0.0  0.0
# 2  0.0  0.0  0.0
# 3  0.0  0.0  0.0
# 2  1.0  1.0  1.0
# 3  1.0  1.0  1.0
# 4  1.0  1.0  1.0

#重置index并打印结果
res = pd.concat([df1, df2], axis=0, join=‘inner‘, ignore_index=True)
print(res)
#     b    c    d
# 0  0.0  0.0  0.0
# 1  0.0  0.0  0.0
# 2  0.0  0.0  0.0
# 3  1.0  1.0  1.0
# 4  1.0  1.0  1.0
# 5  1.0  1.0  1.0

# join_axes (依照 axes 合并) 坐标轴合并

#定义资料集
df1 = pd.DataFrame(np.ones((3,4))*0, columns=[‘a‘,‘b‘,‘c‘,‘d‘], index=[1,2,3])
df2 = pd.DataFrame(np.ones((3,4))*1, columns=[‘b‘,‘c‘,‘d‘,‘e‘], index=[2,3,4])

#依照`df1.index`进行横向合并
res = pd.concat([df1, df2], axis=1, join_axes=[df1.index])#根据谁的index来的

#打印结果
print(res)
#index的原因
#     a    b    c    d    b    c    d    e
# 1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
# 2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
# 3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0

#移除join_axes，并打印结果
res = pd.concat([df1, df2], axis=1)
print(res)
#     a    b    c    d    b    c    d    e
# 1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
# 2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
# 3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
# 4  NaN  NaN  NaN  NaN  1.0  1.0  1.0  1.0

# append (添加数据)  纵向才是添加数据嘛，横向是增加数据的维度，就不是append了
# append只有纵向合并，没有横向合并。

#定义资料集
df1 = pd.DataFrame(np.ones((3,4))*0, columns=[‘a‘,‘b‘,‘c‘,‘d‘])
df2 = pd.DataFrame(np.ones((3,4))*1, columns=[‘a‘,‘b‘,‘c‘,‘d‘])
df3 = pd.DataFrame(np.ones((3,4))*1, columns=[‘a‘,‘b‘,‘c‘,‘d‘])
s1 = pd.Series([1,2,3,4], index=[‘a‘,‘b‘,‘c‘,‘d‘])

#将df2合并到df1的下面，以及重置index，并打印出结果
res = df1.append(df2, ignore_index=True)
print(res)
#     a    b    c    d
# 0  0.0  0.0  0.0  0.0
# 1  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0
# 3  1.0  1.0  1.0  1.0
# 4  1.0  1.0  1.0  1.0
# 5  1.0  1.0  1.0  1.0

#合并多个df，将df2与df3合并至df1的下面，以及重置index，并打印出结果
res = df1.append([df2, df3], ignore_index=True)
print(res)
#     a    b    c    d
# 0  0.0  0.0  0.0  0.0
# 1  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0
# 3  1.0  1.0  1.0  1.0
# 4  1.0  1.0  1.0  1.0
# 5  1.0  1.0  1.0  1.0
# 6  1.0  1.0  1.0  1.0
# 7  1.0  1.0  1.0  1.0
# 8  1.0  1.0  1.0  1.0

#合并series，将s1合并至df1，以及重置index，并打印出结果
res = df1.append(s1, ignore_index=True)
print(res)
#     a    b    c    d
# 0  0.0  0.0  0.0  0.0
# 1  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0
# 3  1.0  2.0  3.0  4.0

concat是一种基本的合并方式，但是concat有很多参数可以调整

axis=0是预设值，也就是默认就为vertical合并

ignore_index=true 这个参数用于忽略以前的index，生成新的有序的index

join合并 join=‘outer’为预设值，按照column做纵向合并，去重功能，不够的用nan填充

inner模式就不存在nan，相当于outer模式合并后去掉有nan的所有列

join_axes是concat的一个参数，join_axes=[df1.index]表示按照df1的index进行合并，axis=1（表示横向增加维度）

比如df1有1,2,3 ，但是df2只有2,3，4此时会舍弃df2的4，并且后半部分1为空

append为添加数据 vertical stack

出处：https://morvanzhou.github.io/tutorials/data-manipulation/np-pd/3-6-pd-concat/

原文地址：https://www.cnblogs.com/simon-idea/p/9571805.html

时间： 2024-07-31 21:25:20

pandas的学习6-合并concat的相关文章

短视频学习 - 6、pandas之DataFrame数据合并

今日内容 # DataFrame的concat.merge.append操作简介 # Pandas 的两个主要数据结构,Series(1维)和DataFrame(2维) # 整理数据.清理数据,分析数据.数据建模,然后将分析结果组织成适合绘图或表格显示的形式常用操作 # 分片合并 concat # 融合结合 merge <{'key': ['func', 'func'], 'v1': [1, 2]},{'key': ['func', 'func'], 'v2': [3,4]}> # 追加

pandas.DataFrame学习系列2——函数方法(1)

DataFrame类具有很多方法,下面做用法的介绍和举例. pandas.DataFrame学习系列2--函数方法(1) 1.abs(),返回DataFrame每个数值的绝对值,前提是所有元素均为数值型 1 import pandas as pd 2 import numpy as np 3 4 df=pd.read_excel('南京银行.xlsx',index_col='Date') 5 df1=df[:5] 6 df1.iat[0,1]=-df1.iat[0,1] 7 df1 8 Open

pandas 基础学习

import numpy as np import pandas as pd s = pd.Series([1, 3, 6, np.nan, 10, 23]) print(s) dates = pd.date_range('20180101', periods=4) print(dates) df = pd.DataFrame(np.random.randn(4, 5), index=dates, columns=['a', 'b', 'c', 'd', 'e']) print(df) df =

python数据表的合并(python pandas join() 、merge()和concat()的用法)

merage# pandas提供了一个类似于关系数据库的连接(join)操作的方法<Strong>merage</Strong>,可以根据一个或多个键将不同DataFrame中的行连接起来,语法如下: merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=True, suffixes=('_x', '_y'), c

Pandas中DataFrame数据合并、连接（concat、merge、join）之concat

一.concat:沿着一条轴,将多个对象堆叠到一起 concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, copy=True): objs:需要连接的对象集合,一般是列表或字典: axis:连接轴向: join:参数为'outer'或'inner': join_axes=[]:指定自定义的索

Pandas中DataFrame数据合并、连接（concat、merge、join）之merge

二.merge:通过键拼接列类似于关系型数据库的连接方式,可以根据一个或多个键将不同的DatFrame连接起来. 该函数的典型应用场景是,针对同一个主键存在两张不同字段的表,根据主键整合到一张表里面. merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=True, suffixes=('_x', '_y'), copy=Tr

Pandas中DataFrame数据合并、连接（concat、merge、join）之join

pandas.DataFrame.join 自己弄了很久,一看官网.感觉自己宛如智障.不要脸了,直接抄 DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False) Join columns with other DataFrame either on index or on a key column. Efficiently Join multiple DataFrame objects by in

Pandas 的轴向连接 concat

在pandas里面,另一种数据何必运算也被称为连接(concatenation).绑定(binding)或堆叠(stacking). Numpy的轴向连接, concatenation Numpy有一个用于合并原始Numpy数组的concatenation函数: In [4]: arr = np.arange(12).reshape((3, 4)) In [5]: arr Out[5]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]])

Pandas 通过追加方式合并多个csv

常用合并通常用pandas进行数据拼接.合并的方法有: pandas.merge() pandas.concat() pandas.append() 还有一种方式就是通过 pd.to_csv() 中的追加写入方式追加写入 import pandas as pd for inputfile in os.listdir(inputfile_dir): pd.read_csv(inputfile, header=None) #header=None表示原始文件数据没有列索引,这样的话read_cs