Data Analysis with Pandas 2

1. Use pandas.read_csv() to read a .csv file. It returns the contents as a DataFrame automatically.

2. The DataFrame is the core tabular data structure in Pandas. It is not a plain matrix: the column names are labels stored separately from the data, so the first row of a DataFrame is the first row of data, not the header.

3. Pandas uses the column labels to provide more context when returning a row or a column from a DataFrame. For example, when you select a row, instead of returning just the values in that row as a list, Pandas returns a Series object that pairs each value with its column label.

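4. In NumPy, a[99, 0] returns a single element, the value at row index 99 and column index 0, while a[99] returns that row without any labels. In Pandas, a.loc[99] selects the row whose index label is 99. With the default integer index this is the 100th row, and it is displayed as a Series.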

5. The convenient dtypes attribute of a DataFrame returns a Series with the data type of each column.

6. To select only certain columns from a DataFrame: first convert the column names to a plain Python list with .tolist(). Then loop over that list, and append each column name that satisfies the requirement to an initially empty list. Finally, index the DataFrame with the list of selected names (written in these notes as food_list[[list]]) to get a new DataFrame that contains only those columns.
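
The six notes above can be sketched in one short session. This is a minimal illustration, not the original course code: the inline CSV text, the column names, the "(g)" suffix rule, and the variable names (food, gram_columns, gram_df) are all assumptions made so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for a .csv file, so the sketch is self-contained.
csv_text = """name,protein_(g),fat_(g),energy_kcal
butter,0.85,81.11,717
cheddar,24.9,33.14,403
egg,12.56,9.51,143
"""

# Note 1: read_csv() returns a DataFrame directly.
food = pd.read_csv(io.StringIO(csv_text))

# Note 2: the header line became column labels, not a row of data.
print(food.columns.tolist())
print(food.shape)  # three data rows, four columns

# Notes 3 and 4: .loc[label] returns a Series whose index is the column labels.
row = food.loc[1]  # second row under the default integer index
print(type(row).__name__)
print(row["name"])

# NumPy comparison: a[1, 0] is a single element, a[1] is an unlabeled row.
a = food[["protein_(g)", "fat_(g)"]].to_numpy()
print(a[1, 0])
print(a[1])

# Note 5: dtypes is itself a Series, one entry per column.
print(food.dtypes)

# Note 6: pick column names in a loop, then index the DataFrame with the list.
gram_columns = []
for col in food.columns.tolist():
    if col.endswith("(g)"):  # assumed selection rule for this example
        gram_columns.append(col)
gram_df = food[gram_columns]
print(gram_df)
```

Indexing with a list of names, as in food[gram_columns], returns a DataFrame, whereas indexing with a single name, as in food["name"], returns a Series.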

Time: 2024-10-25 23:13:44
