Data Cleaning 1

1. Read multiple data files:

  import pandas as pd

  data_files = [
      "ap_2010.csv",
      "class_size.csv",
      "demographics.csv",
      "graduation.csv",
      "hs_directory.csv",
      "sat_results.csv",
  ]

  data = {}

  for f in data_files:
      file = pd.read_csv("schools/{0}".format(f))  # str.format() inserts the file name into the path
      f = f.replace(".csv", "")  # Strip the .csv extension to get the dictionary key
      data[f] = file
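
The loop above can be tried without the real schools/ folder. The following is only a sketch under stated assumptions: the temporary directory, the two tiny CSV files, and their contents are hypothetical stand-ins, and pathlib's `stem` is swapped in for the `replace(".csv", "")` call as one possible way to drop the extension.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in files so the sketch runs without the real schools/ folder.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "sat_results.csv").write_text("id,value\nschool_a,1\n")
(tmp_dir / "class_size.csv").write_text("id,value\nschool_a,2\n")

data = {}
for path in sorted(tmp_dir.glob("*.csv")):
    # path.stem is the file name without its extension, e.g. "sat_results"
    data[path.stem] = pd.read_csv(path)

# Each key maps to the DataFrame read from the file of the same name.
for name, frame in data.items():
    print(name, frame.shape)
```

Keying the dictionary by the bare file name lets later steps refer to a table as data["sat_results"] instead of repeating the path.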

2. Read tab-delimited .txt files and combine them with concat():

  # delimiter="\t": the fields in these files are separated by tabs rather than commas
  # encoding="windows-1252": the files use this character encoding instead of pandas' default UTF-8
  all_survey = pd.read_csv("schools/survey_all.txt", delimiter="\t", encoding="windows-1252")
  d75_survey = pd.read_csv("schools/survey_d75.txt", delimiter="\t", encoding="windows-1252")
  survey = pd.concat([all_survey, d75_survey], axis=0)  # axis=0 stacks the rows of both surveys
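
To see what delimiter, encoding, and concat do, here is a minimal sketch that uses in-memory bytes instead of the survey files. The column names and rows are made-up toy values, not the real survey data. It also shows that concat keeps each frame's original row labels unless ignore_index=True is passed, which the code above does not do.

```python
import io

import pandas as pd

# Toy tab-separated text encoded as windows-1252 (hypothetical content).
raw_all = "dbn\tschool\nA1\tCafé High\n".encode("windows-1252")
raw_d75 = "dbn\tschool\nB2\tOther School\n".encode("windows-1252")

# delimiter="\t" splits each line on tabs; encoding tells pandas how to decode the bytes.
all_survey = pd.read_csv(io.BytesIO(raw_all), delimiter="\t", encoding="windows-1252")
d75_survey = pd.read_csv(io.BytesIO(raw_d75), delimiter="\t", encoding="windows-1252")

# axis=0 appends rows; the row labels of both inputs are kept as they were.
survey = pd.concat([all_survey, d75_survey], axis=0)
print(survey.index.tolist())

# ignore_index=True renumbers the rows from 0 in the combined frame.
survey_reset = pd.concat([all_survey, d75_survey], axis=0, ignore_index=True)
print(survey_reset.index.tolist())
```

Whether to renumber the rows depends on whether later steps select rows by label, so treat ignore_index as an option rather than a requirement.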

Time: 2024-08-08 09:26:59
