Working With Data Sources 10

Preparing Data for SQL:

Sometimes we would like to store data in a SQL database. However, the dataset needs to be cleaned before it is sent, so here we use pandas to clean a dataset stored as a .csv file.

1. Read the file with read_csv and set the encoding:

  import pandas as pd
  file = pd.read_csv("academy_awards.csv", encoding="ISO-8859-1")
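To see why the encoding argument matters, here is a minimal sketch. The file name sample_awards.csv and its single row are made up for illustration; the only assumption carried over from the step above is that the data is stored as ISO-8859-1 bytes.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample row containing a non-ASCII character (é).
sample = "Year,Nominee\n2010 (83rd),Penélope Cruz\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_awards.csv")
    # Store the text as ISO-8859-1 bytes, so "é" becomes the single byte 0xE9.
    with open(path, "w", encoding="ISO-8859-1") as handle:
        handle.write(sample)

    # Reading with the matching encoding decodes "é" correctly.
    sample_df = pd.read_csv(path, encoding="ISO-8859-1")

    # Reading with the default UTF-8 decoder fails on the 0xE9 byte.
    try:
        pd.read_csv(path)
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

print(sample_df)
print("default UTF-8 read failed:", utf8_failed)
```

If read_csv raises a UnicodeDecodeError on a real file, passing the encoding the file was actually saved with is usually the first thing to try.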

2. Use the .str accessor to keep the first 4 characters of every string in the column:

  file["Year"] = file["Year"].str[0:4]
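A small sketch of what the slice does, using made-up values in the style of the Year column (the exact strings in the real file are an assumption). Converting the result to integers is an extra step I add here so the years can be compared numerically.

```python
import pandas as pd

# Hypothetical Year values; the real column is assumed to look similar.
years = pd.Series(["2010 (83rd)", "2009 (82nd)", "1927/28 (1st)"])

# .str[0:4] slices every string in the Series at once.
first_four = years.str[0:4]

# The slices are still strings; cast them to integers for numeric comparisons.
as_int = first_four.astype("int64")

print(first_four.tolist())
print(as_int.tolist())
```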

3. Use the .isin() method to select the rows I need (later_than_2000 is a dataframe defined in a step not shown here; judging by its name, it holds the rows after the year 2000):

  award_categories = ["Actor -- Leading Role", "Actor -- Supporting Role", "Actress -- Leading Role", "Actress -- Supporting Role"]

  nominations = later_than_2000[later_than_2000["Category"].isin(award_categories)]
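Here is a minimal, self-contained sketch of the same filtering pattern. The awards dataframe and its three rows are invented for illustration only.

```python
import pandas as pd

# Hypothetical rows; only the Category column matters for the filter.
awards = pd.DataFrame({
    "Year": [2010, 2010, 2009],
    "Category": [
        "Actor -- Leading Role",
        "Art Direction",
        "Actress -- Supporting Role",
    ],
})

wanted = ["Actor -- Leading Role", "Actress -- Supporting Role"]

# .isin() returns one boolean per row: True where Category is in the list.
mask = awards["Category"].isin(wanted)

# Indexing with the boolean mask keeps only the matching rows.
acting = awards[mask]

print(mask.tolist())
print(acting)
```

Calling .isin() on the single column keeps the mask easy to read and avoids comparing every other column against the category list.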

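4. Use the .map() method to replace every element in the column with the value I need:

  won_dic = {
      "NO": 0,
      "YES": 1
  }
  # Attention: nominations was sliced from another dataframe, so assigning to it
  # directly triggers pandas' SettingWithCopyWarning. Taking an explicit copy first
  # makes it an independent dataframe that is safe to modify. (The older
  # `nominations.is_copy = False` idiom only silenced the warning and was
  # deprecated in later pandas versions.)
  nominations = nominations.copy()
  nominations["Won?"] = nominations["Won?"].map(won_dic)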
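The sketch below walks through the mapping on invented data. The "MAYBE" value is hypothetical and is included only to show what .map() does with a value that is missing from the dictionary.

```python
import pandas as pd

mapping = {"NO": 0, "YES": 1}

# Hypothetical column; "MAYBE" is deliberately absent from the mapping.
frame = pd.DataFrame({"Won?": ["YES", "NO", "NO", "MAYBE"]})

# Values without a dictionary entry become NaN, which turns the result into floats.
mapped = frame["Won?"].map(mapping)

# Filter first, then take an explicit copy so the assignment below
# modifies an independent dataframe and leaves `frame` untouched.
subset = frame[frame["Won?"] != "MAYBE"].copy()
subset["Won?"] = subset["Won?"].map(mapping)

print(mapped.tolist())
print(subset["Won?"].tolist())
```

Checking the mapped column for NaN afterwards is a cheap way to catch values the dictionary did not anticipate.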

5. Use the .drop() method to remove the columns I do not need (delete_list is the list of those column names):

  final_nominations = nominations.drop(delete_list, axis=1)
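A short sketch of dropping columns, with invented column names (the contents of the real list of columns to delete are not given above, so "Notes" here is purely hypothetical).

```python
import pandas as pd

# Hypothetical dataframe with one column we do not want to keep.
frame = pd.DataFrame({"Nominee": ["A", "B"], "Won?": [1, 0], "Notes": ["x", "y"]})

to_delete = ["Notes"]

# axis=1 tells drop() that the labels are column names, not row labels.
by_axis = frame.drop(to_delete, axis=1)

# The columns= keyword is an equivalent spelling that reads more explicitly.
by_keyword = frame.drop(columns=to_delete)

print(list(by_axis.columns))
print(list(frame.columns))  # the original dataframe is unchanged
```

drop() returns a new dataframe by default, which is why the result is assigned to a new name.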

6. Use vectorized string methods to modify each string in a column of the dataframe:

  # rstrip removes trailing characters: it keeps stripping from the right end
  # of each string for as long as the last character is ' or }.
  additional_info_one = final_nominations["Additional Info"].str.rstrip("'}")
  # A multi-character pattern is treated as a regular expression by default,
  # so "{." matches "{" followed by any one character.
  additional_info_two = additional_info_one.str.split("{.")
  movie_names = additional_info_two.str[0]
  characters = additional_info_two.str[1]
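The following sketch runs the same two string methods on made-up values. The "Movie {'Character'}" layout is my assumption about what the Additional Info column looks like, and I split on the literal separator " {'" with regex=False (available in pandas 1.4 and later) so that no regular-expression rules are involved.

```python
import pandas as pd

# Hypothetical values in an assumed "Movie {'Character'}" layout.
info = pd.Series(["Biutiful {'Uxbal'}", "True Grit {'Rooster Cogburn'}"])

# rstrip treats its argument as a set of characters and strips all of them
# from the right end, in any order and any number of times.
trimmed = info.str.rstrip("'}")
repeated = pd.Series(["x'}'}"]).str.rstrip("'}")

# Split on the literal separator; each element becomes a two-item list.
parts = trimmed.str.split(" {'", regex=False)

# .str[i] picks the i-th item out of every list.
movies = parts.str[0]
roles = parts.str[1]

print(trimmed.tolist())
print(movies.tolist())
print(roles.tolist())
```

Including the leading space in the separator keeps trailing whitespace out of the movie names.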

7. Use to_sql() to save the dataset to the SQL database (conn is an open database connection created beforehand):

  final_nominations.to_sql("nominations", conn, index=False)
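As a minimal end-to-end sketch, the code below writes a tiny invented dataframe to an in-memory SQLite database and reads it back. Using sqlite3 is an assumption on my part, since the step above does not show how the connection is created.

```python
import sqlite3

import pandas as pd

# Hypothetical cleaned data standing in for the final dataframe.
frame = pd.DataFrame({
    "Year": [2010, 2009],
    "Nominee": ["A", "B"],
    "Won": [1, 0],
})

# An in-memory database keeps the example self-contained (assumed backend).
conn = sqlite3.connect(":memory:")

# index=False stops pandas from writing the row index as an extra column.
frame.to_sql("nominations", conn, index=False)

# Read the rows back to confirm that the table was created and filled.
rows = conn.execute(
    "SELECT Year, Nominee, Won FROM nominations ORDER BY Year"
).fetchall()
conn.close()

print(rows)
```

By default to_sql() raises an error if the table already exists; the if_exists parameter ("fail", "replace", or "append") controls that behaviour.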

Time: 2024-10-10 05:34:05
