Working With Data Sources 10

Preparing Data for SQL:

Sometimes we would like to store data in a SQL server. However, the dataset needs to be cleaned before it is sent, so here we use pandas to work with the dataset (a .csv file).

1. Read the file with read_csv and set the encoding:

    import pandas as pd

    file = pd.read_csv("academy_awards.csv", encoding="ISO-8859-1")

2. Use the .str accessor to keep the first 4 characters of every string in the column:

    file["Year"] = file["Year"].str[0:4]
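
The next step uses a dataframe called later_than_2000 that the post does not show being built. A minimal sketch of one plausible way to get there, assuming the Year strings start with a four-digit year (the sample values below are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for academy_awards.csv; values are illustrative only.
file = pd.DataFrame({
    "Year": ["2010 (83rd)", "1999 (72nd)", "2001 (74th)"],
    "Category": ["Actor -- Leading Role", "Art Direction", "Actress -- Leading Role"],
})

# Keep only the first four characters of each string.
file["Year"] = file["Year"].str[0:4]

# The slice is still a string, so convert it before comparing numerically.
file["Year"] = file["Year"].astype("int64")

# Assumption: later_than_2000 holds the rows whose year is after 2000.
later_than_2000 = file[file["Year"] > 2000]
print(later_than_2000)
```

Without the astype step, the comparison with 2000 would be between strings and an integer and would raise a TypeError.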

3. Use .isin() to keep only the rows whose Category is one of the award categories I need (later_than_2000 is assumed to be the dataframe already restricted to years after 2000):

    award_categories = ["Actor -- Leading Role", "Actor -- Supporting Role", "Actress -- Leading Role", "Actress -- Supporting Role"]

    nominations = later_than_2000[later_than_2000["Category"].isin(award_categories)]
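
As a small illustration of how the .isin() filter behaves, here is a sketch on a made-up dataframe (the rows are hypothetical, only the category names come from the step above):

```python
import pandas as pd

# Hypothetical rows standing in for later_than_2000.
later_than_2000 = pd.DataFrame({
    "Year": [2010, 2010, 2009],
    "Category": ["Actor -- Leading Role", "Art Direction", "Actress -- Supporting Role"],
})

award_categories = [
    "Actor -- Leading Role",
    "Actor -- Supporting Role",
    "Actress -- Leading Role",
    "Actress -- Supporting Role",
]

# isin() returns a boolean Series: True where Category is in the list.
mask = later_than_2000["Category"].isin(award_categories)

# Indexing with the mask keeps only the True rows.
nominations = later_than_2000[mask]
print(nominations)
```

Calling .isin() on the single column, rather than on the whole dataframe, avoids checking every other column against the list.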

4. Use .map() to replace every element in the column with the value I need:

    won_dic = {
        "NO": 0,
        "YES": 1
    }
    # Attention: nominations is a slice of another dataframe, and assigning to a slice
    # triggers SettingWithCopyWarning. Take an explicit copy first. (Older pandas
    # versions used nominations.is_copy = False for this; that attribute has since been removed.)
    nominations = nominations.copy()
    nominations["Won?"] = nominations["Won?"].map(won_dic)
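
A minimal sketch of the copy-then-map pattern on made-up data. It also shows one thing worth knowing about .map() with a dictionary: values that are not keys of the dictionary become NaN.

```python
import pandas as pd

# Hypothetical data; only the "Won?" column name and the mapping come from the post.
awards = pd.DataFrame({
    "Year": [2010, 1999, 2005, 2008],
    "Won?": ["NO", "YES", "YES", "NO"],
})

# Filtering produces a slice; copy() makes it an independent dataframe.
nominations = awards[awards["Year"] > 2000].copy()

won_dic = {"NO": 0, "YES": 1}
nominations["Won?"] = nominations["Won?"].map(won_dic)
print(nominations)

# A value missing from the dictionary is mapped to NaN.
unmatched = pd.Series(["YES", "MAYBE"]).map(won_dic)
print(unmatched)
```

Because the copy is independent, the original awards dataframe still holds the "NO"/"YES" strings afterwards.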

5. Use .drop() to get rid of the columns I do not need (delete_list is a list of column names; axis=1 means columns are dropped, not rows):

    final_nominations = nominations.drop(delete_list, axis=1)
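
The post does not show the contents of delete_list. The sketch below uses a hypothetical list just to show the shape of the call:

```python
import pandas as pd

# Hypothetical dataframe and column list, for illustration only.
nominations = pd.DataFrame({
    "Year": [2010],
    "Category": ["Actor -- Leading Role"],
    "Additional Info": ["Biutiful {'Uxbal'}"],
    "Won?": [0],
})
delete_list = ["Category"]  # assumption: the real list is not given in the post

# drop() returns a new dataframe; nominations itself is left unchanged.
final_nominations = nominations.drop(delete_list, axis=1)
print(list(final_nominations.columns))
```

The same call can be written as nominations.drop(columns=delete_list), which some readers find clearer than axis=1.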

6. Use vectorized string methods to modify each string in a dataframe column:

    # rstrip removes any trailing characters that belong to the given set (here ' and })
    # from the right end of each string.
    additional_info_one = final_nominations["Additional Info"].str.rstrip("'}")
    # "{." is treated as a regular expression: a literal { followed by any one character.
    additional_info_two = additional_info_one.str.split("{.")
    movie_names = additional_info_two.str[0]
    characters = additional_info_two.str[1]
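
To make the strip-and-split step concrete, here is a sketch on two made-up values, assuming the "Additional Info" strings look like `Movie {'Character'}`. It splits on the three characters space, brace, quote instead of the pattern used above, so the movie name does not keep a trailing space; that is a small variation, not the post's exact code.

```python
import pandas as pd

# Hypothetical values in the assumed "Movie {'Character'}" format.
info = pd.Series(["Biutiful {'Uxbal'}", "True Grit {'Rooster Cogburn'}"])

# Remove trailing ' and } characters: "Biutiful {'Uxbal'}" becomes "Biutiful {'Uxbal".
stripped = info.str.rstrip("'}")

# Split each string into [movie, character] at the " {'" separator.
parts = stripped.str.split(" {'")

# .str[i] picks the i-th element of every list in the Series.
movie_names = parts.str[0]
characters = parts.str[1]

print(movie_names.tolist())
print(characters.tolist())
```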

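7. Use to_sql() to save the dataset into a SQL table (conn is an open database connection; index=False keeps the dataframe index from being written as a column):

    final_nominations.to_sql("nominations", conn, index=False)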
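
The post does not show how conn is created. A minimal sketch, assuming a SQLite database opened with the standard-library sqlite3 module (an in-memory database and made-up rows are used here so the example is self-contained):

```python
import sqlite3

import pandas as pd

# Hypothetical cleaned data, for illustration only.
final_nominations = pd.DataFrame({
    "Year": [2010, 2010],
    "Movie": ["Biutiful", "True Grit"],
    "Won?": [0, 0],
})

# Assumption: a SQLite connection; ":memory:" avoids creating a file on disk.
conn = sqlite3.connect(":memory:")

# Write the dataframe to a table named "nominations", without the index column.
final_nominations.to_sql("nominations", conn, index=False)

# Read the rows back to check what was stored.
rows = conn.execute("SELECT * FROM nominations").fetchall()
print(rows)

conn.close()
```

By default to_sql() raises an error if the table already exists; passing if_exists="replace" or if_exists="append" changes that behavior.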

Time: 2024-10-10 05:34:05
