Econ 3818
Spring 2019
R data project

Unlike your other R assignments, this assignment is individual work only. You may discuss your project with classmates, but you must have your own unique project.

The final write-up is due via email by 5 pm on Monday, April 22. Turn in a PDF that includes your written answers, plus your R code and R output wherever asked. You do not need to print this project.

Data often come unstructured, so there is some work to do before you can apply the concepts learned in this course. To receive credit for the project, complete the following steps.

1) Load a data set into R. This can be any data set that has at least two quantitative variables and one qualitative variable and can be used to complete the rest of this assignment.
a) If you have an interest (chess, avocados, climate change), this is a great opportunity to explore that topic, or you can simply browse some data sets. Potential data sources are listed at the end of this PDF. Look for data in the “.csv” file format.
b) If you do not have an interest in mind, you can use the default data set from the 2016 American Community Survey (ACS) for Colorado. It is posted on Desire2Learn as “ACS_2016_CO.csv”, with an accompanying codebook describing the variable values. NOTE: the default data set is convenient, but many of its variables are coded as 9999* when the information is missing. These codes are not real data and must not be used when calculating statistics or creating plots! (The R function subset() is helpful for removing them.)
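A minimal sketch of part 1 follows. It assumes a hypothetical file with columns `income` (quantitative), `age` (quantitative) and `sex` (qualitative). The column names and the missing-value code 9999999 are illustrative only, so look up the exact codes for each variable in the real codebook. To keep the example self-contained, it first writes a tiny CSV to a temporary file. With real data you would point `read.csv()` at your own file instead.

```r
# Build a tiny illustrative CSV (hypothetical columns) so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "income,age,sex",
  "52000,34,Female",
  "9999999,41,Male",
  "38000,29,Male",
  "61000,55,Female"
), csv_path)

# Load the data; with real data use e.g. read.csv("ACS_2016_CO.csv").
acs <- read.csv(csv_path, stringsAsFactors = TRUE)

# Check the structure: the type of each column and the number of rows.
str(acs)

# Hypothetical missing-value code; look up the real one in the codebook.
missing_code <- 9999999

# Keep only the rows where income holds a real value.
acs_clean <- subset(acs, income != missing_code)
nrow(acs_clean)  # 3 of the 4 rows remain
```

Filtering with `subset()` returns a new data frame, so the raw data stay untouched and you can always compare before and after.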

2) Describe the data set in words. How many variables are there? How many observations? What is the unit of observation (person-year, state, month)? Is this a cross-section, or are there multiple observations over time? Do we have repeated observations for the same subject? Describe a few key variables that you will use, including their units (feet, miles, $).
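For part 2, a few base R functions answer the counting questions directly. The sketch below uses R's built-in `mtcars` data purely as a stand-in for your own data set. In `mtcars` the unit of observation is a car model, and the data are a single cross-section.

```r
# mtcars ships with R; substitute your own data frame here.
df <- mtcars

n_obs  <- nrow(df)   # number of observations (rows)
n_vars <- ncol(df)   # number of variables (columns)
cat("Observations:", n_obs, " Variables:", n_vars, "\n")

# Variable names and types, plus the first few rows.
str(df)
head(df)

# Check for repeated subjects: in mtcars each row name is one car model.
any(duplicated(rownames(df)))  # FALSE means one row per car
```

If your data have a subject identifier column (a hypothetical `id`, say), `any(duplicated(df$id))` answers the repeated-observations question in the same way.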

3) Summarize one of the quantitative variables for the full sample using sample statistics. Then summarize the same variable for a subset of the observations that meet a specific condition (e.g., report the average and standard deviation of the monthly price of avocados in 20 major US cities from 2010 to 2015, then summarize the avocado price for all 20 cities in February only). Try to choose the subsample in a meaningful way. How do the summary statistics compare, and what do you learn from the comparison? Include R code and output here.
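The full-sample vs. subsample comparison in part 3 might look like the sketch below. It uses the built-in `mtcars` data as a stand-in. The quantitative variable is fuel economy (`mpg`, miles per gallon), and the subsample condition is "manual transmission" (`am == 1`).

```r
# Full sample: fuel economy (miles per gallon) for all 32 cars.
full_mean <- mean(mtcars$mpg)
full_sd   <- sd(mtcars$mpg)
summary(mtcars$mpg)

# Subsample: only cars with a manual transmission (am == 1).
manual      <- subset(mtcars, am == 1)
manual_mean <- mean(manual$mpg)
manual_sd   <- sd(manual$mpg)
summary(manual$mpg)

# Side-by-side table of the two sets of statistics.
data.frame(
  sample = c("All cars", "Manual only"),
  n      = c(nrow(mtcars), nrow(manual)),
  mean   = c(full_mean, manual_mean),
  sd     = c(full_sd, manual_sd)
)
```

Here the 13 manual cars average about 24.4 mpg, compared with about 20.1 mpg for all 32 cars. A gap like this is the kind of difference worth commenting on in the write-up.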

4) Create a histogram of the variable you summarized in part 3, with properly labeled axes and a title. (Bonus points if you can create two strikingly different histograms of the same variable, split by some other variable.) Include R output here.
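A base R sketch of part 4 follows, again using the built-in `mtcars` data as a stand-in. The first histogram shows one variable with a title and labeled axes. The second figure illustrates the bonus idea by splitting the same variable by transmission type. Common bin breaks and y-axis limits keep the two panels comparable. The plots are written to a PDF file in a temporary directory, and the file name is arbitrary.

```r
# Send plots to a PDF file; open it to copy the figures into your write-up.
out_file <- file.path(tempdir(), "mpg_histograms.pdf")
pdf(out_file, width = 8, height = 4)

# Single histogram with a title and labeled axes.
h <- hist(mtcars$mpg,
          main = "Fuel economy of 32 car models (mtcars)",
          xlab = "Miles per gallon",
          ylab = "Number of cars")

# Bonus idea: the same variable split by transmission, on common bins and y-axis.
par(mfrow = c(1, 2))
breaks <- seq(10, 35, by = 5)
hist(mtcars$mpg[mtcars$am == 0], breaks = breaks, ylim = c(0, 12),
     main = "Automatic", xlab = "Miles per gallon", ylab = "Number of cars")
hist(mtcars$mpg[mtcars$am == 1], breaks = breaks, ylim = c(0, 12),
     main = "Manual", xlab = "Miles per gallon", ylab = "Number of cars")
dev.off()
```

`hist()` also returns the bin counts invisibly (stored in `h` above), which is handy if you want to quote the numbers in your text.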

5) Calculate a confidence interval, either for the mean of one variable or for a difference of means. Pick something interesting to you and interpret your findings. Include your R code and output here.
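A 95% t-interval for a single mean is $\bar{x} \pm t_{0.975,\,n-1}\, s/\sqrt{n}$. The sketch below computes it by hand and checks it against `t.test()`. It then gets a difference-of-means interval from the formula interface. The built-in `mtcars` data (`mpg` by transmission `am`) stand in for your own variables.

```r
x <- mtcars$mpg
n <- length(x)

# 95% t-interval for the mean by hand: xbar +/- t_{0.975, n-1} * s / sqrt(n).
xbar   <- mean(x)
se     <- sd(x) / sqrt(n)
t_crit <- qt(0.975, df = n - 1)
ci_manual <- c(lower = xbar - t_crit * se, upper = xbar + t_crit * se)
ci_manual

# The same interval from t.test(), as a check.
ci_builtin <- t.test(x, conf.level = 0.95)$conf.int
ci_builtin

# Difference of means, automatic (am = 0) minus manual (am = 1), Welch interval.
ci_diff <- t.test(mpg ~ am, data = mtcars)$conf.int
ci_diff
```

The whole interval for the difference lies below zero. One reading is that we are 95% confident the mean fuel economy of automatic cars is lower than that of manual cars, for the population this sample represents.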

6) Formalize a hypothesis you wish to test with these data (e.g., is the average salary for men the same as the average salary for women?). You might not yet have the tools to test the exact hypothesis you are interested in. Most likely you will run a difference-of-means test, or test whether the mean of a variable equals a specific value.
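For the salary example above, a two-sided difference-of-means hypothesis and one common test statistic (the unequal-variance, or Welch, form) can be written as follows. The one-sample version compares a mean to a specific value $\mu_0$. These are standard textbook forms, so adapt the notation to the one used in class.

```latex
% Difference of means (two-sided)
H_0:\ \mu_{\text{men}} = \mu_{\text{women}}
\qquad \text{vs.} \qquad
H_1:\ \mu_{\text{men}} \neq \mu_{\text{women}},
\qquad
t = \frac{\bar{x}_{\text{men}} - \bar{x}_{\text{women}}}
         {\sqrt{\dfrac{s_{\text{men}}^2}{n_{\text{men}}} + \dfrac{s_{\text{women}}^2}{n_{\text{women}}}}}

% One mean against a specific value \mu_0 (two-sided)
H_0:\ \mu = \mu_0
\qquad \text{vs.} \qquad
H_1:\ \mu \neq \mu_0,
\qquad
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
```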

7) Conduct the hypothesis test at a stated level of significance and interpret your results in a meaningful way. Include your R code and output here.
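The sketch below shows part 7 with a significance level of 5% (an assumed choice; state your own). It runs a Welch two-sample t-test, which is R's default in `t.test()`, on the built-in `mtcars` data and applies the p-value decision rule. The one-sample test at the end uses a hypothetical benchmark of 20 mpg.

```r
alpha <- 0.05  # chosen significance level

# H0: mean mpg is equal for automatic (am = 0) and manual (am = 1) cars.
# Welch two-sample t-test (does not assume equal variances).
res <- t.test(mpg ~ am, data = mtcars)
res

# Decision rule: reject H0 if the p-value is below alpha.
if (res$p.value < alpha) {
  cat("Reject H0 at the", alpha, "level: p =", signif(res$p.value, 3), "\n")
} else {
  cat("Fail to reject H0 at the", alpha, "level: p =", signif(res$p.value, 3), "\n")
}

# One-sample version: H0: mean mpg equals 20 (a hypothetical benchmark).
t.test(mtcars$mpg, mu = 20)
```

Reporting the test statistic, degrees of freedom and p-value from `res`, along with a plain-language sentence, usually makes the interpretation clearest. Pass `var.equal = TRUE` to get the pooled-variance test instead.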

8) Visualize at least two variables from the data set using a two-dimensional plot with an appropriate title, axis labels, and legend. The goal is for this image to tell a story that is clear to the reader. A useful visualization here could be a scatterplot of two quantitative variables. (Bonus points if you incorporate a third variable using another visual dimension, such as color, shape, or line thickness.) While not required, this is a great opportunity to become familiar with ggplot(). Include R code and output here.
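The base R sketch below shows part 8, using the built-in `mtcars` data as a stand-in. It plots fuel economy against weight, with transmission type as the third variable encoded by both color and point shape, and adds a legend. The colors, shapes and output file name are arbitrary choices. Base graphics are used so the example runs without extra packages.

```r
out_file <- file.path(tempdir(), "mpg_vs_weight.pdf")
pdf(out_file, width = 6, height = 5)

# Third variable: transmission type, shown through color and point shape.
transmission <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cols   <- c("Automatic" = "firebrick", "Manual" = "steelblue")
shapes <- c("Automatic" = 16, "Manual" = 17)

# Look up colors and shapes by level name for each car.
plot(mtcars$wt, mtcars$mpg,
     col = cols[as.character(transmission)],
     pch = shapes[as.character(transmission)],
     main = "Heavier cars get fewer miles per gallon",
     xlab = "Weight (1,000 lbs)", ylab = "Miles per gallon")
legend("topright", legend = names(cols), col = cols, pch = shapes,
       title = "Transmission")
dev.off()
```

With the ggplot2 package installed, roughly the same figure is `ggplot(mtcars, aes(wt, mpg, colour = factor(am))) + geom_point() + labs(...)`. ggplot2 builds the legend automatically.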

9) Finally, think and write about who would be a good “consumer” of this information. Who would be interested in the facts you present here? How could you improve the analysis in the future, either by incorporating new data or by using the existing data to answer a more interesting question?

To tool up in R for this assignment, see the Data Carpentry lesson at https://datacarpentry.org/R-ecology-lesson/index.html, particularly lesson 3, Manipulating data frames, and lesson 4, Visualizing data.

In terms of grading, I will be going through the following rubric:

Formatting 10 points – this should look professional!
Dataset description 10 points
Summary 10 points
Histogram 10 points
Confidence interval 15 points
Formalize hypothesis 5 points
Conduct hypothesis test 15 points
Visualization 15 points
Write-up 10 points
Individual Meeting week of 4/8 5 points (Extra credit!)

Potential data sources (if you do not care to use the default ACS data)

There are many good resources for finding data online:
Google’s dataset search: https://toolbox.google.com/datasetsearch

Data Is Plural structured archive, a list of interesting data sets:
https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4juclhjFgqIY8fQFMemwKL2c64vk/edit

Bureau of Labor Statistics, prices, unemployment: https://www.bls.gov/data/
FiveThirtyEight project data: https://github.com/fivethirtyeight/data
U.S. Energy Information Administration: https://www.eia.gov/
Census data at IPUMS: https://www.ipums.org/
Economics data at the Federal Reserve: https://fred.stlouisfed.org/
Economic history data: http://eh.net/databases/
Bureau of Economic Analysis: https://www.bea.gov/data
Agricultural data at USDA: https://www.ers.usda.gov/data-products/
