SDGB 7844

Homework 2: SDGB 7844
Submit two files through Blackboard: (a) .Rmd R Markdown file with answers and code
and (b) Word document of knitted R Markdown file. Your files should be named as follows:
“HW2-[Full Name]-[Class Time]”, and you should include those details in the body of your file.
For those of you who have studied U.S. government, you know that Congress (the legislature) is
made up of the House of Representatives and the Senate. The number of people each state
sends to the House depends on that state’s population, whereas every state sends two
people to the Senate.
A census of the U.S. population is required every ten years by the U.S. Constitution
(Article 1, Section 2). The primary purpose of the census is to determine how many representatives
each state will send to the House. This procedure is called apportionment
(link). There are 435 representatives in the House, and each state sends at least one person.
Once the census is complete, the equal proportions method is used to apportion those
435 seats among the states.
The first census was conducted in 1790, when people were hired to visit each home and
count who lived there. At that time, only white males were eligible to vote, but according
to the Constitution everyone was to be counted, not just eligible voters or citizens. Slaves
were counted too, but each was counted as only three-fifths of a person (see Constitution Article
1, Section 2, Clause 3). Slavery was abolished after the Civil War, when the 13th Amendment
to the Constitution was ratified in 1865, and the 14th Amendment (1868) replaced the three-fifths clause.
The Electoral College is the body that elects the president. Each state’s number of electoral votes equals its number of House
members plus two (for its two Senators). During a presidential
election, citizens technically vote for Electoral College members (even though the presidential
candidates’ names are on the ballot), and the Electoral College then votes for president (link).
For all practical purposes, though, whichever candidate gets the most votes in a state gets
all of that state’s electoral votes. (Note: There are 538 Electoral College members and
so 538 electoral votes: 538 = 435 House reps + 100 Senators + 3 electors for the District of
Columbia. Therefore, whoever gets at least 270 electoral votes, a majority of 538, wins.)
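The electoral-vote arithmetic in the note above can be checked in a few lines of R. This is only a small illustration; the helper name `electoral_votes()` is hypothetical, and the 7-member state is an invented example.

```r
# Electoral votes for one state: its House members plus its two Senators
electoral_votes <- function(house_members) house_members + 2

house_seats <- 435      # fixed size of the House
senators    <- 50 * 2   # two Senators for each of the 50 states
dc_electors <- 3        # electors for the District of Columbia

total_ev <- house_seats + senators + dc_electors  # 538
majority <- floor(total_ev / 2) + 1              # smallest strict majority: 270

electoral_votes(7)  # a hypothetical state with 7 House members gets 9
```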
The next census is in 2020, when, again, everyone will be counted. Every residential address
will receive a form to fill out about the occupants of that residence. Between censuses, the
government keeps track of population changes through the Population Estimates Program
(PEP), which is administered by the U.S. Census Bureau (link).
Goal: Use 2018 Population Estimates Program (PEP) data to estimate the number
of House of Representatives members each state can expect from the results
of the upcoming 2020 census. Compare your estimates with the current House
distribution, which is based on the 2010 census.¹
Information Sources:
DO NOT CHANGE ANY OF THE FILE NAMES OR FILES THEMSELVES!!
• “PEP_2018_PEPANNRES_with_ann.csv”: 2018 population for each state from
the PEP, downloaded from American FactFinder, a website maintained by the Census Bureau.
Instructions are at the end of this assignment.
• “ApportionmentPopulation2010.xls”: 2010 population for each state and the
2010 apportionment results. Instructions are at the end of this assignment.
• Equal proportions algorithm: In the “Congressional Apportionment...” file posted
with this assignment.
• U.S. map: from the R package usmap. You need to install this package on your
computer and then load it by using the command require(usmap). See Lecture 3
slides for instructions on installing an R package.
1. What was the “residence rule” for the 2010 census and why is it important? (Use the
internet and provide a link for any sources you use.)
2. Upload the 2018 data file into R. Only keep the columns Geography; April 1, 2010
- Census; and Population Estimate (as of July 1) - 2018. Rename the columns
state; res2010; and pep2018 (all lowercase).
(a) There are 50 states, so why are there more than 50 rows in the data set?
(b) What is the resident population of the U.S. according to the 2010 census? Which
geographies are included/excluded from this total? Remove the extra rows from
your 2018 PEP data set so you only have the data for the 50 states. (The functions
sum() and is.element() are useful here.)
(c) Calculate the percent change of the total resident population between the 2010
census and 2018. How much has the population grown? Once you’ve answered this
question, remove the res2010 column from the data set.
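One possible base-R sketch of Question 2(b) and 2(c), assuming the columns have already been renamed to state, res2010, and pep2018. The data frame below is a made-up stand-in for the PEP file (invented numbers and a placeholder row called "Other geography"), not the real data. It uses R's built-in `state.name` vector (the 50 state names, from the datasets package) together with `is.element()` to keep only the state rows.

```r
# Made-up stand-in for the renamed PEP data (NOT real census numbers)
pep <- data.frame(
  state   = c("United States", "Alabama", "Alaska", "Other geography"),
  res2010 = c(1100, 600, 400, 75),
  pep2018 = c(1190, 640, 450, 80),
  stringsAsFactors = FALSE
)

# Keep only rows whose name is one of R's built-in 50 state names
states_only <- pep[is.element(pep$state, state.name), ]

# Totals over the state rows only
total2010 <- sum(states_only$res2010)
total2018 <- sum(states_only$pep2018)

# Percent change from the 2010 census to the 2018 estimate
pct_change <- 100 * (total2018 - total2010) / total2010

# Drop the 2010 column once it is no longer needed
states_only$res2010 <- NULL
```

In this toy example the two state rows sum to 1000 and 1090, giving a 9% increase; with the real file the same steps apply to all 50 states.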
¹ Note: The population used for apportionment purposes is slightly higher than the resident populations
given in the 2018 data file. That is because people such as overseas military members are included in
their home state’s population totals for apportionment purposes. That means our 2018 population values will
undercount the population used for the 2020 apportionment.
3. Upload the 2010 data file into R. This file has some extra bits, so the arguments skip
and n_max in the read_excel() function from the package readxl may be useful. Keep
the columns STATE; APPORTIONMENT POPULATION (APRIL 1, 2010); and APPORTIONED
REPRESENTATIVES BASED ON 2010 CENSUS. Rename them state;
appor2010; and rep2010 (again, all lowercase).
(a) Calculate the following summary statistics for the 2010 census population values
and put them into a table in Word: minimum, maximum, mean, median, and
standard deviation.
(b) Which state has the largest population? Which has the smallest? Where does New
York fall in the ranking of population size?
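A minimal sketch of the summary statistics and ranking in Question 3, using invented populations for five placeholder states rather than the real 2010 file. With the real file, `readxl::read_excel()` would be called with `skip` and `n_max` set to drop the extra rows; the right values depend on the spreadsheet layout, so they are not filled in here.

```r
# Invented apportionment populations for five placeholder states (NOT census data)
appor <- data.frame(
  state     = c("State A", "State B", "State C", "State D", "State E"),
  appor2010 = c(500000, 1200000, 2600000, 9000000, 19000000),
  stringsAsFactors = FALSE
)

# Summary statistics requested in Question 3(a)
stats <- c(
  minimum = min(appor$appor2010),
  maximum = max(appor$appor2010),
  mean    = mean(appor$appor2010),
  median  = median(appor$appor2010),
  sd      = sd(appor$appor2010)
)
round(stats)

# Largest and smallest states
largest  <- appor$state[which.max(appor$appor2010)]
smallest <- appor$state[which.min(appor$appor2010)]

# Rank 1 = largest population
appor$rank <- rank(-appor$appor2010)
appor[order(appor$rank), ]
```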
4. Create two histograms: (a) 2010 apportionment population and (b) log of the 2010
apportionment population (log always means natural log in statistics). Describe the
shape of both distributions.
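A short sketch for Question 4, using simulated right-skewed "populations" (50 draws from a log-normal distribution; these are not census values). In R, `log()` is the natural logarithm by default, which matches the convention stated in the question.

```r
set.seed(1)  # reproducible toy draws
# Simulated, right-skewed "populations" (NOT census data)
pop <- rlnorm(50, meanlog = log(4e6), sdlog = 1)

op <- par(mfrow = c(1, 2))  # two panels side by side
h_raw <- hist(pop, main = "(a) Population", xlab = "Population")
h_log <- hist(log(pop), main = "(b) log(Population)",
              xlab = "Natural log of population")
par(op)  # restore the previous graphics settings
```

`hist()` also returns the bin counts invisibly, which can be handy for checking how many states fall in each bin.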
5. Looking at your histograms in Question 4, is the mean or the median a better measure
for center in each case? Justify your answer.
6. Create two scatter plots: (a) 2010 apportionment population on the x-axis and number
of House members on the y-axis; and (b) log of 2010 apportionment population on
the x-axis and number of House members on the y-axis. Which plot shows a clearer
relationship between the two variables? Can we use correlation, r, to represent the
relationships in either graph? Justify your answers.
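A sketch of the two scatter plots in Question 6 on invented data: nine made-up populations, with House counts generated as roughly population / 710,000 (an arbitrary illustrative divisor, not the real apportionment). `cor()` computes Pearson's r, which summarizes only linear association, so it should be read alongside the plots rather than instead of them.

```r
# Invented populations (in people) for nine placeholder states (NOT census data)
pop  <- c(0.6, 1.0, 2.9, 4.5, 6.5, 10.0, 19.4, 25.1, 37.3) * 1e6
# Illustrative House counts: about one seat per 710,000 people, at least 1
reps <- pmax(1, round(pop / 710000))

op <- par(mfrow = c(1, 2))
plot(pop, reps, xlab = "Population", ylab = "House members", main = "(a)")
plot(log(pop), reps, xlab = "log(Population)", ylab = "House members", main = "(b)")
par(op)

# Pearson correlation for each pairing
r_raw <- cor(pop, reps)
r_log <- cor(log(pop), reps)
```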
7. Merge the 2018 population data and the 2010 apportionment data into a single R
object called data.x. Estimate the number of House members each state would have
in 2020 based on your 2018 population data, using the equal proportions method. Add
your calculated apportionment numbers as a new column in data.x.
The equal proportions method of calculating the number of House members is given in
the “Congressional Apportionment” report posted along with this assignment (additional
info). Read it first so you can understand the instructions given below.
Equal Proportions Method:
Step 1: Calculate a vector of values of the formula $1/\sqrt{n(n-1)}$, where $n$ goes from 2 to
60, and call it denom. This means that we are assuming that the maximum number
of seats for a state is 60, which seems reasonable given the 2010 representative
numbers. (Make sure you’ve merged your 2010 and 2018 data sets first.)
Step 2: Multiply each value of denom in Step 1 by each state’s 2018 population. For example,
each element in denom is multiplied by Alabama’s population, and then the same is repeated
for Alaska, Arizona, etc. These values are called priority values:
$$PV_n = \frac{\text{state population}}{\sqrt{n(n-1)}}$$
There are many ways to do this, but the simplest in terms of coding is to use some
matrix algebra: c(t(outer(data.x$pep2018, denom))) where outer() calculates
the outer product of two vectors, t() transposes the resulting matrix, and c()
converts the matrix into a vector.
Step 3: Create a data set with the priority values as one column and the corresponding
state names as a second column.
Step 4: Sort your data set in Step 3 in descending order by priority value so that the
highest priority values are on top. Extract the first 385 rows (435 − 50 = 385, since Step 7
already guarantees each of the 50 states one seat). Each row of the resulting data set represents one seat in the House.
Step 5: Make a frequency table of the state names in Step 4 using the function count().
The frequency of each state is the initial number of representatives for that state.
Step 6: Merge your frequency table with data.x. Then, replace all NA counts with 0 using
the function replace_na().
Step 7: Add 1 to each state representative count so that each state has at least one representative
and the total number of representatives equals 435.
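The seven steps above can be sketched in base R as follows. This is a toy run under stated assumptions: three made-up states named A, B, and C with invented populations, and a 10-seat House instead of 435. It uses the same outer()/t()/c() construction as Step 2, but swaps in base-R `table()` and `is.na()` for the `count()` and `replace_na()` functions named in Steps 5 and 6 (those come from the dplyr and tidyr packages), so it runs without extra packages.

```r
# Toy data.x: three made-up states with invented populations (NOT census data)
data.x <- data.frame(
  state   = c("A", "B", "C"),
  pep2018 = c(1000000, 2500000, 6500000),
  stringsAsFactors = FALSE
)
house_size <- 10  # toy House size; the assignment uses 435

# Step 1: denom = 1 / sqrt(n(n - 1)) for n = 2, ..., 60
seat_n <- 2:60
denom  <- 1 / sqrt(seat_n * (seat_n - 1))

# Step 2: priority values PV_n = population / sqrt(n(n - 1)).
# outer() gives a states-by-n matrix; t() then c() flattens it so that all
# values for the first state come first, then the second state, and so on.
pv <- c(t(outer(data.x$pep2018, denom)))

# Step 3: pair each priority value with its state name (same order as pv)
pv_df <- data.frame(
  state    = rep(data.x$state, each = length(denom)),
  priority = pv,
  stringsAsFactors = FALSE
)

# Step 4: sort descending and keep one row per seat beyond the guaranteed ones
pv_df <- pv_df[order(pv_df$priority, decreasing = TRUE), ]
extra_seats <- head(pv_df, house_size - nrow(data.x))

# Step 5: count rows per state (base-R table() in place of dplyr::count())
freq <- as.data.frame(table(extra_seats$state), stringsAsFactors = FALSE)
names(freq) <- c("state", "extra")

# Step 6: merge; states with no extra seat get NA, which we set to 0
# (base-R equivalent of tidyr::replace_na())
data.x <- merge(data.x, freq, by = "state", all.x = TRUE)
data.x$extra[is.na(data.x$extra)] <- 0L

# Step 7: add the guaranteed first seat
data.x$rep2020 <- data.x$extra + 1L
data.x
```

For these toy inputs the result is 1, 3, and 6 seats for A, B, and C, which sums to the 10-seat total; note that state A wins no extra seat, so its NA from the merge is what Step 6 replaces. With the real data, the toy data frame would be the merged data.x and `house_size` would be 435.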
Now, answer the following questions:
(a) Make a table in Word with the three states with the highest number of representatives.
What fraction of the total number of representatives do these 3 states
comprise? Currently, do the same states have the highest number of representatives?
(b) How many states have only a single House of Representatives member?
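For Question 7(a) and (b), one way to pull out the three largest delegations and count single-member states, sketched on an invented five-state apportionment rather than real results. With the full data, the seat total would be 435.

```r
# Invented seat counts for five placeholder states (NOT real results)
data.x <- data.frame(
  state   = c("State A", "State B", "State C", "State D", "State E"),
  rep2020 = c(1L, 1L, 4L, 12L, 22L),
  stringsAsFactors = FALSE
)
total_seats <- sum(data.x$rep2020)  # 40 here; 435 for the real House

# Three states with the most seats
top3 <- head(data.x[order(data.x$rep2020, decreasing = TRUE), ], 3)

# Fraction of all seats held by those three states
share_top3 <- sum(top3$rep2020) / total_seats

# Number of states with exactly one seat
n_single <- sum(data.x$rep2020 == 1)
```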
8. Calculate the following difference: (estimated 2020 House reps − 2010 House reps) as
a new column in data.x, convert it to a character data type, and call this column
difference. Make a frequency table of the difference column in Word.
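A small sketch of Question 8 with invented seat counts for four placeholder states. Storing the difference as character makes R, and plotting packages such as ggplot2, treat each value as a separate category rather than a point on a continuous scale.

```r
# Placeholder seat counts for four made-up states (NOT real results)
data.x <- data.frame(
  state   = c("State A", "State B", "State C", "State D"),
  rep2010 = c(1L, 3L, 6L, 2L),
  rep2020 = c(1L, 2L, 7L, 2L),
  stringsAsFactors = FALSE
)

# Difference (estimated 2020 reps minus 2010 reps), stored as character
data.x$difference <- as.character(data.x$rep2020 - data.x$rep2010)

# Frequency table of the difference column
diff_table <- table(data.x$difference)
diff_table
```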
9. A way of representing the information in Question 8 is by creating a map.
(a) Make a map of the US color-coded by the difference column. Then answer the
following questions.
(b) Why does the legend include an NA?
(c) Describe what you see in the map.
(d) Various research/media organizations have made their own predictions about the distribution
of House seats. Pick one and compare your results with their predictions.
Include links to any references you use.
(e) Describe one way we could improve our analysis.
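A possible sketch of the map in Question 9(a), assuming the usmap package is installed. It passes `plot_usmap()` a data frame with a `state` column of full state names (one of the identifier formats usmap accepts) and names the column to color by through the `values` argument. The two states and their values are placeholders, not results; the `requireNamespace()` guard simply skips plotting when usmap is missing.

```r
# Placeholder values for two states (NOT apportionment results)
map_df <- data.frame(
  state      = c("Alabama", "Alaska"),
  difference = c("0", "1"),
  stringsAsFactors = FALSE
)

if (requireNamespace("usmap", quietly = TRUE)) {
  library(usmap)
  # Character values give a discrete (categorical) fill legend
  p <- plot_usmap(regions = "states", data = map_df, values = "difference")
  print(p)
}
```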
Downloading 2018 PEP Data
1. Go to the American FactFinder website:
https://factfinder.census.gov
2. In the section titled “What We Provide” near the bottom, click on the “get data” link
next to Population Estimates Program.
3. Click on the table called PEPANNRES, “Annual Estimates of the Resident Population:
April 1, 2010 to July 1, 2018”. It should bring you to a table of these annual estimates.
4. Click on the Download button; select the “Use” option in the pop-up window and click
OK.
5. Unzip the downloaded file. The file you will be using is called
“PEP_2018_PEPANNRES_with_ann.csv”. The other files in the folder contain information
about the data.
6. You can put the entire folder wherever you have your R code for this assignment.
When you upload the data, use the file path
“PEP_2018_PEPANNRES/PEP_2018_PEPANNRES_with_ann.csv” to indicate that the
file you want is inside the folder called “PEP_2018_PEPANNRES”. That way you can
keep all of the information relevant to the data file together.
Downloading 2010 Apportionment Data
1. Go to this website:
https://www.census.gov/data/tables/2010/dec/2010-apportionment-data.html
2. Download the Excel file titled “Apportionment Population and Number...”