ACCT648 Applied Statistics for Data Analysis

Term 1, 2019/2020
Assignment 3
Deadline of submission: upload your answer file in Word format to e-Learn before 5 pm on 6 November 2019, and submit a hard copy in class on the same day.
1. The owner of a moving company typically has his most experienced manager predict the
total number of labor hours (Hours) that will be required to complete an upcoming move.
This approach has proved useful in the past, but the owner has the business objective
of developing a more accurate method of predicting labor hours. In a preliminary effort
to provide a more accurate method, the owner has decided to use the number of cubic
feet moved (Feet), the number of pieces of large furniture (Large) and whether there is
an elevator in the apartment building (Elevator) as the independent variables and has
collected data for moves in which the origin and destination were within the borough
of Manhattan in New York City and the travel time was an insignificant portion of the
hours worked. The data are organized and stored in Moving2019.csv.
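The four models in parts (a) to (d) differ only in which product column is added to the three main effects. The sketch below builds that design matrix and fits it by ordinary least squares with NumPy. It assumes Elevator has already been coded 1 = elevator, 0 = no elevator (if the CSV stores Yes/No, convert it first), and the interaction labels such as "Large:Elevator" are hypothetical names chosen for illustration.

```python
import numpy as np

def design_matrix(feet, large, elevator, interaction=None):
    """Columns: intercept, Feet, Large, Elevator, plus one optional product term.

    interaction is None (model L1) or one of the hypothetical labels
    "Feet:Elevator" (L2), "Large:Elevator" (L3), "Feet:Large" (L4).
    Elevator is assumed to be coded 1 = has elevator, 0 = no elevator.
    """
    feet, large, elevator = (np.asarray(a, dtype=float) for a in (feet, large, elevator))
    cols = [np.ones_like(feet), feet, large, elevator]
    products = {
        "Feet:Elevator": feet * elevator,
        "Large:Elevator": large * elevator,
        "Feet:Large": feet * large,
    }
    if interaction is not None:
        cols.append(products[interaction])
    return np.column_stack(cols)

def fit_ols(X, y):
    """Least-squares coefficients, in the same order as the columns of X."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta
```

For model L3, for example, `fit_ols(design_matrix(feet, large, elevator, "Large:Elevator"), hours)` returns the intercept followed by the Feet, Large, Elevator and Large×Elevator coefficients.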
(a) Find the multiple regression equation L1 with all three independent variables as main effects.
(b) Find the multiple regression equation L2 with all three main effects plus the interaction effect of Feet and Elevator.
(c) Find the multiple regression equation L3 with all three main effects plus the interaction effect of Large and Elevator.
(d) Find the multiple regression equation L4 with all three main effects plus the interaction effect of Feet and Large.
(e) Comparing the four regression models L1, L2, L3 and L4, explain why model L3 is the best model.
(f) Perform a residual analysis on model L3 and determine whether the regression assumptions are valid.
(g) Using model L3, construct a 95% prediction interval estimate of the labor hours for moving 420 cubic feet with 2 pieces of large furniture in an apartment building that does not have an elevator.
(h) Using model L3, construct a 95% confidence interval estimate of the average labor hours for moving 400 cubic feet with 3 pieces of large furniture in an apartment building that has an elevator.
(i) True or False: under model L3, for a fixed number of cubic feet and at least one piece of large furniture, a move in a building with an elevator requires on average fewer labor hours than a move in a building without an elevator. Justify your answer.
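Parts (g) to (i) all rest on the form of model L3. The block below writes it out with placeholder coefficients b_0 to b_4 (our labels, not values from the data), assuming Elevator is coded 1 = elevator and 0 = no elevator. It shows that the elevator effect depends on Large, and states the usual prediction and confidence interval formulas for a model with five coefficients, which is what R's predict() on an lm fit reports with interval = "prediction" or interval = "confidence".

```latex
% Model L3 (Elevator coded 1 = has elevator, 0 = no elevator)
\hat{Y} = b_0 + b_1\,\mathrm{Feet} + b_2\,\mathrm{Large} + b_3\,\mathrm{Elevator}
        + b_4\,(\mathrm{Large} \times \mathrm{Elevator})

% Part (i): elevator effect at fixed Feet and Large
\hat{Y}\big|_{\mathrm{Elevator}=1} - \hat{Y}\big|_{\mathrm{Elevator}=0} = b_3 + b_4\,\mathrm{Large}

% Parts (g) and (h): x_0 = (1, \mathrm{Feet}_0, \mathrm{Large}_0, \mathrm{Elevator}_0, \mathrm{Large}_0\,\mathrm{Elevator}_0)^\top,
% s = residual standard error, t_{n-5} = 97.5th percentile of t with n - 5 degrees of freedom
\text{prediction interval: } \hat{y}_0 \pm t_{n-5}\, s\,\sqrt{1 + x_0^\top (X^\top X)^{-1} x_0}
\text{confidence interval for the mean: } \hat{y}_0 \pm t_{n-5}\, s\,\sqrt{x_0^\top (X^\top X)^{-1} x_0}
```

Under these assumptions, part (g) uses x_0 = (1, 420, 2, 0, 0) and part (h) uses x_0 = (1, 400, 3, 1, 3). For part (i), b_3 + b_4 Large is linear in Large, so it is negative for every Large from 1 up to the largest observed value exactly when it is negative at both of those endpoints.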
2. Based on the data set given in Question 1:
(a) Fit the multiple regression equation to predict the total number of labor hours with all independent variables by using forward selection with the BIC criterion on the training set. Plot the number of variables versus BIC at each selection step.
(b) Fit the multiple regression equation to predict the total number of labor hours with all independent variables by using best subset selection with the adjusted $R^2$ criterion on the training set. Plot the number of variables versus adjusted $R^2$ at each selection step.
(c) Use 5-fold cross-validation to fit models L1, L2, L3 and L4, and determine which model is best according to their cross-validation errors. (Note: use set.seed(1208).)
(d) Use leave-one-out cross-validation (LOOCV) to fit models L1, L2, L3 and L4, and determine which model is best according to their cross-validation errors. (Note: use set.seed(5623).)
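The procedures in parts (a), (c) and (d) can be sketched with NumPy as follows. This is a minimal illustration, not the R workflow the assignment expects: the BIC used here is the common form n·ln(RSS/n) + k·ln(n) with k counting the intercept, and its absolute values may differ from R's regsubsets output. NumPy's generator also does not reproduce R's set.seed(1208), so 5-fold fold assignments (and hence the CV errors) will not match R exactly; LOOCV does not depend on the seed at all. For each of L1 to L4, X should hold that model's predictor columns, including its interaction column.

```python
import numpy as np

def with_intercept(X):
    """Prepend a column of ones to a 2-D predictor matrix."""
    return np.column_stack([np.ones(len(X)), X])

def _rss(Xd, y):
    """Residual sum of squares of the OLS fit of y on the design matrix Xd."""
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return float(resid @ resid)

def forward_selection_bic(X, y):
    """Greedy forward selection; returns (size, chosen columns, BIC) for each step."""
    n, p = X.shape
    chosen, remaining, path = [], list(range(p)), []
    while remaining:
        # add the column that gives the smallest RSS together with those already chosen
        best = min(remaining, key=lambda j: _rss(with_intercept(X[:, chosen + [j]]), y))
        chosen.append(best)
        remaining.remove(best)
        rss = _rss(with_intercept(X[:, chosen]), y)
        k = len(chosen) + 1  # number of coefficients, including the intercept
        path.append((len(chosen), list(chosen), n * np.log(rss / n) + k * np.log(n)))
    return path

def cv_mse(X, y, k, seed=None):
    """k-fold cross-validation MSE of an OLS fit (y a NumPy array); k = len(y) gives LOOCV."""
    n = len(y)
    Xd = with_intercept(X)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        beta, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)
        resid = y[test] - Xd[test] @ beta
        sq_err += float(resid @ resid)
    return sq_err / n  # pooled squared error over all held-out points
```

The model with the smallest `cv_mse` is preferred; plotting the third element of each `forward_selection_bic` entry against the first gives the BIC-versus-size graph.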
3. Suppose we collect data for a group of 130 students in a statistics class, with two independent variables, $X_1$ = average study hours per week and $X_2$ = GPA, and one dependent variable, $Y$ = Pass (or Fail).
We fit the logistic regression model $\log(\text{odds}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$ to predict whether a student will pass the course. The R output gives the estimated coefficients $\hat{\beta}_0 = -9.5447$, $\hat{\beta}_1 = 0.5709$ and $\hat{\beta}_2 = 1.0682$. The observations of the first five students are as follows:
Student  Y     X1 (hours/week)  X2 (GPA)
1        Pass  9.4              3.03
2        Pass  14.5             3.52
3        Pass  12.2             3.14
4        Fail  8.4              2.76
5        Fail  11.3             3.20
(a) Based on the estimated logistic regression model, predict the probability that a
student who studies 11 hours per week on average and has a GPA of 3.40 will pass
the course.
(b) What is the minimum number of study hours per week the student in part (a) would need for the predicted probability of passing the course to exceed 70%?
(c) Find the deviance residuals of the first five observed students.
(d) Using the estimated logistic regression model with a threshold value of 0.55 for classifying a student as passing the course, determine whether the model misclassifies any of the five observed students above. If there is an error, determine its type as well.
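All four parts of this question follow from the inverse logit of the fitted linear predictor. The sketch below uses only the coefficients and the five observations given above. It assumes Pass is coded 1 and treated as the positive class, and that a student is classified as Pass when the predicted probability exceeds 0.55; whether a false positive is called a Type I or Type II error depends on the convention in the course notes.

```python
import math

# Coefficients reported in the question
B0, B1, B2 = -9.5447, 0.5709, 1.0682

def prob_pass(hours, gpa):
    """Inverse logit of the fitted linear predictor: P(Pass | X1, X2)."""
    eta = B0 + B1 * hours + B2 * gpa
    return 1.0 / (1.0 + math.exp(-eta))

def hours_needed(target_p, gpa):
    """Solve B0 + B1*X1 + B2*gpa = log(p / (1 - p)) for X1."""
    return (math.log(target_p / (1 - target_p)) - B0 - B2 * gpa) / B1

def deviance_residual(y, p):
    """Deviance residual for a 0/1 outcome y with fitted probability p."""
    loglik = y * math.log(p) + (1 - y) * math.log(1 - p)
    return math.copysign(math.sqrt(-2.0 * loglik), y - p)

# First five students from the table as (Y, X1, X2), with Pass = 1 and Fail = 0
students = [(1, 9.4, 3.03), (1, 14.5, 3.52), (1, 12.2, 3.14), (0, 8.4, 2.76), (0, 11.3, 3.20)]

THRESHOLD = 0.55
for i, (y, x1, x2) in enumerate(students, start=1):
    p = prob_pass(x1, x2)
    pred = 1 if p > THRESHOLD else 0
    if pred == y:
        outcome = "correct"
    elif pred == 1:
        outcome = "false positive (Fail predicted as Pass)"
    else:
        outcome = "false negative (Pass predicted as Fail)"
    print(f"Student {i}: p = {p:.4f}, deviance residual = {deviance_residual(y, p):+.4f}, {outcome}")

print(f"(a) P(pass | 11 h, GPA 3.40) = {prob_pass(11, 3.40):.4f}")
print(f"(b) hours needed at GPA 3.40 for P > 0.70: more than {hours_needed(0.70, 3.40):.2f}")
```

Part (b) works because the logit is linear in X1: fixing GPA and the target probability leaves a single linear equation in the study hours.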
4. The stock prices of Singapore Telecommunications Limited (SingTel, code Z74.SI) and Singapore Airlines Limited (SIA, code C6L.SI) from 27 August 2018 to 29 July 2019 are stored in SingTelSIA2019.csv. Suppose a portfolio holds 8,000 shares of SingTel at $3.34 per share and 5,000 shares of SIA at $9.42 per share on 29 July 2019. The portfolio is therefore worth $73,820 (8,000 × 3.34 + 5,000 × 9.42) on 29 July 2019.
(a) Using the historical approach, without any distributional assumption, calculate the one-day 99% VaR for this portfolio on 29 July 2019.
(b) Without any distributional assumption, estimate the one-day 99% VaR for this portfolio on 29 July 2019 using the bootstrap approach with 100,000 repetitions. (Note: use set.seed(5483).)
(c) Obtain a 95% Bootstrap percentile confidence interval for the one-day 99% VaR for
this portfolio on 29 July 2019.
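A pure-Python sketch of parts (a) to (c), assuming the two closing-price series have already been read from the CSV into equal-length lists in date order. Several choices here are assumptions rather than the course's prescribed method: daily simple returns are applied to today's holdings, quantiles use linear interpolation (the default "type 7" rule of R's quantile()), and the bootstrap point estimate is the mean of the bootstrap VaRs. Python's random.Random(5483) does not reproduce R's set.seed(5483), so the bootstrap figures will differ from R's.

```python
import random

def quantile7(values, q):
    """Sample quantile by linear interpolation (R's default 'type 7' rule)."""
    xs = sorted(values)
    h = (len(xs) - 1) * q
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def portfolio_pnl(prices_a, prices_b, shares_a, shares_b):
    """Hypothetical one-day P&L: today's holdings revalued with each past daily simple return."""
    value_a = shares_a * prices_a[-1]
    value_b = shares_b * prices_b[-1]
    pnl = []
    for t in range(1, len(prices_a)):
        ra = prices_a[t] / prices_a[t - 1] - 1
        rb = prices_b[t] / prices_b[t - 1] - 1
        pnl.append(value_a * ra + value_b * rb)
    return pnl

def historical_var(pnl, level=0.99):
    """VaR reported as a positive loss: minus the (1 - level) quantile of the P&L."""
    return -quantile7(pnl, 1 - level)

def bootstrap_var(pnl, level=0.99, reps=100_000, seed=5483):
    """Bootstrap VaR estimate and 95% percentile confidence interval."""
    rng = random.Random(seed)
    n = len(pnl)
    draws = [historical_var(rng.choices(pnl, k=n), level) for _ in range(reps)]
    estimate = sum(draws) / reps
    ci = (quantile7(draws, 0.025), quantile7(draws, 0.975))
    return estimate, ci
```

With the question's holdings the call would be `portfolio_pnl(singtel, sia, 8000, 5000)`, where `singtel` and `sia` are hypothetical names for the loaded price lists; 100,000 repetitions in pure Python take a few seconds.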
5. The director of undergraduate studies at a college of business wants to predict whether students in a BBA program will graduate with an honors degree, using the independent variables high school grade point average (GPA), SAT score, gender, and local citizenship.
Data from a random sample of 90 students, organized and stored in BBA2019.csv, show that 46 successfully completed the program with honors degrees (coded as Yes) and 44 without honors degrees (coded as No), recorded in the column Graduate.
(a) Develop a logistic regression model, L1, to predict the probability of successfully completing the BBA program with an honors degree, based on all independent variables.
(b) Develop another logistic regression model, L2, to predict the probability of successfully completing the BBA program with an honors degree, based on the SAT, Gender and Local independent variables.
(c) Develop another logistic regression model, L3, to predict the probability of successfully completing the BBA program, based on the SAT and Local independent variables.
(d) Develop another logistic regression model, L4, to predict the probability of successfully completing the BBA program, based on the SAT independent variable.
(e) Explain why model L4 is the best of the four models considered. At the 0.05 level of significance, is there evidence that logistic regression model L4 is a good-fitting model?
(f) Under model L4, predict the probability that a male local citizen with a GPA of 3.45 and an SAT score of 1330 successfully completes the BBA program with an honors degree.
(g) Find the confusion matrix of model L4 with a threshold value of 0.6 for classifying students as having successfully completed the BBA program with honors degrees.
(h) Find the sensitivity, specificity and total error rate of model L4 with the threshold value 0.6.
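Parts (g) and (h) reduce to counting outcomes once each student has a fitted probability from L4. Note that L4 uses SAT only, so in part (f) the GPA, gender and citizenship values do not change the prediction. The sketch below assumes Graduate is coded 1 = Yes and 0 = No, treats Yes as the positive class, and predicts Yes when the fitted probability exceeds 0.6; the labels and probabilities in the usage line are hypothetical, not from BBA2019.csv.

```python
def confusion_matrix(y_true, p_hat, threshold=0.6):
    """Return TP, FN, FP, TN counts, predicting Yes (1) when p_hat exceeds the threshold."""
    tp = fn = fp = tn = 0
    for y, p in zip(y_true, p_hat):
        pred = 1 if p > threshold else 0
        if y == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 1:
                fp += 1
            else:
                tn += 1
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

def summary_rates(cm):
    """Sensitivity, specificity and total (misclassification) error rate."""
    sensitivity = cm["TP"] / (cm["TP"] + cm["FN"])
    specificity = cm["TN"] / (cm["TN"] + cm["FP"])
    error_rate = (cm["FP"] + cm["FN"]) / sum(cm.values())
    return sensitivity, specificity, error_rate

# Hypothetical labels (1 = Yes, 0 = No) and fitted probabilities, for illustration only
cm = confusion_matrix([1, 1, 1, 0, 0], [0.9, 0.61, 0.3, 0.65, 0.2])
print(cm, summary_rates(cm))
```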
-END-
