QBUS6810 BUSINESS SCHOOL

BUSINESS SCHOOL
QBUS6810
Statistical Learning and Data Mining
Semester 1, 2019
Group Project: Airbnb Pricing Predictions
1. Key information
Required submissions: 1) Written report (submitted as one PDF file per group via Assignment
submission on Canvas); 2) Predictions for the test data (via Kaggle); 3) Python code (via email,
the address to be provided). Further instructions will be posted on Canvas.
Deadline: Friday May 31st at 5PM.
Weight: 30% of your final grade.
Groups: Complete the assignment in groups of four or five students. Make sure to sign into
your group on Canvas; Canvas groups will be used for identification and assessment purposes.

Length: Your written report should have a maximum of 15 pages (single spaced, 11pt, cover
page not included).
Marking and key rules:
A separately posted rubric indicates the marking criteria for the report.
Carefully read the requirements for each part of the assignment.
Please follow any further instructions announced on Canvas, particularly for submissions.
You must use Python for this assignment. It is fine to use Excel for data manipulation
(however, this approach is generally not recommended due to its inefficiency).
The predictions for the test data on Kaggle must come from your own analysis in Python.
An examination of the code will be conducted for verification purposes.
Please note that it is your responsibility to be informed of and to follow the University of
Sydney and Business School rules and guidelines.
2. Getting the data
The data is posted on the Kaggle competition page. To be able to join the competition, you
will need to access the competition page via the following link:
https://www.kaggle.com/t/28cb00294fe94927ac801164794b75dd
You will need to create a Kaggle account, identifiable by your name, to access the competition,
download the data and make submissions. After you have created an account and logged
into Kaggle, use the above link to get to the competition page (you need to be logged in to
get to the competition page via the link). On this page you will need to click on the “Join
Competition” link, located in a light blue box near the top right corner of the page. After you
accept the competition rules, you will have joined the Kaggle competition for the group
project.
Each group should create a team on Kaggle. The group leader can create a team by joining
the competition and then going into the “Team” tab, which will appear near the top of the
competition page. The leader can then invite other group members using their Kaggle
names (members must join the competition before they can be invited). Kaggle
teams must be identical to the groups you formed on Canvas, and the team number must
match the group number. Each student in the group is required to sign up and be identifiable
as a member of a Kaggle team.
3. Problem description
Airbnb (www.airbnb.com) is a hospitality company that runs an online marketplace for renting
and leasing short-term lodging. It is interested in developing a pricing service for its users
that will compute a recommended price based on the features of a listing. As a consultant
working for a data analytics company, you are approached by Airbnb to develop a model for
predicting nightly prices of Airbnb listings based on state-of-the-art techniques from statistical
learning. The focus of your analytics team is on the properties in London, UK.
You are provided with a dataset containing detailed information on a number of existing
Airbnb listings in London. As part of the contract, you are asked to write a report according
to the instructions given below. The client will use a test set to evaluate your work.
4. Understanding the data
A training dataset and a test dataset are posted on Kaggle. The latter omits the price values.
Furthermore, Kaggle randomly splits the observations in the test set into validation (30%) and
test (70%) cases, but you will not know which ones are which.
When you make a submission during the competition, you get a score equal to the RMSE
computed on the validation cases. These scores are displayed on the “Public Leaderboard”
and provide an ongoing ranking of teams. You can use the scores of your submissions to help
you select the best predictive model.
You will select one of your submissions to be used as final at the end of the competition. Once
the competition is over, Kaggle will rank the teams’ final submissions based on the test cases
only, and those will be displayed on the “Private Leaderboard”. Your goal is to do as well as
possible on the Private Leaderboard at the end of the competition, so please be careful
not to overfit the validation cases in an attempt to improve your public ranking.
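
The scoring described above can be illustrated with a minimal sketch. It assumes made-up prices and predictions, and it simulates a random 30%/70% split; the split Kaggle actually uses is not known to participants, so the numbers here are purely illustrative. The point is that the public and private RMSE are computed on different cases and will generally differ.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up nightly prices (GBP) and noisy predictions, for illustration only.
rng = random.Random(0)
actual = [rng.uniform(30, 300) for _ in range(1000)]
predicted = [a + rng.gauss(0, 25) for a in actual]

# Simulated split: 30% of cases feed the public score, 70% the private score.
indices = list(range(len(actual)))
rng.shuffle(indices)
cut = int(0.3 * len(indices))
public_idx, private_idx = indices[:cut], indices[cut:]

public_rmse = rmse([actual[i] for i in public_idx],
                   [predicted[i] for i in public_idx])
private_rmse = rmse([actual[i] for i in private_idx],
                    [predicted[i] for i in private_idx])
print(f"public RMSE:  {public_rmse:.2f}")
print(f"private RMSE: {private_rmse:.2f}")
```

Because the public score rests on fewer cases, repeatedly tuning a model against it risks fitting noise in those cases rather than improving the private score.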
Data Description:
Each row corresponds to a separate Airbnb listing in London, UK. As a consequence
of using real data, a detailed description of all the variables is not available. However,
the names of the variables are self-explanatory. The first column in the data provides
an identifier for each listing and is included to comply with the Kaggle format. It should
not be used as a predictor in the analysis. The response variable, price, is the second
column in the training dataset. It gives the British pound sterling (GBP) price per night
for each listing. Variables security_deposit, cleaning_fee and extra_people are also
measured in GBP and correspond to surcharges. Variables latitude and longitude
specify the geographic location of each property. Several variables are Boolean, with
the word true recorded as “t” and false recorded as “f”. Some of the listings have
missing values under some of the variables. Note that, in many cases, a missing value
means that the corresponding characteristic does not apply to that particular Airbnb
listing. This is information, rather than lack of information, and you could make use of
this information in your analysis.
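
One possible way to handle the Boolean coding and the informative missing values is sketched below with pandas. The data frame is a tiny made-up sample, and host_is_superhost is a hypothetical column name used only for illustration. Filling a missing surcharge with zero assumes that "missing" means "no such charge", which should be checked against the data before it is relied on.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; "host_is_superhost" is a hypothetical Boolean column.
listings = pd.DataFrame({
    "price": [80.0, 120.0, 45.0],
    "security_deposit": [100.0, np.nan, np.nan],
    "cleaning_fee": [20.0, 35.0, np.nan],
    "host_is_superhost": ["t", "f", "t"],
})

# Recode the "t"/"f" strings as 1/0.
listings["host_is_superhost"] = listings["host_is_superhost"].map({"t": 1, "f": 0})

# Assumption: a missing surcharge means the charge does not apply.
# Keep an indicator of the missingness, then fill the amount with zero.
for col in ["security_deposit", "cleaning_fee"]:
    listings[col + "_missing"] = listings[col].isna().astype(int)
    listings[col] = listings[col].fillna(0.0)

print(listings)
```

Keeping the indicator column lets a model distinguish "no deposit required" from "a deposit of zero was recorded", should those turn out to behave differently.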
5. Written report
The purpose of the report is to describe, explain, and justify your solution to the client. You
can assume that the client is trained in business analytics but is not an expert in statistical
learning.
Requirements:
Your report must provide the validation (i.e. Public Leaderboard) scores for at least five
different sets of predictions, including your final model. You need to make a submission on
Kaggle to get each validation score. The five sets of predictions should all come from different
statistical learning methods.
In the methodology section you will discuss two of the five models in detail (the other three
do not need to be discussed). One of these two models will be your final model. Also, one of
these two models should be an interpretable model (e.g. OLS, subset selection, Lasso, Ridge,
Elastic net, a single regression tree), and the second one should be a more advanced model
(bagging, random forests, boosting, or a model that contains one of these three as a part).
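
As a rough illustration of comparing an interpretable model with a more advanced one, the sketch below fits a Lasso and a random forest with scikit-learn and reports a 5-fold cross-validated RMSE for each. The data are synthetic stand-ins, not the Airbnb data, and the use of local cross-validation alongside the Kaggle validation scores is a suggestion made here, not a requirement of the assignment.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data: 300 listings, 5 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (100 + 20 * X[:, 0] - 10 * X[:, 1] + 15 * np.abs(X[:, 2])
     + rng.normal(scale=5, size=300))

models = {
    # Interpretable: standardised features, penalty chosen by internal CV.
    "lasso": make_pipeline(StandardScaler(), LassoCV(cv=5)),
    # More advanced: an ensemble of regression trees.
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# scikit-learn reports the negative RMSE, so flip the sign.
cv_rmse = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    cv_rmse[name] = -scores.mean()
    print(f"{name}: CV RMSE = {cv_rmse[name]:.2f}")
```

A local estimate of this kind can be reported next to the Public Leaderboard score for each method, which makes the comparison less dependent on the particular validation cases.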
You will pay special attention to and report on the relationship between the location and the
price, both during the exploratory data analysis and during the model interpretation. As part
of feature engineering, you should create one new location-related variable by using the
existing variables and, if you wish, external information.
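
One example of a location-related variable is the distance from each listing to a central reference point, computed from latitude and longitude with the haversine formula. The sketch below is only one option among many; the reference coordinates (approximately Charing Cross) are an assumption introduced here as external information, and distance_to_centre_km is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# Assumed reference point for central London (approximate, near Charing Cross).
CENTRE_LAT, CENTRE_LON = 51.5074, -0.1278


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_centre_km(latitude, longitude):
    """Hypothetical new feature: distance from a listing to the assumed centre."""
    return haversine_km(latitude, longitude, CENTRE_LAT, CENTRE_LON)


# Example: a point near Heathrow Airport (approximate coordinates).
print(round(distance_to_centre_km(51.4700, -0.4543), 1))
```

The function can be applied row by row to the latitude and longitude columns, and the resulting variable can then be examined against price in the exploratory analysis.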
Suggested outline of the report:
1. Introduction: write a few paragraphs stating the business problem and summarising
your final solution and results. Use plain English and avoid technical language as much
as possible in this section (it should be for a wide audience).
2. Data processing and exploratory data analysis: provide key information about the data,
discuss potential issues, and highlight interesting facts that are useful for the rest of
your analysis.
3. Describe and justify your process of feature engineering.
4. Methodology: here you will focus on the two models as outlined above (your rationale
for choosing the models and why they make sense for the data, description of how
these models are fitted, interpretations of the models in the context of the business
problem at hand). This part is allowed to be more technical than the rest of the report.
5. Validation set results from Kaggle and comparison of the methods.
6. Final remarks (non-technical).
6. Kaggle Competition
The purpose of the Kaggle competition is to incorporate feedback by allowing you to compare
your performance with that of other groups. Participation in the competition is part of the
assessment, and you must make sure that your final submission is correct. Your ranking in the
competition will typically not directly affect your marks (apart from the bonus marks and the
Benchmark requirement, as explained below). However, we will assess whether your
participation represents a genuine effort to make good predictions and improve them (in
particular, you should make sure to beat the “Benchmark” score on the Public Leaderboard).
Real world relevance:
The ability to perform in a Kaggle competition is highly valued by employers. Some employers
go as far as to set up a Kaggle competition just for recruitment.
Bonus marks:
The five teams with the best performance on the Private Leaderboard will receive bonus marks
for the assignment (with the total Group Project score capped at 100). The best-performing
team will receive 10 bonus marks, the second team will get 8 marks, the third will get 6 marks,
and the fourth and fifth will each get 3 marks. Please note that your choice of the final model
has to be well justified in the report,
and the Kaggle predictions must come from your own analysis in Python. An examination of
the code will be conducted for verification purposes. Your code is required to reproduce the
Kaggle predictions included in the report.
