Data Visualisation and Analytics

Data Visualisation and Analytics Assignment 3
Department of Econometrics and Business Statistics, Monash University
Due Date: 24th October 2019 at 1PM
A Implementing kNN classification (10 Marks)
This part of the assignment involves kNN classification of a dataset of 140 bank customers and must be
completed by ALL students. Note that this assignment is based on simulated data and each student has
their own personalised dataset. You must enter your student ID number before downloading your unique
dataset. The data can be downloaded here.
In the dataset, the following variables were collected for each customer:
• Name: Customer name.
• Default: Whether the customer failed to pay back the loan (Default) or successfully paid it back (No Default).
• WeeklyIncome: Income per week.
• EmploymentDuration: Time spent in the current job.
• WeeklySpend: Average amount of money spent per week.
• Children: Number of children.
• Age: Customer's age.
• Sample: Whether the customer is in the training sample or the test sample.
The objective is to predict on the basis of Weekly Income, Employment Duration, Weekly Spend, Number of
Children and Age whether a customer will default. The training sample can be used for determining a rule
for prediction and the test sample for evaluation. You may assume that the costs of both types of incorrect
prediction are equal. All numerical variables have been standardised by subtracting the mean and dividing
by the standard deviation of the training sample. You do NOT need to standardise the data.
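The prediction rule described above can be sketched in Python as follows. This is a minimal illustration, not the assignment's required method or software: it assumes the standardised predictors for each customer are stored as a tuple of numbers and the labels as the strings "Default" / "No Default". Each test customer gets the majority label among its k nearest training customers (Euclidean distance). Because both types of incorrect prediction are assumed to cost the same, the test-sample misclassification rate is a natural evaluation criterion. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training points (Euclidean)."""
    # Sort training points by their distance to x, keeping each point's label
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    # most_common keeps first-seen order on equal counts, so a tied vote goes
    # to whichever tied label belongs to the nearest neighbour
    return votes.most_common(1)[0][0]

def misclassification_rate(train_X, train_y, test_X, test_y, k):
    """Share of test customers misclassified (suitable when both error costs are equal)."""
    wrong = sum(knn_predict(train_X, train_y, x, k) != y for x, y in zip(test_X, test_y))
    return wrong / len(test_y)

# Hypothetical toy data with two standardised features (the real data have five)
train_X = [(-1.0, -0.5), (-0.8, -1.0), (1.0, 0.9), (0.7, 1.2)]
train_y = ["No Default", "No Default", "Default", "Default"]
test_X = [(-0.9, -0.7), (0.8, 1.0)]
test_y = ["No Default", "Default"]
print(misclassification_rate(train_X, train_y, test_X, test_y, k=1))  # 0.0
```

With the real data, the same function would be run on the training and test samples for several values of k, keeping the value with the lowest misclassification rate.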

Once you have downloaded your data, complete the assignment by going to this Google Form and answering
all questions. You must be signed in to your Monash email account when submitting the form for this to
work. To help you prepare your answers, a PDF version of the form is available on Moodle.
B Analysis of classification methods (10 Marks)
The second part of the assignment is to be submitted as a hard copy. Students enrolled in the ETX2250 unit
code should submit into the mailbox of Joan Tan. Students enrolled in the ETF5922 unit code should submit
into the mailbox of Anastasios Panagiotelis. Both of these can be found on level 5, Building H of the Caulfield
Campus. A soft copy can be submitted via Moodle as a backup, but you still MUST submit a hard copy.
B.1 Loan Approval (For ETX2250 Students Only)
You are consulting for a bank that currently uses k-nearest neighbours with k = 1 to predict whether a
customer will default on a loan. The features used in this model are weekly spending (measured
in dollars) and duration in the current job (measured in years).
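The bank's two features are measured in different units (dollars per week and years). The Python sketch below shows the z-score transformation and how raw units distort the distances that kNN relies on. The weekly-spend mean of $92.20 and standard deviation of about $36.97 are not stated in the assignment. They are inferred from the dollar and standardised values quoted in Questions 2 to 4 ($92.20 maps to 0 and $129.17 maps to 1), so treat them as assumptions. The two example customers are hypothetical.

```python
import math

# ASSUMED training statistics for weekly spend, inferred from Questions 2-4
SPEND_MEAN = 92.20
SPEND_SD = 129.17 - 92.20  # about 36.97

def standardise(value, mean, sd):
    """z-score: how many training standard deviations a value lies from the training mean."""
    return (value - mean) / sd

for spend in (129.17, 101.44, 92.20):
    print(spend, round(standardise(spend, SPEND_MEAN, SPEND_SD), 2))  # 1.0, 0.25, 0.0

# Hypothetical raw (weekly spend in $, employment duration in years) values
a = (100.0, 2.0)
b = (110.0, 2.0)  # differs from a by $10 per week only
c = (100.0, 5.0)  # differs from a by 3 years in the job only

# On the raw scale the $10 gap counts as a larger distance than the 3-year gap,
# purely because dollars and years are different units
print(math.dist(a, b), math.dist(a, c))  # 10.0 3.0
```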
1. Explain why the data need to be standardised before carrying out kNN classification. (2 Marks)
2. Suppose a customer arrives who has been in their job for 5 years (standardised value 0.75) and has a weekly
spend of $129.17 (standardised value of 1). Using Figure 1, determine whether the bank predicts that
this customer defaults or does not default. (1 Mark)
[Figure 1: Training Data for Loan Approval. Plot with axes Weekly Spending (Standardised) and Employment Duration (Standardised); points are marked Default or No Default.]
Figure 1: Training data used by the bank to determine loan approvals. The features are standardised. The bank
uses k-nearest neighbours with k = 1 to predict default.
3. Suppose the same customer who has been in their job for 5 years (standardised value 0.75) plans to
reduce their weekly spend to $101.44 (standardised value of 0.25). Using Figure 1, determine whether
the bank predicts that this customer defaults or does not default. (1 Mark)
4. Suppose the same customer who has been in their job for 5 years plans to reduce their weekly spend to
$92.20 (standardised value of 0). Using Figure 1, determine whether the bank predicts that this customer
defaults or does not default. (1 Mark)
5. With respect to Questions 2 to 4, discuss one or more limitations of the bank’s method. (1 Mark)
6. How could you address the limitation(s) discussed in your answer to Question 5 while still using k-nearest
neighbour classification? (2 Marks)
7. How could linear discriminant analysis overcome the problem discussed in Question 5? (2 Marks)
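For comparison with kNN, the sketch below fits a two-class linear discriminant analysis (LDA) on two standardised features. It estimates the class means, uses the class proportions as priors, and pools one covariance matrix across both classes. A point is then assigned to the class with the larger linear discriminant score. With two classes the decision boundary is a straight line. Moving a customer along any straight path in feature space (for example, lowering weekly spend while employment duration stays fixed) can therefore change the prediction at most once. This is a hand-rolled plain-Python illustration with hypothetical function names and toy data, not the bank's method or any particular package's API.

```python
import math

def mean_vec(rows):
    """Mean of a list of two-feature points."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(2)]

def fit_lda(X, y):
    """Two-feature LDA: class means, class-proportion priors, inverse pooled covariance."""
    classes = sorted(set(y))
    groups = {c: [x for x, lab in zip(X, y) if lab == c] for c in classes}
    means = {c: mean_vec(rows) for c, rows in groups.items()}
    priors = {c: len(rows) / len(X) for c, rows in groups.items()}
    # Pooled within-class covariance with divisor n - (number of classes)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for c, rows in groups.items():
        m = means[c]
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(X) - len(classes)
    s = [[v / dof for v in row] for row in s]
    # Closed-form inverse of a 2x2 matrix
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    return means, priors, inv

def lda_predict(model, x):
    """Assign x to the class with the largest discriminant x'S^-1 m - m'S^-1 m / 2 + log(prior)."""
    means, priors, inv = model
    def score(c):
        m = means[c]
        w = [inv[0][0] * m[0] + inv[0][1] * m[1], inv[1][0] * m[0] + inv[1][1] * m[1]]
        return x[0] * w[0] + x[1] * w[1] - 0.5 * (m[0] * w[0] + m[1] * w[1]) + math.log(priors[c])
    return max(means, key=score)

# Hypothetical toy training data on two standardised features
X = [(1, -1), (2, -1), (1, 0), (2, 0), (-1, 1), (-2, 1), (-1, 0), (-2, 0)]
y = ["Default"] * 4 + ["No Default"] * 4
model = fit_lda(X, y)
# Lower the first feature step by step with the second held at 0
path = [lda_predict(model, (v, 0.0)) for v in (2, 1, 0.5, -0.5, -1, -2)]
print(path)
```

Along the path the predictions switch from Default to No Default exactly once. A 1-nearest-neighbour rule fitted to scattered training points carries no such guarantee.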
B.2 Multiclass classification (For ETF5922 Students only)
You are consulting for a client that would like to build a method for predicting brand choice in the
telecommunications industry. You have data on the following:
• Brand: Choice of telecommunications brand (Telstra, Optus or Vodafone).
• Income: Yearly income measured in dollars
• Age: Age measured in years
Four potential classification methods should be considered:
• k-nearest neighbours classification with k = 3
• k-nearest neighbours classification with k = 13
• Linear Discriminant Analysis (LDA)
• Quadratic Discriminant Analysis (QDA)
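With three brands, a majority vote among k neighbours can tie even when k is odd. For example, k = 3 can give one vote to each brand, and k = 13 can split 5-5-3. A kNN implementation therefore needs a tie-breaking rule. The Python sketch below assumes one such rule: prefer the tied brand held by the nearest neighbour. The function name `vote` is hypothetical, and some software breaks ties at random instead.

```python
from collections import Counter

def vote(neighbour_labels):
    """Majority vote over neighbour labels ordered nearest first.

    Assumed tie-break: among brands tied for the most votes,
    return the one held by the nearest neighbour.
    """
    counts = Counter(neighbour_labels)
    top = max(counts.values())
    tied = {label for label, n in counts.items() if n == top}
    # Scan from the nearest neighbour outwards; the first tied brand wins
    for label in neighbour_labels:
        if label in tied:
            return label

# k = 3: a three-way tie, resolved by the nearest neighbour
print(vote(["Optus", "Telstra", "Vodafone"]))  # Optus
# k = 13: a 5-5-3 split still ties Vodafone with Telstra
print(vote(["Vodafone"] + ["Telstra"] * 5 + ["Vodafone"] * 4 + ["Optus"] * 3))  # Vodafone
```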
Your task is to evaluate ALL of these methods and recommend one method to be used by the client. You
must describe:
1. The process and criteria used to evaluate the methods.
2. Any other considerations that are important in evaluating the methods.
3. Any limitations of the analysis.
Summarise your results in a report of no more than 1000 words (it will probably be shorter). Any
conclusions you make must be supported by evidence.
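One common way to compare the candidate methods on an equal footing is cross-validation of the misclassification rate. Each observation is predicted once by a model fitted without it, and the method with the lowest error is preferred. The Python sketch below assumes the features (Income, Age) are stored as numeric tuples and the labels as brand names. It cross-validates kNN with k = 3 and k = 13. LDA or QDA could be compared in the same way by passing a function with the same (train_X, train_y, test_X) signature. The helper names, the choice of 5 folds, the seed and the toy data are assumptions for illustration. A fuller analysis would also standardise Income and Age using only the training part of each fold.

```python
import math
import random
from collections import Counter

def knn_fit_predict(train_X, train_y, test_X, k):
    """kNN predictions for each test point: Euclidean distance, majority vote."""
    preds = []
    for x in test_X:
        # Indices of the k training points closest to x
        nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
        preds.append(Counter(train_y[i] for i in nearest).most_common(1)[0][0])
    return preds

def cv_error(fit_predict, X, y, n_folds=5, seed=0):
    """Cross-validated misclassification rate: every case is predicted once,
    by a model fitted without it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in fold])
        wrong += sum(p != y[i] for p, i in zip(preds, fold))
    return wrong / len(X)

# Hypothetical toy feature values: two well-separated groups of customers
X = [(i * 0.1, 0.0) for i in range(10)] + [(10 + i * 0.1, 0.0) for i in range(10)]
y = ["Telstra"] * 10 + ["Optus"] * 10
for k in (3, 13):
    print(k, cv_error(lambda a, b, c: knn_fit_predict(a, b, c, k), X, y))
```

Averaging over folds gives a more stable comparison than a single training/test split, although the estimate still depends on the random fold assignment.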
