COMP529/336: COURSEWORK ASSIGNMENT

COMP529/336: COURSEWORK ASSIGNMENT #1 (BATCH ANALYTICS)
INTRODUCTION
The assignment aims to test your understanding of batch analytics, with a focus on your ability to use Hadoop to solve Big Data Analytic problems. More specifically, it aims to partially assess the following learning outcome for COMP529: “understanding of the middleware that can be used to enable algorithms to scale up to analysis of large datasets”.
ASSESSMENT
The report will be assessed according to the following criteria:
Criterion                                                                      Percentage
Clarity of presentation (including succinctness) of main report               20%
Quality of Java code (including assessment of how easy it is to understand)   40%
Quality of analysis performed                                                  40%
SUBMISSION
Please submit your coursework online using the COMP529/336 page on VITAL by 12 noon on Wednesday 6th November 2019. Standard lateness penalties will apply to any work handed in after this time. The report and the Java program must be written by yourself using your own words.
PROJECT BACKGROUND
Now more than ever, local governments are working to develop smart cities and to create more sustainable urban environments that improve quality of life. Part of their plan is often to introduce a new transportation program known as a bike share program. The aim of such a program is to ease the city's traffic congestion and to reduce its air pollution. Today, bike sharing is very popular, since users can easily rent a bike from any station and return it at a station near their final destination. Approximately 500,000 bicycles are available for sharing around the world, across more than 500 different sharing programs. For this coursework, your task is to analyse the dataset of one such program, Capital Bikeshare (http://capitalbikeshare.com/system-data), which serves Washington, D.C., in the USA.
The aim of this assignment is for you to analyse the Capital Bikeshare rental program's dataset and to identify the most popular rental season (spring, summer, fall, or winter) across the year.
Dataset
The bikeshare dataset contains 1000 bike rental records from between 2011 and 2013. The data is stored in a file called BikeShareData, which is available on VITAL in the COMP529/336 Assignment/data folder. The data fields are described in Table 1.

Table 1: data record description
Field       Description
dteday      date
seasons     springer, summer, fall, winter
yr          year (2011)
mnth        month (1 to 12)
hr          hour (0 to 23)
weekday     day of the week
weathersit  1: Clear, Few clouds, Partly cloudy
            2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
            3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
            4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
casual      count of casual users
stations    chinatown, capitollhill, lincoln, logan, southwest, oxford, abraham, alexandria, …etc.
Your tasks:
1) Set up a Hadoop framework and justify your choice of deployment mode (e.g., standalone).
2) Use ONLY the seasons and stations data fields; the other data fields can be deleted or ignored.
3) Write a Java program for a MapReduce job that counts the number of occurrences of each season in the file (e.g., spring = 3, summer = 10, winter = 30).
4) Use the MapReduce job to calculate the number of times that each bicycle station (e.g., chinatown) has been used in the file.
5) Use the MapReduce job to show your output in alphabetical order (a-z).
6) Comment on how this analysis could be extended to consider larger datasets (e.g., 10 years of bicycle rentals in a 1-terabyte dataset).
7) Briefly describe how you could use your Hadoop MapReduce skills to solve another problem (choose your own case study), and include a MapReduce data flow diagram.
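The counting pattern behind tasks 3 to 5 can be illustrated without a Hadoop installation. The following is only a minimal plain-Java sketch of the map, shuffle, and reduce steps, not a Hadoop job and not a model answer. It assumes, hypothetically, that each record has already been reduced to a comma-separated line of the form season,station; the class name FieldCountSketch and the methods mapPhase and reducePhase are illustrative names, not part of any Hadoop API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FieldCountSketch {

    // Map step: emit one (key, 1) pair per record, keyed by the chosen field.
    // ASSUMPTION: records are comma-separated lines (hypothetical layout).
    static List<Map.Entry<String, Integer>> mapPhase(List<String> records, int fieldIndex) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String record : records) {
            String[] fields = record.split(",");
            if (fieldIndex < fields.length) {
                String key = fields[fieldIndex].trim().toLowerCase();
                if (!key.isEmpty()) {
                    pairs.add(new AbstractMap.SimpleEntry<String, Integer>(key, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle and reduce steps: group the pairs by key and sum their values.
    // A TreeMap iterates its keys in sorted order, which gives a-z output here.
    static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        List<String> records = Arrays.asList(
            "summer,chinatown",
            "winter,lincoln",
            "summer,logan",
            "fall,chinatown");
        System.out.println(reducePhase(mapPhase(records, 0))); // {fall=1, summer=2, winter=1}
        System.out.println(reducePhase(mapPhase(records, 1))); // {chinatown=2, lincoln=1, logan=1}
    }
}
```

In a real Hadoop job the framework performs the grouping between the map and reduce steps and sorts the keys before they reach each reducer, so the TreeMap here only stands in for that behaviour.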
Your output report:
The output from this coursework is a brief report of no more than two A4 pages (excluding any appendices), in 12-point font with margins of no less than 2 cm. While the requirement is to produce no more than 2 pages, it is anticipated that the challenge will be to fit everything into those 2 pages: it is unlikely that a report of much less than 2 pages will result in a high mark. The report should have sections that describe:
1) Middleware configuration: How you configured the Hadoop middleware, with a screen print (including a description of your Hadoop cluster and your rationale for this choice).
2) Data Analytic Design: How you designed the MapReduce job (including the rationale for your design; briefly state or draw a MapReduce data flow model for your work).
3) Results: The results obtained (excluding any discussion).
4) Discussion of Results.
5) Conclusions and Recommendations (including discussion of how you would perform the task if it were to be undertaken at larger scale).
6) Listing of the Java program for your MapReduce job(s) in the appendix.
