CSE210 Advanced Object Oriented Programming

Coursework 2019

Release date: 11th, Mar, 2019
Deadline: 12:00PM, 23rd, Apr, 2019

1. Description
The objective of the coursework is to develop a practical application for data processing, analysis and content search based on the object-oriented principles you learn from the CSE210 module.

1.1. Dataset
The coursework uses a subset of the dataset (file name: twitterDataset.xlsx) published at http://followthehashtag.com/. It contains 204,820 short messages (tweets) collected from Twitter (https://twitter.com) between 14 and 16 April 2016, from various locations in the United States. The dataset covers many different topics, e.g., weather, leisure, sports, etc.

The dataset is stored in an Excel file and needs to be processed by your Java program. Each record (row) contains information about a tweet. The columns are explained as follows.
Tweet Id – the ID of a tweet
Date – the date on which a tweet is published
Hour – the time when a tweet is published
User Name – name of a user
Nickname – nickname of a user
Tweet content – the actual message
Favs – number of users who like the tweet
RTs – number of users who re-tweet the tweet (republish)
Latitude – latitude of the location where a tweet is published
Longitude – longitude of the location where a tweet is published
Followers – number of followers of a user (the values are the same for the same user)

1.2. Data Processing and Analysis
To process data stored in an Excel file, you can use the Apache POI library (the Java API for Microsoft Documents). The library can be downloaded from:
https://poi.apache.org/download.html#POI-3.17

You need to download it and put it into your classpath in order to use it. A tutorial on how to use the library is at:
http://poi.apache.org/spreadsheet/quick-guide.html#CellContents

You need to write code to extract data from the dataset and then perform some simple data analysis, e.g., find who the top users are. The detailed requirements are explained in Section 2.

1.3. Content Search
Effective tweet searching is an important functionality for users, especially when millions of tweets are published every day. The search process takes a user’s input keyword as a query and then retrieves relevant tweets. Two simple searching strategies are considered.

1. Text matching – The idea is to find tweets by matching the user’s keyword (only one word) against all the tweets. A tweet is regarded as relevant if the keyword appears in it. This is a basic feature of the application.
2. Full text search – This is considered an advanced feature of the application. To simplify the problem, you do NOT need to study the techniques in the field of Information Retrieval. According to the research reported in [1], the full text search functionality on Twitter is based on a modified version of the Apache Lucene API. However, the actual full text search on Twitter is extremely complicated. This coursework only attempts to build a much simpler full text search engine based on the original Lucene API, for only a limited amount of Twitter data. Lucene can be downloaded at: https://lucene.apache.org/core/. A tutorial is available here: https://lucene.apache.org/core/7_7_1/demo/overview-summary.html.
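
As an illustration of the text matching strategy (strategy 1 above), the following minimal Java sketch scans a list of tweet texts for a single keyword. The class name KeywordSearch, the method name search, and the choice to ignore letter case are assumptions made for this example; they are not prescribed by the coursework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: keyword search over tweet texts by plain string matching. */
class KeywordSearch {

    /**
     * Returns the tweets whose text contains the keyword.
     * Assumption: matching ignores letter case and works on substrings.
     */
    static List<String> search(List<String> tweets, String keyword) {
        List<String> hits = new ArrayList<>();
        if (keyword == null || keyword.trim().isEmpty()) {
            return hits; // an empty query matches nothing
        }
        String needle = keyword.trim().toLowerCase(Locale.ROOT);
        for (String tweet : tweets) {
            // Skip missing texts instead of failing on them (robustness).
            if (tweet != null && tweet.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(tweet);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "Sunny weather in Austin",
                "Game night!",
                "WEATHER alert for Ohio");
        for (String hit : search(tweets, "weather")) {
            System.out.println(hit);
        }
    }
}
```

Note that plain substring matching also reports a tweet containing "weathered" for the query "weather"; whether to match on whole words only is a design decision left to you.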

1.4. Objects in the Application
It is important to follow good object-oriented programming principles and practice, e.g., object modelling, functionality decomposition, code reuse. At the same time, you need to ensure correctness and take robustness and efficiency into account.

You need to design a number of information objects, together with their methods, to represent the entities in this application. You also need to design a number of helper classes in order to realise the functionalities. A separate class for testing purposes is also needed. Below is an example set of objects for this coursework; however, you can have your own design following good programming practice and style.
Tweet – represents a tweet message;
User – represents a Twitter user;
LinkedList – represents a list of (sorted) objects.
Utility and helper classes – as needed for data processing, analysis, and content search.
Test class – for testing the implemented functionalities.
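
To make the "list of (sorted) objects" idea concrete, here is one possible sketch of a sorted linked list in Java. The class name SortedLinkedList, its methods, and the use of a Comparator to define the order are assumptions for illustration only; your own design may differ.

```java
import java.util.Comparator;

/** Hypothetical sorted singly linked list; elements stay ordered by a comparator. */
class SortedLinkedList<T> {

    /** One list cell holding a value and a link to the next cell. */
    private static final class Node<T> {
        final T value;
        Node<T> next;

        Node(T value) {
            this.value = value;
        }
    }

    private final Comparator<? super T> order;
    private Node<T> head;
    private int size;

    SortedLinkedList(Comparator<? super T> order) {
        this.order = order;
    }

    /** Inserts the value before the first element that sorts after it. */
    void add(T value) {
        Node<T> node = new Node<>(value);
        if (head == null || order.compare(value, head.value) < 0) {
            node.next = head;
            head = node;
        } else {
            Node<T> cur = head;
            // Walk past every element that sorts before or equal to the new value.
            while (cur.next != null && order.compare(cur.next.value, value) <= 0) {
                cur = cur.next;
            }
            node.next = cur.next;
            cur.next = node;
        }
        size++;
    }

    /** Returns the element at the given position in sorted order. */
    T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Node<T> cur = head;
        for (int i = 0; i < index; i++) {
            cur = cur.next;
        }
        return cur.value;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        // Largest follower count first.
        SortedLinkedList<Integer> followers =
                new SortedLinkedList<>(Comparator.<Integer>reverseOrder());
        followers.add(120);
        followers.add(98000);
        followers.add(4500);
        for (int i = 0; i < followers.size(); i++) {
            System.out.println(followers.get(i));
        }
    }
}
```

Each insertion walks the list, so adding n elements costs O(n^2) comparisons in the worst case. With roughly 200,000 tweets, it may be worth comparing this against sorting once after loading, given that efficiency is assessed.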

2. Requirements and Tasks
You need to complete the following tasks.
T1: develop information objects and helper classes (with their methods) needed for this application with good coding style, and design an easy-to-use interface for testing (command window is enough).
T2: efficiency and robustness (assessed during program testing).
T3: complete the following tasks which will be checked and marked during the demonstration sessions.
Note: FAILING TO PRESENT AT THE DEMONSTRATION SESSION WILL RESULT IN A MARK OF ZERO FOR THE DEMONSTRATION PART (SEE MARKING SCHEME).
T3-1: load data in the program.
T3-2: print top 10 tweets. Tweets should be ranked based on the sum of: (1) number of users who like the tweet (Favs), and (2) number of users who re-tweet the tweet (RTs). Relevant information, such as when, and by whom it is published, should also be shown.
T3-3: print top 10 users. Users should be ranked based on the number of followers that they have.
T3-4: tweet search based on string matching.
T3-5: tweet search based on full text searching using Lucene.
T4: use Javadoc comments to document your code and generate the HTML Javadoc.
T5: demonstrate your work in the same order as specified in the lab group file (note that the time for each demonstration varies and you may need to wait longer than expected). You might be asked to modify your code during the demonstration (to detect plagiarism). The ability to modify code is measured as a percentage, and the marks for the demonstration will be scaled accordingly. Being able to modify the code as required earns 100%; being unable to modify anything earns 0% and will be reported as potential plagiarism. Other cases will be considered accordingly. Feedback will be provided during the demonstration if possible.
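
The ranking rules in T3-2 and T3-3 can be sketched as follows. This is only a minimal illustration under stated assumptions: the class names Rankings and Tweet, their fields and methods, and the decision to identify a user by nickname are all hypothetical and not part of the coursework specification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper implementing the T3-2 and T3-3 ranking rules. */
class Rankings {

    /** Top n tweets by Favs + RTs, highest first; ties keep input order. */
    static List<Tweet> topTweets(List<Tweet> tweets, int n) {
        List<Tweet> sorted = new ArrayList<>(tweets);
        sorted.sort(Comparator.comparingInt(Tweet::score).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /**
     * Top n users by follower count, highest first.
     * Assumption: a user is identified by nickname, so each user is counted once.
     */
    static List<Map.Entry<String, Integer>> topUsers(List<Tweet> tweets, int n) {
        Map<String, Integer> followersByUser = new LinkedHashMap<>();
        for (Tweet t : tweets) {
            // The dataset repeats the same follower count for a user; max is a safe merge.
            followersByUser.merge(t.nickname, t.followers, Integer::max);
        }
        List<Map.Entry<String, Integer>> users = new ArrayList<>(followersByUser.entrySet());
        users.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return new ArrayList<>(users.subList(0, Math.min(n, users.size())));
    }

    public static void main(String[] args) {
        List<Tweet> tweets = Arrays.asList(
                new Tweet("ann", "2016-04-14", "Rain again", 1, 2, 50),
                new Tweet("bob", "2016-04-15", "Great game", 5, 5, 10),
                new Tweet("ann", "2016-04-16", "Sunny now", 0, 0, 50));
        for (Tweet t : topTweets(tweets, 10)) {
            System.out.println(t.score() + " " + t.date + " " + t.nickname + ": " + t.content);
        }
        for (Map.Entry<String, Integer> user : topUsers(tweets, 10)) {
            System.out.println(user.getKey() + " " + user.getValue());
        }
    }
}

/** Hypothetical value object holding only the columns needed for ranking. */
class Tweet {
    final String nickname;
    final String date;
    final String content;
    final int favs;
    final int rts;
    final int followers;

    Tweet(String nickname, String date, String content, int favs, int rts, int followers) {
        this.nickname = nickname;
        this.date = date;
        this.content = content;
        this.favs = favs;
        this.rts = rts;
        this.followers = followers;
    }

    /** Popularity score from T3-2: number of Favs plus number of RTs. */
    int score() {
        return favs + rts;
    }
}
```

Sorting a copy leaves the loaded data untouched, and grouping by user before ranking prevents a prolific user from filling several of the top 10 slots.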

3. Deliverables
You should deliver ONE FILE (via ICE) according to the following description.
All source code, compiled classes and any other supplementary files regarded as necessary.
HTML documentation. This can be generated by using either the javadoc utility or the plug-in in your IDE.
A separate README file explaining how to run your code, e.g., path for the dataset, how to install external libraries, and how to test all the functionalities.
DO NOT include the external APIs and the original dataset in your submission as the size will exceed the limit on ICE.
Submit all the source code, compiled classes, documentation and any supplementary files regarded as necessary, in ONE ZIPPED FILE.

After demonstrating your application, you will be asked to submit a test sheet printed on paper.

Note: the formal procedure for submitting coursework at XJTLU is strictly followed.

4. References
[1] Busch, M., Gade, K., Larson, B., Lok, P., Luckenbill, S., & Lin, J. (2012, April). Earlybird: Real-time search at Twitter. In 2012 IEEE 28th International Conference on Data Engineering (pp. 1360-1369). IEEE.


Original source: https://www.cnblogs.com/helpyourjava/p/10595463.html
