Text Mining Twitter Data in R

Project 1 (20 Points Total)
Text Mining Twitter Data in R (using “tidytext”)

This is a two-week project spanning Weeks 2 and 3.
All parts are due at the end of Week 3.

Purpose
In this project you will use Twitter data with the tidytext package in R to explore and analyze tweets. The goal is to dig deep into Twitter data to learn more about a topic or event.

Assignment Due Date and Time
Parts 1 and 2 are both due in Week 3, Sunday at 11:59 p.m. ET.
You will need to install R and obtain Twitter data in order to complete this project.

Part 1 (10 Points)

Twitter represents a fundamentally new instrument for making social measurements. Millions of people voluntarily express opinions on almost any topic. This data source is incredibly valuable for both research and business.

For example, here are some interesting applications of Twitter data analysis:

Twitter Study Tracks When We Are :) (twitter data shows biological rhythms)
https://www.nytimes.com/2011/09/30/science/30twitter.html

Twitter mood predicts the stock market
https://arxiv.org/pdf/1010.3003&embedded=true

Thunderstorm Fest (a map of locations where thunder was mentioned in the context of a storm in summer 2012)
https://cliffmass.blogspot.com/2012/07/thunderstorm-fest.html

Researchers from Northeastern University and Harvard University study the characteristics and dynamics of Twitter to learn how it can be used to analyze moods at a national scale.
http://www.ccs.neu.edu/home/amislove/twittermood/

Analyzing Tweets with R and tidytext (Trump and Obama tweet analysis)
https://medium.com/the-artificial-impostor/analyzing-tweets-with-r-92ff2ef990c6

Your Task

Come up with your own Twitter analysis idea. Find something to compare on a theme of your choice. Decide what data you want to use and what you are looking to find in the data. You can use your own data or data from strangers. You can use a generic theme or a specific one. You must decide on something you are interested in learning about. See the examples above for some ideas.

Write a 1-2 paragraph description of the analysis you will perform. Title this section, “Description.”

After you have performed the analysis in Part 2 (below), provide a 2-3 paragraph description of your conclusion and results. Title this section “Conclusion.” In this section, tell me what you discovered from the data. What did the data tell you? Was it what you expected or predicted? Did you learn anything interesting? What are your concluding thoughts on this analysis?

Save both sections together in a document named “analysis.doc.”

Part 2 (10 Points)

Perform the analysis in R using tidytext. Your Twitter data analysis must include the following steps, all of which are outlined in Chapter 7:

• Word Frequency Analysis
• Comparison of Word Usage
• Changes in Word Use Analysis
• Favorites and Retweets Analysis
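
As a starting point, the first two steps (word frequency and comparison of word usage) might be sketched as follows. This is a minimal sketch under stated assumptions, not the textbook's exact code: the `tweets` table is made-up toy data, and the column names `person` and `text` are hypothetical, so substitute whatever your own Twitter data uses. The comparison uses a smoothed log ratio (adding 1 to each count) so that words used by only one account do not produce a division by zero.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy data standing in for real tweets (hypothetical accounts "A" and "B")
tweets <- tibble(
  person = c("A", "A", "B", "B"),
  text = c("Storm clouds and thunder tonight",
           "Thunder again, more thunder",
           "Sunny morning, coffee time",
           "Coffee and more coffee")
)

# One row per word, lowercased, with common stop words removed
tidy_tweets <- tweets %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Word frequency: count of each word and its share of that person's words
frequency <- tidy_tweets %>%
  count(person, word, sort = TRUE) %>%
  group_by(person) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

# Comparison of word usage: smoothed log ratio of A's usage to B's usage
word_ratios <- tidy_tweets %>%
  count(word, person) %>%
  pivot_wider(names_from = person, values_from = n, values_fill = 0) %>%
  mutate(A = (A + 1) / (sum(A) + 1),
         B = (B + 1) / (sum(B) + 1),
         logratio = log(A / B)) %>%
  arrange(desc(logratio))

print(frequency)
print(word_ratios)
```

Words with a large positive `logratio` are more characteristic of account A, and words with a large negative value are more characteristic of account B. With real data you would typically filter out rare words before computing the ratio.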

Chapter 7 of Textbook 2 will guide you through the steps. Save your R source code for the above steps.
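
For the last two required steps (changes in word use, and favorites and retweets), a simplified sketch is shown below. It is an illustration only: the toy `tweets` table and the column names `id`, `person`, `timestamp`, and `retweets` are assumptions, and it only tabulates word counts per month rather than fitting a model of change over time. The same pattern would apply to a favorites column if your data has one.

```r
library(dplyr)
library(tidytext)
library(lubridate)

# Toy data standing in for real tweets (hypothetical columns and values)
tweets <- tibble(
  id = 1:4,
  person = c("A", "A", "B", "B"),
  timestamp = ymd(c("2019-01-05", "2019-02-10", "2019-01-20", "2019-02-15")),
  retweets = c(10, 2, 5, 1),
  text = c("thunder storm tonight",
           "sunny skies",
           "thunder thunder everywhere",
           "coffee break")
)

# One row per word; the other columns are carried along with each word
tidy_tweets <- tweets %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Changes in word use: word counts per person per calendar month
words_by_time <- tidy_tweets %>%
  mutate(time_floor = floor_date(timestamp, unit = "1 month")) %>%
  count(time_floor, person, word) %>%
  group_by(person, time_floor) %>%
  mutate(time_total = sum(n)) %>%
  ungroup()

# Retweets: count each tweet once per word, then take the median per word
word_by_rts <- tidy_tweets %>%
  distinct(id, person, word, retweets) %>%
  group_by(person, word) %>%
  summarise(median_rts = median(retweets), uses = n(), .groups = "drop") %>%
  arrange(desc(median_rts))

print(words_by_time)
print(word_by_rts)
```

The `distinct()` call keeps a word that is repeated inside one tweet from counting that tweet's retweets more than once.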

Submission Instructions

Upload your Part 1 document (“analysis.doc”) and your Part 2 R source code files to the assignment submission area.

Grading Criteria
The assignment is worth 20 points total, broken out as follows:

Part 1 Analysis (10 points)
Novice (0-5 points): An inappropriate topic was selected that didn’t make any sense, didn’t require any analysis, or was not capable of being analyzed with the dataset.
Needs Improvement (6-7 points): A good level of analysis was reported; however, there were areas where significant details and observations were missed.
Proficient (8 points): The responses to all questions were reasonably correct; however, some of the reasoning contained unrealistic analysis or results.
Excellent (10 points): An appropriate topic was selected. The responses to the questions adequately analyzed and described the data as observed in the analysis. The data showed interesting results that appeared to be appropriate given the analysis performed.

Part 2 Programming (10 points)
Novice (0-5 points): No working source code was created to address the proposed problem to be solved.
Needs Improvement (6-7 points): The source code that was created did not properly address the content of the questions, although some of it may have worked to produce the correct results.
Proficient (8 points): A majority of the answers were implemented properly, and the source code contained appropriate but not efficient solutions to address most of the questions.
Excellent (10 points): All questions were implemented using efficient and correct R source code syntax. The functions were written properly, and they addressed the questions and provided an adequate response in all cases. The correct libraries were used.

Total
Novice: 0-10 points, 0-60% (F - D)
Needs Improvement: 12-14 points, 70% (C)
Proficient: 16 points, 80% (B)
Excellent: 20 points, 100% (A)
