Spark study, day 02: reading a file in Scala and counting word frequencies

1. Install the JDK and Scala locally

2. Read a local file:

scala> import scala.io.Source
import scala.io.Source

scala> val lines=Source.fromFile("F:/ziyuan_badou/file.txt").getLines().toList
lines: List[String] = List("With the development of civilization, it is the chil
dren‘s duty to study in school since they were small. As the young kids, it is t
heir nature to hang out for fun. ", "", "While for them, most of the time have b
een limited in the class. So they feel frustrated and don‘t have much passion to
 study. It is of great importance to develop ", "", "interest. The first thing i
s to broaden vision. The students can read travel books or watch tourist show, f
or anyone who cannot resist the charm of beautiful scenery ", "", and delicious
food. The second thing is taking the right attitude to exams. Never giving too m
uch pressure on getting high marks. The only thing we should do is to enjoy gain
ing knowledge.)
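
The one-liner above leaves the file handle open and relies on the platform's default character encoding. As a minimal sketch (the helper name `readLines` is hypothetical, and UTF-8 is an assumption about the file's encoding, which the original does not state), the same read can be written so the source is closed once the lines are in memory:

```scala
import scala.io.{Codec, Source}

// Hypothetical helper: read every line of a text file and close the file afterwards.
// UTF-8 is an assumption; pass a different Codec if the file uses another encoding.
def readLines(path: String): List[String] = {
  val source = Source.fromFile(path)(Codec.UTF8)
  try source.getLines().toList // toList forces the lazy iterator before close()
  finally source.close()
}
```

Calling `readLines("F:/ziyuan_badou/file.txt")` should give the same `List[String]` as the REPL session, provided the file really is UTF-8.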

3. Top-N word frequency count

scala> lines.map(x=>x.split(" ")).flatten.map(x=>(x,1)).groupBy(x=>x._1).map(x=>
(x._1,x._2.map(x=>x._2).sum)).toList.sortBy(x=>x._2).reverse
res0: List[(String, Int)] = List((the,7), (to,7), (is,6), (of,4), (The,4), (thin
g,3), (for,3), ("",3), (and,2), (much,2), (they,2), (it,2), (have,2), (in,2), (o
nly,1), (right,1), (show,,1), (exams.,1), (high,1), (since,1), (study,1), (study
.,1), (great,1), (we,1), (interest.,1), (develop,1), (As,1), (passion,1), (were,
1), (time,1), (them,,1), (children‘s,1), (development,1), (knowledge.,1), (It,1)
, (anyone,1), (Never,1), (nature,1), (enjoy,1), (first,1), (taking,1), (frustrat
ed,1), (books,1), (delicious,1), (So,1), (their,1), (resist,1), (should,1), (sma
ll.,1), (gaining,1), (While,1), (who,1), (on,1), (can,1), (been,1), (second,1),
(travel,1), (most,1), (scenery,1), (getting,1), (attitude,1), (cannot,1), (civil
ization,,1), (broaden,1), (out,1), (food.,1), (don‘t,1), (importance,1), (kid...
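
The result above counts the empty string three times (from the blank lines), keeps punctuation attached to words (`show,`, `exams.`), and treats `The` and `the` as different words. The following is one possible cleanup, offered as a sketch rather than the author's method: the helper name `topN` and the choices to lowercase and to strip leading and trailing ASCII punctuation are assumptions.

```scala
// Hypothetical helper: count words across all lines and return the n most frequent.
def topN(lines: Seq[String], n: Int): List[(String, Int)] =
  lines
    .flatMap(_.split("\\s+"))                               // split each line on whitespace
    .map(_.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "").toLowerCase) // trim ASCII punctuation, normalize case
    .filter(_.nonEmpty)                                     // drop empty tokens produced by blank lines
    .groupBy(identity)                                      // word -> all occurrences of that word
    .map { case (word, occurrences) => (word, occurrences.size) }
    .toList
    .sortBy { case (word, count) => (-count, word) }        // highest count first, ties broken alphabetically
    .take(n)
```

For example, `topN(lines, 10)` would return the ten most frequent words. Sorting by the negated count avoids the separate `reverse` step, and the alphabetical tie-break makes the order deterministic.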

Original article: https://www.cnblogs.com/students/p/10992149.html

Date: 2024-10-11 18:38:10
