Data operation log on a Spark+Hadoop+Hive cluster

[rc@vq18ptkh01 ~]$ hadoop fs -ls /
drwxr-xr-x+  - jc_rc      supergroup                 0 2016-11-03 11:46 /dt
[rc@vq18ptkh01 ~]$ hadoop fs -copyFromLocal wifi_phone_list_1030.csv /dt
[rc@vq18ptkh01 ~]$ hadoop fs -copyFromLocal wifi_phone_list_1031.csv /dt
[rc@vq18ptkh01 ~]$ hadoop fs -copyFromLocal wifi_phone_list_1101.csv /dt

[rc@vq18ptkh01 ~]$ hadoop fs -ls /dt
16/11/03 11:53:16 INFO hdfs.PeerCache: SocketCache disabled.
Found 3 items
-rw-r--r--+  3 jc_rc supergroup    1548749 2016-11-03 11:48 /dt/wifi_phone_list_1030.csv
-rw-r--r--+  3 jc_rc supergroup    1262964 2016-11-03 11:52 /dt/wifi_phone_list_1031.csv
-rw-r--r--+  3 jc_rc supergroup     979619 2016-11-03 11:52 /dt/wifi_phone_list_1101.csv

[rc@vq18ptkh01 ~]$ beeline
Connecting to jdbc:hive2://1.8.15.1:24002,10.78.152.24:24002,1.8.15.2:24002,1.8.12.42:24002,1.8.15.62:24002/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;sasl.qop=auth-conf;auth=KERBEROS;principal=hive/hadoop.hadoop.com@HADOOP.COM
Debug is  true storeKey false useTicketCache true useKeyTab false doNotPrompt false ticketCache is null isInitiator true KeyTab is null refreshKrb5Config is false principal is null tryFirstPass is false useFirstPass is false storePass is false clearPass is false
Acquire TGT from Cache
Principal is jc_rc@HADOOP.COM
Commit Succeeded 

Connected to: Apache Hive (version 1.3.0)
Driver: Hive JDBC (version 1.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.3.0 by Apache Hive
0: jdbc:hive2://1.8.15.2:21066/> use r_hive_db;
No rows affected (0.547 seconds)

0: jdbc:hive2://1.8.15.2:21066/> create table tmp_wifi1030(imisdn string,starttime string,endtime string) row format delimited fields terminated by ',' stored as textfile;
0: jdbc:hive2://1.8.15.2:21066/> show tables;
+---------------+--+
|   tab_name    |
+---------------+--+
| tmp_wifi1030  |
+---------------+--+
1 row selected (0.401 seconds)

[rc@vq18ptkh01 ~]$ wc -l wifi_phone_list_1030.csv
25390 wifi_phone_list_1030.csv
0: jdbc:hive2://1.8.15.2:21066/> load data inpath 'hdfs:/dt/wifi_phone_list_1030.csv' into table tmp_wifi1030;
0: jdbc:hive2://1.8.15.2:21066/> select * from tmp_wifi1030;
+----------------------+--------------------------+--------------------------+--+
| tmp_wifi1030.imisdn  |  tmp_wifi1030.starttime  |   tmp_wifi1030.endtime   |
+----------------------+--------------------------+--------------------------+--+
| 18806503523          | 2016-10-30 23:58:56.000  | 2016-10-31 00:01:07.000  |
| 15700125216          | 2016-10-30 23:58:57.000  | 2016-10-31 00:01:49.000  |
| ...                  | ...                      | ...                      |
+----------------------+--------------------------+--------------------------+--+
25,390 rows selected (5.649 seconds)
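The file loaded above is a plain comma-delimited text file whose three columns match the table definition (imisdn, starttime, endtime), so `wc -l` on the local file should agree with `count(*)` in Hive after the load. A minimal Python sketch of the same parse (the sample rows are illustrative, not taken from the real file):

```python
import csv
import io

# Two rows in the same shape as wifi_phone_list_1030.csv
# (values are made up for illustration).
sample = (
    "18806503523,2016-10-30 23:58:56.000,2016-10-31 00:01:07.000\n"
    "15700125216,2016-10-30 23:58:57.000,2016-10-31 00:01:49.000\n"
)

rows = []
for imisdn, starttime, endtime in csv.reader(io.StringIO(sample)):
    rows.append({"imisdn": imisdn, "starttime": starttime, "endtime": endtime})

# The table has no header row, so every line is a record:
# the local line count and Hive's count(*) should be equal.
print(len(rows))
```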

0: jdbc:hive2://1.8.15.2:21066/> select count(*) from tmp_wifi1030;
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1475071482566_2471703
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 19416140 for jc_rc)
INFO  : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 05 6a 63 5f 72 63 05 6a 63 5f 72 63 21 68 69 76 65 2f 68 61 64 6f 6f 70 2e 68 61 64 6f 6f 70 2e 63 6f 6d 40 48 41 44 4f 4f 50 2e 43 4f 4d 8a 01 58 28 57 df 96 8a 01 58 4c 64 63 96 8d 0d 65 ff 8e 03 97
INFO  : The url to track the job: https://pc-z1:26001/proxy/application_1475071482566_2471703/
INFO  : Starting Job = job_1475071482566_2471703, Tracking URL = https://pc-z1:26001/proxy/application_1475071482566_2471703/
INFO  : Kill Command = /opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/FusionInsight-Hive-1.3.0/hive-1.3.0/bin/..//../hadoop/bin/hadoop job  -kill job_1475071482566_2471703
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2016-11-03 12:04:58,351 Stage-1 map = 0%,  reduce = 0%
INFO  : 2016-11-03 12:05:04,702 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.72 sec
INFO  : 2016-11-03 12:05:12,096 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.86 sec
INFO  : MapReduce Total cumulative CPU time: 4 seconds 860 msec
INFO  : Ended Job = job_1475071482566_2471703
+--------+--+
|  _c0   |
+--------+--+
| 25390  |
+--------+--+
1 row selected (25.595 seconds)

0: jdbc:hive2://1.8.15.62:21066/> select * from default.d_s1mme limit 10;
(very wide result table omitted: d_s1mme exposes columns such as length, city, interface, xdr_id, rat, imsi, imei, msisdn, procedure start/end times, MME identifiers, user/MME IP addresses and ports, tac, cell_id, plus a p_hour partition value like 2016101714; the header row was garbled in the original capture)
10 rows selected (0.6 seconds)
0: jdbc:hive2://1.8.15.62:21066/> create table tmp_mr_s1_mme1030 as
0: jdbc:hive2://1.8.15.62:21066/> select
0: jdbc:hive2://1.8.15.62:21066/> a.length,
0: jdbc:hive2://1.8.15.62:21066/> a.city,
0: jdbc:hive2://1.8.15.62:21066/> a.interface,
0: jdbc:hive2://1.8.15.62:21066/> a.xdr_id,
0: jdbc:hive2://1.8.15.62:21066/> a.rat,
0: jdbc:hive2://1.8.15.62:21066/> a.imsi,
0: jdbc:hive2://1.8.15.62:21066/> a.imei,
0: jdbc:hive2://1.8.15.62:21066/> a.msisdn,
0: jdbc:hive2://1.8.15.62:21066/> a.procedure_start_time,
0: jdbc:hive2://1.8.15.62:21066/> a.procedure_end_time,
0: jdbc:hive2://1.8.15.62:21066/> a.mme_ue_s1ap_id,
0: jdbc:hive2://1.8.15.62:21066/> a.mme_group_id,
0: jdbc:hive2://1.8.15.62:21066/> a.mme_code,
0: jdbc:hive2://1.8.15.62:21066/> a.user_ipv4,
0: jdbc:hive2://1.8.15.62:21066/> a.tac,
0: jdbc:hive2://1.8.15.62:21066/> a.cell_id,
0: jdbc:hive2://1.8.15.62:21066/> a.other_tac,
0: jdbc:hive2://1.8.15.62:21066/> a.other_eci
0: jdbc:hive2://1.8.15.62:21066/> from  default.d_s1mme a
0: jdbc:hive2://1.8.15.62:21066/> join r_hive_db.tmp_wifi1030 b
0: jdbc:hive2://1.8.15.62:21066/> on a.msisdn=b.imisdn
0: jdbc:hive2://1.8.15.62:21066/> and a.p_hour>='20161030'
0: jdbc:hive2://1.8.15.62:21066/> and a.p_hour<'20161031';

0: jdbc:hive2://1.8.15.2:21066/> create table tmp_mr_s1_mme_enbs1030 as
0: jdbc:hive2://1.8.15.2:21066/> select cell_id/256 from tmp_mr_s1_mme1030;
0: jdbc:hive2://1.8.15.62:21066/> create table tmp_mr_s1_mme_cellids1030 as select distinct cast(cell_id as bigint) as cellid from tmp_mr_s1_mme1030;
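Assuming cell_id here is a 28-bit E-UTRAN Cell Identity (ECI), dividing by 256 strips the low 8 bits (the cell's ID within its eNodeB) and leaves the 20-bit eNodeB ID; note that Hive's `/` is floating-point division, so `floor(cell_id/256)` or a `cast(... as bigint)` as in the second statement is the safer form. A small Python sketch of the same decomposition (the example ECI value is illustrative):

```python
def split_eci(eci: int) -> tuple:
    """Split a 28-bit E-UTRAN Cell Identity into (eNodeB ID, local cell ID).

    The upper 20 bits identify the eNodeB, the lower 8 bits the cell
    within it -- the same arithmetic as cell_id/256 in the Hive query,
    but with explicit integer division.
    """
    return eci // 256, eci % 256

# Illustrative ECI value, not taken from the real data.
enb_id, local_cell = split_eci(209743848)
print(enb_id, local_cell)
```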

0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.mapfiles;
+---------------------------+--+
|            set            |
+---------------------------+--+
| hive.merge.mapfiles=true  |
+---------------------------+--+
1 row selected (0.022 seconds)
0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.mapredfileds;
+---------------------------------------+--+
|                  set                  |
+---------------------------------------+--+
| hive.merge.mapredfileds is undefined  |
+---------------------------------------+--+
1 row selected (0.022 seconds)
(note: the key above is misspelled, which is why Hive reports it as undefined; the real property is hive.merge.mapredfiles)
0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.size.per.task=1024000000;
No rows affected (0.012 seconds)
0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.smallfiles.avgsize=1024000000;
No rows affected (0.012 seconds)
0: jdbc:hive2://1.8.15.62:21066/> use r_hive_db;
No rows affected (0.031 seconds)
0: jdbc:hive2://1.8.15.62:21066/> insert overwrite directory '/dt/' row format delimited fields terminated by '|' select * from tmp_mr_s1_mme_cellids1030;
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:17
INFO  : Submitting tokens for job: job_1475071482566_2477152
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 19422634 for jc_rc)
INFO  : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 05 6a 63 5f 72 63 05 6a 63 5f 72 63 21 68 69 76 65 2f 68 61 64 6f 6f 70 2e 68 61 64 6f 6f 70 2e 63 6f 6d 40 48 41 44 4f 4f 50 2e 43 4f 4d 8a 01 58 28 d2 8f 0b 8a 01 58 4c df 13 0b 8d 0d 6c 4b 8e 03 98
INFO  : The url to track the job: https://pc-z1:26001/proxy/application_1475071482566_2477152/
INFO  : Starting Job = job_1475071482566_2477152, Tracking URL = https://pc-z1:26001/proxy/application_1475071482566_2477152/
INFO  : Kill Command = /opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/FusionInsight-Hive-1.3.0/hive-1.3.0/bin/..//../hadoop/bin/hadoop job  -kill job_1475071482566_2477152
INFO  : Hadoop job information for Stage-1: number of mappers: 17; number of reducers: 0
INFO  : 2016-11-03 14:40:52,492 Stage-1 map = 0%,  reduce = 0%
INFO  : 2016-11-03 14:40:58,835 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 28.78 sec
INFO  : 2016-11-03 14:40:59,892 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 33.55 sec
INFO  : 2016-11-03 14:41:10,486 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 37.13 sec
INFO  : 2016-11-03 14:41:11,549 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 41.13 sec
INFO  : MapReduce Total cumulative CPU time: 41 seconds 130 msec
INFO  : Ended Job = job_1475071482566_2477152
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-2 is selected by condition resolver.
INFO  : Stage-4 is filtered out by condition resolver.
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1475071482566_2477181
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 19422663 for jc_rc)
INFO  : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 05 6a 63 5f 72 63 05 6a 63 5f 72 63 21 68 69 76 65 2f 68 61 64 6f 6f 70 2e 68 61 64 6f 6f 70 2e 63 6f 6d 40 48 41 44 4f 4f 50 2e 43 4f 4d 8a 01 58 28 d2 8f 0b 8a 01 58 4c df 13 0b 8d 0d 6c 4b 8e 03 98
INFO  : The url to track the job: https://pc-z1:26001/proxy/application_1475071482566_2477181/
INFO  : Starting Job = job_1475071482566_2477181, Tracking URL = https://pc-z1:26001/proxy/application_1475071482566_2477181/
INFO  : Kill Command = /opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/FusionInsight-Hive-1.3.0/hive-1.3.0/bin/..//../hadoop/bin/hadoop job  -kill job_1475071482566_2477181
INFO  : Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
INFO  : 2016-11-03 14:41:22,190 Stage-2 map = 0%,  reduce = 0%
INFO  : 2016-11-03 14:41:28,571 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 2.2 sec
INFO  : MapReduce Total cumulative CPU time: 2 seconds 200 msec
INFO  : Ended Job = job_1475071482566_2477181
INFO  : Moving data to directory /dt from hdfs://hacluster/dt/.hive-staging_hive_2016-11-03_14-40-43_774_4317869403646242426-140183/-ext-10000
No rows affected (46.604 seconds)

[rc@vq18ptkh01 dt]$ hadoop fs -ls /dt
16/11/03 14:46:18 INFO hdfs.PeerCache: SocketCache disabled.
Found 1 items
-rwxrwxrwx+  3 jc_rc supergroup      26819 2016-11-03 14:41 /dt/000000_0
[rc@vq18ptkh01 dt]$ hadoop fs -copyToLocal /dt/000000_0
16/11/03 14:46:33 INFO hdfs.PeerCache: SocketCache disabled.
[rc@vq18ptkh01 dt]$ ls
000000_0
[rc@vq18ptkh01 dt]$ 
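The downloaded export was written by the `insert overwrite directory ... fields terminated by '|'` statement above, i.e. one pipe-delimited row per record (a single cellid column in this case). A hedged Python sketch for reading such an export back, using illustrative contents rather than the real /dt/000000_0:

```python
import io

# Stand-in for the exported file: one '|'-delimited record per line.
# With a single-column export each line is just the cellid itself.
export = io.StringIO("123456\n234567\n")

# Split on '|' so the same loop also works for multi-column exports;
# skip blank lines defensively.
cellids = [int(line.strip().split("|")[0]) for line in export if line.strip()]
print(cellids)
```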

[rc@vq18ptkh01 dt]$ ls
000000_0  000001_0  000002_0  000003_0  000004_0  000005_0
[rc@vq18ptkh01 dt]$ ftp 10.70.41.126 21
Connected to 10.70.41.126 (10.70.41.126).
220 10.70.41.126 FTP server ready
Name (10.70.41.126:rc): joy
331 Password required for joy.
Password:
230 User joy logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> put 000000_0 /Temp/a_dt/
local: 000000_0 remote: /Temp/a_dt/
227 Entering Passive Mode (10,70,41,126,168,163).
550 /Temp/a_dt/: Not a regular file
(put needs a remote file name, not a bare directory; retried below with an explicit target path)
ftp> put
(local-file) 000000_0
(remote-file) /Temp/a_dt/000000_0
local: 000000_0 remote: /Temp/a_dt/000000_0
227 Entering Passive Mode (10,70,41,126,168,207).
150 Opening BINARY mode data connection for /Temp/a_dt/000000_0
226 Transfer complete.
1049905992 bytes sent in 33 secs (31787.20 Kbytes/sec)
ftp> put 000001_0 /Temp/a_dt/000001_0
local: 000001_0 remote: /Temp/a_dt/000001_0
227 Entering Passive Mode (10,70,41,126,168,255).
150 Opening BINARY mode data connection for /Temp/a_dt/000001_0
452 Transfer aborted.  No space left on device
ftp> put 000002_0 /Temp/a_dt/000002_0
local: 000002_0 remote: /Temp/a_dt/000002_0
227 Entering Passive Mode (10,70,41,126,169,20).
150 Opening BINARY mode data connection for /Temp/a_dt/000002_0
452 Transfer aborted.  No space left on device
ftp> put 000003_0 /Temp/a_dt/000003_0
local: 000003_0 remote: /Temp/a_dt/000003_0
227 Entering Passive Mode (10,70,41,126,169,40).
150 Opening BINARY mode data connection for /Temp/a_dt/000003_0
452 Transfer aborted.  No space left on device
ftp> put 000004_0 /Temp/a_dt/000004_0
local: 000004_0 remote: /Temp/a_dt/000004_0
227 Entering Passive Mode (10,70,41,126,169,66).
150 Opening BINARY mode data connection for /Temp/a_dt/000004_0
452 Transfer aborted.  No space left on device
ftp> put 000005_0 /Temp/a_dt/000005_0
local: 000005_0 remote: /Temp/a_dt/000005_0
227 Entering Passive Mode (10,70,41,126,169,85).
150 Opening BINARY mode data connection for /Temp/a_dt/000005_0
226 Transfer complete.
23465237 bytes sent in 0.747 secs (31391.79 Kbytes/sec)
ftp> 
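Several of the transfers above died with "452 ... No space left on device": the remote /Temp/a_dt volume could not hold the ~1 GB part files. A defensive sketch of a free-space pre-check before a large transfer (this only works for a local path; for a real FTP target the client cannot see the remote filesystem, so you would handle the 452 reply instead):

```python
import shutil
import tempfile

def has_room(path: str, needed_bytes: int) -> bool:
    """True if the filesystem containing `path` has at least
    `needed_bytes` free -- a cheap pre-check before a large upload."""
    return shutil.disk_usage(path).free >= needed_bytes

# Stand-in check against a local temp dir, sized like the first
# part file transferred above (1049905992 bytes).
with tempfile.TemporaryDirectory() as d:
    print(has_room(d, 1_049_905_992))
```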
Posted: 2024-10-22 08:46:05
