Learn ZYNQ(10) – zybo cluster word count

1. Environment configuration

Spark: five Zybo boards; 192.168.1.1 is the master and the other four boards are slaves.

Hadoop: 192.168.1.1 (with an external SanDisk disk attached).

2. Single-node Hadoop test

If the board runs out of memory, add a swap file as follows.

Check the current virtual-memory capacity:

free -m
cd /mnt
mkdir swap
cd swap/
# create a roughly 1 GB swap file
dd if=/dev/zero of=swapfile bs=1024 count=1000000
# turn the file into swap space
mkswap swapfile
# enable the swap file
swapon swapfile
free -m

Note that the swap file is not re-enabled automatically after a reboot; the go.sh script in section 4 mounts the disk and runs swapon again at startup.

Test passed.
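To check the single-node setup end to end, the word-count example that ships with Hadoop can be used. A minimal smoke test, run from /root/hadoop-2.4.0 (the examples-jar path is the stock 2.4.0 layout; /in/file is the same input path the PySpark job in section 3 reads):

# stage a small input file in HDFS
bin/hdfs dfs -mkdir -p /in
bin/hdfs dfs -put etc/hadoop/core-site.xml /in/file
# run the bundled MapReduce word count and print the result
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount /in /out
bin/hdfs dfs -cat /out/part-r-00000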

3. Spark + Hadoop test

SPARK_MASTER_IP=192.168.1.1 ./sbin/start-all.sh

MASTER=spark://192.168.1.1:7077 ./bin/pyspark

 

file = sc.textFile("hdfs://192.168.1.1:9000/in/file")
counts = file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("hdfs://192.168.1.1:9000/out/mycount")
# note: a plain local path like this is written on each worker's filesystem, not on the driver
counts.saveAsTextFile("/mnt/mycount")
counts.collect()
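Once the job has run, the HDFS copy of the result can be inspected from the Hadoop directory (part-00000 is Spark's usual output-file naming, assumed here):

bin/hdfs dfs -cat /out/mycount/part-00000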

Error 1:

java.net.ConnectException: Call From zynq/192.168.1.1 to spark1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

This happens because we start Hadoop as root, while Spark operates on the Hadoop filesystem remotely and therefore lacks permission.

Fix: in a test environment you can simply disable the HDFS permission check. Open etc/hadoop/hdfs-site.xml and set the dfs.permissions property to false (the default is true):

<property>
        <name>dfs.permissions</name>
        <value>false</value>
</property>
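The NameNode has to be restarted for the new setting to take effect, e.g. with the same daemon scripts used in go.sh below:

sbin/hadoop-daemon.sh stop namenode
sbin/hadoop-daemon.sh start namenode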

 

4. Appendix: my configuration files

go.sh:

#! /bin/sh -

mount /dev/sda1 /mnt/
cd /mnt/swap/
swapon swapfile
free -m

cd /root/hadoop-2.4.0/
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
sbin/hadoop-daemon.sh start secondarynamenode
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
sbin/mr-jobhistory-daemon.sh start historyserver

jps
# wait until the NameNode is listening on port 9000
while ! netstat -ntlp | grep -q ':9000'
do
sleep 1
done
netstat -ntlp
echo hadoop start successfully

cd /root/spark-0.9.1-bin-hadoop2
SPARK_MASTER_IP=192.168.1.1 ./sbin/start-all.sh
jps
# wait until the Spark master is listening on port 7077
while ! netstat -ntlp | grep -q ':7077'
do
sleep 1
done
netstat -ntlp
echo spark start successfully
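To use the script, make it executable and run it once after every boot:

chmod +x go.sh
./go.sh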

/etc/hosts

#127.0.0.1      localhost       zynq
192.168.1.1     spark1          localhost       zynq
#192.168.1.1    spark1
192.168.1.2     spark2
192.168.1.3     spark3
192.168.1.4     spark4
192.168.1.5     spark5
192.168.1.100   sparkMaster
#::1            localhost ip6-localhost ip6-loopback

/etc/profile

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH
export JAVA_HOME=/usr/lib/jdk1.7.0_55
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/root/hadoop-2.4.0

export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

# every board shares the same rootfs image, so each node is given its own MAC and IP here (these values are for the master, spark1)
ifconfig eth2 hw ether 00:0a:35:00:01:01
ifconfig eth2 192.168.1.1/24 up

HADOOP_HOME/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

HADOOP_HOME/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/mnt/hadoop/tmp</value>
    </property>
</configuration>

HADOOP_HOME/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address</name>
        <value>192.168.1.1:9000</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/mnt/datanode</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/mnt/namenode</value>
    </property>
</configuration>
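Before the very first start, the NameNode has to be formatted and the directories referenced above created (a one-time step, run from /root/hadoop-2.4.0):

mkdir -p /mnt/namenode /mnt/datanode /mnt/hadoop/tmp
bin/hdfs namenode -format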

