Basic Hadoop File Operations

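1. Start Hadoop

2. Install the Eclipse plugin

Import the hadoop-eclipse-plugin-2.6.0.jar plugin into Eclipse, then restart Eclipse.

3. Create a connection in the Map/Reduce view (single-node setup)

4. Create a project, import the JARs, and add the configuration files

Note: the project depends on Hadoop's JARs, so all of the Hadoop JARs must be added to the newly created project.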

core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<!-- fs.defaultFS replaces the deprecated fs.default.name in Hadoop 2.x -->
		<name>fs.defaultFS</name>
		<value>hdfs://127.0.0.1:8020</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/home/tian/Downloads/hadhoop/data/tmp</value>
	</property>
	<property>
		<name>fs.checkpoint.period</name>
		<value>300</value>
	</property>
	<property>
		<name>fs.checkpoint.dir</name>
		<value>${hadoop.tmp.dir}/dfs/namesecondary</value>
	</property>
</configuration>
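The `${hadoop.tmp.dir}` reference in the fs.checkpoint.dir value is expanded by Hadoop when the property is read. The block below is a minimal sketch of that kind of substitution using only the JDK. It is an illustration of the idea, not Hadoop's actual implementation, and the class name PropertyExpander and its depth limit are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: expands ${name} references against a property map. */
final class PropertyExpander {

    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");
    private static final int MAX_DEPTH = 20; // guard against circular references

    static String expand(String value, Map<String, String> props) {
        String current = value;
        for (int i = 0; i < MAX_DEPTH; i++) {
            Matcher m = REF.matcher(current);
            StringBuffer sb = new StringBuffer();
            boolean replaced = false;
            while (m.find()) {
                String replacement = props.get(m.group(1));
                if (replacement == null) {
                    // Unknown reference: leave the ${...} text untouched.
                    m.appendReplacement(sb, Matcher.quoteReplacement(m.group()));
                } else {
                    m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
                    replaced = true;
                }
            }
            m.appendTail(sb);
            current = sb.toString();
            if (!replaced) {
                return current; // nothing left to expand
            }
        }
        throw new IllegalStateException("Too many nested references in: " + value);
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hadoop.tmp.dir", "/home/tian/Downloads/hadhoop/data/tmp");
        props.put("fs.checkpoint.dir", "${hadoop.tmp.dir}/dfs/namesecondary");
        System.out.println(expand(props.get("fs.checkpoint.dir"), props));
    }
}
```

With the values from core-site.xml above, the checkpoint directory resolves to a path under the hadoop.tmp.dir directory, so changing hadoop.tmp.dir moves the checkpoint directory along with it.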

hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you 
	may not use this file except in compliance with the License. You may obtain 
	a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless 
	required by applicable law or agreed to in writing, software distributed 
	under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES 
	OR CONDITIONS OF ANY KIND, either express or implied. See the License for 
	the specific language governing permissions and limitations under the License. 
	See accompanying LICENSE file. -->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/home/tian/Downloads/hadhoop/data/hdfs/namenode</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/home/tian/Downloads/hadhoop/data/hdfs/datanode</value>
	</property>
	<property>
		<name>dfs.http.address</name>
		<value>0.0.0.0:50070</value>
	</property>
	<property>
		<name>dfs.datanode.http.address</name>
		<value>0.0.0.0:50075</value>
	</property>
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
</configuration>
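Before pointing Hadoop at a hand-edited site file, it can help to confirm that the file is well-formed and contains the properties you expect. The following is a rough sketch that does this with the JDK's built-in XML parser. The class name SiteXmlReader is hypothetical, and the sketch only reads name/value pairs; it does not reproduce how Hadoop's Configuration class loads or merges resources.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: lists the name/value pairs of a Hadoop-style site file. */
final class SiteXmlReader {

    /** Parses every property element and returns the pairs in file order. */
    static Map<String, String> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML document", e);
        }
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = text(property, "name");
            String value = text(property, "value");
            if (name != null && value != null) {
                props.put(name, value);
            }
        }
        return props;
    }

    /** Returns the trimmed text of the first child element with the given tag, or null. */
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>dfs.replication</name><value>1</value></property>"
                + "<property><name>dfs.datanode.http.address</name><value>0.0.0.0:50075</value></property>"
                + "</configuration>";
        parse(xml).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A malformed file (for example, a missing closing tag) makes parse throw, which is usually easier to diagnose than a Hadoop daemon that fails during startup.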

yarn-site.xml:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
</configuration>

5. File operations with Hadoop

Loading the configuration:

import org.apache.hadoop.conf.Configuration;

/**
 * Holds the Hadoop configuration shared by the file operations.
 *
 * @author tian
 */
public class HadoopConfig {

	private static Configuration configuration;

	private HadoopConfig() {
	}

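	public static synchronized Configuration getConfiguration() {

		// Lazily build the shared Configuration on first use.
		if (configuration == null) {
			configuration = new Configuration();
			// Class.getResource resolves these names relative to the package of
			// HadoopConfig, so the XML files must sit next to the class on the classpath.
			configuration.addResource(HadoopConfig.class
					.getResource("core-site.xml"));
			configuration.addResource(HadoopConfig.class
					.getResource("hdfs-site.xml"));
			configuration.addResource(HadoopConfig.class
					.getResource("yarn-site.xml"));
		}
		return configuration;

	}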

}

Basic operations:

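import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// The methods below are static members of a utility class (class declaration omitted).

	// Create a directory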
	public static void mkdir(String dirPath) throws IOException {

		Configuration configuration = HadoopConfig.getConfiguration();
		FileSystem fileSystem = FileSystem.get(configuration);
		fileSystem.mkdirs(new Path(dirPath));
		fileSystem.close();

	}

	// Create an empty file
	public static void createFile(String filePath) throws IOException {

		Configuration configuration = HadoopConfig.getConfiguration();
		FileSystem fileSystem = FileSystem.get(configuration);
		// create() returns an output stream; close it so the empty file is finalized.
		fileSystem.create(new Path(filePath)).close();
		fileSystem.close();

	}

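	// Delete a directory or file
	public static void deleteFile(String filePath) throws IOException {

		Configuration configuration = HadoopConfig.getConfiguration();
		FileSystem fileSystem = FileSystem.get(configuration);
		// delete() removes the path immediately; "true" deletes directories recursively.
		// (deleteOnExit() only marks the path for removal when the FileSystem closes.)
		fileSystem.delete(new Path(filePath), true);
		fileSystem.close();

	}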

	// List the entries of a directory
	public static void getListFile(String filePath) throws IOException {

		Configuration configuration = HadoopConfig.getConfiguration();
		FileSystem fileSystem = FileSystem.get(configuration);
		FileStatus[] statuses = fileSystem.listStatus(new Path(filePath));
		for (FileStatus status : statuses) {
			System.out.println(status.getPath().toString());
		}
		fileSystem.close();

	}

	// Upload a local file to HDFS
	public static void upLoadFile(String src, String dest) throws IOException {

		Configuration configuration = HadoopConfig.getConfiguration();
		FileSystem fileSystem = FileSystem.get(configuration);
		fileSystem.copyFromLocalFile(new Path(src), new Path(dest));
		fileSystem.close();

	}

	// Download a file from HDFS to the local file system
	public static void downloadFile(String src, String dest) throws IOException {

		Configuration configuration = HadoopConfig.getConfiguration();
		FileSystem fileSystem = FileSystem.get(configuration);
		fileSystem.copyToLocalFile(new Path(src), new Path(dest));
		fileSystem.close();

	}

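	// Write to a file
	public static void writeFile(String filePath) throws IOException {

		Configuration configuration = HadoopConfig.getConfiguration();
		FileSystem fileSystem = FileSystem.get(configuration);
		Path path = new Path(filePath);
		// try-with-resources closes the stream, which flushes the data to HDFS.
		try (FSDataOutputStream out = fileSystem.create(path)) {
			// writeUTF writes a 2-byte length prefix followed by modified UTF-8.
			out.writeUTF("我不是学霸,哈哈哈");
		}
		fileSystem.close();

	}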
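One detail of writeFile is worth a closer look. FSDataOutputStream extends java.io.DataOutputStream, so writeUTF does not write plain text: it writes a 2-byte length followed by the string in modified UTF-8. The sketch below shows the difference using in-memory streams only, so it runs without a cluster. The class name WriteUtfDemo and the sample string are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of the byte layout produced by DataOutputStream.writeUTF. */
final class WriteUtfDemo {

    /** Returns the bytes writeUTF produces: a 2-byte length, then modified UTF-8. */
    static byte[] viaWriteUtf(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads back a value that was written with writeUTF. */
    static String viaReadUtf(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two CJK characters followed by ASCII text, written with Unicode escapes.
        String text = "\u4f60\u597d, HDFS";
        byte[] framed = viaWriteUtf(text);
        byte[] plain = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("writeUTF bytes:    " + framed.length);
        System.out.println("plain UTF-8 bytes: " + plain.length);
        System.out.println("round trip ok:     " + text.equals(viaReadUtf(framed)));
    }
}
```

In practice this means a file written with writeUTF should be read back with readUTF. If the goal is a plain text file that other tools can read directly, writing text.getBytes(StandardCharsets.UTF_8) with out.write(...) is the simpler choice.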

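Note: each of these methods can be run as a Java application, and the results can be checked with a few simple hdfs commands (for example, hdfs dfs -ls to list a directory).

The basic commands are much like their Linux counterparts; see the official site (http://hadoop.apache.org/) for the full reference. The results of the file operations can also be viewed inside Eclipse.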

Time: 2024-10-10 01:04:07
