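Hadoop--04--Working with the Java API

1. Overview

2. File operations

2.1 Uploading a local file to the Hadoop fs

2.2 Creating a new file in the Hadoop fs and writing to it

2.3 Deleting a file from the Hadoop fs

2.4 Reading a file

3. Directory operations

3.1 Creating a directory in the Hadoop fs

3.2 Deleting a directory

3.3 Listing all files in a directory

4. References and code download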

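<1>. Overview

Almost all of Hadoop's file-operation classes live in the org.apache.hadoop.fs package. These APIs support operations such as opening, reading, writing, and deleting files.

The class that the Hadoop library ultimately exposes to users is FileSystem. It is an abstract class, so a concrete instance can only be obtained through its static get methods. get has several overloads; the most commonly used one is:

static FileSystem get(Configuration conf) throws IOException;

FileSystem wraps almost every file operation, for example mkdirs and delete. Putting this together, a program that works with files has roughly the following skeleton: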

operator()
{
    get a Configuration object
    get a FileSystem object
    perform the file operation
}
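The skeleton above stops at the file operation and says nothing about cleanup. FileSystem implements java.io.Closeable, so one option is to hold it in a try-with-resources block, which calls close() even when the operation throws. The following is only a sketch of that language mechanism: CloseOnFailure, FakeHandle and operate are hypothetical names invented here so the example runs without Hadoop on the classpath; they are not part of the Hadoop API.

```java
import java.io.Closeable;
import java.io.IOException;

final class CloseOnFailure {
    // Hypothetical stand-in for a closeable handle such as FileSystem;
    // it only records whether close() was called.
    static final class FakeHandle implements Closeable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // Runs one "operation" on the handle. try-with-resources closes the
    // handle on both the normal and the exceptional path.
    static void operate(FakeHandle handle, boolean fail) throws IOException {
        try (FakeHandle h = handle) {
            if (fail) {
                throw new IOException("simulated failure");
            }
        }
    }

    public static void main(String[] args) {
        FakeHandle handle = new FakeHandle();
        try {
            operate(handle, true);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("closed: " + handle.closed);
    }
}
```

Running main prints "caught: simulated failure" followed by "closed: true", showing that the handle is released even though the operation failed.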

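Also note that to run the programs below you need to package them into a jar and launch it with hadoop jar. That is fairly tedious; the alternative is to install the Hadoop plugin for Eclipse, which saves much of the packaging time.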

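<2>. File operations

2.1 Uploading a local file to the file system

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Upload a local file to HDFS.
 * Both paths must be full paths, e.g. /tmp/test.c
 */
public static void uploadLocalFile2HDFS(String s, String d) throws IOException
{
    Configuration config = new Configuration();
    // try-with-resources closes the file system even if the copy fails
    try (FileSystem hdfs = FileSystem.get(config))
    {
        Path src = new Path(s); // local source
        Path dst = new Path(d); // destination in HDFS
        hdfs.copyFromLocalFile(src, dst);
    }
}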

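2.2 Creating a new file and writing to it

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Create a new file in HDFS and write content to it.
 * toCreateFilePath must be the full path of the file.
 */
public static void createNewHDFSFile(String toCreateFilePath, String content) throws IOException
{
    Configuration config = new Configuration();
    // resources are closed in reverse order: the stream first, then the file system
    try (FileSystem hdfs = FileSystem.get(config);
         FSDataOutputStream os = hdfs.create(new Path(toCreateFilePath)))
    {
        os.write(content.getBytes(StandardCharsets.UTF_8));
    }
}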
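The method above encodes the whole string with getBytes and writes it in a single call. When the content arrives as several lines, one option is to wrap the output stream in a Writer with an explicit charset. A minimal sketch, assuming only that the target is some java.io.OutputStream (FSDataOutputStream is one): the class TextWriterSketch and the helper writeLines are hypothetical names, and a ByteArrayOutputStream stands in for the HDFS stream so the example runs without a cluster.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class TextWriterSketch {
    // Hypothetical helper: writes each line followed by '\n', encoded as UTF-8.
    // The caller keeps ownership of 'out' and is responsible for closing it.
    static void writeLines(OutputStream out, List<String> lines) throws IOException {
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        for (String line : lines) {
            writer.write(line);
            writer.write('\n');
        }
        writer.flush(); // push buffered characters through to 'out'
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the stream returned by hdfs.create(path).
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLines(out, Arrays.asList("first line", "second line"));
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Because the helper only flushes and never closes the stream, it can be called inside the try-with-resources block that owns the real output stream.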

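2.3 Deleting a file

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Delete a file in HDFS.
 * dst must be the full path name.
 * Returns true if the file was deleted.
 */
public static boolean deleteHDFSFile(String dst) throws IOException
{
    Configuration config = new Configuration();
    try (FileSystem hdfs = FileSystem.get(config))
    {
        // the one-argument delete(Path) is deprecated; recursive = false suffices for a single file
        return hdfs.delete(new Path(dst), false);
    }
}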

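2.4 Reading a file

import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Read the full content of an HDFS file into memory.
 * dst must be the full path name.
 */
public static byte[] readHDFSFile(String dst) throws IOException
{
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf))
    {
        Path path = new Path(dst);
        // check that the file exists
        if (!fs.exists(path))
        {
            throw new FileNotFoundException("file not found: " + dst);
        }
        // size the buffer from the file length; toIntExact throws if the length does not fit in an int
        FileStatus stat = fs.getFileStatus(path);
        byte[] buffer = new byte[Math.toIntExact(stat.getLen())];
        try (FSDataInputStream is = fs.open(path))
        {
            is.readFully(0, buffer); // positioned read starting at offset 0
        }
        return buffer;
    }
}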
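readHDFSFile loads the entire file into a single byte array, so the file length has to fit both in an int and in memory. For larger files, a common alternative is to copy the stream in fixed-size chunks. A minimal sketch under that assumption: StreamCopy and copy are hypothetical names, the helper accepts any java.io.InputStream (FSDataInputStream is one), and a ByteArrayInputStream stands in for the stream that fs.open(path) would return.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class StreamCopy {
    // Hypothetical helper: copies everything from 'in' to 'out' in chunks of
    // bufferSize bytes and returns the total number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; otherwise the number of bytes read
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        // In-memory streams stand in for the HDFS input stream and its destination.
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out, 4);
            System.out.println(copied + " bytes: "
                    + new String(out.toByteArray(), StandardCharsets.UTF_8));
        }
    }
}
```

Memory use is bounded by bufferSize rather than by the file length. Hadoop also ships org.apache.hadoop.io.IOUtils.copyBytes for this kind of stream-to-stream copy.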

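<3>. Directory operations

3.1 Creating a directory

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Create a new directory in HDFS, e.g. '/tmp/testdir'.
 * mkdirs also creates any missing parent directories.
 */
public static void mkdir(String dir) throws IOException
{
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf))
    {
        fs.mkdirs(new Path(dir));
    }
}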

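3.2 Deleting a directory

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Delete a directory in HDFS, e.g. '/tmp/testdir'.
 */
public static void deleteDir(String dir) throws IOException
{
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf))
    {
        // recursive = true removes the directory together with its contents
        fs.delete(new Path(dir), true);
    }
}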

3.3 Listing all files in a directory

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Print the path of every entry directly under dir (not recursive).
 */
public static void listAll(String dir) throws IOException
{
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf))
    {
        FileStatus[] stats = fs.listStatus(new Path(dir));
        for (FileStatus stat : stats)
        {
            if (stat.isFile())
            {
                // regular file
                System.out.println(stat.getPath().toString());
            }
            else if (stat.isDirectory())
            {
                // directory
                System.out.println(stat.getPath().toString());
            }
            else if (stat.isSymlink())
            {
                // symbolic link, as on Linux
                System.out.println(stat.getPath().toString());
            }
        }
    }
}

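<4>. References and code download

/Files/xuqiang/HadoopFSOperations.rar

Note: a program that only operates on files in HDFS can run locally once the hadoop-core and commons-logging jars are on the classpath. A MapReduce program, by contrast, must be packaged into a jar and uploaded to HDFS before it can run.

Date: 2024-11-01 23:19:32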
