Examples of Using the HDFS Java API

  HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster consists mainly of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. The HDFS architecture diagram describes the basic interaction among the NameNode, the DataNodes, and clients: a client contacts the NameNode for file metadata or file modifications, and performs the actual file I/O directly against the DataNodes.

  Hadoop supports interacting with HDFS directly through shell commands, and it also provides a Java API for operating on HDFS, for example creating, deleting, uploading, downloading, and renaming files.

  File operations in HDFS mainly involve the following classes (a minimal usage sketch follows the list):

    Configuration: provides access to configuration parameters

    FileSystem: the file system object

    Path: names a file or directory in a FileSystem. Path strings use the slash as the directory separator; a path string that begins with a slash is absolute

    FSDataInputStream and FSDataOutputStream: the input and output streams of HDFS, respectively
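As a quick orientation before the full walkthrough, here is a minimal sketch showing how these classes fit together (the file path /user/hadoop/demo.txt is hypothetical; the complete program appears in step 5):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsQuickStart {
    public static void main(String[] args) throws Exception {
        // Configuration picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        // FileSystem is obtained from the configuration (fs.defaultFS decides which file system)
        FileSystem fs = FileSystem.get(conf);
        // Path names a file; the leading slash makes it absolute
        Path path = new Path("/user/hadoop/demo.txt"); // hypothetical file
        // FSDataInputStream reads the file and copies it to the console
        FSDataInputStream in = fs.open(path);
        try {
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}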

The following walks through operating on HDFS with the Java API:

1. Project structure

2. pom.xml configuration

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.zjl</groupId>
    <artifactId>myhadoop</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>myhadoop</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <hadoop.version>2.5.0</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
        </dependency>
    </dependencies>
</project>

3. Copy the HDFS-related configuration files (core-site.xml, hdfs-site.xml, log4j.properties) from the Hadoop installation directory into the project's src/main/resource directory

[hadoop@hadoop01 ~]$ cd /opt/modules/hadoop-2.6.5/etc/hadoop/
[hadoop@hadoop01 hadoop]$ cp core-site.xml hdfs-site.xml log4j.properties /opt/tools/workspace/myhadoop/src/main/resource/
[hadoop@hadoop01 hadoop]$

(1) core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- If this is not configured, data is read from the local file system by default -->
        <value>hdfs://hadoop01.zjl.com:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <!-- Base directory that the Hadoop file system relies on; many other paths derive from it. If hdfs-site.xml does not configure where the NameNode and DataNode store their data, they default to this path -->
        <value>/opt/modules/hadoop-2.6.5/data/tmp</value>
    </property>
</configuration>
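The fs.defaultFS value above is what FileSystem.get(Configuration) uses to decide which file system to connect to. If the XML files are not on the classpath, the same value can be supplied from code; a minimal sketch (the URI simply repeats the value configured above):

Configuration conf = new Configuration();
// Equivalent to the fs.defaultFS property in core-site.xml
conf.set("fs.defaultFS", "hdfs://hadoop01.zjl.com:9000");
FileSystem fileSystem = FileSystem.get(conf);

// Alternatively, pass the URI explicitly:
// FileSystem fileSystem = FileSystem.get(URI.create("hdfs://hadoop01.zjl.com:9000"), conf);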

(2) hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <!-- default value 3 -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
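dfs.replication only sets the default applied to newly written files; the replication recorded for an existing file can be inspected or changed from the Java API as well. A short sketch, assuming the wc.input file that step 4 below uploads:

FileSystem fileSystem = FileSystem.get(new Configuration());
Path file = new Path("/user/hadoop/mapreduce/wordcount/input/wc.input"); // created in step 4
// Report the replication factor recorded for this file
FileStatus status = fileSystem.getFileStatus(file);
System.out.println("replication = " + status.getReplication());
// Change the replication factor of the existing file (returns true on success)
fileSystem.setReplication(file, (short) 2);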

(3) log4j.properties: the default file works as-is; it is the configuration needed to print Hadoop's log output. If it is missing, the Eclipse console shows a warning when the program runs.

4. Start the HDFS daemons and create a file in the HDFS file system (this file is read by the Java program in step 5)

[hadoop@hadoop01 hadoop]$ cd /opt/modules/hadoop-2.6.5/
[hadoop@hadoop01 hadoop-2.6.5]$ sbin/start-dfs.sh
17/06/21 22:59:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop01.zjl.com]
hadoop01.zjl.com: starting namenode, logging to /opt/modules/hadoop-2.6.5/logs/hadoop-hadoop-namenode-hadoop01.zjl.com.out
hadoop01.zjl.com: starting datanode, logging to /opt/modules/hadoop-2.6.5/logs/hadoop-hadoop-datanode-hadoop01.zjl.com.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/modules/hadoop-2.6.5/logs/hadoop-hadoop-secondarynamenode-hadoop01.zjl.com.out
17/06/21 23:00:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.6.5]$ jps
3987 NameNode
4377 Jps
4265 SecondaryNameNode
4076 DataNode
3135 org.eclipse.equinox.launcher_1.3.201.v20161025-1711.jar
[hadoop@hadoop01 hadoop-2.6.5]$ bin/hdfs dfs -mkdir -p /user/hadoop/mapreduce/wordcount/input
17/06/21 23:07:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.6.5]$ cat wcinput/wc.input
hadoop yarn
hadoop mapreduce
hadoop hdfs
yarn nodemanager
hadoop resourcemanager
[hadoop@hadoop01 hadoop-2.6.5]$ bin/hdfs dfs -put wcinput/wc.input /user/hadoop/mapreduce/wordcount/input/
17/06/21 23:20:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.6.5]$
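To confirm the same thing from the Java side, the directory can be listed with the API; a small sketch (paths taken from the commands above):

FileSystem fileSystem = FileSystem.get(new Configuration());
Path dir = new Path("/user/hadoop/mapreduce/wordcount/input");
// List the directory populated by the hdfs dfs -put command above
for (FileStatus status : fileSystem.listStatus(dir)) {
    System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
}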

5. Java code

package com.zjl.myhadoop;

import java.io.File;
import java.io.FileInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/**
 *
 * @author hadoop
 *
 */
public class HdfsApp {

    /**
     * get file system
     * @return
     * @throws Exception
     */
    public static FileSystem getFileSystem() throws Exception {
        //read configuration
        //core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml
        Configuration conf = new Configuration();
        //create file system
        FileSystem fileSystem = FileSystem.get(conf);
        return fileSystem;
    }

    /**
     * read file from hdfs file system, output to the console
     * @param fileName
     * @throws Exception
     */
    public static void read(String fileName) throws Exception {
        //read path
        Path readPath = new Path(fileName);
        //get file system
        FileSystem fileSystem = getFileSystem();
        //open file
        FSDataInputStream inStream = fileSystem.open(readPath);
        try {
            //read file
            IOUtils.copyBytes(inStream, System.out, 4096, false);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            //io close
            IOUtils.closeStream(inStream);
        }
    }

    public static void upload(String inFileName, String outFileName) throws Exception {

        //file input stream, local file
        FileInputStream inStream = new FileInputStream(new File(inFileName));

        //get file system
        FileSystem fileSystem = getFileSystem();
        //write path, hdfs file system
        Path writePath = new Path(outFileName);

        //output stream
        FSDataOutputStream outStream = fileSystem.create(writePath);
        try {
            //write file
            IOUtils.copyBytes(inStream, outStream, 4096, false);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            //io close
            IOUtils.closeStream(inStream);
            IOUtils.closeStream(outStream);
        }
    }

    public static void main(String[] args) throws Exception {
        //1.read file from hdfs to console
//        String fileName = "/user/hadoop/mapreduce/wordcount/input/wc.input";
//        read(fileName);

        //2.upload file from local file system to hdfs file system
        //file input stream, local file
        String inFileName = "/opt/modules/hadoop-2.6.5/wcinput/wc.input";
        String outFileName = "/user/hadoop/put-wc.input";
        upload(inFileName, outFileName);
    }
}

6. Call the read(fileName) method

7. Go to the HDFS file system and view the /user/hadoop directory

8. Call upload(inFileName, outFileName), then refresh the page from step 7; the file has been uploaded successfully
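The example program only demonstrates reading and uploading. The remaining operations mentioned at the beginning (download, rename, delete) follow the same FileSystem pattern; a minimal sketch that could be added to HdfsApp (java.io.FileOutputStream must also be imported, and the renamed-wc.input target name is only illustrative):

    public static void download(String hdfsFileName, String localFileName) throws Exception {
        FileSystem fileSystem = getFileSystem();
        //open hdfs file
        FSDataInputStream inStream = fileSystem.open(new Path(hdfsFileName));
        //local output stream
        FileOutputStream outStream = new FileOutputStream(new File(localFileName));
        try {
            IOUtils.copyBytes(inStream, outStream, 4096, false);
        } finally {
            IOUtils.closeStream(inStream);
            IOUtils.closeStream(outStream);
        }
    }

    public static void renameAndDelete() throws Exception {
        FileSystem fileSystem = getFileSystem();
        //rename the uploaded file (target name is illustrative)
        fileSystem.rename(new Path("/user/hadoop/put-wc.input"), new Path("/user/hadoop/renamed-wc.input"));
        //delete it; 'false' means no recursive delete
        fileSystem.delete(new Path("/user/hadoop/renamed-wc.input"), false);
    }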
