Common Hadoop HDFS File Operations and Notes

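For reference, all of the Java snippets below assume roughly the following imports (a sketch; exact packages may differ slightly between Hadoop versions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
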
1. Copy a file from the local file system to HDFS

The srcFile variable needs to contain the full name (path + file name) of the file in the local file system.

The dstFile variable needs to contain the desired full name of the file in the Hadoop file system.

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path srcPath = new Path(srcFile);
Path dstPath = new Path(dstFile);
hdfs.copyFromLocalFile(srcPath, dstPath);

2. Create HDFS file

The fileName variable contains the file name and path in the Hadoop file system.

The content of the file is the buff variable, which is an array of bytes.

// byte[] buff - the content of the file
// Creates an HDFS file and writes the contents of the buff array into it.
Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
FSDataOutputStream outputStream = hdfs.create(path);
outputStream.write(buff, 0, buff.length);
outputStream.close();  // close the stream so the data is actually flushed to HDFS

3. Rename HDFS file

In order to rename a file in the Hadoop file system, we need the full name (path + name) of the file we want to rename. The rename method returns true if the file was renamed, otherwise false.

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path fromPath = new Path(fromFileName);
Path toPath = new Path(toFileName);
boolean isRenamed = hdfs.rename(fromPath, toPath);

4. Delete HDFS file

In order to delete a file in the Hadoop file system, we need the full name (path + name) of the file we want to delete. The delete method returns true if the file was deleted, otherwise false.

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
boolean isDeleted = hdfs.delete(path, false);

// Recursive delete: passing true deletes a directory together with everything under it.
Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
boolean isDeleted = hdfs.delete(path, true);

5. Get HDFS file last modification time

In order to get the last modification time of a file in the Hadoop file system, we need the full name (path + name) of the file.

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
FileStatus fileStatus = hdfs.getFileStatus(path);
long modificationTime = fileStatus.getModificationTime();
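
The value is milliseconds since the epoch, so it can be turned into a readable date directly; a small usage sketch:

System.out.println("Last modified: " + new java.util.Date(modificationTime));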

6. Check if a file exists in HDFS

In order to check the existence of a file in the Hadoop file system, we need the full name (path + name) of the file we want to check. The exists method returns true if the file exists, otherwise false.

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
boolean isExists = hdfs.exists(path);

7. Get the locations of a file in the HDFS cluster

A file can exist on more than one node in the Hadoop file system cluster for two reasons:

1. Based on the HDFS cluster configuration, Hadoop splits a file into blocks and stores the blocks on different nodes in the cluster.

2. Based on the HDFS cluster configuration, Hadoop keeps more than one copy of each block on different nodes for redundancy (the default replication factor is three).

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
FileStatus fileStatus = hdfs.getFileStatus(path);
BlockLocation[] blkLocations = hdfs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
int blkCount = blkLocations.length;
for (int i = 0; i < blkCount; i++) {
    String[] hosts = blkLocations[i].getHosts();
    // Do something with the block hosts
}
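
As a usage sketch, the placeholder in the loop above could simply print the hosts that hold each block:

for (int i = 0; i < blkCount; i++) {
    String[] hosts = blkLocations[i].getHosts();
    System.out.println("Block " + i + " is stored on: " + java.util.Arrays.toString(hosts));
}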

8. Get a list of all the node host names in the HDFS cluster

This method casts the FileSystem object to a DistributedFileSystem object.

This method will work only when Hadoop is configured as a cluster.

Running Hadoop on the local machine only, in a non-cluster configuration, will cause this method to throw an Exception.

Configuration config = new Configuration();
FileSystem fs = FileSystem.get(config);
DistributedFileSystem hdfs = (DistributedFileSystem) fs;
DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
String[] names = new String[dataNodeStats.length];
for (int i = 0; i < dataNodeStats.length; i++) {
    names[i] = dataNodeStats[i].getHostName();
}

Problems encountered and how they were solved:

1. The file append problem

In Hadoop 1.0.4 and later, the API already supports appending to files, but it is not recommended for production use, for the following reason: Does HDFS allow appends to files? This is currently set to false because there are bugs in the "append code" and is not supported in any production cluster.

If you want to test it, you need to set the dfs.support.append parameter to true; otherwise the client will report an error when writing:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: Append to hdfs not supported. Please refer to dfs.support.append configuration parameter.

Solution: modify hdfs-site.xml on the namenode:

  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
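
Once the property is enabled, appending from the Java API looks roughly like the sketch below (a minimal illustration reusing the fileName and buff variables from the earlier examples, assuming the file already exists; error handling omitted):

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
FSDataOutputStream out = hdfs.append(path);  // requires dfs.support.append = true
out.write(buff, 0, buff.length);             // append the contents of buff to the existing file
out.close();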

2. Using append on Hadoop-1.0.4 and Hadoop-2.2. Requirement: append to a file, and if the file does not exist, create it first.

Exception:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /huangq/dailyRolling/mommy-dailyRolling for DFSClient_-1456545217 on client 10.1.85.243 because current leaseholder is trying to recreate file.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1374)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1246)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1426)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:643)
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)

Code version 1 (the code that produces the error above):

FileSystem fs = FileSystem.get(conf);
Path dstPath = new Path(dst);
if (!fs.exists(dstPath)) {
    fs.create(dstPath);
}
FSDataOutputStream fsout = fs.append(dstPath);
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fsout));

Cause of the exception: the fs handle. The same FileSystem instance still holds the lease taken out by create(), so the namenode rejects the append() that follows ("current leaseholder is trying to recreate file").

Solution: after creating the file, close the FileSystem (fs) and obtain a fresh one before appending.

FileSystem fs = FileSystem.get(conf);
Path dstPath = new Path(dst);
if (!fs.exists(dstPath)) {
    fs.create(dstPath);
    fs.close();
    fs = FileSystem.get(conf);
}
FSDataOutputStream fsout = fs.append(dstPath);
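
An alternative sketch (not from the original post, but built on the same assumptions) is to close the stream returned by create() so the lease is released without discarding the FileSystem instance:

FileSystem fs = FileSystem.get(conf);
Path dstPath = new Path(dst);
if (!fs.exists(dstPath)) {
    fs.create(dstPath).close();  // close the newly created file immediately, releasing the lease
}
FSDataOutputStream fsout = fs.append(dstPath);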
