HDFS Command-Line and API Operations

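HDFS, short for Hadoop Distributed File System, is one implementation of Hadoop's abstract file system. The abstract file system can also integrate with the local file system, Amazon S3, and others, and can even be accessed over a web protocol (WebHDFS). HDFS files are distributed across the machines in the cluster and are replicated for fault tolerance and reliability. For example, when a client reads or writes a file, the data operations go directly to the machines across the cluster, so no single node becomes a performance bottleneck.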

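For setting up HDFS, see my earlier post. Today we'll mainly cover how to use the HDFS API and the HDFS command line.

To work with HDFS from Java, first configure the Maven repository: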

<repositories>
  <repository>
	<id>cloudera</id>
	<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
  </repository>
</repositories>
<!-- Add the hadoop-client dependency; hadoop.version must be defined under <properties> -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>

Example: create a directory through the API

this.configuration = new Configuration();
this.fileSystem = FileSystem.get(new URI(this.HDFS_PATH),configuration,"hadoop");
Path path = new Path("/hdfsapi/test");
boolean result  = fileSystem.mkdirs(path);
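
The snippet above never shows what HDFS_PATH contains. As a rough sketch, assuming it holds the NameNode address in the usual hdfs://host:port form, the value can be checked with the standard java.net.URI class before handing it to FileSystem.get. The host name namenode-host, the port 8020, and the class name HdfsUriSketch are placeholders for illustration, not values from the original setup.

```java
import java.net.URI;

public class HdfsUriSketch {
    // Hypothetical NameNode address; replace it with your cluster's fs.defaultFS value.
    static final String HDFS_PATH = "hdfs://namenode-host:8020";

    // Splits the address into the parts FileSystem.get relies on: scheme, host, port.
    static String describe(String address) {
        URI uri = URI.create(address);
        return uri.getScheme() + " " + uri.getHost() + " " + uri.getPort();
    }

    public static void main(String[] args) {
        System.out.println(describe(HDFS_PATH));
    }
}
```

The scheme is what tells Hadoop to pick the HDFS implementation of the abstract file system rather than, say, the local one.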

Read a file through the API and write it back to the local disk

Path path = new Path("/gwyy.txt");
FSDataInputStream fsDataInputStream = fileSystem.open(path);
FileOutputStream fileOutputStream = new FileOutputStream(new File("a.txt"));
byte[] buffer = new byte[1024];
int length = 0;
StringBuilder sb = new StringBuilder();
while ((length = fsDataInputStream.read(buffer)) != -1) {
	// Use only the bytes actually read in this pass, not the whole buffer
	sb.append(new String(buffer, 0, length));
	fileOutputStream.write(buffer, 0, length);
}
// Close both streams so the local file is flushed and the HDFS stream is released
fileOutputStream.close();
fsDataInputStream.close();
System.out.println(sb.toString());
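
When copying a stream through a fixed-size buffer, the last read usually fills only part of the buffer, so only the number of bytes returned by read should be written out. Writing the whole buffer every time appends stale bytes to the output. Here is a minimal sketch of the loop using only the JDK, with in-memory streams standing in for FSDataInputStream and FileOutputStream; the names CopyLoopSketch, copy, and roundTrip are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CopyLoopSketch {
    // Copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int length;
        while ((length = in.read(buffer)) != -1) {
            // Write exactly the bytes read in this pass.
            out.write(buffer, 0, length);
            total += length;
        }
        return total;
    }

    // Pushes a string through the copy loop and decodes the result again.
    static String roundTrip(String text, int bufferSize) {
        byte[] source = text.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(source);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            copy(in, out, bufferSize);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The buffer is deliberately smaller than the input to force a partial last read.
        System.out.println(roundTrip("hello hdfs", 4));
    }
}
```

Decoding the bytes once at the end, as roundTrip does, also avoids splitting a multi-byte character across two buffer passes.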

Create a file in HDFS and write content to it

FSDataOutputStream out = fileSystem.create(new Path("/fuck.txt"));
out.writeUTF("aaabbb");
out.flush();
out.close();
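
One thing to keep in mind: FSDataOutputStream extends java.io.DataOutputStream, and writeUTF writes a 2-byte length prefix followed by the string in modified UTF-8. The resulting file is therefore not plain text, and viewing it with cat shows two extra bytes in front. The sketch below shows the difference with a plain DataOutputStream over an in-memory buffer, so it runs without a cluster; WriteUtfSketch and its method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WriteUtfSketch {
    // Bytes produced by writeUTF: 2-byte length prefix plus the encoded string.
    static byte[] withWriteUTF(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Bytes produced by writing the encoded string directly: no prefix.
    static byte[] withPlainWrite(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("writeUTF bytes: " + withWriteUTF("aaabbb").length);
        System.out.println("plain write bytes: " + withPlainWrite("aaabbb").length);
    }
}
```

If the file is meant to be read as ordinary text, writing the encoded bytes directly is usually the better fit; writeUTF is meant to be paired with readUTF on the reading side.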

Rename a file in HDFS

boolean a = fileSystem.rename(new Path("/fuck.txt"),new Path("/fuck.aaa"));
System.out.println(a);

Copy a local file to HDFS

fileSystem.copyFromLocalFile(new Path("a.txt"),new Path("/copy_a.txt"));

Upload a large file to HDFS

InputStream in = new BufferedInputStream(new FileInputStream(new File("hive-1.1.0-cdh5.15.1.tar.gz")));
Path dst = new Path("/hive.tar.gz");
// Print a dot each time HDFS reports progress
FSDataOutputStream out = fileSystem.create(dst, new Progressable() {
	@Override
	public void progress() {
		System.out.print('.');
		System.out.flush();
	}
});
byte[] buffer = new byte[4096];
int length = 0;
// Write to HDFS, using only the bytes actually read in each pass
while ((length = in.read(buffer, 0, buffer.length)) != -1) {
	out.write(buffer, 0, length);
}
// Close the streams so the HDFS file is finalized
out.close();
in.close();

Download a file from HDFS

fileSystem.copyToLocalFile(new Path("/fuck.aaa"),new Path("./"));

List the entries under a directory in HDFS

FileStatus[] fileStatuses = fileSystem.listStatus(new Path("/"));
for (FileStatus f:fileStatuses) {
	System.out.println(f.getPath());
}

List files recursively in HDFS

RemoteIterator<LocatedFileStatus>  remoteIterator = fileSystem.listFiles(new Path("/"),true);
while(remoteIterator.hasNext()) {
	LocatedFileStatus file =  remoteIterator.next();
	System.out.println(file.getPath());
}

View a file's blocks in HDFS

FileStatus fileStatus = fileSystem.getFileStatus(new Path("/jdk-8u221-linux-x64.tar.gz"));
BlockLocation[] blockLocations = fileSystem.getFileBlockLocations(fileStatus,0,fileStatus.getLen());
// Print each block's datanode address, offset, and length
for (BlockLocation b:blockLocations) {
	for (String name:b.getNames()) {
		System.out.println(name + " : " + b.getOffset() + " : " + b.getLength());
	}
}
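
To make sense of the offsets and lengths printed above, it helps to work out how a file is cut into blocks. The sketch below does the arithmetic with plain Java, assuming the Hadoop 2.x default block size of 128 MB (dfs.blocksize) and a hypothetical 200 MB file; a real cluster may use a different block size, and BlockLayoutSketch is a name made up for this example.

```java
public class BlockLayoutSketch {
    // Default dfs.blocksize in Hadoop 2.x and later; clusters can override it.
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024;

    // Returns one {offset, length} pair per block for a file of the given length.
    static long[][] blocks(long fileLength, long blockSize) {
        int count = (int) ((fileLength + blockSize - 1) / blockSize); // ceiling division
        long[][] result = new long[count][2];
        for (int i = 0; i < count; i++) {
            long offset = i * blockSize;
            result[i][0] = offset;
            // Every block is full except possibly the last one.
            result[i][1] = Math.min(blockSize, fileLength - offset);
        }
        return result;
    }

    public static void main(String[] args) {
        long fileLength = 200L * 1024 * 1024; // hypothetical 200 MB file
        for (long[] block : blocks(fileLength, DEFAULT_BLOCK_SIZE)) {
            System.out.println("offset=" + block[0] + " length=" + block[1]);
        }
    }
}
```

Under these assumptions the file occupies two blocks: a full 128 MB block at offset 0 and a 72 MB block after it. With replication, each block is reported once per datanode that holds a copy, which is why the loop above prints several names per block.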

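Delete a file in HDFS

The second argument is the recursive flag. If the path is a directory and the flag is true, the directory and its contents are deleted; if the flag is false, deleting a non-empty directory throws an exception. For a file, the flag can be either true or false.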
boolean a = fileSystem.delete(new Path("/gwyy.txt"),true);
System.out.println(a);

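Next, let's look at HDFS command-line operations.

List the HDFS root directory

hadoop fs -ls /

Upload a file to the HDFS root directory

hadoop fs -put  gwyy.txt  /

Copy a file from the local file system to HDFS

hadoop fs -copyFromLocal xhc.txt  /

Move a file from the local file system to HDFS (the local copy is deleted)

hadoop fs -moveFromLocal a.txt /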

View file contents

hadoop fs -cat /gwyy.txt
hadoop fs -text  /gwyy.txt

Fetch a file from HDFS to the local file system

hadoop fs -get /a.txt  ./

Create a directory in HDFS

hadoop fs -mkdir  /hdfs-test

Move a file from one directory to another

hadoop fs -mv /a.txt  /hdfs-test/a.txt

Copy a file

hadoop fs -cp /hdfs-test/a.txt /hdfs-test/a.txt.back

Merge multiple files and export them as one local file

hadoop fs -getmerge /hdfs-test ./t.txt

Delete a file

hadoop fs -rm /hdfs-test/a.txt.back

Delete a directory

hadoop fs -rmdir /hdfs-test    # only removes an empty directory
hadoop fs -rm -r /hdfs-test    # removes the directory whether or not it is empty

Original post: https://www.cnblogs.com/gwyy/p/12205199.html

Date: 2024-11-09 20:57:38
