Disclaimer
- This article is based on CentOS 6.x + CDH 5.x
What is HttpFs for
HttpFs does two things:
- it lets you manage files on HDFS from a browser
- it also provides a REST-style API for managing HDFS
A pretty simple thing, really, but very practical.
Installing HttpFs
Pick a machine in the cluster that can access HDFS and install HttpFs on it
$ sudo yum install hadoop-httpfs
Configuration
Edit /etc/hadoop/conf/core-site.xml
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>
These two properties define from which hosts, and on behalf of which user groups, the httpfs user is allowed to act; * means no restriction.
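If you do not want to open the proxy up to everybody, you can list specific hosts and groups instead of *. A minimal sketch, with made-up host and group names:

<!-- Hypothetical stricter setup: the httpfs service may only proxy
     requests coming from gateway01, on behalf of members of hadoop-users -->
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>gateway01</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>hadoop-users</value>
</property>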
Restart Hadoop once that is configured.
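If a full restart is inconvenient, the proxyuser settings can usually be reloaded into the running NameNode instead. A sketch, assuming you can run admin commands as the hdfs superuser:

# Reload the hadoop.proxyuser.* settings without restarting the NameNode
$ sudo -u hdfs hdfs dfsadmin -refreshSuperUserGroupsConfiguration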
Starting HttpFs
$ sudo service hadoop-httpfs start
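To check that it came up, you can hit the default port 14000 with any cheap operation. A sketch, assuming you run it on the HttpFs machine itself:

# GETHOMEDIRECTORY needs no path and returns a tiny JSON document
$ curl -s "http://localhost:14000/webhdfs/v1?op=GETHOMEDIRECTORY&user.name=httpfs"
# should print something like {"Path":"/user/httpfs"}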
Using HttpFs
Open a browser and go to http://host2:14000/webhdfs/v1?op=LISTSTATUS&user.name=httpfs and you will see
{ "FileStatuses": { "FileStatus": [{ "pathSuffix": "hbase", "type": "DIRECTORY", "length": 0, "owner": "hbase", "group": "hadoop", "permission": "755", "accessTime": 0, "modificationTime": 1423446940595, "blockSize": 0, "replication": 0 }, { "pathSuffix": "tmp", "type": "DIRECTORY", "length": 0, "owner": "hdfs", "group": "hadoop", "permission": "1777", "accessTime": 0, "modificationTime": 1423122488037, "blockSize": 0, "replication": 0 }, { "pathSuffix": "user", "type": "DIRECTORY", "length": 0, "owner": "hdfs", "group": "hadoop", "permission": "755", "accessTime": 0, "modificationTime": 1423529997937, "blockSize": 0, "replication": 0 }, { "pathSuffix": "var", "type": "DIRECTORY", "length": 0, "owner": "hdfs", "group": "hadoop", "permission": "755", "accessTime": 0, "modificationTime": 1422945036465, "blockSize": 0, "replication": 0 }] } }
这个 &user.name=httpfs 表示用默认用户 httpfs 访问,默认用户是没有密码的。
webhdfs/v1 这是HttpFs的根目录
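So any HDFS path can simply be appended after webhdfs/v1. For example, a sketch that fetches the status of the single directory /user instead of listing its children:

$ curl -s "http://host2:14000/webhdfs/v1/user?op=GETFILESTATUS&user.name=httpfs"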
Visit http://host2:14000/webhdfs/v1/user?op=LISTSTATUS&user.name=httpfs and you will see
{ "FileStatuses": { "FileStatus": [{ "pathSuffix": "cloudera", "type": "DIRECTORY", "length": 0, "owner": "root", "group": "hadoop", "permission": "755", "accessTime": 0, "modificationTime": 1423472508868, "blockSize": 0, "replication": 0 }, { "pathSuffix": "hdfs", "type": "DIRECTORY", "length": 0, "owner": "hdfs", "group": "hadoop", "permission": "700", "accessTime": 0, "modificationTime": 1422947019504, "blockSize": 0, "replication": 0 }, { "pathSuffix": "history", "type": "DIRECTORY", "length": 0, "owner": "mapred", "group": "hadoop", "permission": "1777", "accessTime": 0, "modificationTime": 1422945692887, "blockSize": 0, "replication": 0 }, { "pathSuffix": "hive", "type": "DIRECTORY", "length": 0, "owner": "hive", "group": "hadoop", "permission": "755", "accessTime": 0, "modificationTime": 1423123187569, "blockSize": 0, "replication": 0 }, { "pathSuffix": "hive_people", "type": "DIRECTORY", "length": 0, "owner": "root", "group": "hadoop", "permission": "755", "accessTime": 0, "modificationTime": 1423216966453, "blockSize": 0, "replication": 0 }, { "pathSuffix": "hive_people2", "type": "DIRECTORY", "length": 0, "owner": "root", "group": "hadoop", "permission": "755", "accessTime": 0, "modificationTime": 1423222237254, "blockSize": 0, "replication": 0 }, { "pathSuffix": "impala", "type": "DIRECTORY", "length": 0, "owner": "root", "group": "hadoop", "permission": "755", "accessTime": 0, "modificationTime": 1423475272189, "blockSize": 0, "replication": 0 }, { "pathSuffix": "root", "type": "DIRECTORY", "length": 0, "owner": "root", "group": "hadoop", "permission": "700", "accessTime": 0, "modificationTime": 1423221719835, "blockSize": 0, "replication": 0 }, { "pathSuffix": "spark", "type": "DIRECTORY", "length": 0, "owner": "spark", "group": "spark", "permission": "755", "accessTime": 0, "modificationTime": 1423530243396, "blockSize": 0, "replication": 0 }, { "pathSuffix": "sqoop", "type": "DIRECTORY", "length": 0, "owner": "hdfs", "group": "hadoop", "permission": "755", "accessTime": 0, "modificationTime": 1423127462911, "blockSize": 0, "replication": 0 }, { "pathSuffix": "test_hive", "type": "DIRECTORY", "length": 0, "owner": "root", "group": "hadoop", "permission": "755", "accessTime": 0, "modificationTime": 1423215687891, "blockSize": 0, "replication": 0 }] } }
Strangely, there is very little documentation for HttpFs itself; for the specifics of each command you have to look at the WebHDFS documentation, WebHDFS REST API.
Supported commands (two of them are demonstrated right after the list)
Operations
- HTTP GET
- OPEN (see FileSystem.open)
- GETFILESTATUS (see FileSystem.getFileStatus)
- LISTSTATUS (see FileSystem.listStatus)
- GETCONTENTSUMMARY (see FileSystem.getContentSummary)
- GETFILECHECKSUM (see FileSystem.getFileChecksum)
- GETHOMEDIRECTORY (see FileSystem.getHomeDirectory)
- GETDELEGATIONTOKEN (see FileSystem.getDelegationToken)
- HTTP PUT
- CREATE (see FileSystem.create)
- MKDIRS (see FileSystem.mkdirs)
- RENAME (see FileSystem.rename)
- SETREPLICATION (see FileSystem.setReplication)
- SETOWNER (see FileSystem.setOwner)
- SETPERMISSION (see FileSystem.setPermission)
- SETTIMES (see FileSystem.setTimes)
- RENEWDELEGATIONTOKEN (see DistributedFileSystem.renewDelegationToken)
- CANCELDELEGATIONTOKEN (see DistributedFileSystem.cancelDelegationToken)
- HTTP POST
- APPEND (see FileSystem.append)
- HTTP DELETE
- DELETE (see FileSystem.delete)
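To give a feel for the less obvious ones, here are two hedged sketches (the paths are made up; the host name follows the earlier examples):

# Disk usage summary of /user (GETCONTENTSUMMARY)
$ curl -s "http://host2:14000/webhdfs/v1/user?op=GETCONTENTSUMMARY&user.name=httpfs"

# RENAME takes the target path in the destination parameter
$ curl -s -X PUT "http://host2:14000/webhdfs/v1/user/some_dir?op=RENAME&destination=/user/some_dir_renamed&user.name=httpfs"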
Creating a directory
Try creating a directory called abc
[root@xmseapp03 hadoop-httpfs]# curl -i -X PUT "http://xmseapp03:14000/webhdfs/v1/user/abc?op=MKDIRS&user.name=httpfs"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=httpfs&p=httpfs&t=simple&e=1423573951025&s=Ab44ha1Slg1f4xCrK+x4R/s1eMY="; Path=/; Expires=Tue, 10-Feb-2015 13:12:31 GMT; HttpOnly
Content-Type: application/json
Transfer-Encoding: chunked
Date: Tue, 10 Feb 2015 03:12:36 GMT

{"boolean":true}
Then check the result with hdfs dfs -ls on the server
[root@xmseapp03 conf]# hdfs dfs -ls /user
Found 12 items
drwxr-xr-x   - httpfs hadoop          0 2015-02-10 11:12 /user/abc
drwxr-xr-x   - root   hadoop          0 2015-02-09 17:01 /user/cloudera
drwx------   - hdfs   hadoop          0 2015-02-03 15:03 /user/hdfs
drwxrwxrwt   - mapred hadoop          0 2015-02-03 14:41 /user/history
drwxr-xr-x   - hive   hadoop          0 2015-02-05 15:59 /user/hive
drwxr-xr-x   - root   hadoop          0 2015-02-06 18:02 /user/hive_people
drwxr-xr-x   - root   hadoop          0 2015-02-06 19:30 /user/hive_people2
drwxr-xr-x   - root   hadoop          0 2015-02-09 17:47 /user/impala
drwx------   - root   hadoop          0 2015-02-06 19:21 /user/root
drwxr-xr-x   - spark  spark           0 2015-02-10 09:04 /user/spark
drwxr-xr-x   - hdfs   hadoop          0 2015-02-05 17:11 /user/sqoop
drwxr-xr-x   - root   hadoop          0 2015-02-06 17:41 /user/test_hive
You can see that a directory abc owned by httpfs has been created
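If you would rather the directory not belong to httpfs, SETOWNER and SETPERMISSION from the list above can fix it up. A sketch; note that changing ownership requires the HDFS superuser, so these requests authenticate as hdfs instead, and alice is a made-up user name:

# Hand /user/abc over to user alice (needs superuser rights)
$ curl -s -X PUT "http://xmseapp03:14000/webhdfs/v1/user/abc?op=SETOWNER&owner=alice&user.name=hdfs"

# Tighten the mode bits to 750 (octal)
$ curl -s -X PUT "http://xmseapp03:14000/webhdfs/v1/user/abc?op=SETPERMISSION&permission=750&user.name=hdfs"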
Opening a file
Upload a text file test.txt from the server side into /user/abc; its content is
Hello World!
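For reference, one way to do that upload from the server shell, assuming you can sudo to the hdfs superuser:

$ echo 'Hello World!' > test.txt
$ sudo -u hdfs hdfs dfs -put test.txt /user/abc/test.txt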
Access it through HttpFs
[root@xmseapp03 hadoop-httpfs]# curl -i -X GET "http://xmseapp03:14000/webhdfs/v1/user/abc/test.txt?op=OPEN&user.name=httpfs"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=httpfs&p=httpfs&t=simple&e=1423574166943&s=JTxqIJUsblVBeHVuTs6JCV2UbBs="; Path=/; Expires=Tue, 10-Feb-2015 13:16:06 GMT; HttpOnly
Content-Type: application/octet-stream
Content-Length: 13
Date: Tue, 10 Feb 2015 03:16:07 GMT

Hello World!
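Writing works through HttpFs too, via CREATE. A sketch (the target name test2.txt is made up); as far as I know, HttpFs wants the data=true parameter plus an application/octet-stream content type when you send the file body directly:

# Upload the local test.txt as /user/abc/test2.txt through HttpFs
$ curl -i -T test.txt \
    -H "Content-Type: application/octet-stream" \
    "http://xmseapp03:14000/webhdfs/v1/user/abc/test2.txt?op=CREATE&data=true&user.name=httpfs"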