[Reposted] HDFS C/C++ API

Original article: http://blog.csdn.net/sprintfwater/article/details/8996214

1. Establishing and closing an HDFS connection: hdfsConnect(), hdfsConnectAsUser(), hdfsDisconnect(). hdfsConnect() is in fact a direct call to hdfsConnectAsUser().

2. Opening and closing HDFS files: hdfsOpenFile(), hdfsCloseFile(). When creating a file with hdfsOpenFile(), you can specify the replication and blocksize parameters. Opening a file for writing implies O_TRUNC: the file is truncated and writing starts from the beginning of the file.

3. Reading HDFS files: hdfsRead(), hdfsPread(). Both functions may return fewer bytes than requested; in that case, call them again to read the remainder (similar to the readn implementation in APUE). Only when either function returns zero can we conclude that the end of the file has been reached.

4. Writing HDFS files: hdfsWrite(). HDFS does not support random writes; data can only be written sequentially from the beginning of the file.

5. Querying HDFS file information: hdfsGetPathInfo().

6. Querying and setting the read/write offset of an HDFS file: hdfsSeek(), hdfsTell().

7. Querying which nodes store a file's blocks: hdfsGetHosts(). Returns information about the data nodes holding one or more blocks; due to replication, a single block may reside on multiple data nodes.

8. The functions in libhdfs call into a Java virtual machine via JNI, construct the corresponding HDFS Java classes inside the VM, and then invoke their methods through reflection. Every call copies memory between the JVM and the native program, which is worth keeping in mind for performance.

9. HDFS does not support multiple clients writing to the same file concurrently; there is no notion of file or record locks.

10. Only very large files are worth storing on HDFS, and ideally with a write-once, read-many access pattern. Small files should not be stored on HDFS; the overhead outweighs the benefit.

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#ifndef LIBHDFS_HDFS_H
#define LIBHDFS_HDFS_H

#include <sys/types.h>
#include <sys/stat.h>

#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <errno.h>

#include <jni.h>

#ifndef O_RDONLY
#define O_RDONLY 1
#endif

#ifndef O_WRONLY
#define O_WRONLY 2
#endif

#ifndef EINTERNAL
#define EINTERNAL 255
#endif


/** All APIs set errno to meaningful values */
#ifdef __cplusplus
extern "C" {
#endif

    /**
     * Some utility decls used in libhdfs.
     */

    typedef int32_t   tSize; /// size of data for read/write io ops
    typedef time_t    tTime; /// time type
    typedef int64_t   tOffset;/// offset within the file
    typedef uint16_t  tPort; /// port
    typedef enum tObjectKind {
        kObjectKindFile = 'F',
        kObjectKindDirectory = 'D',
    } tObjectKind;


    /**
     * The C reflection of org.apache.org.hadoop.FileSystem .
     */
    typedef void* hdfsFS;


    /**
     * The C equivalent of org.apache.org.hadoop.FSData(Input|Output)Stream .
     */
    enum hdfsStreamType
    {
        UNINITIALIZED = 0,
        INPUT = 1,
        OUTPUT = 2,
    };


    /**
     * The 'file-handle' to a file in hdfs.
     */
    struct hdfsFile_internal {
        void* file;
        enum hdfsStreamType type;
    };
    typedef struct hdfsFile_internal* hdfsFile;


    /**
     * hdfsConnect - Connect to a hdfs file system.
     * Connect to the hdfs.
     * @param host A string containing either a host name, or an ip address
     * of the namenode of a hdfs cluster. 'host' should be passed as NULL if
     * you want to connect to local filesystem. 'host' should be passed as
     * 'default' (and port as 0) to use the 'configured' filesystem
     * (hadoop-site/hadoop-default.xml).
     * @param port The port on which the server is listening.
     * @return Returns a handle to the filesystem or NULL on error.
     */
    hdfsFS hdfsConnect(const char* host, tPort port);


    /**
     * hdfsDisconnect - Disconnect from the hdfs file system.
     * Disconnect from hdfs.
     * @param fs The configured filesystem handle.
     * @return Returns 0 on success, -1 on error.
     */
    int hdfsDisconnect(hdfsFS fs);


    /**
     * hdfsOpenFile - Open a hdfs file in given mode.
     * @param fs The configured filesystem handle.
     * @param path The full path to the file.
     * @param flags Either O_RDONLY or O_WRONLY, for read-only or write-only.
     * @param bufferSize Size of buffer for read/write - pass 0 if you want
     * to use the default configured values.
     * @param replication Block replication - pass 0 if you want to use
     * the default configured values.
     * @param blocksize Size of block - pass 0 if you want to use the
     * default configured values.
     * @return Returns the handle to the open file or NULL on error.
     */
    hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags,
                          int bufferSize, short replication, tSize blocksize);


    /**
     * hdfsCloseFile - Close an open file.
     * @param fs The configured filesystem handle.
     * @param file The file handle.
     * @return Returns 0 on success, -1 on error.
     */
    int hdfsCloseFile(hdfsFS fs, hdfsFile file);


    /**
     * hdfsExists - Checks if a given path exists on the filesystem
     * @param fs The configured filesystem handle.
     * @param path The path to look for
     * @return Returns 0 on success, -1 on error.
     */
    int hdfsExists(hdfsFS fs, const char *path);


    /**
     * hdfsSeek - Seek to given offset in file.
     * This works only for files opened in read-only mode.
     * @param fs The configured filesystem handle.
     * @param file The file handle.
     * @param desiredPos Offset into the file to seek into.
     * @return Returns 0 on success, -1 on error.
     */
    int hdfsSeek(hdfsFS fs, hdfsFile file, tOffset desiredPos);


    /**
     * hdfsTell - Get the current offset in the file, in bytes.
     * @param fs The configured filesystem handle.
     * @param file The file handle.
     * @return Current offset, -1 on error.
     */
    tOffset hdfsTell(hdfsFS fs, hdfsFile file);


    /**
     * hdfsRead - Read data from an open file.
     * @param fs The configured filesystem handle.
     * @param file The file handle.
     * @param buffer The buffer to copy read bytes into.
     * @param length The length of the buffer.
     * @return Returns the number of bytes actually read, possibly less
     * than length; -1 on error.
     */
    tSize hdfsRead(hdfsFS fs, hdfsFile file, void* buffer, tSize length);


    /**
     * hdfsPread - Positional read of data from an open file.
     * @param fs The configured filesystem handle.
     * @param file The file handle.
     * @param position Position from which to read
     * @param buffer The buffer to copy read bytes into.
     * @param length The length of the buffer.
     * @return Returns the number of bytes actually read, possibly less
     * than length; -1 on error.
     */
    tSize hdfsPread(hdfsFS fs, hdfsFile file, tOffset position,
                    void* buffer, tSize length);


    /**
     * hdfsWrite - Write data into an open file.
     * @param fs The configured filesystem handle.
     * @param file The file handle.
     * @param buffer The data.
     * @param length The no. of bytes to write.
     * @return Returns the number of bytes written, -1 on error.
     */
    tSize hdfsWrite(hdfsFS fs, hdfsFile file, const void* buffer,
                    tSize length);


    /**
     * hdfsFlush - Flush the data.
     * @param fs The configured filesystem handle.
     * @param file The file handle.
     * @return Returns 0 on success, -1 on error.
     */
    int hdfsFlush(hdfsFS fs, hdfsFile file);


    /**
     * hdfsAvailable - Number of bytes that can be read from this
     * input stream without blocking.
     * @param fs The configured filesystem handle.
     * @param file The file handle.
     * @return Returns available bytes; -1 on error.
     */
    int hdfsAvailable(hdfsFS fs, hdfsFile file);


    /**
     * hdfsCopy - Copy file from one filesystem to another.
     * @param srcFS The handle to source filesystem.
     * @param src The path of source file.
     * @param dstFS The handle to destination filesystem.
     * @param dst The path of destination file.
     * @return Returns 0 on success, -1 on error.
     */
    int hdfsCopy(hdfsFS srcFS, const char* src, hdfsFS dstFS, const char* dst);


    /**
     * hdfsMove - Move file from one filesystem to another.
     * @param srcFS The handle to source filesystem.
     * @param src The path of source file.
     * @param dstFS The handle to destination filesystem.
     * @param dst The path of destination file.
     * @return Returns 0 on success, -1 on error.
     */
    int hdfsMove(hdfsFS srcFS, const char* src, hdfsFS dstFS, const char* dst);


    /**
     * hdfsDelete - Delete file.
     * @param fs The configured filesystem handle.
     * @param path The path of the file.
     * @return Returns 0 on success, -1 on error.
     */
    int hdfsDelete(hdfsFS fs, const char* path);


    /**
     * hdfsRename - Rename file.
     * @param fs The configured filesystem handle.
     * @param oldPath The path of the source file.
     * @param newPath The path of the destination file.
     * @return Returns 0 on success, -1 on error.
     */
    int hdfsRename(hdfsFS fs, const char* oldPath, const char* newPath);


    /**
     * hdfsGetWorkingDirectory - Get the current working directory for
     * the given filesystem.
     * @param fs The configured filesystem handle.
     * @param buffer The user-buffer to copy path of cwd into.
     * @param bufferSize The length of user-buffer.
     * @return Returns buffer, NULL on error.
     */
    char* hdfsGetWorkingDirectory(hdfsFS fs, char *buffer, size_t bufferSize);


    /**
     * hdfsSetWorkingDirectory - Set the working directory. All relative
     * paths will be resolved relative to it.
     * @param fs The configured filesystem handle.
     * @param path The path of the new 'cwd'.
     * @return Returns 0 on success, -1 on error.
     */
    int hdfsSetWorkingDirectory(hdfsFS fs, const char* path);


    /**
     * hdfsCreateDirectory - Make the given file and all non-existent
     * parents into directories.
     * @param fs The configured filesystem handle.
     * @param path The path of the directory.
     * @return Returns 0 on success, -1 on error.
     */
    int hdfsCreateDirectory(hdfsFS fs, const char* path);


    /**
     * hdfsSetReplication - Set the replication of the specified
     * file to the supplied value.
     * @param fs The configured filesystem handle.
     * @param path The path of the file.
     * @param replication The new replication factor.
     * @return Returns 0 on success, -1 on error.
     */
    int hdfsSetReplication(hdfsFS fs, const char* path, int16_t replication);


    /**
     * hdfsFileInfo - Information about a file/directory.
     */
    typedef struct {
        tObjectKind mKind;   /* file or directory */
        char *mName;         /* the name of the file */
        tTime mLastMod;      /* the last modification time for the file*/
        tOffset mSize;       /* the size of the file in bytes */
        short mReplication;    /* the count of replicas */
        tOffset mBlockSize;  /* the block size for the file */
    } hdfsFileInfo;


    /**
     * hdfsListDirectory - Get list of files/directories for a given
     * directory-path. hdfsFreeFileInfo should be called to deallocate memory.
     * @param fs The configured filesystem handle.
     * @param path The path of the directory.
     * @param numEntries Set to the number of files/directories in path.
     * @return Returns a dynamically-allocated array of hdfsFileInfo
     * objects; NULL on error.
     */
    hdfsFileInfo *hdfsListDirectory(hdfsFS fs, const char* path,
                                    int *numEntries);


    /**
     * hdfsGetPathInfo - Get information about a path as a (dynamically
     * allocated) single hdfsFileInfo struct. hdfsFreeFileInfo should be
     * called when the pointer is no longer needed.
     * @param fs The configured filesystem handle.
     * @param path The path of the file.
     * @return Returns a dynamically-allocated hdfsFileInfo object;
     * NULL on error.
     */
    hdfsFileInfo *hdfsGetPathInfo(hdfsFS fs, const char* path);


    /**
     * hdfsFreeFileInfo - Free up the hdfsFileInfo array (including fields)
     * @param hdfsFileInfo The array of dynamically-allocated hdfsFileInfo
     * objects.
     * @param numEntries The size of the array.
     */
    void hdfsFreeFileInfo(hdfsFileInfo *hdfsFileInfo, int numEntries);


    /**
     * hdfsGetHosts - Get hostnames where a particular block (determined by
     * pos & blocksize) of a file is stored. The last element in the array
     * is NULL. Due to replication, a single block could be present on
     * multiple hosts.
     * @param fs The configured filesystem handle.
     * @param path The path of the file.
     * @param start The start of the block.
     * @param length The length of the block.
     * @return Returns a dynamically-allocated 2-d array of blocks-hosts;
     * NULL on error.
     */
    char*** hdfsGetHosts(hdfsFS fs, const char* path,
            tOffset start, tOffset length);


    /**
     * hdfsFreeHosts - Free up the structure returned by hdfsGetHosts
     * @param blockHosts The array returned by hdfsGetHosts.
     */
    void hdfsFreeHosts(char ***blockHosts);


    /**
     * hdfsGetDefaultBlockSize - Get the optimum blocksize.
     * @param fs The configured filesystem handle.
     * @return Returns the blocksize; -1 on error.
     */
    tOffset hdfsGetDefaultBlockSize(hdfsFS fs);


    /**
     * hdfsGetCapacity - Return the raw capacity of the filesystem.
     * @param fs The configured filesystem handle.
     * @return Returns the raw-capacity; -1 on error.
     */
    tOffset hdfsGetCapacity(hdfsFS fs);


    /**
     * hdfsGetUsed - Return the total raw size of all files in the filesystem.
     * @param fs The configured filesystem handle.
     * @return Returns the total-size; -1 on error.
     */
    tOffset hdfsGetUsed(hdfsFS fs);

#ifdef __cplusplus
}
#endif

#endif /*LIBHDFS_HDFS_H*/
Posted: 2024-08-03 13:33:08
