Hadoop Source Code Reading - HDFS - Day 2

Yesterday we looked at AbstractFileSystem and saw that applications access files through the FileContext class. Today let's read the source of that class, starting with its rather long class-level Javadoc.

/**
 * The FileContext class provides an interface to the application writer for
 * using the Hadoop file system.
 * It provides a set of methods for the usual operation: create, open,
 * list, etc
 *
 * <p>
 * <b> *** Path Names *** </b>
 * <p>
 *
 * The Hadoop file system supports a URI name space and URI names.
 * It offers a forest of file systems that can be referenced using fully
 * qualified URIs.
 * Two common Hadoop file systems implementations are
 * <ul>
 * <li> the local file system: file:///path
 * <li> the hdfs file system hdfs://nnAddress:nnPort/path
 * </ul>
 *
 * While URI names are very flexible, it requires knowing the name or address
 * of the server. For convenience one often wants to access the default system
 * in one's environment without knowing its name/address. This has an
 * additional benefit that it allows one to change one's default fs
 *  (e.g. admin moves application from cluster1 to cluster2).
 * <p>
 *
 * To facilitate this, Hadoop supports a notion of a default file system.
 * The user can set his default file system, although this is
 * typically set up for you in your environment via your default config.
 * A default file system implies a default scheme and authority; slash-relative
 * names (such as /for/bar) are resolved relative to that default FS.
 * Similarly a user can also have working-directory-relative names (i.e. names
 * not starting with a slash). While the working directory is generally in the
 * same default FS, the wd can be in a different FS.
 * <p>
 *  Hence Hadoop path names can be one of:
 *  <ul>
 *  <li> fully qualified URI: scheme://authority/path
 *  <li> slash relative names: /path relative to the default file system
 *  <li> wd-relative names: path  relative to the working dir
 *  </ul>
 *  Relative paths with scheme (scheme:foo/bar) are illegal.
 *
 *  <p>
 *  <b>****The Role of the FileContext and configuration defaults****</b>
 *  <p>
 *  The FileContext provides file namespace context for resolving file names;
 *  it also contains the umask for permissions, In that sense it is like the
 *  per-process file-related state in Unix system.
 *  These two properties
 *  <ul>
 *  <li> default file system i.e your slash)
 *  <li> umask
 *  </ul>
 *  in general, are obtained from the default configuration file
 *  in your environment,  (@see {@link Configuration}).
 *
 *  No other configuration parameters are obtained from the default config as
 *  far as the file context layer is concerned. All file system instances
 *  (i.e. deployments of file systems) have default properties; we call these
 *  server side (SS) defaults. Operation like create allow one to select many
 *  properties: either pass them in as explicit parameters or use
 *  the SS properties.
 *  <p>
 *  The file system related SS defaults are
 *  <ul>
 *  <li> the home directory (default is "/user/userName")
 *  <li> the initial wd (only for local fs)
 *  <li> replication factor
 *  <li> block size
 *  <li> buffer size
 *  <li> encryptDataTransfer
 *  <li> checksum option. (checksumType and  bytesPerChecksum)
 *  </ul>
 *
 * <p>
 * <b> *** Usage Model for the FileContext class *** </b>
 * <p>
 * Example 1: use the default config read from the $HADOOP_CONFIG/core.xml.
 *   Unspecified values come from core-defaults.xml in the release jar.
 *  <ul>
 *  <li> myFContext = FileContext.getFileContext(); // uses the default config
 *                                                // which has your default FS
 *  <li>  myFContext.create(path, ...);
 *  <li>  myFContext.setWorkingDir(path)
 *  <li>  myFContext.open (path, ...);
 *  </ul>
 * Example 2: Get a FileContext with a specific URI as the default FS
 *  <ul>
 *  <li> myFContext = FileContext.getFileContext(URI)
 *  <li> myFContext.create(path, ...);
 *   ...
 * </ul>
 * Example 3: FileContext with local file system as the default
 *  <ul>
 *  <li> myFContext = FileContext.getLocalFSFileContext()
 *  <li> myFContext.create(path, ...);
 *  <li> ...
 *  </ul>
 * Example 4: Use a specific config, ignoring $HADOOP_CONFIG
 *  Generally you should not need use a config unless you are doing
 *   <ul>
 *   <li> configX = someConfigSomeOnePassedToYou.
 *   <li> myFContext = getFileContext(configX); // configX is not changed,
 *                                              // is passed down
 *   <li> myFContext.create(path, ...);
 *   <li>...
 *  </ul>
 *
 */

@InterfaceAudience.Public
@InterfaceStability.Evolving /*Evolving for a release,to be changed to Stable */
public class FileContext {

FileContext provides an interface for application writers to use the Hadoop file system, offering the usual operations: create, open, list, and so on.

Two common Hadoop file system implementations are

  1. the local file system: file:///path
  2. the HDFS file system: hdfs://nnAddress:nnPort/path

While URI names are very flexible, they require knowing the server's name or address. For convenience one often wants to access the default file system of the environment without knowing its name/address; this also has the benefit that the default fs can be changed later (for example, an admin moves an application from cluster1 to cluster2).

To support this, Hadoop has the notion of a default file system. A user can set their own default file system, though it is typically set up through the default configuration.

A default file system implies a default scheme and authority; slash-relative names (e.g. /for/bar) are resolved relative to that default FS.

Similarly, a user can have working-directory-relative names (names that do not start with a slash).

Hence, a Hadoop path name can be one of the following (a usage sketch follows the list):

  fully qualified URI: scheme://authority/path

  slash-relative name: /path, resolved against the default file system

  wd-relative name: path, resolved against the working directory
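To make the usage model concrete, here is a minimal sketch of Example 1 from the Javadoc above. The class name FileContextDemo and the path /tmp/fc-demo.txt are made up for illustration; options not passed explicitly fall back to the server-side (SS) defaults listed in the Javadoc.

import java.util.EnumSet;

import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options.CreateOpts;
import org.apache.hadoop.fs.Path;

public class FileContextDemo {
  public static void main(String[] args) throws Exception {
    // Example 1 from the Javadoc: use the default config on the classpath,
    // which determines the default FS (your "slash").
    FileContext fc = FileContext.getFileContext();

    // A slash-relative name: resolved against the default file system.
    Path p = new Path("/tmp/fc-demo.txt");

    // CREATE + OVERWRITE matches "create or overwrite" in create()'s Javadoc.
    // Block size, replication, buffer size and checksum are not passed here,
    // so the server-side (SS) defaults apply.
    try (FSDataOutputStream out = fc.create(p,
        EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE),
        CreateOpts.createParent())) {
      out.writeUTF("hello FileContext");
    }
  }
}

Swapping getFileContext() for getFileContext(URI) or getLocalFSFileContext() gives Examples 2 and 3 from the Javadoc.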

private FileContext(final AbstractFileSystem defFs,
    final FsPermission theUmask, final Configuration aConf) {
  defaultFS = defFs;
  umask = FsPermission.getUMask(aConf);
  conf = aConf;
  try {
    ugi = UserGroupInformation.getCurrentUser();
  } catch (IOException e) {
    LOG.error("Exception in getCurrentUser: ",e);
    throw new RuntimeException("Failed to get the current user " +
            "while creating a FileContext", e);
  }
  /*
   * Init the wd.
   * WorkingDir is implemented at the FileContext layer
   * NOT at the AbstractFileSystem layer.
   * If the DefaultFS, such as localFilesystem has a notion of
   *  builtin WD, we use that as the initial WD.
   *  Otherwise the WD is initialized to the home directory.
   */
  workingDir = defaultFS.getInitialWorkingDirectory();
  if (workingDir == null) {
    workingDir = defaultFS.getHomeDirectory();
  }
  resolveSymlinks = conf.getBoolean(
      CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY,
      CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_DEFAULT);
  util = new Util(); // for the inner class
}

The FileContext constructor takes three parameters:

  1. defFs - the default FS for this FileContext
  2. theUmask - apparently never used; a historical leftover? The umask field is instead initialized with FsPermission.getUMask(conf), as illustrated in the sketch below
  3. conf - the configuration
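Since the umask really comes from the configuration, here is a small sketch of the same permission arithmetic that create (shown later) performs. FsPermission.getUMask, getFileDefault and applyUMask are the actual Hadoop calls; the class name UmaskDemo and the example values (0666 masked by 022 giving 0644) are only illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.permission.FsPermission;

public class UmaskDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Same call the constructor uses: reads fs.permissions.umask-mode
    // (default "022") from the configuration.
    FsPermission umask = FsPermission.getUMask(conf);

    // create() applies the umask to the requested (or default) permission,
    // e.g. 0666 masked by 022 yields 0644.
    FsPermission requested = FsPermission.getFileDefault();
    FsPermission effective = requested.applyUMask(umask);

    System.out.println("requested=" + requested
        + " umask=" + umask
        + " effective=" + effective);
  }
}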

Now let's look at the common methods mentioned above, starting with create. Folded away in front of it is a big pile of Javadoc:

/**
 * Create or overwrite file on indicated path and returns an output stream for
 * writing into the file.
 *
 * @param f the file name to open
 * @param createFlag gives the semantics of create; see {@link CreateFlag}
 * @param opts file creation options; see {@link Options.CreateOpts}.
 *          <ul>
 *          <li>Progress - to report progress on the operation - default null
 *          <li>Permission - umask is applied against permisssion: default is
 *          FsPermissions:getDefault()
 *
 *          <li>CreateParent - create missing parent path; default is to not
 *          to create parents
 *          <li>The defaults for the following are SS defaults of the file
 *          server implementing the target path. Not all parameters make sense
 *          for all kinds of file system - eg. localFS ignores Blocksize,
 *          replication, checksum
 *          <ul>
 *          <li>BufferSize - buffersize used in FSDataOutputStream
 *          <li>Blocksize - block size for file blocks
 *          <li>ReplicationFactor - replication for blocks
 *          <li>ChecksumParam - Checksum parameters. server default is used
 *          if not specified.
 *          </ul>
 *          </ul>
 *
 * @return {@link FSDataOutputStream} for created file
 *
 * @throws AccessControlException If access is denied
 * @throws FileAlreadyExistsException If file <code>f</code> already exists
 * @throws FileNotFoundException If parent of <code>f</code> does not exist
 *           and <code>createParent</code> is false
 * @throws ParentNotDirectoryException If parent of <code>f</code> is not a
 *           directory.
 * @throws UnsupportedFileSystemException If file system for <code>f</code> is
 *           not supported
 * @throws IOException If an I/O error occurred
 *
 * Exceptions applicable to file systems accessed over RPC:
 * @throws RpcClientException If an exception occurred in the RPC client
 * @throws RpcServerException If an exception occurred in the RPC server
 * @throws UnexpectedServerException If server implementation throws
 *           undeclared exception to RPC server
 *
 * RuntimeExceptions:
 * @throws InvalidPathException If path <code>f</code> is not valid
 */

public FSDataOutputStream create(final Path f,
    final EnumSet<CreateFlag> createFlag, Options.CreateOpts... opts)
    throws AccessControlException, FileAlreadyExistsException,
    FileNotFoundException, ParentNotDirectoryException,
    UnsupportedFileSystemException, IOException {
  Path absF = fixRelativePart(f);

  // If one of the options is a permission, extract it & apply umask
  // If not, add a default Perms and apply umask;
  // AbstractFileSystem#create

  CreateOpts.Perms permOpt = CreateOpts.getOpt(CreateOpts.Perms.class, opts);
  FsPermission permission = (permOpt != null) ? permOpt.getValue() :
                                    FILE_DEFAULT_PERM;
  permission = permission.applyUMask(umask);

  final CreateOpts[] updatedOpts =
                    CreateOpts.setOpt(CreateOpts.perms(permission), opts);
  return new FSLinkResolver<FSDataOutputStream>() {
    @Override
    public FSDataOutputStream next(final AbstractFileSystem fs, final Path p)
      throws IOException {
      return fs.create(p, createFlag, updatedOpts);
    }
  }.resolve(this, absF);
}

The create method creates or overwrites a file at the given path and returns an output stream for writing into it. A short sketch of passing explicit options to create follows below.

The FSLinkResolver instantiated in the final return statement handles the case where the path contains a symbolic link; its next and resolve methods are shown after the sketch.
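Here is a hedged sketch of how explicit CreateOpts interact with the SS defaults described in the Javadoc. The class name, path, and the concrete block size / replication values are made up for illustration; options that are not passed still come from the server-side defaults, and the permission given here is the value before the umask, which create applies as seen in the source above.

import java.util.EnumSet;

import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options.CreateOpts;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class CreateWithOptsDemo {
  public static void main(String[] args) throws Exception {
    FileContext fc = FileContext.getFileContext();
    Path p = new Path("/tmp/opts-demo.dat");   // hypothetical path

    // Explicit options override the server-side defaults; anything omitted
    // (e.g. buffer size, checksum) still comes from the SS defaults.
    try (FSDataOutputStream out = fc.create(p,
        EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE),
        CreateOpts.perms(new FsPermission((short) 0640)), // pre-umask permission
        CreateOpts.blockSize(128 * 1024 * 1024L),
        CreateOpts.repFac((short) 2),
        CreateOpts.createParent())) {
      out.write(new byte[] {1, 2, 3});
    }
  }
}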

/**
 * Generic helper function overridden on instantiation to perform a
 * specific operation on the given file system using the given path
 * which may result in an UnresolvedLinkException.
 * @param fs AbstractFileSystem to perform the operation on.
 * @param p Path given the file system.
 * @return Generic type determined by the specific implementation.
 * @throws UnresolvedLinkException If symbolic link <code>path</code> could
 *           not be resolved
 * @throws IOException an I/O error occurred
 */
abstract public T next(final AbstractFileSystem fs, final Path p)
    throws IOException, UnresolvedLinkException;

/**
 * Performs the operation specified by the next function, calling it
 * repeatedly until all symlinks in the given path are resolved.
 * @param fc FileContext used to access file systems.
 * @param path The path to resolve symlinks on.
 * @return Generic type determined by the implementation of next.
 * @throws IOException
 */
public T resolve(final FileContext fc, final Path path) throws IOException {
  int count = 0;
  T in = null;
  Path p = path;
  // NB: More than one AbstractFileSystem can match a scheme, eg
  // "file" resolves to LocalFs but could have come by RawLocalFs.
  AbstractFileSystem fs = fc.getFSofPath(p);

  // Loop until all symlinks are resolved or the limit is reached
  for (boolean isLink = true; isLink;) {
    try {
      in = next(fs, p);
      isLink = false;
    } catch (UnresolvedLinkException e) {
      if (!fc.resolveSymlinks) {
        throw new IOException("Path " + path + " contains a symlink"
            + " and symlink resolution is disabled ("
            + CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY + ").", e);
      }
      if (!FileSystem.areSymlinksEnabled()) {
        throw new IOException("Symlink resolution is disabled in"
            + " this version of Hadoop.");
      }
      if (count++ > FsConstants.MAX_PATH_LINKS) {
        throw new IOException("Possible cyclic loop while " +
                              "following symbolic link " + path);
      }
      // Resolve the first unresolved path component
      p = qualifySymlinkTarget(fs.getUri(), p, fs.getLinkTarget(p));
      fs = fc.getFSofPath(p);
    }
  }
  return in;
}

next is a generic helper function that is overridden on instantiation to perform a specific operation on the given file system with the given path; it may throw an UnresolvedLinkException.

resolve performs the operation specified by next, calling next repeatedly until all the symbolic links in the path have been resolved.
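FSLinkResolver is an internal helper (application code normally goes through FileContext methods rather than using it directly), but the pattern is easy to sketch: an anonymous subclass supplies next() for a single operation, and resolve() retries it while chasing symlinks. The sketch below uses getFileStatus only as an example of an operation that can throw UnresolvedLinkException; the class name and path are made up.

import java.io.IOException;

import org.apache.hadoop.fs.AbstractFileSystem;
import org.apache.hadoop.fs.FSLinkResolver;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

public class LinkResolverDemo {
  public static void main(String[] args) throws Exception {
    FileContext fc = FileContext.getFileContext();
    Path p = new Path("/some/path");   // hypothetical path, possibly a symlink

    // Same shape as the return statement in FileContext#create: next() does
    // the real work, resolve() retries it after resolving each symlink.
    FileStatus status = new FSLinkResolver<FileStatus>() {
      @Override
      public FileStatus next(AbstractFileSystem fs, Path path) throws IOException {
        return fs.getFileStatus(path); // throws UnresolvedLinkException on a link
      }
    }.resolve(fc, p);

    System.out.println(status.getPath() + " len=" + status.getLen());
  }
}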
