Summary: Reading Files from HDFS to the Local Filesystem

This post summarizes three approaches: the built-in HDFS copy, byte-by-byte copying, and line-by-line copying. (Java IO also offers character-stream copying, which I won't cover here.)

The built-in HDFS copy fails in some environments for reasons I haven't pinned down, and the number of files it manages to download varies from run to run, so I fell back to copying the plain Java way, which is where methods 2 and 3 come from.
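For reference, the "built-in" copy I mean above is essentially a single FileSystem.copyToLocalFile call. A minimal sketch, reusing the sample address and local directory from the code below (both are placeholder values for my environment, and sample.log is a made-up file name):

	Configuration conf = new Configuration();
	// Obtain a handle to the remote HDFS; the address is a sample value.
	FileSystem fs = FileSystem.get(URI.create("hdfs://54.0.88.53:8020/"), conf);
	// false = do not delete the source; copies a file (or a directory, recursively).
	fs.copyToLocalFile(false, new Path("/user/flume/SyslogNetwork/sample.log"),
			new Path("D://flume//sample.log"));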

When I have time I want to study Java IO properly; with certain kinds of files, for example, the copy can end up a different size than the original. Here is the code.

	// Example usage: downloadFromHdfs(hdfsSrc, localDst)
	// String hdfsSrc = "hdfs://54.0.88.53:8020/user/flume/SyslogNetwork/";
	// String localDst = "D://flume//";
	// Download a single file from HDFS into a local directory.
	public static boolean downloadFromHdfs(String hdfsSrc, String localDst) {
		Configuration conf = new Configuration();
		try {
			Path src = new Path(hdfsSrc);
			String fileName = src.getName();
			String local = localDst + fileName;
			FileSystem fs = FileSystem.get(URI.create(hdfsSrc), conf);
			FSDataInputStream in = fs.open(src);
			OutputStream output = new FileOutputStream(new File(local));
			// The final argument tells copyBytes to close both streams when done.
			IOUtils.copyBytes(in, output, 4096, true);
			System.out.print(" download succeeded.");
		} catch (IOException e) {
			e.printStackTrace();
			System.out.print(" download failed.");
			return false;
		}
		return true;
	}
	// Download every file under an HDFS directory, method 1: IOUtils.copyBytes
	// (FileSystem.copyToLocalFile would work here as well).
	public static boolean downFromHdfsDir(String hdfsSrc, String localDst)
			throws IOException {
		Configuration conf = new Configuration();
		Path dirPath = new Path(hdfsSrc);
		int i = 0; // count of files downloaded
		FileSystem fs = FileSystem.get(URI.create(hdfsSrc), conf);
		try {
			FileStatus[] fList = fs.listStatus(dirPath);
			for (FileStatus f : fList) {
				if (null != f) {
					String subPath = f.getPath().toString();
					if (f.isDirectory()) {
						// Recurse into subdirectories. Note that all files are
						// flattened into the same local directory.
						downFromHdfsDir(subPath, localDst);
					} else {
						System.out.println("\t\t" + subPath); // e.g. hdfs://54.0.88.53:8020/...
						i++;
						FSDataInputStream in = null;
						OutputStream output = null;
						try {
							Path src = new Path(subPath);
							String local = localDst + src.getName();
							FileSystem hdfs = FileSystem.get(URI.create(subPath), conf);
							in = hdfs.open(src);
							output = new FileOutputStream(new File(local));
							// The four-argument overload takes a boolean deciding
							// whether copyBytes closes the streams itself; the
							// Configuration overload closes them, so the finally
							// block below is just a safety net.
							// IOUtils.copyBytes(in, output, 4096, false);
							IOUtils.copyBytes(in, output, conf);
							System.out.print(" download succeeded.");
						} catch (IOException e) {
							e.printStackTrace();
							System.out.print(" download failed.");
						} finally {
							IOUtils.closeStream(in);
							IOUtils.closeStream(output);
						}
					}
				}
			}
		} catch (Exception e) {
			e.printStackTrace(); // don't silently swallow listing errors
		} finally {
			System.out.println("the number of files is: " + i);
		}
		return true;
	}
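	Incidentally, the whole recursive walk above can be replaced by a single
	FileUtil.copy call (org.apache.hadoop.fs.FileUtil), which copies a
	directory tree recursively. A minimal sketch with the same sample paths;
	false means the HDFS source is not deleted:

	Configuration conf = new Configuration();
	FileSystem hdfs = FileSystem.get(URI.create(hdfsSrc), conf);
	// Copies hdfsSrc (a file or a whole directory) into the local directory localDst.
	FileUtil.copy(hdfs, new Path(hdfsSrc), new File(localDst), false, conf);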

	// Download every file under an HDFS directory, method 2: byte-by-byte copy
	// through an explicit read/write loop.
	public static boolean downFromHdfsDir2(String hdfsSrc, String localDst)
			throws IOException {
		Configuration conf = new Configuration();
		Path dirPath = new Path(hdfsSrc);
		int i = 0; // count of files downloaded
		FileSystem fs = FileSystem.get(URI.create(hdfsSrc), conf);
		try {
			FileStatus[] fList = fs.listStatus(dirPath);
			for (FileStatus f : fList) {
				if (null != f) {
					String subPath = f.getPath().toString();
					if (f.isDirectory()) {
						downFromHdfsDir2(subPath, localDst); // recurse with the same method
					} else {
						System.out.println("\t\t" + subPath); // e.g. hdfs://54.0.88.53:8020/...
						i++;
						FSDataInputStream in = null;
						FSDataOutputStream output = null;
						try {
							Path src = new Path(subPath);
							Path dst = new Path(localDst + src.getName());
							// Note: LocalFileSystem is checksummed, so a companion
							// .crc file is written next to each output file.
							FileSystem localFS = FileSystem.getLocal(conf);
							FileSystem hdfs = FileSystem.get(URI.create(subPath), conf);
							in = hdfs.open(src);
							output = localFS.create(dst);
							byte[] buf = new byte[1024];
							int readBytes;
							while ((readBytes = in.read(buf)) > 0) {
								output.write(buf, 0, readBytes);
							}
						} catch (IOException e) {
							e.printStackTrace();
							System.out.print(" download failed.");
						} finally {
							// Close in finally so the streams are released even on failure.
							IOUtils.closeStream(in);
							IOUtils.closeStream(output);
						}
					}
				}
			}
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			System.out.println("the number of files is: " + i);
		}
		return true;
	}
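	If the .crc checksum files that method 2 leaves behind are unwanted, one
	option (a sketch, not verified against every Hadoop version) is to write
	through the raw, non-checksummed local filesystem instead:

	// LocalFileSystem extends ChecksumFileSystem; getRawFileSystem() returns
	// the underlying RawLocalFileSystem, which writes no .crc files.
	FileSystem rawLocal = FileSystem.getLocal(conf).getRawFileSystem();
	FSDataOutputStream output = rawLocal.create(dst);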

	// Download every file under an HDFS directory, method 3: line-by-line copy.
	// Only suitable for text files: line endings are rewritten and a trailing
	// newline is always appended, so the local size may differ from the original.
	public static boolean downFromHdfsDir3(String hdfsSrc, String localDst)
			throws IOException {
		Configuration conf = new Configuration();
		Path dirPath = new Path(hdfsSrc);
		int i = 0; // count of files downloaded
		FileSystem fs = FileSystem.get(URI.create(hdfsSrc), conf);
		try {
			FileStatus[] fList = fs.listStatus(dirPath);
			for (FileStatus f : fList) {
				if (null != f) {
					String subPath = f.getPath().toString();
					if (f.isDirectory()) {
						downFromHdfsDir3(subPath, localDst); // recurse with the same method
					} else {
						System.out.println("\t\t" + subPath); // e.g. hdfs://54.0.88.53:8020/...
						i++;
						BufferedReader reader = null;
						BufferedWriter output = null;
						try {
							Path src = new Path(subPath);
							String local = localDst + src.getName();
							FileSystem hdfs = FileSystem.get(URI.create(subPath), conf);
							FSDataInputStream in = hdfs.open(src);
							// InputStreamReader and FileWriter use the platform
							// default charset; pass one explicitly if that matters.
							reader = new BufferedReader(new InputStreamReader(in));
							output = new BufferedWriter(new FileWriter(local));
							String line;
							while ((line = reader.readLine()) != null) {
								output.write(line);
								output.newLine();
							}
							output.flush(); // flush once at the end, not per line
						} catch (IOException e) {
							e.printStackTrace();
							System.out.print(" download failed.");
						} finally {
							// Closing the reader also closes the underlying HDFS stream.
							IOUtils.closeStream(reader);
							IOUtils.closeStream(output);
						}
					}
				}
			}
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			System.out.println("the number of files is: " + i);
		}
		return true;
	}
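	A hypothetical driver tying the pieces together; the address and local
	directory are the sample values from the comments at the top, not a real
	cluster:

	public static void main(String[] args) throws IOException {
		String hdfsSrc = "hdfs://54.0.88.53:8020/user/flume/SyslogNetwork/";
		String localDst = "D://flume//";
		// Any of the three directory methods does the job; method 1 shown here.
		downFromHdfsDir(hdfsSrc, localDst);
	}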

  Reading an entire file at once

Byte-based (reads the whole file into a ByteArrayOutputStream, then decodes it into a String):

private static String readHdfsFile2(FileSystem fs, Path path, String charset)
        throws IOException {
    // Buffers the entire file in memory; only use this for files that fit in RAM.
    FSDataInputStream hdfsInStream = fs.open(path);
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    byte[] ioBuffer = new byte[1024];
    int readLen;
    while ((readLen = hdfsInStream.read(ioBuffer)) != -1) {
        bos.write(ioBuffer, 0, readLen);
    }
    hdfsInStream.close();
    return new String(bos.toByteArray(), charset);
}
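For example, assuming an fs handle obtained as in the methods above (the path and charset here are placeholders):

    String content = readHdfsFile2(fs,
            new Path("/user/flume/SyslogNetwork/sample.log"), "UTF-8");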

  
