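The InputFormat abstract class declares two methods: getSplits() and createRecordReader(). The first defines how the job's input is divided into logical input splits; the second creates the RecordReader that reads records from a given split.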
    public abstract class InputFormat<K, V> {

      /**
       * Logically split the set of input files for the job.
       *
       * <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
       * for processing.</p>
       *
       * <p><i>Note</i>: The split is a <i>logical</i> split of the inputs and the
       * input files are not physically split into chunks. For e.g. a split could
       * be <i><input-file-path, start, offset></i> tuple. The InputFormat
       * also creates the {@link RecordReader} to read the {@link InputSplit}.
       *
       * @param context job configuration.
       * @return an array of {@link InputSplit}s for the job.
       */
      public abstract
        List<InputSplit> getSplits(JobContext context
                                   ) throws IOException, InterruptedException;

      /**
       * Create a record reader for a given split. The framework will call
       * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
       * the split is used.
       * @param split the split to be read
       * @param context the information about the task
       * @return a new record reader
       * @throws IOException
       * @throws InterruptedException
       */
      public abstract
        RecordReader<K,V> createRecordReader(InputSplit split,
                                             TaskAttemptContext context
                                            ) throws IOException,
                                                     InterruptedException;

    }
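The division of labor between the two methods can be illustrated without any Hadoop dependency. The sketch below is a simplified model, not Hadoop's implementation: the class `LogicalSplitSketch`, the type `FileRange`, and the methods `computeSplits` and `readRecords` are hypothetical names invented for this example. `computeSplits` plays the role of getSplits() by producing only (path, start, length) descriptions, and `readRecords` plays the role of the record reader by turning one such range into line records. The rule that a line belongs to the split containing its first byte is an assumption chosen for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LogicalSplitSketch {

    /** Hypothetical stand-in for an InputSplit: describes a byte range, holds no data. */
    public static final class FileRange {
        public final String path;
        public final long start;
        public final long length;

        public FileRange(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "<" + path + ", " + start + ", " + length + ">";
        }
    }

    /** Role of getSplits(): cut [0, fileLength) into ranges of at most splitSize bytes. */
    public static List<FileRange> computeSplits(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<FileRange> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new FileRange(path, start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    /** Role of the record reader: return the lines whose first byte lies inside the split. */
    public static List<String> readRecords(byte[] data, FileRange split) {
        int end = (int) Math.min(split.start + split.length, data.length);
        int pos = (int) split.start;
        if (pos > 0) {
            // Skip the tail of a line that began in the previous split.
            pos--;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // first byte after the newline
        }
        List<String> records = new ArrayList<>();
        while (pos < end) {
            int lineEnd = pos;
            // A line may run past the split boundary; read it to completion.
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
        for (FileRange split : computeSplits("demo.txt", data.length, 8)) {
            System.out.println(split + " -> " + readRecords(data, split));
        }
    }
}
```

Because the splits are only descriptions, the split boundaries may fall in the middle of a line. The reader is what reconciles byte ranges with record boundaries, which is why the two responsibilities are declared as separate methods.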
Time: 2024-12-26 08:07:55