Spring Batch学习_ItemReaders and ItemWriters

All batch processing can be described in its most simple form as reading in large amounts of data, performing some type of calculation or transformation, and writing the result out. Spring Batch provides three key interfaces to help perform bulk reading and writing: ItemReader, ItemProcessor and ItemWriter.

ItemReader

Although a simple concept, an ItemReader is the means for providing data from many different types of input. The most general examples include:

  1. Flat File- Flat File Item Readers read lines of data from a flat file that typically describe records with fields of data defined by fixed positions in the file or delimited by some special character (e.g. Comma).
  2. XML - XML ItemReaders process XML independently of technologies used for parsing, mapping and validating objects. Input data allows for the validation of an XML file against an XSD schema.
  3. Database - A database resource is accessed to return resultsets which can be mapped to objects for processing. The default SQL ItemReaders invoke a RowMapper to return objects, keep track of the current row if restart is required, store basic statistics, and provide some transaction enhancements that will be explained later.

There are many more possibilities, but we‘ll focus on the basic ones for this chapter. A complete list of all available ItemReaders can be found in Appendix A.

ItemReader is a basic interface for generic input operations:

public interface ItemReader<T> {
    T read() throws Exception, UnexpectedInputException, ParseException;
}

The read method defines the most essential contract of the ItemReader; calling it returns one Item or null if no more items are left. An item might represent a line in a file, a row in a database, or an element in an XML file. It is generally expected that these will be mapped to a usable domain object (i.e. Trade, Foo, etc) but there is no requirement in the contract to do so.

It is expected that implementations of the ItemReader interface will be forward only. However, if the underlying resource is transactional (such as a JMS queue) then calling read may return the same logical item on subsequent calls in a rollback scenario. It is also worth noting that a lack of items to process by an ItemReader will not cause an exception to be thrown. For example, a database ItemReader that is configured with a query that returns 0 results will simply return null on the first invocation of read.

ItemWriter

ItemWriter is similar in functionality to an ItemReader, but with inverse operations. Resources still need to be located, opened and closed but they differ in that an ItemWriter writes out, rather than reading in. In the case of databases or queues these may be inserts, updates, or sends. The format of the serialization of the output is specific to each batch job.

As with ItemReader, ItemWriter is a fairly generic interface:

public interface ItemWriter<T> {
    void write(List<? extends T> items) throws Exception;
}

As with read on ItemReader, write provides the basic contract of ItemWriter; it will attempt to write out the list of items passed in as long as it is open. Because it is generally expected that items will be ‘batched‘ together into a chunk and then output, the interface accepts a list of items, rather than an item by itself. After writing out the list, any flushing that may be necessary can be performed before returning from the write method. For example, if writing to a Hibernate DAO, multiple calls to write can be made, one for each item. The writer can then call close on the hibernate Session before returning.

ItemProcessor

Spring Batch provides the ItemProcessor interface:

public interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

An ItemProcessor is very simple; given one object, transform it and return another. The provided object may or may not be of the same type. The point is that business logic may be applied within process, and is completely up to the developer to create. An ItemProcessor can be wired directly into a step, For example, assuming an ItemReader provides a class of type Foo, and it needs to be converted to type Bar before being written out. An ItemProcessor can be written that performs the conversion:

public class Foo {}
public class Bar {
    public Bar(Foo foo) {}
}
public class FooProcessor implements ItemProcessor<Foo,Bar>{
    public Bar process(Foo foo) throws Exception {
        //Perform simple transformation, convert a Foo to a Bar
        return new Bar(foo);
    }
}
public class BarWriter implements ItemWriter<Bar>{
    public void write(List<? extends Bar> bars) throws Exception {
        //write bars
    }
}

In the very simple example above, there is a class Foo, a class Bar, and a class FooProcessor that adheres to the ItemProcessor interface. The transformation is simple, but any type of transformation could be done here. The BarWriter will be used to write out Bar objects, throwing an exception if any other type is provided. Similarly, the FooProcessor will throw an exception if anything but a Foo is provided. The FooProcessor can then be injected into a Step:

<job id="ioSampleJob">
    <step name="step1">
        <tasklet>
            <chunk reader="fooReader" processor="fooProcessor" writer="barWriter"
                   commit-interval="2"/>
        </tasklet>
    </step>
</job>

Chaining ItemProcessors

Performing a single transformation is useful in many scenarios, but what if you want to ‘chain‘ together multiple ItemProcessors? This can be accomplished using the composite pattern mentioned previously. To update the previous, single transformation, example, Foo will be transformed to Bar, which will be transformed to Foobar and written out:

public class Foo {}
public class Bar {
    public Bar(Foo foo) {}
}
public class Foobar{
    public Foobar(Bar bar) {}
}
public class FooProcessor implements ItemProcessor<Foo,Bar>{
    public Bar process(Foo foo) throws Exception {
        //Perform simple transformation, convert a Foo to a Bar
        return new Bar(foo);
    }
}
public class BarProcessor implements ItemProcessor<Bar,FooBar>{
    public FooBar process(Bar bar) throws Exception {
        return new Foobar(bar);
    }
}
public class FoobarWriter implements ItemWriter<FooBar>{
    public void write(List<? extends FooBar> items) throws Exception {
        //write items
    }
}

A FooProcessor and BarProcessor can be ‘chained‘ together to give the resultant Foobar:

CompositeItemProcessor<Foo,Foobar> compositeProcessor =
                                      new CompositeItemProcessor<Foo,Foobar>();
List itemProcessors = new ArrayList();
itemProcessors.add(new FooTransformer());
itemProcessors.add(new BarTransformer());
compositeProcessor.setDelegates(itemProcessors);

Just as with the previous example, the composite processor can be configured into the Step:

<job id="ioSampleJob">
    <step name="step1">
        <tasklet>
            <chunk reader="fooReader" processor="compositeProcessor" writer="foobarWriter"
                   commit-interval="2"/>
        </tasklet>
    </step>
</job>
<bean id="compositeItemProcessor"
      class="org.springframework.batch.item.support.CompositeItemProcessor">
    <property name="delegates">
        <list>
            <bean class="..FooProcessor" />
            <bean class="..BarProcessor" />
        </list>
    </property>
</bean>

========END========

时间: 2024-10-01 09:59:26

Spring Batch学习_ItemReaders and ItemWriters的相关文章

Spring Batch学习笔记二

此系列博客皆为学习Spring Batch时的一些笔记: Spring Batch的架构 一个Batch Job是指一系列有序的Step的集合,它们作为预定义流程的一部分而被执行: Step代表一个自定义的工作单元,它是Job的主要构件块:每一个Step由三部分组成:ItemReader.ItemProcessor.ItemWriter:这三个部分将执行在每一条被处理的记录上,ItemReader读取每一条记录,然后传递给ItemProcessor处理,最后交给ItemWriter做持久化:It

Spring Batch学习笔记三:JobRepository

此系列博客皆为学习Spring Batch时的一些笔记: Spring Batch Job在运行时有很多元数据,这些元数据一般会被保存在内存或者数据库中,由于Spring Batch在默认配置是使用HSQLDB,也就是说在Job的运行过程中,所有的元数据都被储存在内存中,在Job结束后会随着进程的结束自动消失:在这里我们推荐配置JobRepository去使用MySQL. 在这种情况下,Spring Batch在单次执行或者从一个执行到另外一个执行的时候会使用数据库去维护状态,Job执行的信息包

Spring Batch学习笔记&mdash;&mdash;steps之间共享数据

名词说明: 上下文: 执行: 执行上下文: 案例: 警告:一旦steps共享数据,这些数据就会把这些steps连接起来.努力使steps独立.如果你实在是不能独立他们,才使用下面的技术.你应该把数据共享作为steps不能独立的后备方案. 1 数据共享方式: a step存储共享数据到数据库,receiving step从数据库读取他们 Execution context(执行上下文): 使用Spring  Batch execution context 作为data容器.a step往上下文写数

Spring Batch学习

今天准备研究下Spring Batch,然后看了一系列资料,如下还是比较好的教程吧. 链接: http://www.cnblogs.com/gulvzhe/archive/2011/12/20/2295090.html 但在进行到 Spring Batch 之 Sample(CSV文件操作)(四) 时,发现了实战中的问题: outputFile.csv死活写不进结果!!! 想来想去,也尝试了n种调试,确定reader和process绝对没问题,那就writer出现问题了. 所以肯定是配置的csv

Spring Batch学习(一)介绍

为什么我们需要批处理? 我们不会总是想要立即得到需要的信息,批处理允许我们在请求处理之前就一个既定的流程开始搜集信息:比如说一个银行对账单,我们可以按月生成,并在用户查询之前开启一个批处理流程进行处理: 有时候它能让生意做得更好:比如说在线购物时,并不是说你买了一个产品零售商就立即发货,而是四五个小时后,统一发货: 更好的利用资源:让应该利用的处理能力闲置起来是一个大的浪费,我们可以定制处理让一个机器一个接一个的运行Job可以更好的利用机器的处理能力: 什么是批处理? 批处理是指在没有与用户进行

Spring Batch学习(二)架构

Spring Batch的架构 一个Batch Job是指一系列有序的Step的集合,它们作为预定义流程的一部分而被执行: Step代表一个自定义的工作单元,它是Job的主要构件块:每一个Step由三部分组成:ItemReader.ItemProcessor.ItemWriter:这三个部分将执行在每一条被处理的记录上,ItemReader读取每一条记录,然后传递给ItemProcessor处理,最后交给ItemWriter做持久化:ItemProcessor不是必须的,一个Step可以仅仅包含

Spring Batch学习(三)JobRepository

Spring Batch Job在运行时有很多元数据,这些元数据一般会被保存在内存或者数据库中,由于Spring Batch在默认配置是使用HSQLDB,也就是说在Job的运行过程中,所有的元数据都被储存在内存中,在Job结束后会随着进程的结束自动消失:在这里我们推荐配置JobRepository去使用MySQL. 在这种情况下,Spring Batch在单次执行或者从一个执行到另外一个执行的时候会使用数据库去维护状态,Job执行的信息包括Job实例.传入的参数.执行的结果.每一个Step执行的

Spring batch学习 持久化表结构详解(2)

#接上一篇 这一篇讲一下持久化需要表 batch_job_execution, batch_job_execution_context, batch_job_execution_params, batch_job_execution_seq, batch_job_instance, batch_job_seq, batch_step_execution, batch_step_execution_context, batch_step_execution_seq _seq结尾的三张表,维护bat

关于Spring batch的学习

最近在学习Spring batch相关的内容,网上也有不少Spring Batch相关的知识,不过大多都是使用xml进行配置的.这里是我用注解的方式进行相关的学习心得. 首先我们来看如何将一个文本文件中的内容导入到数据库中. 我们先来看一下我们所需要的环境.我们这里使用的是STS(Spring Tool Suite)当然也可以使用Eclipse(需要有maven插件)或者Myeclipse.我们使用的数据库是MySQl,当然也可以使用其他的数据库,根据自己的需要. 现在开始我们的文件到数据库的学