All batch processing can be described, in its simplest form, as reading in large amounts of data, performing some type of calculation or transformation, and writing the result out. Spring Batch provides three key interfaces to help perform bulk reading and writing: ItemReader, ItemProcessor, and ItemWriter.
ItemReader
Although a simple concept, an ItemReader is the means for providing data from many different types of input. The most general examples include:
- Flat File - Flat file ItemReaders read lines of data from a flat file. Each line typically describes a record whose fields are either defined by fixed positions in the file or delimited by some special character (for example, a comma).
- XML - XML ItemReaders process XML independently of the technologies used for parsing, mapping, and validating objects. The input data can be validated against an XSD schema.
- Database - A database resource is accessed to return result sets, which can be mapped to objects for processing. The default SQL ItemReaders invoke a RowMapper to return objects, keep track of the current row if restart is required, store basic statistics, and provide some transaction enhancements that will be explained later.
There are many more possibilities, but we'll focus on the basic ones in this chapter. A complete list of all available ItemReaders can be found in Appendix A.
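To make the flat-file case concrete, the following sketch shows the per-line work a delimited flat-file reader performs. It splits a line on the delimiter and then maps the resulting fields onto a domain object. This is hand-rolled plain Java, not the Spring Batch flat-file API. The Trade class, its three-field layout (isin, quantity, price), and DelimitedTradeMapper are hypothetical names used only for illustration.

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

// Hypothetical domain object; the three-field layout is an assumption for this sketch.
class Trade {
    final String isin;
    final long quantity;
    final BigDecimal price;

    Trade(String isin, long quantity, BigDecimal price) {
        this.isin = isin;
        this.quantity = quantity;
        this.price = price;
    }
}

// Hypothetical mapper: turns one delimited line into one Trade.
class DelimitedTradeMapper {
    private final String delimiter;

    DelimitedTradeMapper(String delimiter) {
        this.delimiter = delimiter;
    }

    Trade mapLine(String line) {
        // Pattern.quote makes the delimiter literal (so "|" is not treated as regex);
        // limit -1 keeps trailing empty fields so short records are detected below.
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 fields but found " + fields.length + " in: " + line);
        }
        return new Trade(fields[0].trim(),
                Long.parseLong(fields[1].trim()),
                new BigDecimal(fields[2].trim()));
    }
}

class FlatFileDemo {
    public static void main(String[] args) {
        DelimitedTradeMapper mapper = new DelimitedTradeMapper(",");
        Trade trade = mapper.mapLine("UK21341EAH45,978,98.34");
        System.out.println(trade.isin + " x " + trade.quantity + " @ " + trade.price);
    }
}
```

For a fixed-width format, the split call would be replaced by substring calls at known column positions. The mapping step stays the same.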
ItemReader is a basic interface for generic input operations:
public interface ItemReader<T> {

    T read() throws Exception, UnexpectedInputException, ParseException;

}
The read method defines the most essential contract of the ItemReader: calling it returns one item, or null if no more items are left. An item might represent a line in a file, a row in a database, or an element in an XML file. It is generally expected that these will be mapped to a usable domain object (for example, Trade or Foo), but there is no requirement in the contract to do so.
It is expected that implementations of the ItemReader interface will be forward only. However, if the underlying resource is transactional (such as a JMS queue) then calling read may return the same logical item on subsequent calls in a rollback scenario. It is also worth noting that a lack of items to process by an ItemReader will not cause an exception to be thrown. For example, a database ItemReader that is configured with a query that returns 0 results will simply return null on the first invocation of read.
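The contract is easiest to see with an in-memory reader. The sketch below uses a simplified copy of the interface without the checked exceptions. SimpleItemReader and InMemoryItemReader are hypothetical names; Spring Batch ships a comparable ListItemReader. Each call moves forward by one item, and an empty input simply yields null on the first call rather than an exception.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified mirror of the ItemReader contract (checked exceptions omitted).
interface SimpleItemReader<T> {
    T read();
}

// Forward-only reader over an in-memory list: each call returns the next item,
// then null once the input is exhausted (immediately, if the list is empty).
// Because null means "no more input", the list itself must not contain nulls.
class InMemoryItemReader<T> implements SimpleItemReader<T> {
    private final Iterator<T> iterator;

    InMemoryItemReader(List<T> items) {
        // Copy so later changes to the caller's list do not affect reading.
        this.iterator = new ArrayList<>(items).iterator();
    }

    @Override
    public T read() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}

class ReaderDemo {
    public static void main(String[] args) {
        SimpleItemReader<String> reader = new InMemoryItemReader<>(List.of("line 1", "line 2"));
        String item;
        // Typical consumption loop: read until null signals the end of the input.
        while ((item = reader.read()) != null) {
            System.out.println(item);
        }
        // An empty source is not an error: the first read simply returns null.
        System.out.println(new InMemoryItemReader<String>(List.of()).read());
    }
}
```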
ItemWriter
ItemWriter is similar in functionality to an ItemReader, but with inverse operations. Resources still need to be located, opened, and closed, but an ItemWriter writes out rather than reading in. In the case of databases or queues, these writes may be inserts, updates, or sends. The format of the serialized output is specific to each batch job.
As with ItemReader, ItemWriter is a fairly generic interface:
public interface ItemWriter<T> {

    void write(List<? extends T> items) throws Exception;

}
As with read on ItemReader, write provides the basic contract of ItemWriter: it attempts to write out the list of items passed in, as long as it is open. Because items are generally expected to be batched together into a chunk and then output, the interface accepts a list of items rather than a single item. After writing out the list, any necessary flushing can be performed before returning from the write method. For example, when writing through a Hibernate DAO, the writer can make one DAO call per item and then call flush on the Hibernate Session before returning.
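The following sketch illustrates the flush-per-chunk idea. FakeSession is a hypothetical stand-in for a Hibernate Session and is not the Hibernate API. Its save() only queues work, and flush() pushes the queued work out in one step. The writer makes one save() call per item and flushes once before write returns.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified mirror of the ItemWriter contract (checked exception omitted).
interface SimpleItemWriter<T> {
    void write(List<? extends T> items);
}

// Hypothetical stand-in for a Hibernate Session: save() only queues work,
// flush() pushes everything queued so far out in one step.
class FakeSession {
    private final List<Object> pending = new ArrayList<>();
    final List<Object> flushed = new ArrayList<>();
    int flushCount = 0;

    void save(Object entity) {
        pending.add(entity);
    }

    void flush() {
        flushed.addAll(pending);
        pending.clear();
        flushCount++;
    }
}

// One save() per item, then a single flush before write() returns,
// so nothing from this chunk is left queued afterwards.
class SessionItemWriter<T> implements SimpleItemWriter<T> {
    private final FakeSession session;

    SessionItemWriter(FakeSession session) {
        this.session = session;
    }

    @Override
    public void write(List<? extends T> items) {
        for (T item : items) {
            session.save(item);
        }
        session.flush();
    }
}

class WriterDemo {
    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        SessionItemWriter<String> writer = new SessionItemWriter<>(session);
        writer.write(List.of("a", "b")); // first chunk
        writer.write(List.of("c"));      // second chunk
        // prints [a, b, c] after 2 flushes
        System.out.println(session.flushed + " after " + session.flushCount + " flushes");
    }
}
```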
ItemProcessor
Spring Batch provides the ItemProcessor interface:
public interface ItemProcessor<I, O> {

    O process(I item) throws Exception;

}
An ItemProcessor is very simple: given one object, it transforms it and returns another. The returned object may or may not be of the same type as the input. The point is that business logic can be applied within process, and that logic is entirely up to the developer. An ItemProcessor can be wired directly into a step. For example, assume an ItemReader provides objects of type Foo that need to be converted to type Bar before being written out. An ItemProcessor can be written that performs the conversion:
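import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;

// Each public class below belongs in its own source file.
public class Foo {}

public class Bar {
    public Bar(Foo foo) {}
}

public class FooProcessor implements ItemProcessor<Foo, Bar> {
    public Bar process(Foo foo) throws Exception {
        // Perform a simple transformation: convert a Foo to a Bar
        return new Bar(foo);
    }
}

public class BarWriter implements ItemWriter<Bar> {
    public void write(List<? extends Bar> bars) throws Exception {
        // write bars
    }
}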
In the very simple example above, there is a class Foo, a class Bar, and a class FooProcessor that implements the ItemProcessor interface. The transformation is simple, but any type of transformation could be done here. The BarWriter writes out Bar objects and throws an exception if any other type is provided. Similarly, the FooProcessor throws an exception if anything but a Foo is provided. The FooProcessor can then be injected into a Step:
<job id="ioSampleJob">
    <step id="step1">
        <tasklet>
            <chunk reader="fooReader" processor="fooProcessor" writer="barWriter"
                   commit-interval="2"/>
        </tasklet>
    </step>
</job>
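To show what commit-interval="2" means in this configuration, here is a rough plain-Java model of the loop a chunk-oriented step runs. It reads up to two items, processes each one, and then passes the whole chunk to a single write call, which is roughly where the real step would commit its transaction. The Simple* interfaces and ChunkLoop are hypothetical. The model deliberately leaves out transactions, restart, and error handling.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for the three interfaces (checked exceptions omitted).
interface SimpleReader<T> { T read(); }
interface SimpleProcessor<I, O> { O process(I item); }
interface SimpleWriter<T> { void write(List<? extends T> items); }

// Rough model of a chunk-oriented step: read up to commitInterval items,
// process each of them, then write the chunk in one call.
class ChunkLoop {
    static <I, O> void run(SimpleReader<I> reader, SimpleProcessor<I, O> processor,
                           SimpleWriter<O> writer, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        boolean exhausted = false;
        while (!exhausted) {
            // 1. Read until the chunk is full or the reader returns null.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < commitInterval) {
                I item = reader.read();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                inputs.add(item);
            }
            if (inputs.isEmpty()) {
                break;
            }
            // 2. Transform every item in the chunk.
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                outputs.add(processor.process(input));
            }
            // 3. Write the chunk in a single call; a real step commits around here.
            writer.write(outputs);
        }
    }
}

class ChunkDemo {
    public static void main(String[] args) {
        Iterator<String> source = List.of("a", "b", "c", "d", "e").iterator();
        SimpleReader<String> reader = () -> source.hasNext() ? source.next() : null;
        SimpleProcessor<String, String> processor = String::toUpperCase;
        SimpleWriter<String> writer = items -> System.out.println("write " + items);
        ChunkLoop.run(reader, processor, writer, 2);
    }
}
```

With five input items and a commit interval of 2, the writer is called three times: with two items, then two items, and finally the single remaining item.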
Chaining ItemProcessors
Performing a single transformation is useful in many scenarios, but what if you want to chain together multiple ItemProcessors? This can be accomplished using the composite pattern mentioned previously. Extending the previous single-transformation example, Foo is transformed to Bar, which is in turn transformed to Foobar and written out:
import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;

// Each public class below belongs in its own source file.
public class Foo {}

public class Bar {
    public Bar(Foo foo) {}
}

public class Foobar {
    public Foobar(Bar bar) {}
}

public class FooProcessor implements ItemProcessor<Foo, Bar> {
    public Bar process(Foo foo) throws Exception {
        // Perform a simple transformation: convert a Foo to a Bar
        return new Bar(foo);
    }
}

public class BarProcessor implements ItemProcessor<Bar, Foobar> {
    public Foobar process(Bar bar) throws Exception {
        // Convert a Bar to a Foobar
        return new Foobar(bar);
    }
}

public class FoobarWriter implements ItemWriter<Foobar> {
    public void write(List<? extends Foobar> items) throws Exception {
        // write items
    }
}
A FooProcessor and BarProcessor can be chained together to give the resultant Foobar:
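import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.support.CompositeItemProcessor;

// The delegates run in list order: FooProcessor first, then BarProcessor.
CompositeItemProcessor<Foo, Foobar> compositeProcessor = new CompositeItemProcessor<>();
List<ItemProcessor<?, ?>> itemProcessors = new ArrayList<>();
itemProcessors.add(new FooProcessor());
itemProcessors.add(new BarProcessor());
compositeProcessor.setDelegates(itemProcessors);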
Just as with the previous example, the composite processor can be configured into the Step:
<job id="ioSampleJob">
    <step id="step1">
        <tasklet>
            <chunk reader="fooReader" processor="compositeItemProcessor" writer="foobarWriter"
                   commit-interval="2"/>
        </tasklet>
    </step>
</job>

<bean id="compositeItemProcessor"
      class="org.springframework.batch.item.support.CompositeItemProcessor">
    <property name="delegates">
        <list>
            <bean class="..FooProcessor" />
            <bean class="..BarProcessor" />
        </list>
    </property>
</bean>
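The sketch below shows, in plain Java, what a composite processor does with each item. It passes the item to the first delegate, hands that result to the next delegate, and returns the last result. ChainedProcessor is a hypothetical, simplified stand-in for CompositeItemProcessor and is not its source code. Foo, Bar, and Foobar carry a String only so that the effect of the chain is visible. As in the configuration above, the delegate list is not type-checked as a whole, so the delegates must be listed in an order whose input and output types line up.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified mirror of the ItemProcessor contract (checked exception omitted).
interface SimpleProcessor<I, O> {
    O process(I item);
}

// Hypothetical domain classes; the String payload only makes the chain observable.
class Foo {
    final String value;
    Foo(String value) { this.value = value; }
}

class Bar {
    final String value;
    Bar(Foo foo) { this.value = foo.value + "->Bar"; }
}

class Foobar {
    final String value;
    Foobar(Bar bar) { this.value = bar.value + "->Foobar"; }
}

// Hypothetical composite: runs each delegate in order, feeding each output into the next.
class ChainedProcessor<I, O> implements SimpleProcessor<I, O> {
    private final List<SimpleProcessor<?, ?>> delegates;

    ChainedProcessor(List<SimpleProcessor<?, ?>> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    @SuppressWarnings("unchecked")
    public O process(I item) {
        Object current = item;
        for (SimpleProcessor<?, ?> delegate : delegates) {
            // Unchecked call: a mis-ordered list surfaces as a ClassCastException
            // when the mismatched delegate receives an object of the wrong type.
            current = ((SimpleProcessor<Object, Object>) delegate).process(current);
        }
        return (O) current;
    }
}

class CompositeDemo {
    public static void main(String[] args) {
        SimpleProcessor<Foo, Bar> fooProcessor = Bar::new;
        SimpleProcessor<Bar, Foobar> barProcessor = Foobar::new;
        List<SimpleProcessor<?, ?>> delegates = List.of(fooProcessor, barProcessor);
        ChainedProcessor<Foo, Foobar> composite = new ChainedProcessor<>(delegates);
        System.out.println(composite.process(new Foo("foo")).value); // prints foo->Bar->Foobar
    }
}
```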