Getting to Know the Enterprise Batch Processing Framework: Spring Batch

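Spring Batch is a lightweight batch processing framework built entirely on Spring, suited to enterprise systems that process large volumes of data. It provides reusable functionality that is essential when processing large numbers of records. This includes logging/tracing, transaction management, job processing statistics, job restart, skipping and resource management. Developers can therefore concentrate on core business logic and leave repetitive, time-consuming work to the system, such as data import, data export and data copying. This article uses a small file-copying example to show how Spring Batch works. First, let's look at the core code and configuration: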

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:batch="http://www.springframework.org/schema/batch"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tx="http://www.springframework.org/schema/tx" xmlns:aop="http://www.springframework.org/schema/aop"
	xmlns:context="http://www.springframework.org/schema/context"
	xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
		http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd
		http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
		http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
		http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd"
	default-autowire="byName">
	<!-- Entry point for starting jobs; SimpleJobLauncher runs them synchronously by default -->
	<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
		<property name="jobRepository" ref="jobRepository" />
	</bean>
	<!-- Job metadata store kept in memory only; its transactionManager is injected by name (default-autowire="byName") -->
	<bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
	</bean>
	<!-- No-op transaction manager: no transactional resource such as a database is involved -->
	<bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />
	<batch:job id="iconvJob">
		<batch:step id="iconvStep">
			<batch:tasklet transaction-manager="transactionManager">
				<!-- Read, process and write items; commit after every item (commit-interval="1") -->
				<batch:chunk reader="iconvItemReader" writer="iconvItemWriter" processor="iconvItemProcessor"
					commit-interval="1" />
			</batch:tasklet>
		</batch:step>
	</batch:job>
	<!-- Resolves ${input.file} and ${output.file} from files.properties on the classpath -->
	<context:property-placeholder location="classpath:files.properties" />
	<bean id="iconvItemReader" class="com.inetpsa.batch.iconv.IconvItemReader">
		<property name="input" value="file:${input.file}" />
		<property name="charset" value="UTF-8" />
	</bean>
	<bean id="iconvItemWriter" class="com.inetpsa.batch.iconv.IconvItemWriter">
		<property name="output" value="file:${output.file}" />
		<property name="charset" value="UTF-8" />
	</bean>
	<bean id="iconvItemProcessor" class="com.inetpsa.batch.iconv.IconvItemProcessor"/>
</beans>

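This configuration file contains the core of the Spring Batch setup. A batch application contains one or more jobs, and each job defines the steps that must run to complete it. Running those steps relies on a persistence mechanism, and that is the role of the jobRepository. It records the metadata of every job and step execution, such as status and item counts, and this is what makes restarts possible. The MapJobRepositoryFactoryBean used here keeps that metadata in memory only, so it is lost when the JVM exits. An external caller uses the jobLauncher to start a job. Once a job is started, it moves on to executing its steps. In each step, the ItemReader reads the data and passes it to the ItemProcessor, which processes it and passes the result to the ItemWriter for output.

Each step can be given its own ItemReader, ItemProcessor and ItemWriter explicitly. As the configuration shows, we specify three classes, each implementing the corresponding interface, to provide custom reading, processing and output. Their implementations follow. Note that the listings are named ShellItemReader, ShellItemProcessor and ShellItemWriter, while the configuration refers to IconvItemReader, IconvItemProcessor and IconvItemWriter. The class attribute of each bean must name a class that actually exists. Every configured property also needs a matching setter; the classes below have no setCharset, for example, and ShellItemProcessor additionally expects a command property.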
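To make the roles of jobLauncher and jobRepository more concrete, here is a toy model in plain Java. It is not the Spring Batch API: ToyStep, ToyJobRepository and ToyJobLauncher are hypothetical classes invented for illustration. It assumes the behaviour Spring Batch has by default. When a failed job is restarted, steps that already completed are skipped and execution resumes at the step that failed. That only works because the repository remembers each step's outcome.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy classes for illustration only; this is NOT the Spring Batch API.
final class ToyStep {
    final String name;
    final Runnable body;

    ToyStep(String name, Runnable body) {
        this.name = name;
        this.body = body;
    }
}

// Remembers the outcome of every step, in memory only (like MapJobRepositoryFactoryBean).
final class ToyJobRepository {
    private final Map<String, String> stepStatus = new HashMap<>();

    String status(String job, String step) {
        return stepStatus.get(job + "/" + step);
    }

    void record(String job, String step, String status) {
        stepStatus.put(job + "/" + step, status);
    }
}

final class ToyJobLauncher {
    private final ToyJobRepository repository;

    ToyJobLauncher(ToyJobRepository repository) {
        this.repository = repository;
    }

    // Runs the steps in order, skipping those that completed in an earlier run.
    String run(String job, List<ToyStep> steps) {
        for (ToyStep step : steps) {
            if ("COMPLETED".equals(repository.status(job, step.name))) {
                continue; // restart: finished work is not repeated
            }
            try {
                step.body.run();
                repository.record(job, step.name, "COMPLETED");
            } catch (RuntimeException e) {
                repository.record(job, step.name, "FAILED");
                return "FAILED"; // the job stops at the first failing step
            }
        }
        return "COMPLETED";
    }
}
```

In real Spring Batch, the unit that is restarted is a JobInstance: the job name plus its identifying JobParameters. Because MapJobRepositoryFactoryBean lives in memory, this kind of restart only works within the same JVM.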

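import java.io.InputStream;
import java.net.URL;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.NonTransientResourceException;
import org.springframework.batch.item.ParseException;
import org.springframework.batch.item.UnexpectedInputException;

// Returns the whole input as a single item: the first read() opens the stream,
// every later read() returns null, which tells Spring Batch the input is exhausted.
public class ShellItemReader implements ItemReader<InputStream> {

	// URL of the input, e.g. "file:/path/to/input.txt"; null means standard input
	private String input;
	private InputStream item = null;

	public InputStream read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
		if (this.item == null) {
			this.item = this.input == null ? System.in : new URL(this.input).openStream();
			return this.item;
		}
		return null; // no more items
	}

	public void setInput(String input) {
		this.input = input;
	}
}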
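The ShellItemReader above hands over the whole input as one item. For record-oriented input, read() would instead return one record per call. The following is a minimal sketch in plain Java that assumes line-per-record text. LineRecordReader is a hypothetical class that only mimics ItemReader's contract, where null means end of input. In Spring Batch itself, the built-in FlatFileItemReader fills this role.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical record-oriented reader; it mimics ItemReader's contract but is not a Spring Batch class.
final class LineRecordReader {
    private final BufferedReader in;

    LineRecordReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    // Convenience factory for in-memory text (handy for quick experiments).
    static LineRecordReader ofString(String text) {
        return new LineRecordReader(new StringReader(text));
    }

    // Returns the next line as one record, or null once the input is exhausted.
    String read() {
        try {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller drains it with `for (String line = reader.read(); line != null; line = reader.read()) { ... }`, which is essentially what a chunk-oriented step does on your behalf.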

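ShellItemReader opens the input configured for it and returns the entire input stream as a single item. The input is a URL such as file:${input.file}, or standard input if none is set. The next call to read() returns null, which tells Spring Batch there is no more input. The stream is then passed to ShellItemProcessor: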

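import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import org.apache.commons.io.IOUtils;
import org.springframework.batch.item.ItemProcessor;

public class ShellItemProcessor implements ItemProcessor<InputStream, InputStream> {

	private List<String> command; // command line to run, injected by Spring
	public InputStream process(final InputStream item) throws Exception {
		final ProcessBuilder pb = new ProcessBuilder(this.command);
		pb.redirectErrorStream(true); // merge stderr into stdout
		final Process process = pb.start();
		final OutputStream stdin = process.getOutputStream();
		// Feed stdin on a separate thread; copying everything before stdout is read
		// can deadlock once the command's output fills the pipe buffer
		new Thread(() -> {
			try {
				IOUtils.copy(item, stdin);
			} catch (IOException ignored) { // the command stopped reading its input
			} finally {
				IOUtils.closeQuietly(item);
				IOUtils.closeQuietly(stdin);
			}
		}).start();
		return process.getInputStream(); // the command's output becomes the processed item
	}

	public void setCommand(List<String> command) {
		this.command = command;
	}
}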
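The configuration names an IconvItemProcessor and sets a charset property, which suggests the job converts character sets. Shelling out to iconv is one way to do that. The following is a sketch of an in-JVM alternative using java.nio.charset. CharsetConvertingProcessor is a hypothetical class that mirrors ItemProcessor's InputStream-to-InputStream shape but does not implement the Spring interface. It avoids external-process plumbing at the cost of holding the whole item in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical in-JVM replacement for piping the stream through the external iconv command.
final class CharsetConvertingProcessor {
    private final Charset from;
    private final Charset to;

    CharsetConvertingProcessor(Charset from, Charset to) {
        this.from = from;
        this.to = to;
    }

    // Decodes the item with the source charset and re-encodes it with the target charset.
    // Characters the target cannot represent are replaced by its default replacement (e.g. '?').
    InputStream process(InputStream item) {
        try (InputStream in = item) {
            byte[] raw = in.readAllBytes(); // Java 9+; holds the whole item in memory
            String text = new String(raw, from);
            return new ByteArrayInputStream(text.getBytes(to));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For very large files, a streaming java.io.Reader/Writer pair would keep memory flat. The one-shot version is easier to read.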

ShellItemProcessor pipes the input stream through an external command and passes the command's output stream to ShellItemWriter, which writes it out:

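import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.List;

import org.apache.commons.io.IOUtils;
import org.springframework.batch.item.ItemWriter;

// Copies every item of a chunk to the configured output URL (standard output if none is set).
public class ShellItemWriter implements ItemWriter<InputStream> {

	private String output;

	public void setOutput(String output) {
		this.output = output;
	}

	public void write(List<? extends InputStream> items) throws Exception {
		OutputStream os = System.out;
		if (this.output != null) {
			final URL url = new URL(this.output);
			if (url.getProtocol().equals("file")) {
				os = new FileOutputStream(url.getPath());
			} else {
				final URLConnection connection = url.openConnection();
				connection.setDoOutput(true); // required before getOutputStream()
				os = connection.getOutputStream();
			}
		}
		try {
			for (final InputStream is : items) {
				try {
					IOUtils.copy(is, os);
				} finally {
					IOUtils.closeQuietly(is);
				}
			}
			os.flush();
		} finally {
			// Never close System.out, or later console output would be lost
			if (os != System.out) {
				IOUtils.closeQuietly(os);
			}
		}
	}

}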
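ShellItemWriter opens a new FileOutputStream on every write() call, and that constructor truncates the file. This is fine here because the reader produces a single item. With several chunks, however, each chunk would overwrite the previous one. Spring Batch's own writers avoid this by opening the file once, through the ItemStream open/update/close lifecycle. The following is a sketch of that idea in plain Java. AppendingLineWriter is a hypothetical class that only resembles that lifecycle; it does not implement ItemStream.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical writer with an open / write / close lifecycle similar in spirit to ItemStream.
// The file is truncated once in open(); each write() call then appends one chunk.
final class AppendingLineWriter {
    private final Path target;
    private BufferedWriter out;

    AppendingLineWriter(Path target) {
        this.target = target;
    }

    // With no options, newBufferedWriter creates the file or truncates an existing one.
    void open() {
        try {
            out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes one chunk, one item per line, and flushes it before returning.
    void write(List<String> items) {
        try {
            for (String item : items) {
                out.write(item);
                out.newLine();
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```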

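When reading from a database or a file, each call to the ItemReader's read() method typically returns one record, which is passed to the ItemProcessor. Once the number of items read reaches the configured commit-interval, the processed items are handed to the ItemWriter as one chunk and written in one go. The transaction is then committed. Each step is also configured with a transaction manager so that the current chunk can be rolled back if an error occurs. The ResourcelessTransactionManager used here manages no real resource, however, so a rollback does not undo anything already written to a file. A database-backed job would use a real transaction manager such as DataSourceTransactionManager. That is the basic execution flow of Spring Batch.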
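The commit-interval behaviour described above can be modelled in a few lines of plain Java. ToyChunkStep is a hypothetical class, not Spring Batch's chunk implementation, and it leaves out transactions, retries and skips. The writer call marks the point where Spring Batch would commit. As in Spring Batch, the interval counts items read, so items filtered out by the processor (a null result) still count toward it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model of chunk-oriented processing; NOT Spring Batch's implementation.
// No transactions, retries or skips: the writer call marks where a commit would happen.
final class ToyChunkStep {
    // Returns the number of chunks handed to the writer.
    static <I, O> int run(Supplier<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        int chunks = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<O> chunk = new ArrayList<>();
            // Read up to commitInterval items; a null from the reader means end of input.
            for (int read = 0; read < commitInterval; read++) {
                I item = reader.get();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                O processed = processor.apply(item);
                if (processed != null) { // a null from the processor filters the item out
                    chunk.add(processed);
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(chunk); // Spring Batch would commit the chunk's transaction after this
                chunks++;
            }
        }
        return chunks;
    }
}
```

With commit-interval="1", as in the configuration above, every item becomes its own chunk, with one write and one commit per item. That is simple but slow for large inputs. A larger interval means fewer commits but more work to redo after a rollback.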

Time: 2024-10-11 21:01:31
