Spring Batch Essentials: A Comprehensive Guide to Building and Integrating Batch Applications

What is Spring Batch? Exploring Its Core Features
Spring Batch is a project in the Spring portfolio, built on top of the Spring Framework, that offers Java developers efficient batch data processing solutions. I won’t go into much detail about its technical capabilities; instead, I will cover its key features and implement a simple batch application to show how easy it is to get started.
Key features include:
- Batch Processing Models: Spring Batch provides a robust and user-friendly design for developing batch applications. This design is not only easy to use but also highly reliable. As you delve deeper into Spring Batch, you’ll encounter concepts such as chunk-oriented processing, steps, jobs, flows, job repositories, and listeners. These elements are straightforward to learn and offer significant assistance to developers.
- Integration with Spring Framework: Spring Batch is built on top of the Spring Framework and integrates with it seamlessly. If you already develop software with Spring, learning Spring Batch and integrating batch applications into your web applications should be a straightforward process.
- Efficiency and Performance: Spring Batch is optimized for processing large data sets, ensuring high performance in handling substantial volumes of data. Furthermore, it offers a range of scaling options to facilitate parallel processing of your data, enhancing efficiency in complex data operations.
- Customizability: Spring Batch provides extensive customization options for most of its components, allowing you to build highly flexible batch applications tailored to your specific use cases. This adaptability ensures that Spring Batch can meet diverse and unique requirements in various application scenarios.
Why/When to use Spring Batch?
There are many alternatives to Spring Batch, such as Apache Hadoop, Quartz, AWS Batch, etc. Spring Batch is a good choice if the following considerations fit your case:
- Familiarity with Spring Framework: If you are new to the Spring ecosystem, Spring Batch’s learning curve may be steep for you. However, if you are already familiar with Spring, that familiarity becomes an advantage.
- Use Cases and Scale: Spring Batch is often the preferred choice for processing small to medium-sized data sets. While it offers adequate scalability options, it may not be the optimal solution for the largest data sets, where other batch processing tools are specifically designed to excel. Additionally, for both performance and reliability, distributed batch processing solutions, with their stateful design, can often provide superior outcomes.
- Flexible Processing Patterns: The flexible and customizable nature of Spring Batch supports a wide range of processing patterns, from simple data migration to complex business logic. This flexibility makes it a versatile tool for various batch processing scenarios, not limited to specific types of tasks or industries.
Before We Start
- I am going to implement a sample Spring Batch application step by step. You may follow the instructions to implement your own application with me, or find the final version of the application in this link.
- Before we dive in, if you’re new to Spring Batch, I highly recommend checking out this documentation to acquaint yourself with some of the technical details (No need for a deep dive). This will make the learning process smoother and more enjoyable for you.
Starting with Spring Batch: A Step-by-Step Implementation Guide
Spring Initializr
Let’s start by creating a project from this Spring Initializr link.
Dependencies
After generating the project with the Initializr, we need to add the H2 in-memory database dependency below to give Spring Batch a job repository. Spring Batch requires a job repository, where it saves the metadata of running jobs and steps. You can find detailed information about job repositories in the Spring Batch documentation I mentioned above.
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>
Code
Chunk-oriented processing is a design that processes data in fixed-size groups called chunks. A chunk-oriented step is configured with a reader, a processor, and a writer. Now, let’s create empty versions of these classes:
package com.demo.springbatchsampleapp.batch;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.NonTransientResourceException;
import org.springframework.batch.item.ParseException;
import org.springframework.batch.item.UnexpectedInputException;

public class CustomItemReader implements ItemReader<String> {

    @Override
    public String read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
        // TODO Auto-generated method stub
        // Returning null tells Spring Batch that there is no more input.
        return null;
    }
}

package com.demo.springbatchsampleapp.batch;

import org.springframework.batch.item.ItemProcessor;

public class CustomItemProcessor implements ItemProcessor<String, String> {

    @Override
    public String process(String item) throws Exception {
        // TODO Auto-generated method stub
        return null;
    }
}

package com.demo.springbatchsampleapp.batch;

import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;

public class CustomItemWriter implements ItemWriter<String> {

    @Override
    public void write(Chunk<? extends String> chunk) throws Exception {
        // TODO Auto-generated method stub
    }
}
Now, let’s create a configuration class that defines the necessary beans for the job and step configurations.
package com.demo.springbatchsampleapp.batch.conf;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

import com.demo.springbatchsampleapp.batch.CustomItemProcessor;
import com.demo.springbatchsampleapp.batch.CustomItemReader;
import com.demo.springbatchsampleapp.batch.CustomItemWriter;

@Configuration
public class BatchConfig {

    @Bean
    public CustomItemReader reader() {
        return new CustomItemReader();
    }

    @Bean
    public CustomItemProcessor processor() {
        return new CustomItemProcessor();
    }

    @Bean
    public CustomItemWriter writer() {
        return new CustomItemWriter();
    }

    @Bean
    public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        // A chunk size of 0 makes Spring Batch fall back to its default commit interval of 1.
        return new StepBuilder("step1", jobRepository)
                .<String, String>chunk(0, transactionManager)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .build();
    }

    @Bean
    public Job demoJob(JobRepository jobRepository, Step step1) {
        // Spring injects the step1 bean as a method parameter.
        return new JobBuilder("demoJob", jobRepository).flow(step1).end().build();
    }
}
After these steps, if we run the application, an empty job runs and completes. Because the reader returns null right away, the step has nothing to process, as shown in the logs below:
2023-12-12T17:35:25.121+03:00 INFO 15392 --- [ main] c.d.s.SpringBatchSampleAppApplication : Starting SpringBatchSampleAppApplication using Java 17.0.2 with PID 15392 (C:\projects\spring-batch-sample-app\target\classes started by batuo in C:\projects\spring-batch-sample-app)
2023-12-12T17:35:25.123+03:00 INFO 15392 --- [ main] c.d.s.SpringBatchSampleAppApplication : No active profile set, falling back to 1 default profile: "default"
2023-12-12T17:35:25.599+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.boot.autoconfigure.jdbc.DataSourceConfiguration$Hikari' of type [org.springframework.boot.autoconfigure.jdbc.DataSourceConfiguration$Hikari] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). Is this bean getting eagerly injected into a currently created BeanPostProcessor [jobRegistryBeanPostProcessor]? Check the corresponding BeanPostProcessor declaration and its dependencies.
2023-12-12T17:35:25.633+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'spring.datasource-org.springframework.boot.autoconfigure.jdbc.DataSourceProperties' of type [org.springframework.boot.autoconfigure.jdbc.DataSourceProperties] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). Is this bean getting eagerly injected into a currently created BeanPostProcessor [jobRegistryBeanPostProcessor]? Check the corresponding BeanPostProcessor declaration and its dependencies.
2023-12-12T17:35:25.634+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration$PooledDataSourceConfiguration' of type [org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration$PooledDataSourceConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). Is this bean getting eagerly injected into a currently created BeanPostProcessor [jobRegistryBeanPostProcessor]? Check the corresponding BeanPostProcessor declaration and its dependencies.
2023-12-12T17:35:25.635+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'jdbcConnectionDetails' of type [org.springframework.boot.autoconfigure.jdbc.PropertiesJdbcConnectionDetails] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). Is this bean getting eagerly injected into a currently created BeanPostProcessor [jobRegistryBeanPostProcessor]? Check the corresponding BeanPostProcessor declaration and its dependencies.
2023-12-12T17:35:25.689+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'dataSource' of type [com.zaxxer.hikari.HikariDataSource] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). Is this bean getting eagerly injected into a currently created BeanPostProcessor [jobRegistryBeanPostProcessor]? Check the corresponding BeanPostProcessor declaration and its dependencies.
2023-12-12T17:35:25.693+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.boot.autoconfigure.jdbc.DataSourceTransactionManagerAutoConfiguration$JdbcTransactionManagerConfiguration' of type [org.springframework.boot.autoconfigure.jdbc.DataSourceTransactionManagerAutoConfiguration$JdbcTransactionManagerConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). Is this bean getting eagerly injected into a currently created BeanPostProcessor [jobRegistryBeanPostProcessor]? Check the corresponding BeanPostProcessor declaration and its dependencies.
2023-12-12T17:35:25.697+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.boot.autoconfigure.transaction.TransactionManagerCustomizationAutoConfiguration' of type [org.springframework.boot.autoconfigure.transaction.TransactionManagerCustomizationAutoConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). Is this bean getting eagerly injected into a currently created BeanPostProcessor [jobRegistryBeanPostProcessor]? Check the corresponding BeanPostProcessor declaration and its dependencies.
2023-12-12T17:35:25.702+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'transactionExecutionListeners' of type [org.springframework.boot.autoconfigure.transaction.ExecutionListenersTransactionManagerCustomizer] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). Is this bean getting eagerly injected into a currently created BeanPostProcessor [jobRegistryBeanPostProcessor]? Check the corresponding BeanPostProcessor declaration and its dependencies.
2023-12-12T17:35:25.705+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'spring.transaction-org.springframework.boot.autoconfigure.transaction.TransactionProperties' of type [org.springframework.boot.autoconfigure.transaction.TransactionProperties] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). Is this bean getting eagerly injected into a currently created BeanPostProcessor [jobRegistryBeanPostProcessor]? Check the corresponding BeanPostProcessor declaration and its dependencies.
2023-12-12T17:35:25.706+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'platformTransactionManagerCustomizers' of type [org.springframework.boot.autoconfigure.transaction.TransactionManagerCustomizers] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). Is this bean getting eagerly injected into a currently created BeanPostProcessor [jobRegistryBeanPostProcessor]? Check the corresponding BeanPostProcessor declaration and its dependencies.
2023-12-12T17:35:25.711+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'transactionManager' of type [org.springframework.jdbc.support.JdbcTransactionManager] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). Is this bean getting eagerly injected into a currently created BeanPostProcessor [jobRegistryBeanPostProcessor]? Check the corresponding BeanPostProcessor declaration and its dependencies.
2023-12-12T17:35:25.713+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'spring.batch-org.springframework.boot.autoconfigure.batch.BatchProperties' of type [org.springframework.boot.autoconfigure.batch.BatchProperties] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). Is this bean getting eagerly injected into a currently created BeanPostProcessor [jobRegistryBeanPostProcessor]? Check the corresponding BeanPostProcessor declaration and its dependencies.
2023-12-12T17:35:25.719+03:00 WARN 15392 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.boot.autoconfigure.batch.BatchAutoConfiguration$SpringBootBatchConfiguration' of type [org.springframework.boot.autoconfigure.batch.BatchAutoConfiguration$SpringBootBatchConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying). The currently created BeanPostProcessor [jobRegistryBeanPostProcessor] is declared through a non-static factory method on that class; consider declaring it as static instead.
2023-12-12T17:35:25.743+03:00 INFO 15392 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
2023-12-12T17:35:25.900+03:00 INFO 15392 --- [ main] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Added connection conn0: url=jdbc:h2:mem:f944b20b-4f15-4b85-9491-2d23133ce134 user=SA
2023-12-12T17:35:25.901+03:00 INFO 15392 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Start completed.
2023-12-12T17:35:26.013+03:00 INFO 15392 --- [ main] o.s.b.c.step.builder.SimpleStepBuilder : Setting commit interval to default value (1)
2023-12-12T17:35:26.137+03:00 INFO 15392 --- [ main] c.d.s.SpringBatchSampleAppApplication : Started SpringBatchSampleAppApplication in 1.308 seconds (process running for 1.557)
2023-12-12T17:35:26.139+03:00 INFO 15392 --- [ main] o.s.b.a.b.JobLauncherApplicationRunner : Running default command line with: []
2023-12-12T17:35:26.170+03:00 INFO 15392 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=demoJob]] launched with the following parameters: [{}]
2023-12-12T17:35:26.194+03:00 INFO 15392 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [step1]
2023-12-12T17:35:26.206+03:00 INFO 15392 --- [ main] o.s.batch.core.step.AbstractStep : Step: [step1] executed in 12ms
2023-12-12T17:35:26.214+03:00 INFO 15392 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=demoJob]] completed with the following parameters: [{}] and the following status: [COMPLETED] in 28ms
2023-12-12T17:35:26.218+03:00 INFO 15392 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2023-12-12T17:35:26.220+03:00 INFO 15392 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
Well, this was our first empty batch application. Now, let’s modify our reader, processor, and writer classes to perform simple operations on a list of strings.
I don’t want to add too many code blocks, so here is the commit that turns the empty job into a simple working one. With these changes, the reader reads a list and returns its elements one by one to the processor. The processor then returns the uppercase version of each string to the writer, and each of the three components logs the item it receives. The logs of the batch job will look something like this:
2023-12-13T11:23:47.521+03:00 INFO 9252 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=demoJob]] launched with the following parameters: [{}]
2023-12-13T11:23:47.544+03:00 INFO 9252 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [step1]
2023-12-13T11:23:47.551+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemReader : Reading the next item from the list which is: word1
2023-12-13T11:23:47.553+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemProcessor : Processing the item: word1
2023-12-13T11:23:47.554+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemWriter : Writing the item: WORD1
2023-12-13T11:23:47.555+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemReader : Reading the next item from the list which is: word2
2023-12-13T11:23:47.555+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemProcessor : Processing the item: word2
2023-12-13T11:23:47.555+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemWriter : Writing the item: WORD2
2023-12-13T11:23:47.556+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemReader : Reading the next item from the list which is: word3
2023-12-13T11:23:47.556+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemProcessor : Processing the item: word3
2023-12-13T11:23:47.556+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemWriter : Writing the item: WORD3
2023-12-13T11:23:47.557+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemReader : Reading the next item from the list which is: word4
2023-12-13T11:23:47.557+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemProcessor : Processing the item: word4
2023-12-13T11:23:47.557+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemWriter : Writing the item: WORD4
2023-12-13T11:23:47.558+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemReader : Reading the next item from the list which is: word5
2023-12-13T11:23:47.558+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemProcessor : Processing the item: word5
2023-12-13T11:23:47.558+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemWriter : Writing the item: WORD5
2023-12-13T11:23:47.559+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemReader : Reading the next item from the list which is: word6
2023-12-13T11:23:47.559+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemProcessor : Processing the item: word6
2023-12-13T11:23:47.559+03:00 INFO 9252 --- [ main] c.d.s.batch.CustomItemWriter : Writing the item: WORD6
2023-12-13T11:23:47.562+03:00 INFO 9252 --- [ main] o.s.batch.core.step.AbstractStep : Step: [step1] executed in 18ms
2023-12-13T11:23:47.568+03:00 INFO 9252 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=demoJob]] completed with the following parameters: [{}] and the following status: [COMPLETED] in 34ms
A small note: if you increase the chunk size in the BatchConfig.java class, the job will no longer handle the input list one item at a time. Instead, it reads and processes a whole chunk of items and then passes them to the writer together. Give it a try.
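To make the effect of the chunk size easier to picture, here is a minimal plain-Java sketch of a chunk loop. It is only an illustration and not Spring Batch’s actual implementation: the class name ChunkLoopSketch and its run method are made up for this example, and the list of returned events simply stands in for the log lines above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration class; it does not use or reproduce Spring Batch itself.
public class ChunkLoopSketch {

    /** Runs a simplified chunk loop and returns the sequence of events in order. */
    public static List<String> run(List<String> input, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<String> events = new ArrayList<>();
        Iterator<String> reader = input.iterator();

        while (reader.hasNext()) {
            // 1. Read up to chunkSize items.
            List<String> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                String item = reader.next();
                events.add("read " + item);
                items.add(item);
            }
            // 2. Process every item of the chunk.
            List<String> processed = new ArrayList<>();
            for (String item : items) {
                events.add("process " + item);
                processed.add(item.toUpperCase());
            }
            // 3. Hand the whole chunk to the writer in a single call.
            events.add("write " + processed);
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> words = List.of("word1", "word2", "word3", "word4", "word5", "word6");
        run(words, 3).forEach(System.out::println);
    }
}
```

With a chunk size of 1, the events alternate between read, process, and write, just like the log above. With a chunk size of 3, you get three reads, then three process calls, then a single write for the whole chunk.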
Running Your Spring Batch Application
There are several ways to run a batch application, and there is no single correct one. You should choose the best approach based on your needs. Let’s examine some of them.
Spring Scheduler
We can simply use the “@Scheduled” annotation provided by the Spring Framework to run jobs periodically or at specific times and dates. This approach is very easy to set up. However, you have to handle concerns such as exceptions, notifications, and chained jobs yourself. In some businesses, a failed job that goes unnoticed can be a big problem. Check this document to learn more about Spring’s scheduler.
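As a rough idea of what “handling it yourself” can mean, here is a small plain-Java sketch of a wrapper that reports every failed run. Everything in it is hypothetical: GuardedJobRunner, the notifier callback, and the status strings are invented for illustration. In a Spring application, the method carrying “@Scheduled” could wrap its job launch in something similar. I assume here that a failure may show up either as an exception or as a non-COMPLETED status, so the sketch checks both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.function.Consumer;

// Hypothetical helper; not part of Spring or Spring Batch.
public class GuardedJobRunner {

    private final Consumer<String> notifier;

    public GuardedJobRunner(Consumer<String> notifier) {
        this.notifier = notifier;
    }

    /** Runs the job and returns true on success; failures are reported, never rethrown. */
    public boolean run(String jobName, Callable<String> job) {
        try {
            String status = job.call();
            if (!"COMPLETED".equals(status)) {
                // The job ended without throwing, but not successfully.
                notifier.accept(jobName + " finished with status " + status);
                return false;
            }
            return true;
        } catch (Exception e) {
            // The job could not run or crashed; report it instead of losing it.
            notifier.accept(jobName + " failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // A list stands in for a real notification channel such as e-mail or chat.
        List<String> alerts = new ArrayList<>();
        GuardedJobRunner runner = new GuardedJobRunner(alerts::add);

        runner.run("demoJob", () -> "COMPLETED");
        runner.run("demoJob", () -> {
            throw new IllegalStateException("database unreachable");
        });
        System.out.println(alerts);
    }
}
```

The point of the design is that the scheduled method never fails silently: each run either succeeds or produces a notification that someone can act on.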
Using a Job Scheduling Tool
A job scheduling tool such as UC4 can also be used to schedule jobs. Such tools offer solutions for the batch-specific problems I mentioned before. On the other hand, you also need to learn how to use the product. Many large companies use such software to run and manage their batch applications.
Integrating Spring Batch: Enhancing Your Web Applications
Well, now that we know how to implement and run a job using Spring Batch, we need to decide how to deploy and run batch applications in a software environment.
At the company I work for, we usually develop web services using Spring Boot. However, any company that wants to process high volumes of data might need to develop batch applications. Batches are essential for processing large amounts of data on a scheduled basis (weekly, daily, etc.). Thus, my company also works with batch applications.
In the sections above, we looked at the technical advantages of Spring Batch. Now, let’s look at the problem of deploying batch applications from the perspective of a DevOps engineer. As I mentioned, we mainly develop web services with Spring Boot, and our batch applications are related to those services. When a developer opens the repository of a web application, wouldn’t it be a good idea to also show them the related batch projects and their source code, or maybe share common classes to keep the web and batch applications integrated? Well, it depends :). Yet, in some cases, we decided that it would be a pretty beneficial approach.
Have you ever heard of Maven archetypes? If not, check this out. We will use a custom archetype.
In this project, I created a simple multi-module app to show you how to integrate a batch application with a web application. The idea is to use a common parent pom and to share modules such as common-lib between the applications. This approach has advantages and disadvantages.
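To illustrate the idea of a shared module, here is a tiny plain-Java sketch. It squeezes into one file what would be three modules in the real project, and every name in it (SharedLogicSketch, WordNormalizer, batchProcess, handleRequest) is invented for this example and not taken from the sample repository.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical single-file stand-in for a common-lib, a batch module, and a web module.
public class SharedLogicSketch {

    /** Would live in the common-lib module: one business rule, defined once. */
    public static final class WordNormalizer {
        public String normalize(String word) {
            return word.trim().toUpperCase();
        }
    }

    /** Would live in the batch module: applies the shared rule to every item. */
    public static List<String> batchProcess(WordNormalizer normalizer, List<String> items) {
        return items.stream().map(normalizer::normalize).collect(Collectors.toList());
    }

    /** Would live in the web module: applies the same rule to a single request value. */
    public static String handleRequest(WordNormalizer normalizer, String param) {
        return "{\"word\":\"" + normalizer.normalize(param) + "\"}";
    }

    public static void main(String[] args) {
        WordNormalizer normalizer = new WordNormalizer();
        System.out.println(batchProcess(normalizer, List.of(" word1", "word2 ")));
        System.out.println(handleRequest(normalizer, "word3"));
    }
}
```

If the rule inside normalize changes, both the batch side and the web side pick up the change on their next build. That is the benefit of the shared module, and also the reason why both applications need to be retested.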
Advantages
- Related Web and Batch applications are managed together, and changes in the common logic will reflect on both applications.
- The batch and web jars are built and installed separately, so they can be used on the same server or on different servers when needed.
- Spring’s dependency injections may be used in common-lib classes.
- Developers who are new to the project will see the whole logic when they browse the repository.
- Using the same classes in different applications forces a solid software architecture.
- You will have to keep maintaining and updating your batch services to keep them integrated with the web applications. This is a benefit in practice, since most batch jobs otherwise go untouched for years.
Disadvantages
- This approach is not a good fit for a microservice architecture, since it violates some of its basic principles, such as loose coupling.
- Installing and running several jars from the same project and deploying them to different servers requires some extra work if you don’t have such a setup ready in your company.
- Testing becomes more important in such a scenario, because when common-lib changes, both applications have to be built and installed again, and you must make sure that a change made for one service doesn’t break the other. Better testing may also be considered an advantage :).
- Using the same classes in different applications brings the need for a more complex software architecture.
Conclusion
Batch applications are essential for companies that process large amounts of data, and Spring Batch is a good option as a batch processing framework. Solutions for deploying and running batch applications are at least as important as implementing them. Thank you for staying with me so far. I hope to see you in my next article.