Adding an LLM to Spring Boot: Start using AI today

Web developers are curious about how to integrate the power of Large Language Models (LLMs) into their projects. It's becoming increasingly clear that LLMs will play an important role in shaping user experiences across all kinds of apps. However, many engineers do not know where to start or how to begin experimenting.
In this article we provide simple steps and code to start using open source LLMs in your app. Before you dive in, please watch our high-level design video, which describes how AI can be used in your app.
Ollama
Ollama is an open source tool that makes it simple to run many open source models on your own server. Open source models such as Meta’s Llama and Google’s Gemma are essentially very large files that you load into memory and then query through their own APIs. Each model has a different interface and different capabilities.
As an application developer, it is best not to concern yourself with the low-level details of the model APIs. However, you might want to experiment with different models to figure out which one best fits your needs.
Ollama wraps these open source models: it lets you pick and choose a model and gives you a uniform interface to it.
| Model | Parameters | Size | Download |
| Llama 3.3 | 70B | 43GB | ollama run llama3.3 |
| Llama 3.2 | 3B | 2.0GB | ollama run llama3.2 |
| Llama 3.2 | 1B | 1.3GB | ollama run llama3.2:1b |
| Llama 3.2 Vision | 11B | 7.9GB | ollama run llama3.2-vision |
| Llama 3.2 Vision | 90B | 55GB | ollama run llama3.2-vision:90b |
| Llama 3.1 | 8B | 4.7GB | ollama run llama3.1 |
| Llama 3.1 | 405B | 231GB | ollama run llama3.1:405b |
| Phi 3 Mini | 3.8B | 2.3GB | ollama run phi3 |
| Phi 3 Medium | 14B | 7.9GB | ollama run phi3:medium |
| Gemma 2 | 2B | 1.6GB | ollama run gemma2:2b |
| Gemma 2 | 9B | 5.5GB | ollama run gemma2 |
| Gemma 2 | 27B | 16GB | ollama run gemma2:27b |
| Mistral | 7B | 4.1GB | ollama run mistral |
| Moondream 2 | 1.4B | 829MB | ollama run moondream |
| Neural Chat | 7B | 4.1GB | ollama run neural-chat |
| Starling | 7B | 4.1GB | ollama run starling-lm |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
| LLaVA | 7B | 4.5GB | ollama run llava |
| Solar | 10.7B | 6.1GB | ollama run solar |
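The table above lists some of the models Ollama can run, with their parameter counts, download sizes, and the command that downloads and starts each one. Switching models is a matter of changing the model name.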
Using Ollama with Spring Boot
Ollama installs as a command-line tool. Once it is installed, you can use it to start a server that exposes a REST interface.
Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Start the Ollama server
ollama serve
# In a separate shell
ollama run llama3.2
You can then query the server, which listens on port 11434 by default.
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt":"Why is the sky blue?"
}'
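The same request can also be sent from Java. The sketch below is one way to do it with the JDK's built-in HTTP client, assuming Java 11 or later and an Ollama server on localhost:11434. The class and method names (OllamaGenerateExample, buildRequestBody, buildRequest) are made up for illustration. It sets "stream": false so that Ollama replies with a single JSON object instead of a stream of partial responses.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaGenerateExample {

    // Escape the most common characters that would break a JSON string literal.
    // A real application would normally use a JSON library instead.
    public static String jsonEscape(String text) {
        return text.replace("\\", "\\\\")
                   .replace("\"", "\\\"")
                   .replace("\n", "\\n");
    }

    // Build the JSON body for /api/generate.
    // "stream": false asks for one complete JSON response.
    public static String buildRequestBody(String model, String prompt) {
        return "{\"model\":\"" + jsonEscape(model) + "\","
             + "\"prompt\":\"" + jsonEscape(prompt) + "\","
             + "\"stream\":false}";
    }

    // Build a POST request against the Ollama server at baseUrl.
    public static HttpRequest buildRequest(String baseUrl, String model, String prompt) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildRequestBody(model, prompt)))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: Ollama is running locally on its default port.
        HttpRequest request = buildRequest("http://localhost:11434", "llama3.2", "Why is the sky blue?");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            // The generated text is in the "response" field of the returned JSON.
            System.out.println(response.body());
        } catch (IOException e) {
            System.err.println("Could not reach Ollama: " + e.getMessage());
        }
    }
}
```

To target another model from the table, only the model name passed to buildRequest needs to change, which is the uniform interface Ollama provides.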
Integrating with Spring Boot
To use Ollama with Spring Boot, run Ollama as a separate server and have your Spring Boot application talk to it over the REST interface.
Spring AI provides a starter library that lets a Spring Boot application talk to the Ollama server. Just add the following dependency to your build.gradle file. No version is given here, so this assumes that the Spring AI BOM (or an explicit version) is configured in your build.
dependencies {
    implementation 'org.springframework.ai:spring-ai-ollama-spring-boot-starter'
}
Once you add this dependency, you can configure the Ollama connection in your Spring Boot application properties.
Configuring Ollama parameters
| Property | Description | Default |
| spring.ai.ollama.base-url | Base URL where the Ollama API server is running. | http://localhost:11434 |
Then you can define a custom controller that calls the Ollama chat API.
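// Package names assume a Spring AI 1.0 milestone release; adjust for your version.
import java.util.Map;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class ChatController {
    private final OllamaChatModel chatModel;
    @Autowired
    public ChatController(OllamaChatModel chatModel) {
        this.chatModel = chatModel;
    }

    // Returns the complete answer in a single JSON object.
    @GetMapping("/ai/generate")
    public Map<String,String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }

    // Streams the answer in chunks as the model generates it.
    @GetMapping("/ai/generateStream")
    public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return this.chatModel.stream(prompt);
    }
}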
OllamaChatModel is the Spring AI class used to chat with the model running in Ollama. UserMessage represents a prompt that comes from the user. We will cover the actual usage of these APIs in future posts, but this gives you a good starting point.
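To try the /ai/generate endpoint from code rather than from a browser, something like the following should work. It is a sketch that uses only the JDK and assumes the application is running on Spring Boot's default port 8080. ChatEndpointClient and generateUri are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ChatEndpointClient {

    // Build the URL for the controller's /ai/generate endpoint.
    // URLEncoder applies form encoding, so spaces in the message become '+'.
    public static URI generateUri(String baseUrl, String message) {
        String encoded = URLEncoder.encode(message, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/ai/generate?message=" + encoded);
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: the Spring Boot app from this article runs on localhost:8080.
        HttpRequest request = HttpRequest.newBuilder(generateUri("http://localhost:8080", "Tell me a joke"))
                .GET()
                .build();
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            // The controller returns a JSON object with a single "generation" field.
            System.out.println(response.body());
        } catch (IOException e) {
            System.err.println("Could not reach the application: " + e.getMessage());
        }
    }
}
```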
Function calling with Ollama
Ollama supports function calling: the LLM can ask your application to run a method and then use the result in its answer. For example, you might ask the LLM for the current temperature in a city and present the answer to the user. To answer, the LLM needs to call a method inside your app. You enable this by creating a function and letting the LLM know that it is available. The LLM then decides when the function needs to be called.
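// Package names assume the Spring AI 1.0 milestone used in the source post.
import java.util.function.Function;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Description;

@SpringBootApplication
public class OllamaApplication {

    // Declared at this level so that both the bean method and the service can refer to them.
    public record WeatherRequest(String location, String unit) {}
    public record WeatherResponse(double temp, String unit) {}

    public static void main(String[] args) {
        SpringApplication.run(OllamaApplication.class, args);
    }

    @Bean
    CommandLineRunner runner(ChatClient.Builder chatClientBuilder) {
        return args -> {
            var chatClient = chatClientBuilder.build();
            var response = chatClient.prompt()
                .user("What is the weather in Amsterdam and Paris?")
                .functions("weatherFunction") // reference by bean name.
                .call()
                .content();
            System.out.println(response);
        };
    }

    // The description tells the LLM what the function is for.
    @Bean
    @Description("Get the weather in location")
    public Function<WeatherRequest, WeatherResponse> weatherFunction() {
        return new MockWeatherService();
    }

    // Mock implementation that returns fixed temperatures instead of calling a weather API.
    public static class MockWeatherService implements Function<WeatherRequest, WeatherResponse> {
        @Override
        public WeatherResponse apply(WeatherRequest request) {
            double temperature = request.location().contains("Amsterdam") ? 20 : 25;
            return new WeatherResponse(temperature, request.unit());
        }
    }
}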
Source : https://spring.io/blog/2024/07/26/spring-ai-with-ollama-tool-support
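Conceptually, function calling comes down to a lookup: the model replies with the name of a function and its arguments, and the application runs the matching function and hands the result back. The plain Java sketch below illustrates only that dispatch step. It is not how Spring AI is implemented internally, and ToolDispatchSketch, ToolCall, and dispatch are hypothetical names. The weather logic mirrors the mock service above.

```java
import java.util.Map;
import java.util.function.Function;

public class ToolDispatchSketch {

    public record WeatherRequest(String location, String unit) {}
    public record WeatherResponse(double temp, String unit) {}

    // Hypothetical stand-in for the structured "call this function" reply from a model.
    public record ToolCall(String name, WeatherRequest arguments) {}

    // Functions the application exposes, keyed by the name the model is told about.
    static final Map<String, Function<WeatherRequest, WeatherResponse>> TOOLS = Map.of(
        "weatherFunction",
        request -> new WeatherResponse(
            request.location().contains("Amsterdam") ? 20 : 25, request.unit()));

    // Look up the requested function by name and run it with the model's arguments.
    public static WeatherResponse dispatch(ToolCall call) {
        Function<WeatherRequest, WeatherResponse> tool = TOOLS.get(call.name());
        if (tool == null) {
            throw new IllegalArgumentException("Unknown tool: " + call.name());
        }
        return tool.apply(call.arguments());
    }

    public static void main(String[] args) {
        // Simulate the model asking for the weather in Amsterdam.
        ToolCall call = new ToolCall("weatherFunction", new WeatherRequest("Amsterdam", "C"));
        System.out.println(dispatch(call));
    }
}
```

In the Spring AI example, this lookup is done for you: the bean name passed to functions(...) plays the role of the map key.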
Conclusion
Spring Boot and Ollama work well together. With Ollama serving the model and Spring AI providing the client, a small amount of configuration and code is enough to start using AI in your apps.




