Development Notes from a Spring AI Demo project with Ollama and PGvector
I first came across Spring AI while watching Spring I/O 2024 (30–31 May, Barcelona) presentations on YouTube:
- Introducing Spring AI by Christian Tzolov / Mark Pollack @ Spring I/O 2024
- Concerto for Java and AI — Building Production-Ready LLM Applications by Thomas Vitale @ Spring I/O
Here is the full playlist for Spring I/O 2024, if you are interested.
You can also reach the presentations and GitHub links for the live coding projects from the video comments.
There is also another YouTube video from Devoxx Belgium 2024 (7–11 October):
I wanted to give it a try and develop a sample Spring AI project. In this post, I will share some code samples and details, along with other useful information and links you can visit for deeper knowledge.
Ollama Installation
I decided to go with a local installation of the models via Ollama (instead of calling APIs and adding authentication information to my application.properties file). For this, I installed the Ollama app on my Mac (of course, you can also use Docker images or install with Homebrew). Download and install the app from the Ollama download page. You can then run a model with the following command ("run" first pulls the model if needed, then runs it):
ollama run llama3.2
If you do not want to run the model yet, just run the “pull” command:
ollama pull deepseek-r1:8b
I started and finished my development using "llama3.2" (you can visit the Ollama search page for models listed by popularity), but be aware that if you are going to use function calling (tools), you need to find a model that supports it; otherwise you will get an error, as I did while trying "deepseek-r1:8b":
You can filter models that support tools on the Ollama search page:
You need to have the model up and running before you start your application (it is served on port 11434).
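Before starting the application, you can sanity-check that Ollama is reachable (assuming the default local setup on port 11434):

```shell
# List locally installed models
curl http://localhost:11434/api/tags

# Send a one-off prompt (non-streaming)
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Say hello", "stream": false}'
```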
VectorDB Installation
To implement RAG (Retrieval-Augmented Generation) capabilities with AI, you need to store embeddings for semantic search. At first, I decided to use Elasticsearch and Kibana, but I ran into some errors trying to run Elasticsearch on my Apple M3 Max. I did not want to spend too much time on it, so I switched to "PGvector" with PostgreSQL.
You can read more in Spring AI's official reference for PGvector support:
PGvector is an open-source extension for PostgreSQL that enables storing and searching over machine learning-generated embeddings. It provides different capabilities that let users identify both exact and approximate nearest neighbors. It is designed to work seamlessly with other PostgreSQL features, including indexing and querying.
Firstly, I installed PostgreSQL with Homebrew. You can follow this tutorial from Datacamp step by step. Do not skip the step that adds PostgreSQL to the system path.
Create your superuser and connect to PostgreSQL (I did not give my super user a password):
createuser -s postgres
psql postgres
Now you should install PGvector with Homebrew.
brew install pgvector
After that, connect to PostgreSQL (if you are not connected) and then run the following command:
CREATE EXTENSION vector;
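To verify the extension works, you can create a throwaway table with a vector column and run a nearest-neighbor query (this is just a sanity check, not part of the project):

```sql
-- Throwaway sanity check for the pgvector extension
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

-- "<->" is Euclidean distance; "<=>" is cosine distance
SELECT id FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 1;

DROP TABLE items;
```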
I also installed PgAdmin (a management tool for PostgreSQL) with Homebrew. You can follow the instructions in this post. You can use it as a GUI to inspect your tables and records.
If you would like to use Elasticsearch (and Kibana), you can read the following post (I actually did my RAG implementation following this tutorial):
Spring AI — Dependencies & Properties
When I started coding, I used "spring-ai.version = 1.0.0-M5". On February 14, a newer version (1.0.0-M6) was released; you can read the details here (note that the editor is Mark Pollack, one of the presenters in the YouTube videos above).
Using Maven, include the following dependency:
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
With the dependencyManagement section:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
The following are my Ollama configurations in my application.properties file. You can find more in the official documentation.
#********** OLLAMA CONFIG **********#
spring.ai.ollama.base-url=http://localhost:11434/
spring.ai.ollama.init.pull-model-strategy=always
spring.ai.ollama.chat.options.model=llama3.2
spring.ai.ollama.chat.options.temperature=0.7
"pull-model-strategy": enables automatic model pulling at startup time (it is recommended to pre-download models to avoid delays).
"temperature": increasing the temperature makes the model answer more creatively, but be careful about hallucinations.
When we talk about "temperature", we also come across "top-p"; you may be familiar with both if you have tried OpenAI's playground. You balance creativity and predictability using these two parameters.
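For example, top-p can be set next to temperature in application.properties (the values below are illustrative; the property name follows the Spring AI Ollama options naming):

```properties
# Sampling controls (illustrative values)
spring.ai.ollama.chat.options.temperature=0.7
# top-p: sample only from the smallest set of tokens whose cumulative probability reaches 0.9
spring.ai.ollama.chat.options.top-p=0.9
```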
You do not need to set them right away in “application.properties”; you can go with the default values (by not adding any related config to the properties) and change them while building your ChatClient:
@Bean
ChatClient chatClient(ChatClient.Builder chatClientBuilder) {
return chatClientBuilder
.defaultOptions(ChatOptions.builder().temperature(0.7).build())
.defaultAdvisors(new SimpleLoggerAdvisor())
.build();
}
There is also the OllamaChatModel class, a ChatModel implementation for Ollama. Ollama lets developers run large language models and generate embeddings locally, and supports open-source models available in the Ollama AI Library and on Hugging Face. I preferred ChatClient over OllamaChatModel, since ChatClient provides the flexibility to switch models/providers.
I added two dependencies for my REST APIs, "spring-boot-starter-web" and "spring-boot-starter-webflux" (the latter enables streaming AI chat responses for SSE API endpoints).
PGvector — Dependencies & Properties
To implement RAG capabilities, I added the following dependencies:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<scope>runtime</scope>
</dependency>
PGvector properties are defined as below:
#********** SPRING AI VECTORSTORE CONFIG **********#
spring.ai.vectorstore.pgvector.initialize-schema=true
# spring.ai.vectorstore.pgvector.schema-validation=true
spring.ai.vectorstore.pgvector.remove-existing-vector-store-table=true
# IVFFlat (Inverted File Flat) index, HNSW (Hierarchical Navigable Small World) index
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
# spring.ai.vectorstore.pgvector.dimensions=1024
# Optional: Controls how documents are batched for embedding
spring.ai.vectorstore.pgvector.batching-strategy=TOKEN_COUNT
# Optional: Maximum number of documents per batch
spring.ai.vectorstore.pgvector.max-document-batch-size=10000
"remove-existing-vector-store-table": Deletes the existing vector_store table (vector_store is the default table name; you can change it with the "table-name" property) on startup. I enabled this setting since mine is a demo application, where one can upload a document and ask questions about its content; for example, upload a user manual and ask questions about the product's usage.
“index-type”: Nearest neighbor search index type.
NONE - exact nearest neighbor search
IVFFlat - index divides vectors into lists, and then searches a subset of those lists that are closest to the query vector. It has faster build times and uses less memory than HNSW, but has lower query performance.
HNSW - creates a multilayer graph. It has slower build times and uses more memory than IVFFlat, but has better query performance.
“distance-type”: Search distance type. Its value can be one of COSINE_DISTANCE (default), EUCLIDEAN_DISTANCE or NEGATIVE_INNER_PRODUCT.
"dimensions": Embeddings dimension. If not specified explicitly, PgVectorStore will retrieve the dimensions from the provided EmbeddingModel (for available embedding models, please visit here). Dimensions are set on the embedding column at table creation. If you change the dimensions, you would have to re-create the vector_store table as well.
“batching-strategy”: Strategy for batching documents when calculating embeddings. Options are TOKEN_COUNT (default) or FIXED_SIZE.
“max-document-batch-size” : Maximum number of documents to process in a single batch.
If you would like to get more details, visit the official documentation link.
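For reference, with initialize-schema=true, Spring AI creates roughly the following schema (sketched from the official documentation; the embedding dimension depends on your embedding model):

```sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS vector_store (
    id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
    content text,
    metadata json,
    embedding vector(1536)  -- dimension comes from the embedding model
);

CREATE INDEX ON vector_store USING HNSW (embedding vector_cosine_ops);
```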
Here, I will be processing a PDF document for RAG so I also added another dependency:
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-pdf-document-reader</artifactId>
</dependency>
Here is the implementation that reads the document page by page (with configuration options such as withNumberOfBottomTextLinesToDelete or withNumberOfTopPagesToSkipBeforeDelete). We then send the pages to TokenTextSplitter and load the result into our vectorStore.
package dev.nils.spring.ai.service;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.RetrievalAugmentationAdvisor;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.ExtractedTextFormatter;
import org.springframework.ai.reader.pdf.PagePdfDocumentReader;
import org.springframework.ai.reader.pdf.config.PdfDocumentReaderConfig;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.core.io.FileUrlResource;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;
@Slf4j
@RequiredArgsConstructor
@Service
public class RagService {
private final VectorStore vectorStore;
private final ChatClient ragChatClient;
private final RetrievalAugmentationAdvisor retrievalAugmentationAdvisor;
public void ingestPDF(String fileURL) throws MalformedURLException {
Resource pdfResource = new FileUrlResource(new URL(fileURL));
// Spring AI utility class to read a PDF file page by page
// Extract
PagePdfDocumentReader pdfReader = new PagePdfDocumentReader(pdfResource,
PdfDocumentReaderConfig.builder()
.withPageExtractedTextFormatter(ExtractedTextFormatter.builder()
.withNumberOfBottomTextLinesToDelete(3) // Specifies that the bottom 3 lines of text on each page should be deleted.
.withNumberOfTopPagesToSkipBeforeDelete(1) // Indicates that the text deletion rule should not apply to the first page.
.build())
.withPagesPerDocument(1)
.build());
// Transform
TokenTextSplitter tokenTextSplitter = new TokenTextSplitter();
log.info("Parsing document, splitting, creating embeddings, and storing in vector store...");
// tag as external knowledge in the vector store's metadata
List<Document> splitDocuments = tokenTextSplitter.split(pdfReader.read());
for (Document splitDocument : splitDocuments) { // tag each chunk with source metadata
splitDocument.getMetadata().put("filename", pdfResource.getFilename());
splitDocument.getMetadata().put("version", 1);
}
// Sending batch of documents to vector store
// Load
vectorStore.write(splitDocuments);
log.info("Done parsing document, splitting, creating embeddings and storing in vector store.");
}
public String queryLLM(String question) {
return ragChatClient.prompt()
.advisors(retrievalAugmentationAdvisor)
.user(question)
.call()
.content();
}
}
We are now able to use it as our advisor for RAG chat.
Spring AI Components — Advisors
You can define your advisor as a bean and provide it to your “ChatClient” during “prompt” (our advisor is retrievalAugmentationAdvisor):
package dev.nils.spring.ai.config;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.RetrievalAugmentationAdvisor;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.ai.rag.generation.augmentation.ContextualQueryAugmenter;
import org.springframework.ai.rag.retrieval.search.VectorStoreDocumentRetriever;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class RagChatConfiguration {
@Bean
ChatClient ragChatClient(ChatClient.Builder chatClientBuilder) {
return chatClientBuilder
.defaultAdvisors(
new SimpleLoggerAdvisor()
)
.build();
}
@Bean
RetrievalAugmentationAdvisor retrievalAugmentationAdvisor(VectorStore vectorStore) {
VectorStoreDocumentRetriever documentRetriever = VectorStoreDocumentRetriever.builder()
.vectorStore(vectorStore)
.similarityThreshold(0.50)
.topK(5)
.build();
ContextualQueryAugmenter queryAugmenter = ContextualQueryAugmenter.builder()
.allowEmptyContext(true)
.build();
return RetrievalAugmentationAdvisor.builder()
.documentRetriever(documentRetriever)
.queryAugmenter(queryAugmenter)
.build();
}
}
public String queryLLM(String question) {
return ragChatClient.prompt()
.advisors(retrievalAugmentationAdvisor)
.user(question)
.call()
.content();
}
You can also define default advisors with “ChatClient.Builder”:
@Bean
ChatClient chatClient(ChatClient.Builder chatClientBuilder) {
return chatClientBuilder
.defaultOptions(ChatOptions.builder().temperature(0.7).build())
.defaultAdvisors(new SimpleLoggerAdvisor())
.build();
}
SimpleLoggerAdvisor is a logger advisor that logs the request and response messages.
You can also implement your custom Advisor. You can visit this Baeldung post for examples.
Spring AI Components — Stateful Chat
You can create stateful chats with Spring AI using "MessageChatMemoryAdvisor". Memory is retrieved and added to the prompt as a collection of messages. We can control the history size with chatHistoryWindowSize (here the value is 10). The defaultConversationId is DEFAULT_CHAT_MEMORY_CONVERSATION_ID (whose value is "default"). Here, the memory implementation is "InMemoryChatMemory", since this is a demo app. ChatMemory provides methods to add messages to a conversation, retrieve messages from a conversation, and clear the conversation history.
package dev.nils.spring.ai.config;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import static org.springframework.ai.chat.client.advisor.AbstractChatMemoryAdvisor.DEFAULT_CHAT_MEMORY_CONVERSATION_ID;
@Configuration
public class StatefulChatConfiguration {
@Bean
ChatClient statefulChatClient(ChatClient.Builder chatClientBuilder, ChatMemory statefulChatMemory) {
return chatClientBuilder
.defaultAdvisors(
// Chat memory helps us keep context when using the chatbot for up to 10 previous messages.
new MessageChatMemoryAdvisor(statefulChatMemory, DEFAULT_CHAT_MEMORY_CONVERSATION_ID, 10),
new SimpleLoggerAdvisor()
)
.build();
}
@Bean
public ChatMemory statefulChatMemory() {
return new InMemoryChatMemory();
}
}
Spring AI Components — Functions
You can use “function calling” as part of Spring AI.
In this project, function calling is backed by database operations. First, define your repository-backed service methods:
package dev.nils.spring.ai.service;
import dev.nils.spring.ai.dto.AddAccountRequest;
import dev.nils.spring.ai.dto.AddedAccountResponse;
import dev.nils.spring.ai.dto.GetAccountsResponse;
import dev.nils.spring.ai.entity.Account;
import dev.nils.spring.ai.repository.AccountRepository;
import lombok.RequiredArgsConstructor;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.stereotype.Service;
@RequiredArgsConstructor
@Service
public class AIDataProvider {
private final AccountRepository accountRepository;
public GetAccountsResponse getAllAccounts() {
return new GetAccountsResponse(accountRepository.findAll());
}
public AddedAccountResponse addAccount(AddAccountRequest request) {
Account account = accountRepository.save(request.account());
return new AddedAccountResponse(account);
}
}
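The request/response DTOs referenced above (AddAccountRequest, AddedAccountResponse, GetAccountsResponse) are not shown in the post; a minimal hypothetical sketch of their shapes, with the Account entity simplified to an id and a name, could look like this:

```java
import java.util.List;

// Hypothetical DTO shapes matching the AIDataProvider signatures above.
public class AccountDtos {

    // Simplified stand-in for the Account JPA entity
    public record Account(Long id, String name) {}

    public record GetAccountsResponse(List<Account> accounts) {}

    public record AddAccountRequest(Account account) {}

    public record AddedAccountResponse(Account account) {}

    public static void main(String[] args) {
        AddAccountRequest request = new AddAccountRequest(new Account(1L, "James Brown"));
        AddedAccountResponse response = new AddedAccountResponse(request.account());
        System.out.println(response.account().name()); // prints "James Brown"
    }
}
```

Records work well here because Spring AI derives the function-call JSON schema from the request type's components.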
Declare them in a configuration class with the "@Description" annotation, which helps the model understand when to call each function:
package dev.nils.spring.ai.config;
import dev.nils.spring.ai.dto.AccountRequest;
import dev.nils.spring.ai.dto.AddAccountRequest;
import dev.nils.spring.ai.dto.AddedAccountResponse;
import dev.nils.spring.ai.dto.GetAccountsResponse;
import dev.nils.spring.ai.service.AIDataProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Description;
import java.util.function.Function;
@Configuration
public class AIFunctionConfiguration {
// The @Description annotation helps the model understand when to call the function
@Bean
@Description("List the accounts existing in the database")
public Function<AccountRequest, GetAccountsResponse> listAccounts(AIDataProvider aiDataProvider) {
return request -> aiDataProvider.getAllAccounts();
}
@Bean
@Description("""
Add a new user to the database. \
The account must include a name and a surname as two separate words""")
public Function<AddAccountRequest, AddedAccountResponse> addAccount(AIDataProvider aiDataProvider) {
return aiDataProvider::addAccount;
}
}
Finally, you can define a ChatClient with the "functions" associated with your use case:
@Configuration
public class FunctionCallingChatConfiguration {
@Value("classpath:/prompts/function-calling.st")
private Resource functionCallingPromptResource;
@Bean
ChatClient functionCallingChatClient(ChatClient.Builder chatClientBuilder, ChatMemory functionCallingChatMemory) {
return chatClientBuilder
.defaultAdvisors(
// Chat memory helps us keep context when using the chatbot for up to 10 previous messages.
new MessageChatMemoryAdvisor(functionCallingChatMemory, DEFAULT_CHAT_MEMORY_CONVERSATION_ID, 10),
new SimpleLoggerAdvisor()
)
.defaultSystem(functionCallingPromptResource)
.defaultFunctions("listAccounts", "addAccount")
.build();
}
@Bean
public ChatMemory functionCallingChatMemory() {
return new InMemoryChatMemory();
}
}
You may notice a resource named "functionCallingPromptResource". You can use such a text file to create a ChatClient with a default system text, which provides instructions to the AI model on how to process and respond.
You are a friendly AI assistant designed to help with the management of an account datastore called ADS.
Your job is to answer questions about and to perform actions on the user's behalf, mainly around accounts.
You are required to answer in a professional manner. If you don't know the answer, politely tell the user
you don't know the answer, then ask the user a followup question to try and clarify the question they are asking.
If you do know the answer, provide the answer but do not provide any additional followup questions.
For accounts - provide the correct data.
I actually updated the system text resource file used in the Petclinic App example.
Now, you can ask your application to “list all the accounts” or “add account named ‘James Brown’”.
Spring AI Components — Structured Output
You can structure the output of a chat (for compatibility with a REST API response):
public ActorMovies getActorMoviesByActorName(String actor) {
return generalChatClient.prompt()
.user(u -> u.text("Generate a filmography for the actor {actor}").param("actor", actor))
.call()
.entity(ActorMovies.class);
}
public List<ActorMovies> getActorMoviesByActorName(List<String> actorList) {
return generalChatClient.prompt()
.user(u -> u.text("Generate a filmography for the actors {actors}").param("actors", String.join(",", actorList)))
.call()
.entity(new ParameterizedTypeReference<>() {});
}
You can define a class such as ActorMovies, or use ParameterizedTypeReference to capture and pass a generic type.
public record ActorMovies(String actor, List<String> movies) {
}
Why not Docker?
I came across “spring-boot-docker-compose” usage while I was reading the following Spring AI example:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-docker-compose</artifactId>
<scope>runtime</scope>
<optional>true</optional>
</dependency>
Spring Boot 3 automatically checks for container readiness, since containers can take some time to become fully ready. This frees us from having to check readiness ourselves, for example with the "healthcheck" command.
You can include your docker-compose file (if it is not in the root folder, give the relative path via the "file" setting).
#********** DOCKER CONFIG **********#
spring.docker.compose.lifecycle-management=start-and-stop
# If running the Ollama Docker Instance separately, then set this property
spring.docker.compose.enabled=false
spring.docker.compose.file=docker/docker-compose.yml
But since models take time to be pulled, I skipped using it and set the "enabled" property to "false". This is one of the reasons I hesitated to use Testcontainers for testing purposes (but I will try them as a next step).
"lifecycle-management": Options are NONE (don't start or stop Docker Compose), START_AND_STOP (start it if it's not running and stop it when the JVM exits) and START_ONLY (start Docker Compose if it's not running).
Here is the official documentation. There, enabling/disabling it via a Spring profile is suggested, which may be helpful for development purposes.
There is also org.springframework.ai:spring-ai-spring-boot-docker-compose, but there are not many search results about its usage. You can visit the official documentation here.
Reference Projects That I Used During Implementation:
My Project
I will do some cleanup and refactoring, then publish it on GitHub and share it here.
Future Implementations:
Here are the next steps I am planning:
1. Supporting multiple models for different purposes.
2. Making Elasticsearch work :)
3. Trying "Evaluation Testing" for AI model evaluation, as described in the official Spring AI docs: https://docs.spring.io/spring-ai/reference/api/testing.html
4. Using Testcontainers for testing purposes. There is an example with Spring AI in this Baeldung post: https://www.baeldung.com/spring-ai-ollama-hugging-face-models
If you would like to get more information about Testcontainers, you can read my blog post:
5. Trying "spring-ai-spring-boot-docker-compose".
Happy Coding!
References:
https://spring.io/blog/2024/09/26/ai-meets-spring-petclinic-implementing-an-ai-assistant-with-spring-ai-part-i
https://docs.rapidapp.io/blog/integrating-spring-ai-with-vector-databases
https://www.baeldung.com/spring-ai-ollama-chatgpt-like-chatbot
https://www.elastic.co/search-labs/blog/java-rag-spring-ai-es
https://docs.spring.io/spring-ai/reference/api/vectordbs/pgvector.html#pgvector-properties
https://www.baeldung.com/docker-compose-support-spring-boot