Integrating Gen-AI into your applications - Part 1 - The foundations

Integrating generative AI into existing applications is a major challenge for many businesses. Generative AI is highly capable, and it has simplified how we use machine learning in everyday business workflows. In the past this was extremely complex, because it involved training different models on different data to solve each specific problem. In this article series we explore the problem in more detail through several concrete use cases.
The foundations
Generative AI can mean many things. Large Language Models (LLMs) are the most basic type of generative AI technology, but there are also image and video generation models such as Imagen, Midjourney, Sora and Veo. Most generative AI tasks are likely to use an LLM, so we will focus on LLM use cases first.
Once you have decided to use Gen AI in your application, the next job is for the product team to identify the areas where Gen AI can make your product better and more useful. We will create a more structured framework for this later, but first we need an engineering foundation that different product teams can rely on to meet their needs.
LLM service
Our advice to organizations is to have a single entry point for all Gen AI related work. This means a single team is responsible for maintaining the Gen AI inference infrastructure in the organization and owns a generic set of inference APIs that other teams can use.
This is sensible because LLMs themselves have an extremely simple interface: they take free-form text as input and produce text as output. Ultimately there is not much more to it.
An LLM service is responsible for running inference for everyone against the models it hosts. Such a microservice is maintained by a single team and serves multiple other teams in the organization.
Experimentation, analytics and AI safety can thus be implemented more uniformly at this entry point. Also, since inference is expensive, we may want to assign each consumer a quota based on business need.
Even if you are using third-party APIs such as Google’s Gemini, you still need to control access to them and standardize how other teams use these Gen AI technologies. So it makes sense to create an LLM wrapper service around them.
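To make the idea of a single entry point with per-consumer quotas more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names LLMService, QuotaExceededError and echo_backend are hypothetical, the quota is a simple request count, and the backend is a stand-in function rather than a real model or a real third-party client.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class QuotaExceededError(Exception):
    """Raised when a consumer has used up its request allowance (hypothetical)."""


@dataclass
class LLMService:
    """Hypothetical single entry point that wraps one text-in, text-out backend."""

    # Any callable mapping prompt text to output text. In a real service this
    # would wrap a self-hosted model or a third-party API client.
    backend: Callable[[str], str]
    # Consumer id -> maximum number of requests allowed (assumed quota unit).
    quotas: Dict[str, int]
    # Consumer id -> number of requests served so far.
    usage: Dict[str, int] = field(default_factory=dict)

    def generate(self, consumer_id: str, prompt: str) -> str:
        limit = self.quotas.get(consumer_id, 0)  # unknown consumers get no quota
        used = self.usage.get(consumer_id, 0)
        if used >= limit:
            raise QuotaExceededError(f"quota exhausted for {consumer_id}")
        self.usage[consumer_id] = used + 1
        return self.backend(prompt)


def echo_backend(prompt: str) -> str:
    """Stand-in for a real model call; it simply echoes the prompt."""
    return f"echo: {prompt}"


# Example wiring: one team with an allowance of two requests.
service = LLMService(backend=echo_backend, quotas={"search-team": 2})
```

Because every consumer goes through generate, this is also the natural place to attach safety checks, experiments and logging later. Swapping echo_backend for a real model call would not change how other teams use the service.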
Session storage
LLMs are all about context, so we may also want to store contextual data alongside an LLM prompt. This can be done using a concept called a session. It is especially useful for conversational AI, where the LLM needs access to earlier prompts and outputs in the conversation. Hence we need a good way for consumers to inform the LLM about the context of a query.
This can be modeled the same way we model a session in typical web applications. Each consumer “creates” and “ends” a session, and the backend service is able to chain together all the events that happen under that session ID.
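The create/end lifecycle described above can be sketched as a small in-memory store. This is a rough illustration, not a production design: the class name SessionStore, its method names and the "User:"/"Assistant:" context format are all assumptions made for the example, and a real service would likely keep sessions in a database or cache rather than in a Python dict.

```python
import uuid
from typing import Dict, List


class SessionStore:
    """Hypothetical in-memory session store keyed by session ID."""

    def __init__(self) -> None:
        self._sessions: Dict[str, List[dict]] = {}

    def create_session(self) -> str:
        """Start a new session and return its ID."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = []
        return session_id

    def append_turn(self, session_id: str, prompt: str, output: str) -> None:
        """Record one prompt/output pair under the session."""
        self._sessions[session_id].append({"prompt": prompt, "output": output})

    def build_context(self, session_id: str, new_prompt: str) -> str:
        """Chain earlier turns in front of the new prompt (assumed text format)."""
        lines = []
        for turn in self._sessions[session_id]:
            lines.append(f"User: {turn['prompt']}")
            lines.append(f"Assistant: {turn['output']}")
        lines.append(f"User: {new_prompt}")
        return "\n".join(lines)

    def end_session(self, session_id: str) -> List[dict]:
        """Close the session and return its history, for example for archiving."""
        return self._sessions.pop(session_id)
```

A consumer would call create_session once, pass the returned ID with each request, and call end_session when the conversation is over. The service can then use build_context to give the LLM the earlier turns along with the new query.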
Analytics and logging
Since Gen AI technologies are experimental, we need a lot of real-time data about how our Gen AI solutions are doing in the wild. We can achieve this by logging all Gen AI interactions to a central place where our researchers can analyze them at a very fine level of granularity. This data can also be used to fine-tune future models.
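One simple way to log every interaction to a central place is to write one JSON record per line to a shared sink. The sketch below assumes this JSON Lines layout. The function name log_interaction and the record fields are hypothetical choices for illustration, and the sink is any writable text stream; a real deployment would more likely send records to a log pipeline or data warehouse.

```python
import json
import time
from typing import IO


def log_interaction(
    sink: IO[str],
    consumer_id: str,
    session_id: str,
    model: str,
    prompt: str,
    output: str,
    latency_ms: float,
) -> dict:
    """Append one interaction to the sink as a single JSON line and return it."""
    record = {
        "timestamp": time.time(),  # seconds since the epoch
        "consumer_id": consumer_id,
        "session_id": session_id,
        "model": model,
        "prompt": prompt,
        "output": output,
        "latency_ms": latency_ms,
    }
    sink.write(json.dumps(record) + "\n")
    return record
```

Keeping the prompt, output, consumer and session together in each record lets researchers slice the data by team or conversation, and the same records can later be filtered into fine-tuning datasets.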
Conclusion
We have put together a video with more details of this sort of LLM Service.
Next Part: Data collection, fine-tuning and experimentation.




