Discussion about this post

Subramanyam Rekhandar

I am gaining a huge amount of knowledge about monolithic vs. microservice architecture, and I can mostly use a monolithic architecture to build an LLM or RAG application. Thank you for sharing valuable content about architectures.

Daniel Manzke

Funny to see how the AI world is slowly hitting the normal engineering issues.

Architecture, scaling, caching, …

I would not recommend that anyone put the LLM inside their service. I would recommend always treating it as an external service.

A lot of the points are true, but there are more. What if you want to test a different model? What about automated testing? Do you want to run your tests against the real OpenAI?

Use the OpenAI REST API as your boundary. Most LLM providers support it.
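To make the point concrete (my own sketch, not from the comment): if the OpenAI-compatible REST API is your boundary, swapping providers or local models mostly comes down to changing the base URL, since the request shape stays the same. The URLs and model names below are illustrative assumptions.

```python
import json

def build_chat_request(base_url: str, model: str, messages: list) -> tuple:
    """Build an OpenAI-compatible chat completion request.

    Only base_url changes when you switch providers; the payload
    shape is the same for any OpenAI-compatible server.
    """
    url = base_url.rstrip("/") + "/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages})
    return url, body

# The same payload works against OpenAI or a local
# OpenAI-compatible runtime (e.g. vLLM or Ollama).
url_openai, body = build_chat_request(
    "https://api.openai.com", "gpt-4o-mini",
    [{"role": "user", "content": "Hello"}],
)
url_local, _ = build_chat_request(
    "http://localhost:8000", "llama-3-8b",
    [{"role": "user", "content": "Hello"}],
)
```

Automated tests can then target a stub server at the local URL, while production points at the real provider.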

Another big issue I'm seeing is the scalability of the LLM (the GPU). While a CPU with more threads can do more in parallel, a GPU is quite limited. You mainly scale by adding more of them.

Separating your service and the LLM has one big drawback: you can scale your services faster than the LLM.

So testing how a service-to-service setup handles a large number of requests becomes crucial.
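One common mitigation for the scaling mismatch (a minimal sketch of mine, not part of the comment): cap the number of in-flight LLM calls with a semaphore, so horizontally scaled service instances cannot overwhelm the fixed GPU capacity behind the boundary. The limit and the stand-in `call_llm` function are illustrative assumptions.

```python
import threading
import time

# Hypothetical capacity: assume the GPUs behind the boundary can
# serve ~4 requests concurrently; everything else should queue.
MAX_INFLIGHT = 4
_gate = threading.BoundedSemaphore(MAX_INFLIGHT)

peak = 0
_inflight = 0
_lock = threading.Lock()

def call_llm(prompt: str) -> str:
    """Stand-in for the real HTTP call to the LLM service."""
    global peak, _inflight
    with _gate:  # blocks once MAX_INFLIGHT calls are in flight
        with _lock:
            _inflight += 1
            peak = max(peak, _inflight)
        try:
            time.sleep(0.01)          # simulate model latency
            return f"echo: {prompt}"  # pretend the model answered
        finally:
            with _lock:
                _inflight -= 1

# 20 concurrent callers, but at most 4 reach the "LLM" at once.
threads = [threading.Thread(target=call_llm, args=(f"req {i}",))
           for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak in-flight:", peak)
```

In a real setup the same idea usually lives in a gateway or queue in front of the LLM, so the backpressure is shared across all service instances rather than enforced per process.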

