There are many ways of creating LLM clusters:
- GPUStack
  Introducing GPUStack 0.2: heterogeneous distributed inference, CPU inference and scheduling strategies – GPUStack.ai
While all of the above can be used in Disposable Node, we prefer clustering at a higher level:
- Open WebUI
  Merges responses from multiple language models at the User Interface level.
  How to enable merged model responses? · open-webui/open-webui · Discussion #4934 · GitHub
- WilmerAI
  Merges responses from multiple language models at the Workflow level.
  GitHub - SomeOddCodeGuy/WilmerAI: WilmerAI is one of the oldest LLM semantic routers. It uses multi-layer prompt routing and complex workflows to let you not only build practical chatbots, but also extend any application that connects to an LLM via a REST API. Wilmer sits between your app and your many LLM APIs, so that you can manipulate prompts as needed.
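To make the "merge responses" idea concrete, here is a minimal sketch of workflow-level merging in the spirit of the tools above: fan one prompt out to several model backends, then combine the labeled answers into a single block that a downstream model or the UI can consume. This is not WilmerAI's or Open WebUI's actual implementation; the backend callables are hypothetical stand-ins for real LLM API clients.

```python
# Hedged sketch: fan-out to multiple model backends, then merge the
# labeled responses. The backends here are plain callables standing in
# for real LLM API clients (e.g. OpenAI-compatible HTTP endpoints).
from concurrent.futures import ThreadPoolExecutor


def fan_out(prompt, backends):
    """Query every backend concurrently; return {backend_name: response}."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in backends.items()}
        return {name: f.result() for name, f in futures.items()}


def merge_responses(responses):
    """Combine labeled answers into one text block, sorted by backend name."""
    return "\n\n".join(f"[{name}]\n{text}" for name, text in sorted(responses.items()))


if __name__ == "__main__":
    # Two toy "models" so the sketch runs without any network access.
    backends = {
        "model-a": lambda p: f"A says: {p.upper()}",
        "model-b": lambda p: f"B says: {p[::-1]}",
    }
    print(merge_responses(fan_out("hello", backends)))
```

In a real setup each callable would wrap an HTTP request to a different model server, and the merged block could be fed to a final "judge" model or shown side by side in the UI.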