There are many ways of creating LLM clusters:

- GPUStack: "Introducing GPUStack 0.2: heterogeneous distributed inference, CPU inference and scheduling strategies" (GPUStack.ai)
While the above can all be used in Disposable Node, we prefer clustering at a higher level:
- Open WebUI: merges responses from multiple language models at the user-interface level ("How to enable merged model responses?", open-webui discussion #4934 on GitHub)
- WilmerAI: merges responses from multiple language models at the workflow level, asking "What if language models expertly routed all inference?" (SomeOddCodeGuy/WilmerAI on GitHub). It routes prompts to specialized workflows based on the domain chosen by your LLM, lets chat assistants be powered by multiple LLMs working in tandem to generate a response, and is compatible with Socg's Offline Wikipedia Article API. A minimal sketch of this merging pattern follows the list.
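To make the merging idea concrete, here is a minimal sketch of workflow-level response merging: fan one prompt out to several backends, then have a single model synthesize the drafts. This is not Open WebUI's or WilmerAI's actual implementation; the endpoint URLs and model names are placeholders for whatever OpenAI-compatible servers your cluster exposes.

```python
import requests

# Hypothetical local backends (e.g. llama.cpp / vLLM servers); replace with
# whatever OpenAI-compatible endpoints your cluster actually exposes.
BACKENDS = [
    {"url": "http://node-a:8080/v1/chat/completions", "model": "llama-3.1-8b"},
    {"url": "http://node-b:8080/v1/chat/completions", "model": "qwen2.5-7b"},
]
# One of the backends doubles as the synthesizer that merges the drafts.
SYNTHESIZER = BACKENDS[0]


def ask(backend: dict, messages: list) -> str:
    """POST a chat request to one OpenAI-compatible endpoint, return the text."""
    resp = requests.post(
        backend["url"],
        json={"model": backend["model"], "messages": messages},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


def merged_response(prompt: str) -> str:
    """Collect one draft per backend, then ask a single model to merge them."""
    drafts = [ask(b, [{"role": "user", "content": prompt}]) for b in BACKENDS]
    numbered = "\n\n".join(f"Draft {i + 1}:\n{d}" for i, d in enumerate(drafts))
    merge_prompt = (
        "Several assistants answered the same question. Combine their drafts "
        f"into one best answer.\n\nQuestion: {prompt}\n\n{numbered}"
    )
    return ask(SYNTHESIZER, [{"role": "user", "content": merge_prompt}])


if __name__ == "__main__":
    print(merged_response("Summarize the trade-offs of RAID 1 versus RAID 5."))
```

Open WebUI applies this fan-out-then-synthesize pattern in the chat interface, while WilmerAI generalizes it into configurable workflows; either way, the trade-off is added latency (every backend must answer) in exchange for answers that draw on multiple models.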