There are many ways of creating LLM clusters:

- GPUStack: "Introducing GPUStack 0.2: heterogeneous distributed inference, CPU inference and scheduling strategies" (GPUStack.ai)
While the above can all be used in Disposable Node, we prefer clustering at a higher level:
- Open WebUI: merges responses from multiple language models at the user-interface level ("How to enable merged model responses?", open-webui discussion #4934 on GitHub)
- WilmerAI: merges responses from multiple language models at the workflow level, asking "What if language models expertly routed all inference?" (SomeOddCodeGuy/WilmerAI on GitHub). It routes prompts to specialized workflows based on the domain chosen by your LLM, lets chat assistants be powered by multiple LLMs working in tandem to generate a response, and is compatible with Socg's Offline Wikipedia Article API. A minimal sketch of this merging pattern follows the list.
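To make the merging idea concrete, here is a minimal sketch of workflow-level response merging: fan one prompt out to several backends, then have a single model synthesize the drafts. This is not Open WebUI's or WilmerAI's actual implementation; the endpoint URLs and model names are placeholders for whatever OpenAI-compatible servers your cluster exposes.

```python
import requests

# Hypothetical local backends (e.g. llama.cpp / vLLM servers); replace with
# whatever OpenAI-compatible endpoints your cluster actually exposes.
BACKENDS = [
    {"url": "http://node-a:8080/v1/chat/completions", "model": "llama-3.1-8b"},
    {"url": "http://node-b:8080/v1/chat/completions", "model": "qwen2.5-7b"},
]
# One of the backends doubles as the synthesizer that merges the drafts.
SYNTHESIZER = BACKENDS[0]


def ask(backend: dict, messages: list) -> str:
    """POST a chat request to one OpenAI-compatible endpoint, return the text."""
    resp = requests.post(
        backend["url"],
        json={"model": backend["model"], "messages": messages},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


def merged_response(prompt: str) -> str:
    """Collect one draft per backend, then ask a single model to merge them."""
    drafts = [ask(b, [{"role": "user", "content": prompt}]) for b in BACKENDS]
    numbered = "\n\n".join(f"Draft {i + 1}:\n{d}" for i, d in enumerate(drafts))
    merge_prompt = (
        "Several assistants answered the same question. Combine their drafts "
        f"into one best answer.\n\nQuestion: {prompt}\n\n{numbered}"
    )
    return ask(SYNTHESIZER, [{"role": "user", "content": merge_prompt}])


if __name__ == "__main__":
    print(merged_response("Summarize the trade-offs of RAID 1 versus RAID 5."))
```

Open WebUI applies this fan-out-then-synthesize pattern in the chat interface, while WilmerAI generalizes it into configurable workflows; either way, the trade-off is added latency (every backend must answer) in exchange for answers that draw on multiple models.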