Distributed
AI workloads can be distributed across many processors.
Distribution
- Distributed Llama: GitHub - b4rtaz/distributed-llama: Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
- Exolabs: https://exolabs.net/
- Petals: GitHub - bigscience-workshop/petals: 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading. (A minimal client sketch follows this list.)
- Decentralized Distributed LLM - Using AI in Cost Effective and Environment Friendly way | by Vishal Mysore | Data Scientists from Future | Medium
- AI Horde: https://stablehorde.net/
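
To make the Petals entry concrete, here is a minimal client sketch in the style of the project's README. It assumes the petals and transformers packages are installed and that a public swarm is currently serving the chosen model; the model name is illustrative.

```python
# Minimal Petals client sketch.
# Assumptions: `pip install petals transformers` and a public swarm
# serving this model; the model name below is illustrative.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # illustrative; swarm availability varies

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Only a small slice of the model runs locally; the remaining transformer
# blocks are executed by volunteer peers over the network, BitTorrent-style.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0]))
```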
Parallel
llama.cpp can now split a single model across multiple GPUs:
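
As a sketch of what this looks like, assuming the llama-cpp-python bindings built with GPU support (the model path and split ratios below are placeholders):

```python
# Sketch: split one GGUF model across two GPUs via llama-cpp-python.
# Assumptions: llama-cpp-python compiled with CUDA (or another GPU backend)
# and a local GGUF model file at the placeholder path.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    n_gpu_layers=-1,                   # offload every layer to the GPUs
    tensor_split=[0.5, 0.5],           # put half the weights on each of two GPUs
)

out = llm("Q: What is tensor parallelism? A:", max_tokens=48)
print(out["choices"][0]["text"])
```

The llama.cpp command-line tools expose the same controls through the --n-gpu-layers and --tensor-split flags.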