GLM is a family of LLMs released by Z.ai.
GLM-4.7-Flash
GLM-4.7-Flash is a 30B-parameter MoE thinking model with 3B active parameters (30B-A3B) that is competitive with other models of similar size.
| Benchmark | GLM-4.7-Flash | Nemotron-3-Nano-30B-A3B | Qwen3-30B-A3B-Thinking-2507 | GPT-OSS-20B |
|---|---|---|---|---|
| AIME 25 | 91.6 | 89.1 | 85.0 | 91.7 |
| GPQA | 75.2 | 73.0 | 73.4 | 71.5 |
| LCB v6 | 64.0 | 66.0 | 61.0 | |
| HLE | 14.4 | 10.6 | 9.8 | 10.9 |
| SWE-bench | 59.2 | 38.8 | 22.0 | 34.0 |
| τ²-Bench | 79.5 | 49.0 | 49.0 | 47.7 |
| BrowseComp | 42.8 | 2.29 | 28.3 | |
Ollama
- glm-4.7-flash:q8_0
- Note: this uses only a 32K context window out of the model's 198K maximum.
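To use more of the 198K window, the context length can be raised with a custom Modelfile (a sketch, assuming the `glm-4.7-flash:q8_0` tag above is available locally and there is enough memory; the 65536 value and the `glm-4.7-flash-64k` name are illustrative, not from the source):

```
# Modelfile (sketch): raise the context window above the 32K in use
FROM glm-4.7-flash:q8_0
PARAMETER num_ctx 65536
```

```shell
# Build a derived model from the Modelfile and run it
ollama create glm-4.7-flash-64k -f Modelfile
ollama run glm-4.7-flash-64k
```

Alternatively, inside an interactive `ollama run` session, `/set parameter num_ctx 65536` changes the context length for that session only. Larger `num_ctx` values increase KV-cache memory use roughly linearly.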