Qwen

Recommended Models:

Qwen 2.5

The recent Qwen2.5 release has pushed open source large language models (LLMs) to new heights, beating the previous open source leader Llama 3.1 across a number of benchmarks.

The Qwen-2.5-7B-Q4 model is now available by default (alongside the Llama-3.1-8B-Q4 model) on most public Compute Assets, e.g. model.aunsw.88.io

For Compute Assets whose GPUs have 16GB+ of VRAM, running the Qwen2.5-14B-Q8 model is recommended.

Licenses

Be careful: the 3B and 72B variants of Qwen 2.5 have some restrictions on commercial use. The other variants are licensed under Apache 2.0, which permits most types of use.

Limited Resources

For those with low-end GPUs:

QwQ

Notes:

  • Context windows larger than 8K tokens require a special YaRN setup; see the ModelScope (魔搭社区) documentation.
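The Qwen model cards describe enabling YaRN by adding a rope_scaling block to the model's config.json. A minimal sketch of that patch (the values shown are illustrative; scale the factor to your target context length):

```python
# Illustrative YaRN rope_scaling patch for a Qwen config.json.
# The factor and original_max_position_embeddings values are examples only;
# consult the model card for the correct numbers for your model.
import json

config_patch = {
    "rope_scaling": {
        "type": "yarn",
        "factor": 4.0,  # e.g. extend to 4x the native context length
        "original_max_position_embeddings": 32768,  # model's native window
    }
}
print(json.dumps(config_patch, indent=2))
```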

Recommended:

  • num_ctx of 8192 - context length
  • top_k of 30 - samples from the 30 most likely tokens
  • temperature of 0.6
  • top_p of 0.95 - nucleus sampling over tokens covering the top 95% of cumulative probability (higher is more diverse, lower more focused)
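The settings above can be passed in the "options" field of an Ollama API request. A minimal sketch (the model name and prompt are placeholders, and the request is only constructed here, not sent):

```python
# Sketch: the recommended sampling settings as an Ollama API payload.
# Sending it requires a local Ollama server; here we only build and print it.
import json

options = {
    "num_ctx": 8192,      # context length
    "top_k": 30,          # sample from the 30 most likely tokens
    "temperature": 0.6,
    "top_p": 0.95,        # nucleus sampling: top 95% cumulative probability
}

payload = {"model": "qwq", "prompt": "Why is the sky blue?", "options": options}
print(json.dumps(payload, indent=2))
```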

Qwen 3

Qwen 3 is a hybrid reasoning model, so it can operate in either /think or /no_think mode. Qwen 3 also ships both dense models and MoE (mixture-of-experts) models.
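The mode switch works by appending the tag to the end of a user message. A minimal sketch of that convention (the helper name is ours, not part of any Qwen API):

```python
# Sketch of Qwen 3's soft switch: appending /think or /no_think to a user
# message toggles reasoning mode. The helper below is illustrative only.
def with_mode(prompt: str, think: bool) -> str:
    """Append the mode tag Qwen 3 recognises to the end of a user message."""
    return f"{prompt} {'/think' if think else '/no_think'}"

print(with_mode("Summarise this report in three bullet points.", think=False))
```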

qwen3:32b-q4_K_M

32b is the largest dense model available.
~41GB; runs 79% on GPU, 21% on CPU.

qwen3:14b-q4_K_M

14b is the recommended dense model for most use cases.
~15GB; 100% GPU.

qwen3:30b-a3b-q4_K_M

30b is a small MoE model.
~27GB; 100% GPU.