Qwen

Recommended Qwen Model

  • qwen3-vl:30b-a3b-instruct-q8_0 (62GB) - for applications requiring 8-bit quantisation

Legacy Qwen Models

Still supported but not recommended for new installs.

  • qwq
  • qwen2.5
  • qwen2.5-vl

MoE Models

Qwen 3 MoE models need more bits when quantised; try to keep MoE models at q8 or above.

Qwen 3 Dense models still function with acceptable quality at quantisations down to q4.
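As a rough rule of thumb, a model's weight footprint is parameter count times bits per weight. A minimal sketch of the q4 vs q8 trade-off above (the ~4.5 and ~8.5 bits/weight figures for q4_K_M and q8_0 are approximations, and runtime overhead such as the KV cache is ignored):

```python
def approx_model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight footprint in GB: parameters x bits per weight.
    Ignores KV cache, activations and other runtime overhead."""
    return params_billions * bits_per_weight / 8

# q8 roughly doubles the footprint of q4 for the same parameter count:
print(approx_model_size_gb(30, 8.5))  # MoE 30b at ~q8_0: about 31.9 GB of weights
print(approx_model_size_gb(14, 4.5))  # Dense 14b at ~q4_K_M: about 7.9 GB of weights
```

This is why the q8-or-above rule bites hardest for MoE models: all experts' weights must fit in memory even though only a few are active per token.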

Qwen 2.5

The recent Qwen2.5 release has pushed open source large language models (LLMs) to new heights, beating the previous open source leader Llama 3.1 across a number of benchmarks.

The Qwen-2.5-7B-Q4 model is now available by default (along with the Llama-3.1-8B-Q4 model) on most public Compute Assets, e.g. model.aunsw.88.io

For Compute Assets whose GPUs have 16GB+ of VRAM, running the Qwen2.5-14B-Q8 model is recommended.

Licenses

Be careful: the 3B and 72B variants of Qwen 2.5 have some restrictions on commercial use; the others are licensed under Apache 2.0, which is fine for most types of use.

Limited Resources

For those with low-end GPUs:

Ollama - qwen2.5-14b (9GB)

QwQ

Recommended parameters:

  • num_ctx of 8192 - context length
  • top_k of 30 - samples from the 30 most likely tokens
  • temperature of 0.6
  • top_p of 0.95 - samples from the smallest set of tokens whose cumulative probability reaches 95% (higher is more diverse, lower more focused)

Ollama - qwq-32b (20GB)
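The recommended parameters above can be baked into an Ollama Modelfile so every session picks them up automatically. A sketch (the `qwq-tuned` name is made up, and the FROM tag should match whichever qwq tag you pulled):

```
FROM qwq:32b
PARAMETER num_ctx 8192
PARAMETER top_k 30
PARAMETER temperature 0.6
PARAMETER top_p 0.95
```

Build and run it with `ollama create qwq-tuned -f Modelfile` followed by `ollama run qwq-tuned`.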

Qwen 3

Qwen 3 is a hybrid reasoning model: it can operate in either /think or /no_think mode. Qwen 3 also comes in both Dense and MoE (mixture-of-experts) variants.
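The /think and /no_think soft switches are simply tags appended to a user message to toggle reasoning per turn. A minimal helper (`build_prompt` is a hypothetical name, not part of any Qwen or Ollama API):

```python
def build_prompt(user_message: str, thinking: bool) -> str:
    """Append Qwen 3's soft-switch tag to toggle reasoning for this turn."""
    return f"{user_message} {'/think' if thinking else '/no_think'}"

print(build_prompt("Summarise this report in one line.", thinking=False))
# Summarise this report in one line. /no_think
```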

qwen3:32b-q4_K_M

32b is the largest Dense model available.
41GB. 79% GPU / 21% CPU

qwen3:14b-q4_K_M

14b is the recommended Dense model for most use cases.
15GB. 100% GPU

qwen3:30b-a3b-q4_K_M

30b is a small MoE model.
27GB. 100% GPU

Qwen2.5 VL

For the 26.02 release the Advanced Vision Model is Qwen2.5-VL-72B-Instruct-4bit.

Deployment

Ollama

MLX

References

Qwen3-VL

Qwen3-VL supports text, images, videos and agents.

Qwen3-VL's vision transformer (ViT) employs full self-attention, whereas Qwen2.5-VL's ViT primarily uses the more efficient window-based attention.

Integrated Learning

Joint pretraining on text and images teaches the model deeper relationships between concepts, boosting its language skills beyond what a text-only model learns.

Long Context

Features a massive 256K-token context window for both text and interleaved inputs.

Many Languages

Supports 119 languages and dialects (with OCR support for 32 of them).

  • Indo-European: English, French, Portuguese, German, Romanian, Swedish, Danish, Bulgarian, Russian, Czech, Greek, Ukrainian, Spanish, Dutch, Slovak, Croatian, Polish, Lithuanian, Norwegian Bokmål, Norwegian Nynorsk, Persian, Slovenian, Gujarati, Latvian, Italian, Occitan, Nepali, Marathi, Belarusian, Serbian, Luxembourgish, Venetian, Assamese, Welsh, Silesian, Asturian, Chhattisgarhi, Awadhi, Maithili, Bhojpuri, Sindhi, Irish, Faroese, Hindi, Punjabi, Bengali, Oriya, Tajik, Eastern Yiddish, Lombard, Ligurian, Sicilian, Friulian, Sardinian, Galician, Catalan, Icelandic, Tosk Albanian, Limburgish, Dari, Afrikaans, Macedonian, Sinhala, Urdu, Magahi, Bosnian, Armenian
  • Sino-Tibetan: Chinese (Simplified Chinese, Traditional Chinese, Cantonese), Burmese
  • Afro-Asiatic: Arabic (Standard, Najdi, Levantine, Egyptian, Moroccan, Mesopotamian, Ta’izzi-Adeni, Tunisian), Hebrew, Maltese
  • Austronesian: Indonesian, Malay, Tagalog, Cebuano, Javanese, Sundanese, Minangkabau, Balinese, Banjar, Pangasinan, Iloko, Waray (Philippines)
  • Dravidian: Tamil, Telugu, Kannada, Malayalam
  • Turkic: Turkish, North Azerbaijani, Northern Uzbek, Kazakh, Bashkir, Tatar
  • Tai-Kadai: Thai, Lao
  • Uralic: Finnish, Estonian, Hungarian
  • Austroasiatic: Vietnamese, Khmer
  • Other: Japanese, Korean, Georgian, Basque, Haitian, Papiamento, Kabuverdianu, Tok Pisin, Swahili