Quick Answer: The RX 7900 XTX offers 24GB of VRAM and starts around $939.99. It delivers approximately 66 tokens/sec on meta-llama/Llama-Guard-3-1B (Q4, estimated) and typically draws 355W under load.

The RX 7900 XTX gives AMD builders a 24GB option with competitive throughput for 7B–13B LLMs and diffusion workloads. Use a ROCm-compatible stack such as llama.cpp (HIP backend) or vLLM's ROCm build.
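As a rough sketch of the llama.cpp route: the commands below build the HIP backend for RDNA3 and run a quantized model fully offloaded to the GPU. Flag names have changed across llama.cpp versions (older releases used `LLAMA_HIPBLAS`), and the model filename is a placeholder, so verify against the docs for your checkout.

```shell
# Build llama.cpp with the ROCm/HIP backend (assumes ROCm is installed).
# gfx1100 is the RDNA3 target used by the RX 7900 XTX.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

# Run a Q4 GGUF model with all layers offloaded to the GPU (-ngl 99).
# model.Q4_K_M.gguf is a placeholder path, not a file shipped with the repo.
./build/bin/llama-cli -m model.Q4_K_M.gguf -ngl 99 -p "Hello"
```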
| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| meta-llama/Llama-Guard-3-1B | Q4 | 66.33 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 64.98 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 64.08 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 64.05 tok/s | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 61.58 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 60.78 tok/s | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 59.35 tok/s | 1GB |
| google/gemma-3-1b-it | Q4 | 56.49 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 55.82 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 49.24 tok/s | 1GB |
| google/gemma-2b | Q4 | 46.00 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 44.68 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | 44.59 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q8 | 44.42 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 44.32 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 44.03 tok/s | 2GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 43.85 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q8 | 43.84 tok/s | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 43.83 tok/s | 1GB |
| meta-llama/Llama-3.2-3B | Q4 | 42.31 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 42.29 tok/s | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 42.14 tok/s | 1GB |
| google-t5/t5-3b | Q4 | 41.62 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 41.51 tok/s | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 41.35 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 41.15 tok/s | 2GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 40.80 tok/s | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 40.26 tok/s | 2GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 40.11 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 40.06 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 39.65 tok/s | 2GB |
| allenai/OLMo-2-0425-1B | Q8 | 39.41 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 39.38 tok/s | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 39.27 tok/s | 1GB |
| bigcode/starcoder2-3b | Q4 | 39.06 tok/s | 2GB |
| Qwen/Qwen3-4B | Q4 | 38.94 tok/s | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 38.61 tok/s | 2GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 38.27 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 38.26 tok/s | 2GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 37.84 tok/s | 2GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 37.79 tok/s | 2GB |
| inference-net/Schematron-3B | Q4 | 37.57 tok/s | 2GB |
| microsoft/VibeVoice-1.5B | Q4 | 37.29 tok/s | 3GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 36.31 tok/s | 3GB |
| Qwen/Qwen3-4B-Base | Q4 | 36.21 tok/s | 2GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 36.16 tok/s | 2GB |
| google/gemma-2b | Q8 | 35.92 tok/s | 2GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 35.89 tok/s | 3GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 35.73 tok/s | 2GB |
| Qwen/Qwen2.5-0.5B | Q4 | 35.54 tok/s | 3GB |
| LiquidAI/LFM2-1.2B | Q8 | 35.33 tok/s | 2GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 34.70 tok/s | 3GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 34.42 tok/s | 3GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 34.34 tok/s | 3GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 34.30 tok/s | 2GB |
| skt/kogpt2-base-v2 | Q4 | 33.04 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 32.98 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 32.79 tok/s | 3GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 32.66 tok/s | 3GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 32.42 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 32.41 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 32.38 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 32.34 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 32.24 tok/s | 3GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 32.19 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 32.16 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 32.11 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q4 | 32.07 tok/s | 3GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 32.06 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 32.00 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 31.68 tok/s | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 31.59 tok/s | 4GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 31.51 tok/s | 3GB |
| Qwen/Qwen2-0.5B | Q4 | 31.44 tok/s | 3GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 31.42 tok/s | 5GB |
| numind/NuExtract-1.5 | Q4 | 31.39 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 31.25 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 31.24 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 31.16 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 31.11 tok/s | 4GB |
| Qwen/Qwen2.5-3B | Q8 | 31.07 tok/s | 3GB |
| Qwen/Qwen3-8B | Q4 | 30.95 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 30.77 tok/s | 4GB |
| google/gemma-2-2b-it | Q8 | 30.74 tok/s | 2GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 30.73 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 30.66 tok/s | 3GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 30.66 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 30.63 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 30.44 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 30.40 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 30.32 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 30.29 tok/s | 5GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 30.28 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 30.23 tok/s | 3GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 30.12 tok/s | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 30.08 tok/s | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 30.01 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 29.90 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 29.89 tok/s | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 29.87 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 29.85 tok/s | 3GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 29.75 tok/s | 4GB |
| meta-llama/Llama-3.2-3B | Q8 | 29.75 tok/s | 3GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 29.75 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 29.72 tok/s | 3GB |
| liuhaotian/llava-v1.5-7b | Q4 | 29.71 tok/s | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 29.68 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 29.61 tok/s | 3GB |
| meta-llama/Llama-3.1-8B | Q4 | 29.43 tok/s | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 29.39 tok/s | 3GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 29.35 tok/s | 5GB |
| rednote-hilab/dots.ocr | Q4 | 29.27 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 29.25 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 29.23 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 29.20 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 29.19 tok/s | 3GB |
| microsoft/phi-2 | Q4 | 29.15 tok/s | 4GB |
| google-t5/t5-3b | Q8 | 29.15 tok/s | 3GB |
| huggyllama/llama-7b | Q4 | 29.09 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 28.92 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 28.91 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 28.91 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 28.90 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 28.88 tok/s | 4GB |
| microsoft/DialoGPT-medium | Q4 | 28.86 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 28.83 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 28.79 tok/s | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 28.68 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 28.68 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 28.64 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 28.59 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 28.47 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 28.43 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 28.39 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 28.28 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 28.22 tok/s | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 28.13 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 28.08 tok/s | 4GB |
| ibm-research/PowerMoE-3b | Q8 | 28.07 tok/s | 3GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 28.06 tok/s | 3GB |
| distilbert/distilgpt2 | Q4 | 27.99 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 27.87 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 27.86 tok/s | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 27.79 tok/s | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 27.65 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 27.58 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 27.50 tok/s | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 27.50 tok/s | 4GB |
| facebook/opt-125m | Q4 | 27.45 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 27.45 tok/s | 4GB |
| microsoft/phi-4 | Q4 | 27.44 tok/s | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 27.38 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 27.35 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 27.31 tok/s | 4GB |
| inference-net/Schematron-3B | Q8 | 27.31 tok/s | 3GB |
| openai-community/gpt2-large | Q4 | 27.27 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 27.26 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 27.17 tok/s | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 27.16 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 27.12 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 27.10 tok/s | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 27.02 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 26.92 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 26.89 tok/s | 3GB |
| Qwen/Qwen3-4B-Base | Q8 | 26.87 tok/s | 4GB |
| bigcode/starcoder2-3b | Q8 | 26.73 tok/s | 3GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 26.70 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 26.53 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 26.52 tok/s | 4GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 26.43 tok/s | 5GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 26.27 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 25.82 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 25.78 tok/s | 4GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 25.77 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q8 | 25.54 tok/s | 5GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 25.42 tok/s | 5GB |
| Qwen/Qwen3-14B | Q4 | 24.89 tok/s | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 24.73 tok/s | 5GB |
| google/gemma-2-9b-it | Q4 | 24.50 tok/s | 6GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 24.31 tok/s | 6GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 24.15 tok/s | 8GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 24.00 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q8 | 23.97 tok/s | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 23.88 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 23.63 tok/s | 5GB |
| microsoft/VibeVoice-1.5B | Q8 | 23.62 tok/s | 5GB |
| Qwen/Qwen3-4B | Q8 | 23.50 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 23.38 tok/s | 5GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 23.37 tok/s | 5GB |
| Qwen/Qwen3-14B-Base | Q4 | 23.13 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 23.12 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 23.11 tok/s | 9GB |
| Qwen/Qwen3-1.7B | Q8 | 23.06 tok/s | 7GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 23.05 tok/s | 7GB |
| distilbert/distilgpt2 | Q8 | 23.00 tok/s | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 22.97 tok/s | 7GB |
| EleutherAI/pythia-70m-deduped | Q8 | 22.92 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 22.92 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 22.65 tok/s | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 22.64 tok/s | 5GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 22.63 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 22.61 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 22.59 tok/s | 5GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 22.56 tok/s | 5GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 22.52 tok/s | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 22.52 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 22.50 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 22.48 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 22.43 tok/s | 5GB |
| vikhyatk/moondream2 | Q8 | 22.38 tok/s | 7GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 22.38 tok/s | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 22.35 tok/s | 5GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 22.32 tok/s | 7GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 22.29 tok/s | 6GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 22.26 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 22.22 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 22.18 tok/s | 7GB |
| openai-community/gpt2-xl | Q8 | 22.17 tok/s | 7GB |
| liuhaotian/llava-v1.5-7b | Q8 | 22.12 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 22.12 tok/s | 7GB |
| facebook/opt-125m | Q8 | 22.07 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 22.04 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 22.02 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 21.99 tok/s | 7GB |
| microsoft/phi-2 | Q8 | 21.97 tok/s | 7GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 21.91 tok/s | 7GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 21.91 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 21.82 tok/s | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 21.79 tok/s | 7GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 21.73 tok/s | 10GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 21.63 tok/s | 8GB |
| Qwen/Qwen2-0.5B | Q8 | 21.62 tok/s | 5GB |
| zai-org/GLM-4.5-Air | Q8 | 21.60 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 21.60 tok/s | 8GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 21.59 tok/s | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 21.53 tok/s | 9GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 21.53 tok/s | 6GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 21.44 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 21.36 tok/s | 7GB |
| Qwen/Qwen3-8B-Base | Q8 | 21.33 tok/s | 8GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 21.32 tok/s | 8GB |
| dicta-il/dictalm2.0-instruct | Q8 | 21.17 tok/s | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 21.09 tok/s | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 21.04 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 20.98 tok/s | 8GB |
| Qwen/Qwen2.5-14B | Q4 | 20.91 tok/s | 7GB |
| skt/kogpt2-base-v2 | Q8 | 20.90 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 20.84 tok/s | 7GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 20.79 tok/s | 8GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 20.75 tok/s | 10GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 20.72 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 20.68 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 20.57 tok/s | 8GB |
| parler-tts/parler-tts-large-v1 | Q8 | 20.54 tok/s | 7GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 20.49 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 20.42 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 20.40 tok/s | 8GB |
| microsoft/DialoGPT-medium | Q8 | 20.35 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 20.34 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 20.29 tok/s | 7GB |
| Qwen/Qwen3-0.6B | Q8 | 20.18 tok/s | 6GB |
| EleutherAI/gpt-neo-125m | Q8 | 20.08 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 20.08 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 20.07 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 20.05 tok/s | 8GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 20.05 tok/s | 8GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 20.04 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 19.95 tok/s | 8GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 19.93 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 19.88 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 19.83 tok/s | 7GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 19.82 tok/s | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 19.81 tok/s | 8GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 19.74 tok/s | 10GB |
| meta-llama/Llama-3.1-8B | Q8 | 19.70 tok/s | 8GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 19.68 tok/s | 8GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 19.67 tok/s | 9GB |
| ibm-granite/granite-docling-258M | Q8 | 19.63 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 19.63 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 19.61 tok/s | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 19.57 tok/s | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 19.55 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 19.46 tok/s | 9GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 19.44 tok/s | 9GB |
| meta-llama/Llama-2-7b-hf | Q8 | 19.43 tok/s | 7GB |
| numind/NuExtract-1.5 | Q8 | 19.42 tok/s | 7GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 19.24 tok/s | 7GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 19.19 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 19.14 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 19.14 tok/s | 8GB |
| google/gemma-2-9b-it | Q8 | 19.11 tok/s | 11GB |
| petals-team/StableBeluga2 | Q8 | 19.02 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 19.01 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 18.93 tok/s | 8GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 18.80 tok/s | 8GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 18.67 tok/s | 8GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 18.47 tok/s | 8GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 18.45 tok/s | 15GB |
| Qwen/Qwen3-8B | Q8 | 18.40 tok/s | 8GB |
| openai/gpt-oss-20b | Q4 | 18.35 tok/s | 10GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 18.25 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 17.55 tok/s | 15GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 17.43 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 17.39 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q4 | 17.31 tok/s | 15GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 17.23 tok/s | 13GB |
| Qwen/Qwen2.5-32B | Q4 | 17.01 tok/s | 16GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 16.91 tok/s | 19GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 16.90 tok/s | 15GB |
| google/gemma-2-27b-it | Q4 | 16.90 tok/s | 16GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 16.86 tok/s | 13GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 16.77 tok/s | 15GB |
| Qwen/Qwen3-32B | Q4 | 16.72 tok/s | 16GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 16.71 tok/s | 14GB |
| Qwen/Qwen3-14B | Q8 | 16.61 tok/s | 14GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 16.50 tok/s | 19GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 16.47 tok/s | 16GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 16.46 tok/s | 16GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 16.41 tok/s | 20GB |
| Qwen/QwQ-32B-Preview | Q4 | 16.29 tok/s | 19GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 16.26 tok/s | 15GB |
| codellama/CodeLlama-34b-hf | Q4 | 16.19 tok/s | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 16.03 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 15.79 tok/s | 16GB |
| Qwen/Qwen2.5-14B | Q8 | 15.67 tok/s | 14GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 15.54 tok/s | 17GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 15.49 tok/s | 13GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 15.41 tok/s | 16GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 15.40 tok/s | 17GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 15.37 tok/s | 16GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 15.02 tok/s | 19GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 14.84 tok/s | 20GB |
| Qwen/Qwen3-14B-Base | Q8 | 14.60 tok/s | 14GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 14.27 tok/s | 20GB |
| openai/gpt-oss-20b | Q8 | 14.16 tok/s | 20GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 13.06 tok/s | 20GB |
Note: All performance figures above are calculated estimates, not measured benchmarks. Real results may vary.
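The VRAM figures in the table track parameter count and quantization width fairly closely. A rough rule of thumb: weights take about params × bits/8 bytes, plus some overhead for the KV cache and activations. The sketch below is a hypothetical heuristic (the 20% overhead factor is an assumption, not the methodology behind the table).

```python
def estimate_vram_gb(params_billions: float, quant_bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate for a quantized LLM.

    weights ~ params * bits/8 GB; `overhead` (assumed 20%) covers
    KV cache and activations. A heuristic sketch, not the site's formula.
    """
    weight_gb = params_billions * quant_bits / 8
    return round(weight_gb * overhead, 1)

# An 8B model: ~4.8GB at Q4, ~9.6GB at Q8 -- in the same range as the table's 4-9GB rows.
print(estimate_vram_gb(8, 4))
print(estimate_vram_gb(8, 8))
```

The same formula explains why 30B-class models at Q4 land around 15-20GB, right at the edge of the 7900 XTX's 24GB.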
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | Not supported | — | 79GB (have 24GB) |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | Not supported | — | 40GB (have 24GB) |
| 01-ai/Yi-1.5-34B-Chat | Q8 | Not supported | — | 39GB (have 24GB) |
| 01-ai/Yi-1.5-34B-Chat | Q4 | Fits comfortably | 16.41 tok/s (estimated) | 20GB (have 24GB) |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | Fits comfortably | 19.67 tok/s (estimated) | 9GB (have 24GB) |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | Fits comfortably | 29.35 tok/s (estimated) | 5GB (have 24GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | Not supported | — | 79GB (have 24GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | Not supported | — | 40GB (have 24GB) |
| microsoft/Phi-3-medium-128k-instruct | Q8 | Fits comfortably | 15.37 tok/s (estimated) | 16GB (have 24GB) |
| microsoft/Phi-3-medium-128k-instruct | Q4 | Fits comfortably | 24.15 tok/s (estimated) | 8GB (have 24GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 24.73 tok/s (estimated) | 5GB (have 24GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 40.80 tok/s (estimated) | 3GB (have 24GB) |
| google/gemma-2-9b-it | Q8 | Fits comfortably | 19.11 tok/s (estimated) | 11GB (have 24GB) |
| google/gemma-2-9b-it | Q4 | Fits comfortably | 24.50 tok/s (estimated) | 6GB (have 24GB) |
| google/gemma-2-27b-it | Q8 | Not supported | — | 31GB (have 24GB) |
| google/gemma-2-27b-it | Q4 | Fits comfortably | 16.90 tok/s (estimated) | 16GB (have 24GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | Not supported | — | 25GB (have 24GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | Fits comfortably | 17.23 tok/s (estimated) | 13GB (have 24GB) |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | Not supported | — | 138GB (have 24GB) |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | Not supported | — | 69GB (have 24GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | Not supported | — | 158GB (have 24GB) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | Not supported | — | 79GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 28.13 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 39.38 tok/s (estimated) | 2GB (have 24GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 21.53 tok/s (estimated) | 9GB (have 24GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 30.29 tok/s (estimated) | 5GB (have 24GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 79GB (have 24GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 40GB (have 24GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 79GB (have 24GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 40GB (have 24GB) |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | Not supported | — | 38GB (have 24GB) |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | Fits comfortably | 16.50 tok/s (estimated) | 19GB (have 24GB) |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | Not supported | — | 264GB (have 24GB) |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | Not supported | — | 132GB (have 24GB) |
| deepseek-ai/DeepSeek-V2.5 | Q8 | Not supported | — | 264GB (have 24GB) |
| deepseek-ai/DeepSeek-V2.5 | Q4 | Not supported | — | 132GB (have 24GB) |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | Not supported | — | 82GB (have 24GB) |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | Not supported | — | 41GB (have 24GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 19.46 tok/s (estimated) | 9GB (have 24GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 31.42 tok/s (estimated) | 5GB (have 24GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 15.54 tok/s (estimated) | 17GB (have 24GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 23.11 tok/s (estimated) | 9GB (have 24GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 37GB (have 24GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 16.91 tok/s (estimated) | 19GB (have 24GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | Not supported | — | 37GB (have 24GB) |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | Fits comfortably | 15.02 tok/s (estimated) | 19GB (have 24GB) |
| Qwen/QwQ-32B-Preview | Q8 | Not supported | — | 37GB (have 24GB) |
| Qwen/QwQ-32B-Preview | Q4 | Fits comfortably | 16.29 tok/s (estimated) | 19GB (have 24GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 82GB (have 24GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 41GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | Not supported | — | 80GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | Not supported | — | 40GB (have 24GB) |
| ai-forever/ruGPT-3.5-13B | Q8 | Fits comfortably | 15.49 tok/s (estimated) | 13GB (have 24GB) |
| ai-forever/ruGPT-3.5-13B | Q4 | Fits comfortably | 22.52 tok/s (estimated) | 7GB (have 24GB) |
| baichuan-inc/Baichuan-M2-32B | Q8 | Not supported | — | 32GB (have 24GB) |
| baichuan-inc/Baichuan-M2-32B | Q4 | Fits comfortably | 16.47 tok/s (estimated) | 16GB (have 24GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 22.65 tok/s (estimated) | 7GB (have 24GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 31.68 tok/s (estimated) | 4GB (have 24GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 19.81 tok/s (estimated) | 8GB (have 24GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 26.70 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 21.91 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 31.24 tok/s (estimated) | 4GB (have 24GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | Fits comfortably | 13.06 tok/s (estimated) | 20GB (have 24GB) |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | Fits comfortably | 20.75 tok/s (estimated) | 10GB (have 24GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 20.72 tok/s (estimated) | 7GB (have 24GB) |
| BSC-LT/salamandraTA-7b-instruct | Q4 | Fits comfortably | 32.19 tok/s (estimated) | 4GB (have 24GB) |
| dicta-il/dictalm2.0-instruct | Q8 | Fits comfortably | 21.17 tok/s (estimated) | 7GB (have 24GB) |
| dicta-il/dictalm2.0-instruct | Q4 | Fits comfortably | 29.25 tok/s (estimated) | 4GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | Not supported | — | 30GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | Fits comfortably | 18.45 tok/s (estimated) | 15GB (have 24GB) |
| GSAI-ML/LLaDA-8B-Base | Q8 | Fits comfortably | 20.05 tok/s (estimated) | 8GB (have 24GB) |
| GSAI-ML/LLaDA-8B-Base | Q4 | Fits comfortably | 28.79 tok/s (estimated) | 4GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | — | 30GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Fits comfortably | 18.25 tok/s (estimated) | 15GB (have 24GB) |
| Qwen/Qwen2-0.5B-Instruct | Q8 | Fits comfortably | 23.37 tok/s (estimated) | 5GB (have 24GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 36.31 tok/s (estimated) | 3GB (have 24GB) |
| deepseek-ai/DeepSeek-V3 | Q8 | Fits comfortably | 21.36 tok/s (estimated) | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-V3 | Q4 | Fits comfortably | 28.88 tok/s (estimated) | 4GB (have 24GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | Fits comfortably | 23.63 tok/s (estimated) | 5GB (have 24GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 32.79 tok/s (estimated) | 3GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | — | 30GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Fits comfortably | 16.03 tok/s (estimated) | 15GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | — | 30GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | Fits comfortably | 17.39 tok/s (estimated) | 15GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | Not supported | — | 30GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | Fits comfortably | 16.26 tok/s (estimated) | 15GB (have 24GB) |
| AI-MO/Kimina-Prover-72B | Q8 | Not supported | — | 72GB (have 24GB) |
| AI-MO/Kimina-Prover-72B | Q4 | Not supported | — | 36GB (have 24GB) |
| apple/OpenELM-1_1B-Instruct | Q8 | Fits comfortably | 40.11 tok/s (estimated) | 1GB (have 24GB) |
| apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 59.35 tok/s (estimated) | 1GB (have 24GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 20.05 tok/s (estimated) | 8GB (have 24GB) |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 27.65 tok/s (estimated) | 4GB (have 24GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | Fits comfortably | 19.44 tok/s (estimated) | 9GB (have 24GB) |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | Fits comfortably | 26.43 tok/s (estimated) | 5GB (have 24GB) |
| Qwen/Qwen2.5-3B | Q8 | Fits comfortably | 31.07 tok/s (estimated) | 3GB (have 24GB) |
| Qwen/Qwen2.5-3B | Q4 | Fits comfortably | 44.03 tok/s (estimated) | 2GB (have 24GB) |
| lmsys/vicuna-7b-v1.5 | Q8 | Fits comfortably | 20.29 tok/s (estimated) | 7GB (have 24GB) |
| lmsys/vicuna-7b-v1.5 | Q4 | Fits comfortably | 32.06 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Llama-2-13b-chat-hf | Q8 | Fits comfortably | 16.86 tok/s (estimated) | 13GB (have 24GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 21.44 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | — | 80GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Not supported | — | 40GB (have 24GB) |
| unsloth/gemma-3-1b-it | Q8 | Fits comfortably | 43.84 tok/s (estimated) | 1GB (have 24GB) |
| unsloth/gemma-3-1b-it | Q4 | Fits comfortably | 64.98 tok/s (estimated) | 1GB (have 24GB) |
| bigcode/starcoder2-3b | Q8 | Fits comfortably | 26.73 tok/s (estimated) | 3GB (have 24GB) |
| bigcode/starcoder2-3b | Q4 | Fits comfortably | 39.06 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | Not supported | — | 80GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | Not supported | — | 40GB (have 24GB) |
| ibm-granite/granite-docling-258M | Q8 | Fits comfortably | 19.63 tok/s (estimated) | 7GB (have 24GB) |
| ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 30.40 tok/s (estimated) | 4GB (have 24GB) |
| skt/kogpt2-base-v2 | Q8 | Fits comfortably | 20.90 tok/s (estimated) | 7GB (have 24GB) |
| skt/kogpt2-base-v2 | Q4 | Fits comfortably | 33.04 tok/s (estimated) | 4GB (have 24GB) |
| google/gemma-3-270m-it | Q8 | Fits comfortably | 19.14 tok/s (estimated) | 7GB (have 24GB) |
| google/gemma-3-270m-it | Q4 | Fits comfortably | 28.39 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | Fits comfortably | 24.00 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | Fits comfortably | 37.84 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen2.5-32B | Q8 | Not supported | — | 32GB (have 24GB) |
| Qwen/Qwen2.5-32B | Q4 | Fits comfortably | 17.01 tok/s (estimated) | 16GB (have 24GB) |
| parler-tts/parler-tts-large-v1 | Q8 | Fits comfortably | 20.54 tok/s (estimated) | 7GB (have 24GB) |
| parler-tts/parler-tts-large-v1 | Q4 | Fits comfortably | 29.90 tok/s (estimated) | 4GB (have 24GB) |
| EleutherAI/pythia-70m-deduped | Q8 | Fits comfortably | 22.92 tok/s (estimated) | 7GB (have 24GB) |
| EleutherAI/pythia-70m-deduped | Q4 | Fits comfortably | 27.50 tok/s (estimated) | 4GB (have 24GB) |
| microsoft/VibeVoice-1.5B | Q8 | Fits comfortably | 23.62 tok/s (estimated) | 5GB (have 24GB) |
| microsoft/VibeVoice-1.5B | Q4 | Fits comfortably | 37.29 tok/s (estimated) | 3GB (have 24GB) |
| ibm-granite/granite-3.3-2b-instruct | Q8 | Fits comfortably | 34.30 tok/s (estimated) | 2GB (have 24GB) |
| ibm-granite/granite-3.3-2b-instruct | Q4 | Fits comfortably | 43.83 tok/s (estimated) | 1GB (have 24GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | — | 72GB (have 24GB) |
| Qwen/Qwen2.5-72B-Instruct | Q4 | Not supported | — | 36GB (have 24GB) |
| liuhaotian/llava-v1.5-7b | Q8 | Fits comfortably | 22.12 tok/s (estimated) | 7GB (have 24GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 29.71 tok/s (estimated) | 4GB (have 24GB) |
| google/gemma-2b | Q8 | Fits comfortably | 35.92 tok/s (estimated) | 2GB (have 24GB) |
| google/gemma-2b | Q4 | Fits comfortably | 46.00 tok/s (estimated) | 1GB (have 24GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | Fits comfortably | 22.63 tok/s (estimated) | 7GB (have 24GB) |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | Fits comfortably | 29.75 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-235B-A22B | Q8 | Not supported | — | 235GB (have 24GB) |
| Qwen/Qwen3-235B-A22B | Q4 | Not supported | — | 118GB (have 24GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | Fits comfortably | 20.57 tok/s (estimated) | 8GB (have 24GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | Fits comfortably | 26.52 tok/s (estimated) | 4GB (have 24GB) |
| microsoft/Phi-4-mini-instruct | Q8 | Fits comfortably | 21.79 tok/s (estimated) | 7GB (have 24GB) |
| microsoft/Phi-4-mini-instruct | Q4 | Fits comfortably | 30.44 tok/s (estimated) | 4GB (have 24GB) |
| llamafactory/tiny-random-Llama-3 | Q8 | Fits comfortably | 22.52 tok/s (estimated) | 7GB (have 24GB) |
| llamafactory/tiny-random-Llama-3 | Q4 | Fits comfortably | 32.42 tok/s (estimated) | 4GB (have 24GB) |
| HuggingFaceH4/zephyr-7b-beta | Q8 | Fits comfortably | 23.05 tok/s (estimated) | 7GB (have 24GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 29.20 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | Fits comfortably | 27.10 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | Fits comfortably | 38.26 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | Not supported | — | 30GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | Fits comfortably | 17.55 tok/s (estimated) | 15GB (have 24GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 19.14 tok/s (estimated) | 8GB (have 24GB) |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | Fits comfortably | 26.27 tok/s (estimated) | 4GB (have 24GB) |
| unsloth/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 42.14 tok/s (estimated) | 1GB (have 24GB) |
| unsloth/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 61.58 tok/s (estimated) | 1GB (have 24GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 20.98 tok/s (estimated) | 8GB (have 24GB) |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 31.11 tok/s (estimated) | 4GB (have 24GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | Not supported | — | 90GB (have 24GB) |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | Not supported | — | 45GB (have 24GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | Fits comfortably | 21.09 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | Fits comfortably | 27.17 tok/s (estimated) | 4GB (have 24GB) |
| numind/NuExtract-1.5 | Q8 | Fits comfortably | 19.42 tok/s (estimated) | 7GB (have 24GB) |
| numind/NuExtract-1.5 | Q4 | Fits comfortably | 31.39 tok/s (estimated) | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | Fits comfortably | 22.50 tok/s (estimated) | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 30.77 tok/s (estimated) | 4GB (have 24GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 20.49 tok/s (estimated) | 7GB (have 24GB) |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 27.16 tok/s (estimated) | 4GB (have 24GB) |
| huggyllama/llama-7b | Q8 | Fits comfortably | 22.61 tok/s (estimated) | 7GB (have 24GB) |
| huggyllama/llama-7b | Q4 | Fits comfortably | 29.09 tok/s (estimated) | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | Fits comfortably | 22.26 tok/s (estimated) | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | Fits comfortably | 27.86 tok/s (estimated) | 4GB (have 24GB) |
| microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 22.18 tok/s (estimated) | 7GB (have 24GB) |
| microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 30.28 tok/s (estimated) | 4GB (have 24GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 21.82 tok/s (estimated) | 7GB (have 24GB) |
| sshleifer/tiny-gpt2 | Q4 | Fits comfortably | 32.41 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Llama-Guard-3-8B | Q8 | Fits comfortably | 18.67 tok/s (estimated) | 8GB (have 24GB) |
| meta-llama/Llama-Guard-3-8B | Q4 | Fits comfortably | 30.12 tok/s (estimated) | 4GB (have 24GB) |
| openai-community/gpt2-xl | Q8 | Fits comfortably | 22.17 tok/s (estimated) | 7GB (have 24GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 27.50 tok/s (estimated) | 4GB (have 24GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Fits comfortably | 16.71 tok/s (estimated) | 14GB (have 24GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 22.38 tok/s (estimated) | 7GB (have 24GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | Not supported | — | 70GB (have 24GB) |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | Not supported | — | 35GB (have 24GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | Fits comfortably | 25.82 tok/s (estimated) | 4GB (have 24GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | Fits comfortably | 37.79 tok/s (estimated) | 2GB (have 24GB) |
| ibm-research/PowerMoE-3b | Q8 | Fits comfortably | 28.07 tok/s (estimated) | 3GB (have 24GB) |
| ibm-research/PowerMoE-3b | Q4 | Fits comfortably | 42.29 tok/s (estimated) | 2GB (have 24GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | Fits comfortably | 25.78 tok/s (estimated) | 4GB (have 24GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | Fits comfortably | 39.65 tok/s (estimated) | 2GB (have 24GB) |
| unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 29.61 tok/s (estimated) | 3GB (have 24GB) |
| unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 40.06 tok/s (estimated) | 2GB (have 24GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 23.88 tok/s (estimated) | 4GB (have 24GB) |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 40.26 tok/s (estimated) | 2GB (have 24GB) |
| meta-llama/Llama-3.2-3B | Q8 | Fits comfortably | 29.75 tok/s (estimated) | 3GB (have 24GB) |
| meta-llama/Llama-3.2-3B | Q4 | Fits comfortably | 42.31 tok/s (estimated) | 2GB (have 24GB) |
| EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 20.08 tok/s (estimated) | 7GB (have 24GB) |
| EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 30.08 tok/s (estimated) | 4GB (have 24GB) |
| codellama/CodeLlama-34b-hf | Q8 | Not supported | — | 34GB (have 24GB) |
| codellama/CodeLlama-34b-hf | Q4 | Fits comfortably | 16.19 tok/s (estimated) | 17GB (have 24GB) |
| meta-llama/Llama-Guard-3-1B | Q8 | Fits comfortably | 43.85 tok/s (estimated) | 1GB (have 24GB) |
| meta-llama/Llama-Guard-3-1B | Q4 | Fits comfortably | 66.33 tok/s (estimated) | 1GB (have 24GB) |
| Qwen/Qwen2-1.5B-Instruct | Q8 | Fits comfortably | 22.56 tok/s (estimated) | 5GB (have 24GB) |
| Qwen/Qwen2-1.5B-Instruct | Q4 | Fits comfortably | 31.51 tok/s (estimated) | 3GB (have 24GB) |
| google/gemma-2-2b-it | Q8 | Fits comfortably | 30.74 tok/s (estimated) | 2GB (have 24GB) |
| google/gemma-2-2b-it | Q4 | Fits comfortably | 49.24 tok/s (estimated) | 1GB (have 24GB) |
| Qwen/Qwen2.5-14B | Q8 | Fits comfortably | 15.67 tok/s (estimated) | 14GB (have 24GB) |
| Qwen/Qwen2.5-14B | Q4 | Fits comfortably | 20.91 tok/s (estimated) | 7GB (have 24GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | Not supported | — | 32GB (have 24GB) |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | Fits comfortably | 16.46 tok/s (estimated) | 16GB (have 24GB) |
| microsoft/Phi-3.5-mini-instruct | Q8 | Fits comfortably | 22.32 tok/s (estimated) | 7GB (have 24GB) |
| microsoft/Phi-3.5-mini-instruct | Q4 | Fits comfortably | 31.16 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Base | Q8 | Fits comfortably | 26.87 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 36.21 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 19.82 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen2-7B-Instruct | Q4 | Fits comfortably | 28.68 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Llama-2-7b-chat-hf | Q8 | Fits comfortably | 20.84 tok/s (estimated) | 7GB (have 24GB) |
| meta-llama/Llama-2-7b-chat-hf | Q4 | Fits comfortably | 30.01 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-14B-Base | Q8 | Fits comfortably | 14.60 tok/s (estimated) | 14GB (have 24GB) |
| Qwen/Qwen3-14B-Base | Q4 | Fits comfortably | 23.13 tok/s (estimated) | 7GB (have 24GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | Fits comfortably | 21.32 tok/s (estimated) | 8GB (have 24GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 27.58 tok/s (estimated) | 4GB (have 24GB) |
| microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 19.57 tok/s (estimated) | 7GB (have 24GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 29.87 tok/s (estimated) | 4GB (have 24GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | Fits comfortably | 19.19 tok/s (estimated) | 7GB (have 24GB) |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | Fits comfortably | 28.92 tok/s (estimated) | 4GB (have 24GB) |
| rinna/japanese-gpt-neox-small | Q8 | Fits comfortably | 22.22 tok/s (estimated) | 7GB (have 24GB) |
| rinna/japanese-gpt-neox-small | Q4 | Fits comfortably | 32.16 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | Fits comfortably | 25.42 tok/s (estimated) | 5GB (have 24GB) |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | Fits comfortably | 34.34 tok/s (estimated) | 3GB (have 24GB) |
| IlyaGusev/saiga_llama3_8b | Q8 | Fits comfortably | 21.60 tok/s (estimated) | 8GB (have 24GB) |
| IlyaGusev/saiga_llama3_8b | Q4 | Fits comfortably | 27.02 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-30B-A3B | Q8 | Not supported | — | 30GB (have 24GB) |
| Qwen/Qwen3-30B-A3B | Q4 | Fits comfortably | 17.31 tok/s (estimated) | 15GB (have 24GB) |
| deepseek-ai/DeepSeek-R1 | Q8 | Fits comfortably | 19.63 tok/s (estimated) | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-R1 | Q4 | Fits comfortably | 28.59 tok/s (estimated) | 4GB (have 24GB) |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 20.68 tok/s (estimated) | 7GB (have 24GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 28.28 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-8B-FP8 | Q8 | Fits comfortably | 20.79 tok/s (estimated) | 8GB (have 24GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 31.25 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | Not supported | — | 30GB (have 24GB) |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | Fits comfortably | 16.90 tok/s (estimated) | 15GB (have 24GB) |
| Qwen/Qwen3-Embedding-4B | Q8 | Fits comfortably | 25.77 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-Embedding-4B | Q4 | Fits comfortably | 35.73 tok/s (estimated) | 2GB (have 24GB) |
| microsoft/Phi-4-multimodal-instruct | Q8 | Fits comfortably | 20.04 tok/s (estimated) | 7GB (have 24GB) |
| microsoft/Phi-4-multimodal-instruct | Q4 | Fits comfortably | 32.34 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-8B-Base | Q8 | Fits comfortably | 21.33 tok/s (estimated) | 8GB (have 24GB) |
| Qwen/Qwen3-8B-Base | Q4 | Fits comfortably | 26.53 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 22.29 tok/s (estimated) | 6GB (have 24GB) |
| Qwen/Qwen3-0.6B-Base | Q4 | Fits comfortably | 32.24 tok/s (estimated) | 3GB (have 24GB) |
| openai-community/gpt2-medium | Q8 | Fits comfortably | 20.08 tok/s (estimated) | 7GB (have 24GB) |
| openai-community/gpt2-medium | Q4 | Fits comfortably | 28.91 tok/s (estimated) | 4GB (have 24GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 19.93 tok/s (estimated) | 7GB (have 24GB) |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 29.89 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen2.5-Math-1.5B | Q8 | Fits comfortably | 22.64 tok/s (estimated) | 5GB (have 24GB) |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 35.89 tok/s (estimated) | 3GB (have 24GB) |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 21.99 tok/s (estimated) | 7GB (have 24GB) |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 30.66 tok/s (estimated) | 4GB (have 24GB) |
| unsloth/gpt-oss-20b-BF16 | Q8 | Fits comfortably | 14.84 tok/s (estimated) | 20GB (have 24GB) |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 19.74 tok/s (estimated) | 10GB (have 24GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | — | 70GB (have 24GB) |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | — | 35GB (have 24GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 18.80 tok/s (estimated) | 8GB (have 24GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 30.73 tok/s (estimated) | 4GB (have 24GB) |
| zai-org/GLM-4.5-Air | Q8 | Fits comfortably | 21.60 tok/s (estimated) | 7GB (have 24GB) |
| zai-org/GLM-4.5-Air | Q4 | Fits comfortably | 32.11 tok/s (estimated) | 4GB (have 24GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | Fits comfortably | 22.48 tok/s (estimated) | 7GB (have 24GB) |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | Fits comfortably | 27.87 tok/s (estimated) | 4GB (have 24GB) |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 35.33 tok/s (estimated) | 2GB (have 24GB) |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 44.68 tok/s (estimated) | 1GB (have 24GB) |
| mistralai/Mistral-7B-v0.1 | Q8 | Fits comfortably | 19.24 tok/s (estimated) | 7GB (have 24GB) |
| mistralai/Mistral-7B-v0.1 | Q4 | Fits comfortably | 32.00 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen2.5-32B-Instruct | Q8 | Not supported | — | 32GB (have 24GB) |
| Qwen/Qwen2.5-32B-Instruct | Q4 | Fits comfortably | 15.41 tok/s (estimated) | 16GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | Fits comfortably | 19.83 tok/s (estimated) | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | Fits comfortably | 27.31 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 19.70 tok/s (estimated) | 8GB (have 24GB) |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 29.43 tok/s (estimated) | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 19.01 tok/s (estimated) | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 27.26 tok/s (estimated) | 4GB (have 24GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 20.42 tok/s (estimated) | 7GB (have 24GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 27.44 tok/s (estimated) | 4GB (have 24GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 28.06 tok/s (estimated) | 3GB (have 24GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 38.61 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 21.62 tok/s (estimated) | 5GB (have 24GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 31.44 tok/s (estimated) | 3GB (have 24GB) |
| MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 22.97 tok/s (estimated) | 7GB (have 24GB) |
| MiniMaxAI/MiniMax-M2 | Q4 | Fits comfortably | 31.59 tok/s (estimated) | 4GB (have 24GB) |
| microsoft/DialoGPT-medium | Q8 | Fits comfortably | 20.35 tok/s (estimated) | 7GB (have 24GB) |
| microsoft/DialoGPT-medium | Q4 | Fits comfortably | 28.86 tok/s (estimated) | 4GB (have 24GB) |
| zai-org/GLM-4.6-FP8 | Q8 | Fits comfortably | 19.55 tok/s (estimated) | 7GB (have 24GB) |
| zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 29.68 tok/s (estimated) | 4GB (have 24GB) |
| HuggingFaceTB/SmolLM2-135M | Q8 | Fits comfortably | 21.91 tok/s (estimated) | 7GB (have 24GB) |
| HuggingFaceTB/SmolLM2-135M | Q4 | Fits comfortably | 28.90 tok/s (estimated) | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 19.68 tok/s (estimated) | 8GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 30.32 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Llama-2-7b-hf | Q8 | Fits comfortably | 19.43 tok/s (estimated) | 7GB (have 24GB) |
| meta-llama/Llama-2-7b-hf | Q4 | Fits comfortably | 27.38 tok/s (estimated) | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 20.34 tok/s (estimated) | 7GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 30.63 tok/s (estimated) | 4GB (have 24GB) |
| microsoft/phi-2 | Q8 | Fits comfortably | 21.97 tok/s (estimated) | 7GB (have 24GB) |
| microsoft/phi-2 | Q4 | Fits comfortably | 29.15 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | Not supported | — | 70GB (have 24GB) |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | Not supported | — | 35GB (have 24GB) |
| Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 23.97 tok/s (estimated) | 5GB (have 24GB) |
| Qwen/Qwen2.5-0.5B | Q4 | Fits comfortably | 35.54 tok/s (estimated) | 3GB (have 24GB) |
| Qwen/Qwen3-14B | Q8 | Fits comfortably | 16.61 tok/s (estimated) | 14GB (have 24GB) |
| Qwen/Qwen3-14B | Q4 | Fits comfortably | 24.89 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 18.93 tok/s (estimated) | 8GB (have 24GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 28.68 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | Not supported | — | 70GB (have 24GB) |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | Not supported | — | 35GB (have 24GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | Fits comfortably | 19.95 tok/s (estimated) | 8GB (have 24GB) |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | Fits comfortably | 28.47 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 17.43 tok/s (estimated) | 14GB (have 24GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 21.04 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen2.5-1.5B | Q8 | Fits comfortably | 25.54 tok/s (estimated) | 5GB (have 24GB) |
| Qwen/Qwen2.5-1.5B | Q4 | Fits comfortably | 32.07 tok/s (estimated) | 3GB (have 24GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | Fits comfortably | 28.08 tok/s (estimated) | 4GB (have 24GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 36.16 tok/s (estimated) | 2GB (have 24GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | Fits comfortably | 14.27 tok/s (estimated) | 20GB (have 24GB) |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | Fits comfortably | 21.73 tok/s (estimated) | 10GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 22.59 tok/s (estimated) | 5GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 30.66 tok/s (estimated) | 3GB (have 24GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | Fits comfortably | 18.47 tok/s (estimated) | 8GB (have 24GB) |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 26.92 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-Reranker-0.6B | Q8 | Fits comfortably | 21.53 tok/s (estimated) | 6GB (have 24GB) |
| Qwen/Qwen3-Reranker-0.6B | Q4 | Fits comfortably | 29.85 tok/s (estimated) | 3GB (have 24GB) |
| rednote-hilab/dots.ocr | Q8 | Fits comfortably | 19.88 tok/s (estimated) | 7GB (have 24GB) |
| rednote-hilab/dots.ocr | Q4 | Fits comfortably | 29.27 tok/s (estimated) | 4GB (have 24GB) |
| google-t5/t5-3b | Q8 | Fits comfortably | 29.15 tok/s (estimated) | 3GB (have 24GB) |
| google-t5/t5-3b | Q4 | Fits comfortably | 41.62 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | — | 30GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits comfortably | 16.77 tok/s (estimated) | 15GB (have 24GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 23.50 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 38.94 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 23.06 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 32.98 tok/s (estimated) | 4GB (have 24GB) |
| openai-community/gpt2-large | Q8 | Fits comfortably | 22.92 tok/s (estimated) | 7GB (have 24GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 27.27 tok/s (estimated) | 4GB (have 24GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 20.07 tok/s (estimated) | 7GB (have 24GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 27.45 tok/s (estimated) | 4GB (have 24GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 39.41 tok/s (estimated) | 1GB (have 24GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 64.08 tok/s (estimated) | 1GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | Not supported | — | 80GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | Not supported | — | 40GB (have 24GB) |
| Qwen/Qwen3-32B | Q8 | Not supported | — | 32GB (have 24GB) |
| Qwen/Qwen3-32B | Q4 | Fits comfortably | 16.72 tok/s (estimated) | 16GB (have 24GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 23.38 tok/s (estimated) | 5GB (have 24GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 32.66 tok/s (estimated) | 3GB (have 24GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 19.61 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 27.79 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Meta-Llama-3-8B | Q8 | Fits comfortably | 20.40 tok/s (estimated) | 8GB (have 24GB) |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 28.43 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 44.42 tok/s (estimated) | 1GB (have 24GB) |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 64.05 tok/s (estimated) | 1GB (have 24GB) |
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 19.02 tok/s (estimated) | 7GB (have 24GB) |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 27.35 tok/s (estimated) | 4GB (have 24GB) |
| vikhyatk/moondream2 | Q8 | Fits comfortably | 22.38 tok/s (estimated) | 7GB (have 24GB) |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 28.64 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 30.23 tok/s (estimated) | 3GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 41.15 tok/s (estimated) | 2GB (have 24GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | Not supported | — | 70GB (have 24GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Not supported | — | 35GB (have 24GB) |
| distilbert/distilgpt2 | Q8 | Fits comfortably | 23.00 tok/s (estimated) | 7GB (have 24GB) |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 27.99 tok/s (estimated) | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | — | 32GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | Fits comfortably | 15.79 tok/s (estimated) | 16GB (have 24GB) |
| inference-net/Schematron-3B | Q8 | Fits comfortably | 27.31 tok/sEstimated | 3GB (have 24GB) |
| inference-net/Schematron-3B | Q4 | Fits comfortably | 37.57 tok/sEstimated | 2GB (have 24GB) |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 18.40 tok/sEstimated | 8GB (have 24GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 30.95 tok/sEstimated | 4GB (have 24GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | Fits comfortably | 22.04 tok/sEstimated | 7GB (have 24GB) |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 27.12 tok/sEstimated | 4GB (have 24GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | Fits comfortably | 29.72 tok/sEstimated | 3GB (have 24GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 41.35 tok/sEstimated | 2GB (have 24GB) |
| bigscience/bloomz-560m | Q8 | Fits comfortably | 23.12 tok/sEstimated | 7GB (have 24GB) |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 28.83 tok/sEstimated | 4GB (have 24GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 26.89 tok/sEstimated | 3GB (have 24GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 41.51 tok/sEstimated | 2GB (have 24GB) |
| openai/gpt-oss-120b | Q8 | Not supported | — | 120GB (have 24GB) |
| openai/gpt-oss-120b | Q4 | Not supported | — | 60GB (have 24GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 44.32 tok/sEstimated | 1GB (have 24GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 60.78 tok/sEstimated | 1GB (have 24GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 28.22 tok/sEstimated | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 38.27 tok/sEstimated | 2GB (have 24GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 22.12 tok/sEstimated | 7GB (have 24GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 32.38 tok/sEstimated | 4GB (have 24GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 39.27 tok/sEstimated | 1GB (have 24GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 55.82 tok/sEstimated | 1GB (have 24GB) |
| facebook/opt-125m | Q8 | Fits comfortably | 22.07 tok/sEstimated | 7GB (have 24GB) |
| facebook/opt-125m | Q4 | Fits comfortably | 27.45 tok/sEstimated | 4GB (have 24GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 22.43 tok/sEstimated | 5GB (have 24GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 34.70 tok/sEstimated | 3GB (have 24GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 24.31 tok/sEstimated | 6GB (have 24GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 29.19 tok/sEstimated | 3GB (have 24GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 44.59 tok/sEstimated | 1GB (have 24GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 56.49 tok/sEstimated | 1GB (have 24GB) |
| openai/gpt-oss-20b | Q8 | Fits comfortably | 14.16 tok/sEstimated | 20GB (have 24GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 18.35 tok/sEstimated | 10GB (have 24GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | — | 34GB (have 24GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Fits comfortably | 15.40 tok/sEstimated | 17GB (have 24GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 21.63 tok/sEstimated | 8GB (have 24GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 28.91 tok/sEstimated | 4GB (have 24GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 22.35 tok/sEstimated | 5GB (have 24GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 34.42 tok/sEstimated | 3GB (have 24GB) |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 20.18 tok/sEstimated | 6GB (have 24GB) |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 29.39 tok/sEstimated | 3GB (have 24GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 21.59 tok/sEstimated | 7GB (have 24GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 29.75 tok/sEstimated | 4GB (have 24GB) |
| openai-community/gpt2 | Q8 | Fits comfortably | 22.02 tok/sEstimated | 7GB (have 24GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 29.23 tok/sEstimated | 4GB (have 24GB) |
Note: Performance figures above are estimated, not measured; real-world results may vary.
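The VRAM column in the table follows a common rule of thumb: roughly 0.5 bytes per parameter at Q4 and 1 byte per parameter at Q8, ignoring KV cache and runtime overhead. A minimal sketch of that estimate (the helper name is ours, not from any benchmark tool):

```python
def estimate_vram_gb(params_billions: float, quant: str) -> float:
    """Rough weight-only VRAM estimate in GB.

    Q4 ~ 0.5 bytes/param, Q8 ~ 1 byte/param. Real usage also includes
    the KV cache and runtime overhead, which this sketch ignores.
    """
    bytes_per_param = {"Q4": 0.5, "Q8": 1.0}[quant]
    return params_billions * bytes_per_param

# An 8B model needs about 4 GB of weights at Q4 and 8 GB at Q8,
# matching the table rows for meta-llama/Meta-Llama-3-8B.
print(estimate_vram_gb(8, "Q4"))  # 4.0
print(estimate_vram_gb(8, "Q8"))  # 8.0
```

This is also why a 32B model at Q8 (~32 GB) is marked "Not supported" on a 24 GB card while its Q4 build (~16 GB) fits.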
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
On Qwen3-30B Q4, Vulkan decode holds ~117 tok/s once a 32K context fills, while ROCm drops to ~12 tok/s, making Vulkan the faster option for long prompts.
Source: Reddit – /r/LocalLLaMA (mrdpho0)
The same benchmarks show Vulkan prompt prefill at ~486 tok/s on Windows drivers versus ~432 tok/s on ROCm, highlighting the driver advantage.
Source: Reddit – /r/LocalLLaMA (mrdpho0)
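Prefill and decode rates govern different phases of a request, so total latency for a long prompt can be sketched as prompt ingestion time plus generation time. A quick illustration using the community-reported rates above (purely illustrative arithmetic, not a benchmark):

```python
def request_latency_s(prompt_tokens: int, output_tokens: int,
                      prefill_tps: float, decode_tps: float) -> float:
    """Total time = prompt ingestion (prefill) + token generation (decode)."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# 32K-token prompt, 256 generated tokens, using the cited long-context rates:
vulkan = request_latency_s(32_000, 256, prefill_tps=486, decode_tps=117)
rocm = request_latency_s(32_000, 256, prefill_tps=432, decode_tps=12)
print(f"Vulkan ~{vulkan:.0f}s, ROCm ~{rocm:.0f}s")  # Vulkan ~68s, ROCm ~95s
```

The gap comes mostly from ROCm's collapsed decode rate at long context; the prefill difference alone is only a few seconds.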
Yes. Builders highlight Ryzen AI 395 mini-PCs paired with RX 7900-class GPUs that can load 70B models at Q8 (something no 24 GB NVIDIA card can do), though throughput is slower.
Source: Reddit – /r/LocalLLaMA (mqupq0a)
Not yet. FlashAttention under Vulkan falls back to the CPU on the 7900 XTX, so enabling it does not improve throughput the way it does on NVIDIA cards.
Source: Reddit – /r/LocalLLaMA (mrdpho0)
RX 7900 XTX offers 24 GB GDDR6 and a 355 W TBP. On 3 Nov 2025 Amazon listed it at $899 in stock, Newegg at $949 in stock, and Best Buy at $999 out of stock.
Explore how RTX 4060 Ti 16GB stacks up for local inference workloads.
Explore how RX 6800 XT stacks up for local inference workloads.
Explore how RTX 3080 stacks up for local inference workloads.