Quick Answer: The RX 7900 XTX offers 24GB of VRAM and launched at a $999 MSRP. It delivers approximately 192 tokens/sec (estimated) on LiquidAI/LFM2-1.2B and typically draws 355W under load.
The RX 7900 XTX gives AMD builders a 24GB option with competitive throughput for 7B–13B LLMs and for diffusion workloads. Use a ROCm-compatible stack such as llama.cpp or vLLM's ROCm build; a minimal smoke test follows.
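For example, here is a minimal vLLM sketch for verifying that the stack works on this card. It assumes a ROCm-enabled vLLM install; the model choice and sampling settings are illustrative, not the configuration used to generate the table below.

```python
# Minimal smoke test for a ROCm build of vLLM on a 24GB card.
# Assumes vLLM was installed with ROCm support and the model fits in VRAM.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # ~16GB of weights at FP16
sampling = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize ROCm in two sentences."], sampling)
print(outputs[0].outputs[0].text)
```

If this generates text without an out-of-memory error, larger quantized models from the table are worth trying next.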
All throughput figures below are auto-generated estimates, not measured results.

| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| LiquidAI/LFM2-1.2B | Q4 | 191.79 | 1GB |
| google-bert/bert-base-uncased | Q4 | 191.30 | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 189.80 | 2GB |
| inference-net/Schematron-3B | Q4 | 189.02 | 2GB |
| bigcode/starcoder2-3b | Q4 | 188.35 | 2GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 187.94 | 2GB |
| tencent/HunyuanOCR | Q4 | 187.85 | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 185.53 | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 185.19 | 1GB |
| google/gemma-2-2b-it | Q4 | 184.48 | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 183.47 | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 182.40 | 2GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 180.51 | 1GB |
| google/embeddinggemma-300m | Q4 | 179.00 | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 176.80 | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 176.07 | 2GB |
| facebook/sam3 | Q4 | 175.61 | 1GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 174.55 | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 174.49 | 2GB |
| google/gemma-2b | Q4 | 172.48 | 1GB |
| google/gemma-3-1b-it | Q4 | 172.26 | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 171.34 | 1GB |
| nari-labs/Dia2-2B | Q4 | 171.11 | 2GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 169.64 | 2GB |
| google-t5/t5-3b | Q4 | 169.24 | 2GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 165.20 | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 164.38 | 1GB |
| meta-llama/Llama-3.2-3B | Q4 | 164.14 | 2GB |
| meta-llama/Llama-3.2-1B | Q4 | 162.81 | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 162.47 | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 160.01 | 2GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 159.73 | 3GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 159.62 | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 159.49 | 4GB |
| meta-llama/Llama-3.1-8B | Q4 | 159.20 | 4GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 158.98 | 3GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 158.96 | 4GB |
| huggyllama/llama-7b | Q4 | 158.75 | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 158.60 | 2GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 158.58 | 2GB |
| rednote-hilab/dots.ocr | Q4 | 158.57 | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 158.30 | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 157.79 | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 157.66 | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 157.33 | 4GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 157.07 | 4GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 157.01 | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 156.81 | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 156.77 | 4GB |
| numind/NuExtract-1.5 | Q4 | 156.34 | 4GB |
| Qwen/Qwen3-8B | Q4 | 156.18 | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 155.85 | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 155.33 | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 155.26 | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 155.19 | 4GB |
| openai-community/gpt2-xl | Q4 | 155.17 | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 155.16 | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 154.54 | 4GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 154.45 | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 154.12 | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 153.53 | 4GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 153.34 | 4GB |
| zai-org/GLM-4.6-FP8 | Q4 | 153.26 | 4GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 153.14 | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 152.76 | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 152.72 | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 152.36 | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 152.30 | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 151.81 | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 151.73 | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 151.72 | 2GB |
| black-forest-labs/FLUX.1-dev | Q4 | 151.61 | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 151.60 | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 151.57 | 3GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 151.53 | 4GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 151.27 | 2GB |
| openai-community/gpt2 | Q4 | 151.17 | 4GB |
| Qwen/Qwen2.5-1.5B | Q4 | 151.01 | 3GB |
| black-forest-labs/FLUX.2-dev | Q4 | 150.82 | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 150.77 | 3GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 150.61 | 2GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 150.53 | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 150.26 | 4GB |
| Qwen/Qwen3-4B | Q4 | 150.01 | 2GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 149.88 | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 149.85 | 3GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 149.64 | 4GB |
| openai-community/gpt2-large | Q4 | 148.60 | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 148.58 | 4GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 148.42 | 4GB |
| microsoft/DialoGPT-medium | Q4 | 148.27 | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 147.68 | 4GB |
| skt/kogpt2-base-v2 | Q4 | 147.67 | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 147.42 | 4GB |
| tencent/HunyuanVideo-1.5 | Q4 | 147.24 | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 147.13 | 4GB |
| bigscience/bloomz-560m | Q4 | 147.04 | 4GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 146.76 | 4GB |
| Qwen/Qwen3-4B-Base | Q4 | 146.65 | 2GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 146.49 | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 146.14 | 3GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 146.10 | 4GB |
| microsoft/phi-4 | Q4 | 145.33 | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 145.23 | 2GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 145.11 | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 144.91 | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 144.36 | 4GB |
| distilbert/distilgpt2 | Q4 | 144.34 | 4GB |
| microsoft/VibeVoice-1.5B | Q4 | 144.31 | 3GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 143.50 | 3GB |
| google/gemma-3-270m-it | Q4 | 143.27 | 4GB |
| EleutherAI/pythia-70m-deduped | Q4 | 142.80 | 4GB |
| microsoft/phi-2 | Q4 | 142.79 | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 142.41 | 4GB |
| microsoft/DialoGPT-small | Q4 | 142.26 | 4GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 142.19 | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 142.19 | 3GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 141.93 | 3GB |
| parler-tts/parler-tts-large-v1 | Q4 | 141.75 | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 141.35 | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 141.20 | 4GB |
| meta-llama/Llama-2-7b-hf | Q4 | 140.70 | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 140.00 | 2GB |
| petals-team/StableBeluga2 | Q4 | 139.59 | 4GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 139.08 | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 138.74 | 2GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 138.67 | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 138.46 | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 138.22 | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 138.12 | 4GB |
| facebook/opt-125m | Q4 | 137.53 | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 137.43 | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 136.94 | 3GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 136.82 | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 136.52 | 4GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 136.39 | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 135.97 | 3GB |
| Qwen/Qwen2-0.5B | Q4 | 135.96 | 3GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 135.86 | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 135.84 | 4GB |
| Qwen/Qwen3-0.6B | Q4 | 135.31 | 3GB |
| liuhaotian/llava-v1.5-7b | Q4 | 135.17 | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 135.13 | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 134.91 | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 134.49 | 2GB |
| openai-community/gpt2-medium | Q4 | 134.18 | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 134.15 | 4GB |
| vikhyatk/moondream2 | Q4 | 133.52 | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 133.41 | 3GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 133.23 | 4GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 132.69 | 1GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 132.53 | 2GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 131.85 | 3GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 131.72 | 4GB |
| bigcode/starcoder2-3b | Q8 | 129.43 | 3GB |
| Qwen/Qwen2.5-3B | Q8 | 129.29 | 3GB |
| google/gemma-2-2b-it | Q8 | 128.53 | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 127.79 | 3GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 126.93 | 1GB |
| google/embeddinggemma-300m | Q8 | 126.23 | 1GB |
| nari-labs/Dia2-2B | Q8 | 125.84 | 3GB |
| ibm-research/PowerMoE-3b | Q8 | 125.32 | 3GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 124.00 | 1GB |
| inference-net/Schematron-3B | Q8 | 123.51 | 3GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 122.77 | 2GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 121.22 | 1GB |
| facebook/sam3 | Q8 | 121.20 | 1GB |
| google/gemma-2b | Q8 | 120.18 | 2GB |
| meta-llama/Llama-3.2-1B | Q8 | 119.97 | 1GB |
| EssentialAI/rnj-1 | Q4 | 119.84 | 5GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 119.69 | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 119.44 | 7GB |
| allenai/OLMo-2-0425-1B | Q8 | 118.68 | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 117.42 | 3GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 117.34 | 1GB |
| google-bert/bert-base-uncased | Q8 | 117.14 | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 117.01 | 3GB |
| deepseek-ai/DeepSeek-OCR | Q8 | 116.70 | 4GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 116.24 | 3GB |
| google-t5/t5-3b | Q8 | 115.88 | 3GB |
| tencent/HunyuanOCR | Q8 | 115.88 | 2GB |
| LiquidAI/LFM2-1.2B | Q8 | 115.52 | 2GB |
| meta-llama/Llama-3.2-3B | Q8 | 114.05 | 3GB |
| WeiboAI/VibeThinker-1.5B | Q8 | 112.86 | 2GB |
| google/gemma-2-9b-it | Q4 | 112.79 | 5GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 112.32 | 5GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 111.61 | 7GB |
| unsloth/gemma-3-1b-it | Q8 | 111.56 | 1GB |
| liuhaotian/llava-v1.5-7b | Q8 | 111.47 | 7GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 111.40 | 9GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 111.05 | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 111.01 | 4GB |
| openai-community/gpt2-medium | Q8 | 110.97 | 7GB |
| black-forest-labs/FLUX.1-dev | Q8 | 110.97 | 8GB |
| tencent/HunyuanVideo-1.5 | Q8 | 110.76 | 8GB |
| Qwen/Qwen2.5-14B | Q4 | 110.63 | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 110.62 | 9GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 110.36 | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 110.30 | 7GB |
| google/gemma-3-1b-it | Q8 | 110.01 | 1GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 109.96 | 9GB |
| EleutherAI/pythia-70m-deduped | Q8 | 109.94 | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 109.84 | 7GB |
| numind/NuExtract-1.5 | Q8 | 109.84 | 7GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 109.78 | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 109.75 | 5GB |
| bigscience/bloomz-560m | Q8 | 109.63 | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 109.38 | 9GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q4 | 109.34 | 8GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 109.15 | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 109.12 | 9GB |
| openai-community/gpt2-xl | Q8 | 108.48 | 7GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 108.41 | 4GB |
| vikhyatk/moondream2 | Q8 | 108.30 | 7GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 107.90 | 7GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | 107.87 | 8GB |
| Tongyi-MAI/Z-Image-Turbo | Q8 | 107.80 | 8GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 107.80 | 6GB |
| microsoft/phi-2 | Q8 | 107.74 | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 107.21 | 5GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 107.18 | 5GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 107.16 | 7GB |
| rednote-hilab/dots.ocr | Q8 | 107.10 | 7GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 106.74 | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 106.72 | 7GB |
| meta-llama/Llama-3.1-8B | Q8 | 106.42 | 9GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 106.34 | 9GB |
| Qwen/Qwen2.5-1.5B | Q8 | 106.34 | 5GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 105.92 | 5GB |
| microsoft/VibeVoice-1.5B | Q8 | 105.89 | 5GB |
| Qwen/Qwen3-14B-Base | Q4 | 105.45 | 7GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 105.42 | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 105.34 | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 105.32 | 9GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 105.30 | 9GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 105.18 | 7GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 105.17 | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 105.15 | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 105.08 | 7GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 104.99 | 9GB |
| google/gemma-3-270m-it | Q8 | 104.96 | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 104.68 | 7GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 104.38 | 7GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 104.07 | 7GB |
| Qwen/Qwen2.5-0.5B | Q8 | 103.86 | 5GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 103.71 | 6GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 103.44 | 7GB |
| Qwen/Qwen3-8B | Q8 | 103.28 | 9GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 103.15 | 7GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 103.07 | 4GB |
| Qwen/Qwen2-0.5B | Q8 | 103.06 | 5GB |
| EleutherAI/gpt-neo-125m | Q8 | 103.02 | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 103.00 | 7GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 102.98 | 9GB |
| microsoft/phi-4 | Q8 | 102.80 | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 102.46 | 7GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 102.39 | 7GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 101.65 | 8GB |
| Qwen/Qwen3-8B-Base | Q8 | 101.47 | 9GB |
| petals-team/StableBeluga2 | Q8 | 101.43 | 7GB |
| distilbert/distilgpt2 | Q8 | 101.22 | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 101.13 | 7GB |
| Qwen/Qwen3-1.7B | Q8 | 101.06 | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 101.01 | 9GB |
| Qwen/Qwen3-14B | Q4 | 100.94 | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 100.82 | 9GB |
| openai-community/gpt2-large | Q8 | 100.48 | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 100.44 | 7GB |
| ibm-granite/granite-docling-258M | Q8 | 100.23 | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 99.97 | 4GB |
| Qwen/Qwen3-4B | Q8 | 99.96 | 4GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 99.84 | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 99.38 | 4GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 99.31 | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 99.22 | 7GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 99.06 | 7GB |
| black-forest-labs/FLUX.2-dev | Q8 | 98.90 | 8GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 98.83 | 3GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 98.49 | 9GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 98.49 | 7GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 98.41 | 7GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 98.21 | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 97.81 | 7GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 97.66 | 5GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 97.47 | 4GB |
| dicta-il/dictalm2.0-instruct | Q8 | 97.35 | 7GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 97.28 | 6GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 97.28 | 8GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 97.11 | 9GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 97.05 | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 96.72 | 7GB |
| Qwen/Qwen3-4B-Base | Q8 | 96.69 | 4GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 96.57 | 7GB |
| Qwen/Qwen3-0.6B | Q8 | 96.52 | 6GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 96.47 | 7GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 96.43 | 9GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 96.41 | 7GB |
| zai-org/GLM-4.6-FP8 | Q8 | 96.40 | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 96.27 | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 96.19 | 5GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 95.73 | 5GB |
| allenai/Olmo-3-7B-Think | Q8 | 95.51 | 8GB |
| openai-community/gpt2 | Q8 | 95.39 | 7GB |
| huggyllama/llama-7b | Q8 | 95.21 | 7GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 94.85 | 5GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 94.53 | 7GB |
| microsoft/DialoGPT-small | Q8 | 94.39 | 7GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 94.34 | 7GB |
| microsoft/DialoGPT-medium | Q8 | 94.19 | 7GB |
| facebook/opt-125m | Q8 | 94.09 | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 93.78 | 9GB |
| skt/kogpt2-base-v2 | Q8 | 93.77 | 7GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 93.21 | 4GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 93.19 | 7GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 92.98 | 7GB |
| microsoft/Phi-4-mini-instruct | Q8 | 92.86 | 7GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 92.71 | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 92.59 | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 91.67 | 9GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 86.76 | 10GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 86.45 | 15GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 83.34 | 15GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 83.30 | 9GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q8 | 82.85 | 16GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 82.26 | 14GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 82.18 | 15GB |
| Qwen/Qwen3-14B-Base | Q8 | 81.52 | 14GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 81.03 | 13GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 80.96 | 15GB |
| EssentialAI/rnj-1 | Q8 | 80.19 | 10GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 80.08 | 11GB |
| openai/gpt-oss-20b | Q4 | 79.69 | 10GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 79.54 | 14GB |
| openai/gpt-oss-safeguard-20b | Q4 | 78.93 | 11GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 78.80 | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 78.63 | 10GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 76.73 | 10GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 76.44 | 15GB |
| Qwen/Qwen2.5-14B | Q8 | 76.26 | 14GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 75.24 | 15GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 75.11 | 13GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 74.98 | 10GB |
| Qwen/Qwen3-14B | Q8 | 74.34 | 14GB |
| google/gemma-2-27b-it | Q4 | 73.57 | 14GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 73.15 | 15GB |
| allenai/OLMo-2-0425-1B | FP16 | 72.87 | 2GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 72.59 | 9GB |
| Qwen/Qwen2.5-3B | FP16 | 72.36 | 6GB |
| LiquidAI/LFM2-1.2B | FP16 | 72.33 | 4GB |
| unsloth/Llama-3.2-3B-Instruct | FP16 | 72.20 | 6GB |
| google-t5/t5-3b | FP16 | 72.11 | 6GB |
| nari-labs/Dia2-2B | FP16 | 72.06 | 5GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | 72.05 | 4GB |
| Qwen/Qwen3-30B-A3B | Q4 | 72.03 | 15GB |
| deepseek-ai/DeepSeek-OCR | FP16 | 71.93 | 7GB |
| tencent/HunyuanOCR | FP16 | 71.60 | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 71.12 | 6GB |
| facebook/sam3 | FP16 | 70.89 | 2GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 70.87 | 15GB |
| unsloth/Llama-3.2-1B-Instruct | FP16 | 70.28 | 2GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 69.57 | 14GB |
| google/gemma-3-1b-it | FP16 | 69.40 | 2GB |
| meta-llama/Llama-3.2-3B | FP16 | 68.96 | 6GB |
| google/gemma-2-9b-it | Q8 | 68.84 | 10GB |
| google/embeddinggemma-300m | FP16 | 68.67 | 1GB |
| ibm-research/PowerMoE-3b | FP16 | 68.29 | 6GB |
| apple/OpenELM-1_1B-Instruct | FP16 | 68.19 | 2GB |
| meta-llama/Llama-Guard-3-1B | FP16 | 68.02 | 2GB |
| google-bert/bert-base-uncased | FP16 | 67.27 | 1GB |
| google/gemma-2b | FP16 | 64.92 | 4GB |
| inference-net/Schematron-3B | FP16 | 64.86 | 6GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | 64.35 | 6GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | 63.88 | 6GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | 63.14 | 6GB |
| unsloth/gemma-3-1b-it | FP16 | 62.26 | 2GB |
| bigcode/starcoder2-3b | FP16 | 62.15 | 6GB |
| google/gemma-2-2b-it | FP16 | 62.05 | 4GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | 61.88 | 2GB |
| meta-llama/Llama-3.2-1B | FP16 | 61.32 | 2GB |
| openai/gpt-oss-20b | Q8 | 60.79 | 20GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | 60.63 | 2GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | 60.59 | 17GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 60.57 | 20GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | 60.47 | 9GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 60.40 | 20GB |
| hmellor/tiny-random-LlamaForCausalLM | FP16 | 60.39 | 15GB |
| microsoft/DialoGPT-small | FP16 | 60.39 | 15GB |
| WeiboAI/VibeThinker-1.5B | FP16 | 60.38 | 4GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | 60.35 | 11GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 60.30 | 17GB |
| meta-llama/Meta-Llama-3-8B | FP16 | 60.30 | 17GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | 60.11 | 9GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | 60.10 | 11GB |
| Qwen/Qwen3-1.7B | FP16 | 60.03 | 15GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 59.96 | 31GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | 59.87 | 9GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | 59.81 | 16GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | 59.77 | 9GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 59.76 | 7GB |
| openai-community/gpt2-xl | FP16 | 59.74 | 15GB |
| Qwen/Qwen2.5-0.5B | FP16 | 59.67 | 11GB |
| Qwen/Qwen3-Embedding-8B | FP16 | 59.60 | 17GB |
| Qwen/Qwen2.5-1.5B | FP16 | 59.49 | 11GB |
| microsoft/phi-4 | FP16 | 59.19 | 15GB |
| Qwen/Qwen3-8B | FP16 | 59.12 | 17GB |
| tencent/HunyuanVideo-1.5 | FP16 | 58.98 | 16GB |
| microsoft/DialoGPT-medium | FP16 | 58.94 | 15GB |
| openai-community/gpt2-large | FP16 | 58.88 | 15GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | 58.88 | 11GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | 58.85 | 9GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | 58.76 | 15GB |
| openai-community/gpt2 | FP16 | 58.71 | 15GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | 58.61 | 15GB |
| MiniMaxAI/MiniMax-M2 | FP16 | 58.50 | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 58.46 | 31GB |
| rinna/japanese-gpt-neox-small | FP16 | 58.43 | 15GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | 58.41 | 11GB |
| deepseek-ai/DeepSeek-V3 | FP16 | 58.29 | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 58.28 | 31GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 58.26 | 17GB |
| lmsys/vicuna-7b-v1.5 | FP16 | 58.20 | 15GB |
| Qwen/Qwen3-4B-Base | FP16 | 58.07 | 9GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 58.07 | 17GB |
| ibm-granite/granite-3.3-8b-instruct | FP16 | 57.86 | 17GB |
| IlyaGusev/saiga_llama3_8b | FP16 | 57.59 | 17GB |
| Qwen/Qwen3-8B-FP8 | FP16 | 57.48 | 17GB |
| Qwen/Qwen3-Reranker-0.6B | FP16 | 57.42 | 13GB |
| Qwen/Qwen3-0.6B-Base | FP16 | 57.37 | 13GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | 57.35 | 23GB |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | 57.20 | 11GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 57.18 | 31GB |
| HuggingFaceH4/zephyr-7b-beta | FP16 | 57.16 | 15GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 57.14 | 15GB |
| Qwen/Qwen3-8B-Base | FP16 | 57.08 | 17GB |
| GSAI-ML/LLaDA-8B-Instruct | FP16 | 57.05 | 17GB |
| facebook/opt-125m | FP16 | 57.00 | 15GB |
| EleutherAI/gpt-neo-125m | FP16 | 57.00 | 15GB |
| sshleifer/tiny-gpt2 | FP16 | 56.95 | 15GB |
| distilbert/distilgpt2 | FP16 | 56.53 | 15GB |
| black-forest-labs/FLUX.1-dev | FP16 | 56.47 | 16GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 56.43 | 16GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | FP16 | 56.41 | 11GB |
| skt/kogpt2-base-v2 | FP16 | 56.30 | 15GB |
| meta-llama/Llama-2-7b-chat-hf | FP16 | 56.19 | 15GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 56.13 | 15GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | 56.09 | 17GB |
| microsoft/phi-2 | FP16 | 56.08 | 15GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 55.90 | 16GB |
| Qwen/Qwen2-7B-Instruct | FP16 | 55.51 | 15GB |
| liuhaotian/llava-v1.5-7b | FP16 | 55.42 | 15GB |
| Qwen/Qwen3-0.6B | FP16 | 55.39 | 13GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 55.26 | 15GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 55.22 | 34GB |
| google/gemma-2-27b-it | Q8 | 55.09 | 28GB |
| microsoft/Phi-3-mini-4k-instruct | FP16 | 55.00 | 15GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 54.99 | 31GB |
| codellama/CodeLlama-34b-hf | Q4 | 54.80 | 17GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | 54.72 | 11GB |
| parler-tts/parler-tts-large-v1 | FP16 | 54.62 | 15GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | FP16 | 54.53 | 15GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 54.43 | 15GB |
| Qwen/Qwen3-4B | FP16 | 54.38 | 9GB |
| Qwen/Qwen2-1.5B-Instruct | FP16 | 54.34 | 11GB |
| deepseek-ai/DeepSeek-V3-0324 | FP16 | 54.28 | 15GB |
| microsoft/Phi-4-multimodal-instruct | FP16 | 54.26 | 15GB |
| microsoft/Phi-4-mini-instruct | FP16 | 54.20 | 15GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | 54.07 | 15GB |
| Qwen/Qwen2-0.5B-Instruct | FP16 | 54.03 | 11GB |
| ibm-granite/granite-docling-258M | FP16 | 53.98 | 15GB |
| allenai/Olmo-3-7B-Think | FP16 | 53.96 | 16GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | FP16 | 53.90 | 15GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | 53.82 | 15GB |
| meta-llama/Llama-2-7b-hf | FP16 | 53.76 | 15GB |
| moonshotai/Kimi-K2-Thinking | Q4 | 53.74 | 489GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 53.72 | 8GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 53.57 | 15GB |
| Qwen/Qwen2.5-32B | Q4 | 53.53 | 16GB |
| Qwen/Qwen3-Embedding-4B | FP16 | 53.52 | 9GB |
| BSC-LT/salamandraTA-7b-instruct | FP16 | 53.43 | 15GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 53.39 | 34GB |
| Qwen/Qwen3-30B-A3B | Q8 | 53.38 | 31GB |
| openai/gpt-oss-safeguard-20b | Q8 | 53.26 | 22GB |
| microsoft/VibeVoice-1.5B | FP16 | 53.26 | 11GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | 53.23 | 17GB |
| HuggingFaceTB/SmolLM2-135M | FP16 | 53.17 | 15GB |
| bigscience/bloomz-560m | FP16 | 53.16 | 15GB |
| mistralai/Mistral-7B-v0.1 | FP16 | 53.13 | 15GB |
| vikhyatk/moondream2 | FP16 | 53.06 | 15GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 52.97 | 17GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | 52.95 | 9GB |
| Qwen/QwQ-32B-Preview | Q4 | 52.92 | 17GB |
| Qwen/Qwen3-1.7B-Base | FP16 | 52.92 | 15GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | 52.87 | 13GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 52.85 | 31GB |
| black-forest-labs/FLUX.2-dev | FP16 | 52.79 | 16GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | 52.63 | 25GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | 52.59 | 15GB |
| petals-team/StableBeluga2 | FP16 | 52.51 | 15GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | 52.47 | 17GB |
| openai-community/gpt2-medium | FP16 | 52.45 | 15GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | 52.42 | 9GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | 52.30 | 15GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 52.17 | 17GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | FP16 | 51.97 | 17GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 51.83 | 15GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 51.81 | 34GB |
| google/gemma-3-270m-it | FP16 | 51.59 | 15GB |
| dicta-il/dictalm2.0-instruct | FP16 | 51.55 | 15GB |
| Qwen/Qwen2-0.5B | FP16 | 51.53 | 11GB |
| zai-org/GLM-4.5-Air | FP16 | 51.52 | 15GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 51.52 | 31GB |
| huggyllama/llama-7b | FP16 | 51.43 | 15GB |
| Qwen/Qwen2.5-7B | FP16 | 51.34 | 15GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 51.25 | 16GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | 51.20 | 328GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 51.18 | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 51.10 | 31GB |
| numind/NuExtract-1.5 | FP16 | 51.10 | 15GB |
| zai-org/GLM-4.6-FP8 | FP16 | 50.85 | 15GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 50.81 | 20GB |
| meta-llama/Llama-3.1-8B | FP16 | 50.78 | 17GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 50.46 | 15GB |
| EleutherAI/pythia-70m-deduped | FP16 | 50.39 | 15GB |
| rednote-hilab/dots.ocr | FP16 | 50.35 | 15GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | 50.33 | 17GB |
| Tongyi-MAI/Z-Image-Turbo | FP16 | 50.28 | 16GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 50.27 | 16GB |
| deepseek-ai/DeepSeek-R1 | FP16 | 50.20 | 15GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | 50.11 | 15GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | 50.06 | 15GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 50.00 | 17GB |
| microsoft/Phi-3-mini-128k-instruct | FP16 | 49.93 | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 49.70 | 16GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 49.54 | 17GB |
| Qwen/Qwen3-32B | Q4 | 49.19 | 16GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 48.29 | 34GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 47.57 | 18GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 46.95 | 34GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 46.84 | 17GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 45.11 | 29GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 44.43 | 30GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | 44.26 | 29GB |
| Qwen/Qwen3-14B | FP16 | 42.91 | 29GB |
| meta-llama/Llama-2-13b-chat-hf | FP16 | 42.58 | 27GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 42.10 | 17GB |
| Qwen/Qwen2.5-14B | FP16 | 42.05 | 29GB |
| microsoft/Phi-3-medium-128k-instruct | FP16 | 41.52 | 29GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 41.15 | 69GB |
| Qwen/Qwen3-14B-Base | FP16 | 40.67 | 29GB |
| mistralai/Ministral-3-14B-Instruct-2512 | FP16 | 40.27 | 32GB |
| EssentialAI/rnj-1 | FP16 | 40.15 | 19GB |
| google/gemma-2-9b-it | FP16 | 39.39 | 20GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 39.14 | 68GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | 38.52 | 34GB |
| Qwen/Qwen2.5-32B | Q8 | 38.28 | 33GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 38.13 | 33GB |
| Qwen/Qwen3-32B | Q8 | 38.03 | 33GB |
| ai-forever/ruGPT-3.5-13B | FP16 | 37.90 | 27GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | 37.85 | 68GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 37.84 | 17GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 37.58 | 33GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | 37.51 | 35GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 37.40 | 19GB |
| moonshotai/Kimi-K2-Thinking | Q8 | 37.33 | 978GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | 37.28 | 68GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 36.62 | 33GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 36.60 | 35GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 35.94 | 33GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | 35.66 | 656GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | 35.53 | 50GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | 35.31 | 34GB |
| codellama/CodeLlama-34b-hf | Q8 | 34.69 | 35GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | 34.57 | 68GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 34.12 | 68GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 33.64 | 34GB |
| Qwen/QwQ-32B-Preview | Q8 | 32.88 | 34GB |
| Qwen/Qwen3-30B-A3B | FP16 | 32.19 | 61GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | FP16 | 31.88 | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 31.61 | 39GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | FP16 | 31.31 | 61GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | 31.28 | 61GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 30.94 | 35GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | 30.77 | 61GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | FP16 | 30.62 | 61GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | 30.50 | 61GB |
| openai/gpt-oss-safeguard-20b | FP16 | 30.49 | 44GB |
| AI-MO/Kimina-Prover-72B | Q4 | 30.35 | 35GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | 30.07 | 41GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 29.99 | 39GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | 29.99 | 34GB |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | 29.91 | 46GB |
| openai/gpt-oss-120b | Q4 | 29.13 | 59GB |
| google/gemma-2-27b-it | FP16 | 29.07 | 56GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | 29.00 | 36GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | FP16 | 28.87 | 41GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 28.83 | 34GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | 28.58 | 61GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 28.36 | 36GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 28.35 | 44GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | 28.33 | 41GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | 28.31 | 61GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 28.15 | 34GB |
| openai/gpt-oss-20b | FP16 | 28.02 | 41GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 27.37 | 60GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | 27.25 | 34GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 26.88 | 39GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 26.67 | 39GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 24.84 | 138GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 22.69 | 115GB |
| AI-MO/Kimina-Prover-72B | Q8 | 22.36 | 70GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 22.11 | 70GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | 22.10 | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 21.74 | 78GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 21.61 | 69GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 21.21 | 120GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | FP16 | 21.20 | 101GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | 21.08 | 66GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | 21.03 | 137GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | 20.90 | 383GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 20.69 | 78GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | 20.57 | 69GB |
| moonshotai/Kimi-K2-Thinking | FP16 | 20.56 | 1956GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | 20.52 | 70GB |
| Qwen/Qwen2.5-32B | FP16 | 20.29 | 66GB |
| openai/gpt-oss-120b | Q8 | 20.08 | 117GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | 20.07 | 67GB |
| deepseek-ai/deepseek-coder-33b-instruct | FP16 | 20.04 | 68GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 20.02 | 137GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 20.01 | 88GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | 19.58 | 71GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 19.41 | 69GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | 19.36 | 70GB |
| deepseek-ai/DeepSeek-V2.5 | FP16 | 19.25 | 1312GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 19.22 | 137GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 19.15 | 66GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | 18.89 | 66GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 18.75 | 71GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | 18.66 | 66GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 18.64 | 78GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | 18.52 tok/sEstimated Auto-generated benchmark | 137GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 18.50 tok/sEstimated Auto-generated benchmark | 78GB |
| Qwen/QwQ-32B-Preview | FP16 | 18.50 tok/sEstimated Auto-generated benchmark | 67GB |
| Qwen/Qwen3-32B | FP16 | 18.46 tok/sEstimated Auto-generated benchmark | 66GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | 18.22 tok/sEstimated Auto-generated benchmark | 137GB |
| MiniMaxAI/MiniMax-M1-40k | Q4 | 17.82 tok/sEstimated Auto-generated benchmark | 255GB |
| codellama/CodeLlama-34b-hf | FP16 | 17.73 tok/sEstimated Auto-generated benchmark | 70GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | 17.65 tok/sEstimated Auto-generated benchmark | 256GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 17.49 tok/sEstimated Auto-generated benchmark | 67GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | 16.95 tok/sEstimated Auto-generated benchmark | 378GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | 16.42 tok/sEstimated Auto-generated benchmark | 231GB |
| Qwen/Qwen3-235B-A22B | Q4 | 16.09 tok/sEstimated Auto-generated benchmark | 115GB |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | 14.91 tok/sEstimated Auto-generated benchmark | 766GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | 14.78 tok/sEstimated Auto-generated benchmark | 275GB |
| MiniMaxAI/MiniMax-M1-40k | Q8 | 13.21 tok/sEstimated Auto-generated benchmark | 510GB |
| Qwen/Qwen3-235B-A22B | Q8 | 13.07 tok/sEstimated Auto-generated benchmark | 230GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | 12.99 tok/sEstimated Auto-generated benchmark | 755GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | 11.81 tok/sEstimated Auto-generated benchmark | 511GB |
| mistralai/Mistral-Large-Instruct-2411 | FP16 | 11.79 tok/sEstimated Auto-generated benchmark | 240GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | 11.78 tok/sEstimated Auto-generated benchmark | 156GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 11.60 tok/sEstimated Auto-generated benchmark | 138GB |
| openai/gpt-oss-120b | FP16 | 11.46 tok/sEstimated Auto-generated benchmark | 235GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 11.46 tok/sEstimated Auto-generated benchmark | 142GB |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | 11.09 tok/sEstimated Auto-generated benchmark | 138GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | FP16 | 11.04 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | 10.83 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 10.80 tok/sEstimated Auto-generated benchmark | 141GB |
| AI-MO/Kimina-Prover-72B | FP16 | 10.78 tok/sEstimated Auto-generated benchmark | 141GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 10.73 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | 10.56 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 10.41 tok/sEstimated Auto-generated benchmark | 156GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | 10.18 tok/sEstimated Auto-generated benchmark | 176GB |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | 10.12 tok/sEstimated Auto-generated benchmark | 142GB |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | 8.69 tok/sEstimated Auto-generated benchmark | 1532GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | 7.85 tok/sEstimated Auto-generated benchmark | 461GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | 7.21 tok/sEstimated Auto-generated benchmark | 1509GB |
| Qwen/Qwen3-235B-A22B | FP16 | 7.01 tok/sEstimated Auto-generated benchmark | 460GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | 6.81 tok/sEstimated Auto-generated benchmark | 1021GB |
| MiniMaxAI/MiniMax-M1-40k | FP16 | 6.52 tok/sEstimated Auto-generated benchmark | 1020GB |
Note: all throughput figures above are calculated estimates, not measured benchmarks; real-world results may vary.
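The VRAM column above tracks a simple bytes-per-parameter rule: roughly 0.5 bytes per weight at Q4, 1 byte at Q8, and 2 bytes at FP16, which is why a 70B model lands near 35GB, 70GB, and 140GB across the three quantization rows. Here is a minimal sketch of that arithmetic, assuming weights dominate memory use and adding an illustrative overhead allowance; this is not the site's exact methodology:

```python
# Rough VRAM estimate for dense-model weights, consistent with the table above.
# Assumptions (not the site's exact methodology): weights dominate usage;
# Q4 ~= 0.5 bytes/param, Q8 ~= 1 byte/param, FP16 ~= 2 bytes/param,
# plus a small allowance for KV cache and runtime buffers.
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def estimate_vram_gb(params_billions: float, quant: str, overhead_gb: float = 1.0) -> float:
    """Approximate VRAM requirement in GB for a parameter count and quantization."""
    return params_billions * BYTES_PER_PARAM[quant] + overhead_gb

# A 70B model: ~36 GB at Q4, ~71 GB at Q8, ~141 GB at FP16,
# in line with the 34-36 / 69-71 / 137-142 GB rows above.
for quant in BYTES_PER_PARAM:
    print(quant, round(estimate_vram_gb(70, quant)), "GB")
```

By this rule, a 24GB card clears dense models up to roughly 40B at Q4, while 70B-class models need about 34GB even at Q4 and fall out of range.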
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| EssentialAI/rnj-1 | FP16 | Fits comfortably | 40.15 tok/s | 19GB (have 24GB) |
| EssentialAI/rnj-1 | Q8 | Fits comfortably | 80.19 tok/s | 10GB (have 24GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | Not supported | 16.95 tok/s | 378GB (have 24GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | Not supported | 12.99 tok/s | 755GB (have 24GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | Not supported | 7.21 tok/s | 1509GB (have 24GB) |
| EssentialAI/rnj-1 | Q4 | Fits comfortably | 119.84 tok/s | 5GB (have 24GB) |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | Fits comfortably | 49.54 tok/s | 17GB (have 24GB) |
| BSC-LT/salamandraTA-7b-instruct | Q8 | Fits comfortably | 96.41 tok/s | 7GB (have 24GB) |
| liuhaotian/llava-v1.5-7b | Q4 | Fits comfortably | 135.17 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | 18.64 tok/s | 78GB (have 24GB) |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | Not supported | 10.41 tok/s | 156GB (have 24GB) |
| meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 109.15 tok/s | 7GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | Not supported | 30.77 tok/s | 61GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Not supported | 58.46 tok/s | 31GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 98.83 tok/s | 3GB (have 24GB) |
| Qwen/Qwen2.5-72B-Instruct | FP16 | Not supported | 10.80 tok/s | 141GB (have 24GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | Fits (tight) | 57.35 tok/s | 23GB (have 24GB) |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | Fits comfortably | 80.08 tok/s | 11GB (have 24GB) |
| google/gemma-2-27b-it | Q4 | Fits comfortably | 73.57 tok/s | 14GB (have 24GB) |
| google/gemma-2-27b-it | Q8 | Not supported | 55.09 tok/s | 28GB (have 24GB) |
| black-forest-labs/FLUX.2-dev | FP16 | Fits comfortably | 52.79 tok/s | 16GB (have 24GB) |
| OpenPipe/Qwen3-14B-Instruct | Q8 | Fits comfortably | 79.54 tok/s | 14GB (have 24GB) |
| OpenPipe/Qwen3-14B-Instruct | FP16 | Not supported | 44.26 tok/s | 29GB (have 24GB) |
| openai/gpt-oss-120b | Q4 | Not supported | 29.13 tok/s | 59GB (have 24GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 150.01 tok/s | 2GB (have 24GB) |
| OpenPipe/Qwen3-14B-Instruct | Q4 | Fits comfortably | 107.16 tok/s | 7GB (have 24GB) |
| openai-community/gpt2-xl | Q4 | Fits comfortably | 155.17 tok/s | 4GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 116.24 tok/s | 3GB (have 24GB) |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | Fits comfortably | 157.33 tok/s | 4GB (have 24GB) |
| facebook/opt-125m | Q8 | Fits comfortably | 94.09 tok/s | 7GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 140.00 tok/s | 2GB (have 24GB) |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 119.44 tok/s | 7GB (have 24GB) |
| Qwen/Qwen2.5-1.5B | FP16 | Fits comfortably | 59.49 tok/s | 11GB (have 24GB) |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 143.50 tok/s | 3GB (have 24GB) |
| Qwen/Qwen2.5-14B-Instruct | Q8 | Fits comfortably | 82.26 tok/s | 14GB (have 24GB) |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | Not supported | 20.02 tok/s | 137GB (have 24GB) |
| Qwen/Qwen3-Embedding-8B | Q4 | Fits comfortably | 150.26 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 97.11 tok/s | 9GB (have 24GB) |
| Qwen/Qwen3-14B | Q8 | Fits comfortably | 74.34 tok/s | 14GB (have 24GB) |
| Qwen/Qwen3-14B | FP16 | Not supported | 42.91 tok/s | 29GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | Fits comfortably | 137.43 tok/s | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | Fits comfortably | 103.44 tok/s | 7GB (have 24GB) |
| meta-llama/Llama-2-7b-hf | FP16 | Fits comfortably | 53.76 tok/s | 15GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | Fits comfortably | 155.33 tok/s | 4GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | Fits comfortably | 105.32 tok/s | 9GB (have 24GB) |
| Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 135.96 tok/s | 3GB (have 24GB) |
| Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 103.06 tok/s | 5GB (have 24GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 189.80 tok/s | 2GB (have 24GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 127.79 tok/s | 3GB (have 24GB) |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | Fits comfortably | 63.14 tok/s | 6GB (have 24GB) |
| microsoft/phi-4 | Q4 | Fits comfortably | 145.33 tok/s | 4GB (have 24GB) |
| microsoft/phi-4 | Q8 | Fits comfortably | 102.80 tok/s | 7GB (have 24GB) |
| microsoft/phi-4 | FP16 | Fits comfortably | 59.19 tok/s | 15GB (have 24GB) |
| meta-llama/Llama-3.1-8B | FP16 | Fits comfortably | 50.78 tok/s | 17GB (have 24GB) |
| Qwen/Qwen3-8B-FP8 | Q4 | Fits comfortably | 147.13 tok/s | 4GB (have 24GB) |
| microsoft/DialoGPT-small | Q4 | Fits comfortably | 142.26 tok/s | 4GB (have 24GB) |
| microsoft/DialoGPT-small | Q8 | Fits comfortably | 94.39 tok/s | 7GB (have 24GB) |
| microsoft/DialoGPT-small | FP16 | Fits comfortably | 60.39 tok/s | 15GB (have 24GB) |
| microsoft/Phi-3.5-vision-instruct | Q4 | Fits comfortably | 148.42 tok/s | 4GB (have 24GB) |
| microsoft/Phi-3.5-vision-instruct | FP16 | Fits comfortably | 58.76 tok/s | 15GB (have 24GB) |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | Fits comfortably | 156.77 tok/s | 4GB (have 24GB) |
| sshleifer/tiny-gpt2 | Q8 | Fits comfortably | 103.00 tok/s | 7GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | Fits comfortably | 76.44 tok/s | 15GB (have 24GB) |
| Qwen/Qwen2-0.5B-Instruct | Q4 | Fits comfortably | 142.19 tok/s | 3GB (have 24GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | Fits comfortably | 60.11 tok/s | 9GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | Not supported | 59.96 tok/s | 31GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | Not supported | 51.52 tok/s | 31GB (have 24GB) |
| Qwen/Qwen3-1.7B-Base | Q4 | Fits comfortably | 153.53 tok/s | 4GB (have 24GB) |
| Qwen/Qwen3-1.7B-Base | Q8 | Fits comfortably | 105.42 tok/s | 7GB (have 24GB) |
| Qwen/Qwen3-1.7B-Base | FP16 | Fits comfortably | 52.92 tok/s | 15GB (have 24GB) |
| ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 151.60 tok/s | 4GB (have 24GB) |
| ibm-granite/granite-3.3-8b-instruct | Q8 | Fits comfortably | 102.98 tok/s | 9GB (have 24GB) |
| ibm-granite/granite-3.3-8b-instruct | FP16 | Fits comfortably | 57.86 tok/s | 17GB (have 24GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 154.54 tok/s | 4GB (have 24GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 110.30 tok/s | 7GB (have 24GB) |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | Fits comfortably | 54.07 tok/s | 15GB (have 24GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | 18.75 tok/s | 71GB (have 24GB) |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | Not supported | 11.09 tok/s | 138GB (have 24GB) |
| MiniMaxAI/MiniMax-M1-40k | Q4 | Not supported | 17.82 tok/s | 255GB (have 24GB) |
| MiniMaxAI/MiniMax-M1-40k | Q8 | Not supported | 13.21 tok/s | 510GB (have 24GB) |
| MiniMaxAI/MiniMax-M1-40k | FP16 | Not supported | 6.52 tok/s | 1020GB (have 24GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 79.69 tok/s | 10GB (have 24GB) |
| apple/OpenELM-1_1B-Instruct | FP16 | Fits comfortably | 68.19 tok/s | 2GB (have 24GB) |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | Fits comfortably | 134.49 tok/s | 2GB (have 24GB) |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | Not supported | 28.58 tok/s | 61GB (have 24GB) |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | Not supported | 38.52 tok/s | 34GB (have 24GB) |
| Qwen/Qwen2.5-72B-Instruct | Q8 | Not supported | 22.11 tok/s | 70GB (have 24GB) |
| HuggingFaceH4/zephyr-7b-beta | Q4 | Fits comfortably | 147.42 tok/s | 4GB (have 24GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 151.17 tok/s | 4GB (have 24GB) |
| openai-community/gpt2 | Q8 | Fits comfortably | 95.39 tok/s | 7GB (have 24GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 98.49 tok/s | 7GB (have 24GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 149.85 tok/s | 3GB (have 24GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 107.21 tok/s | 5GB (have 24GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | Fits comfortably | 58.88 tok/s | 11GB (have 24GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 152.36 tok/s | 4GB (have 24GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 110.01 tok/s | 1GB (have 24GB) |
| google/gemma-3-1b-it | FP16 | Fits comfortably | 69.40 tok/s | 2GB (have 24GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 146.14 tok/s | 3GB (have 24GB) |
| bigscience/bloomz-560m | FP16 | Fits comfortably | 53.16 tok/s | 15GB (have 24GB) |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | Fits comfortably | 58.85 tok/s | 9GB (have 24GB) |
Note: speeds and verdicts in this table are calculated estimates, not measured benchmarks; real results may vary.
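The verdict column reduces to comparing the estimated requirement against the card's 24GB. Below is a sketch of plausible cutoffs inferred from the rows above (23GB on a 24GB card reads "Fits (tight)", 28GB reads "Not supported"); the exact thresholds are assumptions, not the site's published rules:

```python
# Sketch of the fit verdict against a 24 GB card. The 90% "tight" threshold
# is an assumption inferred from the table, not the site's published rule.
def fit_verdict(required_gb: float, vram_gb: float = 24.0) -> str:
    if required_gb > vram_gb:
        return "Not supported"    # weights alone exceed VRAM
    if required_gb >= 0.9 * vram_gb:
        return "Fits (tight)"     # loads, but little headroom for KV cache
    return "Fits comfortably"

print(fit_verdict(23))   # Fits (tight)
print(fit_verdict(14))   # Fits comfortably
print(fit_verdict(78))   # Not supported
```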
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
On Qwen3-30B Q4, Vulkan decode holds ~117 tok/s once a 32K context fills, while ROCm drops to ~12 tok/s, making Vulkan the faster backend for long prompts.
Source: Reddit – /r/LocalLLaMA (mrdpho0)
The same benchmarks show Vulkan prompt prefill at ~486 tok/s on Windows drivers versus ~432 tok/s on ROCm, highlighting the driver advantage.
Source: Reddit – /r/LocalLLaMA (mrdpho0)
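To run this kind of backend comparison yourself, one option is llama-cpp-python built once per backend. The sketch below times end-to-end generation with a partially filled long context; the model filename and prompt are placeholders, and the CMake flags shown are llama.cpp's current backend switches as far as we know:

```python
# Long-context throughput check with llama-cpp-python.
# Build the wheel against the backend under test, e.g.:
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python --no-cache-dir
# or -DGGML_HIP=on for ROCm, then rerun this script per backend.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-30b-a3b-q4_k_m.gguf",  # placeholder local GGUF file
    n_gpu_layers=-1,   # offload all layers to the GPU
    n_ctx=32768,       # the long-context case where the backends diverge
)

prompt = "long document text... " * 2000     # several thousand tokens of filler
start = time.perf_counter()
out = llm(prompt, max_tokens=128)
elapsed = time.perf_counter() - start
generated = out["usage"]["completion_tokens"]
print(f"{generated / elapsed:.1f} tok/s end-to-end (includes prefill time)")
```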
Yes. Builders highlight Ryzen AI 395 mini-PCs with RX 7900-class GPUs that can load 70B models at Q8 (roughly 70 GB of weights, well beyond any 24 GB NVIDIA card), though throughput is slower.
Source: Reddit – /r/LocalLLaMA (mqupq0a)
Not yet. FlashAttention under Vulkan falls back to the CPU on the 7900 XTX, so enabling it does not improve throughput the way it does on NVIDIA cards.
Source: Reddit – /r/LocalLLaMA (mrdpho0)
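To check this behavior on your own card, llama-cpp-python exposes a FlashAttention toggle; the A/B sketch below assumes the constructor accepts a flash_attn flag (mirroring llama.cpp's -fa option) and uses a placeholder model file:

```python
# A/B the FlashAttention toggle. Assumes your llama-cpp-python build accepts
# flash_attn in the Llama constructor. On the 7900 XTX under Vulkan, expect
# little or no gain, per the report above.
import time
from llama_cpp import Llama

def tok_per_s(flash: bool) -> float:
    llm = Llama(model_path="model-q4_k_m.gguf",  # placeholder local GGUF file
                n_gpu_layers=-1, n_ctx=8192, flash_attn=flash)
    start = time.perf_counter()
    out = llm("Explain GPU memory hierarchies.", max_tokens=256)
    return out["usage"]["completion_tokens"] / (time.perf_counter() - start)

print("flash_attn=False:", round(tok_per_s(False), 1), "tok/s")
print("flash_attn=True: ", round(tok_per_s(True), 1), "tok/s")
```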
The RX 7900 XTX offers 24 GB of GDDR6 and a 355 W TBP. As of November 2025, Amazon listed it at $899, in stock.
Explore how RX 7900 XT stacks up for local inference workloads.
Explore how RX 6900 XT stacks up for local inference workloads.
Explore how RTX 4090 stacks up for local inference workloads.
Explore how RTX 4080 stacks up for local inference workloads.
Explore how RTX 4070 Ti stacks up for local inference workloads.
Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.