Quick Answer: The RTX 4090 offers 24GB of VRAM and sells at current market pricing. It delivers an estimated 270 tokens/sec on Deepseek AI Deepseek Ocr 2 at Q4 quantization. It typically draws about 450W under load.
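The RTX 4090 remains a go-to GPU for local AI workloads. It runs mainstream models up to roughly the 34B class at Q4 on a single card, delivers some of the fastest consumer inference speeds, and anchors premium builds that scale to production deployments.
With 24GB of VRAM, the RTX 4090 can hold the weights of models up to roughly 48B parameters at 4-bit quantization (about 0.5GB per billion parameters), and somewhat less once the KV cache and runtime overhead are counted. In the compatibility table below, 27B to 34B models fit at Q4 (14GB to 17GB), while 72B models need about 36GB and are not supported. Llama 3 70B therefore needs multiple GPUs or CPU offloading, while smaller models such as Mistral 7B fit comfortably.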
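The VRAM figures in the compatibility table can be approximated with a weights-only rule of thumb: parameter count (in billions) times bytes per weight, rounded up to a whole GB. The sketch below assumes 4, 8, and 16 bits per weight for Q4, Q8, and FP16 and ignores KV cache and runtime overhead. The function names are hypothetical, and the rule is inferred from the listed numbers (it matches most rows, not all), not taken from a documented method.

```python
import math

# Assumed bits per weight for each quantization level (illustrative).
BITS_PER_WEIGHT = {"Q4": 4, "Q8": 8, "FP16": 16}


def estimate_vram_gb(params_billion: float, quant: str) -> int:
    """Weights-only VRAM estimate in GB, rounded up to a whole GB."""
    bytes_per_weight = BITS_PER_WEIGHT[quant] / 8
    return math.ceil(params_billion * bytes_per_weight)


def fits(params_billion: float, quant: str, vram_gb: int = 24) -> bool:
    """True if the weights-only estimate fits in the given VRAM."""
    return estimate_vram_gb(params_billion, quant) <= vram_gb


if __name__ == "__main__":
    for size in (7, 13, 34, 70):
        for quant in ("Q4", "Q8", "FP16"):
            need = estimate_vram_gb(size, quant)
            print(f"{size}B {quant}: {need}GB, fits in 24GB: {fits(size, quant)}")
```

Under these assumptions, `fits(34, "Q4")` is true (17GB), while a 70B model at Q4 comes out at 35GB, which exceeds the 24GB on this card.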
Consider the RTX 4090 or RTX 6000 Ada: at 24GB, Ada offers better efficiency than Ampere.
Showing 80 of 80 benchmark rows. All speeds are static estimates (DB-independent).
| Model | Size | Quantization | Tokens/sec | VRAM used |
|---|---|---|---|---|
| Deepseek AI Deepseek Ocr 2 | Unknown | Q4 | 270.00 tok/s (estimated) | 1GB |
| Deepseek AI Deepseek Math V2 | Unknown | Q4 | 270.00 tok/s (estimated) | 1GB |
| Deepseek AI Deepseek V2 5 | Unknown | Q4 | 270.00 tok/s (estimated) | 1GB |
| Deepseek AI Deepseek V3 | Unknown | Q4 | 270.00 tok/s (estimated) | 2GB |
| Deepseek AI Deepseek Coder V2 Lite Instruct | Unknown | Q4 | 270.00 tok/s (estimated) | 1GB |
| Deepseek AI Deepseek V3.1 | Unknown | Q4 | 270.00 tok/s (estimated) | 2GB |
| Deepseek AI Deepseek Coder 1.3B Instruct | 1.3B | Q4 | 270.00 tok/s (estimated) | 1GB |
| Deepseek AI Deepseek R1 Distill Qwen 1.5B | 1.5B | Q4 | 270.00 tok/s (estimated) | 1GB |
| Deepseek AI Deepseek Ocr | Unknown | Q4 | 225.00 tok/s (estimated) | 4GB |
| Lmstudio Community Deepseek R1 0528 Qwen3 8B Mlx 8bit | 8B | Q4 | 225.00 tok/s (estimated) | 4GB |
| Lmstudio Community Deepseek R1 0528 Qwen3 8B Mlx 4bit | 8B | Q4 | 225.00 tok/s (estimated) | 4GB |
| Deepseek AI Deepseek R1 | Unknown | Q4 | 225.00 tok/s (estimated) | 4GB |
| Deepseek AI Deepseek R1 0528 | Unknown | Q4 | 225.00 tok/s (estimated) | 4GB |
| Deepseek AI Deepseek R1 Distill Llama 8B | 8B | Q4 | 225.00 tok/s (estimated) | 4GB |
| Deepseek AI Deepseek R1 Distill Qwen 7B | 7B | Q4 | 225.00 tok/s (estimated) | 4GB |
| Nineninesix Kani Tts 2 En | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Nanbeige Nanbeige4 1 3B | 3B | Q4 | 216.00 tok/s (estimated) | 2GB |
| Minimaxai Minimax M2 5 | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Minimaxai Minimax M2 1 | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Stepfun AI Step 3 5 Flash | Unknown | Q4 | 216.00 tok/s (estimated) | 2GB |
| Qwen Qwen3 Coder Next | Unknown | Q4 | 216.00 tok/s (estimated) | 2GB |
| Moonshotai Kimi K2 5 | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Xiaomimimo Mimo V2 Flash | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Nari Labs Dia2 2B | 2B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Google Embeddinggemma 300M | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Facebook Sam3 | Unknown | Q4 | 216.00 tok/s (estimated) | 2GB |
| Black Forest Labs Flux 2 Dev | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Moonshotai Kimi K2 Thinking | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Microsoft Phi 3 5 Mini Instruct | Unknown | Q4 | 216.00 tok/s (estimated) | 2GB |
| Meta Llama Llama 3 2 3B Instruct | 3B | Q4 | 216.00 tok/s (estimated) | 2GB |
| Qwen Qwen3 1.7B Base | 1.7B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Dicta Il Dictalm2.0 Instruct | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Qwen Qwen2 0.5B Instruct | 0.5B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Alibaba Nlp Gte Qwen2 1.5B Instruct | 1.5B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Apple Openelm 1 1B Instruct | 1B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Qwen Qwen2.5 3B | 3B | Q4 | 216.00 tok/s (estimated) | 2GB |
| Unsloth Gemma 3 1B It | 1B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Bigcode Starcoder2 3B | 3B | Q4 | 216.00 tok/s (estimated) | 2GB |
| Ibm Granite Granite Docling 258M | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Skt Kogpt2 Base V2 | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Google Gemma 3 270M It | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Eleutherai Pythia 70M Deduped | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Microsoft Vibevoice 1.5B | 1.5B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Ibm Granite Granite 3.3 2B Instruct | 2B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Google Gemma 2B | 2B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Trl Internal Testing Tiny Llamaforcausallm 3.2 | Unknown | Q4 | 216.00 tok/s (estimated) | 2GB |
| Llamafactory Tiny Random Llama 3 | Unknown | Q4 | 216.00 tok/s (estimated) | 2GB |
| Unsloth Llama 3.2 1B Instruct | 1B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Numind Nuextract 1.5 | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Hmellor Tiny Random Llamaforcausallm | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Sshleifer Tiny Gpt2 | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Openai Community Gpt2 Xl | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Ibm Research Powermoe 3B | 3B | Q4 | 216.00 tok/s (estimated) | 2GB |
| Unsloth Llama 3.2 3B Instruct | 3B | Q4 | 216.00 tok/s (estimated) | 2GB |
| Meta Llama Llama 3.2 3B | 3B | Q4 | 216.00 tok/s (estimated) | 2GB |
| Eleutherai Gpt Neo 125M | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Meta Llama Llama Guard 3 1B | 1B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Qwen Qwen2 1.5B Instruct | 1.5B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Google Gemma 2 2B It | 2B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Microsoft Phi 3.5 Mini Instruct | Unknown | Q4 | 216.00 tok/s (estimated) | 2GB |
| Microsoft Phi 3.5 Vision Instruct | Unknown | Q4 | 216.00 tok/s (estimated) | 2GB |
| Rinna Japanese Gpt Neox Small | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Qwen Qwen2.5 Coder 1.5B | 1.5B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Microsoft Dialogpt Small | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Qwen Qwen3 0.6B Base | 0.6B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Openai Community Gpt2 Medium | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Trl Internal Testing Tiny Random Llamaforcausallm | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Qwen Qwen2.5 Math 1.5B | 1.5B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Huggingfacetb Smollm 135M | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Liquidai Lfm2 1.2B | 1.2B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Qwen Qwen2 0.5B | 0.5B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Minimaxai Minimax M2 | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Huggingfacetb Smollm2 135M | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Microsoft Phi 2 | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
| Qwen Qwen2.5 0.5B | 0.5B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Qwen Qwen2.5 1.5B | 1.5B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Qwen Qwen3 Reranker 0.6B | 0.6B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Google T5 T5 3B | 3B | Q4 | 216.00 tok/s (estimated) | 2GB |
| Qwen Qwen3 1.7B | 1.7B | Q4 | 216.00 tok/s (estimated) | 1GB |
| Openai Community Gpt2 Large | Unknown | Q4 | 216.00 tok/s (estimated) | 1GB |
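In the compatibility rows on this page, the Q8 and FP16 speed estimates are consistently 0.70x and 0.38x of the Q4 estimate for the same model (for example 216.00, 151.20, and 82.08 tok/s). The sketch below reproduces that scaling. The multipliers are inferred from the listed numbers rather than from a documented formula, and the function name is hypothetical.

```python
# Multipliers inferred from the table rows, not from any documented formula.
SPEED_FACTOR = {"Q4": 1.00, "Q8": 0.70, "FP16": 0.38}


def scale_speed(q4_tok_per_s: float, quant: str) -> float:
    """Scale a Q4 speed estimate (tok/s) to another quantization level."""
    return round(q4_tok_per_s * SPEED_FACTOR[quant], 2)


if __name__ == "__main__":
    # Print the estimated speeds for a model listed at 216 tok/s in Q4.
    for quant in ("Q4", "Q8", "FP16"):
        print(quant, scale_speed(216.0, quant))
```

Because these are fixed ratios, the per-quantization speeds carry no more information than the Q4 estimate; measured throughput on real hardware may differ.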
Showing 240 of 240 compatibility rows.
| Model | Size | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|---|
| 01 AI Yi 1 5 34B Chat | 34B | Q4 | Fits comfortably | 63.00 tok/s (estimated) | 17GB (have 24GB) |
| 01 AI Yi 1 5 34B Chat | 34B | Q8 | Not supported | 44.10 tok/s (estimated) | 34GB (have 24GB) |
| 01 AI Yi 1 5 34B Chat | 34B | FP16 | Not supported | 23.94 tok/s (estimated) | 68GB (have 24GB) |
| AI Forever Rugpt 3.5 13B | 13B | Q4 | Fits comfortably | 135.00 tok/s (estimated) | 7GB (have 24GB) |
| AI Forever Rugpt 3.5 13B | 13B | Q8 | Fits comfortably | 94.50 tok/s (estimated) | 13GB (have 24GB) |
| AI Forever Rugpt 3.5 13B | 13B | FP16 | Not supported | 51.30 tok/s (estimated) | 26GB (have 24GB) |
| AI Mo Kimina Prover 72B | 72B | Q4 | Not supported | 36.00 tok/s (estimated) | 36GB (have 24GB) |
| AI Mo Kimina Prover 72B | 72B | Q8 | Not supported | 25.20 tok/s (estimated) | 72GB (have 24GB) |
| AI Mo Kimina Prover 72B | 72B | FP16 | Not supported | 13.68 tok/s (estimated) | 144GB (have 24GB) |
| Alibaba Nlp Gte Qwen2 1.5B Instruct | 1.5B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Alibaba Nlp Gte Qwen2 1.5B Instruct | 1.5B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 2GB (have 24GB) |
| Alibaba Nlp Gte Qwen2 1.5B Instruct | 1.5B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 3GB (have 24GB) |
| Allenai Olmo 2 0425 1B | 1B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Allenai Olmo 2 0425 1B | 1B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Allenai Olmo 2 0425 1B | 1B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 2GB (have 24GB) |
| Allenai Olmo 3 7B Think | 7B | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 4GB (have 24GB) |
| Allenai Olmo 3 7B Think | 7B | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 7GB (have 24GB) |
| Allenai Olmo 3 7B Think | 7B | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 14GB (have 24GB) |
| Apple Openelm 1 1B Instruct | 1B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Apple Openelm 1 1B Instruct | 1B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 2GB (have 24GB) |
| Apple Openelm 1 1B Instruct | 1B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 3GB (have 24GB) |
| Baichuan Inc Baichuan M2 32B | 32B | Q4 | Fits comfortably | 63.00 tok/s (estimated) | 16GB (have 24GB) |
| Baichuan Inc Baichuan M2 32B | 32B | Q8 | Not supported | 44.10 tok/s (estimated) | 32GB (have 24GB) |
| Baichuan Inc Baichuan M2 32B | 32B | FP16 | Not supported | 23.94 tok/s (estimated) | 64GB (have 24GB) |
| Bigcode Starcoder2 3B | 3B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 2GB (have 24GB) |
| Bigcode Starcoder2 3B | 3B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 3GB (have 24GB) |
| Bigcode Starcoder2 3B | 3B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 6GB (have 24GB) |
| Bigscience Bloomz 560M | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Bigscience Bloomz 560M | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Bigscience Bloomz 560M | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 2GB (have 24GB) |
| Black Forest Labs Flux 1 Dev | Unknown | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 4GB (have 24GB) |
| Black Forest Labs Flux 1 Dev | Unknown | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 7GB (have 24GB) |
| Black Forest Labs Flux 1 Dev | Unknown | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 14GB (have 24GB) |
| Black Forest Labs Flux 2 Dev | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Black Forest Labs Flux 2 Dev | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 2GB (have 24GB) |
| Black Forest Labs Flux 2 Dev | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 4GB (have 24GB) |
| Bsc Lt Salamandrata 7B Instruct | 7B | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 4GB (have 24GB) |
| Bsc Lt Salamandrata 7B Instruct | 7B | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 7GB (have 24GB) |
| Bsc Lt Salamandrata 7B Instruct | 7B | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 14GB (have 24GB) |
| Codellama Codellama 34B HF | 34B | Q4 | Fits comfortably | 63.00 tok/s (estimated) | 17GB (have 24GB) |
| Codellama Codellama 34B HF | 34B | Q8 | Not supported | 44.10 tok/s (estimated) | 34GB (have 24GB) |
| Codellama Codellama 34B HF | 34B | FP16 | Not supported | 23.94 tok/s (estimated) | 68GB (have 24GB) |
| Context Labs Meta Llama Llama 3.2 3B Instruct FP16 | 3B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 2GB (have 24GB) |
| Context Labs Meta Llama Llama 3.2 3B Instruct FP16 | 3B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 3GB (have 24GB) |
| Context Labs Meta Llama Llama 3.2 3B Instruct FP16 | 3B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 6GB (have 24GB) |
| Deepseek AI Deepseek Coder 1.3B Instruct | 1.3B | Q4 | Fits comfortably | 270.00 tok/s (estimated) | 1GB (have 24GB) |
| Deepseek AI Deepseek Coder 1.3B Instruct | 1.3B | Q8 | Fits comfortably | 189.00 tok/s (estimated) | 2GB (have 24GB) |
| Deepseek AI Deepseek Coder 1.3B Instruct | 1.3B | FP16 | Fits comfortably | 102.60 tok/s (estimated) | 3GB (have 24GB) |
| Deepseek AI Deepseek Coder 33B Instruct | 33B | Q4 | Fits comfortably | 78.75 tok/s (estimated) | 17GB (have 24GB) |
| Deepseek AI Deepseek Coder 33B Instruct | 33B | Q8 | Not supported | 55.12 tok/s (estimated) | 33GB (have 24GB) |
| Deepseek AI Deepseek Coder 33B Instruct | 33B | FP16 | Not supported | 29.92 tok/s (estimated) | 66GB (have 24GB) |
| Deepseek AI Deepseek Coder V2 Instruct 0724 | Unknown | Q4 | Not supported | 45.00 tok/s (estimated) | 36GB (have 24GB) |
| Deepseek AI Deepseek Coder V2 Instruct 0724 | Unknown | Q8 | Not supported | 31.50 tok/s (estimated) | 72GB (have 24GB) |
| Deepseek AI Deepseek Coder V2 Instruct 0724 | Unknown | FP16 | Not supported | 17.10 tok/s (estimated) | 144GB (have 24GB) |
| Deepseek AI Deepseek Coder V2 Lite Instruct | Unknown | Q4 | Fits comfortably | 270.00 tok/s (estimated) | 1GB (have 24GB) |
| Deepseek AI Deepseek Coder V2 Lite Instruct | Unknown | Q8 | Fits comfortably | 189.00 tok/s (estimated) | 2GB (have 24GB) |
| Deepseek AI Deepseek Coder V2 Lite Instruct | Unknown | FP16 | Fits comfortably | 102.60 tok/s (estimated) | 4GB (have 24GB) |
| Deepseek AI Deepseek Math V2 | Unknown | Q4 | Fits comfortably | 270.00 tok/s (estimated) | 1GB (have 24GB) |
| Deepseek AI Deepseek Math V2 | Unknown | Q8 | Fits comfortably | 189.00 tok/s (estimated) | 2GB (have 24GB) |
| Deepseek AI Deepseek Math V2 | Unknown | FP16 | Fits comfortably | 102.60 tok/s (estimated) | 4GB (have 24GB) |
| Deepseek AI Deepseek Ocr | Unknown | Q4 | Fits comfortably | 225.00 tok/s (estimated) | 4GB (have 24GB) |
| Deepseek AI Deepseek Ocr | Unknown | Q8 | Fits comfortably | 157.50 tok/s (estimated) | 7GB (have 24GB) |
| Deepseek AI Deepseek Ocr | Unknown | FP16 | Fits comfortably | 85.50 tok/s (estimated) | 14GB (have 24GB) |
| Deepseek AI Deepseek Ocr 2 | Unknown | Q4 | Fits comfortably | 270.00 tok/s (estimated) | 1GB (have 24GB) |
| Deepseek AI Deepseek Ocr 2 | Unknown | Q8 | Fits comfortably | 189.00 tok/s (estimated) | 2GB (have 24GB) |
| Deepseek AI Deepseek Ocr 2 | Unknown | FP16 | Fits comfortably | 102.60 tok/s (estimated) | 4GB (have 24GB) |
| Deepseek AI Deepseek R1 | Unknown | Q4 | Fits comfortably | 225.00 tok/s (estimated) | 4GB (have 24GB) |
| Deepseek AI Deepseek R1 | Unknown | Q8 | Fits comfortably | 157.50 tok/s (estimated) | 7GB (have 24GB) |
| Deepseek AI Deepseek R1 | Unknown | FP16 | Fits comfortably | 85.50 tok/s (estimated) | 14GB (have 24GB) |
| Deepseek AI Deepseek R1 0528 | Unknown | Q4 | Fits comfortably | 225.00 tok/s (estimated) | 4GB (have 24GB) |
| Deepseek AI Deepseek R1 0528 | Unknown | Q8 | Fits comfortably | 157.50 tok/s (estimated) | 8GB (have 24GB) |
| Deepseek AI Deepseek R1 0528 | Unknown | FP16 | Fits comfortably | 85.50 tok/s (estimated) | 16GB (have 24GB) |
| Deepseek AI Deepseek R1 Distill Llama 8B | 8B | Q4 | Fits comfortably | 225.00 tok/s (estimated) | 4GB (have 24GB) |
| Deepseek AI Deepseek R1 Distill Llama 8B | 8B | Q8 | Fits comfortably | 157.50 tok/s (estimated) | 8GB (have 24GB) |
| Deepseek AI Deepseek R1 Distill Llama 8B | 8B | FP16 | Fits comfortably | 85.50 tok/s (estimated) | 16GB (have 24GB) |
| Deepseek AI Deepseek R1 Distill Qwen 1.5B | 1.5B | Q4 | Fits comfortably | 270.00 tok/s (estimated) | 1GB (have 24GB) |
| Deepseek AI Deepseek R1 Distill Qwen 1.5B | 1.5B | Q8 | Fits comfortably | 189.00 tok/s (estimated) | 2GB (have 24GB) |
| Deepseek AI Deepseek R1 Distill Qwen 1.5B | 1.5B | FP16 | Fits comfortably | 102.60 tok/s (estimated) | 3GB (have 24GB) |
| Deepseek AI Deepseek R1 Distill Qwen 32B | 32B | Q4 | Fits comfortably | 78.75 tok/s (estimated) | 16GB (have 24GB) |
| Deepseek AI Deepseek R1 Distill Qwen 32B | 32B | Q8 | Not supported | 55.12 tok/s (estimated) | 32GB (have 24GB) |
| Deepseek AI Deepseek R1 Distill Qwen 32B | 32B | FP16 | Not supported | 29.92 tok/s (estimated) | 64GB (have 24GB) |
| Deepseek AI Deepseek R1 Distill Qwen 7B | 7B | Q4 | Fits comfortably | 225.00 tok/s (estimated) | 4GB (have 24GB) |
| Deepseek AI Deepseek R1 Distill Qwen 7B | 7B | Q8 | Fits comfortably | 157.50 tok/s (estimated) | 7GB (have 24GB) |
| Deepseek AI Deepseek R1 Distill Qwen 7B | 7B | FP16 | Fits comfortably | 85.50 tok/s (estimated) | 14GB (have 24GB) |
| Deepseek AI Deepseek V2 5 | Unknown | Q4 | Fits comfortably | 270.00 tok/s (estimated) | 1GB (have 24GB) |
| Deepseek AI Deepseek V2 5 | Unknown | Q8 | Fits comfortably | 189.00 tok/s (estimated) | 2GB (have 24GB) |
| Deepseek AI Deepseek V2 5 | Unknown | FP16 | Fits comfortably | 102.60 tok/s (estimated) | 4GB (have 24GB) |
| Deepseek AI Deepseek V3 | Unknown | Q4 | Fits comfortably | 270.00 tok/s (estimated) | 2GB (have 24GB) |
| Deepseek AI Deepseek V3 | Unknown | Q8 | Fits comfortably | 189.00 tok/s (estimated) | 3GB (have 24GB) |
| Deepseek AI Deepseek V3 | Unknown | FP16 | Fits comfortably | 102.60 tok/s (estimated) | 6GB (have 24GB) |
| Deepseek AI Deepseek V3 0324 | Unknown | Q4 | Fits comfortably | 78.75 tok/s (estimated) | 16GB (have 24GB) |
| Deepseek AI Deepseek V3 0324 | Unknown | Q8 | Not supported | 55.12 tok/s (estimated) | 32GB (have 24GB) |
| Deepseek AI Deepseek V3 0324 | Unknown | FP16 | Not supported | 29.92 tok/s (estimated) | 64GB (have 24GB) |
| Deepseek AI Deepseek V3.1 | Unknown | Q4 | Fits comfortably | 270.00 tok/s (estimated) | 2GB (have 24GB) |
| Deepseek AI Deepseek V3.1 | Unknown | Q8 | Fits comfortably | 189.00 tok/s (estimated) | 3GB (have 24GB) |
| Deepseek AI Deepseek V3.1 | Unknown | FP16 | Fits comfortably | 102.60 tok/s (estimated) | 6GB (have 24GB) |
| Dicta Il Dictalm2.0 Instruct | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Dicta Il Dictalm2.0 Instruct | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 2GB (have 24GB) |
| Dicta Il Dictalm2.0 Instruct | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 4GB (have 24GB) |
| Distilbert Distilgpt2 | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Distilbert Distilgpt2 | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 2GB (have 24GB) |
| Distilbert Distilgpt2 | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 4GB (have 24GB) |
| Dphn Dolphin 2.9.1 Yi 1.5 34B | 34B | Q4 | Fits comfortably | 63.00 tok/s (estimated) | 17GB (have 24GB) |
| Dphn Dolphin 2.9.1 Yi 1.5 34B | 34B | Q8 | Not supported | 44.10 tok/s (estimated) | 34GB (have 24GB) |
| Dphn Dolphin 2.9.1 Yi 1.5 34B | 34B | FP16 | Not supported | 23.94 tok/s (estimated) | 68GB (have 24GB) |
| Eleutherai Gpt Neo 125M | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Eleutherai Gpt Neo 125M | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Eleutherai Gpt Neo 125M | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 1GB (have 24GB) |
| Eleutherai Pythia 70M Deduped | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Eleutherai Pythia 70M Deduped | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Eleutherai Pythia 70M Deduped | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 1GB (have 24GB) |
| Essentialai Rnj 1 | Unknown | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 4GB (have 24GB) |
| Essentialai Rnj 1 | Unknown | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 7GB (have 24GB) |
| Essentialai Rnj 1 | Unknown | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 14GB (have 24GB) |
| Facebook Opt 125M | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Facebook Opt 125M | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Facebook Opt 125M | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 1GB (have 24GB) |
| Facebook Sam3 | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 2GB (have 24GB) |
| Facebook Sam3 | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 3GB (have 24GB) |
| Facebook Sam3 | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 6GB (have 24GB) |
| Fireredteam Firered Image Edit 1 0 | Unknown | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 4GB (have 24GB) |
| Fireredteam Firered Image Edit 1 0 | Unknown | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 7GB (have 24GB) |
| Fireredteam Firered Image Edit 1 0 | Unknown | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 14GB (have 24GB) |
| Gensyn Qwen2.5 0.5B Instruct | 0.5B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Gensyn Qwen2.5 0.5B Instruct | 0.5B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Gensyn Qwen2.5 0.5B Instruct | 0.5B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 1GB (have 24GB) |
| Google Bert Bert Base Uncased | Unknown | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 4GB (have 24GB) |
| Google Bert Bert Base Uncased | Unknown | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 7GB (have 24GB) |
| Google Bert Bert Base Uncased | Unknown | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 14GB (have 24GB) |
| Google Embeddinggemma 300M | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Google Embeddinggemma 300M | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Google Embeddinggemma 300M | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 1GB (have 24GB) |
| Google Gemma 2 27B It | 27B | Q4 | Fits comfortably | 99.00 tok/s (estimated) | 14GB (have 24GB) |
| Google Gemma 2 27B It | 27B | Q8 | Not supported | 69.30 tok/s (estimated) | 27GB (have 24GB) |
| Google Gemma 2 27B It | 27B | FP16 | Not supported | 37.62 tok/s (estimated) | 54GB (have 24GB) |
| Google Gemma 2 2B It | 2B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Google Gemma 2 2B It | 2B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 2GB (have 24GB) |
| Google Gemma 2 2B It | 2B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 4GB (have 24GB) |
| Google Gemma 2 9B It | 9B | Q4 | Fits comfortably | 135.00 tok/s (estimated) | 5GB (have 24GB) |
| Google Gemma 2 9B It | 9B | Q8 | Fits comfortably | 94.50 tok/s (estimated) | 9GB (have 24GB) |
| Google Gemma 2 9B It | 9B | FP16 | Fits comfortably | 51.30 tok/s (estimated) | 18GB (have 24GB) |
| Google Gemma 2B | 2B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Google Gemma 2B | 2B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 2GB (have 24GB) |
| Google Gemma 2B | 2B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 4GB (have 24GB) |
| Google Gemma 3 1B It | 1B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Google Gemma 3 1B It | 1B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Google Gemma 3 1B It | 1B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 2GB (have 24GB) |
| Google Gemma 3 270M It | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Google Gemma 3 270M It | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Google Gemma 3 270M It | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 1GB (have 24GB) |
| Google T5 T5 3B | 3B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 2GB (have 24GB) |
| Google T5 T5 3B | 3B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 3GB (have 24GB) |
| Google T5 T5 3B | 3B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 6GB (have 24GB) |
| Gsai Ml Llada 8B Base | 8B | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 4GB (have 24GB) |
| Gsai Ml Llada 8B Base | 8B | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 8GB (have 24GB) |
| Gsai Ml Llada 8B Base | 8B | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 16GB (have 24GB) |
| Gsai Ml Llada 8B Instruct | 8B | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 4GB (have 24GB) |
| Gsai Ml Llada 8B Instruct | 8B | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 8GB (have 24GB) |
| Gsai Ml Llada 8B Instruct | 8B | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 16GB (have 24GB) |
| Hmellor Tiny Random Llamaforcausallm | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Hmellor Tiny Random Llamaforcausallm | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Hmellor Tiny Random Llamaforcausallm | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 1GB (have 24GB) |
| Huggingfaceh4 Zephyr 7B Beta | 7B | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 4GB (have 24GB) |
| Huggingfaceh4 Zephyr 7B Beta | 7B | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 7GB (have 24GB) |
| Huggingfaceh4 Zephyr 7B Beta | 7B | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 14GB (have 24GB) |
| Huggingfacem4 Tiny Random Llamaforcausallm | Unknown | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 2GB (have 24GB) |
| Huggingfacem4 Tiny Random Llamaforcausallm | Unknown | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 4GB (have 24GB) |
| Huggingfacem4 Tiny Random Llamaforcausallm | Unknown | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 8GB (have 24GB) |
| Huggingfacetb Smollm 135M | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Huggingfacetb Smollm 135M | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Huggingfacetb Smollm 135M | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 1GB (have 24GB) |
| Huggingfacetb Smollm2 135M | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Huggingfacetb Smollm2 135M | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Huggingfacetb Smollm2 135M | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 1GB (have 24GB) |
| Huggyllama Llama 7B | 7B | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 4GB (have 24GB) |
| Huggyllama Llama 7B | 7B | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 7GB (have 24GB) |
| Huggyllama Llama 7B | 7B | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 14GB (have 24GB) |
| Ibm Granite Granite 3.3 2B Instruct | 2B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Ibm Granite Granite 3.3 2B Instruct | 2B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 2GB (have 24GB) |
| Ibm Granite Granite 3.3 2B Instruct | 2B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 4GB (have 24GB) |
| Ibm Granite Granite 3.3 8B Instruct | 8B | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 4GB (have 24GB) |
| Ibm Granite Granite 3.3 8B Instruct | 8B | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 8GB (have 24GB) |
| Ibm Granite Granite 3.3 8B Instruct | 8B | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 16GB (have 24GB) |
| Ibm Granite Granite Docling 258M | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Ibm Granite Granite Docling 258M | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 1GB (have 24GB) |
| Ibm Granite Granite Docling 258M | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 1GB (have 24GB) |
| Ibm Research Powermoe 3B | 3B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 2GB (have 24GB) |
| Ibm Research Powermoe 3B | 3B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 3GB (have 24GB) |
| Ibm Research Powermoe 3B | 3B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 6GB (have 24GB) |
| Ilyagusev Saiga Llama3 8B | 8B | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 2GB (have 24GB) |
| Ilyagusev Saiga Llama3 8B | 8B | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 4GB (have 24GB) |
| Ilyagusev Saiga Llama3 8B | 8B | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 8GB (have 24GB) |
| Inference Net Schematron 3B | 3B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 2GB (have 24GB) |
| Inference Net Schematron 3B | 3B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 3GB (have 24GB) |
| Inference Net Schematron 3B | 3B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 6GB (have 24GB) |
| Kaitchup Phi 3 Mini 4K Instruct Gptq 4bit | Unknown | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 2GB (have 24GB) |
| Kaitchup Phi 3 Mini 4K Instruct Gptq 4bit | Unknown | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 4GB (have 24GB) |
| Kaitchup Phi 3 Mini 4K Instruct Gptq 4bit | Unknown | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 8GB (have 24GB) |
| Liquidai Lfm2 1.2B | 1.2B | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 1GB (have 24GB) |
| Liquidai Lfm2 1.2B | 1.2B | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 2GB (have 24GB) |
| Liquidai Lfm2 1.2B | 1.2B | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 3GB (have 24GB) |
| Liuhaotian Llava V1.5 7B | 7B | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 4GB (have 24GB) |
| Liuhaotian Llava V1.5 7B | 7B | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 7GB (have 24GB) |
| Liuhaotian Llava V1.5 7B | 7B | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 14GB (have 24GB) |
| Llamafactory Tiny Random Llama 3 | Unknown | Q4 | Fits comfortably | 216.00 tok/s (estimated) | 2GB (have 24GB) |
| Llamafactory Tiny Random Llama 3 | Unknown | Q8 | Fits comfortably | 151.20 tok/s (estimated) | 3GB (have 24GB) |
| Llamafactory Tiny Random Llama 3 | Unknown | FP16 | Fits comfortably | 82.08 tok/s (estimated) | 6GB (have 24GB) |
| Lmstudio Community Deepseek R1 0528 Qwen3 8B Mlx 4bit | 8B | Q4 | Fits comfortably | 225.00 tok/s (estimated) | 4GB (have 24GB) |
| Lmstudio Community Deepseek R1 0528 Qwen3 8B Mlx 4bit | 8B | Q8 | Fits comfortably | 157.50 tok/s (estimated) | 8GB (have 24GB) |
| Lmstudio Community Deepseek R1 0528 Qwen3 8B Mlx 4bit | 8B | FP16 | Fits comfortably | 85.50 tok/s (estimated) | 16GB (have 24GB) |
| Lmstudio Community Deepseek R1 0528 Qwen3 8B Mlx 8bit | 8B | Q4 | Fits comfortably | 225.00 tok/s (estimated) | 4GB (have 24GB) |
| Lmstudio Community Deepseek R1 0528 Qwen3 8B Mlx 8bit | 8B | Q8 | Fits comfortably | 157.50 tok/s (estimated) | 8GB (have 24GB) |
| Lmstudio Community Deepseek R1 0528 Qwen3 8B Mlx 8bit | 8B | FP16 | Fits comfortably | 85.50 tok/s (estimated) | 16GB (have 24GB) |
| Lmstudio Community Qwen3 4B Thinking 2507 Mlx 4bit | 4B | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 2GB (have 24GB) |
| Lmstudio Community Qwen3 4B Thinking 2507 Mlx 4bit | 4B | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 4GB (have 24GB) |
| Lmstudio Community Qwen3 4B Thinking 2507 Mlx 4bit | 4B | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 8GB (have 24GB) |
| Lmstudio Community Qwen3 4B Thinking 2507 Mlx 6bit | 4B | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 2GB (have 24GB) |
| Lmstudio Community Qwen3 4B Thinking 2507 Mlx 6bit | 4B | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 4GB (have 24GB) |
| Lmstudio Community Qwen3 4B Thinking 2507 Mlx 6bit | 4B | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 8GB (have 24GB) |
| Lmstudio Community Qwen3 4B Thinking 2507 Mlx 8bit | 4B | Q4 | Fits comfortably | 180.00 tok/s (estimated) | 2GB (have 24GB) |
| Lmstudio Community Qwen3 4B Thinking 2507 Mlx 8bit | 4B | Q8 | Fits comfortably | 126.00 tok/s (estimated) | 4GB (have 24GB) |
| Lmstudio Community Qwen3 4B Thinking 2507 Mlx 8bit | 4B | FP16 | Fits comfortably | 68.40 tok/s (estimated) | 8GB (have 24GB) |
| Lmstudio Community Qwen3 Coder 30B A3b Instruct Mlx 4bit | 30B | Q4 | Fits comfortably | 99.00 tok/s (estimated) | 15GB (have 24GB) |
| Lmstudio Community Qwen3 Coder 30B A3b Instruct Mlx 4bit | 30B | Q8 | Not supported | 69.30 tok/s (estimated) | 30GB (have 24GB) |
| Lmstudio Community Qwen3 Coder 30B A3b Instruct Mlx 4bit | 30B | FP16 | Not supported | 37.62 tok/s (estimated) | 60GB (have 24GB) |
| Lmstudio Community Qwen3 Coder 30B A3b Instruct Mlx 5bit | 30B | Q4 | Fits comfortably | 99.00 tok/s (estimated) | 15GB (have 24GB) |
| Lmstudio Community Qwen3 Coder 30B A3b Instruct Mlx 5bit | 30B | Q8 | Not supported | 69.30 tok/sEstimated | 30GB (have 24GB) |
| Lmstudio Community Qwen3 Coder 30B A3b Instruct Mlx 5bit | 30B | FP16 | Not supported | 37.62 tok/sEstimated | 60GB (have 24GB) |
| Lmstudio Community Qwen3 Coder 30B A3b Instruct Mlx 6bit | 30B | Q4 | Fits comfortably | 99.00 tok/sEstimated | 15GB (have 24GB) |
| Lmstudio Community Qwen3 Coder 30B A3b Instruct Mlx 6bit | 30B | Q8 | Not supported | 69.30 tok/sEstimated | 30GB (have 24GB) |
| Lmstudio Community Qwen3 Coder 30B A3b Instruct Mlx 6bit | 30B | FP16 | Not supported | 37.62 tok/sEstimated | 60GB (have 24GB) |
| Lmstudio Community Qwen3 Coder 30B A3b Instruct Mlx 8bit | 30B | Q4 | Fits comfortably | 99.00 tok/sEstimated | 15GB (have 24GB) |
| Lmstudio Community Qwen3 Coder 30B A3b Instruct Mlx 8bit | 30B | Q8 | Not supported | 69.30 tok/sEstimated | 30GB (have 24GB) |
| Lmstudio Community Qwen3 Coder 30B A3b Instruct Mlx 8bit | 30B | FP16 | Not supported | 37.62 tok/sEstimated | 60GB (have 24GB) |
| Lmsys Vicuna 7B V1.5 | 7B | Q4 | Fits comfortably | 180.00 tok/sEstimated | 4GB (have 24GB) |
| Lmsys Vicuna 7B V1.5 | 7B | Q8 | Fits comfortably | 126.00 tok/sEstimated | 7GB (have 24GB) |
| Lmsys Vicuna 7B V1.5 | 7B | FP16 | Fits comfortably | 68.40 tok/sEstimated | 14GB (have 24GB) |
| Meta Llama Llama 2 13B Chat HF | 13B | Q4 | Fits comfortably | 135.00 tok/sEstimated | 7GB (have 24GB) |
| Meta Llama Llama 2 13B Chat HF | 13B | Q8 | Fits comfortably | 94.50 tok/sEstimated | 13GB (have 24GB) |
| Meta Llama Llama 2 13B Chat HF | 13B | FP16 | Not supported | 51.30 tok/sEstimated | 26GB (have 24GB) |
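
The rows with a known parameter count follow a simple pattern: VRAM looks like parameter count times bytes per weight, rounded up to a whole GB, and the Q8 and FP16 speeds look like fixed fractions (about 0.70 and 0.38) of the Q4 speed. The sketch below reproduces that pattern. The constants and function names are inferred from the table for illustration, not taken from the site's published methodology, and the estimate ignores KV cache and runtime overhead.

```python
import math

# Assumed bytes per weight for each quantization level (inferred from the table).
BYTES_PER_WEIGHT = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

# Assumed speed relative to Q4 (inferred from rows such as 180.00 / 126.00 / 68.40).
SPEED_VS_Q4 = {"Q4": 1.0, "Q8": 0.70, "FP16": 0.38}


def estimate_vram_gb(params_billions: float, quant: str) -> int:
    """Weights-only VRAM estimate in GB, rounded up to a whole GB."""
    return math.ceil(params_billions * BYTES_PER_WEIGHT[quant])


def estimate_speed(q4_tokens_per_sec: float, quant: str) -> float:
    """Scale a Q4 tokens/sec estimate to another quantization level."""
    return round(q4_tokens_per_sec * SPEED_VS_Q4[quant], 2)


def fits(params_billions: float, quant: str, vram_gb: int = 24) -> bool:
    """True when the weights-only estimate does not exceed available VRAM."""
    return estimate_vram_gb(params_billions, quant) <= vram_gb


if __name__ == "__main__":
    # Llama 2 13B Chat, using the 135.00 tok/s Q4 estimate from the table.
    for quant in ("Q4", "Q8", "FP16"):
        print(
            quant,
            f"{estimate_vram_gb(13, quant)}GB",
            f"{estimate_speed(135.0, quant):.2f} tok/s",
            "fits" if fits(13, quant) else "does not fit",
        )
```

A real deployment also needs room for the KV cache and framework overhead, so a model whose weights land close to 24GB may still fail to load.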
Note: Performance estimates are calculated, not measured. Real results may vary.
Data-backed answers pulled from community benchmarks, manufacturer specs, and live pricing.
Community llama.cpp benchmarks of the ubergarm/Qwen3-30B-A3B-GGUF build show the RTX 4090 sustaining roughly 150–160 tokens/sec with CUDA kernels, keeping decode latency under 7 ms per token.
Source: Reddit – /r/LocalLLaMA (mq59v1k)
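
The latency figure follows directly from the throughput: per-token decode latency is the reciprocal of tokens per second. A minimal check of that arithmetic, where the function name is illustrative:

```python
def ms_per_token(tokens_per_sec: float) -> float:
    """Convert decode throughput (tokens/sec) to per-token latency in milliseconds."""
    return 1000.0 / tokens_per_sec


if __name__ == "__main__":
    for rate in (150, 160):
        print(f"{rate} tok/s -> {ms_per_token(rate):.2f} ms/token")
    # 150 tok/s -> 6.67 ms/token
    # 160 tok/s -> 6.25 ms/token
```

Both ends of the reported range stay under 7 ms per token.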
A single 24 GB RTX 4090 cannot hold Llama 3.1 70B at Q4_K_M entirely in VRAM. Builders report roughly half the tensor pages spilling to system RAM, which drags throughput because PCIe becomes the bottleneck. Multi-GPU setups or 48 GB cards avoid the spill.
Source: Reddit – /r/LocalLLaMA (mqcouez)
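
A rough back-of-the-envelope calculation shows why the spill is so large. The sketch assumes about 4.85 bits per weight for Q4_K_M, an approximate figure that is not from the cited thread, and counts weights only. KV cache and runtime buffers also take VRAM, which pushes the real spill closer to the reported half.

```python
def spill_fraction(params_billions: float, bits_per_weight: float, vram_gb: float):
    """Return (weights size in GB, fraction of weights that cannot fit in VRAM)."""
    weights_gb = params_billions * bits_per_weight / 8
    spilled_gb = max(0.0, weights_gb - vram_gb)
    return weights_gb, spilled_gb / weights_gb


if __name__ == "__main__":
    # Assumption: ~4.85 bits/weight for Q4_K_M; 70B parameters; 24 GB card.
    size_gb, fraction = spill_fraction(70, 4.85, 24)
    print(f"weights ~{size_gb:.1f} GB, ~{fraction:.0%} spills to system RAM")
```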
Power users running multi-4090 racks note that a single 4090 comfortably hosts one 32B-class model; parallel agents or MoE workloads need tensor parallelism across multiple GPUs to keep speeds high.
Source: Reddit – /r/LocalLLaMA (mqwkgv3)
NVIDIA rates the RTX 4090 at 450 W board power and recommends at least an 850 W PSU with the 16-pin 12VHPWR connector to maintain headroom for AI workloads.
Source: TechPowerUp – RTX 4090 Specs
Our price tracker (Nov 2025) shows the RTX 4090 in stock on Amazon at $1,599.
Source: Supabase price tracker snapshot – 2025-11-03