Quick Answer: The RTX 4090 offers 24GB of GDDR6X VRAM; street pricing fluctuates, so check current market listings. In our estimated benchmarks it reaches approximately 236 tokens/sec on facebook/sam3 at Q4, and it typically draws 450W under load.
The RTX 4090 remains the go-to GPU for local AI workloads. It fits mainstream models up to roughly 30B parameters at Q4 entirely in VRAM (70B models need aggressive quantization plus partial CPU offload), sustains some of the fastest consumer inference speeds available, and anchors premium builds that scale toward production deployments. A quick way to check whether a model fits is sketched below.
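Before downloading a model, you can sanity-check whether it fits in the 4090's 24GB by estimating the weight footprint from parameter count and quantization width. This is a rough rule of thumb, not a measured profile; the 20% overhead factor for KV cache and runtime buffers is an assumption, and real usage varies with context length and runtime.

```python
# Rough VRAM fit check: weights = params * bytes_per_param, plus an
# assumed ~20% overhead for KV cache and CUDA buffers.
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def fits_in_vram(params_billions: float, quant: str, vram_gb: float = 24.0) -> bool:
    weights_gb = params_billions * BYTES_PER_PARAM[quant]
    estimated_gb = weights_gb * 1.2  # assumed overhead factor
    print(f"{params_billions}B @ {quant}: ~{estimated_gb:.1f}GB needed")
    return estimated_gb <= vram_gb

fits_in_vram(8, "FP16")   # ~19.2GB -> fits
fits_in_vram(32, "Q4")    # ~19.2GB -> fits
fits_in_vram(70, "Q4")    # ~42.0GB -> exceeds 24GB, needs CPU offload
```

By this estimate, dense models up to roughly 30B fit at Q4, which matches the table below: 70B entries at Q4 land around 34GB and therefore spill past 24GB.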
All throughput figures below are estimates from auto-generated benchmarks, not measured runs.

| Model | Quantization | Tokens/sec (estimated) | VRAM used |
|---|---|---|---|
| facebook/sam3 | Q4 | 236.38 tok/s | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 234.27 tok/s | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 233.46 tok/s | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 233.12 tok/s | 2GB |
| google/embeddinggemma-300m | Q4 | 232.70 tok/s | 1GB |
| google/gemma-2b | Q4 | 228.89 tok/s | 1GB |
| tencent/HunyuanOCR | Q4 | 228.49 tok/s | 1GB |
| unsloth/gemma-3-1b-it | Q4 | 227.22 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 225.83 tok/s | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 224.36 tok/s | 1GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 224.29 tok/s | 1GB |
| google/gemma-2-2b-it | Q4 | 223.85 tok/s | 1GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 222.45 tok/s | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 222.08 tok/s | 2GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 221.48 tok/s | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 219.94 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 217.03 tok/s | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 216.90 tok/s | 2GB |
| bigcode/starcoder2-3b | Q4 | 216.49 tok/s | 2GB |
| google-bert/bert-base-uncased | Q4 | 210.99 tok/s | 1GB |
| google-t5/t5-3b | Q4 | 210.07 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | Q4 | 210.07 tok/s | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 208.48 tok/s | 2GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 207.96 tok/s | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 207.77 tok/s | 2GB |
| google/gemma-3-1b-it | Q4 | 206.52 tok/s | 1GB |
| inference-net/Schematron-3B | Q4 | 201.03 tok/s | 2GB |
| nari-labs/Dia2-2B | Q4 | 200.46 tok/s | 2GB |
| deepseek-ai/DeepSeek-V3-0324 | Q4 | 197.87 tok/s | 4GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 197.50 tok/s | 1GB |
| MiniMaxAI/MiniMax-M2 | Q4 | 196.84 tok/s | 4GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 195.34 tok/s | 1GB |
| Qwen/Qwen3-0.6B | Q4 | 195.26 tok/s | 3GB |
| meta-llama/Llama-2-7b-hf | Q4 | 195.21 tok/s | 4GB |
| microsoft/Phi-4-mini-instruct | Q4 | 195.16 tok/s | 4GB |
| microsoft/phi-2 | Q4 | 195.13 tok/s | 4GB |
| petals-team/StableBeluga2 | Q4 | 194.92 tok/s | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 194.72 tok/s | 2GB |
| microsoft/Phi-3.5-vision-instruct | Q4 | 194.59 tok/s | 4GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q4 | 194.38 tok/s | 2GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q4 | 194.15 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 193.62 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 192.09 tok/s | 4GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | 191.91 tok/s | 3GB |
| microsoft/phi-4 | Q4 | 191.74 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q4 | 191.19 tok/s | 3GB |
| openai-community/gpt2-large | Q4 | 190.83 tok/s | 4GB |
| openai-community/gpt2 | Q4 | 190.49 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 190.40 tok/s | 2GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q4 | 190.28 tok/s | 4GB |
| Qwen/Qwen3-1.7B-Base | Q4 | 189.42 tok/s | 4GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 189.35 tok/s | 4GB |
| vikhyatk/moondream2 | Q4 | 189.33 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 189.29 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q4 | 189.02 tok/s | 4GB |
| openai-community/gpt2-medium | Q4 | 188.60 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 188.59 tok/s | 4GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | 188.31 tok/s | 3GB |
| skt/kogpt2-base-v2 | Q4 | 188.25 tok/s | 4GB |
| Qwen/Qwen3-4B-Base | Q4 | 187.88 tok/s | 2GB |
| hmellor/tiny-random-LlamaForCausalLM | Q4 | 187.78 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 187.77 tok/s | 4GB |
| zai-org/GLM-4.5-Air | Q4 | 187.76 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q4 | 187.29 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | 187.10 tok/s | 4GB |
| dicta-il/dictalm2.0-instruct | Q4 | 186.96 tok/s | 4GB |
| openai-community/gpt2-xl | Q4 | 186.77 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q4 | 186.68 tok/s | 4GB |
| Qwen/Qwen2-7B-Instruct | Q4 | 186.64 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 186.36 tok/s | 2GB |
| microsoft/VibeVoice-1.5B | Q4 | 186.23 tok/s | 3GB |
| HuggingFaceTB/SmolLM-135M | Q4 | 186.01 tok/s | 4GB |
| tencent/HunyuanVideo-1.5 | Q4 | 185.62 tok/s | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 185.49 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 185.48 tok/s | 2GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 185.08 tok/s | 4GB |
| liuhaotian/llava-v1.5-7b | Q4 | 185.06 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 184.87 tok/s | 2GB |
| deepseek-ai/DeepSeek-R1 | Q4 | 184.80 tok/s | 4GB |
| bigscience/bloomz-560m | Q4 | 184.62 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B | Q4 | 184.57 tok/s | 3GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q4 | 184.55 tok/s | 4GB |
| rinna/japanese-gpt-neox-small | Q4 | 184.42 tok/s | 4GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q4 | 184.41 tok/s | 4GB |
| distilbert/distilgpt2 | Q4 | 184.15 tok/s | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 184.14 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | 183.97 tok/s | 3GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 183.67 tok/s | 2GB |
| meta-llama/Llama-3.1-8B | Q4 | 183.58 tok/s | 4GB |
| facebook/opt-125m | Q4 | 183.28 tok/s | 4GB |
| rednote-hilab/dots.ocr | Q4 | 183.17 tok/s | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 183.11 tok/s | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 183.05 tok/s | 3GB |
| Qwen/Qwen3-Embedding-4B | Q4 | 182.60 tok/s | 2GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 182.40 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 182.27 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | 182.19 tok/s | 4GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 181.92 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | 181.77 tok/s | 2GB |
| microsoft/DialoGPT-medium | Q4 | 181.41 tok/s | 4GB |
| google/gemma-3-270m-it | Q4 | 181.39 tok/s | 4GB |
| HuggingFaceTB/SmolLM2-135M | Q4 | 181.24 tok/s | 4GB |
| Tongyi-MAI/Z-Image-Turbo | Q4 | 181.06 tok/s | 4GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q4 | 180.49 tok/s | 4GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 180.48 tok/s | 4GB |
| huggyllama/llama-7b | Q4 | 180.27 tok/s | 4GB |
| black-forest-labs/FLUX.2-dev | Q4 | 180.14 tok/s | 4GB |
| sshleifer/tiny-gpt2 | Q4 | 179.43 tok/s | 4GB |
| Qwen/Qwen-Image-Edit-2509 | Q4 | 179.08 tok/s | 4GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q4 | 178.46 tok/s | 4GB |
| Qwen/Qwen3-1.7B | Q4 | 176.83 tok/s | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 176.74 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 176.70 tok/s | 2GB |
| Qwen/Qwen3-8B | Q4 | 176.20 tok/s | 4GB |
| black-forest-labs/FLUX.1-dev | Q4 | 176.12 tok/s | 4GB |
| numind/NuExtract-1.5 | Q4 | 175.87 tok/s | 4GB |
| parler-tts/parler-tts-large-v1 | Q4 | 175.26 tok/s | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 174.99 tok/s | 4GB |
| ibm-granite/granite-docling-258M | Q4 | 174.50 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Instruct | Q4 | 174.37 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | 174.25 tok/s | 4GB |
| llamafactory/tiny-random-Llama-3 | Q4 | 173.66 tok/s | 4GB |
| lmsys/vicuna-7b-v1.5 | Q4 | 173.55 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 173.10 tok/s | 3GB |
| EleutherAI/pythia-70m-deduped | Q4 | 172.40 tok/s | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 172.07 tok/s | 4GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 171.62 tok/s | 3GB |
| zai-org/GLM-4.6-FP8 | Q4 | 171.26 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-0528 | Q4 | 171.10 tok/s | 4GB |
| GSAI-ML/LLaDA-8B-Base | Q4 | 170.77 tok/s | 4GB |
| microsoft/DialoGPT-small | Q4 | 170.51 tok/s | 4GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q4 | 170.21 tok/s | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 170.00 tok/s | 3GB |
| Qwen/Qwen2-1.5B-Instruct | Q4 | 169.97 tok/s | 3GB |
| IlyaGusev/saiga_llama3_8b | Q4 | 169.41 tok/s | 4GB |
| Qwen/Qwen3-8B-Base | Q4 | 168.70 tok/s | 4GB |
| Qwen/Qwen3-Embedding-8B | Q4 | 168.45 tok/s | 4GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q4 | 168.29 tok/s | 2GB |
| deepseek-ai/DeepSeek-V3 | Q4 | 168.21 tok/s | 4GB |
| Qwen/Qwen3-8B-FP8 | Q4 | 167.45 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q4 | 167.44 tok/s | 4GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 167.29 tok/s | 4GB |
| Qwen/Qwen2-0.5B-Instruct | Q4 | 167.08 tok/s | 3GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q4 | 166.05 tok/s | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 165.82 tok/s | 3GB |
| ibm-granite/granite-3.3-2b-instruct | Q8 | 165.78 tok/s | 2GB |
| deepseek-ai/DeepSeek-OCR | Q8 | 165.23 tok/s | 4GB |
| Qwen/Qwen3-Reranker-0.6B | Q4 | 165.11 tok/s | 3GB |
| microsoft/Phi-3-mini-128k-instruct | Q4 | 164.87 tok/s | 4GB |
| meta-llama/Llama-Guard-3-8B | Q4 | 164.14 tok/s | 4GB |
| BSC-LT/salamandraTA-7b-instruct | Q4 | 164.06 tok/s | 4GB |
| Qwen/Qwen2.5-3B | Q8 | 163.95 tok/s | 3GB |
| Qwen/Qwen2.5-7B | Q4 | 163.72 tok/s | 4GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | 162.66 tok/s | 3GB |
| google/embeddinggemma-300m | Q8 | 162.31 tok/s | 1GB |
| Qwen/Qwen3-4B | Q4 | 162.26 tok/s | 2GB |
| google-t5/t5-3b | Q8 | 161.59 tok/s | 3GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | 161.08 tok/s | 1GB |
| meta-llama/Llama-3.2-3B | Q8 | 160.59 tok/s | 3GB |
| ibm-research/PowerMoE-3b | Q8 | 159.85 tok/s | 3GB |
| apple/OpenELM-1_1B-Instruct | Q8 | 158.99 tok/s | 1GB |
| meta-llama/Llama-3.2-1B | Q8 | 158.57 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | 157.86 tok/s | 1GB |
| tencent/HunyuanOCR | Q8 | 157.55 tok/s | 2GB |
| WeiboAI/VibeThinker-1.5B | Q8 | 155.83 tok/s | 2GB |
| google-bert/bert-base-uncased | Q8 | 155.13 tok/s | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | 155.13 tok/s | 3GB |
| facebook/sam3 | Q8 | 153.17 tok/s | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 150.30 tok/s | 3GB |
| unsloth/Llama-3.2-1B-Instruct | Q8 | 149.29 tok/s | 1GB |
| google/gemma-2-9b-it | Q4 | 147.98 tok/s | 5GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 147.71 tok/s | 8GB |
| bigcode/starcoder2-3b | Q8 | 147.29 tok/s | 3GB |
| LiquidAI/LFM2-1.2B | Q8 | 146.98 tok/s | 2GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q4 | 146.74 tok/s | 4GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | 146.36 tok/s | 3GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | 145.64 tok/s | 7GB |
| meta-llama/Llama-Guard-3-1B | Q8 | 145.31 tok/s | 1GB |
| nari-labs/Dia2-2B | Q8 | 145.10 tok/s | 3GB |
| google/gemma-2-2b-it | Q8 | 144.82 tok/s | 2GB |
| ai-forever/ruGPT-3.5-13B | Q4 | 144.64 tok/s | 7GB |
| google/gemma-3-1b-it | Q8 | 144.61 tok/s | 1GB |
| EssentialAI/rnj-1 | Q4 | 143.12 tok/s | 5GB |
| allenai/OLMo-2-0425-1B | Q8 | 142.91 tok/s | 1GB |
| meta-llama/Llama-2-13b-chat-hf | Q4 | 142.39 tok/s | 7GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q8 | 141.96 tok/s | 3GB |
| inference-net/Schematron-3B | Q8 | 140.67 tok/s | 3GB |
| unsloth/gemma-3-1b-it | Q8 | 139.27 tok/s | 1GB |
| google/gemma-2b | Q8 | 138.53 tok/s | 2GB |
| Qwen/Qwen-Image-Edit-2509 | Q8 | 138.50 tok/s | 8GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | 138.42 tok/s | 6GB |
| Tongyi-MAI/Z-Image-Turbo | Q8 | 138.18 tok/s | 8GB |
| Qwen/Qwen3-0.6B | Q8 | 138.13 tok/s | 6GB |
| OpenPipe/Qwen3-14B-Instruct | Q4 | 138.10 tok/s | 7GB |
| Qwen/Qwen3-8B-FP8 | Q8 | 137.69 tok/s | 9GB |
| liuhaotian/llava-v1.5-7b | Q8 | 137.42 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B | Q8 | 137.14 tok/s | 9GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q8 | 136.93 tok/s | 9GB |
| Qwen/Qwen2.5-0.5B | Q8 | 136.91 tok/s | 5GB |
| deepseek-ai/DeepSeek-R1-0528 | Q8 | 136.76 tok/s | 7GB |
| unsloth/Llama-3.2-3B-Instruct | Q8 | 136.58 tok/s | 3GB |
| Qwen/Qwen3-14B | Q4 | 136.55 tok/s | 7GB |
| microsoft/DialoGPT-medium | Q8 | 136.38 tok/s | 7GB |
| Qwen/Qwen2.5-7B | Q8 | 136.30 tok/s | 7GB |
| mistralai/Mistral-7B-Instruct-v0.2 | Q8 | 136.10 tok/s | 7GB |
| meta-llama/Llama-Guard-3-8B | Q8 | 136.04 tok/s | 9GB |
| EleutherAI/pythia-70m-deduped | Q8 | 135.90 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Q8 | 135.87 tok/s | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | 135.78 tok/s | 9GB |
| Qwen/Qwen3-Reranker-0.6B | Q8 | 135.65 tok/s | 6GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q4 | 135.49 tok/s | 5GB |
| allenai/Olmo-3-7B-Think | Q8 | 135.05 tok/s | 8GB |
| HuggingFaceTB/SmolLM2-135M | Q8 | 134.92 tok/s | 7GB |
| petals-team/StableBeluga2 | Q8 | 134.57 tok/s | 7GB |
| google/gemma-3-270m-it | Q8 | 134.54 tok/s | 7GB |
| meta-llama/Llama-2-7b-hf | Q8 | 134.16 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 133.43 tok/s | 8GB |
| zai-org/GLM-4.6-FP8 | Q8 | 133.34 tok/s | 7GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | 133.27 tok/s | 7GB |
| microsoft/Phi-3-medium-128k-instruct | Q4 | 133.07 tok/s | 7GB |
| Qwen/Qwen2.5-Math-1.5B | Q8 | 133.00 tok/s | 5GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 132.89 tok/s | 7GB |
| Qwen/Qwen3-14B-Base | Q4 | 132.51 tok/s | 7GB |
| sshleifer/tiny-gpt2 | Q8 | 132.47 tok/s | 7GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | 132.32 tok/s | 5GB |
| hmellor/tiny-random-LlamaForCausalLM | Q8 | 132.24 tok/s | 7GB |
| HuggingFaceTB/SmolLM-135M | Q8 | 132.09 tok/s | 7GB |
| bigscience/bloomz-560m | Q8 | 132.00 tok/s | 7GB |
| microsoft/phi-4 | Q8 | 131.74 tok/s | 7GB |
| llamafactory/tiny-random-Llama-3 | Q8 | 131.42 tok/s | 7GB |
| huggyllama/llama-7b | Q8 | 131.19 tok/s | 7GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | Q8 | 130.90 tok/s | 9GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | 130.79 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q8 | 130.64 tok/s | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | 130.55 tok/s | 7GB |
| Qwen/Qwen2-1.5B-Instruct | Q8 | 130.09 tok/s | 5GB |
| openai-community/gpt2-xl | Q8 | 129.47 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Q8 | 129.30 tok/s | 9GB |
| Qwen/Qwen3-1.7B | Q8 | 128.73 tok/s | 7GB |
| Qwen/Qwen2.5-14B | Q4 | 128.65 tok/s | 7GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q4 | 128.52 tok/s | 8GB |
| deepseek-ai/DeepSeek-V3 | Q8 | 128.47 tok/s | 7GB |
| Qwen/Qwen2-0.5B | Q8 | 128.40 tok/s | 5GB |
| MiniMaxAI/MiniMax-M2 | Q8 | 128.39 tok/s | 7GB |
| zai-org/GLM-4.5-Air | Q8 | 127.77 tok/s | 7GB |
| parler-tts/parler-tts-large-v1 | Q8 | 127.74 tok/s | 7GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | Q8 | 127.67 tok/s | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q8 | 127.49 tok/s | 4GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q8 | 127.21 tok/s | 7GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | Q8 | 127.04 tok/s | 7GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | 126.75 tok/s | 4GB |
| Qwen/Qwen2.5-1.5B | Q8 | 126.67 tok/s | 5GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | 126.51 tok/s | 4GB |
| mistralai/Mistral-7B-Instruct-v0.1 | Q8 | 126.43 tok/s | 7GB |
| microsoft/Phi-3-mini-4k-instruct | Q8 | 126.12 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | 125.73 tok/s | 5GB |
| ibm-granite/granite-docling-258M | Q8 | 125.49 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | 125.42 tok/s | 9GB |
| distilbert/distilgpt2 | Q8 | 125.22 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q8 | 125.13 tok/s | 9GB |
| microsoft/VibeVoice-1.5B | Q8 | 125.03 tok/s | 5GB |
| microsoft/Phi-3.5-vision-instruct | Q8 | 124.61 tok/s | 7GB |
| lmsys/vicuna-7b-v1.5 | Q8 | 124.53 tok/s | 7GB |
| meta-llama/Llama-2-7b-chat-hf | Q8 | 124.49 tok/s | 7GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | Q8 | 124.34 tok/s | 9GB |
| tencent/HunyuanVideo-1.5 | Q8 | 124.24 tok/s | 8GB |
| microsoft/Phi-4-mini-instruct | Q8 | 124.00 tok/s | 7GB |
| EleutherAI/gpt-neo-125m | Q8 | 123.98 tok/s | 7GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | 123.81 tok/s | 5GB |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | 123.49 tok/s | 3GB |
| Qwen/Qwen3-Embedding-4B | Q8 | 122.70 tok/s | 4GB |
| Qwen/Qwen3-0.6B-Base | Q8 | 122.48 tok/s | 6GB |
| ibm-granite/granite-3.3-8b-instruct | Q8 | 122.19 tok/s | 9GB |
| GSAI-ML/LLaDA-8B-Base | Q8 | 122.09 tok/s | 9GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | 121.99 tok/s | 7GB |
| vikhyatk/moondream2 | Q8 | 121.69 tok/s | 7GB |
| microsoft/DialoGPT-small | Q8 | 121.69 tok/s | 7GB |
| microsoft/Phi-4-multimodal-instruct | Q8 | 121.52 tok/s | 7GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | Q8 | 121.43 tok/s | 9GB |
| Qwen/Qwen3-8B-Base | Q8 | 120.67 tok/s | 9GB |
| black-forest-labs/FLUX.2-dev | Q8 | 120.66 tok/s | 8GB |
| Qwen/Qwen3-1.7B-Base | Q8 | 120.43 tok/s | 7GB |
| BSC-LT/salamandraTA-7b-instruct | Q8 | 120.07 tok/s | 7GB |
| Qwen/Qwen3-4B-Base | Q8 | 119.81 tok/s | 4GB |
| meta-llama/Llama-3.1-8B | Q8 | 119.64 tok/s | 9GB |
| Qwen/Qwen2-7B-Instruct | Q8 | 119.45 tok/s | 7GB |
| GSAI-ML/LLaDA-8B-Instruct | Q8 | 119.37 tok/s | 9GB |
| Qwen/Qwen3-4B | Q8 | 119.28 tok/s | 4GB |
| microsoft/Phi-3.5-mini-instruct | Q8 | 119.00 tok/s | 4GB |
| deepseek-ai/DeepSeek-V3-0324 | Q8 | 118.86 tok/s | 7GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | Q8 | 118.60 tok/s | 7GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q8 | 118.50 tok/s | 9GB |
| Qwen/Qwen2.5-Coder-1.5B | Q8 | 118.28 tok/s | 5GB |
| microsoft/phi-2 | Q8 | 118.24 tok/s | 7GB |
| IlyaGusev/saiga_llama3_8b | Q8 | 118.10 tok/s | 9GB |
| mistralai/Mistral-7B-v0.1 | Q8 | 118.03 tok/s | 7GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | 117.80 tok/s | 5GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | Q8 | 117.70 tok/s | 7GB |
| openai-community/gpt2 | Q8 | 117.59 tok/s | 7GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | Q8 | 117.32 tok/s | 4GB |
| Qwen/Qwen3-8B | Q8 | 116.60 tok/s | 9GB |
| facebook/opt-125m | Q8 | 116.45 tok/s | 7GB |
| Qwen/Qwen2-0.5B-Instruct | Q8 | 116.34 tok/s | 5GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q8 | 116.17 tok/s | 5GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | Q8 | 116.02 tok/s | 7GB |
| openai-community/gpt2-large | Q8 | 115.53 tok/s | 7GB |
| Qwen/Qwen3-4B-Thinking-2507 | Q8 | 115.04 tok/s | 4GB |
| skt/kogpt2-base-v2 | Q8 | 114.96 tok/s | 7GB |
| dicta-il/dictalm2.0-instruct | Q8 | 114.87 tok/s | 7GB |
| microsoft/Phi-3-mini-128k-instruct | Q8 | 114.59 tok/s | 7GB |
| rednote-hilab/dots.ocr | Q8 | 114.48 tok/s | 7GB |
| rinna/japanese-gpt-neox-small | Q8 | 114.33 tok/s | 7GB |
| black-forest-labs/FLUX.1-dev | Q8 | 114.23 tok/s | 8GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 114.22 tok/s | 9GB |
| HuggingFaceH4/zephyr-7b-beta | Q8 | 114.12 tok/s | 7GB |
| deepseek-ai/DeepSeek-R1 | Q8 | 114.11 tok/s | 7GB |
| openai-community/gpt2-medium | Q8 | 113.88 tok/s | 7GB |
| numind/NuExtract-1.5 | Q8 | 113.79 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | 113.61 tok/s | 7GB |
| openai/gpt-oss-safeguard-20b | Q4 | 108.86 tok/s | 11GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q4 | 103.64 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q4 | 102.52 tok/s | 15GB |
| Qwen/Qwen3-14B-Base | Q8 | 101.56 tok/s | 14GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q4 | 101.40 tok/s | 15GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 101.38 tok/s | 14GB |
| Qwen/Qwen2.5-14B | Q8 | 101.33 tok/s | 14GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q4 | 100.80 tok/s | 15GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q4 | 100.30 tok/s | 10GB |
| mistralai/Ministral-3-14B-Instruct-2512 | Q8 | 99.08 tok/s | 16GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q4 | 98.79 tok/s | 10GB |
| Qwen/Qwen2.5-14B-Instruct | Q8 | 98.63 tok/s | 15GB |
| google/gemma-2-27b-it | Q4 | 98.55 tok/s | 14GB |
| Qwen/Qwen3-14B | Q8 | 98.08 tok/s | 14GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | 97.41 tok/s | 10GB |
| microsoft/Phi-3-medium-128k-instruct | Q8 | 97.30 tok/s | 14GB |
| NousResearch/Hermes-3-Llama-3.1-8B | Q8 | 96.97 tok/s | 9GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | 96.32 tok/s | 15GB |
| google/gemma-2-9b-it | Q8 | 96.14 tok/s | 10GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | 95.32 tok/s | 15GB |
| OpenPipe/Qwen3-14B-Instruct | Q8 | 95.19 tok/s | 14GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q4 | 93.30 tok/s | 15GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | 92.85 tok/s | 9GB |
| ai-forever/ruGPT-3.5-13B | Q8 | 92.51 tok/s | 13GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q4 | 92.02 tok/s | 15GB |
| EssentialAI/rnj-1 | Q8 | 91.89 tok/s | 10GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q4 | 91.46 tok/s | 15GB |
| mistralai/Mistral-Small-Instruct-2409 | Q4 | 90.52 tok/s | 11GB |
| Qwen/Qwen2.5-3B-Instruct | FP16 | 89.56 tok/s | 6GB |
| openai/gpt-oss-20b | Q4 | 89.49 tok/s | 10GB |
| google/gemma-2b | FP16 | 89.35 tok/s | 4GB |
| unsloth/Llama-3.2-3B-Instruct | FP16 | 89.02 tok/s | 6GB |
| bigcode/starcoder2-3b | FP16 | 88.81 tok/s | 6GB |
| meta-llama/Llama-3.2-1B | FP16 | 88.64 tok/s | 2GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | Q8 | 88.09 tok/s | 10GB |
| meta-llama/Llama-Guard-3-1B | FP16 | 87.18 tok/s | 2GB |
| meta-llama/Llama-2-13b-chat-hf | Q8 | 86.72 tok/s | 13GB |
| LiquidAI/LFM2-1.2B | FP16 | 86.54 tok/s | 4GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | 86.47 tok/s | 2GB |
| unsloth/gemma-3-1b-it | FP16 | 86.00 tok/s | 2GB |
| google/embeddinggemma-300m | FP16 | 85.90 tok/s | 1GB |
| ibm-research/PowerMoE-3b | FP16 | 85.85 tok/s | 6GB |
| allenai/OLMo-2-0425-1B | FP16 | 85.68 tok/s | 2GB |
| google-t5/t5-3b | FP16 | 83.79 tok/s | 6GB |
| deepseek-ai/DeepSeek-OCR | FP16 | 83.66 tok/s | 7GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | 82.59 tok/s | 2GB |
| google/gemma-2-2b-it | FP16 | 82.56 tok/s | 4GB |
| unsloth/Llama-3.2-1B-Instruct | FP16 | 82.52 tok/s | 2GB |
| google/gemma-3-1b-it | FP16 | 82.49 tok/s | 2GB |
| inference-net/Schematron-3B | FP16 | 79.96 tok/s | 6GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | 79.83 tok/s | 6GB |
| nari-labs/Dia2-2B | FP16 | 79.73 tok/s | 5GB |
| apple/OpenELM-1_1B-Instruct | FP16 | 79.66 tok/s | 2GB |
| meta-llama/Llama-3.2-3B | FP16 | 79.44 tok/s | 6GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 79.28 tok/s | 6GB |
| Qwen/Qwen2.5-3B | FP16 | 78.43 tok/s | 6GB |
| facebook/sam3 | FP16 | 78.17 tok/s | 2GB |
| WeiboAI/VibeThinker-1.5B | FP16 | 76.82 tok/s | 4GB |
| google-bert/bert-base-uncased | FP16 | 76.58 tok/s | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | FP16 | 76.40 tok/s | 6GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | Q8 | 75.99 tok/s | 31GB |
| ibm-granite/granite-3.3-2b-instruct | FP16 | 75.98 tok/s | 4GB |
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 75.24 tok/s | 17GB |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | FP16 | 75.04 tok/s | 9GB |
| openai-community/gpt2-large | FP16 | 74.89 tok/s | 15GB |
| black-forest-labs/FLUX.1-dev | FP16 | 74.80 tok/s | 16GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | Q8 | 74.79 tok/s | 31GB |
| meta-llama/Llama-2-7b-chat-hf | FP16 | 74.78 tok/s | 15GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | FP16 | 74.73 tok/s | 17GB |
| skt/kogpt2-base-v2 | FP16 | 74.62 tok/s | 15GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | 74.59 tok/s | 13GB |
| microsoft/Phi-3-mini-4k-instruct | FP16 | 74.57 tok/s | 15GB |
| tencent/HunyuanOCR | FP16 | 74.22 tok/s | 3GB |
| rednote-hilab/dots.ocr | FP16 | 74.06 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B | Q8 | 74.04 tok/s | 31GB |
| rinna/japanese-gpt-neox-small | FP16 | 73.97 tok/s | 15GB |
| dicta-il/dictalm2.0-instruct | FP16 | 73.91 tok/s | 15GB |
| unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | 73.65 tok/s | 17GB |
| IlyaGusev/saiga_llama3_8b | FP16 | 73.62 tok/s | 17GB |
| Qwen/Qwen3-4B-Thinking-2507 | FP16 | 73.61 tok/s | 9GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | FP16 | 73.52 tok/s | 15GB |
| unsloth/gpt-oss-20b-BF16 | Q8 | 73.52 tok/s | 20GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | FP16 | 73.45 tok/s | 11GB |
| hmellor/tiny-random-LlamaForCausalLM | FP16 | 73.35 tok/s | 15GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | FP16 | 73.34 tok/s | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | Q8 | 73.20 tok/s | 31GB |
| openai/gpt-oss-20b | Q8 | 72.99 tok/s | 20GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | FP16 | 72.93 tok/s | 9GB |
| bigscience/bloomz-560m | FP16 | 72.84 tok/s | 15GB |
| meta-llama/Llama-2-7b-hf | FP16 | 72.34 tok/s | 15GB |
| numind/NuExtract-1.5 | FP16 | 72.10 tok/s | 15GB |
| allenai/Olmo-3-7B-Think | FP16 | 71.99 tok/s | 16GB |
| meta-llama/Meta-Llama-3-8B | FP16 | 71.94 tok/s | 17GB |
| microsoft/Phi-4-mini-instruct | FP16 | 71.79 tok/s | 15GB |
| Qwen/Qwen3-8B-FP8 | FP16 | 71.78 tok/s | 17GB |
| zai-org/GLM-4.5-Air | FP16 | 71.76 tok/s | 15GB |
| Qwen/Qwen3-1.7B-Base | FP16 | 71.72 tok/s | 15GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | 71.63 tok/s | 15GB |
| meta-llama/Llama-3.1-8B | FP16 | 71.54 tok/s | 17GB |
| Qwen/Qwen3-Embedding-8B | FP16 | 71.53 tok/s | 17GB |
| Qwen/Qwen3-1.7B | FP16 | 71.50 tok/s | 15GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 71.47 tok/s | 16GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 71.28 tok/s | 17GB |
| microsoft/phi-4 | FP16 | 71.26 tok/s | 15GB |
| microsoft/Phi-3-mini-128k-instruct | FP16 | 71.09 tok/s | 15GB |
| mistralai/Mistral-7B-Instruct-v0.2 | FP16 | 71.07 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | FP16 | 71.00 tok/s | 9GB |
| tencent/HunyuanVideo-1.5 | FP16 | 70.94 tok/s | 16GB |
| HuggingFaceM4/tiny-random-LlamaForCausalLM | FP16 | 70.75 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B | FP16 | 70.66 tok/s | 11GB |
| Qwen/Qwen3-4B-Base | FP16 | 70.65 tok/s | 9GB |
| Qwen/Qwen2-0.5B-Instruct | FP16 | 70.54 tok/s | 11GB |
| Qwen/Qwen3-8B | FP16 | 70.46 tok/s | 17GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | 70.42 tok/s | 11GB |
| Qwen/Qwen2.5-Math-1.5B | FP16 | 70.40 tok/s | 11GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | 70.33 tok/s | 31GB |
| GSAI-ML/LLaDA-8B-Instruct | FP16 | 70.08 tok/s | 17GB |
| Qwen/Qwen3-0.6B-Base | FP16 | 69.93 tok/s | 13GB |
| deepseek-ai/DeepSeek-V3 | FP16 | 69.93 tok/s | 15GB |
| Qwen/Qwen2.5-Coder-7B-Instruct | FP16 | 69.88 tok/s | 15GB |
| Qwen/Qwen3-4B | FP16 | 69.87 tok/s | 9GB |
| black-forest-labs/FLUX.2-dev | FP16 | 69.82 tok/s | 16GB |
| mistralai/Mistral-7B-Instruct-v0.1 | FP16 | 69.61 tok/s | 15GB |
| microsoft/Phi-4-multimodal-instruct | FP16 | 69.38 tok/s | 15GB |
| Qwen/Qwen2.5-7B | FP16 | 69.01 tok/s | 15GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Q8 | 68.96 tok/s | 20GB |
| facebook/opt-125m | FP16 | 68.90 tok/s | 15GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | 68.88 tok/s | 9GB |
| EleutherAI/pythia-70m-deduped | FP16 | 68.78 tok/s | 15GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 68.60 tok/s | 8GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q4 | 68.50 tok/s | 17GB |
| zai-org/GLM-4.6-FP8 | FP16 | 68.50 tok/s | 15GB |
| HuggingFaceH4/zephyr-7b-beta | FP16 | 68.46 tok/s | 15GB |
| swiss-ai/Apertus-8B-Instruct-2509 | FP16 | 68.28 tok/s | 17GB |
| microsoft/VibeVoice-1.5B | FP16 | 68.20 tok/s | 11GB |
| lmsys/vicuna-7b-v1.5 | FP16 | 68.00 tok/s | 15GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | Q8 | 67.94 tok/s | 20GB |
| sshleifer/tiny-gpt2 | FP16 | 67.87 tok/s | 15GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | Q8 | 67.85 tok/s | 31GB |
| baichuan-inc/Baichuan-M2-32B | Q4 | 67.71 tok/s | 16GB |
| Qwen/Qwen3-Embedding-4B | FP16 | 67.66 tok/s | 9GB |
| llamafactory/tiny-random-Llama-3 | FP16 | 67.43 tok/s | 15GB |
| mistralai/Mistral-7B-v0.1 | FP16 | 67.42 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 67.40 tok/s | 17GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q4 | 67.30 tok/s | 16GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | Q8 | 67.16 tok/s | 31GB |
| Qwen/Qwen2-7B-Instruct | FP16 | 67.12 tok/s | 15GB |
| HuggingFaceTB/SmolLM2-135M | FP16 | 66.95 tok/s | 15GB |
| unsloth/mistral-7b-v0.3-bnb-4bit | FP16 | 66.85 tok/s | 15GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | 66.71 tok/s | 15GB |
| Qwen/Qwen3-Reranker-0.6B | FP16 | 66.68 tok/s | 13GB |
| Qwen/Qwen-Image-Edit-2509 | FP16 | 66.52 tok/s | 16GB |
| Qwen/Qwen2-0.5B | FP16 | 66.50 tok/s | 11GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | FP16 | 66.48 tok/s | 17GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | 66.46 tok/s | 31GB |
| mistralai/Mistral-Small-Instruct-2409 | Q8 | 66.44 tok/s | 23GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | Q8 | 66.44 tok/s | 31GB |
| microsoft/Phi-3.5-mini-instruct | FP16 | 66.31 tok/s | 15GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 66.23 tok/s | 17GB |
| HuggingFaceTB/SmolLM-135M | FP16 | 66.09 tok/s | 15GB |
| deepseek-ai/DeepSeek-R1 | FP16 | 66.02 tok/s | 15GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 65.90 tok/s | 17GB |
| ibm-granite/granite-docling-258M | FP16 | 65.75 tok/s | 15GB |
| openai-community/gpt2-xl | FP16 | 65.65 tok/s | 15GB |
| MiniMaxAI/MiniMax-M2 | FP16 | 65.61 tok/s | 15GB |
| Qwen/QwQ-32B-Preview | Q4 | 65.60 tok/s | 17GB |
| liuhaotian/llava-v1.5-7b | FP16 | 65.52 tok/s | 15GB |
| Qwen/Qwen3-0.6B | FP16 | 65.49 tok/s | 13GB |
| microsoft/DialoGPT-small | FP16 | 65.43 tok/s | 15GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 65.37 tok/s | 16GB |
| microsoft/Phi-3.5-vision-instruct | FP16 | 65.36 tok/s | 15GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | 65.29 tok/s | 9GB |
| openai-community/gpt2 | FP16 | 65.15 tok/s | 15GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | FP16 | 65.15 tok/s | 11GB |
| Qwen/Qwen2.5-0.5B | FP16 | 64.92 tok/s | 11GB |
| deepseek-ai/DeepSeek-R1-0528 | FP16 | 64.85 tok/s | 15GB |
| ibm-granite/granite-3.3-8b-instruct | FP16 | 64.84 tok/s | 17GB |
| Qwen/Qwen2.5-32B | Q4 | 64.76 tok/s | 16GB |
| vikhyatk/moondream2 | FP16 | 64.66 tok/s | 15GB |
| GSAI-ML/LLaDA-8B-Base | FP16 | 64.66 tok/s | 17GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | FP16 | 64.52 tok/s | 15GB |
| microsoft/DialoGPT-medium | FP16 | 64.51 tok/s | 15GB |
| microsoft/phi-2 | FP16 | 64.47 tok/s | 15GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 64.32 tok/s | 34GB |
| distilbert/distilgpt2 | FP16 | 64.26 tok/s | 15GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q4 | 63.85 tok/s | 17GB |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | FP16 | 63.79 tok/s | 15GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | 63.22 tok/s | 15GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | 63.18 tok/s | 11GB |
| deepseek-ai/DeepSeek-V3-0324 | FP16 | 63.18 tok/s | 15GB |
| openai/gpt-oss-safeguard-20b | Q8 | 63.17 tok/s | 22GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | 63.06 tok/s | 15GB |
| huggyllama/llama-7b | FP16 | 62.98 tok/s | 15GB |
| google/gemma-2-27b-it | Q8 | 62.88 tok/s | 28GB |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | 62.84 tok/s | 11GB |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | 62.66 tok/s | 7GB |
| Qwen/Qwen2-1.5B-Instruct | FP16 | 62.48 tok/s | 11GB |
| google/gemma-3-270m-it | FP16 | 62.45 tok/s | 15GB |
| Qwen/Qwen3-8B-Base | FP16 | 62.36 tok/s | 17GB |
| parler-tts/parler-tts-large-v1 | FP16 | 62.31 tok/s | 15GB |
| openai-community/gpt2-medium | FP16 | 62.27 tok/s | 15GB |
| Qwen/Qwen2.5-Coder-1.5B | FP16 | 62.25 tok/s | 11GB |
| kaitchup/Phi-3-mini-4k-instruct-gptq-4bit | FP16 | 62.24 tok/s | 9GB |
| petals-team/StableBeluga2 | FP16 | 62.09 tok/s | 15GB |
| EleutherAI/gpt-neo-125m | FP16 | 62.03 tok/s | 15GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | 62.02 tok/s | 17GB |
| Qwen/Qwen2.5-32B-Instruct | Q4 | 61.99 tok/s | 17GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q4 | 61.92 tok/s | 34GB |
| BSC-LT/salamandraTA-7b-instruct | FP16 | 61.74 tok/s | 15GB |
| Tongyi-MAI/Z-Image-Turbo | FP16 | 61.64 tok/s | 16GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | 61.06 tok/s | 34GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | 60.80 tok/s | 34GB |
| 01-ai/Yi-1.5-34B-Chat | Q4 | 59.89 tok/s | 18GB |
| moonshotai/Kimi-K2-Thinking | Q4 | 59.43 tok/s | 489GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q4 | 59.29 tok/s | 16GB |
| deepseek-ai/DeepSeek-V2.5 | Q4 | 59.05 tok/s | 328GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q4 | 58.79 tok/s | 25GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 58.53 tok/s | 34GB |
| Qwen/Qwen3-32B | Q4 | 57.55 tok/s | 16GB |
| codellama/CodeLlama-34b-hf | Q4 | 57.08 tok/s | 17GB |
| OpenPipe/Qwen3-14B-Instruct | FP16 | 55.83 tok/s | 29GB |
| microsoft/Phi-3-medium-128k-instruct | FP16 | 55.82 tok/s | 29GB |
| EssentialAI/rnj-1 | FP16 | 55.25 tok/s | 19GB |
| Qwen/Qwen2.5-14B | FP16 | 54.88 tok/s | 29GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 54.68 tok/s | 30GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 53.39 tok/s | 17GB |
| meta-llama/Llama-2-13b-chat-hf | FP16 | 53.38 tok/s | 27GB |
| google/gemma-2-9b-it | FP16 | 52.83 tok/s | 20GB |
| Qwen/Qwen3-14B | FP16 | 52.24 tok/s | 29GB |
| Qwen/Qwen2.5-14B-Instruct | FP16 | 51.76 tok/s | 29GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 51.58 tok/s | 19GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 49.87 tok/s | 17GB |
| ai-forever/ruGPT-3.5-13B | FP16 | 49.59 tok/s | 27GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Q8 | 47.64 tok/s | 50GB |
| Qwen/Qwen3-14B-Base | FP16 | 47.63 tok/s | 29GB |
| mistralai/Ministral-3-14B-Instruct-2512 | FP16 | 47.34 tok/s | 32GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 47.17 tok/s | 68GB |
| Qwen/Qwen3-32B | Q8 | 46.63 tok/s | 33GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 46.37 tok/s | 69GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | 45.42 tok/s | 35GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | 44.99 tok/s | 68GB |
| moonshotai/Kimi-K2-Thinking | Q8 | 44.78 tok/s | 978GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | 44.77 tok/s | 33GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 44.51 tok/s | 33GB |
| baichuan-inc/Baichuan-M2-32B | Q8 | 44.39 tok/s | 33GB |
| 01-ai/Yi-1.5-34B-Chat | Q8 | 43.98 tok/s | 35GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | Q8 | 43.94 tok/s | 68GB |
| codellama/CodeLlama-34b-hf | Q8 | 43.37 tok/s | 35GB |
| deepseek-ai/DeepSeek-V2.5 | Q8 | 43.35 tok/s | 656GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 43.34 tok/s | 68GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q8 | 43.18 tok/s | 68GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | Q8 | 43.03 tok/s | 33GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | Q8 | 42.13 tok/s | 34GB |
| Qwen/Qwen2.5-32B-Instruct | Q8 | 42.09 tok/s | 34GB |
| Qwen/Qwen2.5-32B | Q8 | 41.03 tok/s | 33GB |
| deepseek-ai/deepseek-coder-33b-instruct | Q8 | 40.33 tok/s | 34GB |
| Qwen/QwQ-32B-Preview | Q8 | 40.29 tok/s | 34GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | FP16 | 40.09 tok/s | 61GB |
| mistralai/Mistral-Small-Instruct-2409 | FP16 | 39.77 tok/s | 46GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | FP16 | 39.63 tok/s | 61GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | 38.57 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q4 | 38.42 tok/s | 39GB |
| google/gemma-2-27b-it | FP16 | 38.25 tok/s | 56GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | FP16 | 38.20 tok/s | 61GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q4 | 37.70 tok/s | 34GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q4 | 37.42 tok/s | 36GB |
| unsloth/gpt-oss-20b-unsloth-bnb-4bit | FP16 | 37.18 tok/s | 41GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 37.11 tok/s | 35GB |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | FP16 | 37.02 tok/s | 41GB |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | FP16 | 36.99 tok/s | 61GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q4 | 36.61 tok/s | 34GB |
| openai/gpt-oss-20b | FP16 | 36.43 tok/s | 41GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | 36.38 tok/s | 61GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q4 | 36.22 tok/s | 39GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | 35.59 tok/s | 39GB |
| openai/gpt-oss-safeguard-20b | FP16 | 35.56 tok/s | 44GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q4 | 35.41 tok/s | 44GB |
| meta-llama/Llama-3.1-70B-Instruct | Q4 | 35.31 tok/s | 34GB |
| openai/gpt-oss-120b | Q4 | 35.17 tok/s | 59GB |
| lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | FP16 | 35.15 tok/s | 61GB |
| AI-MO/Kimina-Prover-72B | Q4 | 34.86 tok/s | 35GB |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | FP16 | 34.78 tok/s | 61GB |
| Qwen/Qwen2.5-72B-Instruct | Q4 | 34.53 tok/s | 36GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | 34.31 tok/s | 41GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 34.01 tok/s | 60GB |
| Qwen/Qwen3-30B-A3B | FP16 | 33.90 tok/s | 61GB |
| meta-llama/Llama-3.3-70B-Instruct | Q4 | 33.20 tok/s | 34GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q4 | 32.83 tok/s | 39GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q8 | 30.80 tok/s | 138GB |
| deepseek-ai/DeepSeek-Math-V2 | Q4 | 27.36 tok/s | 383GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Q8 | 26.84 tok/s | 69GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q4 | 26.69 tok/s | 115GB |
| unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit | FP16 | 26.30 tok/s | 66GB |
| 01-ai/Yi-1.5-34B-Chat | FP16 | 25.93 tok/s | 70GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 25.85 tok/s | 71GB |
| Qwen/QwQ-32B-Preview | FP16 | 25.78 tok/s | 67GB |
| codellama/CodeLlama-34b-hf | FP16 | 25.78 tok/s | 70GB |
| Qwen/Qwen2.5-32B | FP16 | 25.71 tok/s | 66GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 25.71 tok/s | 137GB |
| meta-llama/Llama-3.3-70B-Instruct | Q8 | 25.63 tok/s | 69GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | 25.57 tok/s | 66GB |
| deepseek-ai/deepseek-coder-33b-instruct | FP16 | 25.47 tok/s | 68GB |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | FP16 | 25.46 tok/s | 101GB |
| meta-llama/Llama-3.1-70B-Instruct | Q8 | 25.01 tok/s | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 24.98 tok/s | 78GB |
| mistralai/Mistral-Large-Instruct-2411 | Q8 | 24.98 tok/s | 120GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 24.82 tok/s | 78GB |
| Qwen/Qwen2.5-Math-72B-Instruct | Q8 | 24.40 tok/s | 71GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 24.37 tok/s | 137GB |
| baichuan-inc/Baichuan-M2-32B | FP16 | 24.25 tok/s | 66GB |
| Qwen/Qwen2.5-Coder-32B-Instruct | FP16 | 23.88 tok/s | 67GB |
| openai/gpt-oss-120b | Q8 | 23.64 tok/s | 117GB |
| AI-MO/Kimina-Prover-72B | Q8 | 23.63 tok/s | 70GB |
| NousResearch/Hermes-3-Llama-3.1-70B | Q8 | 23.52 tok/s | 69GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 23.42 tok/s | 78GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 23.41 tok/s | 78GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 23.24 tok/s | 88GB |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | 23.22 tok/s | 137GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 23.15 tok/s | 70GB |
| moonshotai/Kimi-K2-Thinking | FP16 | 23.14 tok/sEstimated Auto-generated benchmark | 1956GB |
| Qwen/Qwen3-235B-A22B | Q4 | 23.11 tok/sEstimated Auto-generated benchmark | 115GB |
| Qwen/Qwen3-32B | FP16 | 23.11 tok/sEstimated Auto-generated benchmark | 66GB |
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | FP16 | 23.08 tok/sEstimated Auto-generated benchmark | 137GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | 22.80 tok/sEstimated Auto-generated benchmark | 70GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 22.47 tok/sEstimated Auto-generated benchmark | 67GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | 21.82 tok/sEstimated Auto-generated benchmark | 137GB |
| deepseek-ai/DeepSeek-V2.5 | FP16 | 21.74 tok/sEstimated Auto-generated benchmark | 1312GB |
| Qwen/Qwen2.5-32B-Instruct | FP16 | 21.59 tok/sEstimated Auto-generated benchmark | 66GB |
| MiniMaxAI/MiniMax-M1-40k | Q4 | 21.13 tok/sEstimated Auto-generated benchmark | 255GB |
| deepseek-ai/DeepSeek-Math-V2 | Q8 | 20.39 tok/sEstimated Auto-generated benchmark | 766GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | 19.48 tok/sEstimated Auto-generated benchmark | 378GB |
| MiniMaxAI/MiniMax-VL-01 | Q4 | 19.47 tok/sEstimated Auto-generated benchmark | 256GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | Q8 | 19.32 tok/sEstimated Auto-generated benchmark | 231GB |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | FP16 | 19.27 tok/sEstimated Auto-generated benchmark | 275GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | 16.03 tok/sEstimated Auto-generated benchmark | 755GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | FP16 | 15.04 tok/sEstimated Auto-generated benchmark | 176GB |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | FP16 | 14.90 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | FP16 | 14.84 tok/sEstimated Auto-generated benchmark | 156GB |
| meta-llama/Llama-3.3-70B-Instruct | FP16 | 14.41 tok/sEstimated Auto-generated benchmark | 138GB |
| MiniMaxAI/MiniMax-M1-40k | Q8 | 14.12 tok/sEstimated Auto-generated benchmark | 510GB |
| Qwen/Qwen3-235B-A22B | Q8 | 13.89 tok/sEstimated Auto-generated benchmark | 230GB |
| mistralai/Mistral-Large-Instruct-2411 | FP16 | 13.81 tok/sEstimated Auto-generated benchmark | 240GB |
| MiniMaxAI/MiniMax-VL-01 | Q8 | 13.80 tok/sEstimated Auto-generated benchmark | 511GB |
| openai/gpt-oss-120b | FP16 | 13.65 tok/sEstimated Auto-generated benchmark | 235GB |
| Qwen/Qwen2.5-Math-72B-Instruct | FP16 | 13.64 tok/sEstimated Auto-generated benchmark | 142GB |
| NousResearch/Hermes-3-Llama-3.1-70B | FP16 | 13.62 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | FP16 | 13.42 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | 13.29 tok/sEstimated Auto-generated benchmark | 156GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 13.25 tok/sEstimated Auto-generated benchmark | 142GB |
| Qwen/Qwen2.5-72B-Instruct | FP16 | 12.71 tok/sEstimated Auto-generated benchmark | 141GB |
| AI-MO/Kimina-Prover-72B | FP16 | 12.53 tok/sEstimated Auto-generated benchmark | 141GB |
| meta-llama/Llama-3.1-70B-Instruct | FP16 | 12.42 tok/sEstimated Auto-generated benchmark | 138GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | FP16 | 12.37 tok/sEstimated Auto-generated benchmark | 156GB |
| deepseek-ai/DeepSeek-Math-V2 | FP16 | 10.70 tok/sEstimated Auto-generated benchmark | 1532GB |
| deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | 10.25 tok/sEstimated Auto-generated benchmark | 461GB |
| MiniMaxAI/MiniMax-VL-01 | FP16 | 8.30 tok/sEstimated Auto-generated benchmark | 1021GB |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | 8.19 tok/sEstimated Auto-generated benchmark | 1509GB |
| Qwen/Qwen3-235B-A22B | FP16 | 7.87 tok/sEstimated Auto-generated benchmark | 460GB |
| MiniMaxAI/MiniMax-M1-40k | FP16 | 7.71 tok/sEstimated Auto-generated benchmark | 1020GB |
Note: these throughput figures are calculated estimates, not measured benchmarks; real-world results may vary.
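Estimates of this kind typically come down to a capacity-and-bandwidth model: weight footprint decides fit, and memory bandwidth bounds decode speed. Below is a minimal sketch of that style of calculation, not the site's published methodology; the 1,008 GB/s bandwidth is the 4090's spec, but the overhead and efficiency factors are assumptions chosen for illustration.

```python
# Rough VRAM-fit and decode-speed estimator (illustrative simplification).
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}
RTX_4090_VRAM_GB = 24
RTX_4090_BANDWIDTH_GBS = 1008  # published memory bandwidth for the RTX 4090

def weight_gb(params_b: float, quant: str, overhead: float = 1.2) -> float:
    """Approximate footprint: raw weights plus ~20% for KV cache/runtime (assumed)."""
    return params_b * BYTES_PER_PARAM[quant] * overhead

def est_tok_per_sec(params_b: float, quant: str, efficiency: float = 0.6) -> float:
    """Bandwidth ceiling: decoding reads every weight once per token.
    The 0.6 efficiency factor is an assumption, not a measured constant."""
    return RTX_4090_BANDWIDTH_GBS * efficiency / (params_b * BYTES_PER_PARAM[quant])

for name, size_b in [("Llama-3.1-70B", 70.0), ("Qwen2.5-32B", 32.0)]:
    for quant in ("Q4", "Q8"):
        need = weight_gb(size_b, quant)
        fit = "fits" if need <= RTX_4090_VRAM_GB else "spills"
        print(f"{name} {quant}: ~{need:.0f}GB ({fit}), ceiling ~{est_tok_per_sec(size_b, quant):.0f} tok/s")
```

Played against the table, the same qualitative pattern emerges: 32B-class models fit at Q4 but not Q8, and anything 70B-class spills at every precision.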
| Model | Quantization | Verdict | Estimated speed | VRAM needed |
|---|---|---|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q4 | Not supported | 19.48 tok/s (estimated) | 378GB (have 24GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | Q8 | Not supported | 16.03 tok/s (estimated) | 755GB (have 24GB) |
| EssentialAI/rnj-1 | FP16 | Fits comfortably | 55.25 tok/s (estimated) | 19GB (have 24GB) |
| EssentialAI/rnj-1 | Q8 | Fits comfortably | 91.89 tok/s (estimated) | 10GB (have 24GB) |
| EssentialAI/rnj-1 | Q4 | Fits comfortably | 143.12 tok/s (estimated) | 5GB (have 24GB) |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | FP16 | Not supported | 8.19 tok/s (estimated) | 1509GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits comfortably | 93.14 tok/s (estimated) | 15GB (have 24GB) |
| Qwen/Qwen3-32B | Q8 | Not supported | 46.48 tok/s (estimated) | 33GB (have 24GB) |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Not supported | 74.55 tok/s (estimated) | 31GB (have 24GB) |
| distilbert/distilgpt2 | Q8 | Fits comfortably | 124.15 tok/s (estimated) | 7GB (have 24GB) |
| meta-llama/Llama-3.2-1B | FP16 | Fits comfortably | 75.83 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen3-32B | Q4 | Fits comfortably | 64.33 tok/s (estimated) | 16GB (have 24GB) |
| openai-community/gpt2-large | Q4 | Fits comfortably | 193.72 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-4B | FP16 | Fits comfortably | 64.90 tok/s (estimated) | 9GB (have 24GB) |
| openai/gpt-oss-120b | Q8 | Not supported | 24.86 tok/s (estimated) | 117GB (have 24GB) |
| distilbert/distilgpt2 | FP16 | Fits comfortably | 65.68 tok/s (estimated) | 15GB (have 24GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 114.55 tok/s (estimated) | 5GB (have 24GB) |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 229.10 tok/s (estimated) | 1GB (have 24GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | Fits comfortably | 70.59 tok/s (estimated) | 15GB (have 24GB) |
| openai/gpt-oss-120b | Q4 | Not supported | 36.75 tok/s (estimated) | 59GB (have 24GB) |
| bigscience/bloomz-560m | FP16 | Fits comfortably | 72.64 tok/s (estimated) | 15GB (have 24GB) |
| distilbert/distilgpt2 | Q4 | Fits comfortably | 163.99 tok/s (estimated) | 4GB (have 24GB) |
| petals-team/StableBeluga2 | Q4 | Fits comfortably | 194.01 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 149.59 tok/s (estimated) | 1GB (have 24GB) |
| Qwen/Qwen2.5-7B | FP16 | Fits comfortably | 63.39 tok/s (estimated) | 15GB (have 24GB) |
| Qwen/Qwen2.5-0.5B-Instruct | FP16 | Fits comfortably | 66.44 tok/s (estimated) | 11GB (have 24GB) |
| allenai/OLMo-2-0425-1B | FP16 | Fits comfortably | 84.55 tok/s (estimated) | 2GB (have 24GB) |
| microsoft/Phi-3-mini-4k-instruct | FP16 | Fits comfortably | 66.81 tok/s (estimated) | 15GB (have 24GB) |
| Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 117.08 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen3-4B | Q8 | Fits comfortably | 113.63 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | Fits comfortably | 63.38 tok/s (estimated) | 11GB (have 24GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 181.98 tok/s (estimated) | 4GB (have 24GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 177.88 tok/s (estimated) | 3GB (have 24GB) |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 117.12 tok/s (estimated) | 9GB (have 24GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | Not supported | 25.32 tok/s (estimated) | 70GB (have 24GB) |
| openai/gpt-oss-20b | FP16 | Not supported | 35.84 tok/s (estimated) | 41GB (have 24GB) |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 175.72 tok/s (estimated) | 3GB (have 24GB) |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 191.01 tok/s (estimated) | 3GB (have 24GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 219.70 tok/s (estimated) | 1GB (have 24GB) |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 177.16 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | Fits comfortably | 67.92 tok/s (estimated) | 9GB (have 24GB) |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | Fits comfortably | 74.33 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 154.51 tok/s (estimated) | 3GB (have 24GB) |
| bigscience/bloomz-560m | Q8 | Fits comfortably | 126.12 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen3-8B | Q8 | Fits comfortably | 130.66 tok/s (estimated) | 9GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | FP16 | Not supported | 25.95 tok/s (estimated) | 66GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 158.55 tok/s (estimated) | 3GB (have 24GB) |
| vikhyatk/moondream2 | FP16 | Fits comfortably | 61.72 tok/s (estimated) | 15GB (have 24GB) |
| petals-team/StableBeluga2 | FP16 | Fits comfortably | 74.22 tok/s (estimated) | 15GB (have 24GB) |
| meta-llama/Llama-3.2-1B | Q4 | Fits comfortably | 200.93 tok/s (estimated) | 1GB (have 24GB) |
| meta-llama/Meta-Llama-3-8B | Q4 | Fits comfortably | 170.79 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 133.48 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 191.88 tok/s (estimated) | 3GB (have 24GB) |
| Qwen/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 133.18 tok/s (estimated) | 5GB (have 24GB) |
| allenai/OLMo-2-0425-1B | Q4 | Fits comfortably | 206.77 tok/s (estimated) | 1GB (have 24GB) |
| allenai/OLMo-2-0425-1B | Q8 | Fits comfortably | 145.08 tok/s (estimated) | 1GB (have 24GB) |
| microsoft/Phi-3-mini-4k-instruct | Q4 | Fits comfortably | 195.39 tok/s (estimated) | 4GB (have 24GB) |
| microsoft/Phi-3-mini-4k-instruct | Q8 | Fits comfortably | 131.32 tok/s (estimated) | 7GB (have 24GB) |
| openai-community/gpt2-large | FP16 | Fits comfortably | 68.34 tok/s (estimated) | 15GB (have 24GB) |
| Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 175.06 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-1.7B | FP16 | Fits comfortably | 65.21 tok/s (estimated) | 15GB (have 24GB) |
| Qwen/Qwen3-4B | Q4 | Fits comfortably | 182.78 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 130.20 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 173.12 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen2.5-7B-Instruct | FP16 | Fits comfortably | 63.24 tok/s (estimated) | 15GB (have 24GB) |
| openai-community/gpt2 | Q8 | Fits comfortably | 121.24 tok/s (estimated) | 7GB (have 24GB) |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 119.50 tok/s (estimated) | 6GB (have 24GB) |
| Qwen/Qwen3-0.6B | FP16 | Fits comfortably | 69.96 tok/s (estimated) | 13GB (have 24GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 116.89 tok/s (estimated) | 5GB (have 24GB) |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | Fits comfortably | 65.44 tok/s (estimated) | 11GB (have 24GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Fits comfortably | 62.88 tok/s (estimated) | 17GB (have 24GB) |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | 46.64 tok/s (estimated) | 35GB (have 24GB) |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 99.45 tok/s (estimated) | 10GB (have 24GB) |
| openai/gpt-oss-20b | Q8 | Fits comfortably | 69.96 tok/s (estimated) | 20GB (have 24GB) |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 150.89 tok/s (estimated) | 1GB (have 24GB) |
| google/gemma-3-1b-it | FP16 | Fits comfortably | 81.32 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 134.53 tok/s (estimated) | 6GB (have 24GB) |
| Qwen/Qwen3-Embedding-0.6B | FP16 | Fits comfortably | 69.15 tok/s (estimated) | 13GB (have 24GB) |
| facebook/opt-125m | Q8 | Fits comfortably | 114.97 tok/s (estimated) | 7GB (have 24GB) |
| facebook/opt-125m | FP16 | Fits comfortably | 64.85 tok/s (estimated) | 15GB (have 24GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 157.92 tok/s (estimated) | 1GB (have 24GB) |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | Fits comfortably | 78.24 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 197.14 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 115.87 tok/s (estimated) | 4GB (have 24GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 233.68 tok/s (estimated) | 1GB (have 24GB) |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 162.50 tok/s (estimated) | 1GB (have 24GB) |
| openai/gpt-oss-120b | FP16 | Not supported | 14.41 tok/s (estimated) | 235GB (have 24GB) |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 212.19 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen2.5-3B-Instruct | FP16 | Fits comfortably | 77.81 tok/s (estimated) | 6GB (have 24GB) |
| bigscience/bloomz-560m | Q4 | Fits comfortably | 172.07 tok/s (estimated) | 4GB (have 24GB) |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | Fits comfortably | 207.56 tok/s (estimated) | 2GB (have 24GB) |
| Qwen/Qwen3-8B | Q4 | Fits comfortably | 175.44 tok/s (estimated) | 4GB (have 24GB) |
| Qwen/Qwen3-8B | FP16 | Fits comfortably | 69.02 tok/s (estimated) | 17GB (have 24GB) |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Q8 | Not supported | 43.61 tok/s (estimated) | 33GB (have 24GB) |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | FP16 | Not supported | 21.72 tok/s (estimated) | 137GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 227.10 tok/s (estimated) | 2GB (have 24GB) |
| meta-llama/Llama-3.2-3B-Instruct | FP16 | Fits comfortably | 74.48 tok/s (estimated) | 6GB (have 24GB) |
| vikhyatk/moondream2 | Q4 | Fits comfortably | 162.49 tok/s (estimated) | 4GB (have 24GB) |
| petals-team/StableBeluga2 | Q8 | Fits comfortably | 123.96 tok/s (estimated) | 7GB (have 24GB) |
| openai-community/gpt2 | Q4 | Fits comfortably | 187.81 tok/s (estimated) | 4GB (have 24GB) |
Note: these throughput figures are calculated estimates, not measured benchmarks; real-world results may vary.
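The verdict column itself is mechanical: it compares the estimated footprint against the 4090's 24 GB. A minimal sketch, assuming the cutoff is pure capacity (which is consistent with the rows above, where 20GB fits and 31GB does not):

```python
def verdict(vram_needed_gb: float, vram_have_gb: float = 24.0) -> str:
    """Reproduce the table's two verdicts: a model fits if its estimated
    footprint is within the card's VRAM, otherwise it is not supported."""
    return "Fits comfortably" if vram_needed_gb <= vram_have_gb else "Not supported"

print(verdict(5.0))   # "Fits comfortably" -- e.g. EssentialAI/rnj-1 at Q4
print(verdict(33.0))  # "Not supported"    -- e.g. Qwen3-32B at Q8
```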
The answers below are data-backed, pulled from community benchmarks, manufacturer specs, and live pricing.
Community llama.cpp benchmarks of the ubergarm/Qwen3-30B-A3B-GGUF build show the RTX 4090 sustaining roughly 150–160 tokens/sec with CUDA kernels, keeping decode latency under 7 ms per token.
Source: Reddit – /r/LocalLLaMA (mq59v1k)
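Numbers like these are straightforward to reproduce locally with llama-bench, the benchmarking tool that ships with llama.cpp. A minimal invocation, sketched via Python (the GGUF filename below is a placeholder for whichever quant you downloaded):

```python
import subprocess

# llama-bench reports prompt-processing and token-generation speeds separately.
subprocess.run(
    [
        "llama-bench",
        "-m", "Qwen3-30B-A3B-Q4_K_M.gguf",  # placeholder path to your GGUF file
        "-ngl", "99",                        # offload all layers to the GPU
        "-p", "512",                         # prompt-processing test length
        "-n", "128",                         # token-generation test length
    ],
    check=True,
)
```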
No. A 24 GB RTX 4090 cannot hold Llama 3.1 70B at Q4_K_M entirely in VRAM: builders report roughly half the tensor pages spilling to system RAM, which drags throughput because PCIe becomes the bottleneck. Multi-GPU setups or 48 GB cards avoid the spill.
Source: Reddit – /r/LocalLLaMA (mqcouez)
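The spill is easy to sanity-check from weight size alone. A back-of-the-envelope sketch, assuming ~4.5 effective bits per weight for Q4_K_M (a typical figure, treated here as an assumption):

```python
params = 70e9                  # Llama 3.1 70B parameter count
bits_per_weight = 4.5          # typical effective Q4_K_M rate (assumption)
weight_gb = params * bits_per_weight / 8 / 1e9   # ~39 GB of weights alone
vram_gb = 24
spill = 1 - vram_gb / weight_gb
print(f"~{weight_gb:.0f} GB of weights; ~{spill:.0%} cannot stay on a 24 GB card")
```

Once the KV cache and runtime buffers claim their share of the 24 GB, the spilled fraction climbs toward the "roughly half" that builders report.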
Power users running multi-4090 racks note that a single 4090 comfortably hosts one 32B-class model; parallel agents or MoE workloads need tensor parallelism across multiple GPUs to keep speeds high.
Source: Reddit – /r/LocalLLaMA (mqwkgv3)
NVIDIA rates the RTX 4090 at 450 W board power and recommends at least an 850 W PSU with the 16-pin 12VHPWR connector to maintain headroom for AI workloads.
Source: TechPowerUp – RTX 4090 Specs
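If your PSU falls short of that recommendation, the board power limit can be lowered at runtime; nvidia-smi exposes this directly. A minimal sketch (the 350 W cap is only an example, and the command requires root):

```python
import subprocess

# Lower GPU 0's board power limit to 350 W (revert with "-pl", "450").
subprocess.run(["sudo", "nvidia-smi", "-i", "0", "-pl", "350"], check=True)
```

Throughput impact varies by workload, so benchmark before and after capping rather than assuming the tables above still apply.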
Our price tracker (Nov 2025) shows Amazon at $1,599 in stock.
Source: Supabase price tracker snapshot – 2025-11-03
Explore how RTX 4080 stacks up for local inference workloads.
Explore how RTX 4070 Ti stacks up for local inference workloads.
Explore how RTX 3090 stacks up for local inference workloads.
Explore how NVIDIA RTX 6000 Ada stacks up for local inference workloads.
Explore how RX 7900 XTX stacks up for local inference workloads.
Side-by-side VRAM, throughput, efficiency, and pricing benchmarks for both GPUs.