Quick Answer: The RX 7800 XT offers 16GB of VRAM and is currently listed at $599.99 (MSRP $499.00). It delivers an estimated 123 tokens/sec on inference-net/Schematron-3B at Q4 and typically draws 263W under load.

RX 7800 XT

In Stock

By AMD · Released 2023-09 · MSRP $499.00

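The RX 7800 XT's 16GB of VRAM suits local AI workloads built on small and mid-sized models. Quantization is the main lever for speed: in the estimates below, microsoft/Phi-3.5-mini-instruct runs at about 102 tok/s at Q4 versus about 38 tok/s at FP16. Monitor the price below to catch the best deal.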

Specs snapshot

Key hardware metrics for AI workloads.

VRAM: 16GB

Cores: 3,840

TDP: 263W

Architecture: RDNA 3

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

Amazon · In Stock · $599.99

💡 Not ready to buy? Try cloud GPUs first

Test RX 7800 XT performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.ai: from $0.20/hr · RunPod: from $0.30/hr · Lambda Labs: enterprise-grade
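
To put the hourly rates in context, the sketch below computes how many rental hours cost as much as the card itself. It is a rough comparison only: it assumes the "from" prices apply for the whole rental, and it ignores electricity, resale value, and the rest of the PC. The function name `break_even_hours` is illustrative, not part of any provider's tooling.

```python
def break_even_hours(purchase_price: float, hourly_rate: float) -> float:
    """Rental hours at hourly_rate that cost the same as buying outright."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return purchase_price / hourly_rate


CARD_PRICE = 599.99  # Amazon price listed on this page
HOURLY_RATES = {"Vast.ai": 0.20, "RunPod": 0.30}  # "from" prices listed above

for provider, rate in HOURLY_RATES.items():
    hours = break_even_hours(CARD_PRICE, rate)
    print(f"{provider}: about {hours:,.0f} rental hours equal the card price")
```

If expected usage is well below these hour counts, renting is likely the cheaper way to evaluate the card.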

AI benchmarks

Model	Quantization	Tokens/sec (estimated, auto-generated)	VRAM used
inference-net/Schematron-3B	Q4	123.20 tok/s	2GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	122.21 tok/s	1GB
nari-labs/Dia2-2B	Q4	120.63 tok/s	2GB
google-t5/t5-3b	Q4	118.71 tok/s	2GB
facebook/sam3	Q4	118.16 tok/s	1GB
deepseek-ai/DeepSeek-OCR	Q4	116.34 tok/s	2GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	116.13 tok/s	2GB
Qwen/Qwen2.5-3B-Instruct	Q4	116.11 tok/s	2GB
meta-llama/Llama-3.2-3B-Instruct	Q4	116.08 tok/s	2GB
allenai/OLMo-2-0425-1B	Q4	115.10 tok/s	1GB
tencent/HunyuanOCR	Q4	113.65 tok/s	1GB
WeiboAI/VibeThinker-1.5B	Q4	113.51 tok/s	1GB
meta-llama/Llama-3.2-1B-Instruct	Q4	111.85 tok/s	1GB
meta-llama/Llama-Guard-3-1B	Q4	111.37 tok/s	1GB
google/embeddinggemma-300m	Q4	111.01 tok/s	1GB
unsloth/Llama-3.2-1B-Instruct	Q4	109.67 tok/s	1GB
unsloth/Llama-3.2-3B-Instruct	Q4	109.17 tok/s	2GB
LiquidAI/LFM2-1.2B	Q4	109.16 tok/s	1GB
google/gemma-2b	Q4	109.04 tok/s	1GB
google-bert/bert-base-uncased	Q4	108.34 tok/s	1GB
Qwen/Qwen2.5-3B	Q4	106.93 tok/s	2GB
meta-llama/Llama-3.2-1B	Q4	106.41 tok/s	1GB
unsloth/gemma-3-1b-it	Q4	106.17 tok/s	1GB
bigcode/starcoder2-3b	Q4	105.65 tok/s	2GB
meta-llama/Llama-3.2-3B	Q4	105.56 tok/s	2GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	105.51 tok/s	2GB
ibm-research/PowerMoE-3b	Q4	105.35 tok/s	2GB
apple/OpenELM-1_1B-Instruct	Q4	105.15 tok/s	1GB
ibm-granite/granite-3.3-2b-instruct	Q4	104.03 tok/s	1GB
NousResearch/Meta-Llama-3.1-8B-Instruct	Q4	103.47 tok/s	4GB
Qwen/Qwen2-1.5B-Instruct	Q4	103.07 tok/s	3GB
liuhaotian/llava-v1.5-7b	Q4	103.06 tok/s	4GB
skt/kogpt2-base-v2	Q4	102.89 tok/s	4GB
meta-llama/Llama-Guard-3-8B	Q4	102.53 tok/s	4GB
google/gemma-2-2b-it	Q4	102.40 tok/s	1GB
Qwen/Qwen3-1.7B-Base	Q4	102.38 tok/s	4GB
Qwen/Qwen3-8B-Base	Q4	102.31 tok/s	4GB
deepseek-ai/DeepSeek-R1-Distill-Llama-8B	Q4	102.20 tok/s	4GB
openai-community/gpt2-xl	Q4	102.20 tok/s	4GB
google/gemma-3-1b-it	Q4	102.05 tok/s	1GB
microsoft/Phi-3.5-mini-instruct	Q4	102.04 tok/s	2GB
Qwen/Qwen2.5-0.5B	Q4	102.02 tok/s	3GB
MiniMaxAI/MiniMax-M2	Q4	101.81 tok/s	4GB
Qwen/Qwen2.5-1.5B	Q4	101.65 tok/s	3GB
lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit	Q4	101.25 tok/s	4GB
Qwen/Qwen2-0.5B-Instruct	Q4	101.11 tok/s	3GB
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit	Q4	101.03 tok/s	4GB
Qwen/Qwen3-4B-Base	Q4	101.00 tok/s	2GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	Q4	101.00 tok/s	4GB
Qwen/Qwen3-4B-Thinking-2507-FP8	Q4	100.89 tok/s	2GB

Note: Performance figures are calculated estimates, not measured results. Real results may vary.
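
The estimated throughput can be combined with the listed 263W load figure to get a rough energy cost per token. The sketch below assumes the GPU draws its full 263W for the whole generation and ignores the rest of the system, so treat the result as an order-of-magnitude figure. The function names are illustrative.

```python
TDP_WATTS = 263.0        # load power listed in the specs
TOKENS_PER_SEC = 123.20  # estimated Q4 speed for inference-net/Schematron-3B

JOULES_PER_KWH = 3.6e6   # 1 kWh = 1000 W * 3600 s


def joules_per_token(watts: float, tokens_per_sec: float) -> float:
    """Energy spent per generated token, assuming constant power draw."""
    return watts / tokens_per_sec


def tokens_per_kwh(watts: float, tokens_per_sec: float) -> float:
    """Tokens generated per kilowatt-hour under the same assumption."""
    return JOULES_PER_KWH / joules_per_token(watts, tokens_per_sec)


print(f"{joules_per_token(TDP_WATTS, TOKENS_PER_SEC):.2f} J per token")
print(f"{tokens_per_kwh(TDP_WATTS, TOKENS_PER_SEC):,.0f} tokens per kWh")
```

Multiplying the kWh figure by a local electricity price gives an approximate running cost for a given token volume.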

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
google-t5/t5-3b	FP16	Fits comfortably	40.06 tok/s	6GB (have 16GB)
Qwen/Qwen2.5-0.5B	FP16	Fits comfortably	38.94 tok/s	11GB (have 16GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	Q4	Fits comfortably	101.00 tok/s	4GB (have 16GB)
meta-llama/Llama-2-7b-hf	Q4	Fits comfortably	99.03 tok/s	4GB (have 16GB)
Qwen/Qwen2-0.5B	FP16	Fits comfortably	35.42 tok/s	11GB (have 16GB)
deepseek-ai/DeepSeek-R1-0528	FP16	Fits (tight)	33.81 tok/s	15GB (have 16GB)
Qwen/Qwen2.5-32B-Instruct	Q8	Not supported	21.37 tok/s	33GB (have 16GB)
LiquidAI/LFM2-1.2B	FP16	Fits comfortably	43.91 tok/s	4GB (have 16GB)
Qwen/Qwen3-Coder-30B-A3B-Instruct	Q8	Not supported	39.55 tok/s	31GB (have 16GB)
Qwen/Qwen3-Coder-30B-A3B-Instruct	FP16	Not supported	20.75 tok/s	61GB (have 16GB)
Qwen/Qwen3-30B-A3B	FP16	Not supported	20.88 tok/s	61GB (have 16GB)
rinna/japanese-gpt-neox-small	Q8	Fits comfortably	68.07 tok/s	7GB (have 16GB)
unsloth/mistral-7b-v0.3-bnb-4bit	Q4	Fits comfortably	91.84 tok/s	4GB (have 16GB)
unsloth/mistral-7b-v0.3-bnb-4bit	Q8	Fits comfortably	61.22 tok/s	7GB (have 16GB)
Qwen/Qwen2-7B-Instruct	Q4	Fits comfortably	88.96 tok/s	4GB (have 16GB)
Qwen/Qwen3-4B-Base	Q4	Fits comfortably	101.00 tok/s	2GB (have 16GB)
nvidia/NVIDIA-Nemotron-Nano-9B-v2	Q4	Fits comfortably	70.98 tok/s	5GB (have 16GB)
nvidia/NVIDIA-Nemotron-Nano-9B-v2	Q8	Fits comfortably	51.95 tok/s	10GB (have 16GB)
NousResearch/Meta-Llama-3.1-8B-Instruct	Q8	Fits comfortably	67.89 tok/s	9GB (have 16GB)
NousResearch/Meta-Llama-3.1-8B-Instruct	FP16	Not supported	37.40 tok/s	17GB (have 16GB)
Qwen/Qwen3-30B-A3B-Thinking-2507	FP16	Not supported	19.90 tok/s	61GB (have 16GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit	Q4	Fits (tight)	55.59 tok/s	15GB (have 16GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit	Q8	Not supported	32.95 tok/s	31GB (have 16GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit	FP16	Not supported	17.80 tok/s	61GB (have 16GB)
deepseek-ai/DeepSeek-V3	Q8	Fits comfortably	65.13 tok/s	7GB (have 16GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit	Q4	Fits (tight)	47.54 tok/s	15GB (have 16GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit	Q8	Not supported	34.23 tok/s	31GB (have 16GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit	FP16	Not supported	20.24 tok/s	61GB (have 16GB)
dicta-il/dictalm2.0-instruct	Q4	Fits comfortably	84.97 tok/s	4GB (have 16GB)
HuggingFaceM4/tiny-random-LlamaForCausalLM	Q8	Fits comfortably	68.46 tok/s	7GB (have 16GB)
HuggingFaceM4/tiny-random-LlamaForCausalLM	FP16	Fits (tight)	34.38 tok/s	15GB (have 16GB)
Qwen/Qwen2.5-72B-Instruct	Q8	Not supported	13.65 tok/s	71GB (have 16GB)
Qwen/QwQ-32B-Preview	Q8	Not supported	23.20 tok/s	34GB (have 16GB)
Qwen/QwQ-32B-Preview	FP16	Not supported	12.54 tok/s	67GB (have 16GB)
mistralai/Mixtral-8x22B-Instruct-v0.1	FP16	Not supported	8.99 tok/s	275GB (have 16GB)
microsoft/Phi-3.5-mini-instruct	Q4	Fits comfortably	102.04 tok/s	2GB (have 16GB)
microsoft/Phi-3.5-mini-instruct	Q8	Fits comfortably	60.31 tok/s	4GB (have 16GB)
microsoft/Phi-3.5-mini-instruct	FP16	Fits comfortably	37.73 tok/s	8GB (have 16GB)
microsoft/Phi-3-medium-128k-instruct	Q4	Fits comfortably	76.96 tok/s	7GB (have 16GB)
NousResearch/Hermes-3-Llama-3.1-8B	Q4	Fits comfortably	64.40 tok/s	4GB (have 16GB)
NousResearch/Hermes-3-Llama-3.1-8B	FP16	Not supported	28.79 tok/s	17GB (have 16GB)
01-ai/Yi-1.5-34B-Chat	Q4	Not supported	35.02 tok/s	18GB (have 16GB)
01-ai/Yi-1.5-34B-Chat	Q8	Not supported	21.05 tok/s	35GB (have 16GB)
moonshotai/Kimi-Linear-48B-A3B-Instruct	FP16	Not supported	11.87 tok/s	101GB (have 16GB)
moonshotai/Kimi-K2-Thinking	FP16	Not supported	13.11 tok/s	1956GB (have 16GB)
deepseek-ai/DeepSeek-Math-V2	Q4	Not supported	14.86 tok/s	383GB (have 16GB)
deepseek-ai/DeepSeek-Math-V2	Q8	Not supported	10.61 tok/s	766GB (have 16GB)
deepseek-ai/DeepSeek-Math-V2	FP16	Not supported	5.56 tok/s	1532GB (have 16GB)
tencent/HunyuanOCR	Q8	Fits comfortably	79.68 tok/s	2GB (have 16GB)
google-t5/t5-3b	Q8	Fits comfortably	77.97 tok/s	3GB (have 16GB)
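
The verdict column follows from comparing the VRAM a model needs with the 16GB available. The sketch below shows one way such a check could work. The page does not publish its rule, so the 90% "tight" cutoff (`tight_fraction`) is an assumption chosen to agree with the rows above, and `weights_gb` is only a weights-only approximation (parameters times bits per weight) that leaves out KV cache and runtime overhead.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight size in GB: parameters * bits / 8 (weights only)."""
    return params_billion * bits_per_weight / 8


def verdict(needed_gb: float, available_gb: float,
            tight_fraction: float = 0.9) -> str:
    """Classify fit; tight_fraction is an assumed cutoff, not the site's rule."""
    if needed_gb > available_gb:
        return "Not supported"
    if needed_gb > tight_fraction * available_gb:
        return "Fits (tight)"
    return "Fits comfortably"


AVAILABLE_GB = 16  # RX 7800 XT VRAM

# VRAM-needed values taken from the table above.
for needed in (4, 11, 15, 17):
    print(f"{needed}GB needed: {verdict(needed, AVAILABLE_GB)}")

# Weights-only estimate for an 8B-parameter model at 4, 8, and 16 bits.
for bits in (4, 8, 16):
    print(f"8B at {bits}-bit: about {weights_gb(8, bits):.0f}GB of weights")
```

The weights-only figures come out at or slightly below the table's VRAM numbers for 8B models, which is consistent with the table including some overhead on top of the weights.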

Alternative GPUs

Other cards to compare for local inference workloads:

RTX 5070: 12GB

RTX 4060 Ti 16GB: 16GB

RX 6800 XT: 16GB

RTX 4070 Super: 12GB

RTX 3080: 10GB

Quick Answer: RX 7800 XT offers 16GB VRAM and starts around $599.99. It delivers approximately 123 tokens/sec on inference-net/Schematron-3B. It typically draws 263W under load.

RX 7800 XT

In Stock

By AMDReleased 2023-09MSRP $499.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Buy on Amazon - $599.99 View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

VRAM16GB

Cores3,840

TDP263W

ArchitectureRDNA 3

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

AmazonIn Stock

$599.99

Buy on Amazon

More Amazon options

Rotate out primary variants whenever validation flags an issue.

💡 Not ready to buy? Try cloud GPUs first

Test RX 7800 XT performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.aifrom $0.20/hr RunPodfrom $0.30/hr Lambda Labsenterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec	VRAM used
inference-net/Schematron-3B	Q4	123.20 tok/sEstimated Auto-generated benchmark	2GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	122.21 tok/sEstimated Auto-generated benchmark	1GB
nari-labs/Dia2-2B	Q4	120.63 tok/sEstimated Auto-generated benchmark	2GB
google-t5/t5-3b	Q4	118.71 tok/sEstimated Auto-generated benchmark	2GB
facebook/sam3	Q4	118.16 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/DeepSeek-OCR	Q4	116.34 tok/sEstimated Auto-generated benchmark	2GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	116.13 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen2.5-3B-Instruct	Q4	116.11 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-3B-Instruct	Q4	116.08 tok/sEstimated Auto-generated benchmark	2GB
allenai/OLMo-2-0425-1B	Q4	115.10 tok/sEstimated Auto-generated benchmark	1GB
tencent/HunyuanOCR	Q4	113.65 tok/sEstimated Auto-generated benchmark	1GB
WeiboAI/VibeThinker-1.5B	Q4	113.51 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B-Instruct	Q4	111.85 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-Guard-3-1B	Q4	111.37 tok/sEstimated Auto-generated benchmark	1GB
google/embeddinggemma-300m	Q4	111.01 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-1B-Instruct	Q4	109.67 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-3B-Instruct	Q4	109.17 tok/sEstimated Auto-generated benchmark	2GB
LiquidAI/LFM2-1.2B	Q4	109.16 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-2b	Q4	109.04 tok/sEstimated Auto-generated benchmark	1GB
google-bert/bert-base-uncased	Q4	108.34 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen2.5-3B	Q4	106.93 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-1B	Q4	106.41 tok/sEstimated Auto-generated benchmark	1GB
unsloth/gemma-3-1b-it	Q4	106.17 tok/sEstimated Auto-generated benchmark	1GB
bigcode/starcoder2-3b	Q4	105.65 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-3B	Q4	105.56 tok/sEstimated Auto-generated benchmark	2GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	105.51 tok/sEstimated Auto-generated benchmark	2GB
ibm-research/PowerMoE-3b	Q4	105.35 tok/sEstimated Auto-generated benchmark	2GB
apple/OpenELM-1_1B-Instruct	Q4	105.15 tok/sEstimated Auto-generated benchmark	1GB
ibm-granite/granite-3.3-2b-instruct	Q4	104.03 tok/sEstimated Auto-generated benchmark	1GB
NousResearch/Meta-Llama-3.1-8B-Instruct	Q4	103.47 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2-1.5B-Instruct	Q4	103.07 tok/sEstimated Auto-generated benchmark	3GB
liuhaotian/llava-v1.5-7b	Q4	103.06 tok/sEstimated Auto-generated benchmark	4GB
skt/kogpt2-base-v2	Q4	102.89 tok/sEstimated Auto-generated benchmark	4GB
meta-llama/Llama-Guard-3-8B	Q4	102.53 tok/sEstimated Auto-generated benchmark	4GB
google/gemma-2-2b-it	Q4	102.40 tok/sEstimated Auto-generated benchmark	1GB
Qwen/Qwen3-1.7B-Base	Q4	102.38 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-8B-Base	Q4	102.31 tok/sEstimated Auto-generated benchmark	4GB
deepseek-ai/DeepSeek-R1-Distill-Llama-8B	Q4	102.20 tok/sEstimated Auto-generated benchmark	4GB
openai-community/gpt2-xl	Q4	102.20 tok/sEstimated Auto-generated benchmark	4GB
google/gemma-3-1b-it	Q4	102.05 tok/sEstimated Auto-generated benchmark	1GB
microsoft/Phi-3.5-mini-instruct	Q4	102.04 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen2.5-0.5B	Q4	102.02 tok/sEstimated Auto-generated benchmark	3GB
MiniMaxAI/MiniMax-M2	Q4	101.81 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2.5-1.5B	Q4	101.65 tok/sEstimated Auto-generated benchmark	3GB
lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit	Q4	101.25 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen2-0.5B-Instruct	Q4	101.11 tok/sEstimated Auto-generated benchmark	3GB
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit	Q4	101.03 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-4B-Base	Q4	101.00 tok/sEstimated Auto-generated benchmark	2GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	Q4	101.00 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-4B-Thinking-2507-FP8	Q4	100.89 tok/sEstimated Auto-generated benchmark	2GB

inference-net/Schematron-3B

2GB

123.20 tok/sEstimated

Auto-generated benchmark

TinyLlama/TinyLlama-1.1B-Chat-v1.0

1GB

122.21 tok/sEstimated

Auto-generated benchmark

nari-labs/Dia2-2B

2GB

120.63 tok/sEstimated

Auto-generated benchmark

google-t5/t5-3b

2GB

118.71 tok/sEstimated

Auto-generated benchmark

facebook/sam3

1GB

118.16 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-OCR

2GB

116.34 tok/sEstimated

Auto-generated benchmark

deepseek-ai/deepseek-coder-1.3b-instruct

2GB

116.13 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B-Instruct

2GB

116.11 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B-Instruct

2GB

116.08 tok/sEstimated

Auto-generated benchmark

allenai/OLMo-2-0425-1B

1GB

115.10 tok/sEstimated

Auto-generated benchmark

tencent/HunyuanOCR

1GB

113.65 tok/sEstimated

Auto-generated benchmark

WeiboAI/VibeThinker-1.5B

1GB

113.51 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B-Instruct

1GB

111.85 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-1B

1GB

111.37 tok/sEstimated

Auto-generated benchmark

google/embeddinggemma-300m

1GB

111.01 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-1B-Instruct

1GB

109.67 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-3B-Instruct

2GB

109.17 tok/sEstimated

Auto-generated benchmark

LiquidAI/LFM2-1.2B

1GB

109.16 tok/sEstimated

Auto-generated benchmark

google/gemma-2b

1GB

109.04 tok/sEstimated

Auto-generated benchmark

google-bert/bert-base-uncased

1GB

108.34 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B

2GB

106.93 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B

1GB

106.41 tok/sEstimated

Auto-generated benchmark

unsloth/gemma-3-1b-it

1GB

106.17 tok/sEstimated

Auto-generated benchmark

bigcode/starcoder2-3b

2GB

105.65 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B

2GB

105.56 tok/sEstimated

Auto-generated benchmark

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

2GB

105.51 tok/sEstimated

Auto-generated benchmark

ibm-research/PowerMoE-3b

2GB

105.35 tok/sEstimated

Auto-generated benchmark

apple/OpenELM-1_1B-Instruct

1GB

105.15 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-2b-instruct

1GB

104.03 tok/sEstimated

Auto-generated benchmark

NousResearch/Meta-Llama-3.1-8B-Instruct

4GB

103.47 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2-1.5B-Instruct

3GB

103.07 tok/sEstimated

Auto-generated benchmark

liuhaotian/llava-v1.5-7b

4GB

103.06 tok/sEstimated

Auto-generated benchmark

skt/kogpt2-base-v2

4GB

102.89 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-8B

4GB

102.53 tok/sEstimated

Auto-generated benchmark

google/gemma-2-2b-it

1GB

102.40 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-1.7B-Base

4GB

102.38 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-8B-Base

4GB

102.31 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-R1-Distill-Llama-8B

4GB

102.20 tok/sEstimated

Auto-generated benchmark

openai-community/gpt2-xl

4GB

102.20 tok/sEstimated

Auto-generated benchmark

google/gemma-3-1b-it

1GB

102.05 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3.5-mini-instruct

2GB

102.04 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-0.5B

3GB

102.02 tok/sEstimated

Auto-generated benchmark

MiniMaxAI/MiniMax-M2

4GB

101.81 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-1.5B

3GB

101.65 tok/sEstimated

Auto-generated benchmark

lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit

4GB

101.25 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2-0.5B-Instruct

3GB

101.11 tok/sEstimated

Auto-generated benchmark

unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit

4GB

101.03 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-4B-Base

2GB

101.00 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

4GB

101.00 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-4B-Thinking-2507-FP8

2GB

100.89 tok/sEstimated

Auto-generated benchmark

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
google-t5/t5-3b	FP16	Fits comfortably	40.06 tok/sEstimated	6GB (have 16GB)
Qwen/Qwen2.5-0.5B	FP16	Fits comfortably	38.94 tok/sEstimated	11GB (have 16GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	Q4	Fits comfortably	101.00 tok/sEstimated	4GB (have 16GB)
meta-llama/Llama-2-7b-hf	Q4	Fits comfortably	99.03 tok/sEstimated	4GB (have 16GB)
Qwen/Qwen2-0.5B	FP16	Fits comfortably	35.42 tok/sEstimated	11GB (have 16GB)
deepseek-ai/DeepSeek-R1-0528	FP16	Fits (tight)	33.81 tok/sEstimated	15GB (have 16GB)
Qwen/Qwen2.5-32B-Instruct	Q8	Not supported	21.37 tok/sEstimated	33GB (have 16GB)
LiquidAI/LFM2-1.2B	FP16	Fits comfortably	43.91 tok/sEstimated	4GB (have 16GB)
Qwen/Qwen3-Coder-30B-A3B-Instruct	Q8	Not supported	39.55 tok/sEstimated	31GB (have 16GB)
Qwen/Qwen3-Coder-30B-A3B-Instruct	FP16	Not supported	20.75 tok/sEstimated	61GB (have 16GB)
Qwen/Qwen3-30B-A3B	FP16	Not supported	20.88 tok/sEstimated	61GB (have 16GB)
rinna/japanese-gpt-neox-small	Q8	Fits comfortably	68.07 tok/sEstimated	7GB (have 16GB)
unsloth/mistral-7b-v0.3-bnb-4bit	Q4	Fits comfortably	91.84 tok/s (estimated)	4GB (have 16GB)
unsloth/mistral-7b-v0.3-bnb-4bit	Q8	Fits comfortably	61.22 tok/s (estimated)	7GB (have 16GB)
Qwen/Qwen2-7B-Instruct	Q4	Fits comfortably	88.96 tok/s (estimated)	4GB (have 16GB)
Qwen/Qwen3-4B-Base	Q4	Fits comfortably	101.00 tok/s (estimated)	2GB (have 16GB)
nvidia/NVIDIA-Nemotron-Nano-9B-v2	Q4	Fits comfortably	70.98 tok/s (estimated)	5GB (have 16GB)
nvidia/NVIDIA-Nemotron-Nano-9B-v2	Q8	Fits comfortably	51.95 tok/s (estimated)	10GB (have 16GB)
NousResearch/Meta-Llama-3.1-8B-Instruct	Q8	Fits comfortably	67.89 tok/s (estimated)	9GB (have 16GB)
NousResearch/Meta-Llama-3.1-8B-Instruct	FP16	Not supported	37.40 tok/s (estimated)	17GB (have 16GB)
Qwen/Qwen3-30B-A3B-Thinking-2507	FP16	Not supported	19.90 tok/s (estimated)	61GB (have 16GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit	Q4	Fits (tight)	55.59 tok/s (estimated)	15GB (have 16GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit	Q8	Not supported	32.95 tok/s (estimated)	31GB (have 16GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit	FP16	Not supported	17.80 tok/s (estimated)	61GB (have 16GB)
deepseek-ai/DeepSeek-V3	Q8	Fits comfortably	65.13 tok/s (estimated)	7GB (have 16GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit	Q4	Fits (tight)	47.54 tok/s (estimated)	15GB (have 16GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit	Q8	Not supported	34.23 tok/s (estimated)	31GB (have 16GB)
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit	FP16	Not supported	20.24 tok/s (estimated)	61GB (have 16GB)
dicta-il/dictalm2.0-instruct	Q4	Fits comfortably	84.97 tok/s (estimated)	4GB (have 16GB)
HuggingFaceM4/tiny-random-LlamaForCausalLM	Q8	Fits comfortably	68.46 tok/s (estimated)	7GB (have 16GB)
HuggingFaceM4/tiny-random-LlamaForCausalLM	FP16	Fits (tight)	34.38 tok/s (estimated)	15GB (have 16GB)
Qwen/Qwen2.5-72B-Instruct	Q8	Not supported	13.65 tok/s (estimated)	71GB (have 16GB)
Qwen/QwQ-32B-Preview	Q8	Not supported	23.20 tok/s (estimated)	34GB (have 16GB)
Qwen/QwQ-32B-Preview	FP16	Not supported	12.54 tok/s (estimated)	67GB (have 16GB)
mistralai/Mixtral-8x22B-Instruct-v0.1	FP16	Not supported	8.99 tok/s (estimated)	275GB (have 16GB)
microsoft/Phi-3.5-mini-instruct	Q4	Fits comfortably	102.04 tok/s (estimated)	2GB (have 16GB)
microsoft/Phi-3.5-mini-instruct	Q8	Fits comfortably	60.31 tok/s (estimated)	4GB (have 16GB)
microsoft/Phi-3.5-mini-instruct	FP16	Fits comfortably	37.73 tok/s (estimated)	8GB (have 16GB)
microsoft/Phi-3-medium-128k-instruct	Q4	Fits comfortably	76.96 tok/s (estimated)	7GB (have 16GB)
NousResearch/Hermes-3-Llama-3.1-8B	Q4	Fits comfortably	64.40 tok/s (estimated)	4GB (have 16GB)
NousResearch/Hermes-3-Llama-3.1-8B	FP16	Not supported	28.79 tok/s (estimated)	17GB (have 16GB)
01-ai/Yi-1.5-34B-Chat	Q4	Not supported	35.02 tok/s (estimated)	18GB (have 16GB)
01-ai/Yi-1.5-34B-Chat	Q8	Not supported	21.05 tok/s (estimated)	35GB (have 16GB)
moonshotai/Kimi-Linear-48B-A3B-Instruct	FP16	Not supported	11.87 tok/s (estimated)	101GB (have 16GB)
moonshotai/Kimi-K2-Thinking	FP16	Not supported	13.11 tok/s (estimated)	1956GB (have 16GB)
deepseek-ai/DeepSeek-Math-V2	Q4	Not supported	14.86 tok/s (estimated)	383GB (have 16GB)
deepseek-ai/DeepSeek-Math-V2	Q8	Not supported	10.61 tok/s (estimated)	766GB (have 16GB)
deepseek-ai/DeepSeek-Math-V2	FP16	Not supported	5.56 tok/s (estimated)	1532GB (have 16GB)
tencent/HunyuanOCR	Q8	Fits comfortably	79.68 tok/s (estimated)	2GB (have 16GB)
google-t5/t5-3b	Q8	Fits comfortably	77.97 tok/s (estimated)	3GB (have 16GB)

google-t5/t5-3b	FP16	Fits comfortably	40.06 tok/s (estimated)	6GB (have 16GB)
Qwen/Qwen2.5-0.5B	FP16	Fits comfortably	38.94 tok/s (estimated)	11GB (have 16GB)
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	Q4	Fits comfortably	101.00 tok/s (estimated)	4GB (have 16GB)
meta-llama/Llama-2-7b-hf	Q4	Fits comfortably	99.03 tok/s (estimated)	4GB (have 16GB)
Qwen/Qwen2-0.5B	FP16	Fits comfortably	35.42 tok/s (estimated)	11GB (have 16GB)
deepseek-ai/DeepSeek-R1-0528	FP16	Fits (tight)	33.81 tok/s (estimated)	15GB (have 16GB)
Qwen/Qwen2.5-32B-Instruct	Q8	Not supported	21.37 tok/s (estimated)	33GB (have 16GB)
LiquidAI/LFM2-1.2B	FP16	Fits comfortably	43.91 tok/s (estimated)	4GB (have 16GB)
Qwen/Qwen3-Coder-30B-A3B-Instruct	Q8	Not supported	39.55 tok/s (estimated)	31GB (have 16GB)
Qwen/Qwen3-Coder-30B-A3B-Instruct	FP16	Not supported	20.75 tok/s (estimated)	61GB (have 16GB)
Qwen/Qwen3-30B-A3B	FP16	Not supported	20.88 tok/s (estimated)	61GB (have 16GB)
rinna/japanese-gpt-neox-small	Q8	Fits comfortably	68.07 tok/s (estimated)	7GB (have 16GB)

Note: Performance estimates are calculated, not measured. Real results may vary.
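The fit labels in the compatibility list compare the VRAM a model needs with the 16GB on the card. The page does not publish its exact rule, so the following is only a minimal sketch: the function names, the 90% "tight" cutoff, and the weights-only size formula are assumptions chosen to be consistent with the rows shown here (11GB is comfortable, 15GB is tight, 17GB does not fit).

```python
# Hypothetical sketch: names and thresholds are assumptions, not the site's method.

# Nominal bits stored per weight for each quantization level listed on the page.
BITS_PER_WEIGHT = {"Q4": 4, "Q8": 8, "FP16": 16}


def weight_gb(params_billions: float, quant: str) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead, so real usage is higher.
    """
    return params_billions * BITS_PER_WEIGHT[quant] / 8


def fit_label(required_gb: float, available_gb: float = 16.0,
              tight_ratio: float = 0.9) -> str:
    """Classify a VRAM requirement against the card's capacity.

    tight_ratio is an assumed cutoff: at or above 90% of capacity counts as tight.
    """
    if required_gb > available_gb:
        return "Not supported"
    if required_gb >= tight_ratio * available_gb:
        return "Fits (tight)"
    return "Fits comfortably"


if __name__ == "__main__":
    for required in (4, 11, 15, 17):
        print(f"{required}GB required: {fit_label(required)}")
    print(f"8B at FP16, weights only: {weight_gb(8, 'FP16'):.0f}GB")
```

Under these assumptions an 8B model at FP16 needs about 16GB for weights alone, which leaves no headroom on a 16GB card and is consistent with the 17GB figure listed for the Llama 3.1 8B rows once some overhead is included.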

Alternative GPUs

RTX 5070	12GB
RTX 4060 Ti 16GB	16GB
RX 6800 XT	16GB
RTX 4070 Super	12GB
RTX 3080	10GB