localai.computer
© 2025 localai.computer. Hardware recommendations for running AI models locally.


Quick Answer: The AMD Instinct MI210 offers 64GB of VRAM and launched at a $6,000 MSRP in November 2021. It delivers approximately 313 tokens/sec on meta-llama/Llama-Guard-3-1B (Q4, estimated) and typically draws 300W under load.

AMD Instinct MI210

By AMD · Released November 2021 · MSRP $6,000.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your target tokens/sec, and monitor prices to catch the best deal.

Specs snapshot

Key hardware metrics for AI workloads:

  • VRAM: 64GB
  • Cores: 6,656
  • TDP: 300W
  • Architecture: CDNA 2
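The VRAM figures in the compatibility table further down track a standard rule of thumb: model weights take roughly (parameter count × bits per weight ÷ 8) bytes, plus some headroom for activations and KV cache. A minimal sketch of that estimate — the site's exact formula isn't published, and the ~10% overhead factor here is an assumption:

```python
def estimate_vram_gb(n_params: float, bits_per_weight: int, overhead: float = 1.10) -> float:
    """Rough VRAM (GB) needed to hold model weights at a given quantization.

    n_params: parameter count (e.g. 8e9 for an 8B model)
    bits_per_weight: 4 for Q4, 8 for Q8, 16 for FP16
    overhead: assumed ~10% extra for activations/KV cache (not from this page)
    """
    return n_params * bits_per_weight / 8 * overhead / 1e9

# An 8B model at Q4 needs roughly 4-5GB, far under the MI210's 64GB;
# at FP16 the same model needs ~17-18GB, matching the table's 17GB figure.
print(round(estimate_vram_gb(8e9, 4), 1))   # ~4.4
print(round(estimate_vram_gb(8e9, 16), 1))  # ~17.6
```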

Where to Buy

No direct purchase links are available yet. Try the Amazon search results to find this GPU.

💡 Not ready to buy? Try cloud GPUs first

Test AMD Instinct MI210 performance in the cloud before investing in hardware. Pay by the hour with no commitment.

  • Vast.ai — from $0.20/hr
  • RunPod — from $0.30/hr
  • Lambda Labs — enterprise-grade
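Putting the hourly rates above against the $6,000 MSRP gives a quick break-even sketch — street prices and real utilization will shift these numbers, so treat it as a rough bound:

```python
msrp = 6000.00     # MI210 launch MSRP, from this page
cloud_rate = 0.20  # Vast.ai starting rate in $/hr, from this page

breakeven_hours = msrp / cloud_rate
breakeven_years = breakeven_hours / (24 * 365)

print(f"{breakeven_hours:.0f} hours ≈ {breakeven_years:.1f} years of continuous rental")
# 30000 hours ≈ 3.4 years
```

In other words, at these list prices you would need years of near-continuous use before buying beats renting, which is why trying the card in the cloud first is low-risk.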

AI benchmarks

All results below use Q4 quantization. Speeds are auto-generated estimates, not measured runs.

Model | Quantization | Tokens/sec | VRAM used
meta-llama/Llama-Guard-3-1B | Q4 | 313.28 | 1GB
unsloth/gemma-3-1b-it | Q4 | 311.17 | 1GB
meta-llama/Llama-3.2-1B | Q4 | 310.04 | 1GB
facebook/sam3 | Q4 | 306.67 | 1GB
allenai/OLMo-2-0425-1B | Q4 | 306.48 | 1GB
google-t5/t5-3b | Q4 | 305.19 | 2GB
WeiboAI/VibeThinker-1.5B | Q4 | 303.85 | 1GB
google/gemma-3-1b-it | Q4 | 303.02 | 1GB
ibm-granite/granite-3.3-2b-instruct | Q4 | 299.41 | 1GB
Qwen/Qwen2.5-3B | Q4 | 298.98 | 2GB
unsloth/Llama-3.2-3B-Instruct | Q4 | 298.34 | 2GB
LiquidAI/LFM2-1.2B | Q4 | 294.14 | 1GB
meta-llama/Llama-3.2-3B-Instruct | Q4 | 291.87 | 2GB
deepseek-ai/DeepSeek-OCR | Q4 | 291.52 | 2GB
Qwen/Qwen2.5-3B-Instruct | Q4 | 284.59 | 2GB
deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 281.97 | 2GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 280.16 | 2GB
google/embeddinggemma-300m | Q4 | 278.18 | 1GB
meta-llama/Llama-3.2-3B | Q4 | 276.71 | 2GB
nari-labs/Dia2-2B | Q4 | 275.82 | 2GB
google/gemma-2-2b-it | Q4 | 275.13 | 1GB
google/gemma-2b | Q4 | 272.63 | 1GB
bigcode/starcoder2-3b | Q4 | 272.50 | 2GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 268.81 | 1GB
tencent/HunyuanOCR | Q4 | 266.68 | 1GB
ibm-research/PowerMoE-3b | Q4 | 266.19 | 2GB
apple/OpenELM-1_1B-Instruct | Q4 | 263.00 | 1GB
meta-llama/Llama-3.2-1B-Instruct | Q4 | 261.99 | 1GB
Qwen/Qwen2.5-1.5B-Instruct | Q4 | 260.84 | 3GB
meta-llama/Llama-3.2-3B-Instruct | Q4 | 260.81 | 2GB
inference-net/Schematron-3B | Q4 | 260.77 | 2GB
microsoft/phi-2 | Q4 | 260.16 | 4GB
Qwen/Qwen2.5-0.5B-Instruct | Q4 | 260.08 | 3GB
meta-llama/Llama-Guard-3-8B | Q4 | 258.95 | 4GB
unsloth/Llama-3.2-1B-Instruct | Q4 | 258.79 | 1GB
meta-llama/Meta-Llama-3-8B | Q4 | 258.52 | 4GB
ibm-granite/granite-3.3-8b-instruct | Q4 | 258.48 | 4GB
meta-llama/Llama-3.1-8B | Q4 | 257.69 | 4GB
parler-tts/parler-tts-large-v1 | Q4 | 257.67 | 4GB
Qwen/Qwen3-8B-Base | Q4 | 257.20 | 4GB
google-bert/bert-base-uncased | Q4 | 256.96 | 1GB
microsoft/VibeVoice-1.5B | Q4 | 256.93 | 3GB
Qwen/Qwen3-1.7B-Base | Q4 | 256.63 | 4GB
Qwen/Qwen2.5-Math-1.5B | Q4 | 256.46 | 3GB
microsoft/Phi-3-mini-128k-instruct | Q4 | 256.17 | 4GB
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | 255.34 | 3GB
microsoft/DialoGPT-small | Q4 | 254.55 | 4GB
lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q4 | 254.51 | 4GB
mistralai/Mistral-7B-v0.1 | Q4 | 254.10 | 4GB
ibm-granite/granite-docling-258M | Q4 | 254.07 | 4GB

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
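To translate these throughput figures into wall-clock latency, divide the number of tokens you want by the tok/s estimate. A small sketch using the top benchmark figure from the table (this ignores prompt-processing time, which adds to the total):

```python
def generation_time_s(n_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to generate n_tokens at a steady decode rate (prompt processing not included)."""
    return n_tokens / tokens_per_sec

# 313.28 tok/s is the estimated Llama-Guard-3-1B Q4 figure above
print(round(generation_time_s(500, 313.28), 2))  # ~1.6 s for a 500-token reply
```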

Model compatibility

All speeds are auto-generated estimates; each verdict compares the VRAM a model needs against the MI210's 64GB.

Model | Quantization | Verdict | Estimated speed | VRAM needed
Qwen/Qwen3-0.6B | FP16 | Fits comfortably | 93.54 tok/s | 13GB
Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 218.84 tok/s | 3GB
google/gemma-3-1b-it | Q4 | Fits comfortably | 303.02 tok/s | 1GB
google/gemma-3-1b-it | Q8 | Fits comfortably | 206.78 tok/s | 1GB
google/gemma-3-1b-it | FP16 | Fits comfortably | 117.47 tok/s | 2GB
Qwen/Qwen3-4B-Instruct-2507 | FP16 | Fits comfortably | 86.15 tok/s | 9GB
meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 261.99 tok/s | 1GB
Qwen/Qwen3-8B | Q8 | Fits comfortably | 168.98 tok/s | 9GB
Qwen/Qwen3-8B | FP16 | Fits comfortably | 88.42 tok/s | 17GB
meta-llama/Meta-Llama-3-8B | FP16 | Fits comfortably | 98.52 tok/s | 17GB
Qwen/Qwen2.5-7B | Q4 | Fits comfortably | 235.85 tok/s | 4GB
Qwen/Qwen2.5-7B | Q8 | Fits comfortably | 154.48 tok/s | 7GB
Qwen/Qwen2.5-7B | FP16 | Fits comfortably | 91.71 tok/s | 15GB
Qwen/Qwen3-0.6B-Base | Q8 | Fits comfortably | 173.51 tok/s | 6GB
Qwen/Qwen3-0.6B-Base | FP16 | Fits comfortably | 92.22 tok/s | 13GB
Qwen/Qwen3-30B-A3B | Q8 | Fits comfortably | 82.87 tok/s | 31GB
Qwen/Qwen3-30B-A3B | FP16 | Fits comfortably | 47.62 tok/s | 61GB
microsoft/Phi-3.5-vision-instruct | Q8 | Fits comfortably | 175.09 tok/s | 7GB
Qwen/Qwen2-7B-Instruct | Q8 | Fits comfortably | 172.25 tok/s | 7GB
Qwen/Qwen2-7B-Instruct | FP16 | Fits comfortably | 88.63 tok/s | 15GB
Qwen/Qwen3-4B-Base | Q4 | Fits comfortably | 225.62 tok/s | 2GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | Fits comfortably | 241.68 tok/s | 2GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q8 | Fits comfortably | 168.39 tok/s | 4GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | FP16 | Fits comfortably | 85.74 tok/s | 9GB
unsloth/Llama-3.2-3B-Instruct | Q4 | Fits comfortably | 298.34 tok/s | 2GB
unsloth/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 186.17 tok/s | 3GB
OpenPipe/Qwen3-14B-Instruct | FP16 | Fits comfortably | 73.55 tok/s | 29GB
openai-community/gpt2-xl | Q4 | Fits comfortably | 218.65 tok/s | 4GB
openai-community/gpt2-xl | Q8 | Fits comfortably | 167.09 tok/s | 7GB
openai-community/gpt2-xl | FP16 | Fits comfortably | 97.89 tok/s | 15GB
microsoft/Phi-3-mini-128k-instruct | Q4 | Fits comfortably | 256.17 tok/s | 4GB
microsoft/Phi-3-mini-128k-instruct | Q8 | Fits comfortably | 150.93 tok/s | 7GB
GSAI-ML/LLaDA-8B-Instruct | Q4 | Fits comfortably | 227.64 tok/s | 4GB
GSAI-ML/LLaDA-8B-Instruct | Q8 | Fits comfortably | 163.69 tok/s | 9GB
lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | Q8 | Fits comfortably | 166.47 tok/s | 9GB
skt/kogpt2-base-v2 | Q8 | Fits comfortably | 180.06 tok/s | 7GB
skt/kogpt2-base-v2 | FP16 | Fits comfortably | 91.11 tok/s | 15GB
ibm-granite/granite-docling-258M | Q4 | Fits comfortably | 254.07 tok/s | 4GB
Qwen/Qwen3-Next-80B-A3B-Thinking | Q4 | Fits comfortably | 50.19 tok/s | 39GB
Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | Not supported | 34.41 tok/s | 78GB
Qwen/Qwen3-Next-80B-A3B-Thinking | FP16 | Not supported | 18.47 tok/s | 156GB
meta-llama/Llama-2-13b-chat-hf | Q4 | Fits comfortably | 173.60 tok/s | 7GB
NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | Fits comfortably | 98.95 tok/s | 17GB
apple/OpenELM-1_1B-Instruct | Q4 | Fits comfortably | 263.00 tok/s | 1GB
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q4 | Fits comfortably | 125.81 tok/s | 15GB
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | Q8 | Fits comfortably | 94.83 tok/s | 31GB
lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | FP16 | Fits comfortably | 49.67 tok/s | 61GB
Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | Fits comfortably | 230.57 tok/s | 3GB
Qwen/QwQ-32B-Preview | Q8 | Fits comfortably | 54.52 tok/s | 34GB
Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 175.34 tok/s | 6GB

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
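The verdicts above reduce to comparing a model's required VRAM against the MI210's 64GB. This sketch reproduces the rows shown with a binary rule; the site's full rule set (e.g. whether there is a "tight fit" tier between the two verdicts) isn't documented, so the threshold behavior is an assumption:

```python
def fit_verdict(required_gb: float, available_gb: float = 64) -> str:
    """Classify a model/quantization combo the way the compatibility table does (assumed binary rule)."""
    return "Fits comfortably" if required_gb <= available_gb else "Not supported"

print(fit_verdict(39))   # Qwen3-Next-80B Q4  -> Fits comfortably
print(fit_verdict(78))   # Qwen3-Next-80B Q8  -> Not supported
print(fit_verdict(61))   # Qwen3-30B-A3B FP16 -> Fits comfortably
```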

Alternative GPUs

Explore how these cards stack up for local inference workloads:

  • RTX 5070 — 12GB
  • RTX 4060 Ti 16GB — 16GB
  • RX 6800 XT — 16GB
  • RTX 4070 Super — 12GB
  • RTX 3080 — 10GB