localai.computer

© 2025 localai.computer. Hardware recommendations for running AI models locally.

We earn from qualifying purchases through affiliate links, at no extra cost to you. This supports our free content and research.


Quick Answer: The AMD Instinct MI250X offers 128GB of VRAM and launched at an $11,000 MSRP (check current market pricing below). It delivers approximately 649 tokens/sec on deepseek-ai/DeepSeek-OCR-2 (Q4, estimated) and typically draws 560W under load.

AMD Instinct MI250X

Check availability
By AMD · Released 2021-11 · MSRP $11,000.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Specs snapshot

Key hardware metrics for AI workloads.

  • VRAM: 128GB
  • Cores: 14,080
  • TDP: 560W
  • Architecture: CDNA 2
Key Takeaways
  • 128GB VRAM: runs models up to roughly 250B parameters at 4-bit quantization
  • High-end compute for demanding workloads
  • High power draw (560W): requires a robust PSU (1000W+ recommended)
  • Strong price-to-VRAM value
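The PSU recommendation above follows a common rule of thumb, sketched below. The ~200W rest-of-system figure and 40% transient headroom are assumptions, not this site's methodology:

```python
# Rough PSU sizing sketch (a common rule of thumb, not this site's
# methodology): GPU TDP plus an assumed ~200W for the rest of the
# system, with ~40% headroom for power transients.
def recommended_psu_watts(gpu_tdp_w: float,
                          rest_of_system_w: float = 200.0,
                          headroom: float = 1.4) -> int:
    raw = (gpu_tdp_w + rest_of_system_w) * headroom
    return int(-(-raw // 50) * 50)  # round up to the nearest 50W

print(recommended_psu_watts(560))  # MI250X's 560W TDP -> 1100
```

By this estimate a 560W accelerator lands in 1000–1100W PSU territory, which is why 1000W+ is the safer recommendation.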

What this means for you

With 128GB of VRAM, the AMD Instinct MI250X can run models up to roughly 250B parameters at 4-bit quantization (4-bit weights take about 0.5 bytes per parameter, with some headroom left for the KV cache). That covers most popular open models, from Mistral 7B through Llama 3 70B and beyond.
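The capacity estimate above can be reproduced with a simple calculation. This is a sketch under stated assumptions (a 10% reserve for KV cache and activations), not the site's exact formula:

```python
# How many parameters fit in a given VRAM budget at a given
# quantization level. `overhead` reserves ~10% of VRAM for the
# KV cache and activations (an assumed figure).
def max_params_billions(vram_gb: float, bits_per_weight: int,
                        overhead: float = 0.9) -> float:
    usable_bytes = vram_gb * overhead * 1024**3
    bytes_per_weight = bits_per_weight / 8
    return usable_bytes / bytes_per_weight / 1e9

print(round(max_params_billions(128, 4)))  # 4-bit on 128GB -> ~247
print(round(max_params_billions(128, 8)))  # 8-bit on 128GB -> ~124
```

The same function explains the "Who should buy" guidance: at 8-bit, 100B-class models still fit comfortably, while FP16 weights for 100B+ parameters do not.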

Who should buy

  • Professional AI workloads requiring maximum VRAM
  • Running 100B+ parameter models at 8-bit or higher quantization (FP16 weights for 100B+ parameters exceed 128GB)

Looking to upgrade?

Consider the H100 or MI300X for maximum VRAM in enterprise workloads.

AI benchmarks

All figures below are auto-generated estimates at Q4 quantization.

| Model | Quantization | Tokens/sec | VRAM used |
|---|---|---|---|
| deepseek-ai/DeepSeek-OCR-2 | Q4 | 649.00 | 2GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 627.73 | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 627.55 | 1GB |
| google/gemma-3-1b-it | Q4 | 625.75 | 1GB |
| Qwen/Qwen2.5-3B | Q4 | 621.04 | 2GB |
| nari-labs/Dia2-2B | Q4 | 613.04 | 2GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 611.92 | 2GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 610.41 | 2GB |
| google/embeddinggemma-300m | Q4 | 605.34 | 1GB |
| Qwen/Qwen3-ASR-1.7B | Q4 | 602.28 | 2GB |
| google-bert/bert-base-uncased | Q4 | 600.46 | 1GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 586.19 | 2GB |
| nineninesix/kani-tts-2-en | Q4 | 585.96 | 1GB |
| facebook/sam3 | Q4 | 585.38 | 1GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 582.94 | 1GB |
| allenai/OLMo-2-0425-1B | Q4 | 579.90 | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 578.12 | 2GB |
| bigcode/starcoder2-3b | Q4 | 570.73 | 2GB |
| google/gemma-2-2b-it | Q4 | 570.11 | 1GB |
| google/gemma-2b | Q4 | 561.32 | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 559.58 | 1GB |
| inference-net/Schematron-3B | Q4 | 558.55 | 2GB |
| unsloth/gemma-3-1b-it | Q4 | 557.67 | 1GB |
| ibm-research/PowerMoE-3b | Q4 | 551.92 | 2GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 551.64 | 1GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 540.66 | 2GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 539.42 | 1GB |
| meta-llama/Llama-3.2-1B | Q4 | 538.03 | 1GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 527.13 | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 525.45 | 1GB |
| meta-llama/Llama-3.1-8B | Q4 | 524.41 | 4GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 524.21 | 2GB |
| google/gemma-3-270m-it | Q4 | 524.18 | 4GB |
| google-t5/t5-3b | Q4 | 524.13 | 2GB |
| tencent/HunyuanOCR | Q4 | 524.11 | 1GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | 524.05 | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 523.49 | 4GB |
| huggyllama/llama-7b | Q4 | 522.77 | 4GB |
| facebook/opt-125m | Q4 | 522.55 | 4GB |
| microsoft/phi-4 | Q4 | 520.91 | 4GB |
| meta-llama/Llama-3.2-3B | Q4 | 518.65 | 2GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 517.32 | 3GB |
| Qwen/Qwen3-0.6B-Base | Q4 | 516.25 | 3GB |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Q4 | 515.80 | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit | Q4 | 515.66 | 2GB |
| black-forest-labs/FLUX.2-dev | Q4 | 515.42 | 4GB |
| Qwen/Qwen2.5-7B | Q4 | 514.74 | 4GB |
| Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice | Q4 | 514.72 | 4GB |
| ibm-granite/granite-3.3-8b-instruct | Q4 | 514.63 | 4GB |
| microsoft/Phi-4-multimodal-instruct | Q4 | 514.07 | 4GB |

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

All speeds are auto-generated estimates; VRAM needed is compared against the MI250X's 128GB.

| Model | Quantization | Verdict | Estimated speed | VRAM needed (of 128GB) |
|---|---|---|---|---|
| openai-community/gpt2 | Q8 | Fits comfortably | 312.62 tok/s | 7GB |
| openai-community/gpt2 | FP16 | Fits comfortably | 193.24 tok/s | 15GB |
| Qwen/Qwen2.5-7B-Instruct | Q4 | Fits comfortably | 475.87 tok/s | 4GB |
| Qwen/Qwen2.5-7B-Instruct | Q8 | Fits comfortably | 348.85 tok/s | 7GB |
| Qwen/Qwen2.5-7B-Instruct | FP16 | Fits comfortably | 177.09 tok/s | 15GB |
| Qwen/Qwen3-0.6B | Q4 | Fits comfortably | 476.00 tok/s | 3GB |
| Qwen/Qwen3-0.6B | Q8 | Fits comfortably | 341.41 tok/s | 6GB |
| Qwen/Qwen3-0.6B | FP16 | Fits comfortably | 194.29 tok/s | 13GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q4 | Fits comfortably | 509.95 tok/s | 3GB |
| Gensyn/Qwen2.5-0.5B-Instruct | Q8 | Fits comfortably | 334.88 tok/s | 5GB |
| Gensyn/Qwen2.5-0.5B-Instruct | FP16 | Fits comfortably | 181.05 tok/s | 11GB |
| meta-llama/Llama-3.1-8B-Instruct | Q4 | Fits comfortably | 489.47 tok/s | 4GB |
| meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 339.81 tok/s | 9GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | Fits comfortably | 165.15 tok/s | 17GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q4 | Fits comfortably | 160.73 tok/s | 17GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Fits comfortably | 107.15 tok/s | 35GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | Fits comfortably | 65.35 tok/s | 70GB |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 282.04 tok/s | 10GB |
| openai/gpt-oss-20b | Q8 | Fits comfortably | 181.04 tok/s | 20GB |
| openai/gpt-oss-20b | FP16 | Fits comfortably | 103.74 tok/s | 41GB |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 625.75 tok/s | 1GB |
| google/gemma-3-1b-it | Q8 | Fits comfortably | 391.47 tok/s | 1GB |
| google/gemma-3-1b-it | FP16 | Fits comfortably | 218.35 tok/s | 2GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 456.09 tok/s | 3GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 331.84 tok/s | 6GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | Fits comfortably | 189.24 tok/s | 13GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q4 | Fits comfortably | 512.33 tok/s | 3GB |
| Qwen/Qwen2.5-1.5B-Instruct | Q8 | Fits comfortably | 330.17 tok/s | 5GB |
| Qwen/Qwen2.5-1.5B-Instruct | FP16 | Fits comfortably | 168.04 tok/s | 11GB |
| facebook/opt-125m | Q4 | Fits comfortably | 522.55 tok/s | 4GB |
| facebook/opt-125m | Q8 | Fits comfortably | 324.44 tok/s | 7GB |
| facebook/opt-125m | FP16 | Fits comfortably | 171.27 tok/s | 15GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | Fits comfortably | 525.45 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q8 | Fits comfortably | 432.51 tok/s | 1GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | FP16 | Fits comfortably | 225.37 tok/s | 2GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q4 | Fits comfortably | 461.27 tok/s | 4GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Q8 | Fits comfortably | 307.58 tok/s | 7GB |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | FP16 | Fits comfortably | 187.74 tok/s | 15GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q4 | Fits comfortably | 456.66 tok/s | 2GB |
| Qwen/Qwen3-4B-Instruct-2507 | Q8 | Fits comfortably | 331.01 tok/s | 4GB |
| Qwen/Qwen3-4B-Instruct-2507 | FP16 | Fits comfortably | 186.36 tok/s | 9GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | Fits comfortably | 551.64 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 415.87 tok/s | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | FP16 | Fits comfortably | 237.03 tok/s | 2GB |
| openai/gpt-oss-120b | Q4 | Fits comfortably | 97.11 tok/s | 59GB |
| openai/gpt-oss-120b | Q8 | Fits comfortably | 67.62 tok/s | 117GB |
| openai/gpt-oss-120b | FP16 | Not supported | 39.17 tok/s | 235GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | Fits comfortably | 540.66 tok/s | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q8 | Fits comfortably | 384.99 tok/s | 3GB |
| openai-community/gpt2 | Q4 | Fits comfortably | 457.53 tok/s | 4GB |

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
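The verdict column reduces to a comparison of required versus available VRAM. A minimal sketch of that logic (a hypothetical re-implementation, assuming the two verdicts shown in the table):

```python
# Hypothetical re-implementation of the compatibility verdict: a model
# "fits" when the VRAM it needs at a given precision does not exceed
# the card's 128GB.
GPU_VRAM_GB = 128

def verdict(required_gb: float, available_gb: float = GPU_VRAM_GB) -> str:
    return "Fits comfortably" if required_gb <= available_gb else "Not supported"

print(verdict(59))   # gpt-oss-120b Q4   -> Fits comfortably
print(verdict(117))  # gpt-oss-120b Q8   -> Fits comfortably
print(verdict(235))  # gpt-oss-120b FP16 -> Not supported
```

This matches the one "Not supported" entry above: gpt-oss-120b at FP16 needs 235GB, nearly double the card's capacity.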

Where to Buy

No direct purchase links are available yet. Try the Amazon search results to find this GPU.
Complete Your Build

Essential accessories to pair with AMD Instinct MI250X

Corsair RM750x (2025) 750W
Note: suits mid-range builds; a 560W accelerator like this one warrants a 1000W+ unit
$119
Find on Amazon
Corsair Vengeance 32GB DDR5-6000
32GB ideal for AI workloads
$129
Find on Amazon
Noctua NF-A12x25 PWM
Quiet and efficient cooling
$35
Find on Amazon
Thermal Grizzly Kryonaut
Premium thermal paste for optimal cooling
$15
Find on Amazon

Total Bundle Price

All items from Amazon

$298
Individual: $298
Find All on Amazon · More GPUs

💡 Not ready to buy? Try cloud GPUs first

Test AMD Instinct MI250X performance in the cloud before investing in hardware. Pay by the hour with no commitment.

  • Vast.ai: from $0.20/hr
  • RunPod: from $0.30/hr
  • Lambda Labs: enterprise-grade
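A quick break-even check puts these rates in perspective. Using the $11,000 MSRP and the listed Vast.ai rate (assuming that rate stays flat and ignoring electricity and resale value):

```python
# Hours of cloud GPU rental that the card's $11,000 MSRP would cover
# at a given hourly rate (Vast.ai's listed $0.20/hr as the example).
def breakeven_hours(msrp_usd: float, rate_per_hr: float) -> float:
    return msrp_usd / rate_per_hr

print(f"{breakeven_hours(11_000, 0.20):,.0f} hours")  # 55,000 hours
```

That is years of continuous rental, which is why trying the cloud first is a low-risk way to validate a workload before buying.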

Alternative GPUs

RTX 5070
12GB

Explore how RTX 5070 stacks up for local inference workloads.

RTX 4060 Ti 16GB
16GB

Explore how RTX 4060 Ti 16GB stacks up for local inference workloads.

RX 6800 XT
16GB

Explore how RX 6800 XT stacks up for local inference workloads.

RTX 4070 Super
12GB

Explore how RTX 4070 Super stacks up for local inference workloads.

RTX 3080
10GB

Explore how RTX 3080 stacks up for local inference workloads.

Can it play popular games?

Note: the MI250X is a datacenter accelerator with no display outputs, so the VRAM requirements below are for comparison only.

Cyberpunk 2077
8GB VRAM

RPG • 2020

Baldur's Gate 3
8GB VRAM

RPG • 2023

Hogwarts Legacy
12GB VRAM

Action RPG • 2023

Starfield
8GB VRAM

RPG • 2023

Alan Wake 2
12GB VRAM

Survival Horror • 2023

Elden Ring
8GB VRAM

Action RPG • 2022

Black Myth: Wukong
12GB VRAM

Action RPG • 2024

Grand Theft Auto VI
12GB VRAM

Action Adventure • 2025

Resident Evil 4 Remake
12GB VRAM

Survival Horror • 2023

Marvel's Spider-Man Remastered
12GB VRAM

Action • 2022

The Last of Us Part I
12GB VRAM

Action Adventure • 2023

Red Dead Redemption 2
8GB VRAM

Action Adventure • 2019

View all 64 compatible games