© 2025 localai.computer. Hardware recommendations for running AI models locally.


Quick Answer: The NVIDIA A4000 offers 16GB of VRAM and is currently listed from around $21.99 (MSRP $999). It delivers approximately 101 tokens/sec on Qwen/Qwen2.5-3B at Q4 quantization and typically draws 140W under load.

NVIDIA A4000

By NVIDIA · Released 2021-04 · MSRP $999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.
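As a rule of thumb, Q4 quantization needs about half a byte per parameter, Q8 about one byte, and FP16 two bytes, plus overhead for the KV cache and runtime buffers. A minimal sketch of that estimate (the 20% overhead factor is an assumption for illustration, not this site's exact formula):

```python
# Approximate bytes per parameter at each quantization level.
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def estimated_vram_gb(params_billion: float, quant: str,
                      overhead: float = 1.2) -> float:
    """Rough VRAM requirement in GB; `overhead` is an assumed fudge
    factor for KV cache and runtime buffers."""
    bytes_needed = params_billion * 1e9 * BYTES_PER_PARAM[quant] * overhead
    return bytes_needed / 1e9

# A 7B model at Q4 comes out around 4GB and fits easily in 16GB;
# the same model at FP16 is around 17GB and does not.
print(round(estimated_vram_gb(7, "Q4"), 1))    # ~4.2
print(round(estimated_vram_gb(7, "FP16"), 1))  # ~16.8
```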

Buy on Amazon – $21.99 · View Benchmarks
Specs snapshot

Key hardware metrics for AI workloads.

  • VRAM: 16GB
  • Cores: 6,144
  • TDP: 140W
  • Architecture: Ampere

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

Amazon – $21.99 · Buy on Amazon

💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA A4000 performance in the cloud before investing in hardware. Pay by the hour with no commitment.

  • Vast.ai – from $0.20/hr
  • RunPod – from $0.30/hr
  • Lambda Labs – enterprise-grade
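At a $999 MSRP, renting before buying is cheap insurance. A quick breakeven sketch using the hourly rates listed above (electricity, depreciation, and resale value are ignored; this is back-of-envelope arithmetic, not a cost model):

```python
def breakeven_hours(card_price: float, hourly_rate: float) -> float:
    """Hours of cloud GPU rental you could buy for the card's price."""
    return card_price / hourly_rate

# Rates from the providers listed above.
print(round(breakeven_hours(999.00, 0.20)))  # Vast.ai: ~4995 hours
print(round(breakeven_hours(999.00, 0.30)))  # RunPod: ~3330 hours
```

Several thousand hours of rental per card price means a few evenings of cloud testing costs almost nothing relative to the purchase.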

AI benchmarks

All rows are auto-generated benchmark estimates at Q4 quantization.

| Model | Quantization | Tokens/sec (est.) | VRAM used |
| --- | --- | --- | --- |
| Qwen/Qwen2.5-3B | Q4 | 101.43 | 2GB |
| allenai/OLMo-2-0425-1B | Q4 | 100.22 | 1GB |
| LiquidAI/LFM2-1.2B | Q4 | 100.11 | 1GB |
| deepseek-ai/DeepSeek-OCR | Q4 | 99.96 | 2GB |
| google-t5/t5-3b | Q4 | 99.32 | 2GB |
| bigcode/starcoder2-3b | Q4 | 99.02 | 2GB |
| google/gemma-2-2b-it | Q4 | 98.78 | 1GB |
| google/gemma-2b | Q4 | 97.97 | 1GB |
| google/gemma-3-1b-it | Q4 | 96.76 | 1GB |
| meta-llama/Llama-Guard-3-1B | Q4 | 96.67 | 1GB |
| meta-llama/Llama-3.2-1B-Instruct | Q4 | 96.14 | 1GB |
| google/embeddinggemma-300m | Q4 | 95.54 | 1GB |
| unsloth/Llama-3.2-1B-Instruct | Q4 | 93.81 | 1GB |
| inference-net/Schematron-3B | Q4 | 93.21 | 2GB |
| ibm-research/PowerMoE-3b | Q4 | 93.03 | 2GB |
| tencent/HunyuanOCR | Q4 | 92.59 | 1GB |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 91.82 | 2GB |
| meta-llama/Llama-3.2-1B | Q4 | 91.12 | 1GB |
| apple/OpenELM-1_1B-Instruct | Q4 | 91.12 | 1GB |
| meta-llama/Llama-3.2-3B | Q4 | 89.93 | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 89.71 | 2GB |
| WeiboAI/VibeThinker-1.5B | Q4 | 89.41 | 1GB |
| facebook/sam3 | Q4 | 89.24 | 1GB |
| google-bert/bert-base-uncased | Q4 | 88.91 | 1GB |
| unsloth/Llama-3.2-3B-Instruct | Q4 | 87.99 | 2GB |
| unsloth/gemma-3-1b-it | Q4 | 86.96 | 1GB |
| nari-labs/Dia2-2B | Q4 | 86.92 | 2GB |
| Qwen/Qwen2.5-3B-Instruct | Q4 | 85.94 | 2GB |
| Qwen/Qwen2.5-Coder-1.5B | Q4 | 84.34 | 3GB |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Q4 | 84.34 | 4GB |
| allenai/Olmo-3-7B-Think | Q4 | 84.19 | 4GB |
| microsoft/Phi-3-mini-4k-instruct | Q4 | 84.16 | 4GB |
| ibm-granite/granite-3.3-2b-instruct | Q4 | 84.08 | 1GB |
| meta-llama/Llama-3.2-3B-Instruct | Q4 | 83.93 | 2GB |
| HuggingFaceH4/zephyr-7b-beta | Q4 | 83.76 | 4GB |
| Qwen/Qwen3-4B-Base | Q4 | 83.72 | 2GB |
| Qwen/Qwen3-8B | Q4 | 83.66 | 4GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 83.60 | 1GB |
| mistralai/Mistral-7B-v0.1 | Q4 | 83.60 | 4GB |
| Qwen/Qwen2.5-0.5B-Instruct | Q4 | 82.85 | 3GB |
| dicta-il/dictalm2.0-instruct | Q4 | 82.80 | 4GB |
| EleutherAI/gpt-neo-125m | Q4 | 82.71 | 4GB |
| bigscience/bloomz-560m | Q4 | 82.70 | 4GB |
| lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit | Q4 | 82.56 | 2GB |
| meta-llama/Meta-Llama-3-8B | Q4 | 82.11 | 4GB |
| meta-llama/Llama-2-7b-chat-hf | Q4 | 81.90 | 4GB |
| Qwen/Qwen2-0.5B | Q4 | 81.86 | 3GB |
| swiss-ai/Apertus-8B-Instruct-2509 | Q4 | 81.64 | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | 81.43 | 4GB |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Q4 | 81.38 | 3GB |

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
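Numbers like these are easy to verify yourself: time a generation call and divide the token count by the elapsed time. A generic sketch, where `generate` is a placeholder for whatever runtime you use (llama.cpp, transformers, etc.), not a specific API:

```python
import time

def measure_tokens_per_sec(generate, prompt: str) -> float:
    """Time one generation call; `generate` is any callable that
    returns the list of generated tokens (placeholder for your runtime)."""
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Example with a dummy generator that "produces" 100 tokens instantly;
# with a real model, expect figures in the range shown in the table above.
def dummy_generate(prompt):
    return list(range(100))

print(measure_tokens_per_sec(dummy_generate, "hello") > 0)  # prints True
```

For meaningful comparisons, discard the first (warm-up) call and average over several runs with a fixed prompt and generation length.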

Model compatibility

| Model | Quantization | Verdict | Est. speed (tok/s) | VRAM needed (16GB available) |
| --- | --- | --- | --- | --- |
| dphn/dolphin-2.9.1-yi-1.5-34b | Q8 | Not supported | 19.26 | 35GB |
| dphn/dolphin-2.9.1-yi-1.5-34b | FP16 | Not supported | 9.99 | 70GB |
| openai/gpt-oss-20b | Q4 | Fits comfortably | 38.59 | 10GB |
| openai/gpt-oss-20b | Q8 | Not supported | 27.86 | 20GB |
| openai/gpt-oss-20b | FP16 | Not supported | 16.95 | 41GB |
| google/gemma-3-1b-it | Q4 | Fits comfortably | 96.76 | 1GB |
| google/gemma-3-1b-it | FP16 | Fits comfortably | 36.87 | 2GB |
| Qwen/Qwen3-Embedding-0.6B | Q4 | Fits comfortably | 74.17 | 3GB |
| Qwen/Qwen3-Embedding-0.6B | Q8 | Fits comfortably | 57.22 | 6GB |
| Qwen/Qwen3-Embedding-0.6B | FP16 | Fits comfortably | 28.60 | 13GB |
| meta-llama/Llama-3.2-1B-Instruct | Q8 | Fits comfortably | 70.70 | 1GB |
| meta-llama/Llama-3.2-1B | Q8 | Fits comfortably | 70.78 | 1GB |
| Qwen/Qwen3-4B | FP16 | Fits comfortably | 27.90 | 9GB |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits (tight) | 46.18 | 15GB |
| meta-llama/Meta-Llama-3-8B-Instruct | Q4 | Fits comfortably | 79.06 | 4GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q4 | Fits comfortably | 76.74 | 3GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Q8 | Fits comfortably | 51.65 | 5GB |
| Qwen/Qwen2.5-1.5B | FP16 | Fits comfortably | 27.62 | 11GB |
| Qwen/Qwen2.5-14B-Instruct | Q4 | Fits comfortably | 62.68 | 7GB |
| Qwen/Qwen3-Embedding-8B | Q8 | Fits comfortably | 54.58 | 9GB |
| Qwen/Qwen3-Embedding-8B | FP16 | Not supported | 31.39 | 17GB |
| Qwen/Qwen2-0.5B | FP16 | Fits comfortably | 31.18 | 11GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | Fits comfortably | 89.71 | 2GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | Q8 | Fits comfortably | 66.51 | 3GB |
| deepseek-ai/deepseek-coder-1.3b-instruct | FP16 | Fits comfortably | 32.68 | 6GB |
| microsoft/phi-4 | Q4 | Fits comfortably | 78.34 | 4GB |
| microsoft/phi-4 | Q8 | Fits comfortably | 50.77 | 7GB |
| microsoft/phi-4 | FP16 | Fits (tight) | 26.76 | 15GB |
| deepseek-ai/DeepSeek-V3.1 | Q4 | Fits comfortably | 79.19 | 4GB |
| deepseek-ai/DeepSeek-V3.1 | Q8 | Fits comfortably | 53.78 | 7GB |
| deepseek-ai/DeepSeek-V3.1 | FP16 | Fits (tight) | 30.68 | 15GB |
| meta-llama/Llama-3.1-8B | Q4 | Fits comfortably | 78.38 | 4GB |
| meta-llama/Llama-3.1-8B | Q8 | Fits comfortably | 58.68 | 9GB |
| meta-llama/Llama-3.1-8B | FP16 | Not supported | 29.36 | 17GB |
| LiquidAI/LFM2-1.2B | Q4 | Fits comfortably | 100.11 | 1GB |
| LiquidAI/LFM2-1.2B | Q8 | Fits comfortably | 65.32 | 2GB |
| LiquidAI/LFM2-1.2B | FP16 | Fits comfortably | 31.57 | 4GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q4 | Not supported | 28.07 | 34GB |
| meta-llama/Meta-Llama-3-70B-Instruct | Q8 | Not supported | 17.91 | 68GB |
| meta-llama/Meta-Llama-3-70B-Instruct | FP16 | Not supported | 10.04 | 137GB |
| unsloth/gpt-oss-20b-BF16 | Q4 | Fits comfortably | 46.37 | 10GB |
| unsloth/gpt-oss-20b-BF16 | FP16 | Not supported | 15.11 | 41GB |
| HuggingFaceTB/SmolLM-135M | Q4 | Fits comfortably | 72.01 | 4GB |
| HuggingFaceTB/SmolLM-135M | Q8 | Fits comfortably | 52.47 | 7GB |
| HuggingFaceTB/SmolLM-135M | FP16 | Fits (tight) | 26.92 | 15GB |
| Qwen/Qwen2.5-Math-1.5B | Q4 | Fits comfortably | 76.90 | 3GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q4 | Fits comfortably | 81.43 | 4GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Q8 | Fits comfortably | 52.13 | 7GB |
| trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | Fits (tight) | 27.74 | 15GB |
| openai-community/gpt2 | Q4 | Fits comfortably | 70.73 | 4GB |

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
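The verdicts above reduce to comparing required VRAM against the A4000's 16GB. A sketch of one plausible classification rule; the 90% "tight" threshold is an assumption chosen to reproduce the 15GB-on-16GB rows in the table, not the site's published formula:

```python
def fit_verdict(required_gb: float, available_gb: float = 16.0) -> str:
    """Classify a model/quantization against available VRAM.
    The 0.9 threshold is an assumed heuristic, not this site's exact rule."""
    if required_gb > available_gb:
        return "Not supported"
    if required_gb >= 0.9 * available_gb:
        return "Fits (tight)"
    return "Fits comfortably"

print(fit_verdict(10))  # e.g. openai/gpt-oss-20b Q4: Fits comfortably
print(fit_verdict(15))  # e.g. microsoft/phi-4 FP16: Fits (tight)
print(fit_verdict(17))  # e.g. Llama-3.1-8B FP16: Not supported
```

"Fits (tight)" matters in practice: a model that barely fits leaves little room for the KV cache, so long contexts can still fail even when the weights load.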

Alternative GPUs

Explore how these GPUs stack up for local inference workloads:

  • NVIDIA A5000 – 24GB
  • NVIDIA A6000 – 48GB
  • RTX 4080 – 16GB
  • RTX 4070 – 12GB
  • RTX 3060 12GB – 12GB