© 2025 localai.computer. Hardware recommendations for running AI models locally.


Quick Answer: The NVIDIA L40 offers 48GB of VRAM (MSRP $7,999; street prices vary). It delivers an estimated 217 tokens/sec on meta-llama/Llama-3.2-1B-Instruct at Q4 quantization and typically draws 300W under load.

NVIDIA L40

By NVIDIA · Released October 2022 · MSRP $7,999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.
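How much VRAM a model needs is mostly set by the quantization you pick. A minimal back-of-envelope sketch (the bytes-per-parameter values and the 20% overhead factor for KV cache and activations are illustrative assumptions, not this site's methodology):

```python
# Rough VRAM estimate for an LLM at a given quantization.
# Heuristic only: bytes-per-parameter and the 20% overhead margin
# (KV cache, activations, runtime buffers) are assumptions.
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def estimate_vram_gb(params_billion: float, quant: str,
                     overhead: float = 1.2) -> float:
    """Weight size plus a flat overhead margin, in GB."""
    return params_billion * BYTES_PER_PARAM[quant] * overhead

print(round(estimate_vram_gb(8, "Q4"), 1))    # prints 4.8
print(round(estimate_vram_gb(8, "FP16"), 1))  # prints 19.2
```

At Q4 an 8B model leaves most of the L40's 48GB free for context; at FP16 the same model already consumes a large share of a consumer card.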

Specs snapshot

Key hardware metrics for AI workloads:

  • VRAM: 48GB
  • CUDA cores: 18,176
  • TDP: 300W
  • Architecture: Ada Lovelace

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.


💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA L40 performance in the cloud before investing in hardware. Pay by the hour with no commitment.

  • Vast.ai · from $0.20/hr
  • RunPod · from $0.30/hr
  • Lambda Labs · enterprise-grade
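Whether renting beats buying comes down to expected hours of use. A rough break-even sketch using the MSRP above and the Vast.ai rate; the electricity price is an assumed figure:

```python
# Break-even point between buying an L40 and renting a cloud GPU.
# MSRP and hourly rate are taken from this page; the electricity
# price is an illustrative assumption.
msrp = 7999.00      # L40 MSRP (USD)
cloud_rate = 0.20   # USD/hr (Vast.ai rate quoted above)
power_kw = 0.300    # 300W TDP under load
electricity = 0.15  # USD per kWh, assumed

# Owning costs electricity per hour; renting costs the hourly rate.
own_cost_per_hr = power_kw * electricity
breakeven_hours = msrp / (cloud_rate - own_cost_per_hr)
print(round(breakeven_hours))  # hours of use before buying wins
```

At these rates the break-even sits around 50,000 GPU-hours, which is why hourly rentals make sense for evaluation before committing to hardware.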

AI benchmarks

Model | Quantization | Tokens/sec (estimated) | VRAM used
meta-llama/Llama-3.2-1B-Instruct | Q4 | 217.30 | 1GB
apple/OpenELM-1_1B-Instruct | Q4 | 214.24 | 1GB
meta-llama/Llama-Guard-3-1B | Q4 | 214.03 | 1GB
meta-llama/Llama-3.2-1B | Q4 | 212.74 | 1GB
google-t5/t5-3b | Q4 | 212.54 | 2GB
google/gemma-2-2b-it | Q4 | 212.05 | 1GB
google/embeddinggemma-300m | Q4 | 211.11 | 1GB
facebook/sam3 | Q4 | 210.12 | 1GB
deepseek-ai/DeepSeek-OCR | Q4 | 204.89 | 2GB
Qwen/Qwen2.5-3B-Instruct | Q4 | 203.91 | 2GB
Qwen/Qwen2.5-3B | Q4 | 203.65 | 2GB
inference-net/Schematron-3B | Q4 | 201.76 | 2GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Q4 | 200.78 | 1GB
google/gemma-3-1b-it | Q4 | 199.63 | 1GB
unsloth/gemma-3-1b-it | Q4 | 199.35 | 1GB
unsloth/Llama-3.2-3B-Instruct | Q4 | 196.99 | 2GB
tencent/HunyuanOCR | Q4 | 195.04 | 1GB
WeiboAI/VibeThinker-1.5B | Q4 | 193.63 | 1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | Q4 | 189.25 | 2GB
ibm-research/PowerMoE-3b | Q4 | 188.19 | 2GB
meta-llama/Llama-3.2-3B | Q4 | 187.25 | 2GB
bigcode/starcoder2-3b | Q4 | 187.09 | 2GB
google-bert/bert-base-uncased | Q4 | 186.38 | 1GB
unsloth/Llama-3.2-1B-Instruct | Q4 | 185.56 | 1GB
nari-labs/Dia2-2B | Q4 | 185.02 | 2GB
LiquidAI/LFM2-1.2B | Q4 | 184.67 | 1GB
meta-llama/Llama-3.2-3B-Instruct | Q4 | 184.49 | 2GB
ibm-granite/granite-3.3-2b-instruct | Q4 | 184.04 | 1GB
deepseek-ai/deepseek-coder-1.3b-instruct | Q4 | 182.66 | 2GB
microsoft/phi-2 | Q4 | 181.91 | 4GB
Qwen/Qwen3-8B-Base | Q4 | 181.70 | 4GB
microsoft/DialoGPT-small | Q4 | 181.65 | 4GB
Qwen/Qwen3-Embedding-4B | Q4 | 181.25 | 2GB
microsoft/Phi-4-multimodal-instruct | Q4 | 181.09 | 4GB
hmellor/tiny-random-LlamaForCausalLM | Q4 | 181.05 | 4GB
Qwen/Qwen3-4B-Thinking-2507-FP8 | Q4 | 180.92 | 2GB
facebook/opt-125m | Q4 | 180.81 | 4GB
lmsys/vicuna-7b-v1.5 | Q4 | 180.74 | 4GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit | Q4 | 180.52 | 2GB
zai-org/GLM-4.5-Air | Q4 | 180.38 | 4GB
allenai/OLMo-2-0425-1B | Q4 | 179.97 | 1GB
microsoft/Phi-3-mini-128k-instruct | Q4 | 179.92 | 4GB
google/gemma-2b | Q4 | 179.85 | 1GB
meta-llama/Llama-3.1-8B | Q4 | 179.79 | 4GB
huggyllama/llama-7b | Q4 | 179.31 | 4GB
skt/kogpt2-base-v2 | Q4 | 178.45 | 4GB
openai-community/gpt2-medium | Q4 | 177.98 | 4GB
deepseek-ai/DeepSeek-R1-0528 | Q4 | 177.79 | 4GB
Qwen/Qwen3-0.6B | Q4 | 177.61 | 3GB
microsoft/Phi-3.5-vision-instruct | Q4 | 177.59 | 4GB

All figures above are auto-generated benchmark estimates.

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
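For context on where such estimates can come from: a common first-order heuristic (not necessarily this site's exact methodology) treats decode as memory-bandwidth-bound, since each generated token must stream every weight through the GPU once. A sketch assuming the L40's rated ~864 GB/s memory bandwidth and an assumed 80% efficiency factor:

```python
# First-order decode-speed bound: each generated token reads every
# weight once, so tok/s <= memory bandwidth / model size in memory.
# 864 GB/s is the L40's rated bandwidth; the 0.8 efficiency factor
# is an assumption. Small models fall well below this bound because
# they become launch-overhead-bound rather than bandwidth-bound.
def bandwidth_bound_toks(model_gb: float, bandwidth_gbps: float = 864.0,
                         efficiency: float = 0.8) -> float:
    return bandwidth_gbps * efficiency / model_gb

print(round(bandwidth_bound_toks(4.0)))  # prints 173, for a 4GB Q4 model
```

For a 4GB Q4 model this bound lands near the ~180 tok/s figures in the table, while tiny 1GB models sit far below their theoretical ceiling.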

Model compatibility

Model | Quantization | Verdict | Estimated speed | VRAM needed (card has 48GB)
zai-org/GLM-4.6-FP8 | Q4 | Fits comfortably | 168.55 tok/s | 4GB
zai-org/GLM-4.6-FP8 | FP16 | Fits comfortably | 65.21 tok/s | 15GB
microsoft/DialoGPT-medium | FP16 | Fits comfortably | 59.91 tok/s | 15GB
MiniMaxAI/MiniMax-M2 | FP16 | Fits comfortably | 64.29 tok/s | 15GB
Qwen/Qwen2-0.5B | Q4 | Fits comfortably | 150.38 tok/s | 3GB
Qwen/Qwen2-0.5B | Q8 | Fits comfortably | 108.31 tok/s | 5GB
microsoft/phi-4 | Q8 | Fits comfortably | 119.29 tok/s | 7GB
microsoft/phi-4 | FP16 | Fits comfortably | 58.08 tok/s | 15GB
unsloth/Meta-Llama-3.1-8B-Instruct | FP16 | Fits comfortably | 66.39 tok/s | 17GB
Qwen/Qwen2.5-Math-1.5B | FP16 | Fits comfortably | 58.34 tok/s | 11GB
trl-internal-testing/tiny-random-LlamaForCausalLM | FP16 | Fits comfortably | 67.26 tok/s | 15GB
EleutherAI/gpt-neo-125m | Q4 | Fits comfortably | 158.77 tok/s | 4GB
EleutherAI/gpt-neo-125m | Q8 | Fits comfortably | 106.22 tok/s | 7GB
Qwen/Qwen3-1.7B-Base | FP16 | Fits comfortably | 63.25 tok/s | 15GB
ibm-granite/granite-3.3-8b-instruct | Q4 | Fits comfortably | 172.20 tok/s | 4GB
ibm-granite/granite-3.3-8b-instruct | FP16 | Fits comfortably | 63.75 tok/s | 17GB
Qwen/QwQ-32B-Preview | Q4 | Fits comfortably | 53.56 tok/s | 17GB
Qwen/QwQ-32B-Preview | Q8 | Fits comfortably | 38.25 tok/s | 34GB
deepseek-ai/DeepSeek-Coder-V2-Instruct-0724 | FP16 | Not supported | 9.50 tok/s | 461GB
facebook/sam3 | Q8 | Fits comfortably | 142.84 tok/s | 1GB
mistralai/Ministral-3-14B-Instruct-2512 | Q4 | Fits comfortably | 134.78 tok/s | 8GB
mistralai/Ministral-3-14B-Instruct-2512 | Q8 | Fits comfortably | 80.65 tok/s | 16GB
mistralai/Ministral-3-14B-Instruct-2512 | FP16 | Fits comfortably | 48.69 tok/s | 32GB
mistralai/Mistral-7B-Instruct-v0.2 | Q4 | Fits comfortably | 163.82 tok/s | 4GB
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | Q4 | Fits comfortably | 59.67 tok/s | 34GB
meta-llama/Llama-3.2-3B-Instruct | Q8 | Fits comfortably | 142.98 tok/s | 3GB
meta-llama/Llama-3.2-3B-Instruct | FP16 | Fits comfortably | 77.93 tok/s | 6GB
vikhyatk/moondream2 | FP16 | Fits comfortably | 63.63 tok/s | 15GB
petals-team/StableBeluga2 | Q8 | Fits comfortably | 117.38 tok/s | 7GB
microsoft/Phi-3-mini-4k-instruct | FP16 | Fits comfortably | 68.82 tok/s | 15GB
openai-community/gpt2-large | Q4 | Fits comfortably | 166.24 tok/s | 4GB
openai-community/gpt2-large | Q8 | Fits comfortably | 110.46 tok/s | 7GB
Qwen/Qwen3-1.7B | Q4 | Fits comfortably | 153.43 tok/s | 4GB
Qwen/Qwen3-1.7B | Q8 | Fits comfortably | 114.20 tok/s | 7GB
MiniMaxAI/MiniMax-M2 | Q8 | Fits comfortably | 114.24 tok/s | 7GB
Qwen/Qwen2.5-32B | FP16 | Not supported | 23.91 tok/s | 66GB
meta-llama/Llama-3.1-8B-Instruct | Q8 | Fits comfortably | 83.13 tok/s | 9GB
black-forest-labs/FLUX.1-dev | Q8 | Fits comfortably | 116.30 tok/s | 8GB
tencent/HunyuanVideo-1.5 | Q4 | Fits comfortably | 168.43 tok/s | 4GB
tencent/HunyuanVideo-1.5 | Q8 | Fits comfortably | 118.91 tok/s | 8GB
tencent/HunyuanVideo-1.5 | FP16 | Fits comfortably | 65.47 tok/s | 16GB
nari-labs/Dia2-2B | Q4 | Fits comfortably | 185.02 tok/s | 2GB
nari-labs/Dia2-2B | Q8 | Fits comfortably | 152.56 tok/s | 3GB
nari-labs/Dia2-2B | FP16 | Fits comfortably | 80.35 tok/s | 5GB
Qwen/Qwen3-4B | FP16 | Fits comfortably | 64.72 tok/s | 9GB
Qwen/Qwen3-30B-A3B-Instruct-2507 | Q4 | Fits comfortably | 92.59 tok/s | 15GB
Qwen/Qwen3-30B-A3B-Instruct-2507 | Q8 | Fits comfortably | 57.63 tok/s | 31GB
Qwen/Qwen3-30B-A3B-Instruct-2507 | FP16 | Not supported | 37.79 tok/s | 61GB
google-t5/t5-3b | FP16 | Fits comfortably | 71.28 tok/s | 6GB
Qwen/Qwen2.5-0.5B | Q8 | Fits comfortably | 119.42 tok/s | 5GB

All speeds above are auto-generated benchmark estimates.

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data
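The verdicts in the table reduce to comparing a model's required VRAM against the card's 48GB. A minimal sketch of that check (the 90% "tight fit" band is an assumption of ours; the table itself only distinguishes fits from not supported):

```python
# Reproduce the fits / not-supported verdict logic from the table:
# compare required VRAM against the L40's 48GB. The 90% "tight fit"
# threshold is an assumed refinement, not taken from the table.
GPU_VRAM_GB = 48

def verdict(required_gb: float, vram_gb: float = GPU_VRAM_GB) -> str:
    if required_gb > vram_gb:
        return "Not supported"
    if required_gb > 0.9 * vram_gb:
        return "Tight fit"
    return "Fits comfortably"

print(verdict(34))  # QwQ-32B-Preview at Q8: Fits comfortably
print(verdict(66))  # Qwen2.5-32B at FP16: Not supported
```

A 34GB requirement still leaves 14GB of headroom for KV cache and longer contexts, which is why even 32B-class models at Q8 remain comfortable on this card.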

Alternative GPUs

See how each stacks up for local inference workloads:

  • NVIDIA RTX 6000 Ada · 48GB
  • NVIDIA A6000 · 48GB
  • RTX 4090 · 24GB
  • RTX 4080 · 16GB
  • NVIDIA A5000 · 24GB