Quick Answer: NVIDIA L40 offers 48GB VRAM at an MSRP of $7,999.00; check current market pricing before buying. It delivers an estimated 217 tokens/sec on meta-llama/Llama-3.2-1B-Instruct at Q4. It is rated at a 300W TDP.

NVIDIA L40

By NVIDIA · Released 2022-10 · MSRP $7,999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.
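As a rough guide to how quantization changes memory footprint, the sketch below estimates the weight-only size of a model from its parameter count and bits per weight. The function name `weight_footprint_gb` and the bit widths assumed for Q4, Q8, and FP16 (4, 8, and 16 bits) are illustrative assumptions, not this page's methodology. Real runtimes add overhead for the KV cache, activations, and quantization metadata, so treat the result as a lower bound.

```python
# Assumed bits per weight for each quantization label (illustrative, not taken from this page).
BITS_PER_WEIGHT = {"Q4": 4, "Q8": 8, "FP16": 16}


def weight_footprint_gb(params_billion: float, quant: str) -> float:
    """Weight-only footprint in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and runtime overhead.
    """
    bytes_per_weight = BITS_PER_WEIGHT[quant] / 8
    # (params_billion * 1e9 weights) * bytes_per_weight / 1e9 bytes per GB
    return params_billion * bytes_per_weight


if __name__ == "__main__":
    for quant in ("Q4", "Q8", "FP16"):
        size = weight_footprint_gb(32, quant)
        print(f"32B model at {quant}: ~{size:.0f} GB of weights")
```

For a 32B-parameter model this gives roughly 16 GB, 32 GB, and 64 GB, slightly below the 17GB (Q4), 34GB (Q8), and 66GB (FP16) estimates that the compatibility table lists for 32B models.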

Specs snapshot

Key hardware metrics for AI workloads.

VRAM: 48GB

Cores: 18,176

TDP: 300W

Architecture: Ada Lovelace

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA L40 performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.ai: from $0.20/hr · RunPod: from $0.30/hr · Lambda Labs: enterprise-grade
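To put the hourly rates in perspective against the purchase price, the sketch below computes how many rental hours cost as much as the card's MSRP. The helper `break_even_hours` is a hypothetical name, and the calculation deliberately ignores electricity, resale value, and the cost of the host system. Because the listed rates are starting ("from") prices, the resulting hours are closer to an upper bound.

```python
def break_even_hours(purchase_price: float, hourly_rate: float) -> float:
    """Hours of cloud rental whose total cost equals the purchase price.

    Simplification: ignores electricity, resale value, and host hardware.
    """
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return purchase_price / hourly_rate


if __name__ == "__main__":
    msrp = 7999.00  # MSRP listed on this page
    starting_rates = {"Vast.ai": 0.20, "RunPod": 0.30}  # "from" prices listed on this page
    for provider, rate in starting_rates.items():
        hours = break_even_hours(msrp, rate)
        print(f"{provider}: ~{hours:,.0f} hours at ${rate:.2f}/hr")
```

At these starting rates, renting only matches the MSRP after tens of thousands of hours, which is why short evaluations are usually cheaper in the cloud.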

AI benchmarks

Model	Quantization	Tokens/sec (estimated, auto-generated)	VRAM used
meta-llama/Llama-3.2-1B-Instruct	Q4	217.30 tok/s	1GB
apple/OpenELM-1_1B-Instruct	Q4	214.24 tok/s	1GB
meta-llama/Llama-Guard-3-1B	Q4	214.03 tok/s	1GB
meta-llama/Llama-3.2-1B	Q4	212.74 tok/s	1GB
google-t5/t5-3b	Q4	212.54 tok/s	2GB
google/gemma-2-2b-it	Q4	212.05 tok/s	1GB
google/embeddinggemma-300m	Q4	211.11 tok/s	1GB
facebook/sam3	Q4	210.12 tok/s	1GB
deepseek-ai/DeepSeek-OCR	Q4	204.89 tok/s	2GB
Qwen/Qwen2.5-3B-Instruct	Q4	203.91 tok/s	2GB
Qwen/Qwen2.5-3B	Q4	203.65 tok/s	2GB
inference-net/Schematron-3B	Q4	201.76 tok/s	2GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	200.78 tok/s	1GB
google/gemma-3-1b-it	Q4	199.63 tok/s	1GB
unsloth/gemma-3-1b-it	Q4	199.35 tok/s	1GB
unsloth/Llama-3.2-3B-Instruct	Q4	196.99 tok/s	2GB
tencent/HunyuanOCR	Q4	195.04 tok/s	1GB
WeiboAI/VibeThinker-1.5B	Q4	193.63 tok/s	1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	189.25 tok/s	2GB
ibm-research/PowerMoE-3b	Q4	188.19 tok/s	2GB
meta-llama/Llama-3.2-3B	Q4	187.25 tok/s	2GB
bigcode/starcoder2-3b	Q4	187.09 tok/s	2GB
google-bert/bert-base-uncased	Q4	186.38 tok/s	1GB
unsloth/Llama-3.2-1B-Instruct	Q4	185.56 tok/s	1GB
nari-labs/Dia2-2B	Q4	185.02 tok/s	2GB
LiquidAI/LFM2-1.2B	Q4	184.67 tok/s	1GB
meta-llama/Llama-3.2-3B-Instruct	Q4	184.49 tok/s	2GB
ibm-granite/granite-3.3-2b-instruct	Q4	184.04 tok/s	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	182.66 tok/s	2GB
microsoft/phi-2	Q4	181.91 tok/s	4GB
Qwen/Qwen3-8B-Base	Q4	181.70 tok/s	4GB
microsoft/DialoGPT-small	Q4	181.65 tok/s	4GB
Qwen/Qwen3-Embedding-4B	Q4	181.25 tok/s	2GB
microsoft/Phi-4-multimodal-instruct	Q4	181.09 tok/s	4GB
hmellor/tiny-random-LlamaForCausalLM	Q4	181.05 tok/s	4GB
Qwen/Qwen3-4B-Thinking-2507-FP8	Q4	180.92 tok/s	2GB
facebook/opt-125m	Q4	180.81 tok/s	4GB
lmsys/vicuna-7b-v1.5	Q4	180.74 tok/s	4GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit	Q4	180.52 tok/s	2GB
zai-org/GLM-4.5-Air	Q4	180.38 tok/s	4GB
allenai/OLMo-2-0425-1B	Q4	179.97 tok/s	1GB
microsoft/Phi-3-mini-128k-instruct	Q4	179.92 tok/s	4GB
google/gemma-2b	Q4	179.85 tok/s	1GB
meta-llama/Llama-3.1-8B	Q4	179.79 tok/s	4GB
huggyllama/llama-7b	Q4	179.31 tok/s	4GB
skt/kogpt2-base-v2	Q4	178.45 tok/s	4GB
openai-community/gpt2-medium	Q4	177.98 tok/s	4GB
deepseek-ai/DeepSeek-R1-0528	Q4	177.79 tok/s	4GB
Qwen/Qwen3-0.6B	Q4	177.61 tok/s	3GB
microsoft/Phi-3.5-vision-instruct	Q4	177.59 tok/s	4GB

Note: Performance figures are calculated estimates, not measured results. Real results may vary.

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
zai-org/GLM-4.6-FP8	Q4	Fits comfortably	168.55 tok/s	4GB (have 48GB)
zai-org/GLM-4.6-FP8	FP16	Fits comfortably	65.21 tok/s	15GB (have 48GB)
microsoft/DialoGPT-medium	FP16	Fits comfortably	59.91 tok/s	15GB (have 48GB)
MiniMaxAI/MiniMax-M2	FP16	Fits comfortably	64.29 tok/s	15GB (have 48GB)
Qwen/Qwen2-0.5B	Q4	Fits comfortably	150.38 tok/s	3GB (have 48GB)
Qwen/Qwen2-0.5B	Q8	Fits comfortably	108.31 tok/s	5GB (have 48GB)
microsoft/phi-4	Q8	Fits comfortably	119.29 tok/s	7GB (have 48GB)
microsoft/phi-4	FP16	Fits comfortably	58.08 tok/s	15GB (have 48GB)
unsloth/Meta-Llama-3.1-8B-Instruct	FP16	Fits comfortably	66.39 tok/s	17GB (have 48GB)
Qwen/Qwen2.5-Math-1.5B	FP16	Fits comfortably	58.34 tok/s	11GB (have 48GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	FP16	Fits comfortably	67.26 tok/s	15GB (have 48GB)
EleutherAI/gpt-neo-125m	Q4	Fits comfortably	158.77 tok/s	4GB (have 48GB)
EleutherAI/gpt-neo-125m	Q8	Fits comfortably	106.22 tok/s	7GB (have 48GB)
Qwen/Qwen3-1.7B-Base	FP16	Fits comfortably	63.25 tok/s	15GB (have 48GB)
ibm-granite/granite-3.3-8b-instruct	Q4	Fits comfortably	172.20 tok/s	4GB (have 48GB)
ibm-granite/granite-3.3-8b-instruct	FP16	Fits comfortably	63.75 tok/s	17GB (have 48GB)
Qwen/QwQ-32B-Preview	Q4	Fits comfortably	53.56 tok/s	17GB (have 48GB)
Qwen/QwQ-32B-Preview	Q8	Fits comfortably	38.25 tok/s	34GB (have 48GB)
deepseek-ai/DeepSeek-Coder-V2-Instruct-0724	FP16	Not supported	9.50 tok/s	461GB (have 48GB)
facebook/sam3	Q8	Fits comfortably	142.84 tok/s	1GB (have 48GB)
mistralai/Ministral-3-14B-Instruct-2512	Q4	Fits comfortably	134.78 tok/s	8GB (have 48GB)
mistralai/Ministral-3-14B-Instruct-2512	Q8	Fits comfortably	80.65 tok/s	16GB (have 48GB)
mistralai/Ministral-3-14B-Instruct-2512	FP16	Fits comfortably	48.69 tok/s	32GB (have 48GB)
mistralai/Mistral-7B-Instruct-v0.2	Q4	Fits comfortably	163.82 tok/s	4GB (have 48GB)
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic	Q4	Fits comfortably	59.67 tok/s	34GB (have 48GB)
meta-llama/Llama-3.2-3B-Instruct	Q8	Fits comfortably	142.98 tok/s	3GB (have 48GB)
meta-llama/Llama-3.2-3B-Instruct	FP16	Fits comfortably	77.93 tok/s	6GB (have 48GB)
vikhyatk/moondream2	FP16	Fits comfortably	63.63 tok/s	15GB (have 48GB)
petals-team/StableBeluga2	Q8	Fits comfortably	117.38 tok/s	7GB (have 48GB)
microsoft/Phi-3-mini-4k-instruct	FP16	Fits comfortably	68.82 tok/s	15GB (have 48GB)
openai-community/gpt2-large	Q4	Fits comfortably	166.24 tok/s	4GB (have 48GB)
openai-community/gpt2-large	Q8	Fits comfortably	110.46 tok/s	7GB (have 48GB)
Qwen/Qwen3-1.7B	Q4	Fits comfortably	153.43 tok/s	4GB (have 48GB)
Qwen/Qwen3-1.7B	Q8	Fits comfortably	114.20 tok/s	7GB (have 48GB)
MiniMaxAI/MiniMax-M2	Q8	Fits comfortably	114.24 tok/s	7GB (have 48GB)
Qwen/Qwen2.5-32B	FP16	Not supported	23.91 tok/s	66GB (have 48GB)
meta-llama/Llama-3.1-8B-Instruct	Q8	Fits comfortably	83.13 tok/s	9GB (have 48GB)
black-forest-labs/FLUX.1-dev	Q8	Fits comfortably	116.30 tok/s	8GB (have 48GB)
tencent/HunyuanVideo-1.5	Q4	Fits comfortably	168.43 tok/s	4GB (have 48GB)
tencent/HunyuanVideo-1.5	Q8	Fits comfortably	118.91 tok/s	8GB (have 48GB)
tencent/HunyuanVideo-1.5	FP16	Fits comfortably	65.47 tok/s	16GB (have 48GB)
nari-labs/Dia2-2B	Q4	Fits comfortably	185.02 tok/s	2GB (have 48GB)
nari-labs/Dia2-2B	Q8	Fits comfortably	152.56 tok/s	3GB (have 48GB)
nari-labs/Dia2-2B	FP16	Fits comfortably	80.35 tok/s	5GB (have 48GB)
Qwen/Qwen3-4B	FP16	Fits comfortably	64.72 tok/s	9GB (have 48GB)
Qwen/Qwen3-30B-A3B-Instruct-2507	Q4	Fits comfortably	92.59 tok/s	15GB (have 48GB)
Qwen/Qwen3-30B-A3B-Instruct-2507	Q8	Fits comfortably	57.63 tok/s	31GB (have 48GB)
Qwen/Qwen3-30B-A3B-Instruct-2507	FP16	Not supported	37.79 tok/s	61GB (have 48GB)
google-t5/t5-3b	FP16	Fits comfortably	71.28 tok/s	6GB (have 48GB)
Qwen/Qwen2.5-0.5B	Q8	Fits comfortably	119.42 tok/s	5GB (have 48GB)

Note: Performance figures are calculated estimates, not measured results. Real results may vary.
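The verdict column appears to follow from comparing the estimated VRAM requirement with the card's 48GB. The sketch below reproduces that comparison under a simplifying assumption: anything that fits in available memory is labeled "Fits comfortably" and anything larger is "Not supported". The function name `fit_verdict` is hypothetical, and the page's actual rule (including any headroom margin) is not documented here.

```python
def fit_verdict(required_gb: float, available_gb: float = 48.0) -> str:
    """Return a verdict label by comparing required VRAM with available VRAM.

    Assumption: no headroom margin is applied; the page's real threshold
    for "comfortably" is not stated.
    """
    if required_gb <= available_gb:
        return "Fits comfortably"
    return "Not supported"


if __name__ == "__main__":
    # (model, quantization, required GB) taken from the compatibility table
    rows = [
        ("Qwen/QwQ-32B-Preview", "Q8", 34),
        ("Qwen/Qwen2.5-32B", "FP16", 66),
        ("deepseek-ai/DeepSeek-Coder-V2-Instruct-0724", "FP16", 461),
    ]
    for model, quant, need_gb in rows:
        print(f"{model} ({quant}): {fit_verdict(need_gb)}")
```

In practice it is worth leaving a few gigabytes free for the KV cache and longer contexts, so a model that only just fits may still run out of memory.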

Alternative GPUs

NVIDIA RTX 6000 Ada: 48GB

NVIDIA A6000: 48GB

RTX 4090: 24GB

RTX 4080: 16GB

NVIDIA A5000: 24GB

NVIDIA L40

Unknown

By NVIDIAReleased 2022-10MSRP $7,999.00

This GPU offers reliable throughput for local AI workloads. Pair it with the right model quantization to hit your desired tokens/sec, and monitor prices below to catch the best deal.

Check Price on Amazon View Benchmarks

Specs snapshot

Key hardware metrics for AI workloads.

VRAM48GB

Cores18,176

TDP300W

ArchitectureAda Lovelace

Where to Buy

Buy directly on Amazon with fast shipping and reliable customer service.

AmazonUnknown

See price on Amazon

Buy on Amazon

💡 Not ready to buy? Try cloud GPUs first

Test NVIDIA L40 performance in the cloud before investing in hardware. Pay by the hour with no commitment.

Vast.aifrom $0.20/hr RunPodfrom $0.30/hr Lambda Labsenterprise-grade

AI benchmarks

Model	Quantization	Tokens/sec	VRAM used
meta-llama/Llama-3.2-1B-Instruct	Q4	217.30 tok/sEstimated Auto-generated benchmark	1GB
apple/OpenELM-1_1B-Instruct	Q4	214.24 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-Guard-3-1B	Q4	214.03 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-1B	Q4	212.74 tok/sEstimated Auto-generated benchmark	1GB
google-t5/t5-3b	Q4	212.54 tok/sEstimated Auto-generated benchmark	2GB
google/gemma-2-2b-it	Q4	212.05 tok/sEstimated Auto-generated benchmark	1GB
google/embeddinggemma-300m	Q4	211.11 tok/sEstimated Auto-generated benchmark	1GB
facebook/sam3	Q4	210.12 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/DeepSeek-OCR	Q4	204.89 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen2.5-3B-Instruct	Q4	203.91 tok/sEstimated Auto-generated benchmark	2GB
Qwen/Qwen2.5-3B	Q4	203.65 tok/sEstimated Auto-generated benchmark	2GB
inference-net/Schematron-3B	Q4	201.76 tok/sEstimated Auto-generated benchmark	2GB
TinyLlama/TinyLlama-1.1B-Chat-v1.0	Q4	200.78 tok/sEstimated Auto-generated benchmark	1GB
google/gemma-3-1b-it	Q4	199.63 tok/sEstimated Auto-generated benchmark	1GB
unsloth/gemma-3-1b-it	Q4	199.35 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-3B-Instruct	Q4	196.99 tok/sEstimated Auto-generated benchmark	2GB
tencent/HunyuanOCR	Q4	195.04 tok/sEstimated Auto-generated benchmark	1GB
WeiboAI/VibeThinker-1.5B	Q4	193.63 tok/sEstimated Auto-generated benchmark	1GB
context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16	Q4	189.25 tok/sEstimated Auto-generated benchmark	2GB
ibm-research/PowerMoE-3b	Q4	188.19 tok/sEstimated Auto-generated benchmark	2GB
meta-llama/Llama-3.2-3B	Q4	187.25 tok/sEstimated Auto-generated benchmark	2GB
bigcode/starcoder2-3b	Q4	187.09 tok/sEstimated Auto-generated benchmark	2GB
google-bert/bert-base-uncased	Q4	186.38 tok/sEstimated Auto-generated benchmark	1GB
unsloth/Llama-3.2-1B-Instruct	Q4	185.56 tok/sEstimated Auto-generated benchmark	1GB
nari-labs/Dia2-2B	Q4	185.02 tok/sEstimated Auto-generated benchmark	2GB
LiquidAI/LFM2-1.2B	Q4	184.67 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.2-3B-Instruct	Q4	184.49 tok/sEstimated Auto-generated benchmark	2GB
ibm-granite/granite-3.3-2b-instruct	Q4	184.04 tok/sEstimated Auto-generated benchmark	1GB
deepseek-ai/deepseek-coder-1.3b-instruct	Q4	182.66 tok/sEstimated Auto-generated benchmark	2GB
microsoft/phi-2	Q4	181.91 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-8B-Base	Q4	181.70 tok/sEstimated Auto-generated benchmark	4GB
microsoft/DialoGPT-small	Q4	181.65 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-Embedding-4B	Q4	181.25 tok/sEstimated Auto-generated benchmark	2GB
microsoft/Phi-4-multimodal-instruct	Q4	181.09 tok/sEstimated Auto-generated benchmark	4GB
hmellor/tiny-random-LlamaForCausalLM	Q4	181.05 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-4B-Thinking-2507-FP8	Q4	180.92 tok/sEstimated Auto-generated benchmark	2GB
facebook/opt-125m	Q4	180.81 tok/sEstimated Auto-generated benchmark	4GB
lmsys/vicuna-7b-v1.5	Q4	180.74 tok/sEstimated Auto-generated benchmark	4GB
lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit	Q4	180.52 tok/sEstimated Auto-generated benchmark	2GB
zai-org/GLM-4.5-Air	Q4	180.38 tok/sEstimated Auto-generated benchmark	4GB
allenai/OLMo-2-0425-1B	Q4	179.97 tok/sEstimated Auto-generated benchmark	1GB
microsoft/Phi-3-mini-128k-instruct	Q4	179.92 tok/sEstimated Auto-generated benchmark	4GB
google/gemma-2b	Q4	179.85 tok/sEstimated Auto-generated benchmark	1GB
meta-llama/Llama-3.1-8B	Q4	179.79 tok/sEstimated Auto-generated benchmark	4GB
huggyllama/llama-7b	Q4	179.31 tok/sEstimated Auto-generated benchmark	4GB
skt/kogpt2-base-v2	Q4	178.45 tok/sEstimated Auto-generated benchmark	4GB
openai-community/gpt2-medium	Q4	177.98 tok/sEstimated Auto-generated benchmark	4GB
deepseek-ai/DeepSeek-R1-0528	Q4	177.79 tok/sEstimated Auto-generated benchmark	4GB
Qwen/Qwen3-0.6B	Q4	177.61 tok/sEstimated Auto-generated benchmark	3GB
microsoft/Phi-3.5-vision-instruct	Q4	177.59 tok/sEstimated Auto-generated benchmark	4GB

meta-llama/Llama-3.2-1B-Instruct

1GB

217.30 tok/sEstimated

Auto-generated benchmark

apple/OpenELM-1_1B-Instruct

1GB

214.24 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-Guard-3-1B

1GB

214.03 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-1B

1GB

212.74 tok/sEstimated

Auto-generated benchmark

google-t5/t5-3b

2GB

212.54 tok/sEstimated

Auto-generated benchmark

google/gemma-2-2b-it

1GB

212.05 tok/sEstimated

Auto-generated benchmark

google/embeddinggemma-300m

1GB

211.11 tok/sEstimated

Auto-generated benchmark

facebook/sam3

1GB

210.12 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-OCR

2GB

204.89 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B-Instruct

2GB

203.91 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen2.5-3B

2GB

203.65 tok/sEstimated

Auto-generated benchmark

inference-net/Schematron-3B

2GB

201.76 tok/sEstimated

Auto-generated benchmark

TinyLlama/TinyLlama-1.1B-Chat-v1.0

1GB

200.78 tok/sEstimated

Auto-generated benchmark

google/gemma-3-1b-it

1GB

199.63 tok/sEstimated

Auto-generated benchmark

unsloth/gemma-3-1b-it

1GB

199.35 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-3B-Instruct

2GB

196.99 tok/sEstimated

Auto-generated benchmark

tencent/HunyuanOCR

1GB

195.04 tok/sEstimated

Auto-generated benchmark

WeiboAI/VibeThinker-1.5B

1GB

193.63 tok/sEstimated

Auto-generated benchmark

context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

2GB

189.25 tok/sEstimated

Auto-generated benchmark

ibm-research/PowerMoE-3b

2GB

188.19 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B

2GB

187.25 tok/sEstimated

Auto-generated benchmark

bigcode/starcoder2-3b

2GB

187.09 tok/sEstimated

Auto-generated benchmark

google-bert/bert-base-uncased

1GB

186.38 tok/sEstimated

Auto-generated benchmark

unsloth/Llama-3.2-1B-Instruct

1GB

185.56 tok/sEstimated

Auto-generated benchmark

nari-labs/Dia2-2B

2GB

185.02 tok/sEstimated

Auto-generated benchmark

LiquidAI/LFM2-1.2B

1GB

184.67 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.2-3B-Instruct

2GB

184.49 tok/sEstimated

Auto-generated benchmark

ibm-granite/granite-3.3-2b-instruct

1GB

184.04 tok/sEstimated

Auto-generated benchmark

deepseek-ai/deepseek-coder-1.3b-instruct

2GB

182.66 tok/sEstimated

Auto-generated benchmark

microsoft/phi-2

4GB

181.91 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-8B-Base

4GB

181.70 tok/sEstimated

Auto-generated benchmark

microsoft/DialoGPT-small

4GB

181.65 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-Embedding-4B

2GB

181.25 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-4-multimodal-instruct

4GB

181.09 tok/sEstimated

Auto-generated benchmark

hmellor/tiny-random-LlamaForCausalLM

4GB

181.05 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-4B-Thinking-2507-FP8

2GB

180.92 tok/sEstimated

Auto-generated benchmark

facebook/opt-125m

4GB

180.81 tok/sEstimated

Auto-generated benchmark

lmsys/vicuna-7b-v1.5

4GB

180.74 tok/sEstimated

Auto-generated benchmark

lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit

2GB

180.52 tok/sEstimated

Auto-generated benchmark

zai-org/GLM-4.5-Air

4GB

180.38 tok/sEstimated

Auto-generated benchmark

allenai/OLMo-2-0425-1B

1GB

179.97 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3-mini-128k-instruct

4GB

179.92 tok/sEstimated

Auto-generated benchmark

google/gemma-2b

1GB

179.85 tok/sEstimated

Auto-generated benchmark

meta-llama/Llama-3.1-8B

4GB

179.79 tok/sEstimated

Auto-generated benchmark

huggyllama/llama-7b

4GB

179.31 tok/sEstimated

Auto-generated benchmark

skt/kogpt2-base-v2

4GB

178.45 tok/sEstimated

Auto-generated benchmark

openai-community/gpt2-medium

4GB

177.98 tok/sEstimated

Auto-generated benchmark

deepseek-ai/DeepSeek-R1-0528

4GB

177.79 tok/sEstimated

Auto-generated benchmark

Qwen/Qwen3-0.6B

3GB

177.61 tok/sEstimated

Auto-generated benchmark

microsoft/Phi-3.5-vision-instruct

4GB

177.59 tok/sEstimated

Auto-generated benchmark

Note: Performance estimates are calculated. Real results may vary. Methodology · Submit real data

Model compatibility

Model	Quantization	Verdict	Estimated speed	VRAM needed
zai-org/GLM-4.6-FP8	Q4	Fits comfortably	168.55 tok/sEstimated	4GB (have 48GB)
zai-org/GLM-4.6-FP8	FP16	Fits comfortably	65.21 tok/sEstimated	15GB (have 48GB)
microsoft/DialoGPT-medium	FP16	Fits comfortably	59.91 tok/sEstimated	15GB (have 48GB)
MiniMaxAI/MiniMax-M2	FP16	Fits comfortably	64.29 tok/sEstimated	15GB (have 48GB)
Qwen/Qwen2-0.5B	Q4	Fits comfortably	150.38 tok/sEstimated	3GB (have 48GB)
Qwen/Qwen2-0.5B	Q8	Fits comfortably	108.31 tok/sEstimated	5GB (have 48GB)
microsoft/phi-4	Q8	Fits comfortably	119.29 tok/sEstimated	7GB (have 48GB)
microsoft/phi-4	FP16	Fits comfortably	58.08 tok/sEstimated	15GB (have 48GB)
unsloth/Meta-Llama-3.1-8B-Instruct	FP16	Fits comfortably	66.39 tok/sEstimated	17GB (have 48GB)
Qwen/Qwen2.5-Math-1.5B	FP16	Fits comfortably	58.34 tok/sEstimated	11GB (have 48GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	FP16	Fits comfortably	67.26 tok/sEstimated	15GB (have 48GB)
EleutherAI/gpt-neo-125m	Q4	Fits comfortably	158.77 tok/sEstimated	4GB (have 48GB)
EleutherAI/gpt-neo-125m	Q8	Fits comfortably	106.22 tok/sEstimated	7GB (have 48GB)
Qwen/Qwen3-1.7B-Base	FP16	Fits comfortably	63.25 tok/sEstimated	15GB (have 48GB)
ibm-granite/granite-3.3-8b-instruct	Q4	Fits comfortably	172.20 tok/sEstimated	4GB (have 48GB)
ibm-granite/granite-3.3-8b-instruct	FP16	Fits comfortably	63.75 tok/sEstimated	17GB (have 48GB)
Qwen/QwQ-32B-Preview	Q4	Fits comfortably	53.56 tok/sEstimated	17GB (have 48GB)
Qwen/QwQ-32B-Preview	Q8	Fits comfortably	38.25 tok/sEstimated	34GB (have 48GB)
deepseek-ai/DeepSeek-Coder-V2-Instruct-0724	FP16	Not supported	9.50 tok/sEstimated	461GB (have 48GB)
facebook/sam3	Q8	Fits comfortably	142.84 tok/sEstimated	1GB (have 48GB)
mistralai/Ministral-3-14B-Instruct-2512	Q4	Fits comfortably	134.78 tok/sEstimated	8GB (have 48GB)
mistralai/Ministral-3-14B-Instruct-2512	Q8	Fits comfortably	80.65 tok/sEstimated	16GB (have 48GB)
mistralai/Ministral-3-14B-Instruct-2512	FP16	Fits comfortably	48.69 tok/sEstimated	32GB (have 48GB)
mistralai/Mistral-7B-Instruct-v0.2	Q4	Fits comfortably	163.82 tok/s (estimated)	4GB (have 48GB)
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic	Q4	Fits comfortably	59.67 tok/s (estimated)	34GB (have 48GB)
meta-llama/Llama-3.2-3B-Instruct	Q8	Fits comfortably	142.98 tok/s (estimated)	3GB (have 48GB)
meta-llama/Llama-3.2-3B-Instruct	FP16	Fits comfortably	77.93 tok/s (estimated)	6GB (have 48GB)
vikhyatk/moondream2	FP16	Fits comfortably	63.63 tok/s (estimated)	15GB (have 48GB)
petals-team/StableBeluga2	Q8	Fits comfortably	117.38 tok/s (estimated)	7GB (have 48GB)
microsoft/Phi-3-mini-4k-instruct	FP16	Fits comfortably	68.82 tok/s (estimated)	15GB (have 48GB)
openai-community/gpt2-large	Q4	Fits comfortably	166.24 tok/s (estimated)	4GB (have 48GB)
openai-community/gpt2-large	Q8	Fits comfortably	110.46 tok/s (estimated)	7GB (have 48GB)
Qwen/Qwen3-1.7B	Q4	Fits comfortably	153.43 tok/s (estimated)	4GB (have 48GB)
Qwen/Qwen3-1.7B	Q8	Fits comfortably	114.20 tok/s (estimated)	7GB (have 48GB)
MiniMaxAI/MiniMax-M2	Q8	Fits comfortably	114.24 tok/s (estimated)	7GB (have 48GB)
Qwen/Qwen2.5-32B	FP16	Not supported	23.91 tok/s (estimated)	66GB (have 48GB)
meta-llama/Llama-3.1-8B-Instruct	Q8	Fits comfortably	83.13 tok/s (estimated)	9GB (have 48GB)
black-forest-labs/FLUX.1-dev	Q8	Fits comfortably	116.30 tok/s (estimated)	8GB (have 48GB)
tencent/HunyuanVideo-1.5	Q4	Fits comfortably	168.43 tok/s (estimated)	4GB (have 48GB)
tencent/HunyuanVideo-1.5	Q8	Fits comfortably	118.91 tok/s (estimated)	8GB (have 48GB)
tencent/HunyuanVideo-1.5	FP16	Fits comfortably	65.47 tok/s (estimated)	16GB (have 48GB)
nari-labs/Dia2-2B	Q4	Fits comfortably	185.02 tok/s (estimated)	2GB (have 48GB)
nari-labs/Dia2-2B	Q8	Fits comfortably	152.56 tok/s (estimated)	3GB (have 48GB)
nari-labs/Dia2-2B	FP16	Fits comfortably	80.35 tok/s (estimated)	5GB (have 48GB)
Qwen/Qwen3-4B	FP16	Fits comfortably	64.72 tok/s (estimated)	9GB (have 48GB)
Qwen/Qwen3-30B-A3B-Instruct-2507	Q4	Fits comfortably	92.59 tok/s (estimated)	15GB (have 48GB)
Qwen/Qwen3-30B-A3B-Instruct-2507	Q8	Fits comfortably	57.63 tok/s (estimated)	31GB (have 48GB)
Qwen/Qwen3-30B-A3B-Instruct-2507	FP16	Not supported	37.79 tok/s (estimated)	61GB (have 48GB)
google-t5/t5-3b	FP16	Fits comfortably	71.28 tok/s (estimated)	6GB (have 48GB)
Qwen/Qwen2.5-0.5B	Q8	Fits comfortably	119.42 tok/s (estimated)	5GB (have 48GB)

zai-org/GLM-4.6-FP8	Q4	Fits comfortably	168.55 tok/s (estimated)	4GB (have 48GB)
zai-org/GLM-4.6-FP8	FP16	Fits comfortably	65.21 tok/s (estimated)	15GB (have 48GB)
microsoft/DialoGPT-medium	FP16	Fits comfortably	59.91 tok/s (estimated)	15GB (have 48GB)
MiniMaxAI/MiniMax-M2	FP16	Fits comfortably	64.29 tok/s (estimated)	15GB (have 48GB)
Qwen/Qwen2-0.5B	Q4	Fits comfortably	150.38 tok/s (estimated)	3GB (have 48GB)
Qwen/Qwen2-0.5B	Q8	Fits comfortably	108.31 tok/s (estimated)	5GB (have 48GB)
microsoft/phi-4	Q8	Fits comfortably	119.29 tok/s (estimated)	7GB (have 48GB)
microsoft/phi-4	FP16	Fits comfortably	58.08 tok/s (estimated)	15GB (have 48GB)
unsloth/Meta-Llama-3.1-8B-Instruct	FP16	Fits comfortably	66.39 tok/s (estimated)	17GB (have 48GB)
Qwen/Qwen2.5-Math-1.5B	FP16	Fits comfortably	58.34 tok/s (estimated)	11GB (have 48GB)
trl-internal-testing/tiny-random-LlamaForCausalLM	FP16	Fits comfortably	67.26 tok/s (estimated)	15GB (have 48GB)
EleutherAI/gpt-neo-125m	Q4	Fits comfortably	158.77 tok/s (estimated)	4GB (have 48GB)
EleutherAI/gpt-neo-125m	Q8	Fits comfortably	106.22 tok/s (estimated)	7GB (have 48GB)
Qwen/Qwen3-1.7B-Base	FP16	Fits comfortably	63.25 tok/s (estimated)	15GB (have 48GB)
ibm-granite/granite-3.3-8b-instruct	Q4	Fits comfortably	172.20 tok/s (estimated)	4GB (have 48GB)
ibm-granite/granite-3.3-8b-instruct	FP16	Fits comfortably	63.75 tok/s (estimated)	17GB (have 48GB)
Qwen/QwQ-32B-Preview	Q4	Fits comfortably	53.56 tok/s (estimated)	17GB (have 48GB)
Qwen/QwQ-32B-Preview	Q8	Fits comfortably	38.25 tok/s (estimated)	34GB (have 48GB)
deepseek-ai/DeepSeek-Coder-V2-Instruct-0724	FP16	Not supported	9.50 tok/s (estimated)	461GB (have 48GB)
facebook/sam3	Q8	Fits comfortably	142.84 tok/s (estimated)	1GB (have 48GB)
mistralai/Ministral-3-14B-Instruct-2512	Q4	Fits comfortably	134.78 tok/s (estimated)	8GB (have 48GB)
mistralai/Ministral-3-14B-Instruct-2512	Q8	Fits comfortably	80.65 tok/s (estimated)	16GB (have 48GB)
mistralai/Ministral-3-14B-Instruct-2512	FP16	Fits comfortably	48.69 tok/s (estimated)	32GB (have 48GB)

Note: These performance figures are calculated estimates, not measured results. Real-world throughput may vary.
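The page does not publish how its estimates are calculated, so the following is only a rough sketch of how the "VRAM required" and fit columns could be sanity-checked. It assumes a weights-only rule of thumb (about 0.5, 1, and 2 bytes per parameter for Q4, Q8, and FP16) and a simple "required ≤ available" test. The function names and the bytes-per-parameter table are hypothetical, not taken from the site.

```python
# Assumed bytes per parameter for each quantization level (rule of thumb,
# not the site's published methodology).
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}


def estimate_weights_gb(params_billions: float, quant: str) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead, so real usage
    will be higher.
    """
    return params_billions * BYTES_PER_PARAM[quant]


def fit_status(required_gb: float, available_gb: float = 48.0) -> str:
    """Mirror the two statuses that appear in this section of the table."""
    if required_gb <= available_gb:
        return "Fits comfortably"
    return "Not supported"


if __name__ == "__main__":
    # A 30B-parameter model at each quantization level on a 48GB card.
    for quant in ("Q4", "Q8", "FP16"):
        need = estimate_weights_gb(30, quant)
        print(f"{quant}: ~{need:.0f}GB weights, {fit_status(need)}")
```

For a 30B-parameter model this gives 15GB, 30GB, and 60GB, close to the 15GB, 31GB, and 61GB listed for Qwen/Qwen3-30B-A3B-Instruct-2507. Other rows differ more, so treat the rule of thumb as a lower bound rather than a reproduction of the table.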

Alternative GPUs

NVIDIA RTX 6000 Ada	48GB
NVIDIA A6000	48GB
RTX 4090	24GB
RTX 4080	16GB
NVIDIA A5000	24GB