Build the perfect PC for running language models
GPU VRAM is the primary bottleneck for running LLMs. Understanding VRAM requirements helps you choose the right hardware.
Rough formula: VRAM (GB) ≈ Parameters (B) × 0.5 for Q4 quantization, × 1.0 for Q8, × 2.0 for FP16. Example: Llama 3.1 70B needs ~35GB for Q4, ~70GB for Q8.
7-8B models: 4-6GB (Q4). 13B models: 8-10GB (Q4). 32-34B models: 16-20GB (Q4). 70B models: 35-40GB (Q4). 405B models: 200GB+ (requires multiple GPUs).
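The rule of thumb above can be sketched as a small calculator. The bytes-per-parameter values are the approximations from the formula, not measured figures; real usage varies with architecture and runtime overhead.

```python
# Approximate bytes per parameter for common quantization levels
# (rule-of-thumb values; actual footprint varies by runtime).
BYTES_PER_PARAM = {"q4": 0.5, "q8": 1.0, "fp16": 2.0}

def estimate_vram_gb(params_billion: float, quant: str = "q4") -> float:
    """Approximate VRAM needed just to hold the weights, in GB."""
    return params_billion * BYTES_PER_PARAM[quant]

print(estimate_vram_gb(70, "q4"))    # ~35 GB for Llama 3.1 70B at Q4
print(estimate_vram_gb(70, "fp16"))  # ~140 GB at FP16
```

Add a few GB of headroom on top of this estimate for the KV cache and activations.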
Longer context requires more VRAM for the KV cache. As a rule of thumb, 8K context adds ~1GB per 10B parameters, and 32K context adds ~4GB per 10B parameters. 128K context can double total VRAM needs.
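Assuming the context overhead scales roughly linearly, as the ~1GB-per-10B-parameters-per-8K-tokens rule suggests, the KV-cache cost can be sketched like this (real KV-cache size also depends on layer count, head dimensions, and cache quantization):

```python
def context_vram_gb(params_billion: float, context_tokens: int) -> float:
    """Rough KV-cache overhead: ~1 GB per 10B parameters per 8K tokens
    (assumed linear scaling; a rule of thumb, not an exact formula)."""
    return (params_billion / 10) * (context_tokens / 8192)

print(round(context_vram_gb(70, 8192), 1))   # ~7 GB at 8K for a 70B model
print(round(context_vram_gb(70, 32768), 1))  # ~28 GB at 32K
```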
NVIDIA dominates for LLMs due to CUDA ecosystem. AMD is catching up with ROCm.
Intel Arc B580 12GB ($249) - Best value 12GB. RTX 3060 12GB ($270-350) - Proven, great CUDA support. RX 7600 8GB ($250) - Gaming focus, limited AI.
RTX 4060 Ti 16GB ($450) - Cheapest 16GB NVIDIA. RTX 4070 Super 12GB ($600) - Fast but limited by 12GB. RTX 4070 Ti Super 16GB ($800) - Sweet spot for 32B models.
RX 7900 XTX 24GB ($900) - Best value 24GB. RTX 4080 Super 16GB ($1000) - Fast but only 16GB. RTX 4090 24GB ($1600) - Best consumer GPU for LLMs.
RTX 6000 Ada 48GB ($6000) - Double the 4090's VRAM. A100 80GB ($15000+) - Enterprise standard. H100 80GB ($30000+) - Fastest for training.
System RAM matters for loading models and handling context.
16GB for 7B models. 32GB for 13-32B models. 64GB for 70B+ models. Model weights are staged in system RAM before being transferred to the GPU.
When VRAM is insufficient, some layers can be offloaded to system RAM. This is 10-100x slower than keeping the whole model on the GPU, but it makes larger models runnable: with 128GB+ of RAM, a 70B model can run on a 12GB GPU, albeit very slowly.
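Offloading runtimes such as llama.cpp split the model layer by layer between GPU and RAM. A minimal sketch of that split, assuming uniform layer sizes and a guessed reserve for the KV cache and other overhead:

```python
def split_layers(total_layers: int, model_gb: float, vram_gb: float,
                 reserve_gb: float = 1.5) -> tuple[int, int]:
    """Estimate how many transformer layers fit on the GPU, with the
    remainder offloaded to system RAM. reserve_gb is an assumed budget
    for non-layer overhead (KV cache, activations, etc.)."""
    per_layer = model_gb / total_layers
    gpu_layers = max(0, min(total_layers, int((vram_gb - reserve_gb) / per_layer)))
    return gpu_layers, total_layers - gpu_layers

# A ~35 GB Q4 70B model (80 layers) on a 12 GB GPU:
print(split_layers(80, 35.0, 12.0))  # (24, 56): most layers spill to RAM
```

With 56 of 80 layers running from RAM, every token pays the slow path, which is why this configuration works but crawls.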
DDR5 is 20-30% faster than DDR4 for CPU inference. For GPU inference, RAM speed matters less. Prioritize GPU VRAM budget over RAM speed.
LLM model files are large. Fast storage improves loading times.
7B model: 4-8GB. 13B model: 8-15GB. 70B model: 35-70GB. Complete model collection: 500GB-2TB. NVMe SSD strongly recommended.
An NVMe SSD loads a 70B model in ~30 seconds; a SATA SSD takes ~60 seconds; an HDD takes 3-5 minutes. Loading is a one-time cost per session, but frequent model switching benefits from fast storage.
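The load-time figures above follow from dividing file size by sustained read throughput. A quick sketch, using assumed effective throughput numbers (not benchmarks) that roughly reproduce the timings in the text:

```python
def load_time_seconds(model_gb: float, throughput_gbps: float) -> float:
    """Rough sequential-read load time; ignores filesystem overhead
    and any deserialization cost."""
    return model_gb / throughput_gbps

# Assumed effective read speeds in GB/s (illustrative, not measured):
for name, gbps in [("NVMe", 1.5), ("SATA SSD", 0.55), ("HDD", 0.15)]:
    print(f"{name}: ~{load_time_seconds(40, gbps):.0f} s for a 40 GB model")
```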
Recommended builds at different price points.
RTX 3060 12GB ($300) + Ryzen 5 5600 ($120) + 32GB DDR4 ($70) + 1TB NVMe ($80) + B550 motherboard ($100) + 650W PSU ($70) + Case ($60). Runs 7B-13B models well.
RTX 4070 Ti Super 16GB ($800) + Ryzen 7 7700X ($300) + 32GB DDR5 ($100) + 2TB NVMe ($150) + B650 motherboard ($150). Runs 32B models, fast inference.
RTX 4090 24GB ($1600) + Ryzen 7 7800X3D ($400) + 64GB DDR5 ($200) + 2TB NVMe ($150) + X670 motherboard ($200). Runs 70B models, no compromises.
Check our step-by-step setup guides and GPU recommendations.