Quick Answer: Linus Tech Tips runs a 4x RTX 4090 configuration (96GB VRAM total) for tech reviews and AI hardware testing. This setup runs mistralai/Mixtral-8x22B-Instruct-v0.1 at 105.2 tokens/sec (Q4) and can run 70B models locally.
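The VRAM arithmetic behind that claim can be sketched roughly as follows. This is a back-of-the-envelope estimate, not a measurement: the function names are hypothetical, and the 20% overhead for KV cache and activations is an assumed figure that varies with context length and runtime.

```python
# Rough VRAM estimate: weights take (parameters * bits per weight / 8) bytes.
# The overhead fraction is an assumption, not a figure from the source.

GPU_COUNT = 4
VRAM_PER_GPU_GB = 24  # RTX 4090
TOTAL_VRAM_GB = GPU_COUNT * VRAM_PER_GPU_GB


def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         total_vram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """True if weights plus the assumed overhead fit in the available VRAM."""
    needed = estimate_weights_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= total_vram_gb


if __name__ == "__main__":
    for name, params in [("70B", 70), ("405B", 405)]:
        weights = estimate_weights_gb(params, 4)
        verdict = "fits" if fits(params, 4, TOTAL_VRAM_GB) else "does not fit"
        print(f"{name} at 4-bit: ~{weights:.1f} GB weights, {verdict} in {TOTAL_VRAM_GB} GB")
```

Under these assumptions a 70B model at 4-bit needs about 35GB for weights, which is in line with the ~40GB figure once overhead is included, while a 405B model needs over 200GB and exceeds the 96GB available.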
Specs & Performance
| Component | Product | Price | Purchase |
|---|---|---|---|
| GPU | RTX 4090×4: multi-GPU for testing and production rendering | $6,396 ($1,599 each) | View on Amazon |
| CPU | AMD Threadripper PRO 7995WX: 96-core beast for Labs testing | $9,999 | View on Amazon |
| MOTHERBOARD | Asus Pro WS WRX90E-SAGE SE: workstation board with 7 PCIe slots | $1,200 | View on Amazon |
| RAM | 512GB DDR5 ECC: maximum capacity for testing scenarios | $2,000 | View on Amazon |
| STORAGE | 8TB NVMe RAID Array: massive fast storage for footage and datasets | $1,200 | View on Amazon |
| PSU | Corsair AX1600i 1600W×2: dual PSU for 4x 4090 | $1,000 ($500 each) | View on Amazon |
| CASE | Custom Open Frame: Labs-built for maximum cooling | $500 | View on Amazon |
| COOLING | Custom Loop Liquid Cooling: full custom loop for 4 GPUs and CPU | $3,000 | View on Amazon |

| Model | Quantization | Tokens/sec | VRAM Used |
|---|---|---|---|
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Q4 | 105.2 tok/s | 69GB |
| mistralai/Mistral-Large-Instruct-2411 | Q4 | 84.6 tok/s | 60GB |
| openai/gpt-oss-120b | Q4 | 70.6 tok/s | 59GB |
| RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Q8 | 50 tok/s | 88GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Q8 | 58.8 tok/s | 78GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | Q8 | 57.6 tok/s | 78GB |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | Q8 | 52.1 tok/s | 78GB |
| Qwen/Qwen3-Next-80B-A3B-Thinking | Q8 | 49.8 tok/s | 78GB |
| Qwen/Qwen3-Coder-Next | Q8 | 57.4 tok/s | 90GB |
| Qwen/Qwen2.5-72B-Instruct | Q8 | 56.1 tok/s | 71GB |

Llama 70B fits on this rig: 4x RTX 4090 provides 96GB of VRAM, and Llama 70B at Q4 quantization needs about 40GB. Expect up to 105.2 tok/s with tensor parallelism, the peak figure in the benchmark table (Mixtral-8x22B at Q4).

The build totals $25,295. Budget alternatives for smaller models: a single RTX 4090 build (~$4,200) or an RTX 4080 build (~$2,400).

Llama 405B and similar 400B+ models need 200GB+ of VRAM, which requires 8x A100 or H100 GPUs. This 96GB setup falls well short of that and is sized for 70B-class models.

Multi-GPU is worth it for 70B models: a single GPU runs 70B at ~57 tok/s versus 105.2 tok/s on the multi-GPU setup. For 7B-13B models, a single GPU is sufficient.
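The $25,295 total quoted in this section can be checked by summing the listed component prices. A minimal sketch, with the prices copied from the hardware table; the dictionary keys and variable names are chosen here for illustration only.

```python
# Component prices in USD, copied from the hardware table in this section.
PRICES_USD = {
    "gpu": 4 * 1599,      # RTX 4090 x4
    "cpu": 9999,          # Threadripper PRO 7995WX
    "motherboard": 1200,  # Asus Pro WS WRX90E-SAGE SE
    "ram": 2000,          # 512GB DDR5 ECC
    "storage": 1200,      # 8TB NVMe RAID array
    "psu": 2 * 500,       # Corsair AX1600i x2
    "case": 500,          # custom open frame
    "cooling": 3000,      # custom liquid loop
}

total = sum(PRICES_USD.values())
gpu_share = PRICES_USD["gpu"] / total

print(f"Total: ${total:,}")                        # Total: $25,295
print(f"GPU share of build cost: {gpu_share:.1%}")  # GPU share of build cost: 25.3%
```

On these figures the four GPUs account for roughly a quarter of the build cost; the CPU is the single most expensive line item.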