Quick Answer: ThePrimeagen runs an RTX 4090 (24GB VRAM) configuration for software engineering and performance work. The setup runs google/gemma-2-9b-it at 48 tokens/sec and is best suited to 7B-13B models.
Specs & Performance
| Component | Product | Notes | Price |
|---|---|---|---|
| GPU | RTX 4090 | For streaming and AI experiments | $1,599 |
| CPU | AMD Ryzen 9 7950X | 16-core for streaming + coding | $549 |
| Motherboard | Gigabyte X670E AORUS Master | High-end X670E with great VRMs | $480 |
| RAM | 64GB DDR5-6000 | Fast for multitasking | $250 |
| Storage | Samsung 990 Pro 2TB | Fast NVMe | $180 |
| PSU | Corsair HX1000i | 1000W 80+ Platinum | $230 |
| Case | Fractal Design Define 7 | Quiet case for streaming audio | $189 |
| Cooling | Noctua NH-D15 | Quiet air cooling for streaming | $100 |

| Model | Quantization | Tokens/sec | VRAM Used |
|---|---|---|---|
| google/gemma-2-9b-it | FP16 | 48 tok/s | 20GB |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | FP16 | 52 tok/s | 19GB |
| EssentialAI/rnj-1 | FP16 | 55 tok/s | 19GB |
| NousResearch/Hermes-3-Llama-3.1-8B | FP16 | 54 tok/s | 17GB |
| meta-llama/Llama-3.1-8B-Instruct | FP16 | 50 tok/s | 17GB |
| NousResearch/Meta-Llama-3.1-8B-Instruct | FP16 | 75 tok/s | 17GB |
| meta-llama/Llama-Guard-3-8B | FP16 | 74 tok/s | 17GB |
| lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | 4-bit (MLX) | 72 tok/s | 17GB |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | FP16 | 71 tok/s | 17GB |
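For anyone who wants to reproduce numbers like these, a minimal throughput measurement could look like the sketch below. The methodology is an assumption (the page doesn't say how the figures were measured); it uses Hugging Face transformers with one of the model IDs from the table:

```python
# Sketch of a tokens/sec measurement with Hugging Face transformers
# (assumed methodology; the benchmark procedure isn't documented here).
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # any FP16 row from the table
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)

prompt = "Explain cache locality in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tok/s, "
      f"{torch.cuda.max_memory_allocated() / 1e9:.1f} GB peak VRAM")
```

Greedy decoding with a fixed max_new_tokens keeps runs comparable across models; sampling settings and prompt length both move the tok/s figure noticeably.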
**Can this setup run Llama 70B?**

No. This setup has 24GB of VRAM, but Llama 70B needs 40GB at minimum. It can run Llama 13B and smaller models; a rough VRAM estimate is sketched below.
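The usual back-of-envelope estimate is parameter count × bytes per parameter, plus headroom for the KV cache and activations. A minimal sketch; the ~20% overhead factor is an assumption, not a figure from this article:

```python
# Rough VRAM estimate: weights = params * bytes_per_param, plus ~20%
# headroom for KV cache and activations (assumed rule of thumb).
def vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    return params_b * (bits / 8) * overhead

for params_b, bits in [(8, 16), (13, 16), (70, 4), (70, 16)]:
    print(f"{params_b}B @ {bits}-bit: ~{vram_gb(params_b, bits):.0f} GB")
```

At FP16 an 8B model lands around 19GB, which lines up with the 17-20GB column above, and a 70B model needs roughly 42GB even at 4-bit, which is why it won't fit in 24GB.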
**How much does this setup cost?**

$3,577 in total. A budget alternative is an RTX 4080-based build (~$2,400) for smaller models.
**Can it run Llama 405B or similar 400B+ models?**

No. Llama 405B and similar 400B+ models need 200GB+ of VRAM (on the order of 8x A100 or H100 GPUs). This 24GB setup tops out around 13B models at FP16.
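All the benchmark rows above are FP16; quantization is the standard way to stretch 24GB further. A minimal sketch of loading one of the listed models in 4-bit with bitsandbytes, an assumed approach rather than anything this setup is documented to use:

```python
# Sketch: load an 8B model in 4-bit via bitsandbytes, cutting weight VRAM
# roughly 4x vs FP16 (assumed approach; this page only benchmarks FP16).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # model ID taken from the table above
    quantization_config=bnb,
    device_map="cuda",
)
print(f"weights: {model.get_memory_footprint() / 1e9:.1f} GB")
```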