Quick Answer: Linus Tech Tips runs a 4x RTX 4090 (96GB VRAM total) configuration for tech reviews and AI hardware testing. This setup is estimated to run mistralai/Mixtral-8x22B-Instruct-v0.1 (Q4) at 105.2 tokens/sec and can run 70B models locally.

Linus Tech Tips AI Computer Setup

16M followers

Specs & Performance

96GB VRAM • 105.2 tok/s on Mixtral-8x22B (Q4) • Verified 12/25/2025

$25,295

Total Cost

Complete Hardware Specs

All components with current pricing and affiliate purchase links

Component	Product	Price	Purchase
GPU	RTX 4090 ×4 (multi-GPU for testing and production rendering)	$6,396 ($1,599 each)	View on Amazon
CPU	AMD Threadripper PRO 7995WX (96-core beast for Labs testing)	$9,999	View on Amazon
Motherboard	Asus Pro WS WRX90E-SAGE SE (workstation board with 7 PCIe slots)

Linus Tech Tips's AI Computer Performance

LLM inference speed for models with 20+ tok/s on this 96GB VRAM setup

Model	Quantization	Tokens/sec	VRAM Used
mistralai/Mixtral-8x22B-Instruct-v0.1	Q4	105.2 tok/s	69GB
mistralai/Mistral-Large-Instruct-2411	Q4	84.6 tok/s	60GB
openai/gpt-oss-120b	Q4	70.6 tok/s	59GB
RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic	Q8	50 tok/s	88GB
Qwen/Qwen3-Next-80B-A3B-Instruct	Q8	58.8 tok/s	78GB
Qwen/Qwen3-Next-80B-A3B-Thinking-FP8	Q8	57.6 tok/s	78GB
Qwen/Qwen3-Next-80B-A3B-Instruct-FP8	Q8	52.1 tok/s	78GB
Qwen/Qwen3-Next-80B-A3B-Thinking	Q8	49.8 tok/s	78GB
Qwen/Qwen3-Coder-Next	Q8	57.4 tok/s	90GB
Qwen/Qwen2.5-72B-Instruct	Q8	56.1 tok/s	71GB

Multi-GPU Performance: These figures are estimates that assume tensor parallelism across all 4 GPUs. A single GPU would reach roughly 54% of these speeds, provided the model fits in that card's 24GB of VRAM.
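As a rough illustration of what tensor parallelism means here, the toy sketch below splits a weight matrix column-wise across four simulated devices, lets each compute its slice of the output, and concatenates the results. It is pure Python with no GPU involved; the function names (`shard_columns`, `tensor_parallel_matmul`) are hypothetical and do not come from any inference framework or from the page's benchmark method.

```python
# Toy column-parallel matrix multiply. Each simulated "GPU" holds a slice of
# the weight columns and produces the matching slice of the output.
NUM_GPUS = 4  # matches the 4x RTX 4090 configuration

def matmul(x, w):
    """Multiply a row vector x (length k) by a matrix w (k rows, n columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def shard_columns(w, num_shards):
    """Split w column-wise into num_shards equal-width slices."""
    cols = len(w[0])
    assert cols % num_shards == 0, "columns must divide evenly across shards"
    width = cols // num_shards
    return [[row[s * width:(s + 1) * width] for row in w] for s in range(num_shards)]

def tensor_parallel_matmul(x, w, num_shards=NUM_GPUS):
    """Compute x @ w one shard at a time and concatenate the partial outputs."""
    out = []
    for shard in shard_columns(w, num_shards):
        # On real hardware these per-shard products would run concurrently.
        out.extend(matmul(x, shard))
    return out

x = [1.0, 2.0]
w = [[1, 2, 3, 4, 5, 6, 7, 8],
     [8, 7, 6, 5, 4, 3, 2, 1]]

# The sharded result matches the unsharded one.
assert tensor_parallel_matmul(x, w) == matmul(x, w)
```

Each device only needs to store its own slice of the weights, which is why a model too large for one 24GB card can still be spread across four.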

VRAM Requirements & Usage

What this 96GB configuration can and cannot run

✅ What this setup can do

✓ Run Llama 70B Q4 (needs 40GB)
✓ Fine-tune 13B models with QLoRA
✓ Serve multiple 7B models in parallel
✓ Train custom models up to 6B parameters
✓ Fast Stable Diffusion XL generation

❌ What it can't do

✗ Run Llama 405B (needs 200GB+ VRAM)
✗ Full fine-tuning of 70B models (needs A100/H100)
✗ Multi-GPU training on 100B+ models

Total VRAM: 96GB

Usable for single model: 94GB (after OS overhead)
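The fit/no-fit calls in this section can be approximated with back-of-the-envelope arithmetic: weight size is roughly parameter count times bytes per parameter, plus some headroom for KV cache and activations. The sketch below is a minimal estimate, not the method used to produce the figures on this page; the 1.15 overhead factor and the helper names are assumptions chosen for illustration.

```python
USABLE_VRAM_GB = 94  # usable VRAM for a single model, from the spec above

# Approximate weight storage per parameter, in bytes.
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

def weight_gb(params_billion, quant):
    """Approximate weight size in GB: billions of parameters x bytes each."""
    return params_billion * BYTES_PER_PARAM[quant]

def fits(params_billion, quant, overhead=1.15, usable_gb=USABLE_VRAM_GB):
    """True if weights plus an assumed overhead factor fit in usable VRAM.

    overhead=1.15 is an illustrative guess for KV cache and activations.
    """
    return weight_gb(params_billion, quant) * overhead <= usable_gb

print(weight_gb(70, "Q4"))   # 35.0 GB of weights, ~40GB with overhead
print(fits(70, "Q4"))        # True
print(weight_gb(405, "Q4"))  # 202.5 GB of weights alone
print(fits(405, "Q4"))       # False
```

With these assumptions, a 70B model at Q4 lands near the 40GB quoted above, and a 405B model at Q4 needs just over 200GB for weights alone, which is why it is out of reach for 96GB.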

Verified Sources

All specs compiled from public sources

Video: LTT Labs GPU testing for AI workloads (August 20, 2024)
Video: LTT builds $100,000 AI workstation (June 15, 2024)
Article: LTT Labs - Scientific testing methodology

Frequently Asked Questions

Can this setup run Llama 3 70B?

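Yes. 4x RTX 4090 provides 96GB VRAM, enough for Llama 70B at Q4 quantization (needs ~40GB). For a sense of speed, the closest model in the performance table, Qwen/Qwen2.5-72B-Instruct at Q8, is estimated at 56.1 tok/s with tensor parallelism; the 105.2 tok/s headline figure is for Mixtral-8x22B at Q4.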

How much does it cost to build a similar rig?

$25,295 total. Budget alternatives for smaller models: a single RTX 4090 build (~$4,200) or an RTX 4080 build (~$2,400).

What models can't this run?

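Llama 405B and similar 400B+ models need 200GB+ VRAM (typically 8x A100 or H100 GPUs). This 96GB setup comfortably handles 70B-class models, and the performance table lists quantized models as large as openai/gpt-oss-120b and Mixtral-8x22B.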

Is multi-GPU worth it vs single GPU?

Yes for 70B-class and larger models. Applying the ~54% single-GPU estimate to the 105.2 tok/s headline figure gives ~57 tok/s on one GPU. For 7B-13B models, a single GPU is sufficient.
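The ~57 tok/s single-GPU figure follows from the ~54% scaling estimate in the multi-GPU note. A minimal sketch of that arithmetic, using only numbers quoted on this page (the function names are hypothetical):

```python
SINGLE_GPU_FRACTION = 0.54  # single-GPU share of multi-GPU speed, per the note above

def single_gpu_estimate(multi_gpu_tok_s, fraction=SINGLE_GPU_FRACTION):
    """Estimated single-GPU throughput from a multi-GPU figure."""
    return multi_gpu_tok_s * fraction

def speedup(multi_gpu_tok_s, single_gpu_tok_s):
    """How many times faster the multi-GPU figure is."""
    return multi_gpu_tok_s / single_gpu_tok_s

est = single_gpu_estimate(105.2)
print(f"{est:.1f} tok/s")              # 56.8 tok/s
print(f"{speedup(105.2, est):.2f}x")   # 1.85x
```

By this estimate, four cards deliver about 1.85x the throughput of one rather than 4x, so the main benefit of the extra GPUs is the VRAM needed to hold larger models.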

Similar AI Setups

Other workstations and builds you might find interesting

PewDiePie

2x RTX 4000 Ada 20GB • $39,500 total

MKBHD

2x RTX 4090 • $11,197 total

Lex Fridman

2x RTX 4090 • $5,956 total
