Complete guide to running Llama 3 on your own hardware
Running Llama locally gives you complete privacy, no API costs, and the ability to customize for your needs. This guide walks through setting up Llama 3 on Windows, Mac, or Linux using Jan - a beautiful desktop app that makes local AI simple.
Jan is a free, open-source desktop app that runs AI models locally. It offers a clean, ChatGPT-like interface and a built-in Model Hub.
# Download from:
https://jan.ai/download
# Available for: Windows, macOS, Linux
# Just download, install, and launch!
💡 Jan automatically detects your GPU (NVIDIA, AMD, or Apple Silicon) and configures optimal settings.
Open Jan and click on the Model Hub tab. Search for 'Llama 3' to see all available versions.
In Jan Model Hub, search for:
⢠"Llama 3.1 8B" - Best for 8-12GB VRAM
⢠"Llama 3.1 70B" - Best for 24GB+ VRAM
⢠"Llama 3.2 3B" - Best for 6-8GB VRAM (smaller, faster)š” The Model Hub shows VRAM requirements for each model so you can pick the right size.
Click the download button next to the model you want. Jan will download it to your local machine.
Download sizes:
• Llama 3.2 3B Q4: ~2GB
• Llama 3.1 8B Q4: ~5GB
• Llama 3.1 70B Q4: ~40GB
💡 The first download can take anywhere from a few minutes to over an hour depending on model size and your connection speed. After that, the model is stored locally and never needs re-downloading.
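If you want to confirm what is actually on disk, you can list the model files in Jan's data folder. A small sketch, assuming GGUF files under ~/jan/models (that path is a guess at Jan's default; check Settings > Advanced in the app for your real data folder):

# List downloaded model files and their sizes.
# NOTE: ~/jan/models is an assumed default location; verify the actual
# data folder in Jan's settings before relying on this path.
from pathlib import Path

models_dir = Path.home() / "jan" / "models"
if models_dir.exists():
    for f in sorted(models_dir.rglob("*.gguf")):
        print(f"{f.name}: {f.stat().st_size / 1e9:.1f} GB")
else:
    print(f"No model folder at {models_dir}; check Jan's settings.")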
Once downloaded, click on the model to start a new chat. Everything runs 100% locally - no internet needed after download.
You: Hello! What can you help me with?
Llama: I'm an AI assistant running entirely on your
local hardware. I can help with writing, coding,
analysis, and much more...
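The chat window is not the only way in. Jan can also act as a local OpenAI-compatible API server (enable it in the app's settings), which lets your own scripts talk to the model. A minimal sketch using only the Python standard library; the port 1337 and the model id "llama3.1-8b" are assumptions, so substitute whatever your Jan install actually reports:

# Send a chat request to Jan's local OpenAI-compatible server.
# ASSUMPTIONS: the server is enabled in Jan, listening on port 1337,
# and "llama3.1-8b" matches a model id shown in the app.
import json
import urllib.request

payload = {
    "model": "llama3.1-8b",
    "messages": [{"role": "user",
                  "content": "Hello! What can you help me with?"}],
}
req = urllib.request.Request(
    "http://localhost:1337/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])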
Troubleshooting

❌ Model runs slowly or uses CPU
✅ Go to Settings > Advanced and check that GPU acceleration is enabled. Make sure you have the latest GPU drivers installed.
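To confirm the GPU is actually doing the work (NVIDIA only), you can watch utilization while a chat is generating. A quick sketch that polls the nvidia-smi command-line tool; near-zero utilization during generation suggests Jan has fallen back to the CPU:

# Poll NVIDIA GPU utilization and memory use every two seconds.
# Requires the nvidia-smi CLI on your PATH (NVIDIA GPUs only).
import subprocess
import time

for _ in range(5):
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())
    time.sleep(2)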
❌ Not enough VRAM error
✅ Try a smaller model (Llama 3.2 3B) or one with more aggressive quantization (Q4 instead of Q8). Q4 stores each weight in half the bits of Q8, so an 8B model drops from roughly 8GB to roughly 4GB of weights. The Model Hub shows VRAM requirements.
❌ Jan won't launch
✅ On Windows, try running as Administrator. On Mac, check Security & Privacy to allow the app. Make sure you have 4GB+ of free RAM.
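To check the RAM requirement directly, here is a tiny sketch using the third-party psutil package (install it with pip install psutil):

# Report available system RAM before launching Jan.
# Requires psutil: pip install psutil
import psutil

available_gb = psutil.virtual_memory().available / 1e9
print(f"Available RAM: {available_gb:.1f} GB")
if available_gb < 4:
    print("Less than 4 GB free: close some apps before launching Jan.")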