Complete guide to running Llama 3 on your own hardware
Running Llama locally gives you complete privacy, no API costs, and the ability to customize for your needs. This guide walks through setting up Llama 3 on Windows, Mac, or Linux using Jan - a beautiful desktop app that makes local AI simple.
Jan is a free, open-source desktop app that runs AI models locally. It offers a clean, ChatGPT-like interface and a built-in Model Hub.
# Download from:
https://jan.ai/download
# Available for: Windows, macOS, Linux
# Just download, install, and launch!
💡 Jan automatically detects your GPU (NVIDIA, AMD, or Apple Silicon) and configures optimal settings.
Open Jan and click on the Model Hub tab. Search for 'Llama 3' to see all available versions.
In Jan Model Hub, search for:
⢠"Llama 3.1 8B" - Best for 8-12GB VRAM
⢠"Llama 3.1 70B" - Best for 24GB+ VRAM
⢠"Llama 3.2 3B" - Best for 6-8GB VRAM (smaller, faster)š” The Model Hub shows VRAM requirements for each model so you can pick the right size.
Click the download button next to the model you want. Jan will download it to your local machine.
Download sizes:
• Llama 3.2 3B Q4: ~2GB
• Llama 3.1 8B Q4: ~5GB
• Llama 3.1 70B Q4: ~40GB
💡 The first download can take anywhere from a few minutes to over an hour depending on model size and your connection speed. After that, the model is stored locally and never needs re-downloading.
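If you want to confirm what is actually on disk, you can list the model files in Jan's data folder. A small sketch, assuming GGUF files under ~/jan/models (that path is a guess at Jan's default; check Settings > Advanced in the app for your real data folder):

# List downloaded model files and their sizes.
# NOTE: ~/jan/models is an assumed default location; verify the actual
# data folder in Jan's settings before relying on this path.
from pathlib import Path

models_dir = Path.home() / "jan" / "models"
if models_dir.exists():
    for f in sorted(models_dir.rglob("*.gguf")):
        print(f"{f.name}: {f.stat().st_size / 1e9:.1f} GB")
else:
    print(f"No model folder at {models_dir}; check Jan's settings.")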
Once downloaded, click on the model to start a new chat. Everything runs 100% locally - no internet needed after download.
You: Hello! What can you help me with?
Llama: I'm an AI assistant running entirely on your
local hardware. I can help with writing, coding,
analysis, and much more...
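The chat window is not the only way in. Jan can also act as a local OpenAI-compatible API server (enable it in the app's settings), which lets your own scripts talk to the model. A minimal sketch using only the Python standard library; the port 1337 and the model id "llama3.1-8b" are assumptions, so substitute whatever your Jan install actually reports:

# Send a chat request to Jan's local OpenAI-compatible server.
# ASSUMPTIONS: the server is enabled in Jan, listening on port 1337,
# and "llama3.1-8b" matches a model id shown in the app.
import json
import urllib.request

payload = {
    "model": "llama3.1-8b",
    "messages": [{"role": "user",
                  "content": "Hello! What can you help me with?"}],
}
req = urllib.request.Request(
    "http://localhost:1337/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])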
Troubleshooting

❌ Model runs slowly or uses CPU
✅ Go to Settings > Advanced and check that GPU acceleration is enabled. Make sure you have the latest GPU drivers installed.
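To confirm the GPU is actually doing the work (NVIDIA only), you can watch utilization while a chat is generating. A quick sketch that polls the nvidia-smi command-line tool; near-zero utilization during generation suggests Jan has fallen back to the CPU:

# Poll NVIDIA GPU utilization and memory use every two seconds.
# Requires the nvidia-smi CLI on your PATH (NVIDIA GPUs only).
import subprocess
import time

for _ in range(5):
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())
    time.sleep(2)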
❌ Not enough VRAM error
✅ Try a smaller model (Llama 3.2 3B) or one with more aggressive quantization (Q4 instead of Q8). Q4 stores each weight in half the bits of Q8, so an 8B model drops from roughly 8GB to roughly 4GB of weights. The Model Hub shows VRAM requirements.
❌ Jan won't launch
✅ On Windows, try running as Administrator. On Mac, check Security & Privacy to allow the app. Make sure you have 4GB+ of free RAM.
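To check the RAM requirement directly, here is a tiny sketch using the third-party psutil package (install it with pip install psutil):

# Report available system RAM before launching Jan.
# Requires psutil: pip install psutil
import psutil

available_gb = psutil.virtual_memory().available / 1e9
print(f"Available RAM: {available_gb:.1f} GB")
if available_gb < 4:
    print("Less than 4 GB free: close some apps before launching Jan.")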