Understand images with this vision-language model
LLaVA (Large Language and Vision Assistant) combines language understanding with image analysis. Ask questions about images, get descriptions, and extract information locally.
Jan supports multimodal models including LLaVA.
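Beyond the chat UI, Jan can run a local OpenAI-compatible API server, so images can also be sent to a vision model programmatically as base64 data URLs. The sketch below only builds the request payload; the model id ("llava-1.6-7b") and the server port shown in the comment are placeholders, so substitute the values from your own Jan setup.

```python
import base64

def build_vision_request(image_path: str, prompt: str,
                         model: str = "llava-1.6-7b") -> dict:
    """Build an OpenAI-style chat payload with an inline base64 image.

    The model id is a placeholder -- use the id shown in Jan's
    Model Hub for the model you actually downloaded.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

# POST this payload to the local server's chat completions endpoint,
# e.g. http://localhost:1337/v1/chat/completions (the port is
# configurable in Jan's settings, so check yours first).
```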
Download: https://jan.ai/download
Search 'LLaVA' or 'vision' in the Model Hub.
Vision models available:
• LLaVA 1.6 7B - Good balance
• LLaVA 1.6 13B - Better quality
• Llama 3.2 Vision - Latest option
Drag and drop images into the chat to analyze them.
Example prompts:
• "What's in this image?"
• "Describe the chart data"
• "Read the text in this screenshot"
• "What breed is this dog?"
❓ Image not processing
✅ Ensure image is under 10MB. JPEG and PNG work best. Try resizing large images.
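If you hit this often, a small pre-flight check can catch bad files before you upload them. This sketch mirrors the guidance above (10 MB limit, JPEG/PNG preferred); the limit and extension list are taken from this page, not from a Jan API contract.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # 10 MB limit noted above
OK_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def check_image(path: str) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in OK_EXTENSIONS:
        problems.append(f"unsupported extension {ext!r}; JPEG and PNG work best")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file exceeds 10 MB; try resizing it first")
    return problems
```

Run it on an image before dropping it into the chat; any returned message tells you what to fix.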
❓ Slow responses with images
✅ Vision models process the image first, then generate text, so responses take longer than text-only chat.