Understand images with this vision-language model
LLaVA (Large Language and Vision Assistant) combines language understanding with image analysis. Ask questions about images, get descriptions, and extract information locally.
Jan supports multimodal models including LLaVA.
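Beyond the chat UI, Jan can run a local OpenAI-compatible API server, so images can also be sent to a vision model programmatically as base64 data URLs. The sketch below only builds the request payload; the model id ("llava-1.6-7b") and the server port shown in the comment are placeholders, so substitute the values from your own Jan setup.

```python
import base64

def build_vision_request(image_path: str, prompt: str,
                         model: str = "llava-1.6-7b") -> dict:
    """Build an OpenAI-style chat payload with an inline base64 image.

    The model id is a placeholder -- use the id shown in Jan's
    Model Hub for the model you actually downloaded.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

# POST this payload to the local server's chat completions endpoint,
# e.g. http://localhost:1337/v1/chat/completions (the port is
# configurable in Jan's settings, so check yours first).
```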
Download: https://jan.ai/download
Search 'LLaVA' or 'vision' in the Model Hub.
Vision models available:
• LLaVA 1.6 7B - Good balance
• LLaVA 1.6 13B - Better quality
• Llama 3.2 Vision - Latest option
Drag and drop images into the chat to analyze them.
Example prompts:
• "What's in this image?"
• "Describe the chart data"
• "Read the text in this screenshot"
• "What breed is this dog?"
❓ Image not processing
✅ Ensure image is under 10MB. JPEG and PNG work best. Try resizing large images.
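If you hit this often, a small pre-flight check can catch bad files before you upload them. This sketch mirrors the guidance above (10 MB limit, JPEG/PNG preferred); the limit and extension list are taken from this page, not from a Jan API contract.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # 10 MB limit noted above
OK_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def check_image(path: str) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in OK_EXTENSIONS:
        problems.append(f"unsupported extension {ext!r}; JPEG and PNG work best")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file exceeds 10 MB; try resizing it first")
    return problems
```

Run it on an image before dropping it into the chat; any returned message tells you what to fix.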
❓ Slow responses with images
✅ Vision models process the image first, then generate text, so responses take longer than text-only chat.