Give AI access to your documents
RAG (Retrieval Augmented Generation) lets LLMs answer questions using your specific documents. Perfect for company knowledge bases, research papers, or personal notes.
Set up LangChain and a vector database.
pip install langchain chromadb sentence-transformers
pip install llama-cpp-python  # For local LLM
Ingest PDFs, text files, or web pages.
from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Load documents
loader = DirectoryLoader('./docs', glob="**/*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()
# Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)
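The DirectoryLoader above only picks up PDFs. LangChain also ships loaders for plain-text files and web pages, so the same pipeline can cover the other formats mentioned above. A minimal sketch (the path and URL are placeholders; WebBaseLoader needs beautifulsoup4 installed), feeding the extra documents through the same splitter:
from langchain.document_loaders import TextLoader, WebBaseLoader
# Plain-text files (path is an example)
text_docs = TextLoader("./docs/notes.txt").load()
# Web pages (URL is an example)
web_docs = WebBaseLoader("https://example.com/handbook").load()
# Chunk them the same way and add them to the existing list
chunks += splitter.split_documents(text_docs + web_docs)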
Embed documents and store in ChromaDB.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory="./chroma_db"
)
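Before adding an LLM, it is worth sanity-checking the index with a plain similarity search. A quick sketch, assuming the persist_directory above (the query string is just an example):
# In a later session, reload the persisted index
vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
# Fetch the four most similar chunks for a test query
hits = vectorstore.similarity_search("refund policy", k=4)
for doc in hits:
    print(doc.metadata.get("source"), doc.page_content[:100])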
Ask questions about your documents.
from langchain.chains import RetrievalQA
from langchain.llms import LlamaCpp
llm = LlamaCpp(model_path="./models/llama-3-8b.gguf", n_ctx=4096)
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
retriever=vectorstore.as_retriever(),
)
answer = qa_chain.run("What does the policy say about refunds?")
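When an answer looks off, it helps to see which chunks the retriever actually handed to the model. One way, sticking with the chain above, is to enable return_source_documents and call the chain with a dict instead of .run():
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,  # include the retrieved chunks in the output
)
result = qa_chain({"query": "What does the policy say about refunds?"})
print(result["result"])                     # the answer text
for doc in result["source_documents"]:      # the chunks it was based on
    print(doc.metadata.get("source"))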
❓ Irrelevant answers
✅ Improve the chunking strategy (chunk size and overlap), try a stronger embedding model, or increase the number of retrieved documents (see the sketch after this list).
❓ Slow retrieval
✅ Use GPU-accelerated embeddings. Consider smaller embedding models for speed.
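The two most common fixes are one-line changes to the code above: retrieve more chunks per question and run the embedding model on a GPU. A rough sketch; the k value and the CUDA device are assumptions, so pick what fits your corpus and hardware:
# Retrieve more chunks per question (the default is 4);
# pass this retriever to RetrievalQA.from_chain_type
retriever = vectorstore.as_retriever(search_kwargs={"k": 8})
# Run the embedding model on a GPU
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",
    model_kwargs={"device": "cuda"},
)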