Developing RAG Systems with DeepSeek R1 & Ollama (Complete Code Included)
Ever wished you could ask questions directly of a PDF or technical manual? This guide shows you how to build a Retrieval-Augmented Generation (RAG) system using DeepSeek R1, an open-source reasoning model, and Ollama, a lightweight framework for running AI models locally.
Pro tip: Streamline API Testing with Apidog

Looking to simplify your API workflows? Apidog acts as an all-in-one solution for creating, managing, and running tests and mock servers. With Apidog, you can:
- Automate critical workflows without juggling multiple tools or writing extensive scripts.
- Maintain smooth CI/CD pipelines.
- Identify bottlenecks to ensure API reliability.
Save time and focus on perfecting your product. Ready to try it? Give Apidog a spin!
Why DeepSeek R1?
DeepSeek R1, a model comparable to OpenAI’s o1 but roughly 95% cheaper to run, is a strong foundation for RAG systems. In the pipeline we build here, it offers:
- Focused retrieval: each answer is grounded in only the top 3 document chunks.
- Strict prompting: the model is instructed to reply “I don’t know” rather than hallucinate.
- Local execution: eliminates cloud API latency (see the quick sanity check below).
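To see local execution in action, here is a minimal sanity check that queries a locally running Ollama server over its REST API. It assumes Ollama is installed, the deepseek-r1:1.5b tag has been pulled, the server is listening on its default port 11434, and the requests package is available:
import requests

# Ask the local DeepSeek R1 model a quick question via Ollama's REST API
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:1.5b",  # any tag you have already pulled locally
        "prompt": "In one sentence, what is retrieval-augmented generation?",
        "stream": False,              # return one JSON object instead of a token stream
    },
    timeout=120,
)
print(response.json()["response"])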
What You’ll Need to Build a Local RAG System
1. Ollama
Ollama lets you run models like DeepSeek R1 locally.
- Download: get Ollama from https://ollama.com.
- Setup: install it, then run the following command in your terminal:
ollama run deepseek-r1 # For the 7B model (default)

2. DeepSeek R1 Model Variants
DeepSeek R1 ranges from 1.5B to 671B parameters. Start small with the 1.5B model for lightweight RAG applications.
ollama run deepseek-r1:1.5b
Pro Tip: Larger models (e.g., 70B) offer better reasoning but need more RAM.
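If you want to experiment with different variants without editing code, one option (not part of the original tutorial) is to expose the model tag as a Streamlit sidebar setting and pass it to the Ollama wrapper. The tag list below is only an example of models you might have pulled:
import streamlit as st
from langchain_community.llms import Ollama

# Let the user pick which DeepSeek R1 variant to run (each tag must already be pulled)
model_tag = st.sidebar.selectbox(
    "DeepSeek R1 variant",
    ["deepseek-r1:1.5b", "deepseek-r1:7b", "deepseek-r1:14b"],
)
llm = Ollama(model=model_tag)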

Step-by-Step Guide to Building the RAG Pipeline
Step 1: Import Libraries
We’ll use the following libraries; the last three imports are needed for the prompt template and chains built in Steps 5 and 6:
import streamlit as st                                               # web UI
from langchain_community.document_loaders import PDFPlumberLoader    # PDF text extraction
from langchain_experimental.text_splitter import SemanticChunker     # embedding-aware chunking
from langchain_community.embeddings import HuggingFaceEmbeddings     # local embedding model
from langchain_community.vectorstores import FAISS                   # vector index
from langchain_community.llms import Ollama                          # local LLM via Ollama
from langchain.prompts import PromptTemplate                         # used in Step 5
from langchain.chains import LLMChain, RetrievalQA                   # used in Step 6
from langchain.chains.combine_documents.stuff import StuffDocumentsChain  # used in Step 6

Step 2: Upload & Process PDFs
Leverage Streamlit’s file uploader to select a local PDF. Use PDFPlumberLoader
to extract text efficiently without manual parsing.
# Streamlit file uploader
uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")

if uploaded_file:
    # Save PDF temporarily
    with open("temp.pdf", "wb") as f:
        f.write(uploaded_file.getvalue())

    # Load PDF text
    loader = PDFPlumberLoader("temp.pdf")
    docs = loader.load()
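As an optional check that extraction worked, you can report how many pages were loaded and preview the first one. This is a small addition, not part of the original pipeline, placed inside the same if uploaded_file: block:
    # Each element of docs is a LangChain Document with page_content and metadata
    st.write(f"Loaded {len(docs)} pages from the PDF")
    st.text(docs[0].page_content[:500])  # preview the first 500 characters of page 1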
Step 3: Chunk Documents Strategically
Use SemanticChunker to split the extracted text into semantically coherent chunks. Unlike fixed-size splitting, it groups related sentences together, which improves retrieval quality.
# Split text into semantic chunks
text_splitter = SemanticChunker(HuggingFaceEmbeddings())
documents = text_splitter.split_documents(docs)
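SemanticChunker embeds neighboring sentences and starts a new chunk where similarity drops, so related passages stay together. If you prefer to avoid the langchain_experimental dependency, a fixed-size splitter is a reasonable drop-in; here is a sketch in which the chunk size and overlap are arbitrary starting points:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Simpler alternative: split on paragraph/sentence boundaries into ~1000-character chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
documents = text_splitter.split_documents(docs)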

Step 4: Create a Searchable Knowledge Base
Generate vector embeddings for the chunks and store them in a FAISS index.
- Embeddings allow fast, contextually relevant searches.
# Generate embeddings
embeddings = HuggingFaceEmbeddings()
vector_store = FAISS.from_documents(documents, embeddings)
# Connect retriever
retriever = vector_store.as_retriever(search_kwargs={"k": 3}) # Fetch top 3 chunks
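Before wiring the retriever to the LLM, it is worth confirming that it returns sensible chunks. An optional check, where the query string is just an example:
# Fetch the 3 most similar chunks for a test query and inspect their sources
sample_chunks = retriever.get_relevant_documents("What is the main topic of this document?")
for chunk in sample_chunks:
    print(chunk.metadata.get("source"), "->", chunk.page_content[:120])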
Step 5: Configure DeepSeek R1
Configure the DeepSeek R1 1.5B model via Ollama and craft a strict prompt template (the RetrievalQA chain itself is assembled in Step 6).
- The prompt ensures answers are grounded in the PDF’s content rather than relying on the model’s training data.
llm = Ollama(model="deepseek-r1:1.5b") # Our 1.5B parameter model
# Craft the prompt template
prompt = """
1. Use ONLY the context below.
2. If unsure, say "I don’t know".
3. Keep answers under 4 sentences.
Context: {context}
Question: {question}
Answer:
"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(prompt)
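To see exactly what the model will receive, you can render the template with toy values; the context and question below are made up purely for illustration:
# Render the prompt with placeholder values to inspect the final text sent to DeepSeek R1
print(QA_CHAIN_PROMPT.format(
    context="Content: the warranty lasts 24 months...",
    question="How long is the warranty?",
))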
Step 6: Assemble the RAG Chain
Integrate uploading, chunking, and retrieval into a cohesive pipeline.
- This approach gives the model verified context, enhancing accuracy.
# Chain 1: Generate an answer from the retrieved context
llm_chain = LLMChain(llm=llm, prompt=QA_CHAIN_PROMPT)

# Chain 2: Format each retrieved chunk before stuffing it into the prompt
document_prompt = PromptTemplate(
    template="Context:\ncontent:{page_content}\nsource:{source}",
    input_variables=["page_content", "source"]
)

# Final RAG pipeline
qa = RetrievalQA(
    combine_documents_chain=StuffDocumentsChain(
        llm_chain=llm_chain,
        document_prompt=document_prompt,
        document_variable_name="context"  # maps the stuffed chunks to {context} in the prompt
    ),
    retriever=retriever
)
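With the chain assembled, you can already query it outside the UI, which is handy for quick testing in a script or notebook; the question below is only an example:
# Ask a question end-to-end: retrieve top-3 chunks, stuff them into the prompt, generate an answer
answer = qa("What does this document say about installation requirements?")["result"]
print(answer)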
Step 7: Launch the Web Interface
Streamlit enables users to type questions and receive instant answers.
- Queries retrieve matching chunks, feed them to the model, and display results in real-time.
# Streamlit UI
user_input = st.text_input("Ask your PDF a question:")

if user_input:
    with st.spinner("Thinking..."):
        response = qa(user_input)["result"]
    st.write(response)
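Save all of the code above into a single Python file (app.py is just an example name) and launch the app from your terminal:
streamlit run app.py
Streamlit will open the interface in your browser, where you can upload a PDF and start asking questions.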

You can find the complete code here: https://gist.github.com/lisakim0/0204d7504d17cefceaf2d37261c1b7d5
The Future of RAG with DeepSeek
DeepSeek R1 is just the beginning. With upcoming features like self-verification and multi-hop reasoning, future RAG systems could debate and refine their logic autonomously.
Build your own RAG system today and unlock the full potential of document-based AI!
Source of this article: How to Run DeepSeek R1 Locally Using Ollama?