How to Build an Open-Source Alternative to OpenAI’s Deep Research

Sebastian Petrus
3 min read · Feb 8, 2025

Open-source AI tools are changing how we approach complex research tasks. In this guide, we’ll break down how to recreate a system like OpenAI’s Deep Research using Jina AI’s open-source project node-DeepResearch. Let’s dive in!

Pro Tip: Looking for an all-in-one API development tool? Apidog simplifies API design, testing, mocking, and documentation — so you can streamline your workflow without switching between multiple tools. Try Apidog today and boost your API development efficiency! 🚀

What is DeepResearch?

DeepResearch mimics human-like research workflows: it searches, reads, and refines answers iteratively until it finds a definitive response or exhausts computational resources (a “token budget”). Think of it as an AI research assistant that:

  1. Takes a query (e.g., “Who is bigger: Cohere, Jina AI, or Voyage?”).
  2. Loops through actions: searching the web, reading content, reflecting on gaps, or answering if confident.

Step 1: Installation & Setup

Requirements

API Keys:

  • GEMINI_API_KEY: For Google’s Gemini language model.
  • JINA_API_KEY: Free key from Jina Reader.
  • BRAVE_API_KEY (optional): For Brave Search. If it is not set, the tool falls back to DuckDuckGo.

Quick Start

# Clone the repo and install dependencies
git clone https://github.com/jina-ai/node-DeepResearch.git
cd node-DeepResearch
npm install

# Set API keys (replace ... with your keys)
export GEMINI_API_KEY=...
export JINA_API_KEY=jina_...
export BRAVE_API_KEY=... # Optional

# Run queries
npm run dev "What is Jina AI's 2025 strategy?"

Step 2: Core Components Explained

1. agent.ts – The Brain

This file orchestrates the research loop. Key features:

  • Prompt Engineering: Guides the AI with context, prior actions, and failure analysis.
  • Action Selection: Chooses between searching, reading, reflecting, or answering.
  • Token Tracking: Prevents exceeding computational limits.

Example Workflow:

// Simplified loop logic
while (tokenBudget > 0) {
  generatePrompt();
  const action = await getAIAction(); // e.g., "search" or "answer"
  executeAction(action);
  updateTokenTracker();
}
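To make the loop above a bit more concrete, here is a minimal, self-contained sketch with a stubbed model. Everything in it (the Step type, runLoop, the flat 100-token cost per step) is a hypothetical illustration and not the code in agent.ts. It only shows the two exit conditions: the model answers, or the budget runs out.

```typescript
// Hypothetical sketch of a budgeted action loop; not the repo's actual agent.ts.
type ActionName = "search" | "read" | "reflect" | "answer";

interface Step {
  action: ActionName;
  tokensUsed: number; // tokens the model call consumed for this step
}

// A "model" is any async function that picks the next step from the history.
type Model = (history: Step[]) => Promise<Step>;

async function runLoop(model: Model, tokenBudget: number): Promise<Step[]> {
  const history: Step[] = [];
  let remaining = tokenBudget;
  while (remaining > 0) {
    const step = await model(history);
    history.push(step);
    remaining -= step.tokensUsed; // token tracking
    if (step.action === "answer") break; // stop once the model answers
  }
  return history;
}

// Stub model (assumption): search, read, then answer, at 100 tokens per step.
const scripted: ActionName[] = ["search", "read", "answer"];
const stubModel: Model = async (history) => ({
  action: scripted[Math.min(history.length, scripted.length - 1)],
  tokensUsed: 100,
});
```

With a generous budget the stub reaches "answer"; with a tight one the loop stops early, which is the point where a real agent would need a fallback.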

2. config.ts – Environment Setup

  • Loads API keys and configures search providers (Brave/DuckDuckGo).
  • Adjusts AI model settings (e.g., creativity vs. determinism).
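As a rough idea of what such a config module might do, here is a small sketch. The environment variable names come from the setup section; the AppConfig shape, the loadConfig function, and the temperature default are assumptions made for illustration, not the contents of the real config.ts.

```typescript
// Hypothetical config loader; only the variable names come from the article.
type ProviderName = "brave" | "duckduckgo";

interface AppConfig {
  geminiApiKey: string;
  jinaApiKey: string;
  braveApiKey: string | undefined;
  searchProvider: ProviderName;
  temperature: number; // lower = more deterministic, higher = more creative
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  // Fail fast when a required key is missing or empty.
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };
  const braveApiKey = env["BRAVE_API_KEY"] || undefined;
  return {
    geminiApiKey: required("GEMINI_API_KEY"),
    jinaApiKey: required("JINA_API_KEY"),
    braveApiKey,
    // Fall back to DuckDuckGo when no Brave key is provided.
    searchProvider: braveApiKey ? "brave" : "duckduckgo",
    temperature: 0, // assumed default, not taken from the repo
  };
}
```

In a Node.js process you would call loadConfig(process.env). Taking the environment as a parameter keeps the function easy to test.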

3. server.ts – Web API

  • Exposes endpoints for submitting queries and streaming real-time progress.
  • Uses Server-Sent Events (SSE) to push updates like:
{
  "type": "progress",
  "step": 7,
  "action": "search",
  "searchQuery": "Jina AI investor ownership percentages"
}
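For readers new to SSE, the sketch below shows how an update like the one above could be framed on the wire. The ProgressUpdate shape mirrors the JSON payload; the function names and the SseSink interface are hypothetical and not taken from server.ts.

```typescript
// Sketch of SSE framing; names are illustrative, not the repo's API.
interface ProgressUpdate {
  type: "progress";
  step: number;
  action: string;
  searchQuery?: string;
}

// Anything with a write(string) method, e.g. Node's http.ServerResponse.
interface SseSink {
  write(chunk: string): unknown;
}

// An SSE message is a "data:" line terminated by a blank line.
function toSseFrame(update: ProgressUpdate): string {
  return `data: ${JSON.stringify(update)}\n\n`;
}

function sendProgress(sink: SseSink, update: ProgressUpdate): void {
  sink.write(toSseFrame(update));
}
```

When the sink is a real HTTP response, the Content-Type header has to be set to text/event-stream before the first frame is written.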

4. types.ts – Data Structures

Defines actions (e.g., SearchAction, AnswerAction) to ensure consistent responses across modules.
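One common way to model this in TypeScript is a discriminated union. The sketch below is an assumption about what such types could look like; the field names are made up for illustration and the real types.ts may differ.

```typescript
// Hypothetical action shapes; the real types.ts may define different fields.
interface SearchAction {
  action: "search";
  searchQuery: string;
}
interface ReadAction {
  action: "read";
  urls: string[];
}
interface ReflectAction {
  action: "reflect";
  questions: string[];
}
interface AnswerAction {
  action: "answer";
  answer: string;
}

// The "action" field tells the compiler which variant it is looking at.
type StepAction = SearchAction | ReadAction | ReflectAction | AnswerAction;

function summarize(a: StepAction): string {
  switch (a.action) {
    case "search":
      return `search for "${a.searchQuery}"`;
    case "read":
      return `read ${a.urls.length} URL(s)`;
    case "reflect":
      return `reflect on ${a.questions.length} open question(s)`;
    case "answer":
      return `answer: ${a.answer}`;
  }
}
```

Because the switch covers every variant, adding a fifth action type later makes the compiler flag every place that forgot to handle it.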

Step 3: Iterative Research Process

  1. Start: The system begins with your query (e.g., “Who will be US president in 2028?”).
  2. Search: Generates keywords and fetches URLs from search engines.
  3. Read: Extracts content from webpages using the Jina Reader.
  4. Reflect: Identifies knowledge gaps and creates follow-up questions.
  5. Answer: Delivers a response once confident. If the token budget runs out first, it switches to “Beast Mode” and makes a final best guess.
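The Reflect step implies some bookkeeping: follow-up questions have to be stored somewhere and should not be asked twice. Here is one possible way to do that. The GapQueue class is purely hypothetical and is not how the repo necessarily tracks gaps.

```typescript
// Hypothetical bookkeeping for open questions; names are illustrative.
class GapQueue {
  private seen = new Set<string>();
  private queue: string[] = [];

  constructor(originalQuestion: string) {
    this.add(originalQuestion);
  }

  // Queue a question unless an equivalent one was already added.
  add(question: string): boolean {
    const trimmed = question.trim();
    const key = trimmed.toLowerCase();
    if (key === "" || this.seen.has(key)) return false;
    this.seen.add(key);
    this.queue.push(trimmed);
    return true;
  }

  // Take the next open question, or undefined when none are left.
  next(): string | undefined {
    return this.queue.shift();
  }

  get size(): number {
    return this.queue.length;
  }
}
```

The deduplication here is a simple case-insensitive match. A real agent would likely need something smarter, since a model can rephrase the same question in many ways.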

Step 4: Real-Time Feedback

Track progress via the web server’s streaming API:

  • Monitor token usage.
  • View actions taken (e.g., “Visited 5 URLs”).
  • Debug failed attempts.
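On the consuming side, a client has to split the stream back into individual events. The sketch below assumes the server sends JSON payloads in "data:" lines separated by "\n\n"; parseSseChunk is a hypothetical helper, and it ignores parts of the SSE format such as "event:" fields and "\r\n" line endings.

```typescript
// Simplified SSE parser (assumes "\n" line endings and JSON payloads).
function parseSseChunk(buffer: string): { events: unknown[]; rest: string } {
  const frames = buffer.split("\n\n");
  // The last piece may be an incomplete frame; hand it back to the caller.
  const rest = frames.pop() ?? "";
  const events: unknown[] = [];
  for (const frame of frames) {
    const data = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart())
      .join("\n");
    if (data !== "") events.push(JSON.parse(data));
  }
  return { events, rest };
}
```

The caller prepends rest to the next chunk it receives from the network. In a browser, the built-in EventSource API does this parsing for you.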

Step 5: Customization & Best Practices

Extend the System

  • Add Search Engines: Integrate Google Custom Search or Serper.dev.
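  • Enhance Reading: Use an NLP library such as spaCy for better text analysis (spaCy is a Python library, so it would have to run as a separate service next to this Node.js project).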
  • Build a UI: Create a React app for interactive queries.
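If you add more search engines, a small common interface makes them interchangeable. The sketch below is one way to do it, using stub backends rather than real API calls. SearchBackend and searchWithFallback are hypothetical names, not part of the repo.

```typescript
// Hypothetical pluggable search interface with stub backends.
interface SearchResult {
  title: string;
  url: string;
}

interface SearchBackend {
  name: string;
  search(query: string): Promise<SearchResult[]>;
}

// Try each backend in order and return the first non-empty result set.
async function searchWithFallback(
  backends: SearchBackend[],
  query: string
): Promise<{ backend: string; results: SearchResult[] }> {
  for (const backend of backends) {
    try {
      const results = await backend.search(query);
      if (results.length > 0) return { backend: backend.name, results };
    } catch {
      // Ignore the failure and move on to the next backend.
    }
  }
  return { backend: "none", results: [] };
}

// Stubs for illustration only; a real backend would call its search API here.
const failing: SearchBackend = {
  name: "failing",
  search: async () => {
    throw new Error("quota exceeded");
  },
};
const working: SearchBackend = {
  name: "working",
  search: async (q) => [{ title: `Result for ${q}`, url: "https://example.com" }],
};
```

Calling searchWithFallback([failing, working], "jina") skips the failing stub and returns the result from the working one.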

Security & Performance

  • Securely store API keys using tools like dotenv or Kubernetes secrets.
  • Implement rate limiting to avoid API bans.
  • Validate inputs to prevent malicious queries.
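The last two points can be sketched in a few lines. This is a minimal illustration, assuming a single-process server: RateLimiter is a simple sliding-window limiter and validateQuery is a basic length and type check. Both names and the 500-character limit are assumptions, not taken from the repo.

```typescript
// Sliding-window limiter: at most `limit` calls per `windowMs` milliseconds.
class RateLimiter {
  private timestamps: number[] = [];
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the call if it fits in the current window.
  tryAcquire(now: number = Date.now()): boolean {
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}

// Basic input check; the 500-character cap is an arbitrary example value.
function validateQuery(input: unknown, maxLength = 500): string {
  if (typeof input !== "string") throw new Error("Query must be a string");
  const query = input.trim();
  if (query.length === 0) throw new Error("Query must not be empty");
  if (query.length > maxLength) throw new Error(`Query exceeds ${maxLength} characters`);
  return query;
}
```

Call tryAcquire() before each outbound search or model request and back off when it returns false. The limiter keeps its state in memory, so it would not coordinate across multiple server instances.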

Why This Matters

Projects like node-DeepResearch democratize AI research by:

  • Providing transparency into how AI systems reason.
  • Allowing customization for domain-specific tasks (e.g., medical research or market analysis).
  • Reducing reliance on closed-source tools like ChatGPT.

Final Thoughts

Ready to experiment? Clone the repo, run a few queries, and tweak the code to fit your needs. Whether you’re building a research assistant or exploring AI workflows, this open-source project is a powerful starting point.

Written by Sebastian Petrus

Asist Prof @U of Waterloo, AI/ML, e/acc
