How to Build an Open-Source Alternative to OpenAI’s Deep Research
Open-source AI tools are changing how we approach complex research tasks. In this guide, we’ll break down how to recreate a system like OpenAI’s Deep Research using node-DeepResearch, an open-source project from Jina AI. Let’s dive in!
What is DeepResearch?
DeepResearch mimics human-like research workflows: it searches, reads, and refines answers iteratively until it finds a definitive response or exhausts computational resources (a “token budget”). Think of it as an AI research assistant that:
- Takes a query (e.g., “Who is bigger: Cohere, Jina AI, or Voyage?”).
- Loops through actions: searching the web, reading content, reflecting on gaps, or answering if confident.
Step 1: Installation & Setup
Requirements
API Keys:
- GEMINI_API_KEY: For Google’s Gemini language model.
- JINA_API_KEY: Free key from Jina Reader.
- BRAVE_API_KEY (optional): For Brave Search; if it is not set, search defaults to DuckDuckGo.
Quick Start
# Clone the repo and install dependencies
git clone https://github.com/jina-ai/node-DeepResearch.git
cd node-DeepResearch
npm install
# Set API keys (replace ... with your keys)
export GEMINI_API_KEY=...
export JINA_API_KEY=jina_...
export BRAVE_API_KEY=... # Optional
# Run queries
npm run dev "What is Jina AI's 2025 strategy?"
Step 2: Core Components Explained
1. agent.ts – The Brain
This file orchestrates the research loop. Key features:
- Prompt Engineering: Guides the AI with context, prior actions, and failure analysis.
- Action Selection: Chooses between searching, reading, reflecting, or answering.
- Token Tracking: Prevents exceeding computational limits.
Example Workflow:
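// Simplified loop logic (pseudocode; helper names are illustrative)
while (tokenBudget > 0) {
  const prompt = generatePrompt();           // context, prior actions, failure analysis
  const action = await getAIAction(prompt);  // e.g., "search" or "answer"
  await executeAction(action);
  updateTokenTracker();                      // deducts the tokens used in this step from tokenBudget
  if (action === "answer") break;            // stop once an answer has been produced
}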
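To make the loop above concrete, here is a minimal runnable sketch. It is not the project's actual agent.ts: the `Action` shape, `runLoop`, and `stubDecide` are hypothetical names, and the language-model call is replaced by a synchronous stub so the example runs without a network (a real agent would await a model call instead).

```typescript
// Hypothetical action shape for this sketch; the real project defines its own types.
type Action =
  | { kind: "search"; query: string }
  | { kind: "answer"; text: string };

interface LoopResult {
  answer: string | null; // null means the budget ran out first
  steps: number;
  tokensUsed: number;
}

// decide() stands in for the language-model call. It returns the next
// action plus the number of tokens that call consumed.
type Decide = (
  question: string,
  history: Action[],
) => { action: Action; tokens: number };

function runLoop(question: string, tokenBudget: number, decide: Decide): LoopResult {
  const history: Action[] = [];
  let tokensUsed = 0;

  while (tokensUsed < tokenBudget) {
    const { action, tokens } = decide(question, history);
    tokensUsed += tokens;
    history.push(action);

    if (action.kind === "answer") {
      return { answer: action.text, steps: history.length, tokensUsed };
    }
    // A real agent would execute the search here and feed the results
    // into the next prompt.
  }

  // Budget exhausted without a confident answer.
  return { answer: null, steps: history.length, tokensUsed };
}

// Stub model: searches twice, then answers. Each call "costs" 100 tokens.
const stubDecide: Decide = (_question, history) =>
  history.length < 2
    ? { action: { kind: "search", query: `query ${history.length + 1}` }, tokens: 100 }
    : { action: { kind: "answer", text: "done" }, tokens: 100 };

const result = runLoop("example question", 1000, stubDecide);
console.log(result);
```

Passing the decision function in as a parameter keeps the loop testable: you can swap the stub for a real model client without touching the budget logic.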
2. config.ts – Environment Setup
- Loads API keys and configures search providers (Brave/DuckDuckGo).
- Adjusts AI model settings (e.g., creativity vs. determinism).
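As a rough illustration of what such a config module might do, the sketch below reads the three keys from an environment object and picks a search provider. The `AppConfig` shape, the `loadConfig` name, and the `temperature` default are assumptions for this example, not the contents of the real config.ts.

```typescript
type SearchProvider = "brave" | "duckduckgo";

// Hypothetical config shape for this sketch.
interface AppConfig {
  geminiApiKey: string;
  jinaApiKey: string;
  braveApiKey: string | undefined;
  searchProvider: SearchProvider;
  temperature: number; // lower values make model output more deterministic
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  // Fail fast with a clear message when a required key is missing.
  const required = (name: string): string => {
    const value = env[name];
    if (!value) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  // Treat an empty string the same as an unset variable.
  const braveApiKey = env["BRAVE_API_KEY"] || undefined;

  return {
    geminiApiKey: required("GEMINI_API_KEY"),
    jinaApiKey: required("JINA_API_KEY"),
    braveApiKey,
    // Brave is used only when its key is present; otherwise fall back.
    searchProvider: braveApiKey ? "brave" : "duckduckgo",
    temperature: 0, // assumed default for this sketch
  };
}

const config = loadConfig({ GEMINI_API_KEY: "example", JINA_API_KEY: "jina_example" });
console.log(config.searchProvider);
```

In a Node.js process you would typically call `loadConfig(process.env)`; taking the environment as a parameter simply makes the function easy to test.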
3. server.ts – Web API
- Exposes endpoints for submitting queries and streaming real-time progress.
- Uses Server-Sent Events (SSE) to push updates like:
{
  "type": "progress",
  "step": 7,
  "action": "search",
  "searchQuery": "Jina AI investor ownership percentages"
}
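On the wire, an SSE message is a `data:` line followed by a blank line. The sketch below shows one way to serialize a progress update like the one above into that format; `ProgressEvent` and `toSseFrame` are illustrative names, not taken from the project's server.ts.

```typescript
// Hypothetical event shape, modeled on the progress update shown above.
interface ProgressEvent {
  type: "progress";
  step: number;
  action: string;
  searchQuery?: string;
}

// Serialize one event as a single SSE frame. JSON.stringify emits no raw
// newlines, so the payload always fits on one "data:" line.
function toSseFrame(event: ProgressEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

const frame = toSseFrame({
  type: "progress",
  step: 7,
  action: "search",
  searchQuery: "Jina AI investor ownership percentages",
});
console.log(frame);
```

A server would send the `Content-Type: text/event-stream` header once, then write one such frame to the open response for every update.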
4. types.ts – Data Structures
Defines actions (e.g., SearchAction, AnswerAction) to ensure consistent responses across modules.
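One common way to model such actions in TypeScript is a discriminated union, sketched below. Only the names SearchAction and AnswerAction come from the article; ReadAction, ReflectAction, and every field name here are assumptions made for illustration.

```typescript
interface SearchAction {
  action: "search";
  searchQuery: string;
}

interface ReadAction {
  action: "read";
  urls: string[];
}

interface ReflectAction {
  action: "reflect";
  questions: string[];
}

interface AnswerAction {
  action: "answer";
  answer: string;
}

// The shared "action" field lets the compiler narrow the type in each branch.
type StepAction = SearchAction | ReadAction | ReflectAction | AnswerAction;

function describeAction(step: StepAction): string {
  switch (step.action) {
    case "search":
      return `search: ${step.searchQuery}`;
    case "read":
      return `read: ${step.urls.length} URL(s)`;
    case "reflect":
      return `reflect: ${step.questions.length} follow-up question(s)`;
    case "answer":
      return `answer: ${step.answer}`;
    default: {
      // Compile-time exhaustiveness check: adding a new action type
      // without handling it above makes this assignment fail to compile.
      const unreachable: never = step;
      return unreachable;
    }
  }
}

console.log(describeAction({ action: "search", searchQuery: "Jina AI funding" }));
```

Because every module switches on the same `action` field, a model response that does not match one of these shapes can be rejected at a single boundary instead of failing deep inside the loop.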
Step 3: Iterative Research Process
- Start: The system begins with your query (e.g., “Who will be US president in 2028?”).
- Search: Generates keywords and fetches URLs from search engines.
- Read: Extracts content from webpages using the Jina Reader.
- Reflect: Identifies knowledge gaps and creates follow-up questions.
- Answer: Delivers a response once confident or switches to “Beast Mode” for a final guess.
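The Reflect and Answer steps can be pictured as a queue of open questions plus a budget check that forces a final answer. The sketch below is one possible arrangement under that assumption; `GapQueue`, `shouldForceAnswer`, and the reserve size are hypothetical and do not describe how the project actually triggers “Beast Mode”.

```typescript
// Queue of open questions. Reflection adds follow-ups; duplicates are ignored
// so the agent does not loop on the same gap.
class GapQueue {
  private readonly pending: string[] = [];
  private readonly seen = new Set<string>();

  constructor(originalQuestion: string) {
    this.add(originalQuestion);
  }

  // Returns true if the question was new and has been queued.
  add(question: string): boolean {
    const trimmed = question.trim();
    const key = trimmed.toLowerCase();
    if (key.length === 0 || this.seen.has(key)) {
      return false;
    }
    this.seen.add(key);
    this.pending.push(trimmed);
    return true;
  }

  next(): string | undefined {
    return this.pending.shift();
  }

  get size(): number {
    return this.pending.length;
  }
}

// Assumed rule for this sketch: once the remaining budget drops to the
// reserve, stop searching and spend what is left on a final answer.
function shouldForceAnswer(
  tokensUsed: number,
  tokenBudget: number,
  reserveTokens: number,
): boolean {
  return tokenBudget - tokensUsed <= reserveTokens;
}

const gaps = new GapQueue("Who is bigger: Cohere, Jina AI, or Voyage?");
gaps.add("How many employees does Cohere have?");
gaps.add("how many employees does cohere have?"); // duplicate, ignored
console.log(gaps.size, shouldForceAnswer(900, 1000, 150));
```

Keeping a reserve means the final guess is still made with enough tokens to summarize what was already read, rather than being cut off mid-response.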
Step 4: Real-Time Feedback
Track progress via the web server’s streaming API:
- Monitor token usage.
- View actions taken (e.g., “Visited 5 URLs”).
- Debug failed attempts.
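On the client side, a streamed response arrives in arbitrary chunks, so frames have to be reassembled before they can be parsed. The following sketch shows a small parser for that; `parseSseFrames` is a hypothetical helper that assumes JSON payloads and `\n` line endings, which may not match every server.

```typescript
// Split buffered SSE text into complete events plus the unfinished remainder.
function parseSseFrames(buffer: string): { events: unknown[]; rest: string } {
  const frames = buffer.split("\n\n");
  // The last piece is either empty or an incomplete frame; keep it for later.
  const rest = frames.pop() ?? "";
  const events: unknown[] = [];

  for (const frame of frames) {
    const data = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice("data:".length).trimStart())
      .join("\n");
    if (data.length > 0) {
      events.push(JSON.parse(data));
    }
  }

  return { events, rest };
}

// Simulate a chunk that ends in the middle of the second frame.
const chunk = 'data: {"type":"progress","step":1}\n\ndata: {"type":"pro';
const parsed = parseSseFrames(chunk);
console.log(parsed.events.length, parsed.rest);
```

A caller would prepend `parsed.rest` to the next chunk before parsing again. In a browser, the built-in `EventSource` API handles this reassembly for you.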
Step 5: Customization & Best Practices
Extend the System
- Add Search Engines: Integrate Google Custom Search or Serper.dev.
- Enhance Reading: Use NLP models like spaCy for better text analysis.
- Build a UI: Create a React app for interactive queries.
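Adding a search engine is easier if every provider sits behind one interface. The sketch below shows that pattern with a small registry; `SearchEngine`, `SearchRegistry`, and `stubEngine` are hypothetical names, and the stub returns canned results rather than calling any real search API.

```typescript
interface SearchResult {
  title: string;
  url: string;
}

// Every provider, built-in or custom, implements this one interface.
interface SearchEngine {
  readonly name: string;
  search(query: string): Promise<SearchResult[]>;
}

class SearchRegistry {
  private readonly engines = new Map<string, SearchEngine>();

  register(engine: SearchEngine): void {
    this.engines.set(engine.name, engine);
  }

  // Return the preferred engine, or the fallback when it is not registered.
  resolve(preferred: string, fallback: string): SearchEngine {
    const engine = this.engines.get(preferred) ?? this.engines.get(fallback);
    if (!engine) {
      throw new Error(`No search engine registered for "${preferred}" or "${fallback}"`);
    }
    return engine;
  }
}

// Stub engine for illustration only: returns a fixed result, no network call.
const stubEngine = (name: string): SearchEngine => ({
  name,
  search: async (query) => [
    { title: `${name} result for ${query}`, url: "https://example.com" },
  ],
});

const registry = new SearchRegistry();
registry.register(stubEngine("duckduckgo"));
registry.register(stubEngine("my-custom-engine"));

const engine = registry.resolve("brave", "duckduckgo");
console.log(engine.name);
```

A real Google Custom Search or Serper.dev adapter would implement `search` with an HTTP request to that service; the rest of the agent would not need to change.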
Security & Performance
- Securely store API keys using tools like dotenv or Kubernetes secrets.
- Implement rate limiting to avoid API bans.
- Validate inputs to prevent malicious queries.
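Rate limiting and input validation can both be done in a few lines. The sketch below uses a token bucket for the former and a simple length and control-character check for the latter; `TokenBucket`, `validateQuery`, and the limits shown are assumptions for illustration, not project defaults.

```typescript
// Token bucket: allows short bursts up to `capacity`, then a steady rate.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefill: number;

  constructor(capacity: number, refillPerSecond: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if a request may proceed at time `now` (milliseconds).
  tryRemove(now: number = Date.now()): boolean {
    const elapsedSeconds = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens < 1) {
      return false;
    }
    this.tokens -= 1;
    return true;
  }
}

// Reject non-strings, empty input, and oversized queries; replace control
// characters with spaces before the query reaches a prompt or a search API.
function validateQuery(input: unknown, maxLength: number = 500): string {
  if (typeof input !== "string") {
    throw new Error("Query must be a string");
  }
  const query = input.replace(/[\u0000-\u001f\u007f]/g, " ").trim();
  if (query.length === 0) {
    throw new Error("Query must not be empty");
  }
  if (query.length > maxLength) {
    throw new Error(`Query exceeds ${maxLength} characters`);
  }
  return query;
}

// Two requests are allowed immediately; the third must wait for a refill.
const bucket = new TokenBucket(2, 1, 0);
console.log(bucket.tryRemove(0), bucket.tryRemove(0), bucket.tryRemove(0));
console.log(validateQuery("  What is Jina AI's 2025 strategy?\n"));
```

The clock is passed in as a parameter so the limiter can be tested deterministically; in production you would rely on the `Date.now()` default.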
Why This Matters
Projects like node-DeepResearch democratize AI research by:
- Providing transparency into how AI systems reason.
- Allowing customization for domain-specific tasks (e.g., medical research or market analysis).
- Reducing reliance on closed-source tools like ChatGPT.
Final Thoughts
Ready to experiment? Clone the repo, run a few queries, and tweak the code to fit your needs. Whether you’re building a research assistant or exploring AI workflows, this open-source project is a powerful starting point.