Top 10 RAG Frameworks GitHub Repos 2024

10 min read · Sep 4, 2024

Retrieval-Augmented Generation (RAG) has emerged as a powerful technique for enhancing the capabilities of large language models.

RAG frameworks combine the strengths of retrieval-based systems with generative models, allowing for more accurate, context-aware, and up-to-date responses. As the demand for sophisticated AI solutions grows, numerous open-source RAG frameworks have surfaced on GitHub, each offering unique features and capabilities.

What Does a RAG Framework Do?

Retrieval-Augmented Generation (RAG) is a technique that enhances large language models (LLMs) by incorporating external knowledge sources.

At query time, a RAG system retrieves relevant information from a knowledge base and adds it to the LLM’s input, so the model can generate more accurate, up-to-date, and contextually relevant responses.

This approach helps overcome limitations such as knowledge cutoff dates and reduces the risk of hallucinations in LLM outputs.
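To make the retrieve-then-augment step concrete, here is a minimal sketch in plain Python. It assumes a toy in-memory knowledge base and a simple word-overlap score; the names `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are made up for illustration, and a real system would use a proper retriever and send the prompt to an LLM.

```python
import re
from collections import Counter

# Toy knowledge base (an assumption for this sketch); real systems use a document or vector store.
KNOWLEDGE_BASE = [
    "Haystack is an open-source framework maintained by deepset-ai.",
    "RAG retrieves relevant passages and adds them to the model prompt.",
    "A knowledge cutoff is the date after which a model has no training data.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, passage):
    """Count the tokens shared by the query and the passage."""
    shared = Counter(tokenize(query)) & Counter(tokenize(passage))
    return sum(shared.values())


def retrieve(query, passages, top_k=2):
    """Return up to top_k passages with a non-zero overlap score, best first."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return [p for p in ranked[:top_k] if score(query, p) > 0]


def build_prompt(query, passages):
    """Augment the user question with the retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    question = "What is a knowledge cutoff?"
    print(build_prompt(question, retrieve(question, KNOWLEDGE_BASE, top_k=1)))
```

The resulting prompt is what would be passed to the language model; the generation step itself is left out because it depends on the model you use.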

Why Can’t I Just Use LangChain?

LangChain and RAG are not alternatives to each other. LangChain is a library for building LLM applications, while RAG is a technique, and LangChain is one of the tools you can use to implement it. Here’s what RAG adds to an LLM application, whichever toolkit you build it with:

External knowledge: RAG allows you to incorporate domain-specific or up-to-date information that may not be present in the LLM’s training data.
Improved accuracy: By grounding responses in retrieved information, RAG can significantly reduce errors and hallucinations.
Customization: RAG enables you to tailor responses to specific datasets or knowledge bases, which is crucial for many business applications.
Transparency: RAG makes it easier to trace the sources of information used in generating responses, improving auditability.

In essence, while LangChain provides the tools and abstractions to build LLM applications, RAG is a specific technique that can be implemented using LangChain to enhance the quality and reliability of LLM outputs.
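The transparency point can be illustrated with a small sketch: if each retrieved passage keeps a source identifier, citation markers in the model’s answer can be mapped back to where the information came from. The `Passage` type, the `[n]` citation convention, and the file names below are assumptions made for this example, not part of LangChain or any framework in this list.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical identifier, such as a file name or URL
    text: str


def build_cited_prompt(question, passages):
    """Number each passage so the model can cite it as [1], [2], ..."""
    numbered = [f"[{i}] {p.text}" for i, p in enumerate(passages, start=1)]
    prompt = (
        "Answer the question using the numbered sources and cite them like [1].\n\n"
        + "\n".join(numbered)
        + f"\n\nQuestion: {question}"
    )
    citation_map = {i: p.source for i, p in enumerate(passages, start=1)}
    return prompt, citation_map


def resolve_citations(answer, citation_map):
    """Map [n] markers in an answer back to source identifiers, in order of first use."""
    sources = []
    for number in re.findall(r"\[(\d+)\]", answer):
        source = citation_map.get(int(number))
        if source is not None and source not in sources:
            sources.append(source)
    return sources


if __name__ == "__main__":
    passages = [
        Passage("handbook.md", "Refunds are accepted within 30 days."),
        Passage("faq.md", "Shipping takes 5 days."),
    ]
    prompt, citation_map = build_cited_prompt("What is the refund window?", passages)
    # The answer string stands in for a model response.
    print(resolve_citations("Refunds are accepted within 30 days [1].", citation_map))
```

Unknown citation numbers are ignored rather than raising an error, since a model may occasionally cite a source that was never provided.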

Top 10 Best RAG Frameworks on GitHub that You Can Use Now

In this article, we’ll explore ten RAG frameworks currently available on GitHub. They are worth investigating for developers, researchers, and organizations looking to implement or improve their AI-powered applications. Star counts are as reported at the time of writing.

1. Haystack by deepset-ai

GitHub Stars: 14.6k

Haystack is a powerful and flexible framework for building end-to-end question answering and search systems. It offers a modular architecture that allows developers to easily create pipelines for various NLP tasks, including document retrieval, question answering, and summarization.

Key features of Haystack include:

Support for multiple document stores (Elasticsearch, FAISS, SQL, etc.)
Integration with popular language models (BERT, RoBERTa, DPR, etc.)
Scalable architecture for processing large volumes of documents
Easy-to-use API for building custom NLP pipelines

Haystack’s versatility and extensive documentation make it an excellent choice for both beginners and experienced developers looking to implement RAG systems.
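The idea of a modular pipeline can be sketched without any library: each component is a small function, and the pipeline passes a shared dictionary from one to the next. This is a conceptual illustration only; the `Pipeline` class, `keyword_retriever`, and `prompt_builder` below are hypothetical and are not Haystack’s actual API.

```python
class Pipeline:
    """Runs named components in order, merging each component's output into shared data."""

    def __init__(self):
        self.components = []

    def add_component(self, name, component):
        self.components.append((name, component))
        return self  # allow chained calls

    def run(self, data):
        for _name, component in self.components:
            data = {**data, **component(data)}
        return data


# Toy document collection (an assumption for this sketch).
DOCS = ["Paris is the capital of France.", "Berlin is the capital of Germany."]


def keyword_retriever(data):
    """Pick the document that shares the most words with the query."""
    words = set(data["query"].lower().strip("?").split())
    best = max(DOCS, key=lambda d: len(words & set(d.lower().strip(".").split())))
    return {"documents": [best]}


def prompt_builder(data):
    """Combine the retrieved documents and the query into a prompt."""
    context = " ".join(data["documents"])
    return {"prompt": f"Context: {context}\nQuestion: {data['query']}"}


if __name__ == "__main__":
    pipeline = (
        Pipeline()
        .add_component("retriever", keyword_retriever)
        .add_component("prompt_builder", prompt_builder)
    )
    print(pipeline.run({"query": "What is the capital of France?"})["prompt"])
```

Because components only agree on dictionary keys, swapping the retriever for a different one does not require changes to the prompt builder, which is the main appeal of this design.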

GitHub (github.com/deepset-ai/haystack): LLM orchestration framework to build customizable, production-ready LLM applications.

2. RAGFlow by infiniflow

GitHub Stars: 11.6k

RAGFlow is a relatively new entrant in the RAG framework space that has quickly gained traction. Its repository describes it as an open-source RAG engine based on deep document understanding, and it aims to streamline the process of building RAG-based applications by providing a set of pre-built components and workflows.

Notable features of RAGFlow include:

Intuitive workflow design interface
Pre-configured RAG pipelines for common use cases
Integration with popular vector databases
Support for custom embedding models

RAGFlow’s user-friendly approach makes it an attractive option for developers who want to quickly prototype and deploy RAG applications without diving deep into the underlying complexities.

GitHub (github.com/infiniflow/ragflow): RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

3. txtai by neuml

GitHub Stars: 7.5k

txtai is an all-in-one embeddings database that goes beyond traditional RAG frameworks. It offers a comprehensive suite of tools for building semantic search, language model workflows, and document processing pipelines.

Key capabilities of txtai include:

Embeddings database for efficient similarity search
API for integrating language models and other AI services
Extensible architecture for custom workflows
Support for multiple languages and data types

txtai’s all-in-one approach makes it an excellent choice for organizations looking to implement a wide range of AI-powered features within a single framework.
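Similarity search over an embeddings database comes down to comparing a query vector with stored vectors. The sketch below is a toy stand-in, assuming simple term-count vectors instead of learned embeddings; the `EmbeddingsIndex` class is invented for this example and is not txtai’s API.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class EmbeddingsIndex:
    """Toy embeddings index: term-count vectors compared with cosine similarity."""

    def __init__(self, texts):
        self.texts = list(texts)
        vocabulary = sorted({token for text in self.texts for token in tokenize(text)})
        self.vocab = {token: position for position, token in enumerate(vocabulary)}
        self.vectors = [self.embed(text) for text in self.texts]

    def embed(self, text):
        """Map text to a vector of term counts; words outside the vocabulary are ignored."""
        vector = [0.0] * len(self.vocab)
        for token in tokenize(text):
            if token in self.vocab:
                vector[self.vocab[token]] += 1.0
        return vector

    @staticmethod
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query, limit=1):
        """Return the top (score, text) pairs for the query, best first."""
        query_vector = self.embed(query)
        scored = [(self.cosine(query_vector, v), t) for v, t in zip(self.vectors, self.texts)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]


if __name__ == "__main__":
    index = EmbeddingsIndex([
        "cats purr when they are happy",
        "stock markets fell on friday",
        "dogs bark at strangers",
    ])
    print(index.search("why do cats purr"))
```

Learned embeddings would also match paraphrases that share no words with the stored text, which term counts cannot do; the search loop itself stays the same.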

GitHub (github.com/neuml/txtai): 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows.

4. STORM by stanford-oval

GitHub Stars: 5k

STORM (Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking) is a research-oriented system developed at Stanford University. Rather than a general-purpose RAG framework, it is an LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations. While it may have fewer stars compared to some other projects, its academic pedigree makes it a valuable resource for researchers and developers interested in retrieval-grounded long-form generation.

Notable aspects of STORM include:

Retrieval-driven research on a topic before the report is written
Multi-perspective question asking to gather information
Outline generation followed by a full-length report with citations
Documentation and accompanying research papers

For those interested in how retrieval can ground long-form writing, STORM offers a solid foundation backed by academic rigor.

GitHub (github.com/stanford-oval/storm): An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

5. LLM-App by pathwaycom

GitHub Stars: 3.4k

LLM-App is a collection of templates and tools for building dynamic RAG applications. It stands out for its focus on real-time data synchronization and containerized deployment.

Key features of LLM-App include:

Ready-to-run Docker containers for quick deployment
Support for dynamic data sources and real-time updates
Integration with popular LLMs and vector databases
Customizable templates for various RAG use cases

LLM-App’s emphasis on operational aspects and real-time capabilities makes it an attractive option for organizations looking to deploy production-ready RAG systems.
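Keeping a RAG index in sync with changing data means the index must accept updates and deletions without a full rebuild. The sketch below shows the idea with a small in-memory inverted index; `LiveIndex` and its methods are hypothetical names for this illustration and do not reflect how LLM-App or Pathway implement synchronization.

```python
import re


class LiveIndex:
    """Inverted index that can be updated in place as source documents change."""

    def __init__(self):
        self.docs = {}      # doc_id -> current text
        self.postings = {}  # token -> set of doc_ids containing it

    @staticmethod
    def _tokens(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def upsert(self, doc_id, text):
        """Insert a new document or replace an existing one."""
        self.delete(doc_id)  # drop stale postings before indexing the new text
        self.docs[doc_id] = text
        for token in self._tokens(text):
            self.postings.setdefault(token, set()).add(doc_id)

    def delete(self, doc_id):
        """Remove a document and all postings that point to it."""
        text = self.docs.pop(doc_id, None)
        if text is None:
            return
        for token in self._tokens(text):
            doc_ids = self.postings.get(token)
            if doc_ids:
                doc_ids.discard(doc_id)
                if not doc_ids:
                    del self.postings[token]

    def search(self, query):
        """Return doc_ids ranked by the number of matching query tokens."""
        hits = {}
        for token in self._tokens(query):
            for doc_id in self.postings.get(token, ()):
                hits[doc_id] = hits.get(doc_id, 0) + 1
        return sorted(hits, key=lambda doc_id: (-hits[doc_id], doc_id))


if __name__ == "__main__":
    index = LiveIndex()
    index.upsert("pricing.md", "The basic plan costs 10 dollars")
    index.upsert("pricing.md", "The starter tier costs 12 dollars")  # source file changed
    print(index.search("starter tier"))
```

In a deployed system the `upsert` and `delete` calls would be triggered by change events from the data source, so answers reflect the latest version of each document.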

GitHub (github.com/pathwaycom/llm-app): Dynamic RAG for enterprise. Ready to run with Docker, ⚡ in sync with Sharepoint, Google Drive, S3, Kafka, PostgreSQL…

6. Cognita by truefoundry

GitHub Stars: 3k

Cognita is a newer entrant in the RAG framework space, focusing on providing a unified platform for building and deploying AI applications. While it has fewer stars compared to some other frameworks, its comprehensive approach and emphasis on MLOps principles make it worth considering.

Notable features of Cognita include:

End-to-end platform for RAG application development
Integration with popular ML frameworks and tools
Built-in monitoring and observability features
Support for model versioning and experiment tracking

Cognita’s holistic approach to AI application development makes it a compelling choice for organizations looking to streamline their entire ML lifecycle.

GitHub (github.com/truefoundry/cognita): RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by…

7. R2R by SciPhi-AI

GitHub Stars: 2.5k

R2R (short for RAG to Riches) is a RAG framework from SciPhi-AI with an emphasis on the retrieval side of the pipeline, including multi-step retrieval. While it may have fewer stars, its approach to retrieval makes it a framework to watch.

Key aspects of R2R include:

Implementation of novel retrieval algorithms
Support for multi-step retrieval processes
Integration with various embedding models and vector stores
Tools for analyzing and visualizing retrieval performance

For developers and researchers interested in pushing the boundaries of retrieval techniques, R2R offers a unique and powerful set of tools.

8. Neurite by satellitecomponent

GitHub Stars: 909

Neurite is an experimental project rather than a conventional RAG framework: its repository describes it as a fractal graph-of-thought, mind-mapping tool for AI agents, web links, notes, and code. While it has a smaller user base compared to some other projects, its focus on developer experience and rapid prototyping makes it worth exploring.

Notable features of Neurite include:

Intuitive API for building RAG pipelines
Support for multiple data sources and embedding models
Built-in caching and optimization mechanisms
Extensible architecture for custom components

Neurite’s emphasis on simplicity and flexibility makes it an attractive option for developers looking to quickly implement RAG functionality in their applications.

GitHub (github.com/satellitecomponent/Neurite): Fractal Graph-of-Thought. Experimental Mind-Mapping for Ai-Agents, Web-Links, Notes, and Code.

9. FlashRAG by RUC-NLPIR

GitHub Stars: 905

FlashRAG is a Python toolkit for efficient RAG research developed by the Natural Language Processing & Information Retrieval Lab at Renmin University of China. While it may have fewer stars, its focus on performance and efficiency makes it a noteworthy contender.

Key aspects of FlashRAG include:

Optimized retrieval algorithms for improved speed
Support for distributed processing and scaling
Integration with popular language models and vector stores
Tools for benchmarking and performance analysis

For applications where speed and efficiency are critical, FlashRAG offers a specialized set of tools and optimizations.
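Benchmarking a retriever usually means comparing what it returned against a set of known relevant documents. Here is a minimal sketch of two common metrics, recall@k and mean reciprocal rank (MRR). The function names and the sample data are assumptions for illustration and are not taken from FlashRAG.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    relevant_set = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant_set:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k=3):
    """Average both metrics over (retrieved_ids, relevant_ids) pairs, one per query."""
    count = len(runs)
    return {
        f"recall@{k}": sum(recall_at_k(r, g, k) for r, g in runs) / count,
        "mrr": sum(reciprocal_rank(r, g) for r, g in runs) / count,
    }


if __name__ == "__main__":
    # Hypothetical retrieval results for two queries.
    runs = [
        (["d1", "d2", "d3"], ["d2"]),
        (["d4", "d5", "d6"], ["d9", "d4"]),
    ]
    print(evaluate(runs, k=2))
```

Recall@k rewards finding relevant documents anywhere in the top k, while MRR rewards ranking the first relevant document as high as possible, so the two are often reported together.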

GitHub (github.com/RUC-NLPIR/FlashRAG): ⚡ FlashRAG: A Python Toolkit for Efficient RAG Research

10. Canopy by pinecone-io

GitHub Stars: 923

Canopy is a RAG framework developed by Pinecone, a company known for its vector database technology. It leverages Pinecone’s expertise in efficient vector search to provide a powerful and scalable RAG solution.

Notable features of Canopy include:

Tight integration with Pinecone’s vector database
Support for streaming and real-time updates
Advanced query processing and reranking capabilities
Tools for managing and versioning knowledge bases

Canopy’s focus on scalability and integration with Pinecone’s ecosystem makes it an excellent choice for organizations already using or considering Pinecone for their vector search needs.
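Reranking is typically a two-stage process: a cheap first stage collects candidates, and a more precise second stage reorders them. The sketch below assumes word overlap for the first stage and word-order (bigram) overlap for the second; real systems often use vector search followed by a neural reranker. The function names are invented for this example and are not Canopy’s API.

```python
import re


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(sequence):
    """Set of adjacent token pairs, used as a rough signal for word order."""
    return set(zip(sequence, sequence[1:]))


def first_stage(query, docs, k=2):
    """Cheap recall step: rank by the number of shared unique tokens."""
    query_tokens = set(tokens(query))
    ranked = sorted(docs, key=lambda d: len(query_tokens & set(tokens(d))), reverse=True)
    return ranked[:k]


def rerank(query, candidates):
    """Finer step: prefer candidates that preserve the query's word order."""
    query_bigrams = bigrams(tokens(query))
    return sorted(
        candidates,
        key=lambda d: len(query_bigrams & bigrams(tokens(d))),
        reverse=True,
    )


if __name__ == "__main__":
    docs = [
        "pizza in york is new to me",
        "best new york pizza places",
        "london fish and chips",
    ]
    query = "new york pizza"
    candidates = first_stage(query, docs, k=2)
    print(rerank(query, candidates))
```

Both candidate documents share the same three words with the query, so the first stage cannot separate them; the second stage moves the document that keeps "new york pizza" in order to the top.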

GitHub (github.com/pinecone-io/canopy): Retrieval Augmented Generation (RAG) framework and context engine powered by Pinecone.

Conclusion

The world of RAG frameworks is diverse and rapidly evolving, with each of the ten frameworks we’ve explored offering unique strengths and capabilities. From the comprehensive and well-established Haystack to emerging specialized frameworks like FlashRAG and R2R, there’s a solution to fit a wide range of needs and use cases.

When choosing a RAG framework, consider factors such as:

The specific requirements of your project
The level of customization and flexibility you need
The scalability and performance characteristics of the framework
The size and activity of the community around the framework
The quality of documentation and support available

By carefully evaluating these factors and experimenting with different frameworks, you can find the RAG solution that best fits your needs and helps you build more intelligent, context-aware AI applications.

As the field of AI continues to advance, we can expect these frameworks to evolve and new ones to emerge. Staying informed about the latest developments in RAG technology will be crucial for developers and organizations looking to leverage the power of AI in their applications and services.

Written by Sebastian Petrus
