Pinecone? Milvus? PgVector Is 70% Faster and Cheaper and Open Source

10 min read · Sep 4, 2024

Isn’t it strange that proprietary vector databases can perform much worse than free, open-source alternatives?

PgVector vs Pinecone vs Milvus Vector Databases

While specialized solutions like Pinecone have gained popularity, recent developments in the PostgreSQL ecosystem are challenging their dominance.

Enter PgVector, powered by the new open-source extension PgVectorScale. This combination isn’t just matching Pinecone’s performance — it’s surpassing it, and at a fraction of the cost. Let’s dive deep into the technical details of how this is possible and what it means for developers and organizations alike.


The Vector Database Landscape: A Quick Overview

Vector databases are specialized systems designed to store and query vector embeddings, which are numerical representations of data points in high-dimensional spaces.

These embeddings are commonly used in various AI applications, including natural language processing, image recognition, and recommendation systems.
Traditionally, companies and start-ups have preferred specialized solutions like Pinecone, which have dominated this space by offering purpose-built platforms for vector storage and retrieval. However, these solutions often come with a hefty price tag and the overhead of managing a separate system.
This is where PostgreSQL, with its PgVector extension and the new PgVectorScale addition, is turning heads.

What Is PgVector, the Open-Source Alternative to Pinecone and Milvus?

PostgreSQL, Which the PgVector Extension Is Based on, Is Already the Most Popular Database

PgVector is an extension for PostgreSQL that adds vector similarity search capabilities to the popular open-source relational database.

It allows users to store vector embeddings directly in PostgreSQL tables and perform efficient nearest neighbor searches using specialized indexing techniques.

Its key components are:

Vector data type: A custom data type to represent high-dimensional vectors
Similarity search operators: Functions to compute distances between vectors
Indexing methods: Specialized indexes for efficient similarity searches
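To make the similarity search operators concrete, here is a minimal Python sketch of the three distances that pgvector exposes as operators: `<->` (Euclidean), `<#>` (negative inner product), and `<=>` (cosine distance). The function names are illustrative and are not part of pgvector itself.

```python
import math

def l2_distance(a, b):
    """Euclidean distance, the quantity pgvector's <-> operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """Negative dot product, the quantity pgvector's <#> operator returns."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

a, b = [3.0, 4.0], [4.0, 3.0]
print(l2_distance(a, b), negative_inner_product(a, b), cosine_distance(a, b))
```

Smaller values mean "closer" for all three, which is why an ORDER BY on any of these operators returns the nearest neighbors first.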

PgVectorScale: The Game Changer

PgVectorScale is an open-source extension developed by Timescale that supercharges PgVector’s capabilities. It’s not just an incremental improvement — it’s a leap forward that addresses some of the key limitations of vanilla PgVector. Let’s break down what makes PgVectorScale so special:

1. StreamingDiskANN Index

The cornerstone of PgVectorScale’s performance improvements is the StreamingDiskANN index. This novel index type is inspired by Microsoft’s DiskANN algorithm and offers several key advantages:

Disk-Based Storage

Unlike in-memory indexes like HNSW (Hierarchical Navigable Small World), StreamingDiskANN can store part of the index on disk. This dramatically reduces the cost of storing and searching large vector datasets, as SSDs are much cheaper than RAM.

Low-Latency, High-Throughput Search

Despite being disk-based, the index is optimized for fast retrieval, maintaining low query latencies even for large datasets.

Cost-Efficient Scaling

As vector workloads grow, StreamingDiskANN allows for more cost-effective scaling compared to purely in-memory solutions.

Streaming Model

In contrast to HNSW, which requires retrieving a fixed number of candidates before applying filters, StreamingDiskANN employs a streaming model. This allows for continuous retrieval of the next closest item, ensuring accurate results even with complex filtering conditions.
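The difference between the two filtering models can be sketched in a few lines of Python. This is a toy illustration, not the extension's code: one-dimensional values stand in for vectors, a full sort stands in for the index, and all function names are hypothetical. A real index would walk its graph lazily rather than sort everything up front.

```python
import heapq

def fixed_k_then_filter(items, query, k, predicate):
    """Take the k nearest items first, then filter: may return fewer than k."""
    nearest = heapq.nsmallest(k, items, key=lambda it: abs(it["value"] - query))
    return [it for it in nearest if predicate(it)]

def stream_nearest(items, query):
    """Yield items in increasing distance order, one at a time."""
    for it in sorted(items, key=lambda it: abs(it["value"] - query)):
        yield it

def streaming_filter(items, query, k, predicate):
    """Keep pulling the next-closest item until k items pass the filter."""
    results = []
    for it in stream_nearest(items, query):
        if predicate(it):
            results.append(it)
            if len(results) == k:
                break
    return results

def is_even(it):
    return it["tag"] == "even"

items = [{"id": i, "value": float(i), "tag": "even" if i % 2 == 0 else "odd"}
         for i in range(20)]
print(len(fixed_k_then_filter(items, 0.0, 4, is_even)))  # 2: half the candidates were filtered out
print(len(streaming_filter(items, 0.0, 4, is_even)))     # 4: the stream continues until k matches
```

With a fixed candidate count, a selective filter silently shrinks the result set. The streaming version trades a little extra traversal for a complete answer.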

2. Statistical Binary Quantization (SBQ)

PgVectorScale implements a new compression technique called Statistical Binary Quantization:

Improved Accuracy: SBQ offers better accuracy-performance trade-offs compared to standard binary quantization methods.
Efficient Storage: By compressing vectors more effectively, SBQ reduces storage requirements without significantly impacting search quality.
Faster Search: The compressed representations allow for quicker similarity computations during search operations.
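The intuition behind the accuracy gain can be shown with a small Python sketch. It assumes a simplified one-bit scheme in which each dimension is thresholded at a mean learned from the data, compared with standard binary quantization that thresholds at zero. The helper names are hypothetical, and the extension's real implementation differs in detail.

```python
def learn_means(vectors):
    """Per-dimension mean over a sample of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def quantize(vector, thresholds):
    """One bit per dimension: 1 if the value exceeds that dimension's threshold."""
    return [1 if x > t else 0 for x, t in zip(vector, thresholds)]

def hamming(a, b):
    """Number of differing bits, a cheap proxy for distance between codes."""
    return sum(x != y for x, y in zip(a, b))

vectors = [[0.9, 0.1, 0.5], [0.8, 0.4, 0.3], [0.1, 0.3, 0.9]]
zero_cut = [0.0, 0.0, 0.0]        # standard binary quantization
mean_cut = learn_means(vectors)   # data-dependent thresholds

zero_codes = [quantize(v, zero_cut) for v in vectors]
mean_codes = [quantize(v, mean_cut) for v in vectors]
print(zero_codes)  # every vector collapses to the same code
print(mean_codes)  # the codes now tell the vectors apart
```

When all values in a dimension share the same sign, as is common with real embeddings, a zero threshold wastes that bit. A learned threshold splits the data and keeps each bit informative.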

3. Massive Parallelization

One of the standout features of PgVectorScale is its ability to parallelize vector similarity searches across multiple CPU cores. This isn’t just a small optimization — we’re talking about potentially utilizing dozens or even hundreds of cores simultaneously. Here’s how it works:

Partitioning: PgVectorScale automatically partitions your vector data across multiple tables.
Concurrent Searches: When a query comes in, it’s split into sub-queries that run concurrently across these partitions.
Result Aggregation: The partial results are then efficiently combined to produce the final output.

The beauty of this approach is that it scales almost linearly with the number of CPU cores available. In practical terms, this means that as you throw more hardware at the problem, you get proportionally better performance.

4. Intelligent Query Planning

PgVectorScale doesn’t just blindly parallelize everything. It comes with a smart query planner that decides when and how to parallelize based on the specifics of each query. This intelligence is crucial because not all queries benefit equally from parallelization. The planner considers factors like:

The size of the dataset
The number of results requested
The available system resources

By making these decisions on the fly, PgVectorScale ensures that you’re always getting the most bang for your buck in terms of performance.

5. Adaptive Batch Processing

One of the clever tricks up PgVectorScale’s sleeve is its adaptive batch processing mechanism. Instead of processing vectors one at a time, it groups them into batches. The size of these batches is dynamically adjusted based on the current system load and the characteristics of the vectors being processed. This approach allows for better utilization of SIMD (Single Instruction, Multiple Data) operations and more efficient use of CPU caches.

Technical Deep Dive: How PgVectorScale Achieves Its Performance

Now that we’ve covered the high-level features, let’s roll up our sleeves and look at some code and technical details that make PgVectorScale tick.

Partitioning Strategy

The partitioning step relies on “hash partitioning”, a standard PostgreSQL feature, to distribute vectors across multiple tables. Here’s a simplified example of how this might look:

CREATE TABLE vectors (
  id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  embedding VECTOR(1536)
) PARTITION BY HASH (id);

CREATE TABLE vectors_0 PARTITION OF vectors
  FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE vectors_1 PARTITION OF vectors
  FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE vectors_2 PARTITION OF vectors
  FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE vectors_3 PARTITION OF vectors
  FOR VALUES WITH (MODULUS 4, REMAINDER 3);

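This setup creates four partitions, each containing roughly a quarter of the data. Each row is routed to the partition whose REMAINDER matches the hash of its id modulo 4, and because the hash values are well spread, the rows end up distributed roughly evenly.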
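To see why hashing with a modulus of 4 yields a near-even split, here is a small Python simulation of the routing rule. SHA-256 is used purely as a stand-in, since PostgreSQL applies its own internal hash function, and `partition_for` is a hypothetical helper.

```python
import hashlib
from collections import Counter

MODULUS = 4

def partition_for(row_id, modulus=MODULUS):
    """Route a row to a partition: hash the id, then take the remainder.
    SHA-256 is a stand-in here; PostgreSQL uses its own hash function."""
    digest = hashlib.sha256(str(row_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus

counts = Counter(partition_for(i) for i in range(10_000))
for remainder in range(MODULUS):
    print(f"vectors_{remainder}: {counts[remainder]} rows")
```

Each partition should receive close to 2,500 of the 10,000 ids, and the same id always lands in the same partition.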

Parallel Query Execution

When a similarity search query comes in, PgVectorScale generates a plan that looks something like this:

-- query_vector is a placeholder for the vector being searched for
SELECT id, embedding, distance
FROM (
  SELECT id, embedding, embedding <=> query_vector AS distance
  FROM vectors_0
  UNION ALL
  SELECT id, embedding, embedding <=> query_vector AS distance
  FROM vectors_1
  UNION ALL
  SELECT id, embedding, embedding <=> query_vector AS distance
  FROM vectors_2
  UNION ALL
  SELECT id, embedding, embedding <=> query_vector AS distance
  FROM vectors_3
) AS candidates  -- a subquery alias is required before PostgreSQL 16
ORDER BY distance
LIMIT 10;

Each subquery within the UNION ALL can be executed in parallel on a different CPU core. The results are then merged and sorted to produce the final output.
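The search-then-merge pattern described here can be sketched in Python. This is an illustration of the structure only: the partitions are in-memory lists, the function names are hypothetical, and Python threads will not deliver true CPU parallelism for this workload the way separate database workers can.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k_in_partition(partition, query, k):
    """Local top-k: (distance, id) pairs sorted by increasing distance."""
    scored = ((l2(vec, query), row_id) for row_id, vec in partition)
    return heapq.nsmallest(k, scored)

def parallel_top_k(partitions, query, k):
    """Search every partition concurrently, then merge the sorted partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: top_k_in_partition(p, query, k), partitions))
    return list(heapq.merge(*partials))[:k]

partitions = [
    [(0, [0.0, 0.0]), (4, [4.0, 4.0])],
    [(1, [1.0, 1.0]), (5, [5.0, 5.0])],
    [(2, [2.0, 2.0]), (6, [6.0, 6.0])],
    [(3, [3.0, 3.0]), (7, [7.0, 7.0])],
]
print([row_id for _, row_id in parallel_top_k(partitions, [0.0, 0.0], 3)])
```

Because each partition returns at most k candidates, the final merge only has to look at k times the number of partitions, no matter how large the table is.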

SIMD Optimizations

PgVectorScale leverages SIMD instructions to speed up vector operations. For example, when computing the Euclidean distance between two vectors, it might use AVX-512 instructions to process 16 float values simultaneously:

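#include <immintrin.h>
#include <math.h>

/* Euclidean (L2) distance; needs a CPU and compiler flags with AVX-512F support. */
float euclidean_distance_avx512(const float* a, const float* b, int n) {
    __m512 sum = _mm512_setzero_ps();
    int i = 0;
    /* Main loop: accumulate squared differences, 16 floats per iteration. */
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        __m512 diff = _mm512_sub_ps(va, vb);
        sum = _mm512_fmadd_ps(diff, diff, sum);
    }
    float total = _mm512_reduce_add_ps(sum);
    /* Scalar tail for lengths that are not a multiple of 16. */
    for (; i < n; i++) {
        float d = a[i] - b[i];
        total += d * d;
    }
    /* The square root turns the sum of squares into a distance. */
    return sqrtf(total);
}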

This kind of low-level optimization allows PgVectorScale to squeeze every ounce of performance out of modern CPUs.

Performance Comparison: PgVector + PgVectorScale vs. Pinecone

To understand why PgVector with PgVectorScale outperforms Pinecone, let’s examine various aspects of their performance characteristics.

Benchmarking Results

The team at Timescale ran extensive benchmarks comparing PgVectorScale to Pinecone, and the results are eye-opening. Here’s a breakdown of their findings:

Setup

Dataset: 50 million Cohere embeddings, each with 768 dimensions
Hardware: 96 vCPU machine with 384 GB of RAM
Queries: k-NN search with k=10

Results

Query Throughput:

PgVectorScale: 1,200 queries per second
Pinecone (s1 index): 300 queries per second

PgVectorScale outperformed Pinecone’s storage-optimized index by a factor of 4 in terms of raw query throughput.

Latency (95th percentile):

PgVectorScale: 12ms
Pinecone (s1 index): 40ms

Not only was PgVectorScale faster on average, but it also provided more consistent performance with lower latency at the 95th percentile.

Recall:

Both achieved 99% recall for approximate nearest neighbor queries
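For readers new to the metric, recall compares the ids returned by the approximate index against the true nearest neighbors from an exact scan. A minimal Python sketch, with made-up ids and a hypothetical helper name:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search returned."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)

exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # ground truth from an exact scan
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]   # hypothetical ANN result with one miss
print(recall_at_k(approx, exact))
```

A recall of 99% with k=10 therefore means that, averaged over many queries, about one true neighbor is missed for every ten queries.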

Cost Efficiency:

PgVectorScale: $835 per month (self-hosted on AWS EC2)
Pinecone (s1 index): $3,241 per month
Pinecone (p2 index): $3,889 per month

PgVectorScale achieved this superior performance at roughly a quarter (about 26%) of the cost of Pinecone’s storage-optimized index and about a fifth (about 21%) of the cost of their performance-optimized index.
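The savings figures follow directly from the monthly prices listed above. A quick Python check of the arithmetic (the dictionary keys are just labels):

```python
monthly_cost = {"pgvectorscale": 835, "pinecone_s1": 3241, "pinecone_p2": 3889}

def savings_pct(ours, theirs):
    """Percentage saved by paying `ours` instead of `theirs`."""
    return round(100 * (1 - ours / theirs), 1)

for name in ("pinecone_s1", "pinecone_p2"):
    print(name, savings_pct(monthly_cost["pgvectorscale"], monthly_cost[name]))
```

This gives savings of about 74% against the s1 index and about 79% against the p2 index.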

Even when compared to Pinecone’s performance-optimized p2 index, PgVectorScale came out ahead:

1.4x lower p95 latency
1.5x higher query throughput
Both measured at 90% recall (note: Pinecone doesn’t allow tuning for higher recall on p2)

Analyzing the Results:

These benchmarks reveal several key insights:

Scalability: PgVectorScale’s ability to leverage multiple CPU cores allows it to scale much more effectively than Pinecone as you throw more hardware at the problem.
Consistency: The lower 95th percentile latency suggests that PgVectorScale provides more predictable performance, which is crucial for many real-time applications.
Cost-Effectiveness: The 75–79% cost savings is a game-changer, especially for organizations dealing with large-scale vector search operations.
Flexibility: PgVectorScale allows for fine-tuning of index parameters, enabling users to optimize for their specific use cases. Pinecone, in contrast, offers less control over these aspects.

Why PgVector + PgVectorScale Outperforms Pinecone

Efficient Disk-Based Storage: The StreamingDiskANN index allows PgVectorScale to efficiently store and query vectors on SSDs, which are much cheaper than RAM. This gives it a significant cost advantage over in-memory solutions like Pinecone.
Advanced Parallelization: PgVectorScale’s ability to parallelize queries across multiple CPU cores allows it to make better use of modern hardware, resulting in higher throughput and lower latency.
Intelligent Query Planning: The smart query planner in PgVectorScale can optimize query execution based on the specific characteristics of each query, leading to more consistent performance across different types of searches.
Statistical Binary Quantization: This compression technique allows PgVectorScale to store vectors more efficiently without significantly impacting search quality, contributing to both performance and cost-effectiveness.
Integration with PostgreSQL: By leveraging PostgreSQL’s mature ecosystem, PgVectorScale can take advantage of features like efficient data partitioning, which contributes to its scalability.
Streaming Post-Filtering: Unlike HNSW-based solutions (including Pinecone), PgVectorScale’s StreamingDiskANN index allows for accurate retrieval even when secondary filters are applied, addressing a limitation that Pinecone had previously highlighted when comparing itself to pgvector.

Implementation and Usage

Using PgVector with PgVectorScale is straightforward, especially for developers already familiar with PostgreSQL. Here’s a quick guide:

Step 1. Install the extensions:

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS vectorscale;

Step 2. Create a table with a vector column:

CREATE TABLE IF NOT EXISTS document_embedding (
  id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  metadata JSONB,
  contents TEXT,
  embedding VECTOR(1536)
);

Step 3. Create a StreamingDiskANN index:

CREATE INDEX document_embedding_idx ON document_embedding
USING diskann (embedding);

Step 4. Perform a similarity search:

SELECT *
FROM document_embedding
ORDER BY embedding <=> '[1.0,2.0,3.0,...]'::vector
LIMIT 10;
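From application code, the query vector is usually passed as a parameter rather than pasted into the SQL. The sketch below shows one way to build the text literal and a parameterized version of the query above. It assumes a driver that uses `%s` placeholders (psycopg does), and `to_vector_literal` is a hypothetical helper. No database connection is opened here.

```python
def to_vector_literal(values):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# Parameterized form of the similarity search; %s is the driver's placeholder style.
SEARCH_SQL = """
SELECT *
FROM document_embedding
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

params = (to_vector_literal([1, 2.5, 3]), 10)
print(params)
```

With a live connection, the pair would be handed to the driver's execute call, for example `cursor.execute(SEARCH_SQL, params)`.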

Tuning Options

PgVectorScale offers various parameters to optimize performance for specific workloads:

Index Build-Time Parameters

CREATE INDEX document_embedding_idx ON document_embedding
USING diskann (embedding)
WITH (
  num_neighbors = 50,
  search_list_size = 100,
  max_alpha = 1.2,
  num_dimensions = 0,
  num_bits_per_dimension = 2
);

Query-Time Parameters

SET diskann.query_rescore = 400;
-- Or for a single transaction:
BEGIN;
SET LOCAL diskann.query_search_list_size = 10;
-- Your query here
COMMIT;

Why Not Use Pinecone or Other Specialized Vector Databases?

Cost: PgVector with PgVectorScale offers comparable or better performance at a fraction of the cost (75–79% cheaper than Pinecone in benchmarks).
Vendor Lock-in: Specialized solutions like Pinecone can lead to dependency on a single vendor. PostgreSQL offers more flexibility and portability.
Ecosystem Integration: For applications already using PostgreSQL, adopting PgVector is much simpler than integrating a separate vector database.
Operational Complexity: Managing a separate system for vector search adds operational overhead. PgVector allows consolidation of vector and relational data in one system.
Feature Set: While specialized databases might offer some unique features, PostgreSQL’s rich ecosystem often provides equivalent or superior functionality.
Maturity: PostgreSQL is a battle-tested database with decades of development. Specialized vector databases are relatively new and may lack some advanced features and optimizations.
Flexibility: PostgreSQL allows for complex queries that combine vector searches with traditional relational operations, something that’s often challenging with specialized vector databases.

Conclusion

The introduction of PgVectorScale marks a significant milestone in the evolution of vector databases. By leveraging the robust foundation of PostgreSQL and introducing clever optimizations for vector operations, it offers a compelling alternative to specialized solutions like Pinecone.

The combination of superior performance, dramatic cost savings, and the flexibility of a full-featured relational database makes PgVector with PgVectorScale an attractive option for a wide range of vector search applications. Whether you’re building a recommendation engine, a semantic search system, or tackling complex scientific data analysis, this PostgreSQL-based solution provides the tools to do it faster and more efficiently.

As with any technology choice, the decision to adopt PgVector and PgVectorScale should be based on a careful evaluation of your specific needs and constraints. However, for many organizations, the potential for 4x performance improvements and 75% cost savings will be too compelling to ignore.