Vector Engine

High-performance similarity search powered by FAISS

Overview

The Vector Engine in ShibuDB provides high-performance similarity search capabilities powered by FAISS (Facebook AI Similarity Search). It enables efficient storage and retrieval of high-dimensional vectors for applications like recommendation systems, image search, natural language processing, and machine learning.

Key Features

  • Multiple Index Types: Support for Flat, HNSW, IVF, and PQ indexes
  • Various Distance Metrics: L2, Inner Product, L1, and Cosine
  • High Performance: Optimized for large-scale vector operations
  • Automatic Training: Indexes that need training (such as IVF and PQ) are trained automatically
  • Batch Operations: Efficient bulk vector insertion
  • Real-time Search: Fast similarity search with configurable parameters

Architecture

Vector Engine Architecture
┌─────────────────────────────────────┐
│           Vector Space              │
├─────────────────────────────────────┤
│  FAISS Index (similarity search)    │
├─────────────────────────────────────┤
│  In-Memory Buffer (batch ops)       │
├─────────────────────────────────────┤
│  Write-Ahead Log (durability)       │
├─────────────────────────────────────┤
│  Data Files (persistent storage)    │
└─────────────────────────────────────┘

Vector Space Management

Vector data is organized in spaces with specific index types and distance metrics.

Creating a Vector Space

Create Vector Space
# Create a basic vector space (128 dimensions, Flat index, L2 metric)
CREATE-SPACE embeddings --engine vector --dimension 128

# Create with specific index type
CREATE-SPACE image_vectors --engine vector --dimension 512 --index-type HNSW32 --metric L2

# Create with custom parameters
CREATE-SPACE text_embeddings --engine vector --dimension 768 --index-type IVF32 --metric InnerProduct

Parameters:

  • --engine vector: Specifies vector engine type
  • --dimension N: Vector dimension (required for vector spaces)
  • --index-type TYPE: FAISS index type (default: Flat)
  • --metric METRIC: Distance metric (default: L2)
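
When spaces are created from a script, the parameters above can be assembled into a command line programmatically. The following is a minimal Python sketch, not part of ShibuDB: `create_space_command` is a hypothetical helper, and it uses only the flags and defaults listed on this page.

```python
# Hypothetical helper (not a ShibuDB API): builds a CREATE-SPACE command
# string from the parameters documented above.
def create_space_command(name, dimension, index_type="Flat", metric="L2"):
    """Return a CREATE-SPACE command line for a vector space."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return (
        f"CREATE-SPACE {name} --engine vector --dimension {dimension} "
        f"--index-type {index_type} --metric {metric}"
    )


# Defaults mirror the documented ones: Flat index, L2 metric.
print(create_space_command("embeddings", 128))
print(create_space_command("image_vectors", 512, index_type="HNSW32"))
```

Spelling out the defaults in the generated command keeps scripts readable even though the flags could be omitted.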

Supported Index Types

Index Types Comparison
| Index Type | Description | Use Case | Memory | Speed |
|------------|-------------|----------|--------|-------|
| Flat       | Exact search | Small datasets, high accuracy | High | Slow |
| HNSW32     | Approximate search | Fast similarity search | Medium | Fast |
| IVF32      | Inverted file index | Large datasets | Low | Medium |
| PQ4        | Product quantization | Very large datasets | Very Low | Fast |

Using a Vector Space

Switch Space
# Switch to vector space
USE embeddings

# Verify current space (prompt will show current space)
[embeddings]>

FAISS Index Types

Different index types for various use cases and performance requirements.

1. Flat Index (Exact Search)

Best for: Small datasets (< 1M vectors), high accuracy requirements

Flat Index
# Create flat index
CREATE-SPACE exact_search --engine vector --dimension 8 --index-type Flat --metric L2
USE exact_search

# Insert vectors
INSERT-VECTOR 1 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0
INSERT-VECTOR 2 1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1
INSERT-VECTOR 3 9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0
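
Conceptually, an exact search compares the query against every stored vector and keeps the closest ones. The Python sketch below illustrates that idea with the three vectors from the example above; it is an illustration only, and `exact_top_k` is a hypothetical name, not the engine's implementation.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(vectors, query, k):
    """Return the k (id, distance) pairs closest to query, nearest first."""
    scored = ((l2_distance(vec, query), vec_id) for vec_id, vec in vectors.items())
    return [(vec_id, dist) for dist, vec_id in heapq.nsmallest(k, scored)]


vectors = {
    1: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    2: [1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1],
    3: [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0],
}
results = exact_top_k(vectors, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], k=2)
for vec_id, dist in results:
    print(vec_id, round(dist, 4))
```

Because every vector is scanned, the cost grows linearly with the number of vectors, which is why exact search suits smaller datasets.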

2. HNSW Index (Approximate Search)

Best for: Fast similarity search with good accuracy

HNSW Index
# Create HNSW index
CREATE-SPACE fast_search --engine vector --dimension 8 --index-type HNSW32 --metric L2
USE fast_search

# Insert vectors
INSERT-VECTOR 1 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0
INSERT-VECTOR 2 1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1

# Search with HNSW
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 5

3. IVF Index (Inverted File)

Best for: Large datasets with balanced performance

IVF Index
# Create IVF index
CREATE-SPACE large_dataset --engine vector --dimension 8 --index-type IVF32 --metric L2
USE large_dataset

# Insert many vectors
INSERT-VECTOR 1 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0
INSERT-VECTOR 2 1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1
# ... insert more vectors (an IVF index needs enough data to train its clusters)

# Search with IVF
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 10

Basic Vector Operations

Core operations for managing vector data.

INSERT-VECTOR - Add Vectors

Insert Vectors
# Insert single vector
INSERT-VECTOR 1 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0

# Insert multiple vectors
INSERT-VECTOR 2 1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1
INSERT-VECTOR 3 0.5,1.5,2.5,3.5,4.5,5.5,6.5,7.5

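# The number of components must equal the space's --dimension.
# In a space created with --dimension 8, this 10-component vector does not fit and should be rejected:
INSERT-VECTOR 4 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0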
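
When vectors come from application code, it can help to check their length before sending them, since a FAISS-backed space has a fixed dimension. The sketch below is a hypothetical client-side helper in Python (`insert_vector_command` is not a ShibuDB function) that formats `INSERT-VECTOR` lines and refuses vectors of the wrong length.

```python
# Hypothetical client-side helper: formats INSERT-VECTOR lines as shown in
# this section and validates the vector length against the space dimension.
def insert_vector_command(vector_id, vector, dimension):
    """Format one INSERT-VECTOR line, refusing vectors of the wrong length."""
    if len(vector) != dimension:
        raise ValueError(
            f"vector {vector_id} has {len(vector)} components, expected {dimension}"
        )
    components = ",".join(str(float(x)) for x in vector)
    return f"INSERT-VECTOR {vector_id} {components}"


batch = {
    1: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    2: [1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1],
}
commands = [insert_vector_command(i, v, dimension=8) for i, v in batch.items()]
print("\n".join(commands))
```

Generating all lines first and sending them together is one simple way to approximate bulk insertion from a script.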

GET-VECTOR - Retrieve Vectors

Get Vectors
# Get vector by ID
GET-VECTOR 1

# Get multiple vectors
GET-VECTOR 1 2 3

# Check if vector exists
EXISTS-VECTOR 1

DELETE-VECTOR - Remove Vectors

Delete Vectors
# Delete single vector
DELETE-VECTOR 1

# Delete multiple vectors
DELETE-VECTOR 2 3 4

Utility Operations

Utility Commands
# Count vectors in space
COUNT-VECTORS

# List all vector IDs
LIST-VECTORS

# Get space information
INFO-SPACE

Search Operations

Advanced search capabilities for finding similar vectors.

SEARCH-TOPK - Top-K Similarity Search

Top-K Search
# Search for top 5 similar vectors
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 5

# Search with different query vector
SEARCH-TOPK 2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0 10

# Search an IVF index, visiting 10 clusters per query
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 5 --nprobe 10

RANGE-SEARCH - Range-Based Search

Range Search
# Find vectors within distance threshold
RANGE-SEARCH 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 0.5

# Range search on an IVF index, visiting 20 clusters per query
RANGE-SEARCH 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 1.0 --nprobe 20
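
Unlike a top-K search, a range search returns however many vectors fall within the threshold, which may be none. The Python sketch below illustrates the idea with plain Euclidean distance. It assumes the threshold is inclusive and unsquared, which this page does not specify, and `range_search` is a hypothetical function name.

```python
import math


def range_search(vectors, query, threshold):
    """Return (id, distance) pairs within threshold of query, nearest first."""
    hits = []
    for vec_id, vec in vectors.items():
        dist = math.dist(vec, query)  # Euclidean (L2) distance
        if dist <= threshold:  # assumption: inclusive, unsquared threshold
            hits.append((vec_id, dist))
    return sorted(hits, key=lambda hit: hit[1])


vectors = {
    1: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    2: [1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1],
    3: [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5],
}
query = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
hits_tight = range_search(vectors, query, 0.5)
hits_loose = range_search(vectors, query, 1.5)
print([vec_id for vec_id, _ in hits_tight])
print([vec_id for vec_id, _ in hits_loose])
```

Vector 3 lies about 1.41 away from the query, so it appears only once the threshold is widened.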

Search Parameters

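  • nprobe: Number of clusters to visit (for IVF indexes); higher values improve recall at the cost of speed
  • efSearch: Size of the candidate list explored during search (for HNSW indexes); higher values improve recall at the cost of speed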
  • k: Number of results to return
  • threshold: Distance threshold for range search
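
To see why `nprobe` trades speed for accuracy, the Python sketch below models an inverted-file search in two dimensions. It is a toy illustration, not the engine's code: the centroids are hand-picked rather than learned during training, and `build_ivf` and `ivf_search` are hypothetical names.

```python
import math


def build_ivf(vectors, centroids):
    """Assign each vector ID to the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))
        lists[nearest].append(vec_id)
    return lists


def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to query."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    candidates = [vec_id for i in order[:nprobe] for vec_id in lists[i]]
    candidates.sort(key=lambda vec_id: math.dist(vectors[vec_id], query))
    return candidates[:k]


centroids = [[0.0, 0.0], [10.0, 0.0]]  # hand-picked for illustration
vectors = {1: [4.0, 0.0], 2: [9.0, 0.0], 3: [0.0, 0.0]}
lists = build_ivf(vectors, centroids)

query = [5.5, 0.0]
one_cluster = ivf_search(vectors, centroids, lists, query, k=1, nprobe=1)
two_clusters = ivf_search(vectors, centroids, lists, query, k=1, nprobe=2)
print(one_cluster, two_clusters)
```

With `nprobe=1` only the cluster around the second centroid is scanned, so vector 2 is returned even though vector 1 is closer to the query. Visiting both clusters finds the true nearest neighbor at the cost of scanning more vectors.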

Distance Metrics

Different distance metrics for measuring vector similarity.

Supported Metrics

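  • L2 (Euclidean): Straight-line distance; smaller values mean more similar
  • InnerProduct: Dot product; larger values mean more similar, and it equals cosine similarity when vectors are normalized to unit length
  • L1 (Manhattan): Sum of absolute component differences; smaller values mean more similar
  • Cosine: Cosine similarity, which compares direction only and ignores vector length; larger values mean more similar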
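
For reference, the four metrics can be written out in a few lines of plain Python. This is a sketch of the standard mathematical definitions, not of how the engine computes them internally.

```python
import math


def l2(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1(a, b):
    """Manhattan distance: smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def inner_product(a, b):
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by both vector lengths: larger means more similar."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


a, b = [1.0, 0.0], [0.0, 1.0]  # perpendicular unit vectors
print(l2(a, b), l1(a, b), inner_product(a, b), cosine_similarity(a, b))
```

Note that the metrics disagree on direction: for L2 and L1 the best match has the smallest value, while for inner product and cosine it has the largest.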

Choosing the Right Metric

Metric Selection
# For general similarity (L2)
CREATE-SPACE general --engine vector --dimension 128 --metric L2

# For normalized embeddings (InnerProduct)
CREATE-SPACE embeddings --engine vector --dimension 768 --metric InnerProduct

# For sparse vectors (L1)
CREATE-SPACE sparse --engine vector --dimension 256 --metric L1

Performance Optimization

Tips for optimizing vector search performance.

Performance Tips

  • Choose Right Index: Flat for small datasets, HNSW for speed, IVF for large datasets
  • Batch Operations: Use batch insert for multiple vectors
  • Optimize Dimensions: Reduce vector dimensions when possible
  • Tune Parameters: Adjust nprobe, efSearch based on your use case
  • Memory Management: Monitor memory usage for large indexes

Memory Usage

Different index types have different memory requirements:

  • Flat: High memory, exact search
  • HNSW: Medium memory, fast approximate search
  • IVF: Low memory, good for large datasets
  • PQ: Very low memory, compressed vectors
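
A rough estimate makes the two ends of this list concrete. The Python sketch below assumes vectors are stored as 32-bit floats, as FAISS does, and that `PQ4` means four one-byte codes per vector, as in FAISS's index factory notation. Both are assumptions about ShibuDB, and the figures cover raw vector data only, not index overhead.

```python
BYTES_PER_FLOAT32 = 4


def flat_vector_bytes(num_vectors, dimension):
    """Raw size of uncompressed float32 vectors."""
    return num_vectors * dimension * BYTES_PER_FLOAT32


def pq_code_bytes(num_vectors, num_subquantizers, bits_per_code=8):
    """Raw size of product-quantization codes (assumed layout, see above)."""
    return num_vectors * num_subquantizers * bits_per_code // 8


n, d = 1_000_000, 128
flat = flat_vector_bytes(n, d)
pq = pq_code_bytes(n, num_subquantizers=4)
print(f"Flat: {flat / 1e6:.0f} MB, PQ4: {pq / 1e6:.0f} MB")
```

Under these assumptions, one million 128-dimensional vectors need about 512 MB uncompressed but only about 4 MB as PQ codes, which is why PQ trades accuracy for a much smaller footprint.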

Best Practices

Recommended practices for using the vector engine effectively.

Index Selection

  • Use Flat for datasets < 1M vectors and high accuracy requirements
  • Use HNSW for fast similarity search with good accuracy
  • Use IVF for large datasets with balanced performance
  • Use PQ for very large datasets with memory constraints

Vector Preparation

  • Normalize vectors to unit length when using InnerProduct, so that scores correspond to cosine similarity
  • Use appropriate dimensions for your use case
  • Preprocess vectors to remove noise and outliers
  • Consider dimensionality reduction for high-dimensional vectors
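
Normalization is the simplest of these steps to apply before insertion. The Python sketch below scales a vector to unit length and checks that the inner product of two normalized vectors matches the cosine similarity of the originals; `normalize` is a hypothetical helper, not a ShibuDB command.

```python
import math


def normalize(vec):
    """Scale vec to unit (L2) length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]
unit_a, unit_b = normalize(a), normalize(b)

# Cosine similarity of the original vectors: dot product over both lengths.
cosine = inner_product(a, b) / (math.hypot(*a) * math.hypot(*b))
print(unit_a, round(inner_product(unit_a, unit_b), 6), round(cosine, 6))
```

Normalizing once at insertion time, and again for each query vector, lets an InnerProduct space behave like a cosine-similarity search.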

Search Optimization

  • Start with default parameters and tune based on results
  • Use batch operations for multiple queries
  • Monitor search latency and adjust parameters accordingly
  • Consider using approximate search for real-time applications

Examples and Use Cases

Common use cases and practical examples.

Image Similarity Search

Image Search
# Create image vectors space
CREATE-SPACE image_search --engine vector --dimension 512 --index-type HNSW32 --metric L2
USE image_search

# Insert image embeddings (shortened to 10 components for readability; real vectors need all 512)
INSERT-VECTOR img_001 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0
INSERT-VECTOR img_002 0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,0.1
INSERT-VECTOR img_003 0.9,0.8,0.7,0.6,0.5,0.4,0.3,0.2,0.1,0.0

# Search for similar images
SEARCH-TOPK 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0 5

Text Embedding Search

Text Search
# Create text embeddings space
CREATE-SPACE text_search --engine vector --dimension 768 --index-type IVF32 --metric InnerProduct
USE text_search

# Insert text embeddings (shortened to 8 components for readability; real vectors need all 768)
INSERT-VECTOR doc_001 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8
INSERT-VECTOR doc_002 0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9
INSERT-VECTOR doc_003 0.8,0.7,0.6,0.5,0.4,0.3,0.2,0.1

# Search for similar documents
SEARCH-TOPK 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8 10

Recommendation System

Recommendations
# Create user preferences space
CREATE-SPACE recommendations --engine vector --dimension 128 --index-type HNSW32 --metric L2
USE recommendations

# Insert user preference vectors (shortened to 8 components for readability; real vectors need all 128)
INSERT-VECTOR user_001 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8
INSERT-VECTOR user_002 0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9
INSERT-VECTOR user_003 0.8,0.7,0.6,0.5,0.4,0.3,0.2,0.1

# Find similar users for recommendations
SEARCH-TOPK 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8 5