High-performance similarity search powered by FAISS
The Vector Engine in ShibuDb provides high-performance similarity search capabilities powered by FAISS (Facebook AI Similarity Search). It enables efficient storage and retrieval of high-dimensional vectors for applications like recommendation systems, image search, natural language processing, and machine learning.
┌─────────────────────────────────────┐
│ Vector Space │
├─────────────────────────────────────┤
│ FAISS Index (similarity search) │
├─────────────────────────────────────┤
│ In-Memory Buffer (batch ops) │
├─────────────────────────────────────┤
│ Write-Ahead Log (durability) │
├─────────────────────────────────────┤
│ Data Files (persistent storage) │
└─────────────────────────────────────┘
Vector data is organized in spaces with specific index types and distance metrics.
# Create a basic vector space (128 dimensions, Flat index, L2 metric)
CREATE-SPACE embeddings --engine vector --dimension 128
# Create with specific index type
CREATE-SPACE image_vectors --engine vector --dimension 512 --index-type HNSW32 --metric L2
# Create with custom parameters
CREATE-SPACE text_embeddings --engine vector --dimension 768 --index-type IVF32 --metric InnerProduct
# Create with different HNSW configurations
CREATE-SPACE fast_search --engine vector --dimension 128 --index-type HNSW64 --metric L2
CREATE-SPACE ultra_fast --engine vector --dimension 128 --index-type HNSW256 --metric L2
# Create with different IVF configurations
CREATE-SPACE large_dataset --engine vector --dimension 128 --index-type IVF64 --metric L2
CREATE-SPACE huge_dataset --engine vector --dimension 128 --index-type IVF256 --metric L2
# Create with different PQ configurations
CREATE-SPACE memory_efficient --engine vector --dimension 128 --index-type PQ8 --metric L2
CREATE-SPACE ultra_efficient --engine vector --dimension 128 --index-type PQ32 --metric L2
# Create with composite indices
CREATE-SPACE accurate_large --engine vector --dimension 128 --index-type IVF32,Flat --metric L2
CREATE-SPACE fast_accurate --engine vector --dimension 128 --index-type HNSW64,Flat --metric L2
CREATE-SPACE efficient_accurate --engine vector --dimension 128 --index-type PQ8,Flat --metric L2
CREATE-SPACE balanced_large --engine vector --dimension 128 --index-type IVF64,PQ16 --metric L2
CREATE-SPACE fast_efficient --engine vector --dimension 128 --index-type HNSW128,PQ32 --metric L2
# Create with WAL enabled (for enhanced durability)
CREATE-SPACE durable_embeddings --engine vector --dimension 128 --enable-wal
# Create with WAL disabled (default, for maximum performance)
CREATE-SPACE fast_embeddings --engine vector --dimension 128 --disable-wal
Parameters:
- `--engine vector`: Specifies vector engine type
- `--dimension N`: Vector dimension (required for vector spaces)
- `--index-type TYPE`: FAISS index type (default: Flat)
- `--metric METRIC`: Distance metric (default: L2)
- `--enable-wal`: Enable Write-Ahead Logging for enhanced durability (default: disabled for vector spaces)
- `--disable-wal`: Disable Write-Ahead Logging for maximum performance (default for vector spaces)

Different index types have different minimum vector requirements before search operations become available:
# HNSW32 - search available immediately
CREATE-SPACE hnsw_space --engine vector --dimension 8 --index-type HNSW32
USE hnsw_space
INSERT-VECTOR 1 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 5 # Works immediately
# IVF32 - search available after 32 vectors
CREATE-SPACE ivf_space --engine vector --dimension 8 --index-type IVF32
USE ivf_space
# Need to insert at least 32 vectors before search works
INSERT-VECTOR 1 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0
# ... insert 31 more vectors ...
INSERT-VECTOR 32 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 5 # Now works
# PQ8 - search available after 256 vectors
CREATE-SPACE pq_space --engine vector --dimension 8 --index-type PQ8
USE pq_space
# Need to insert at least 256 vectors before search works
# ... insert 256 vectors ...
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 5 # Now works
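Reaching these minimums by hand is tedious, so the filler inserts can be generated. The following is a minimal Python sketch, assuming the resulting lines are then pasted or piped into the ShibuDb CLI. The helper name `make_insert_commands` and the random data are illustrative and not part of ShibuDb; the `dimension` argument must match the `--dimension` of the target space.

```python
import random


def make_insert_commands(count, dimension, seed=0):
    """Return `count` INSERT-VECTOR command lines holding random vectors.

    IDs are numeric and start at 1. This is a hypothetical helper for
    producing filler data; it does not talk to ShibuDb itself.
    """
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    commands = []
    for vector_id in range(1, count + 1):
        values = ",".join(f"{rng.random():.4f}" for _ in range(dimension))
        commands.append(f"INSERT-VECTOR {vector_id} {values}")
    return commands


# 32 vectors meet the minimum for an IVF32 space; use 256 for PQ8.
commands = make_insert_commands(count=32, dimension=8)
print("\n".join(commands[:2]))
print(f"... {len(commands)} commands in total")
```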
# Switch to vector space
USE embeddings
# Verify current space (prompt will show current space)
[embeddings]>
Vector spaces support configurable Write-Ahead Logging (WAL) to balance performance and durability:
# Create vector space with WAL enabled (enhanced durability)
CREATE-SPACE production_vectors --engine vector --dimension 128 --index-type HNSW32 --enable-wal
# Create vector space with WAL disabled explicitly (maximum performance)
CREATE-SPACE fast_vectors --engine vector --dimension 128 --index-type HNSW32 --disable-wal
# Create vector space without a WAL flag (WAL stays disabled, same as the default)
CREATE-SPACE performance_vectors --engine vector --dimension 128 --index-type HNSW32
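When spaces are created from scripts, it can help to assemble the command in one place so the WAL choice is always explicit. The sketch below is a hypothetical Python helper (`build_create_space` is not a ShibuDb API); it only builds the command string from the flags documented above and leaves sending it to the server to the caller.

```python
def build_create_space(name, dimension, index_type="Flat", metric="L2",
                       enable_wal=False):
    """Assemble a CREATE-SPACE command for a vector space.

    Defaults mirror the documented ones: Flat index, L2 metric, WAL off.
    """
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    parts = [
        "CREATE-SPACE", name,
        "--engine", "vector",
        "--dimension", str(dimension),
        "--index-type", index_type,
        "--metric", metric,
        # Always emit a WAL flag so the durability choice is visible.
        "--enable-wal" if enable_wal else "--disable-wal",
    ]
    return " ".join(parts)


print(build_create_space("production_vectors", 128, "HNSW32", enable_wal=True))
print(build_create_space("fast_vectors", 128, "HNSW32"))
```

Emitting `--disable-wal` even though it is the default is a deliberate choice in this sketch: the generated command documents the trade-off for whoever reads the script later.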
ShibuDb supports various FAISS index types with hardcoded configurations and composite indices for different use cases.
| Index Type | Description | Use Case | Memory | Speed | Min Vectors Required |
|------------|-------------|----------|--------|-------|---------------------|
| `Flat` | Exact search | Small datasets, high accuracy | High | Slow | 0 |
| `HNSW{n}` | Hierarchical Navigable Small World | Fast similarity search | Medium | Fast | 0 |
| `IVF{n}` | Inverted file index | Large datasets | Low | Medium | n |
| `PQ{n}` | Product quantization | Very large datasets | Very Low | Fast | 256 |
- `HNSW{n}` where n is a power of 2 from 2 to 256: `HNSW2`, `HNSW4`, `HNSW8`, `HNSW16`, `HNSW32`, `HNSW64`, `HNSW128`, `HNSW256`
- `IVF{n}` where n is a power of 2 from 2 to 256: `IVF2`, `IVF4`, `IVF8`, `IVF16`, `IVF32`, `IVF64`, `IVF128`, `IVF256`
- `PQ{n}` where n is a power of 2 from 2 to 256: `PQ2`, `PQ4`, `PQ8`, `PQ16`, `PQ32`, `PQ64`, `PQ128`, `PQ256`
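Because only these fixed names are accepted, a client can reject a misspelled index type before sending `CREATE-SPACE`. The Python sketch below encodes the rule stated above (a `Flat` index, or `HNSW`, `IVF`, or `PQ` followed by a power of 2 from 2 to 256). The function name `is_valid_index_type` is an assumption for illustration, and it covers single index types only.

```python
import re

# The accepted suffixes: 2, 4, 8, ..., 256.
VALID_SIZES = {2 ** k for k in range(1, 9)}


def is_valid_index_type(index_type):
    """Check a single (non-composite) index type name against the naming rule."""
    if index_type == "Flat":
        return True
    match = re.fullmatch(r"(HNSW|IVF|PQ)(\d+)", index_type)
    return bool(match) and int(match.group(2)) in VALID_SIZES


for name in ("HNSW32", "IVF256", "PQ3", "HNSW512", "Flat"):
    print(name, is_valid_index_type(name))
```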
Composite indices combine multiple index types for enhanced performance and functionality:
| Composite Index | Description | Min Vectors Required | Use Case |
|-----------------|-------------|---------------------|----------|
| `IVF{n},Flat` | IVF clustering with exact search refinement | max(n, 1) | Large datasets with high accuracy |
| `HNSW{n},Flat` | HNSW search with exact search refinement | 0 | Fast search with high accuracy |
| `PQ{n},Flat` | PQ quantization with exact search refinement | 256 | Memory-efficient with high accuracy |
| `IVF{n},PQ{m}` | IVF clustering with PQ quantization | max(n, 256) | Very large datasets with balanced performance |
| `HNSW{n},PQ{m}` | HNSW search with PQ quantization | 256 | Fast search with memory efficiency |
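The minimums in both tables follow one pattern: an `IVF{n}` component needs n vectors, a `PQ` component needs 256, and `Flat` and `HNSW` need none, with a composite index taking the largest requirement among its components. A small Python sketch of that rule follows; `min_vectors_required` is a hypothetical helper derived from the tables, not from ShibuDb's source.

```python
def min_vectors_required(index_type):
    """Minimum number of vectors before search works, per the tables above."""
    required = 0
    for component in index_type.split(","):
        if component.startswith("IVF"):
            required = max(required, int(component[3:]))  # one per cluster
        elif component.startswith("PQ"):
            required = max(required, 256)
        # Flat and HNSW{n} components add no requirement.
    return required


for name in ("HNSW64,Flat", "IVF32,Flat", "PQ8,Flat", "IVF64,PQ16", "HNSW128,PQ32"):
    print(name, min_vectors_required(name))
```

A client could call this before issuing `SEARCH-TOPK` to decide whether the space already holds enough vectors.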
Composite Index Examples:
- `IVF32,Flat`: 32 clusters with exact search refinement (min 32 vectors)
- `HNSW64,Flat`: 64 neighbors with exact search refinement (min 0 vectors)
- `PQ8,Flat`: 8-bit quantization with exact search refinement (min 256 vectors)
- `IVF64,PQ16`: 64 clusters with 16-bit quantization (min 256 vectors)
- `HNSW128,PQ32`: 128 neighbors with 32-bit quantization (min 256 vectors)

Best for: Small datasets (< 1M vectors), high accuracy requirements
# Create flat index
CREATE-SPACE exact_search --engine vector --dimension 8 --index-type Flat --metric L2
USE exact_search
# Insert vectors
INSERT-VECTOR 1 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0
INSERT-VECTOR 2 1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1
INSERT-VECTOR 3 9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0
# Search for similar vectors
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 3
Note: Only numerical IDs are supported for vector spaces.
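Conceptually, exact search compares the query against every stored vector and keeps the k closest. The Python sketch below reproduces that idea on the three vectors from this example; it is an illustration of brute-force search under the L2 metric, not ShibuDb's or FAISS's implementation, and whether the engine reports the distance or its square is not stated here.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_topk(vectors, query, k):
    """Score every stored vector and return the k nearest as (id, distance)."""
    scored = sorted((l2_distance(vec, query), vec_id)
                    for vec_id, vec in vectors.items())
    return [(vec_id, dist) for dist, vec_id in scored[:k]]


vectors = {
    1: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    2: [1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1],
    3: [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0],
}
query = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

results = search_topk(vectors, query, 3)
for vec_id, dist in results:
    print(vec_id, round(dist, 3))
```

Vector 1 is identical to the query, so it ranks first with distance 0, followed by vector 2 and then vector 3. The cost grows linearly with the number of stored vectors, which is why this approach suits small datasets.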
Characteristics:
Best for: Fast similarity search, medium datasets
# Create HNSW index
CREATE-SPACE fast_search --engine vector --dimension 8 --index-type HNSW32 --metric L2
USE fast_search
# Insert vectors
INSERT-VECTOR 1 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0
INSERT-VECTOR 2 1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1
# Search with HNSW
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 5
Best for: Large datasets with balanced performance
# Create IVF index
CREATE-SPACE large_dataset --engine vector --dimension 8 --index-type IVF32 --metric L2
USE large_dataset
# Insert many vectors
INSERT-VECTOR 1 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0
INSERT-VECTOR 2 1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1
# ... insert more vectors
# Search with IVF
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 10
Core operations for managing vector data.
# Insert a single vector
INSERT-VECTOR 1 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0
# Insert with numeric ID
INSERT-VECTOR 1001 1.5,2.5,3.5,4.5,5.5,6.5,7.5,8.5
Format: INSERT-VECTOR <id> <comma-separated-floats>
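To make the format concrete, the Python sketch below parses one `INSERT-VECTOR` line into an ID and a list of floats. It shows how a client might validate input before sending it and is not a description of ShibuDb's own parser; `parse_insert_vector` is a hypothetical name.

```python
def parse_insert_vector(line):
    """Split 'INSERT-VECTOR <id> <comma-separated-floats>' into (id, values)."""
    parts = line.split()
    if len(parts) != 3 or parts[0] != "INSERT-VECTOR":
        raise ValueError("expected: INSERT-VECTOR <id> <comma-separated-floats>")
    vector_id = int(parts[1])  # raises ValueError for non-numeric IDs
    values = [float(item) for item in parts[2].split(",")]
    return vector_id, values


vector_id, values = parse_insert_vector(
    "INSERT-VECTOR 1001 1.5,2.5,3.5,4.5,5.5,6.5,7.5,8.5"
)
print(vector_id, len(values))
```

Note that the floats must not contain spaces after the commas in this sketch, since whitespace is used to separate the ID from the vector.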
# Get vector by ID
GET-VECTOR 1
# Find top 5 most similar vectors
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 5
# Find top 1 most similar vector
SEARCH-TOPK 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8 1
Format: SEARCH-TOPK <query-vector> <k>
# Find all vectors within radius 0.5
RANGE-SEARCH 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 0.5
# Find all vectors within radius 1.0
RANGE-SEARCH 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8 1.0
Format: RANGE-SEARCH <query-vector> <radius>
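Unlike top-k search, a range search returns a variable number of results: every vector whose distance to the query falls within the radius. The Python sketch below illustrates the idea with a brute-force scan under the L2 metric. It assumes the radius is compared against the plain Euclidean distance with an inclusive bound; the engine's exact convention is not specified in this section.

```python
import math


def range_search(vectors, query, radius):
    """Return (id, distance) for every vector within `radius`, nearest first."""
    hits = []
    for vec_id, vec in vectors.items():
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(vec, query)))
        if dist <= radius:  # inclusive bound is an assumption of this sketch
            hits.append((vec_id, dist))
    return sorted(hits, key=lambda hit: hit[1])


vectors = {
    1: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    2: [1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1],
    3: [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0],
}
query = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

print([vec_id for vec_id, _ in range_search(vectors, query, 0.5)])
print([vec_id for vec_id, _ in range_search(vectors, query, 0.1)])
```

With radius 0.5 the scan returns vectors 1 and 2; shrinking the radius to 0.1 leaves only vector 1, the exact match.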
Advanced features for efficient data management and querying.
# Insert multiple vectors efficiently
INSERT-VECTOR 1 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0
INSERT-VECTOR 2 1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1
INSERT-VECTOR 3 1.2,2.2,3.2,4.2,5.2,6.2,7.2,8.2
INSERT-VECTOR 4 1.3,2.3,3.3,4.3,5.3,6.3,7.3,8.3
INSERT-VECTOR 5 1.4,2.4,3.4,4.4,5.4,6.4,7.4,8.4
# Search for similar vectors
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 10
# Search with different query vectors
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 5
SEARCH-TOPK 9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0 5
SEARCH-TOPK 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8 5
# Use range search to find candidates
RANGE-SEARCH 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 1.0
# Then use top-k search for ranking
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 10
Different distance metrics for measuring vector similarity.
| Metric | Description | Use Case | Formula |
|--------|-------------|----------|---------|
| `L2` | Euclidean distance | General purpose | $\sqrt{\sum_i (x_i - y_i)^2}$ |
| `InnerProduct` | Inner product similarity | Cosine similarity (normalized vectors) | $\sum_i x_i y_i$ |
| `L1` | Manhattan distance | Robust to outliers | $\sum_i \lvert x_i - y_i \rvert$ |
| `Lp` | Lp norm distance | Configurable norm | $\left(\sum_i \lvert x_i - y_i \rvert^p\right)^{1/p}$ |
| `Canberra` | Canberra distance | Weighted differences | $\sum_i \frac{\lvert x_i - y_i \rvert}{\lvert x_i \rvert + \lvert y_i \rvert}$ |
| `BrayCurtis` | Bray-Curtis distance | Ecological data | $\frac{\sum_i \lvert x_i - y_i \rvert}{\sum_i (x_i + y_i)}$ |
| `JensenShannon` | Jensen-Shannon divergence | Probability distributions | $\mathrm{JS}(P \parallel Q)$ |
| `Linf` | L-infinity distance | Maximum difference | $\max_i \lvert x_i - y_i \rvert$ |
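The formulas in the table translate directly into code. The Python sketch below implements most of them with the standard library only, as a reference for checking results by hand; it is not how ShibuDb or FAISS compute them internally. Skipping Canberra terms with a zero denominator is a choice made in this sketch.

```python
import math


def l2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def inner_product(x, y):
    return sum(a * b for a, b in zip(x, y))


def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def lp(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def linf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    # Terms where both components are zero are skipped to avoid 0/0.
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) > 0)


def bray_curtis(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
print(l2(x, y))             # 5.0
print(l1(x, y))             # 7.0
print(linf(x, y))           # 4.0
print(inner_product(x, y))  # 25.0
```

For distance metrics a smaller value means more similar, while for `InnerProduct` a larger value means more similar.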
# Good for general-purpose similarity
CREATE-SPACE general --engine vector --dimension 128 --metric L2
# Good for normalized vectors (embeddings)
CREATE-SPACE embeddings --engine vector --dimension 768 --metric InnerProduct
# Good for robust similarity (outlier-resistant)
CREATE-SPACE robust --engine vector --dimension 128 --metric L1
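`InnerProduct` matches cosine similarity only when vectors have unit length, so embeddings are usually normalized before insertion and the query is normalized the same way. The Python sketch below shows the normalization step and checks that the inner product of the normalized vectors equals the cosine similarity of the originals. The helper names are illustrative and the normalization happens client-side; nothing here implies that ShibuDb normalizes vectors for you.

```python
import math


def normalize(vector):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [v / norm for v in vector]


def inner_product(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine_similarity(x, y):
    norms = math.sqrt(inner_product(x, x)) * math.sqrt(inner_product(y, y))
    return inner_product(x, y) / norms


a = [3.0, 4.0]
b = [4.0, 3.0]

ip_normalized = inner_product(normalize(a), normalize(b))
print(abs(ip_normalized - cosine_similarity(a, b)) < 1e-12)  # True
```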
Tips for optimizing vector search performance.
# Use Flat index for exact search
CREATE-SPACE small_dataset --engine vector --dimension 128 --index-type Flat
# Or use HNSW for faster approximate search
CREATE-SPACE small_fast --engine vector --dimension 128 --index-type HNSW16
# Use HNSW for fast approximate search
CREATE-SPACE medium_dataset --engine vector --dimension 128 --index-type HNSW32
# For higher accuracy, use HNSW64 or HNSW128
CREATE-SPACE medium_accurate --engine vector --dimension 128 --index-type HNSW64
# For very high accuracy with exact refinement
CREATE-SPACE medium_exact --engine vector --dimension 128 --index-type HNSW32,Flat
# Use IVF for balanced performance
CREATE-SPACE large_dataset --engine vector --dimension 128 --index-type IVF32
# For larger datasets, use more clusters
CREATE-SPACE large_many_clusters --engine vector --dimension 128 --index-type IVF64
# For high accuracy with exact refinement
CREATE-SPACE large_accurate --engine vector --dimension 128 --index-type IVF32,Flat
# Use PQ for memory efficiency
CREATE-SPACE huge_dataset --engine vector --dimension 128 --index-type PQ8
# For better accuracy, use higher PQ bits
CREATE-SPACE huge_accurate --engine vector --dimension 128 --index-type PQ16
# For balanced performance with clustering
CREATE-SPACE huge_balanced --engine vector --dimension 128 --index-type IVF64,PQ16
# For fast search with memory efficiency
CREATE-SPACE huge_fast --engine vector --dimension 128 --index-type HNSW128,PQ32
- `HNSW2`-`HNSW16`: Very fast, lower accuracy, good for real-time applications
- `HNSW32`-`HNSW64`: Balanced speed and accuracy, good for most applications
- `HNSW128`-`HNSW256`: Higher accuracy, slower, good for precision-critical applications
- `IVF2`-`IVF16`: Good for the smaller end of large datasets (1M-5M vectors)
- `IVF32`-`IVF64`: Good for mid-sized large datasets (5M-20M vectors)
- `IVF128`-`IVF256`: Good for very large datasets (20M+ vectors)
- `PQ2`-`PQ8`: Very memory efficient, lower accuracy
- `PQ16`-`PQ32`: Balanced memory and accuracy
- `PQ64`-`PQ256`: Higher accuracy, more memory usage

# Lower dimensions = less memory
CREATE-SPACE low_dim --engine vector --dimension 64 --index-type HNSW32
# Higher dimensions = more memory
CREATE-SPACE high_dim --engine vector --dimension 1024 --index-type HNSW32
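The effect of dimension on memory can be estimated with simple arithmetic. The Python sketch below computes the raw storage for uncompressed vectors, assuming each component is a 32-bit float (4 bytes). It deliberately ignores index overhead such as HNSW graph links or IVF cluster lists, so real usage will differ; treat the numbers as a lower bound for uncompressed index types.

```python
BYTES_PER_FLOAT32 = 4  # assumption: components are stored as 32-bit floats


def raw_vector_bytes(num_vectors, dimension):
    """Bytes needed to store the raw vectors, excluding any index overhead."""
    return num_vectors * dimension * BYTES_PER_FLOAT32


for dimension in (64, 1024):
    mib = raw_vector_bytes(1_000_000, dimension) / (1024 ** 2)
    print(f"dimension {dimension}: about {mib:.0f} MiB for 1M vectors")
```

Under these assumptions, one million 64-dimensional vectors need about 244 MiB, while the same number of 1024-dimensional vectors need about 3906 MiB, a 16x difference that matches the ratio of the dimensions.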
# Flat: Highest memory usage
CREATE-SPACE flat_index --engine vector --dimension 128 --index-type Flat
# HNSW: Medium memory usage (varies by neighbor count)
CREATE-SPACE hnsw_small --engine vector --dimension 128 --index-type HNSW16
CREATE-SPACE hnsw_medium --engine vector --dimension 128 --index-type HNSW32
CREATE-SPACE hnsw_large --engine vector --dimension 128 --index-type HNSW64
# IVF: Lower memory usage (varies by cluster count)
CREATE-SPACE ivf_small --engine vector --dimension 128 --index-type IVF16
CREATE-SPACE ivf_medium --engine vector --dimension 128 --index-type IVF32
CREATE-SPACE ivf_large --engine vector --dimension 128 --index-type IVF64
# PQ: Lowest memory usage (varies by quantization bits)
CREATE-SPACE pq_small --engine vector --dimension 128 --index-type PQ4
CREATE-SPACE pq_medium --engine vector --dimension 128 --index-type PQ8
CREATE-SPACE pq_large --engine vector --dimension 128 --index-type PQ16
# Composite indices: Memory usage depends on components
CREATE-SPACE composite_accurate --engine vector --dimension 128 --index-type IVF32,Flat
CREATE-SPACE composite_efficient --engine vector --dimension 128 --index-type HNSW64,PQ16
# Smaller k = faster search
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 1
# Larger k = slower search
SEARCH-TOPK 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 100
# Smaller radius = fewer results, faster
RANGE-SEARCH 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 0.1
# Larger radius = more results, slower
RANGE-SEARCH 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0 10.0
Recommended practices for using the vector engine effectively.
Common use cases and practical examples.
# Create image vectors space (dimension 10 matches the shortened sample vectors; real image embeddings are usually larger, e.g. 512)
CREATE-SPACE image_search --engine vector --dimension 10 --index-type HNSW32 --metric L2
USE image_search
# Insert image embeddings (IDs must be numeric)
INSERT-VECTOR 1 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0
INSERT-VECTOR 2 0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,0.1
INSERT-VECTOR 3 0.9,0.8,0.7,0.6,0.5,0.4,0.3,0.2,0.1,0.0
# Search for similar images
SEARCH-TOPK 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0 5
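# Create text embeddings space (dimension 8 matches the shortened sample vectors; real text embeddings are usually larger, e.g. 768)
CREATE-SPACE text_search --engine vector --dimension 8 --index-type IVF32 --metric InnerProduct
USE text_search
# Insert text embeddings (IDs must be numeric)
INSERT-VECTOR 1 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8
INSERT-VECTOR 2 0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9
INSERT-VECTOR 3 0.8,0.7,0.6,0.5,0.4,0.3,0.2,0.1
# ... insert at least 32 vectors in total before searching (IVF32 minimum)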
# Search for similar documents
SEARCH-TOPK 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8 10
# Create user preferences space (dimension 8 matches the shortened sample vectors)
CREATE-SPACE recommendations --engine vector --dimension 8 --index-type HNSW32 --metric L2
USE recommendations
# Insert user preference vectors (IDs must be numeric)
INSERT-VECTOR 1 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8
INSERT-VECTOR 2 0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9
INSERT-VECTOR 3 0.8,0.7,0.6,0.5,0.4,0.3,0.2,0.1
# Find similar users for recommendations
SEARCH-TOPK 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8 5