# Vector Store
## Overview
This Kumologica node provides stateful interaction with an in-memory vector store, retaining data throughout the container lifecycle.
- Text Chunking: Supports various text splitters to chunk input documents efficiently.
- Embedding Support: Compatible with OpenAI, VoyageAI, and Fake Embeddings for vector generation.
- Similarity Search: Enables similarity search with scoring and Maximal Marginal Relevance (MMR).
- Get/Set Operations: Includes operations to persist and load vectors from your chosen storage.
This node is ideal for applications requiring vector storage, similarity searches, and integration with popular embedding providers.
## Setup
### Prerequisites
- Node.js installed.
- Kumologica SDK installed.
- An OpenAI or VoyageAI API key with sufficient tokens available.
### Installation
- Go to your project workspace where you can see the `package.json` file.
- Run the below npm command:

  `npm i @kumologica/kumologica-contrib-vectorstore`
## Technical Details
### Properties
### Operation: Add Document
Adds a document with metadata into the vector store.
#### Splitter
- Splitter Type: The type of the text splitter to use. Required. The following splitters are supported:
  - Character Text Splitter: breaks text into chunks based purely on character count, without considering natural language boundaries. It's useful for tasks requiring strict token or size limits, like embeddings or processing large text data.
  - Sentence Text Splitter: divides text into individual sentences, ensuring each chunk is contextually complete. This is ideal for tasks like summarization, embeddings, and analysis where sentence-level granularity is important.
  - Recursive Character Text Splitter: splits text by recursively breaking it down at natural points like sentences or paragraphs, ensuring each chunk stays within a character limit. This helps maintain context while fitting size constraints for tasks like embeddings or tokenization.
  - Markdown Text Splitter: splits markdown files into smaller chunks, preserving the structure of headings, lists, and other markdown elements. This ensures organized processing for tasks like embeddings and semantic search.
  - Document Text Splitter: breaks down large documents into smaller chunks based on structure, such as sections or paragraphs. This ensures better handling of long texts for tasks like embeddings and search without losing context.
  - Html Text Splitter: extracts and splits text from HTML content while ignoring tags and metadata. It ensures clean, structured text chunks, ideal for processing tasks like semantic search and embedding generation.
  - Language Aware Text Splitter: intelligently divides text into chunks at natural language boundaries, preserving coherence. This enhances tasks like summarization, semantic search, and embedding generation by maintaining context.
- Chunk Size: The maximum size of a chunk. Optional.
- Chunk Overlap: Target overlap between chunks. Overlapping chunks help mitigate loss of information when context is divided between chunks (illustrated in the sketch after this list). Optional.
- Separator: The separator of the chunks. Optional.
- Separators: The list of separators of the chunks. Optional.
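To make Chunk Size and Chunk Overlap concrete, here is a minimal sketch of character-based chunking with overlap. It only illustrates how the two properties interact; it is not the node's internal splitter implementation, and the function name and values are examples.

```typescript
// Minimal sketch of character-based chunking with overlap.
// Not the node's internal splitter; it only shows how "Chunk Size"
// and "Chunk Overlap" interact.
function chunkByCharacters(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("Chunk Overlap must be smaller than Chunk Size");
  }
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached
  }
  return chunks;
}

// Chunk size 10, overlap 3: consecutive chunks share 3 characters.
console.log(chunkByCharacters("The quick brown fox jumps over the lazy dog", 10, 3));
```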
#### Embeddings
- Embedding: The OpenAI, VoyageAI or Fake embedding model to use. Required. The following embeddings are supported:
  - OpenAI (see https://platform.openai.com/docs/guides/embeddings):
    - text-embedding-3-small: generates compact, high-quality text embeddings that capture semantic meaning. It's optimized for efficiency, making it ideal for tasks like similarity search, classification, and clustering with minimal resource usage.
    - text-embedding-3-large: produces rich, high-dimensional text embeddings that capture nuanced semantic relationships. It is designed for tasks requiring deep contextual understanding, such as complex similarity searches and advanced natural language processing applications.
    - text-embedding-ada-002: generates versatile, high-quality text embeddings that excel in capturing semantic meaning across diverse contexts. It's optimized for various natural language processing tasks, including semantic search, classification, and recommendation systems, while maintaining efficiency and speed.
  - VoyageAI (see https://docs.voyageai.com/docs/embeddings):
    - voyage-3: Optimized for general-purpose and multilingual retrieval quality.
    - voyage-3-lite: Optimized for latency and cost.
    - voyage-finance-2: Optimized for finance retrieval and RAG.
    - voyage-multilingual-2: Optimized for multilingual retrieval and RAG.
    - voyage-law-2: Optimized for legal and long-context retrieval and RAG. Also improved performance across all domains.
    - voyage-code-2: Optimized for code retrieval (17% better than alternatives).
  - Fake Embeddings: placeholder vectors used for testing or development purposes in embedding-based systems. They mimic the structure of real embeddings without requiring actual embedding generation models, allowing for quick prototyping and performance testing.
- Encoding Format: The encoding format of the embeddings. Optional.
- API Key: The OpenAI or VoyageAI API key. Optional.
- Dimensions: The number of dimensions the resulting output embeddings should have. Only supported in OpenAI text-embedding-3 and later models. Default 3072. Optional. (See the mapping sketch after this list.)
- Document: The text document to add to the vector store. Required.
- Metadata: The metadata for the document. Optional.
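For reference, the Embedding, Encoding Format, API Key and Dimensions properties correspond to parameters of the provider's embeddings API, which the node calls for you. The sketch below only illustrates that mapping using the official `openai` npm package directly; it is not how the node is configured, and the model, dimensions and environment variable name are illustrative choices.

```typescript
// Sketch of how the node's embedding properties map onto a direct OpenAI call.
// The node performs the equivalent call internally.
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); // "API Key" property

async function embed(text: string): Promise<number[]> {
  const response = await client.embeddings.create({
    model: "text-embedding-3-small", // "Embedding" property
    input: text,
    dimensions: 512,                 // "Dimensions" property (text-embedding-3 models only)
    encoding_format: "float",        // "Encoding Format" property
  });
  return response.data[0].embedding;
}

embed("Kumologica vector store example").then((vector) => {
  console.log(`Embedding length: ${vector.length}`); // 512
});
```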
### Operation: Query
Allows querying of the vector store. The operation returns documents, metadata and, optionally (for Similarity Search with Score), a score between 0 and 1.
- Search Type: Either Similarity Search with Score or Maximal Marginal Relevance. Required.
- Query: The text of the user query to run. Required.
- Fetch K: For MMR only, the number of documents to fetch before passing them to the MMR algorithm. Defaults to 5 if not provided. Optional.
- Lambda: For MMR only, a number between 0 and 1 that determines the degree of diversity among the results; 0 means maximum diversity, 1 means minimum diversity (see the sketch after this list). Optional.
- \# Documents: Number of documents to return. Defaults to 2 if not provided. Optional.
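To make Fetch K and Lambda concrete: MMR first fetches the Fetch K most similar documents, then repeatedly picks the candidate that best balances relevance to the query against redundancy with documents already selected. The sketch below is a generic illustration of that idea, not the node's implementation; the types and the cosine-similarity choice are assumptions made for the example.

```typescript
// Generic Maximal Marginal Relevance sketch. `candidates` would be the
// "Fetch K" most similar documents; `k` is "# Documents"; `lambda` is "Lambda"
// (1 keeps the most relevant results, 0 maximises diversity).
type Candidate = { id: string; queryScore: number; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] ** 2;
    normB += b[i] ** 2;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

function mmr(candidates: Candidate[], k: number, lambda: number): Candidate[] {
  const selected: Candidate[] = [];
  const remaining = [...candidates];
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((doc, i) => {
      // Redundancy = highest similarity to anything already selected.
      const redundancy = selected.length
        ? Math.max(...selected.map((s) => cosine(doc.vector, s.vector)))
        : 0;
      const score = lambda * doc.queryScore - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    });
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}
```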
### Operation: Get Vectors
Returns the content of the vector store: documents, metadata and embeddings.
### Operation: Set Vectors
Sets the content of the vector store: documents, metadata and embeddings. The structure must be the same as returned by Get Vectors, otherwise the vector store will be corrupted.
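Because the store is in-memory, a common pattern is to snapshot it with Get Vectors before the container is recycled and restore it later with Set Vectors. The sketch below only illustrates that round trip; the `VectorStoreSnapshot` interface and its field names are hypothetical, and the authoritative shape is whatever Get Vectors actually returns in your flow.

```typescript
// Hypothetical snapshot shape for illustration only. Treat the object returned
// by Get Vectors as opaque and pass it back to Set Vectors unchanged.
import { promises as fs } from "fs";

interface VectorStoreSnapshot {
  documents: string[];    // hypothetical field name
  metadata: object[];     // hypothetical field name
  embeddings: number[][]; // hypothetical field name
}

// Persist the output of the Get Vectors operation, e.g. to a mounted volume.
async function saveSnapshot(snapshot: VectorStoreSnapshot, path: string): Promise<void> {
  await fs.writeFile(path, JSON.stringify(snapshot), "utf8");
}

// Load it back and feed the unmodified object to the Set Vectors operation.
async function loadSnapshot(path: string): Promise<VectorStoreSnapshot> {
  return JSON.parse(await fs.readFile(path, "utf8")) as VectorStoreSnapshot;
}
```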
## Throws
- VectorStoreError: thrown when any of the operations fail. Possible causes:
  - Invalid API key
  - Query on an empty vector store