Comprehensive guide to managing embeddings documents in the View Vector Database platform for vector storage and retrieval.
Overview
Embeddings documents contain vector embeddings and relevant metadata about the source document from which the embeddings were generated. They serve as the foundation for AI-powered search, semantic analysis, and vector-based document retrieval within the View Vector Database platform.
Embeddings documents are managed via the View Vector API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/documents and support document creation, search, and deletion.
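As a minimal sketch of how the endpoint template above expands, the following Python helper fills in the bracketed placeholders. The function name `documents_url`, the host `localhost`, and the port `8000` are illustrative assumptions, not values defined by the platform.

```python
from urllib.parse import quote


def documents_url(hostname: str, port: int, tenant_guid: str, scheme: str = "https") -> str:
    """Expand [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/documents."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    # Percent-encode the tenant GUID so unexpected characters cannot alter the path.
    tenant = quote(tenant_guid, safe="")
    return f"{scheme}://{hostname}:{port}/v1.0/tenants/{tenant}/documents"


# Hypothetical host and port, for illustration only.
print(documents_url("localhost", 8000, "default", scheme="http"))
```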
Embeddings Document Object Structure
Embeddings documents contain comprehensive vector embeddings and metadata about processed documents:
{
"GUID": "ac16a21a-88a4-4083-8fcd-75f49bd02384",
"TenantGUID": "default",
"CollectionGUID": "default",
"SourceDocumentGUID": "94dcfa65-adec-4ddf-bea7-2e4f290ace24",
"BucketGUID": "example-data-bucket",
"VectorRepositoryGUID": "example-vector-repository",
"GraphRepositoryGUID": "ac16a21a-88a4-4083-8fcd-75f49bd02384",
"GraphNodeIdentifier": "94dcfa65-adec-4ddf-bea7-2e4f290ace24",
"ObjectGUID": "2462d84b-dba8-4deb-ba3b-f1dee2106376",
"ObjectKey": "1.pdf",
"ObjectVersion": "1",
"Model": "all-MiniLM-L6-v2",
"Score": 1.3349401950836182,
"SemanticCells": [
{
"GUID": "0e20c037-7eeb-407d-b2fc-b4b3db422e22",
"CellType": "Text",
"MD5Hash": "A382429550056CCFA2BFFC602EC86605",
"SHA1Hash": "1BFE6FBDFC07B9217E3DC03D10D89551EBD3BAFE",
"SHA256Hash": "539821158F217EB540C1DC83E9FDD7DC48BAE9874BD3A7807613C6F8FED0F064",
"Position": 0,
"Length": 0,
"Chunks": [
{
"GUID": "e45dc62b-e662-47b7-a1b9-d98f112813f0",
"MD5Hash": "A382429550056CCFA2BFFC602EC86605",
"SHA1Hash": "1BFE6FBDFC07B9217E3DC03D10D89551EBD3BAFE",
"SHA256Hash": "539821158F217EB540C1DC83E9FDD7DC48BAE9874BD3A7807613C6F8FED0F064",
"Position": 0,
"Start": 0,
"End": 0,
"Length": 0,
"Content": "The quick brown fox jumped over the lazy dog",
"Embeddings": [
-0.013084393,
...
]
},
...
],
"Children": []
},
...
],
"CreatedUtc": "2024-10-25T02:14:08.000000Z"
}
Field Descriptions
- GUID (GUID): Globally unique identifier for the embeddings document object
- TenantGUID (GUID): Globally unique identifier for the tenant that owns this document
- CollectionGUID (GUID): Globally unique identifier for the collection this document belongs to
- SourceDocumentGUID (GUID): Globally unique identifier for the original source document
- BucketGUID (GUID): Globally unique identifier for the bucket where the object is stored
- DataRepositoryGUID (GUID): Globally unique identifier for the data repository where the object is stored
- VectorRepositoryGUID (GUID): Globally unique identifier for the vector repository where the embeddings are stored
- GraphRepositoryGUID (GUID): Globally unique identifier for the graph repository where relationship metadata is stored
- GraphNodeIdentifier (GUID): Globally unique identifier for the graph node where relationship metadata is stored
- ObjectGUID (GUID): Globally unique identifier for the object
- ObjectKey (string): Key/filename of the object
- ObjectVersion (string): Version of the object
- Model (string): Machine learning model from which embeddings were generated
- Score (float): Similarity score for the document
- SemanticCells (array): Array of semantic cells within the document containing chunks and embeddings
- CreatedUtc (datetime): Timestamp indicating when the embeddings document was created, in UTC time
Create Embeddings Document
Creates a new embeddings document using POST /v1.0/tenants/[tenant-guid]/documents. The request stores document metadata together with vector embeddings and semantic cell data for AI-powered search and retrieval.
Request Parameters
Required Parameters
- TenantGUID (string, Body, Required): GUID of the tenant that owns this document
- BucketGUID (string, Body, Required): GUID of the storage bucket containing the source object
- CollectionGUID (string, Body, Required): GUID of the collection this document belongs to
- SourceDocumentGUID (string, Body, Required): GUID of the original source document
- ObjectGUID (string, Body, Required): GUID of the object stored in the bucket
- VectorRepositoryGUID (string, Body, Required): GUID of the vector repository for embeddings storage
- ObjectKey (string, Body, Required): Key/filename of the object
- ObjectVersion (string, Body, Required): Version of the stored object
- CreatedUtc (datetime, Body, Required): Timestamp when the document was created
Optional Parameters
- GraphRepositoryGUID (string, Body, Optional): GUID of the graph repository for relationship metadata
- GraphNodeIdentifier (string, Body, Optional): Node identifier within a semantic or knowledge graph
- Model (string, Body, Optional): Machine learning model used for semantic processing
- Score (float, Body, Optional): Similarity score for the document
- SemanticCells (array, Body, Optional): Array of semantic cell representations with embeddings
{
"TenantGUID": "default",
"BucketGUID": "data",
"CollectionGUID": "default",
"SourceDocumentGUID": "default",
"ObjectGUID": "default",
"VectorRepositoryGUID": "example-vector-repository",
"ObjectKey": "hello.json",
"ObjectVersion": "1",
"CreatedUtc": "2024-06-01",
"SemanticCells": [
{
"GUID": "example-semantic-cell-1",
"CellType": "Text",
"MD5Hash": "000",
"SHA1Hash": "111",
"SHA256Hash": "222",
"Position": 0,
"Chunks": [
{
"GUID": "example-semantic-chunk-1",
"MD5Hash": "000",
"SHA1Hash": "111",
"SHA256Hash": "222",
"Position": 0,
"Content": "This is a sample chunk",
"Embeddings": [0.16624743426880373,...]
},
{
"GUID": "example-semantic-chunk-2",
"MD5Hash": "000",
"SHA1Hash": "111",
"SHA256Hash": "222",
"Position": 1,
"Content": "This is a sample chunk",
"Embeddings": [0.16624743426880373,...]
}
],
"Children": [ ]
},
{
"GUID": "example-semantic-cell-2",
"CellType": "Text",
"MD5Hash": "000",
"SHA1Hash": "111",
"SHA256Hash": "222",
"Position": 1,
"Chunks": [
{
"GUID": "example-semantic-chunk-3",
"MD5Hash": "000",
"SHA1Hash": "111",
"SHA256Hash": "222",
"Position": 0,
"Content": "This is a sample chunk",
"Embeddings": [0.16624743426880373,...]
},
{
"GUID": "example-semantic-chunk-4",
"MD5Hash": "000",
"SHA1Hash": "111",
"SHA256Hash": "222",
"Position": 1,
"Content": "This is a sample chunk",
"Embeddings": [0.16624743426880373,...]
}
],
"Children": [ ]
}
]
}
When writing embeddings, each semantic chunk is split into a separate database row, with the encapsulating semantic cell and document metadata attached. The result will be an array.
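The row-per-chunk behavior described above can be pictured with a short Python sketch. This is only an illustration of the idea, not the platform's storage code: the row column names (`CellGUID`, `ChunkGUID`, and so on) are hypothetical, and treating `Children` as nested semantic cells of the same shape is an assumption.

```python
from typing import Any

# Document-level fields copied onto every row (taken from the request body above).
DOC_FIELDS = (
    "TenantGUID", "CollectionGUID", "SourceDocumentGUID", "BucketGUID",
    "VectorRepositoryGUID", "ObjectGUID", "ObjectKey", "ObjectVersion",
)


def flatten_to_rows(document: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one row per semantic chunk, with cell and document metadata attached."""
    doc_meta = {key: document.get(key) for key in DOC_FIELDS}
    rows: list[dict[str, Any]] = []

    def visit(cells: list[dict[str, Any]]) -> None:
        for cell in cells:
            for chunk in cell.get("Chunks") or []:
                rows.append({
                    **doc_meta,
                    "CellGUID": cell.get("GUID"),        # hypothetical column name
                    "CellType": cell.get("CellType"),
                    "ChunkGUID": chunk.get("GUID"),      # hypothetical column name
                    "ChunkPosition": chunk.get("Position"),
                    "Content": chunk.get("Content"),
                    "Embeddings": chunk.get("Embeddings"),
                })
            # Assumption: Children contains nested cells with the same structure.
            visit(cell.get("Children") or [])

    visit(document.get("SemanticCells") or [])
    return rows


example = {
    "TenantGUID": "default",
    "VectorRepositoryGUID": "example-vector-repository",
    "ObjectKey": "hello.json",
    "SemanticCells": [
        {"GUID": "example-semantic-cell-1", "CellType": "Text", "Children": [], "Chunks": [
            {"GUID": "example-semantic-chunk-1", "Position": 0, "Content": "This is a sample chunk", "Embeddings": [0.1, 0.2]},
            {"GUID": "example-semantic-chunk-2", "Position": 1, "Content": "This is a sample chunk", "Embeddings": [0.1, 0.2]},
        ]},
        {"GUID": "example-semantic-cell-2", "CellType": "Text", "Children": [], "Chunks": [
            {"GUID": "example-semantic-chunk-3", "Position": 0, "Content": "This is a sample chunk", "Embeddings": [0.1, 0.2]},
            {"GUID": "example-semantic-chunk-4", "Position": 1, "Content": "This is a sample chunk", "Embeddings": [0.1, 0.2]},
        ]},
    ],
}
rows = flatten_to_rows(example)
print(len(rows), "rows")
```

With two cells of two chunks each, the sketch produces four rows, each repeating the document-level fields alongside its own chunk content and vector.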
Response
Returns the created embeddings document object with all metadata and processing information.
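A client might assemble the create call along the following lines. This is a hedged sketch using only the Python standard library: `build_create_request` and the base URL are hypothetical, and authentication headers are omitted because this section does not describe them. The request is built but not sent; passing it to `urllib.request.urlopen` would issue the call.

```python
import json
import urllib.request
from typing import Any

# Required body fields, as listed under "Required Parameters" above.
REQUIRED_FIELDS = (
    "TenantGUID", "BucketGUID", "CollectionGUID", "SourceDocumentGUID",
    "ObjectGUID", "VectorRepositoryGUID", "ObjectKey", "ObjectVersion", "CreatedUtc",
)


def build_create_request(base_url: str, tenant_guid: str, document: dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates an embeddings document."""
    missing = [field for field in REQUIRED_FIELDS if not document.get(field)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    url = f"{base_url.rstrip('/')}/v1.0/tenants/{tenant_guid}/documents"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


document = {
    "TenantGUID": "default",
    "BucketGUID": "data",
    "CollectionGUID": "default",
    "SourceDocumentGUID": "default",
    "ObjectGUID": "default",
    "VectorRepositoryGUID": "example-vector-repository",
    "ObjectKey": "hello.json",
    "ObjectVersion": "1",
    "CreatedUtc": "2024-06-01",
    "SemanticCells": [],
}
# Hypothetical base URL, for illustration only.
request = build_create_request("http://localhost:8000", "default", document)
print(request.get_method(), request.full_url)
```

Validating the required fields on the client side is a design choice of this sketch; it surfaces an incomplete body before any network traffic occurs.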
Search Embeddings
Performs vector similarity search using PUT /v1.0/tenants/[tenant-guid]/search. View Vector currently supports inner product search (InnerProduct), cosine distance search (CosineDistance), and L2 distance search (L2Distance) for finding similar documents based on vector embeddings.
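For readers comparing the three search types, the sketch below computes the standard textbook definitions of each measure in plain Python. It is an illustration only: this section does not state how View Vector computes, scales, or orders its scores, so the function names and exact formulas here are assumptions.

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    _check(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus cosine similarity; 0 means the vectors point the same way."""
    _check(a, b)
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norms == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - inner_product(a, b) / norms


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; 0 means the vectors are identical."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
candidate = [0.0, 1.0]
print(inner_product(query, candidate), cosine_distance(query, candidate), l2_distance(query, candidate))
```

Inner product is sensitive to vector magnitude, cosine distance ignores magnitude, and L2 distance measures straight-line separation, which is why the appropriate choice depends on how the embedding model was trained and whether its vectors are normalized.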
Request Parameters
Required Parameters
- SearchType (string, Body, Required): Type of search to perform (InnerProduct, CosineDistance, or L2Distance)
- VectorRepositoryGUID (string, Body, Required): GUID of the vector repository to search within
- MaxResults (integer, Body, Required): Maximum number of search results to return
- Embeddings (array, Body, Required): Array of floating-point numbers representing the query vector
{
"SearchType": "CosineDistance",
"VectorRepositoryGUID": "example-vector-repository",
"MaxResults": 5,
"Embeddings": [0.16624743426880373,...]
}
Response
Returns an array of distinct embeddings documents including document metadata, semantic cells, semantic chunks, and the embeddings for each.
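A search call could be assembled as in the following sketch, which validates the four required parameters before building the PUT request. The helper name `build_search_request`, the base URL, and the short query vector are hypothetical; authentication is omitted because it is not described here, and the request is constructed without being sent.

```python
import json
import urllib.request
from typing import Sequence

# Search types named in this section.
SEARCH_TYPES = ("InnerProduct", "CosineDistance", "L2Distance")


def build_search_request(
    base_url: str,
    tenant_guid: str,
    vector_repository_guid: str,
    embeddings: Sequence[float],
    search_type: str = "CosineDistance",
    max_results: int = 5,
) -> urllib.request.Request:
    """Build (but do not send) a PUT request for a vector similarity search."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {SEARCH_TYPES}")
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    if len(embeddings) == 0:
        raise ValueError("embeddings must contain the query vector")
    body = {
        "SearchType": search_type,
        "VectorRepositoryGUID": vector_repository_guid,
        "MaxResults": max_results,
        "Embeddings": list(embeddings),
    }
    url = f"{base_url.rstrip('/')}/v1.0/tenants/{tenant_guid}/search"
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical base URL and a shortened query vector, for illustration only.
search = build_search_request(
    "http://localhost:8000", "default", "example-vector-repository",
    [0.16624743426880373, 0.25],
)
print(search.get_method(), search.full_url)
```

In practice the query vector should come from the same model that produced the stored embeddings, so that its dimensions match those in the repository.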
Delete Embeddings Document
Deletes embeddings document metadata from the vector repository using DELETE /v1.0/tenants/[tenant-guid]/documents. Any populated filter parameters are ANDed together, so only documents matching every supplied value are deleted. This operation is destructive and cannot be undone.
Request Parameters
Optional Parameters (Any combination can be used for filtering)
- VectorRepositoryGUID (string, Body, Optional): GUID of the vector repository to delete from
- TenantGUID (string, Body, Optional): GUID of the tenant to filter by
- CollectionGUID (string, Body, Optional): GUID of the collection to filter by
- DataRepositoryGUID (string, Body, Optional): GUID of the data repository to filter by
- BucketGUID (string, Body, Optional): GUID of the bucket to filter by
- ObjectGUID (string, Body, Optional): GUID of the object to filter by
- Key (string, Body, Optional): Object key to filter by
- Version (string, Body, Optional): Object version to filter by
{
"VectorRepositoryGUID": "example-vector-repository",
"TenantGUID": "default",
"CollectionGUID": null,
"DataRepositoryGUID": null,
"BucketGUID": "data",
"ObjectGUID": null,
"Key": "2.txt",
"Version": "1"
}
Response
Returns 200 OK on successful deletion. No response body is returned.
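Because the filters are ANDed and the operation cannot be undone, a client may want a guard around the delete call. The sketch below builds the request body with all eight filter keys, leaving unset ones as null as in the sample above, and refuses to proceed when no filter is populated. The helper name and the refusal rule are assumptions of this sketch, not documented API behavior; the request is built but not sent.

```python
import json
import urllib.request
from typing import Optional

# Filter keys accepted in the delete body, as listed above.
FILTER_KEYS = (
    "VectorRepositoryGUID", "TenantGUID", "CollectionGUID", "DataRepositoryGUID",
    "BucketGUID", "ObjectGUID", "Key", "Version",
)


def build_delete_request(base_url: str, tenant_guid: str, **filters: Optional[str]) -> urllib.request.Request:
    """Build (but do not send) a DELETE request with ANDed filters."""
    unknown = sorted(set(filters) - set(FILTER_KEYS))
    if unknown:
        raise ValueError("unknown filters: " + ", ".join(unknown))
    # Client-side safety check (an assumption of this sketch): require at least one filter.
    if all(value is None for value in filters.values()):
        raise ValueError("refusing to delete without at least one populated filter")
    # Unset filters are sent as null, mirroring the sample request body.
    body = {key: filters.get(key) for key in FILTER_KEYS}
    url = f"{base_url.rstrip('/')}/v1.0/tenants/{tenant_guid}/documents"
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), method="DELETE",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical base URL, for illustration only.
delete = build_delete_request(
    "http://localhost:8000", "default",
    VectorRepositoryGUID="example-vector-repository",
    TenantGUID="default", BucketGUID="data", Key="2.txt", Version="1",
)
print(delete.get_method(), json.loads(delete.data))
```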
Truncate Vector Repository
Truncates an entire vector table from the vector repository using DELETE /v1.0/tenants/[tenant-guid]/documents?truncate. This operation removes all embeddings documents and associated data from the specified vector repository. It is destructive and cannot be undone.
Request Parameters
Required Parameters
- VectorRepositoryGUID (string, Body, Required): GUID of the vector repository to truncate
{
"VectorRepositoryGUID": "example-vector-repository"
}
Response
Returns 200 OK on successful truncation. No response body is returned.
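Since truncation removes every embeddings document in a repository, a wrapper that demands explicit confirmation can help avoid accidents. The following sketch builds the truncate request only when the caller passes `confirm=True`. The helper name, the `confirm` flag, and the base URL are hypothetical and not part of the API; the request is built but not sent.

```python
import json
import urllib.request


def build_truncate_request(
    base_url: str, tenant_guid: str, vector_repository_guid: str, confirm: bool = False
) -> urllib.request.Request:
    """Build (but do not send) a DELETE ...?truncate request for one vector repository."""
    if not confirm:
        # Client-side guard (an assumption of this sketch), since truncation cannot be undone.
        raise ValueError("pass confirm=True to truncate the vector repository")
    if not vector_repository_guid:
        raise ValueError("VectorRepositoryGUID is required")
    url = f"{base_url.rstrip('/')}/v1.0/tenants/{tenant_guid}/documents?truncate"
    body = {"VectorRepositoryGUID": vector_repository_guid}
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), method="DELETE",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical base URL, for illustration only.
truncate = build_truncate_request(
    "http://localhost:8000", "default", "example-vector-repository", confirm=True
)
print(truncate.get_method(), truncate.full_url)
```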
Best Practices
When managing embeddings documents in the View Vector Database, consider the following recommendations for optimal vector storage, search performance, and embeddings management:
- Document Organization: Organize embeddings documents within logical collections and repositories based on content type, domain, or search requirements
- Embedding Quality: Ensure high-quality vector embeddings are generated using appropriate models for your specific use case and content types
- Semantic Processing: Use comprehensive semantic cell and chunk processing to maximize search accuracy and content understanding
- Search Optimization: Choose appropriate distance metrics (inner product, cosine distance, L2 distance) based on your specific search requirements
- Performance Monitoring: Monitor embeddings document processing and search performance to optimize your AI-powered applications
Next Steps
After successfully managing embeddings documents, you can:
- Vector Search: Implement advanced vector search operations using the comprehensive search capabilities
- Semantic Analysis: Analyze and manage semantic cells and chunks for detailed content understanding and processing
- Vector Document Management: Work with vector documents for comprehensive document metadata and processing
- Search Integration: Integrate embeddings document management with search functionality for AI-powered document discovery
- Performance Optimization: Monitor and optimize embeddings document processing and search performance for your applications