This page provides an overview of APIs related to embeddings persistence, management, and deletion.

Object Overview

Embeddings documents contain vector embeddings and relevant metadata about the source document from which the embeddings were generated.

Endpoint, URL, and Supported Methods

Embeddings documents are managed via the View Vector API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/documents

By default, View Vector is accessible on port 8311.

Supported methods include: PUT, POST, and DELETE.
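As a minimal sketch of how the URL pattern above can be assembled, the following Python helper fills in the scheme, hostname, port, and tenant GUID. The function name `documents_url` and its parameters are illustrative assumptions, not part of the View Vector API.

```python
from urllib.parse import quote

DEFAULT_PORT = 8311  # default View Vector port, as stated above


def documents_url(hostname, tenant_guid, port=DEFAULT_PORT, scheme="http"):
    """Build the documents endpoint URL for one tenant (hypothetical helper)."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    # Percent-encode the tenant GUID so it is safe to use as a path segment.
    tenant = quote(tenant_guid, safe="")
    return f"{scheme}://{hostname}:{port}/v1.0/tenants/{tenant}/documents"


print(documents_url("localhost", "default"))
# prints: http://localhost:8311/v1.0/tenants/default/documents
```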

Structure

Objects have the following structure:

{
    "GUID": "ac16a21a-88a4-4083-8fcd-75f49bd02384",
    "TenantGUID": "default",
    "CollectionGUID": "default",
    "SourceDocumentGUID": "94dcfa65-adec-4ddf-bea7-2e4f290ace24",
    "BucketGUID": "example-data-bucket",
    "VectorRepositoryGUID": "example-vector-repository",
    "GraphRepositoryGUID": "ac16a21a-88a4-4083-8fcd-75f49bd02384",
    "GraphNodeIdentifier": "94dcfa65-adec-4ddf-bea7-2e4f290ace24",
    "ObjectGUID": "2462d84b-dba8-4deb-ba3b-f1dee2106376",
    "ObjectKey": "1.pdf",
    "ObjectVersion": "1",
    "Model": "all-MiniLM-L6-v2",
    "Score": 1.3349401950836182,
    "SemanticCells": [
        {
            "GUID": "0e20c037-7eeb-407d-b2fc-b4b3db422e22",
            "CellType": "Text",
            "MD5Hash": "A382429550056CCFA2BFFC602EC86605",
            "SHA1Hash": "1BFE6FBDFC07B9217E3DC03D10D89551EBD3BAFE",
            "SHA256Hash": "539821158F217EB540C1DC83E9FDD7DC48BAE9874BD3A7807613C6F8FED0F064",
            "Position": 0,
            "Length": 0,
            "Chunks": [
                {
                    "GUID": "e45dc62b-e662-47b7-a1b9-d98f112813f0",
                    "MD5Hash": "A382429550056CCFA2BFFC602EC86605",
                    "SHA1Hash": "1BFE6FBDFC07B9217E3DC03D10D89551EBD3BAFE",
                    "SHA256Hash": "539821158F217EB540C1DC83E9FDD7DC48BAE9874BD3A7807613C6F8FED0F064",
                    "Position": 0,
                    "Start": 0,
                    "End": 0,
                    "Length": 0,
                    "Content": "The quick brown fox jumped over the lazy dog",
                    "Embeddings": [
                        -0.013084393,
                        ...
                    ]
                },
                ...
            ],
            "Children": []
        },
        ...
    ],
    "CreatedUtc": "2024-10-25T02:14:08.000000Z"
}

Properties:

  • GUID (string): globally unique identifier for the object
  • TenantGUID (string): globally unique identifier for the tenant
  • CollectionGUID (string): globally unique identifier for the collection
  • SourceDocumentGUID (string): globally unique identifier for the source document
  • BucketGUID (string): globally unique identifier for the bucket where the object is stored
  • DataRepositoryGUID (string): globally unique identifier for the data repository where the object is stored
  • VectorRepositoryGUID (string): globally unique identifier for the vector repository where the embeddings are stored
  • GraphRepositoryGUID (string): globally unique identifier for the graph repository where relationship metadata is stored
  • GraphNodeIdentifier (string): globally unique identifier for the graph node where relationship metadata is stored
  • ObjectGUID (string): globally unique identifier for the object
  • ObjectKey (string): key for the object
  • ObjectVersion (string): version of the object
  • Model (string): model from which embeddings were generated
  • Score (float): score for the document
  • SemanticCells (array): an array of semantic cells within the document
  • CreatedUtc (datetime): timestamp from creation, in UTC time
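To make the nesting of documents, semantic cells, and chunks concrete, here is a small Python sketch that walks a parsed embeddings document and yields every chunk together with its enclosing cell. It assumes that Children holds nested semantic cells of the same shape, which this page does not state explicitly. The function name `iter_chunks` is hypothetical, and the sample vector is a shortened placeholder, not real model output.

```python
def iter_chunks(document):
    """Yield (cell, chunk) pairs for every chunk in an embeddings document."""
    def walk(cells):
        for cell in cells or []:
            for chunk in cell.get("Chunks") or []:
                yield cell, chunk
            # Assumption: Children contains nested semantic cells.
            yield from walk(cell.get("Children"))

    yield from walk(document.get("SemanticCells"))


document = {
    "GUID": "example-document",
    "Model": "all-MiniLM-L6-v2",
    "SemanticCells": [
        {
            "GUID": "example-semantic-cell-1",
            "CellType": "Text",
            "Chunks": [
                {
                    "GUID": "example-semantic-chunk-1",
                    "Content": "The quick brown fox jumped over the lazy dog",
                    # Placeholder values; real vectors have the model's dimension.
                    "Embeddings": [-0.013084393, 0.25, 0.5],
                }
            ],
            "Children": [],
        }
    ],
}

for cell, chunk in iter_chunks(document):
    print(cell["GUID"], chunk["GUID"], len(chunk["Embeddings"]))
```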

Create

To write an embeddings document, call POST /v1.0/tenants/[tenant-guid]/documents with a fully populated embeddings document as the request body.

An example request appears as follows:

{
    "TenantGUID": "default",
    "BucketGUID": "data",
    "CollectionGUID": "default",
    "SourceDocumentGUID": "default",
    "ObjectGUID": "default",
    "VectorRepositoryGUID": "example-vector-repository",
    "ObjectKey": "hello.json",
    "ObjectVersion": "1",
    "CreatedUtc": "2024-06-01",
    "SemanticCells": [
        {
            "GUID": "example-semantic-cell-1",
            "CellType": "Text",
            "MD5Hash": "000",
            "SHA1Hash": "111",
            "SHA256Hash": "222",
            "Position": 0,
            "Chunks": [
                {
                    "GUID": "example-semantic-chunk-1",
                    "MD5Hash": "000",
                    "SHA1Hash": "111",
                    "SHA256Hash": "222",
                    "Position": 0,
                    "Content": "This is a sample chunk",
                    "Embeddings": [0.16624743426880373,...]
                },
                {
                    "GUID": "example-semantic-chunk-2",
                    "MD5Hash": "000",
                    "SHA1Hash": "111",
                    "SHA256Hash": "222",
                    "Position": 1,
                    "Content": "This is a sample chunk",
                    "Embeddings": [0.16624743426880373,...]
                }
            ],
            "Children": [ ]
        },
        {
            "GUID": "example-semantic-cell-2",
            "CellType": "Text",
            "MD5Hash": "000",
            "SHA1Hash": "111",
            "SHA256Hash": "222",
            "Position": 1,
            "Chunks": [
                {
                    "GUID": "example-semantic-chunk-3",
                    "MD5Hash": "000",
                    "SHA1Hash": "111",
                    "SHA256Hash": "222",
                    "Position": 0,
                    "Content": "This is a sample chunk",
                    "Embeddings": [0.16624743426880373,...]
                },
                {
                    "GUID": "example-semantic-chunk-4",
                    "MD5Hash": "000",
                    "SHA1Hash": "111",
                    "SHA256Hash": "222",
                    "Position": 1,
                    "Content": "This is a sample chunk",
                    "Embeddings": [0.16624743426880373,...]
                }
            ],
            "Children": [ ]
        }
    ]
}

When writing embeddings, each semantic chunk is stored as a separate database row, with the metadata of its enclosing semantic cell and document attached. The response is an array.
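The create call can be sketched in Python with only the standard library. The helper below builds the POST request without sending it. The name `build_create_request`, the `base_url` value, and the JSON content type header are assumptions, any authentication your deployment requires is omitted, and the embedding values are shortened placeholders.

```python
import json
import urllib.request


def build_create_request(base_url, tenant_guid, document):
    """Return an unsent POST request that writes one embeddings document."""
    url = f"{base_url}/v1.0/tenants/{tenant_guid}/documents"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},  # assumed content type
    )


document = {
    "TenantGUID": "default",
    "BucketGUID": "data",
    "CollectionGUID": "default",
    "VectorRepositoryGUID": "example-vector-repository",
    "ObjectKey": "hello.json",
    "ObjectVersion": "1",
    "SemanticCells": [
        {
            "GUID": "example-semantic-cell-1",
            "CellType": "Text",
            "Position": 0,
            "Chunks": [
                {
                    "GUID": "example-semantic-chunk-1",
                    "Position": 0,
                    "Content": "This is a sample chunk",
                    # Placeholder vector; use the full output of your model.
                    "Embeddings": [0.16624743426880373, 0.0, 0.0],
                }
            ],
            "Children": [],
        }
    ],
}

request = build_create_request("http://localhost:8311", "default", document)
print(request.get_method(), request.full_url)

# To send it (requires a reachable View Vector instance):
# with urllib.request.urlopen(request) as response:
#     rows = json.loads(response.read())  # the response is an array
```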

Search

To search embeddings, call PUT /v1.0/tenants/[tenant-guid]/search with a vector search request body as follows. View Vector currently supports three search types: inner product (InnerProduct), cosine distance (CosineDistance), and L2 distance (L2Distance).

{
    "SearchType": "CosineDistance",
    "VectorRepositoryGUID": "example-vector-repository",
    "MaxResults": 5,
    "Embeddings": [0.16624743426880373,...]
}

The result will be an array of distinct embeddings documents including document metadata, semantic cells, semantic chunks, and the embeddings for each.
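A search call might be assembled as in the following Python sketch, which validates the search type against the three documented values before building the PUT request. The helper name `build_search_request`, its parameters, and the content type header are assumptions, and the query vector is a shortened placeholder.

```python
import json
import urllib.request

# The three search types named on this page.
SEARCH_TYPES = ("InnerProduct", "CosineDistance", "L2Distance")


def build_search_request(base_url, tenant_guid, repository_guid, embeddings,
                         search_type="CosineDistance", max_results=5):
    """Return an unsent PUT request for a vector search."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unsupported search type: {search_type}")
    body = {
        "SearchType": search_type,
        "VectorRepositoryGUID": repository_guid,
        "MaxResults": max_results,
        "Embeddings": list(embeddings),
    }
    return urllib.request.Request(
        f"{base_url}/v1.0/tenants/{tenant_guid}/search",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},  # assumed content type
    )


search = build_search_request(
    "http://localhost:8311",
    "default",
    "example-vector-repository",
    [0.16624743426880373, 0.0, 0.0],  # placeholder query vector
)
print(search.get_method(), search.full_url)
```

The query vector should come from the same model, and therefore have the same dimension, as the stored embeddings. Otherwise the distances are not meaningful.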

Delete

To delete document metadata from View Vector, call DELETE /v1.0/tenants/[tenant-guid]/documents with a populated delete request body as follows. Any populated parameters are ANDed together, so only entries matching all of them are deleted. This operation is destructive and cannot be undone.

{
    "VectorRepositoryGUID": "example-vector-repository",
    "TenantGUID": "default",
    "CollectionGUID": null,
    "DataRepositoryGUID": null,
    "BucketGUID": "data",
    "ObjectGUID": null,
    "Key": "2.txt",
    "Version": "1"
}
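The delete body above could be produced by a helper like this Python sketch. It emits every documented filter key and leaves unpopulated ones as null, mirroring the example. The name `build_delete_request` and the client-side check that at least one filter is populated are assumptions added for safety, not documented server behavior.

```python
import json
import urllib.request

# Filter keys that appear in the delete request body on this page.
FILTER_KEYS = (
    "VectorRepositoryGUID", "TenantGUID", "CollectionGUID",
    "DataRepositoryGUID", "BucketGUID", "ObjectGUID", "Key", "Version",
)


def build_delete_request(base_url, tenant_guid, **filters):
    """Return an unsent DELETE request; populated filters are ANDed together."""
    unknown = set(filters) - set(FILTER_KEYS)
    if unknown:
        raise ValueError(f"unknown filter keys: {sorted(unknown)}")
    # Unpopulated keys are sent as null, as in the example body.
    body = {key: filters.get(key) for key in FILTER_KEYS}
    if all(value is None for value in body.values()):
        # Client-side guard (an assumption): refuse an empty filter set.
        raise ValueError("at least one filter must be populated")
    return urllib.request.Request(
        f"{base_url}/v1.0/tenants/{tenant_guid}/documents",
        data=json.dumps(body).encode("utf-8"),
        method="DELETE",
        headers={"Content-Type": "application/json"},  # assumed content type
    )


delete = build_delete_request(
    "http://localhost:8311",
    "default",
    VectorRepositoryGUID="example-vector-repository",
    TenantGUID="default",
    BucketGUID="data",
    Key="2.txt",
    Version="1",
)
print(delete.get_method(), delete.full_url)
print(delete.data.decode("utf-8"))
```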

To truncate a vector table entirely, call DELETE /v1.0/tenants/[tenant-guid]/documents?truncate with a JSON object containing the globally unique identifier of the vector repository. This operation is destructive and cannot be undone.

{
    "VectorRepositoryGUID": "example-vector-repository"
}
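Because truncation cannot be undone, a client may want an explicit opt-in before building the request. The Python sketch below shows one way to do that. The helper name `build_truncate_request` and its `confirm` flag are hypothetical client-side conventions, not part of the View Vector API.

```python
import json
import urllib.request


def build_truncate_request(base_url, tenant_guid, repository_guid, confirm=False):
    """Return an unsent DELETE request that truncates a vector table."""
    if not confirm:
        # Client-side safety check (an assumption, not server behavior).
        raise ValueError("truncation is destructive; pass confirm=True")
    return urllib.request.Request(
        f"{base_url}/v1.0/tenants/{tenant_guid}/documents?truncate",
        data=json.dumps({"VectorRepositoryGUID": repository_guid}).encode("utf-8"),
        method="DELETE",
        headers={"Content-Type": "application/json"},  # assumed content type
    )


truncate = build_truncate_request(
    "http://localhost:8311", "default", "example-vector-repository", confirm=True
)
print(truncate.get_method(), truncate.full_url)
```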