Embeddings Documents

Object Overview

Embeddings documents contain vector embeddings and relevant metadata about the source document from which the embeddings were generated.

Endpoint, URL, and Supported Methods

Embeddings documents are managed via the View Vector API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/documents

Supported methods include: PUT, POST, DELETE
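As a minimal sketch, the endpoint URL can be assembled from its parts. The function name `documents_url` and the host and port used in the example call are hypothetical and only serve as an illustration.

```python
from urllib.parse import quote


def documents_url(hostname: str, port: int, tenant_guid: str, scheme: str = "http") -> str:
    """Build the embeddings documents endpoint URL for a tenant."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    # Percent-encode the tenant GUID so it is safe as a single path segment.
    tenant = quote(tenant_guid, safe="")
    return f"{scheme}://{hostname}:{port}/v1.0/tenants/{tenant}/documents"


# Hypothetical host and port; substitute the values for your deployment.
print(documents_url("localhost", 8000, "default"))
```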

Structure

Objects have the following structure:

{
    "GUID": "ac16a21a-88a4-4083-8fcd-75f49bd02384",
    "TenantGUID": "default",
    "CollectionGUID": "default",
    "SourceDocumentGUID": "94dcfa65-adec-4ddf-bea7-2e4f290ace24",
    "BucketGUID": "example-data-bucket",
    "VectorRepositoryGUID": "example-vector-repository",
    "GraphRepositoryGUID": "ac16a21a-88a4-4083-8fcd-75f49bd02384",
    "GraphNodeIdentifier": "94dcfa65-adec-4ddf-bea7-2e4f290ace24",
    "ObjectGUID": "2462d84b-dba8-4deb-ba3b-f1dee2106376",
    "ObjectKey": "1.pdf",
    "ObjectVersion": "1",
    "Model": "all-MiniLM-L6-v2",
    "Score": 1.3349401950836182,
    "SemanticCells": [
        {
            "GUID": "0e20c037-7eeb-407d-b2fc-b4b3db422e22",
            "CellType": "Text",
            "MD5Hash": "A382429550056CCFA2BFFC602EC86605",
            "SHA1Hash": "1BFE6FBDFC07B9217E3DC03D10D89551EBD3BAFE",
            "SHA256Hash": "539821158F217EB540C1DC83E9FDD7DC48BAE9874BD3A7807613C6F8FED0F064",
            "Position": 0,
            "Length": 0,
            "Chunks": [
                {
                    "GUID": "e45dc62b-e662-47b7-a1b9-d98f112813f0",
                    "MD5Hash": "A382429550056CCFA2BFFC602EC86605",
                    "SHA1Hash": "1BFE6FBDFC07B9217E3DC03D10D89551EBD3BAFE",
                    "SHA256Hash": "539821158F217EB540C1DC83E9FDD7DC48BAE9874BD3A7807613C6F8FED0F064",
                    "Position": 0,
                    "Start": 0,
                    "End": 0,
                    "Length": 0,
                    "Content": "The quick brown fox jumped over the lazy dog",
                    "Embeddings": [
                        -0.013084393,
                        ...
                    ]
                },
                ...
            ],
            "Children": []
        },
        ...
    ],
    "CreatedUtc": "2024-10-25T02:14:08.000000Z"
}

Properties:

Property              Type      Description
GUID                  GUID      globally unique identifier for the object
TenantGUID            GUID      globally unique identifier for the tenant
CollectionGUID        GUID      globally unique identifier for the collection
SourceDocumentGUID    GUID      globally unique identifier for the source document
BucketGUID            GUID      globally unique identifier for the bucket where the object is stored
DataRepositoryGUID    GUID      globally unique identifier for the data repository where the object is stored
VectorRepositoryGUID  GUID      globally unique identifier for the vector repository where the embeddings are stored
GraphRepositoryGUID   GUID      globally unique identifier for the graph repository where relationship metadata is stored
GraphNodeIdentifier   GUID      globally unique identifier for the graph node where relationship metadata is stored
ObjectGUID            GUID      globally unique identifier for the object
ObjectKey             string    key for the object
ObjectVersion         string    version of the object
Model                 string    model from which embeddings were generated
Score                 float     score for the document
SemanticCells         array     an array of semantic cells within the document
CreatedUtc            datetime  timestamp from creation, in UTC time
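The following sketch shows one way a client might read these properties from a returned document. The trimmed JSON, the embedding values, and the helper names `parse_created_utc` and `summarize` are illustrative assumptions. The timestamp parser assumes the microsecond format shown in the structure above and would need adjusting for other formats. Nested cells under Children are not visited.

```python
import json
from datetime import datetime, timezone

# A trimmed document in the shape shown above; the embedding values are made up.
raw = """
{
    "GUID": "ac16a21a-88a4-4083-8fcd-75f49bd02384",
    "ObjectKey": "1.pdf",
    "Model": "all-MiniLM-L6-v2",
    "SemanticCells": [
        {
            "GUID": "0e20c037-7eeb-407d-b2fc-b4b3db422e22",
            "CellType": "Text",
            "Chunks": [
                {
                    "GUID": "e45dc62b-e662-47b7-a1b9-d98f112813f0",
                    "Content": "The quick brown fox jumped over the lazy dog",
                    "Embeddings": [-0.013084393, 0.25, 0.5]
                }
            ],
            "Children": []
        }
    ],
    "CreatedUtc": "2024-10-25T02:14:08.000000Z"
}
"""


def parse_created_utc(value: str) -> datetime:
    """Parse a timestamp such as 2024-10-25T02:14:08.000000Z into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def summarize(document: dict) -> list:
    """Return (chunk content, embedding dimension) pairs for the top-level cells."""
    summary = []
    for cell in document.get("SemanticCells", []):
        for chunk in cell.get("Chunks", []):
            summary.append((chunk["Content"], len(chunk["Embeddings"])))
    return summary


document = json.loads(raw)
created = parse_created_utc(document["CreatedUtc"])
print(document["ObjectKey"], document["Model"], created.isoformat())
print(summarize(document))
```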

Create

To write an embeddings document, call POST /v1.0/tenants/[tenant-guid]/documents with a fully-populated embeddings document.

An example request appears as follows:

{
    "TenantGUID": "default",
    "BucketGUID": "data",
    "CollectionGUID": "default",
    "SourceDocumentGUID": "default",
    "ObjectGUID": "default",
    "VectorRepositoryGUID": "example-vector-repository",
    "ObjectKey": "hello.json",
    "ObjectVersion": "1",
    "CreatedUtc": "2024-06-01",
    "SemanticCells": [
        {
            "GUID": "example-semantic-cell-1",
            "CellType": "Text",
            "MD5Hash": "000",
            "SHA1Hash": "111",
            "SHA256Hash": "222",
            "Position": 0,
            "Chunks": [
                {
                    "GUID": "example-semantic-chunk-1",
                    "MD5Hash": "000",
                    "SHA1Hash": "111",
                    "SHA256Hash": "222",
                    "Position": 0,
                    "Content": "This is a sample chunk",
                    "Embeddings": [0.16624743426880373,...]
                },
                {
                    "GUID": "example-semantic-chunk-2",
                    "MD5Hash": "000",
                    "SHA1Hash": "111",
                    "SHA256Hash": "222",
                    "Position": 1,
                    "Content": "This is a sample chunk",
                    "Embeddings": [0.16624743426880373,...]
                }
            ],
            "Children": [ ]
        },
        {
            "GUID": "example-semantic-cell-2",
            "CellType": "Text",
            "MD5Hash": "000",
            "SHA1Hash": "111",
            "SHA256Hash": "222",
            "Position": 1,
            "Chunks": [
                {
                    "GUID": "example-semantic-chunk-3",
                    "MD5Hash": "000",
                    "SHA1Hash": "111",
                    "SHA256Hash": "222",
                    "Position": 0,
                    "Content": "This is a sample chunk",
                    "Embeddings": [0.16624743426880373,...]
                },
                {
                    "GUID": "example-semantic-chunk-4",
                    "MD5Hash": "000",
                    "SHA1Hash": "111",
                    "SHA256Hash": "222",
                    "Position": 1,
                    "Content": "This is a sample chunk",
                    "Embeddings": [0.16624743426880373,...]
                }
            ],
            "Children": [ ]
        }
    ]
}

When writing embeddings, each semantic chunk is stored as a separate database row, with the metadata of its enclosing semantic cell and document attached. The result will be an array.
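The chunk-per-row layout can be pictured with a small local sketch. This is not the service's implementation: the function `flatten_chunks` and the row column names (CellGUID, ChunkGUID, and so on) are hypothetical, the embedding values are made up, and nested cells under Children are ignored for brevity.

```python
def flatten_chunks(document: dict) -> list:
    """Produce one row per semantic chunk, carrying cell and document metadata."""
    # Document-level fields are copied onto every row (everything except the cells).
    doc_meta = {k: v for k, v in document.items() if k != "SemanticCells"}
    rows = []
    for cell in document.get("SemanticCells", []):
        for chunk in cell.get("Chunks", []):
            row = dict(doc_meta)
            # Hypothetical column names for the cell and chunk metadata.
            row["CellGUID"] = cell["GUID"]
            row["CellType"] = cell.get("CellType")
            row["CellPosition"] = cell.get("Position")
            row["ChunkGUID"] = chunk["GUID"]
            row["ChunkPosition"] = chunk.get("Position")
            row["Content"] = chunk["Content"]
            row["Embeddings"] = chunk["Embeddings"]
            rows.append(row)
    return rows


document = {
    "TenantGUID": "default",
    "ObjectKey": "hello.json",
    "SemanticCells": [
        {
            "GUID": "example-semantic-cell-1", "CellType": "Text", "Position": 0,
            "Chunks": [
                {"GUID": "example-semantic-chunk-1", "Position": 0,
                 "Content": "This is a sample chunk", "Embeddings": [0.1, 0.2]},
                {"GUID": "example-semantic-chunk-2", "Position": 1,
                 "Content": "This is a sample chunk", "Embeddings": [0.3, 0.4]},
            ],
            "Children": [],
        },
        {
            "GUID": "example-semantic-cell-2", "CellType": "Text", "Position": 1,
            "Chunks": [
                {"GUID": "example-semantic-chunk-3", "Position": 0,
                 "Content": "This is a sample chunk", "Embeddings": [0.5, 0.6]},
            ],
            "Children": [],
        },
    ],
}

rows = flatten_chunks(document)
for row in rows:
    print(row["CellGUID"], row["ChunkGUID"], row["ObjectKey"])
```

Each of the three chunks yields its own row, and every row repeats the document-level fields such as ObjectKey.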

Search

To search embeddings, call PUT /v1.0/tenants/[tenant-guid]/search with a vector search request body as follows. View Vector currently supports inner product search (InnerProduct), cosine distance search (CosineDistance), and L2 distance search (L2Distance).

{
    "SearchType": "CosineDistance",
    "VectorRepositoryGUID": "example-vector-repository",
    "MaxResults": 5,
    "Embeddings": [0.16624743426880373,...]
}

The result will be an array of distinct embeddings documents including document metadata, semantic cells, semantic chunks, and the embeddings for each.
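A search request could be prepared along the following lines using only the Python standard library. The helper `build_search_request`, the base URL, and the Content-Type header are assumptions. Authentication is not covered in this section and is therefore omitted. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request

# Search types named in this section.
SEARCH_TYPES = {"InnerProduct", "CosineDistance", "L2Distance"}


def build_search_request(base_url: str, tenant_guid: str, repository_guid: str,
                         embeddings: list, search_type: str = "CosineDistance",
                         max_results: int = 5) -> Request:
    """Build (but do not send) a PUT request for a vector search."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unsupported search type: {search_type}")
    body = {
        "SearchType": search_type,
        "VectorRepositoryGUID": repository_guid,
        "MaxResults": max_results,
        "Embeddings": embeddings,
    }
    return Request(
        url=f"{base_url}/v1.0/tenants/{tenant_guid}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, body is JSON
        method="PUT",
    )


# Hypothetical base URL and a made-up, two-dimensional query vector.
request = build_search_request(
    "http://localhost:8000", "default", "example-vector-repository", [0.1, 0.2]
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON response body.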

Delete

To delete document metadata from View Vector, call DELETE /v1.0/tenants/[tenant-guid]/documents with a populated delete request body as follows. Populated parameters are ANDed together, so only documents that match every populated parameter are deleted. This operation is destructive and cannot be undone.

{
    "VectorRepositoryGUID": "example-vector-repository",
    "TenantGUID": "default",
    "CollectionGUID": null,
    "DataRepositoryGUID": null,
    "BucketGUID": "data",
    "ObjectGUID": null,
    "Key": "2.txt",
    "Version": "1"
}
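The AND behavior can be illustrated locally before issuing a destructive call. The sketch below is not the service's matching logic. It assumes that null parameters count as unpopulated and that stored rows expose the same field names as the delete request. The function name `matches` and the sample rows are hypothetical.

```python
def matches(filter_body: dict, row: dict) -> bool:
    """Return True when the row satisfies every populated (non-null) filter field."""
    populated = {k: v for k, v in filter_body.items() if v is not None}
    return all(row.get(k) == v for k, v in populated.items())


delete_filter = {
    "VectorRepositoryGUID": "example-vector-repository",
    "TenantGUID": "default",
    "CollectionGUID": None,
    "DataRepositoryGUID": None,
    "BucketGUID": "data",
    "ObjectGUID": None,
    "Key": "2.txt",
    "Version": "1",
}

# Hypothetical stored rows that differ only in key and version.
base = {
    "VectorRepositoryGUID": "example-vector-repository",
    "TenantGUID": "default",
    "BucketGUID": "data",
}
rows = [
    {**base, "Key": "2.txt", "Version": "1"},
    {**base, "Key": "2.txt", "Version": "2"},
    {**base, "Key": "1.txt", "Version": "1"},
]

to_delete = [row for row in rows if matches(delete_filter, row)]
print(len(to_delete), to_delete[0]["Key"], to_delete[0]["Version"])
```

Only the first row matches, because it is the only one that agrees with the filter on both Key and Version.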

To truncate a vector table entirely, call DELETE /v1.0/tenants/[tenant-guid]/documents?truncate with a JSON object containing the globally unique identifier of the vector repository. This operation is destructive and cannot be undone.

{
    "VectorRepositoryGUID": "example-vector-repository"
}
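Both delete variants can be prepared with one helper, sketched below with the Python standard library. The helper `build_delete_request`, the base URL, and the Content-Type header are assumptions, and authentication is omitted because this section does not describe it. The requests are only constructed, not sent, since either call is destructive.

```python
import json
from urllib.request import Request


def build_delete_request(base_url: str, tenant_guid: str, body: dict,
                         truncate: bool = False) -> Request:
    """Build (but do not send) a DELETE request for documents or a full truncate."""
    url = f"{base_url}/v1.0/tenants/{tenant_guid}/documents"
    if truncate:
        # The truncate variant is selected by the bare query parameter.
        url += "?truncate"
    return Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, body is JSON
        method="DELETE",
    )


# Hypothetical base URL; the bodies mirror the examples in this section.
filtered = build_delete_request(
    "http://localhost:8000",
    "default",
    {"VectorRepositoryGUID": "example-vector-repository", "Key": "2.txt", "Version": "1"},
)
truncating = build_delete_request(
    "http://localhost:8000",
    "default",
    {"VectorRepositoryGUID": "example-vector-repository"},
    truncate=True,
)
print(filtered.get_method(), filtered.full_url)
print(truncating.get_method(), truncating.full_url)
```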