Source Documents

This page provides an overview of APIs related to source documents.

Object Overview

Source documents contain metadata about processed objects and are stored inside collections within Lexi.

Endpoint, URL, and Supported Methods

Source documents are managed via the Lexi server API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/collections/[collection-guid]/documents

Supported methods include: GET, HEAD, PUT, and DELETE.
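
As a minimal sketch of how the URL pattern above fits together, the following Python helper assembles it from its parts. The function name `documents_url` and its parameters are hypothetical and are not part of any View SDK; the host, port, and GUID values are placeholders.

```python
from urllib.parse import quote


def documents_url(scheme, hostname, port, tenant_guid, collection_guid, document_guid=None):
    """Build a source-documents URL following the pattern on this page.

    Hypothetical helper for illustration only. Path segments are
    percent-encoded so unusual identifiers cannot break the path.
    """
    url = (
        f"{scheme}://{hostname}:{port}/v1.0"
        f"/tenants/{quote(tenant_guid, safe='')}"
        f"/collections/{quote(collection_guid, safe='')}"
        f"/documents"
    )
    if document_guid is not None:
        # Append the document GUID when addressing a single document.
        url += f"/{quote(document_guid, safe='')}"
    return url


print(documents_url("http", "localhost", 8000, "default", "default"))
```

Passing a `document_guid` yields the per-document form used by the GET, HEAD, and DELETE calls described later on this page.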

Structure

Objects have the following structure:

{
    "GUID": "1fdbe0c8-8b85-4b0e-ac42-dd4757684a9f",
    "TenantGUID": "default",
    "BucketGUID": "example-data-bucket",
    "CollectionGUID": "default",
    "ObjectGUID": "f615ac92-d1d1-4b46-8cc5-acf721131067",
    "ObjectKey": "5.pdf",
    "ObjectVersion": "1",
    "ContentType": "application/pdf",
    "DocumentType": "Pdf",
    "SourceUrl": "http://dcc249eaaf06:8001/v1.0/tenants/default/buckets/example-data-bucket/objects/5.pdf",
    "ContentLength": 31811,
    "MD5Hash": "DC477A85FF3882BBFDEB03D7B79ECC9E",
    "SHA1Hash": "CC5D85073F193A578F97D46B8A6E4CE946270B5F",
    "SHA256Hash": "E5285C6023A46E4E8917C67CCB56B91FED2E578A7AA3129680012C029868B321",
    "CreatedUtc": "2024-10-25T14:14:22.000000Z"
}

Properties:

  • GUID GUID globally unique identifier for the source document
  • TenantGUID GUID globally unique identifier for the tenant
  • BucketGUID GUID globally unique identifier for the bucket where the object is stored
  • DataRepositoryGUID GUID globally unique identifier for the data repository where the object is stored
  • CollectionGUID GUID globally unique identifier for the collection
  • ObjectGUID GUID globally unique identifier for the object
  • ObjectKey string key for the object
  • ObjectVersion string version of the object
  • ContentType string content-type of the object
  • DocumentType enum document type of the object
  • SourceUrl string source URL from which the source object can be retrieved
  • ContentLength long content length of the source document
  • MD5Hash string the MD5 hash, as a hexadecimal string
  • SHA1Hash string the SHA1 hash, as a hexadecimal string
  • SHA256Hash string the SHA256 hash, as a hexadecimal string
  • CreatedUtc datetime timestamp of creation, in UTC time
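
The hash and length properties can be illustrated with a short Python sketch. It assumes the hashes are computed over the raw bytes of the source object and rendered as uppercase hexadecimal, as the sample values above suggest; the helper name `content_hashes` is hypothetical.

```python
import hashlib


def content_hashes(data: bytes) -> dict:
    """Compute length and hash fields in the style of the sample object.

    Assumption: hashes cover the raw object bytes and are uppercase hex,
    matching the example values shown on this page.
    """
    return {
        "ContentLength": len(data),
        "MD5Hash": hashlib.md5(data).hexdigest().upper(),
        "SHA1Hash": hashlib.sha1(data).hexdigest().upper(),
        "SHA256Hash": hashlib.sha256(data).hexdigest().upper(),
    }


fields = content_hashes(b"abc")
print(fields["ContentLength"], fields["MD5Hash"])
```

A client could compare these values against the fields of a retrieved source document to check that a downloaded object is intact.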

Create

To upload a source document, call PUT /v1.0/tenants/[tenant-guid]/collections/[collection-guid]/documents with a fully-populated source document. Attach the UDR document as a JSON object in the UdrDocument property at the top level of the source document.
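
A minimal Python sketch of the request body shape described above: the UDR document is nested under a top-level `UdrDocument` key of the source document. The helper name `build_upload_body` is hypothetical, and the field values are placeholders in the style of the examples on this page.

```python
import json


def build_upload_body(source_document: dict, udr_document: dict) -> str:
    """Return the JSON body for the PUT call, with the UDR document attached.

    Hypothetical helper for illustration; it copies the input so the
    caller's dictionary is left unchanged.
    """
    body = dict(source_document)
    body["UdrDocument"] = udr_document  # top-level property, as described above
    return json.dumps(body, indent=2)


body = build_upload_body(
    {
        "ObjectKey": "sample.json",
        "ObjectVersion": "1",
        "ContentType": "application/json",
    },
    {"Success": True, "Terms": ["foo", "bar", "baz"]},
)
print(body)
```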

Read

To read a source document by GUID, call GET /v1.0/tenants/[tenant-guid]/collections/[collection-guid]/documents/[document-guid]. If the object exists, it will be returned as a JSON object in the response body. If it does not exist, a 404 will be returned with a NotFound error response.

Note: the HEAD method can be used as an alternative to GET to simply check the existence of the object. HEAD requests return either a 200/OK if the object exists, or a 404/Not Found if it does not. No response body is returned with a HEAD request.

curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/collections/00000000-0000-0000-0000-000000000000/documents/fd937de1-480a-4db8-9025-c7ac0bd8d66c' \
--header 'Authorization: ••••••'
import { ViewLexiSdk } from "view-sdk";

const api = new ViewLexiSdk(
  "http://localhost:8000/", //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const retrieveSourceDocument = async () => {
  try {
    const response = await api.sourceDocumentSdk.read(
      "<collection-guid>",
      "<sourcedocument-guid>"
    );
    console.log(response, "SourceDocument fetched successfully");
  } catch (err) {
    console.log("Error fetching SourceDocument:", err);
  }
};

retrieveSourceDocument();
import view_sdk
from view_sdk import lexi

sdk = view_sdk.configure( access_key="default",base_url="localhost", tenant_guid= "<tenant-guid>")

def readSourceDocument():
    document = lexi.SourceDocument.retrieve("<collection-guid>", "<sourcedocument-guid>")
    print(document)

readSourceDocument()
using View.Sdk;
using View.Sdk.Lexi;

ViewLexiSdk sdk = new ViewLexiSdk(Guid.Parse("<tenant-guid>"),"default", "http://localhost:8000/");
            
SourceDocument sourceDocument = await sdk.RetrieveDocument(Guid.Parse("<collection-guid>"),Guid.Parse("<sourcedocument-guid>"));
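
The HEAD-based existence check mentioned in the note above can be sketched with the Python standard library alone. This does not use the View SDK; the function names are hypothetical, and the value of the Authorization header is an assumption because this page masks it.

```python
import urllib.error
import urllib.request


def build_head_request(url: str, authorization: str) -> urllib.request.Request:
    """Create a HEAD request for a source document URL.

    `authorization` is the full header value; its format is not shown
    on this page, so the caller must supply it.
    """
    return urllib.request.Request(
        url, method="HEAD", headers={"Authorization": authorization}
    )


def document_exists(url: str, authorization: str) -> bool:
    """Return True on 200, False on 404, and re-raise any other HTTP error."""
    try:
        with urllib.request.urlopen(build_head_request(url, authorization)) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


# Example (requires a reachable server, so it is left commented out):
# document_exists("http://localhost:8000/v1.0/tenants/<tenant-guid>/collections/<collection-guid>/documents/<document-guid>", "<authorization-value>")
```

Because a HEAD response carries no body, only the status code is inspected.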

Read with data

curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/collections/00000000-0000-0000-0000-000000000000/documents/fd937de1-480a-4db8-9025-c7ac0bd8d66c?incldata=null' \
--header 'Authorization: ••••••'
import { ViewLexiSdk } from "view-sdk";

const api = new ViewLexiSdk(
  "http://localhost:8000/", //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const retrieveSourceDocumentWithData = async () => {
  try {
    const response = await api.sourceDocumentSdk.read(
      "<collection-guid>",
      "<sourcedocument-guid>",
      true
    );
    console.log(response, "SourceDocument fetched successfully");
  } catch (err) {
    console.log("Error fetching SourceDocument:", err);
  }
};

retrieveSourceDocumentWithData();
import view_sdk
from view_sdk import lexi

sdk = view_sdk.configure( access_key="default",base_url="localhost", tenant_guid= "<tenant-guid>")

def readSourceDocument():
    document = lexi.SourceDocument.retrieve("<collection-guid>", "<sourcedocument-guid>",True)
    print(document)

readSourceDocument()

Read top terms

To read the top terms for a source document, call GET /v1.0/tenants/[tenant-guid]/collections/[collection-guid]/documents/[document-guid]/topterms?max-keys=10. If the object exists, its top terms will be returned as a JSON object in the response body. If it does not exist, a 404 will be returned with a NotFound error response.

curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/collections/00000000-0000-0000-0000-000000000000/documents/fd937de1-480a-4db8-9025-c7ac0bd8d66c/topterms?max-keys=10' \
--header 'Authorization: ••••••'
import { ViewLexiSdk } from "view-sdk";

const api = new ViewLexiSdk(
  "http://localhost:8000/", //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const retrieveSourceDocumentTopTerms = async () => {
  try {
    const response = await api.sourceDocumentSdk.readTopTerms(
      "<collection-guid>",
      "<sourcedocument-guid>"
    );
    console.log(response, "SourceDocument top terms fetched successfully");
  } catch (err) {
    console.log("Error fetching SourceDocument top terms:", err);
  }
};

retrieveSourceDocumentTopTerms();
import view_sdk
from view_sdk import lexi

sdk = view_sdk.configure( access_key="default",base_url="localhost", tenant_guid= "<tenant-guid>")

def readTopTerms():
    terms = lexi.SourceDocument.retrieve_top_terms("<collection-guid>", "<sourcedocument-guid>")
    print(terms)

readTopTerms()

Read statistics

To read statistics for a source document, call GET /v1.0/tenants/[tenant-guid]/collections/[collection-guid]/documents/[document-guid]?stats=null. If the object exists, its statistics will be returned as a JSON object in the response body. If it does not exist, a 404 will be returned with a NotFound error response.

curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/collections/00000000-0000-0000-0000-000000000000/documents/fd937de1-480a-4db8-9025-c7ac0bd8d66c?stats=null' \
--header 'Authorization: ••••••'
import { ViewLexiSdk } from "view-sdk";

const api = new ViewLexiSdk(
  "http://localhost:8000/", //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const retrieveSourceDocumentStatistics = async () => {
  try {
    const response = await api.sourceDocumentSdk.readStatistics(
      "<collection-guid>",
      "<sourcedocument-guid>"
    );
    console.log(response, "SourceDocument stats fetched successfully");
  } catch (err) {
    console.log("Error fetching SourceDocument stats:", err);
  }
};

retrieveSourceDocumentStatistics();
import view_sdk
from view_sdk import lexi

sdk = view_sdk.configure( access_key="default",base_url="localhost", tenant_guid= "<tenant-guid>")

def readSourceDocumentStatistics():
    statistics = lexi.SourceDocument.retrieve_statistics("<collection-guid>", "<sourcedocument-guid>")
    print(statistics)

readSourceDocumentStatistics()
using View.Sdk;
using View.Sdk.Lexi;

ViewLexiSdk sdk = new ViewLexiSdk(Guid.Parse("<tenant-guid>"),"default", "http://localhost:8000/");
            
SourceDocumentStatistics sourceDocumentStatistics = await sdk.RetrieveDocumentStatistics(Guid.Parse("<collection-guid>"),Guid.Parse("<sourcedocument-guid>"));

Read All

To read all source documents in a collection, call GET /v1.0/tenants/[tenant-guid]/collections/[collection-guid]/documents. This API returns a JSON array. If the collection does not exist, a 404 will be returned with a NotFound error response.

curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/collections/00000000-0000-0000-0000-000000000000/documents' \
--header 'Authorization: ••••••'
import { ViewLexiSdk } from "view-sdk";

const api = new ViewLexiSdk(
  "http://localhost:8000/", //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const retrieveSourceDocuments = async () => {
  try {
    const response = await api.sourceDocumentSdk.readAll(
      "<collection-guid>"
    );
    console.log(response, "SourceDocuments fetched successfully");
  } catch (err) {
    console.log("Error fetching SourceDocuments:", err);
  }
};

retrieveSourceDocuments();
import view_sdk
from view_sdk import lexi

sdk = view_sdk.configure( access_key="default",base_url="localhost", tenant_guid= "<tenant-guid>")

def readAllSourceDocuments():
    documents = lexi.SourceDocument.retrieve_all("<collection-guid>")
    print(documents)

readAllSourceDocuments()
using View.Sdk;
using View.Sdk.Lexi;

ViewLexiSdk sdk = new ViewLexiSdk(Guid.Parse("<tenant-guid>"),"default", "http://localhost:8000/");
            
List<SourceDocument> sourceDocuments = await sdk.RetrieveDocuments(Guid.Parse("<collection-guid>"));

Upload

To upload a document, call PUT /v1.0/tenants/[tenant-guid]/collections/[collection-guid]/documents with the source document as a JSON object in the request body. On success, the uploaded source document will be returned as a JSON object in the response body. If the collection does not exist, a 404 will be returned with a NotFound error response.

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/collections/00000000-0000-0000-0000-000000000000/documents' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data '{
  "TenantGUID": "00000000-0000-0000-0000-000000000000",
  "CollectionGUID": "00000000-0000-0000-0000-000000000000",
  "ObjectKey": "blake.json",
  "ObjectVersion": "1",
  "ObjectGUID": "00000000-0000-0000-0000-000000000000",
  "ContentType": "application/json",
  "DocumentType": "JSON",
  "SourceUrl": "http://localhost:9000/tenants/default/buckets/data/objects/sample.json",
  "UdrDocument": {
    "Success": true,
    "AdditionalData": "My additional data",
    "Metadata": {
      "Foo": "Bar"
    },
    "Key": "sample.json",
    "TypeResult": {
        "MimeType": "application/json",
        "Extension": "json",
        "Type": "Json"
    },
    "Terms": [
        "foo",
        "bar",
        "baz"
    ],
    "TopTerms": {
      "foo": 1,
      "bar": 1,
      "baz": 1
    },
    "Postings": [
      {
        "Term": "baz",
        "Count": 2,
        "AbsolutePositions": [
          0
        ],
        "RelativePositions": [
          0
        ]
      },
      {
        "Term": "foo",
        "Count": 2,
        "AbsolutePositions": [
          1
        ],
        "RelativePositions": [
          1
        ]
      },
      {
        "Term": "bar",
        "Count": 2,
        "AbsolutePositions": [
          2
        ],
        "RelativePositions": [
          2
        ]
      }
    ],
    "Schema": {
      "Type": "Json",
      "MaxDepth": 1,
      "NumObjects": 1,
      "NumArrays": 0,
      "NumKeyValues": 1,
      "Schema": {
        "root": "Object",
        "root.Message": "String"
      },
      "Metadata": {
        "Foo": "Bar"
      },
      "Flattened": [
        {
          "Key": "root",
          "Type": "Object"
        },
        {
          "Key": "root.Message",
          "Type": "String",
          "Data": "Your foo is bar baz!"
        }
      ]
    }
  }
}
'
import { ViewLexiSdk } from "view-sdk";

const api = new ViewLexiSdk(
  "http://localhost:8000/", //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const uploadSourceDocument = async () => {
  try {
    const response = await api.sourceDocumentSdk.upload({
      TenantGUID: "<tenant-guid>",
      CollectionGUID: "<collection-guid>",
      ObjectKey: "blake.json",
      ObjectVersion: "1",
      ObjectGUID: "<object-guid>",
      ContentType: "application/json",
      DocumentType: "JSON",
      SourceUrl:
        "http://localhost:9000/tenants/default/buckets/data/objects/sample.json",
      UdrDocument: {
        Success: true,
        AdditionalData: "My additional data",
        Metadata: {
          Foo: "Bar",
        },
        Key: "sample.json",
        TypeResult: {
          MimeType: "application/json",
          Extension: "json",
          Type: "Json",
        },
        Terms: ["foo", "bar", "baz"],
        TopTerms: {
          foo: 1,
          bar: 1,
          baz: 1,
        },
        Postings: [
          {
            Term: "baz",
            Count: 2,
            AbsolutePositions: [0],
            RelativePositions: [0],
          },
          {
            Term: "foo",
            Count: 2,
            AbsolutePositions: [1],
            RelativePositions: [1],
          },
          {
            Term: "bar",
            Count: 2,
            AbsolutePositions: [2],
            RelativePositions: [2],
          },
        ],
        Schema: {
          Type: "Json",
          MaxDepth: 1,
          NumObjects: 1,
          NumArrays: 0,
          NumKeyValues: 1,
          Schema: {
            root: "Object",
            "root.Message": "String",
          },
          Metadata: {
            Foo: "Bar",
          },
          Flattened: [
            {
              Key: "root",
              Type: "Object",
            },
            {
              Key: "root.Message",
              Type: "String",
              Data: "Your foo is bar baz!",
            },
          ],
        },
      },
    });
    console.log(response, "SourceDocument uploaded successfully");
  } catch (err) {
    console.log("Error uploading SourceDocument:", err);
  }
};

uploadSourceDocument();
import view_sdk
from view_sdk import lexi

sdk = view_sdk.configure( access_key="default",base_url="localhost", tenant_guid= "<tenant-guid>")

def uploadSourceDocument():
    document = lexi.SourceDocument.upload("<collection-guid>", "<object-guid>", "foo.txt")
    print(document)

uploadSourceDocument()
using View.Sdk;
using View.Sdk.Lexi;
using View.Sdk.Semantic;
using System.Collections.Generic;

ViewLexiSdk sdk = new ViewLexiSdk(Guid.Parse("<tenant-guid>"),"default", "http://localhost:8000/");
            
        SourceDocument sourceDocument = new SourceDocument
        {
            GUID = Guid.Parse("<sourcedocument-guid>"),
            TenantGUID = Guid.Parse("<tenant-guid>"),
            BucketGUID = Guid.Parse("<bucket-guid>"),
            CollectionGUID = Guid.Parse("<collection-guid>"),
            ObjectGUID = Guid.Parse("<object-guid>"),
            GraphRepositoryGUID = null,
            GraphNodeIdentifier = null,
            DataRepositoryGUID = null,
            DataFlowRequestGUID = null,
            DataFlowSuccess = null,
            ObjectKey = "10.pdf",
            ObjectVersion = "1",
            ContentType = "application/pdf",
            DocumentType = DocumentTypeEnum.Unknown,
            SourceUrl = "http://localhost:9000/tenants/default/buckets/sample/objects/10.pdf",
            ContentLength = 29096038,
            MD5Hash = "******************************0FCA",
            SHA1Hash = "*******************************************50F",
            SHA256Hash = "*******************************************************D3D",
            CreatedUtc = DateTime.UtcNow,
            ExpirationUtc = null,
            Score = new DocumentScore
            {
                Score = 0.85m,
                TermsScore = 0.9m,
                FiltersScore = 0.75m
            },
            UdrDocument = new UdrDocument
             {
                GUID = Guid.NewGuid(),
                Success = true,
                AdditionalData = "Parsed successfully.",
                Key = "sample.pdf:1",
                Type = DocumentTypeEnum.Pdf,
                Metadata = new Dictionary<string, object>
                {
                    { "Author", "John Doe" },
                    { "Category", "Medical" }
                },
                Terms = new List<string>
                {
                    "botox", "treatment", "dose", "patient",
                    "botox", "injection", "dose", "patient",
                    "botox"
                },
                Postings = new List<Posting>
                {
                    new Posting
                    {
                        Term = "botox",
                        AbsolutePositions = new List<long> { 5, 15, 25 },
                        RelativePositions = new List<long> { 0, 1, 2 }
                    },
                    new Posting
                    {
                        Term = "dose",
                        AbsolutePositions = new List<long> { 35, 55 },
                        RelativePositions = new List<long> { 3, 4 }
                    }
                },
                Schema = new SchemaResult
                {
                    Type = DocumentTypeEnum.Pdf,
                    Schema = new Dictionary<string, DataTypeEnum>
                    {
                        { "root", DataTypeEnum.Object },
                        { "root.Message", DataTypeEnum.String }
                    },

                    Metadata = new Dictionary<string, object>
                    {
                        { "KeyCount", 2 }
                    },
                },
                SemanticCells = new List<SemanticCell>
                {
                    new SemanticCell
                    {
                        GUID = Guid.NewGuid(),
                        CellType = SemanticCellTypeEnum.Text,
                        Position = 0,
                        Chunks = new List<SemanticChunk>
                        {
                            new SemanticChunk
                            {
                                GUID = Guid.NewGuid(),
                                Position = 0,
                                Start = 0,
                                End = 127,
                                Length = 128,
                                Content = "Botox treatment is effective." 
                            }
                        }
                    }
                }
            }
        };

SourceDocument result = await sdk.UploadDocument(sourceDocument);

Delete

To delete a source document by GUID, call DELETE /v1.0/tenants/[tenant-guid]/collections/[collection-guid]/documents/[document-guid].

curl --location --request DELETE 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/collections/00000000-0000-0000-0000-000000000000/documents/00000000-0000-0000-0000-000000000000' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••'
import { ViewLexiSdk } from "view-sdk";

const api = new ViewLexiSdk(
  "http://localhost:8000/", //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const deleteSourceDocument = async () => {
  try {
    const response = await api.sourceDocumentSdk.delete(
      "<collection-guid>",
      "<sourcedocument-guid>"
    );
    console.log(response, "SourceDocument deleted successfully");
  } catch (err) {
    console.log("Error deleting SourceDocument:", err);
  }
};

deleteSourceDocument();
import view_sdk
from view_sdk import lexi

sdk = view_sdk.configure( access_key="default",base_url="localhost", tenant_guid= "<tenant-guid>")
using View.Sdk;
using View.Sdk.Lexi;

ViewLexiSdk sdk = new ViewLexiSdk(Guid.Parse("<tenant-guid>"),"default", "http://localhost:8000/");
        
Guid CollectionGuid = Guid.Parse("<collection-guid>");
        
Guid SourceDocumentGuid = Guid.Parse("<sourcedocument-guid>");
        
bool deleted = await sdk.DeleteDocument(CollectionGuid, SourceDocumentGuid);

Delete by key and version

To delete a source document by key and version, call DELETE /v1.0/tenants/[tenant-guid]/collections/[collection-guid]/documents?key=blake.json&versionId=1

curl --location --request DELETE 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/collections/00000000-0000-0000-0000-000000000000/documents?key=blake.json&versionId=1' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' 
import { ViewLexiSdk } from "view-sdk";

const api = new ViewLexiSdk(
  "http://localhost:8000/", //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const deleteSourceDocumentFromKey = async () => {
  try {
    const response = await api.sourceDocumentSdk.deleteFromKey(
      "<collection-guid>",
      "blake.json", //key
      "1" //version
    );
    console.log(response, "SourceDocument deleted successfully");
  } catch (err) {
    console.log("Error deleting SourceDocument:", err);
  }
};

deleteSourceDocumentFromKey();
import view_sdk
from view_sdk import lexi

sdk = view_sdk.configure( access_key="default",base_url="localhost", tenant_guid= "<tenant-guid>")

def uploadSourceDocument():
    document = lexi.SourceDocument.create("00000000-0000-0000-0000-000000000000",
        TenantGUID="00000000-0000-0000-0000-000000000000",
        CollectionGUID="00000000-0000-0000-0000-000000000000",
        ObjectKey="blake.json",
        ObjectVersion="1",
        ObjectGUID="00000000-0000-0000-0000-000000000000",
        ContentType="application/json",
        DocumentType="JSON",
        SourceUrl="http://localhost:9000/tenants/default/buckets/data/objects/sample.json",
        UdrDocument={}
    )
    print(document)

uploadSourceDocument()
using View.Sdk;
using View.Sdk.Lexi;

ViewLexiSdk sdk = new ViewLexiSdk(Guid.Parse("<tenant-guid>"),"default", "http://localhost:8000/");
        
bool deleted = await sdk.DeleteDocument(Guid.Parse("<collection-guid>"), "10.pdf", "1");
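
Object keys can contain characters such as slashes or spaces that are not safe in a query string. The sketch below builds the delete-by-key URL with percent-encoding, using only the Python standard library. The helper name `delete_by_key_url` is hypothetical, and whether the server expects this exact encoding is an assumption.

```python
from urllib.parse import quote, urlencode


def delete_by_key_url(documents_url: str, key: str, version: str) -> str:
    """Append percent-encoded key and versionId parameters to a documents URL.

    Hypothetical helper; `documents_url` is the collection's documents
    endpoint without a trailing slash.
    """
    query = urlencode({"key": key, "versionId": version}, quote_via=quote)
    return f"{documents_url}?{query}"


base = "http://localhost:8000/v1.0/tenants/default/collections/default/documents"
print(delete_by_key_url(base, "reports/2024 summary.json", "1"))
```

A simple key such as blake.json passes through unchanged, so the result matches the URL form shown in this section.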
