This page provides an overview of the types of objects you will encounter while configuring, managing, operating, and integrating View.
Tenants and Nodes
View is a natively multi-tenant system with two top-level objects: tenant and node. A tenant represents an operational, security, and data boundary within a deployment, meaning a single physical deployment can host multiple isolated virtual deployments. A node represents an individual backend microservice, such as storage, lexi, or crawler. Nodes perform specific functions within the deployment and are not explicitly tied to a given tenant.
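For instance, a deployment's tenants and nodes can be enumerated over its REST API. The sketch below is a minimal illustration; the routes, headers, and property names are assumptions rather than documented endpoints.

```python
import requests

BASE_URL = "http://localhost:8000"                 # assumed deployment endpoint
HEADERS = {"Authorization": "Bearer <accesskey>"}  # assumed auth scheme

# List tenants and nodes; these routes are illustrative assumptions.
tenants = requests.get(f"{BASE_URL}/v1.0/tenants", headers=HEADERS).json()
nodes = requests.get(f"{BASE_URL}/v1.0/nodes", headers=HEADERS).json()

for tenant in tenants:
    print("tenant:", tenant.get("GUID"), tenant.get("Name"))
for node in nodes:
    print("node:", node.get("GUID"), node.get("Type"))  # e.g. storage, lexi, crawler
```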
Users and Credentials
Configuring View and ingesting data requires that a user authenticate with a credential. The user is associated with a tenant, and a credential is associated with a user. The credential object contains both an accesskey and a secretkey.
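As a rough sketch of how these objects relate, the following creates a user under a tenant and then a credential for that user; every route and field name here is a hypothetical illustration, not the documented API.

```python
import requests

BASE_URL = "http://localhost:8000"                   # assumed endpoint
HEADERS = {"Authorization": "Bearer <admin-token>"}  # assumed admin auth

# Hypothetical payloads; field names are illustrative only.
user = requests.post(
    f"{BASE_URL}/v1.0/tenants/<tenant-guid>/users",
    headers=HEADERS,
    json={"FirstName": "Ada", "Email": "ada@example.com"},
).json()

credential = requests.post(
    f"{BASE_URL}/v1.0/tenants/<tenant-guid>/credentials",
    headers=HEADERS,
    json={"UserGUID": user.get("GUID")},             # credential tied to the user
).json()

print(credential.get("AccessKey"), credential.get("SecretKey"))
```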
Processing Rules
Ingested data is processed according to two primary processing rules: a metadatarule object, which specifies how metadata is generated and where it is stored, and an embeddings rule, which specifies how embeddings are generated and where they are stored. Like other objects, these are tenant-specific; alongside their configuration properties, they reference other data repositories by GUID and certain services by URL.
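To make the shape of these rules concrete, the sketch below shows what such payloads might look like; all field names are assumptions for illustration, not the actual schema.

```python
# Hypothetical rule payloads; every field name is illustrative only.
metadata_rule = {
    "Name": "default-metadata-rule",
    "TenantGUID": "<tenant-guid>",                    # rules are tenant-specific
    "GraphRepositoryGUID": "<graph-repo-guid>",       # repository referenced by GUID
    "DataCatalogEndpoint": "http://localhost:8201/",  # service referenced by URL
}

embeddings_rule = {
    "Name": "default-embeddings-rule",
    "TenantGUID": "<tenant-guid>",
    "VectorRepositoryGUID": "<vector-repo-guid>",
    "EmbeddingsGeneratorUrl": "http://localhost:8301/",
}
```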
Processing Repositories
View uses multiple repositories to store processed and prepared data in the form of metadata, graph representations, and embeddings. Lexi is a data catalog and search platform that stores sourcedocument objects inside of collections. Source documents are built using Universal Data Representation (UDR), a key step in the processing pipeline. A graphrepository object contains configuration and metadata for graphs, in which relationships among source data and metadata are stored. Finally, a vectorrepository object contains configuration-related information about vector repositories.
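The repository objects themselves are small configuration records; the sketch below is purely illustrative and the field names are assumptions.

```python
# Hypothetical repository configuration objects; field names are illustrative.
graph_repository = {
    "Name": "primary-graph",
    "RepositoryType": "LiteGraph",        # graph backend named on this page
    "Endpoint": "http://localhost:8701/",
}

vector_repository = {
    "Name": "primary-vectors",
    "RepositoryType": "pgvector",         # default vector persistence per this page
    "Dimensionality": 384,                # assumed embedding dimensionality
}
```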
Ingestion (S3, REST)
Ingestion refers to the process of making data available for processing by View. When ingesting through S3 or REST, a storagepool must first be created, defining where data is physically stored. A bucket is then created, mapping to a storage pool, and once created, objects can be uploaded into the bucket.
View provides a complete native object storage API and a separate interface for using the S3 API.
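Because an S3-compatible interface is exposed, a standard S3 client can upload objects once a storage pool and bucket exist. The sketch below uses boto3; the endpoint URL is an assumption and should be replaced with your deployment's S3 interface.

```python
import boto3

# Assumed S3-compatible endpoint exposed by the View deployment.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8002",
    aws_access_key_id="<accesskey>",       # the credential's access key
    aws_secret_access_key="<secretkey>",   # the credential's secret key
)

# The bucket must already exist and map to a storage pool.
with open("report.pdf", "rb") as f:
    s3.put_object(Bucket="my-bucket", Key="report.pdf", Body=f)
```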
Ingestion (Crawler)
Objects can also be made available for processing by using View crawlers. A crawler is pointed at a datarepository object describing the source (e.g., local file system, CIFS, NFS, S3, or Azure BLOB). A crawlschedule defines how frequently a given operation should run, and is a reusable object that can be applied across multiple jobs. A crawlfilter defines which objects should be processed, and like crawl schedules, is reusable across jobs.
A crawlplan ties these together, defining which data repository is crawled, on what schedule, and using what filter. Once created, jobs run according to the defined schedule. Each invocation of a crawl plan produces a crawloperation object, which provides details and statistics about the state of the operation.
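A rough sketch of how these crawler objects fit together follows; the field names are hypothetical and only meant to show the relationships between the objects.

```python
# Hypothetical crawler objects; all field names are illustrative only.
data_repository = {
    "Name": "team-share",
    "RepositoryType": "CIFS",              # e.g. local, CIFS, NFS, S3, Azure BLOB
    "Hostname": "fileserver",
    "Share": "documents",
}

crawl_schedule = {"Name": "nightly", "IntervalMinutes": 1440}            # reusable
crawl_filter = {"Name": "office-docs", "IncludeExtensions": [".docx", ".pdf"]}

# The crawl plan ties repository, schedule, and filter together by GUID.
crawl_plan = {
    "Name": "nightly-team-share",
    "DataRepositoryGUID": "<data-repo-guid>",
    "CrawlScheduleGUID": "<schedule-guid>",
    "CrawlFilterGUID": "<filter-guid>",
}
```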
Metadata Search
Metadata searches are performed against sourcedocument objects (containing Universal Data Representation, or UDR, metadata) stored inside of collections within Lexi. These source documents are generated during data processing and persisted inside of collections according to the supplied metadata rule.
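A metadata search might look like the sketch below; the route, query schema, and property names are assumptions, not the documented Lexi API.

```python
import requests

# Hypothetical Lexi search request; route and schema are assumptions.
response = requests.post(
    "http://localhost:8201/v1.0/tenants/<tenant-guid>/collections/<collection-guid>/search",
    headers={"Authorization": "Bearer <accesskey>"},
    json={"MaxResults": 10, "Filter": {"Key": "ContentType", "Value": "application/pdf"}},
)

for doc in response.json().get("SourceDocuments", []):
    print(doc.get("GUID"), doc.get("ObjectKey"))
```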
Vector Search
Vector search is performed against View Vector, which by default uses pgvector for persistence. Vectors are automatically generated during data processing and persisted as embeddingsdocument objects within View Vector, alongside metadata that indicates the source of the data, its relative position, hash information, and the original chunk data from which the vectors were created.
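A vector search request might resemble the sketch below, where the query embeddings come from your own model; the endpoint, payload shape, and property names are assumptions.

```python
import requests

# Hypothetical vector search; endpoint and payload shape are assumptions.
response = requests.post(
    "http://localhost:8301/v1.0/tenants/<tenant-guid>/vectorrepositories/<repo-guid>/search",
    headers={"Authorization": "Bearer <accesskey>"},
    json={
        "SearchType": "CosineSimilarity",      # assumed similarity metric name
        "Embeddings": [0.012, -0.094, 0.301],  # query vector from your model
        "MaxResults": 5,
    },
)

for doc in response.json():
    print(doc.get("Score"), doc.get("Content"))  # chunk text stored alongside the vector
```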
Data Processing
Data processing involves a series of steps including type detection, semantic cell extraction, generation of embeddings, persistence within the data catalog (Lexi), persistence within graph storage (LiteGraph), and persistence within vector storage (pgvector). The entire processing pipeline runs as a dataflow within Orchestrator; alternatively, you can build your own processing pipelines running on Orchestrator or outside of it.
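The ordering of those stages can be pictured with the stub pipeline below; the function bodies are trivial placeholders, and none of the names correspond to actual View APIs.

```python
# Illustrative outline of the stages described above; each stub stands in
# for a real pipeline step, and the names are placeholders, not View APIs.
def detect_type(obj: bytes) -> str:
    return "application/pdf" if obj.startswith(b"%PDF") else "application/octet-stream"

def extract_semantic_cells(obj: bytes, doc_type: str) -> list[str]:
    return [obj.decode(errors="ignore")]       # real extraction is type-aware

def generate_embeddings(cells: list[str]) -> list[list[float]]:
    return [[float(len(c))] for c in cells]    # stand-in for a real embeddings model

def process(obj: bytes) -> None:
    doc_type = detect_type(obj)                    # 1. type detection
    cells = extract_semantic_cells(obj, doc_type)  # 2. semantic cell extraction
    vectors = generate_embeddings(cells)           # 3. embeddings generation
    # 4-6. persist to Lexi (catalog), LiteGraph (graph), and pgvector (vectors)
    print(doc_type, len(cells), len(vectors))

process(b"%PDF-1.7 example")
```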
Orchestration
View Orchestrator is a function-as-a-service (FaaS) platform that runs code as independent units based on the invocation of a trigger. Orchestrator currently supports functions written in C# (net8.0) and Python (3.9+).
Within Orchestrator, a trigger is the means by which an operation is invoked. Currently, HTTP triggers are supported, allowing you to specify the HTTP method and URL that must match. A dataflow is a decision tree of independent steps, where each step is an independent unit of code. The dataflow object contains a starting step and a dataflowmap, which specifies which step to invoke next based on the success, failure, or exception of the preceding step.
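To visualize the trigger, dataflow, and step relationship, a hypothetical definition follows; the field names illustrate the concepts above and are not the actual Orchestrator schema.

```python
# Hypothetical trigger and dataflow definitions; field names are illustrative.
trigger = {"Type": "HTTP", "HttpMethod": "POST", "Url": "/process"}

dataflow = {
    "Name": "custom-pipeline",
    "TriggerGUID": "<trigger-guid>",
    "StartStepGUID": "<detect-type-step>",
    # The map decides the next step based on the preceding step's outcome.
    "DataFlowMap": {
        "<detect-type-step>": {
            "Success": "<extract-cells-step>",
            "Failure": "<notify-step>",
            "Exception": "<notify-step>",
        },
        "<extract-cells-step>": {
            "Success": "<generate-embeddings-step>",
            "Failure": "<notify-step>",
            "Exception": "<notify-step>",
        },
    },
}
```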
Assistant
View Assistant is both a built-in conversational AI experience with industry-leading retrieval augmented generation (RAG) capabilities and an API platform for building rich conversational experiences using data processed and prepared by View.
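As a sketch of what a call to that API platform might look like (the route and payload are assumptions, not documented endpoints):

```python
import requests

# Hypothetical Assistant chat request; route and payload are assumptions.
response = requests.post(
    "http://localhost:8401/v1.0/tenants/<tenant-guid>/assistant/chat",
    headers={"Authorization": "Bearer <accesskey>"},
    json={
        "Question": "Summarize last quarter's incident reports.",
        "CollectionGUID": "<collection-guid>",  # scope retrieval to one collection
        "MaxResults": 5,                        # number of chunks used for RAG
    },
)
print(response.json().get("Answer"))
```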