Overview

Ingestion is the process by which View consumes objects, then processes and prepares them for use by AI and other metadata, graph, or vector-driven applications.

View crawlers enable ingestion of data-at-rest where it resides in storage repositories throughout the enterprise, whether on-premises or in the cloud. View crawlers currently support crawling local filesystems, CIFS, NFS, S3 object storage, Azure Blob repositories, and web content (accessible via HTTP or HTTPS).

The datarepository object defines where data resides, and includes a number of properties related to the network endpoint (e.g., hostname) and the credentials used to access the data. A crawlfilter object specifies what data to crawl, and a crawlschedule object specifies when the repository should be crawled.

A crawlplan ties together a data repository, crawl filter, and crawl schedule to specify what repository should be crawled, when it should be crawled, and which data assets should be retrieved from that repository. Further, the crawl plan specifies where resultant objects should be emitted for further processing.
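The relationship between these four objects can be sketched as plain Python data classes. This is only an illustrative model and not the View SDK API: every class name, field name, and value below (for example `include_suffixes`, `interval_minutes`, and `emit_target`) is a hypothetical stand-in chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DataRepository:
    """Where the data resides (all fields are hypothetical)."""
    guid: str
    repository_type: str  # e.g. "S3", "NFS", "CIFS"
    hostname: str
    access_key: Optional[str] = None


@dataclass
class CrawlFilter:
    """What data to crawl (hypothetical suffix-based filter)."""
    guid: str
    include_suffixes: Tuple[str, ...] = ()

    def matches(self, object_key: str) -> bool:
        # An empty suffix list is treated as "match everything".
        return not self.include_suffixes or object_key.endswith(self.include_suffixes)


@dataclass
class CrawlSchedule:
    """When the repository should be crawled (hypothetical interval)."""
    guid: str
    interval_minutes: int


@dataclass
class CrawlPlan:
    """Ties a repository, filter, and schedule together by identifier."""
    guid: str
    data_repository_guid: str
    crawl_filter_guid: str
    crawl_schedule_guid: str
    emit_target: str  # where resultant objects are emitted for further processing


repo = DataRepository(guid="repo-1", repository_type="S3", hostname="s3.example.com")
pdf_filter = CrawlFilter(guid="filter-1", include_suffixes=(".pdf", ".docx"))
hourly = CrawlSchedule(guid="schedule-1", interval_minutes=60)

plan = CrawlPlan(
    guid="plan-1",
    data_repository_guid=repo.guid,
    crawl_filter_guid=pdf_filter.guid,
    crawl_schedule_guid=hourly.guid,
    emit_target="processing-pipeline",
)
```

In this sketch the plan refers to the other objects by identifier instead of embedding them, so one repository, filter, or schedule could be shared by several plans. Whether View stores the relationship this way is an assumption.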

JavaScript SDK Setup

Install SDK from npm

npm install view-sdk

Initialize Crawler SDK

import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "00000000-0000-0000-0000-000000000000", // tenant ID
  "default", // access token
  "http://localhost:8000/" // endpoint
);

Python SDK Setup

Install SDK from pip

pip install view-sdk

Initialize SDK Configuration

import view_sdk
from view_sdk import crawler

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost",
    tenant_guid="00000000-0000-0000-0000-000000000000",
)