This page provides an overview of crawler-related APIs.
Ingestion is the process by which View consumes objects, processes them, and prepares them for use by AI and other metadata-, graph-, or vector-driven applications.
View crawlers enable ingestion of data-at-rest as it resides in storage repositories throughout the enterprise, whether they are on-premises or in the cloud. View crawlers currently support crawling local filesystems, CIFS, NFS, S3 object storage, and Azure BLOB repositories.
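To make "crawling data-at-rest" concrete, here is a minimal sketch of what enumerating a local filesystem repository involves: walk the tree and report each object's key and size. This is not View's crawler implementation. The function name `crawl_local` and the choice to report forward-slash paths relative to the root are assumptions made for illustration only.

```python
import os
from typing import Iterator, Tuple


def crawl_local(root: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical helper: yield (relative_key, size_in_bytes) for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Report keys relative to the repository root with "/" separators,
            # similar to how object keys look in S3 or Azure BLOB containers.
            key = os.path.relpath(full_path, root).replace(os.sep, "/")
            yield key, os.path.getsize(full_path)
```

For example, `sorted(crawl_local("/data"))` lists every file under `/data` with its size. A network or cloud repository (CIFS, NFS, S3, Azure BLOB) would need a different enumeration mechanism, but the output, a stream of object keys plus metadata, is the kind of listing a crawl filter is then applied to.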
The datarepository object defines where data resides and includes a number of properties related to the network endpoint (e.g. hostname) and the credentials used to access the data. A crawlfilter object specifies what data to crawl, and a crawlschedule specifies when the repository should be crawled.
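The division of responsibilities among these three objects can be sketched as plain data structures. The class and field names below (`repository_type`, `prefix`, `suffixes`, `min_size`, `max_size`, `interval_minutes`, and so on) are hypothetical and simplified; they are not View's actual schema. The point is only to show that the repository answers "where" (endpoint and credentials), the filter answers "what", and the schedule answers "when".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataRepository:
    """Where data resides (hypothetical fields, not View's schema)."""
    name: str
    repository_type: str             # illustrative values: "local", "cifs", "nfs", "s3", "azure-blob"
    hostname: Optional[str] = None   # network endpoint; None for a local filesystem
    username: Optional[str] = None   # credentials used to access the data
    password: Optional[str] = None


@dataclass
class CrawlFilter:
    """What to crawl: objects matching a key prefix, allowed suffixes, and size bounds."""
    prefix: str = ""
    suffixes: List[str] = field(default_factory=list)  # empty list accepts any suffix
    min_size: int = 0
    max_size: Optional[int] = None                      # None means no upper bound

    def matches(self, key: str, size: int) -> bool:
        if not key.startswith(self.prefix):
            return False
        if self.suffixes and not any(key.endswith(s) for s in self.suffixes):
            return False
        if size < self.min_size:
            return False
        return self.max_size is None or size <= self.max_size


@dataclass
class CrawlSchedule:
    """When to crawl: once every interval_minutes minutes."""
    name: str
    interval_minutes: int
```

With `CrawlFilter(prefix="docs/", suffixes=[".pdf"], max_size=1000)`, the key `docs/a.pdf` of size 10 matches, while `img/a.pdf`, `docs/a.txt`, or a 2000-byte `docs/b.pdf` do not.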
A crawlplan ties together a data repository, crawl filter, and crawl schedule to specify which repository should be crawled, when it should be crawled, and which data assets should be retrieved from it. The crawl plan also specifies where the resulting objects should be emitted for further processing.
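A minimal sketch of how a crawl plan might combine these pieces is shown below, assuming a simplified model in which the filter is a predicate on `(key, size)`, the schedule is a fixed `timedelta` interval, and the emit destination is a callback. All names (`CrawlPlan`, `accept`, `emit`, `is_due`, `run`) are hypothetical; View's crawler runs plans server-side and its real behavior may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class CrawlPlan:
    """Hypothetical plan: repository (where), filter (what), schedule (when), emit (destination)."""
    repository: str                     # name of the data repository to crawl
    accept: Callable[[str, int], bool]  # crawl filter as a predicate on (key, size)
    interval: timedelta                 # crawl schedule as a fixed interval
    emit: Callable[[str], None]         # where matching objects are sent for processing
    last_run: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        """Due if the plan has never run or the interval has elapsed since the last run."""
        return self.last_run is None or now - self.last_run >= self.interval

    def run(self, listing: Iterable[Tuple[str, int]], now: datetime) -> int:
        """Emit each listed object that passes the filter; return how many were emitted."""
        if not self.is_due(now):
            return 0
        emitted = 0
        for key, size in listing:
            if self.accept(key, size):
                self.emit(key)
                emitted += 1
        self.last_run = now
        return emitted
```

For instance, a plan with `interval=timedelta(hours=1)` that runs at midnight will skip a run attempted at 00:30 and crawl again at 01:00. Passing the listing in, rather than having the plan enumerate the repository itself, keeps the sketch independent of any particular storage backend.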