This page covers configuration and management of View crawl plan objects.
Object Overview
Crawl plans provide a mapping of a data repository to a crawl schedule and a crawl filter, indicating to View the parameters by which a data repository should be crawled.
Endpoint, URL, and Supported Methods
Objects are managed via the crawler server API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/crawlplans
By default, the crawler server is accessible on port 8101.
Supported methods include: GET, HEAD, PUT, and DELETE.
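As a rough illustration of the URL layout described above, the following Python sketch builds the collection URL and the per-object URL. The function name `crawlplans_url` and the constant `DEFAULT_BASE_URL` are hypothetical helpers, not part of View, and the host and port are assumptions to adjust for your deployment.

```python
from urllib.parse import quote

# Assumption: local host with the default crawler port stated above.
DEFAULT_BASE_URL = "http://localhost:8101"


def crawlplans_url(tenant_guid, crawlplan_guid=None, base_url=DEFAULT_BASE_URL):
    """Return the crawl plan collection URL, or the object URL when a GUID is given."""
    url = f"{base_url.rstrip('/')}/v1.0/tenants/{quote(tenant_guid, safe='')}/crawlplans"
    if crawlplan_guid is not None:
        # Percent-encode the GUID so unexpected characters cannot alter the path.
        url += "/" + quote(crawlplan_guid, safe="")
    return url


print(crawlplans_url("default"))
print(crawlplans_url("default", "4292118d-3397-4090-88c6-90f1886a3e35"))
```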
Structure
Objects have the following structure:
{
"GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
"TenantGUID": "default",
"DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
"CrawlScheduleGUID": "oneminute",
"CrawlFilterGUID": "default",
"MetadataRuleGUID": "example-metadata-rule",
"EmbeddingsRuleGUID": "crawler-embeddings-rule",
"Name": "Local files",
"EnumerationDirectory": "./enumerations/",
"EnumerationsToRetain": 16,
"MaxDrainTasks": 4,
"ProcessAdditions": true,
"ProcessDeletions": true,
"ProcessUpdates": true,
"CreatedUtc": "2024-10-23T15:14:26.000000Z"
}
Properties:
GUID (string): globally unique identifier for the object
TenantGUID (string): globally unique identifier for the tenant
DataRepositoryGUID (string): globally unique identifier for the data repository
CrawlScheduleGUID (string): globally unique identifier for the crawl schedule
CrawlFilterGUID (string): globally unique identifier for the crawl filter
MetadataRuleGUID (string): globally unique identifier for the metadata rule
EmbeddingsRuleGUID (string): globally unique identifier for the embeddings rule
Name (string): the name of the object
EnumerationDirectory (string): directory in which previous enumerations of the repository are stored
EnumerationsToRetain (int): the number of enumerations to retain
MaxDrainTasks (int): the maximum number of objects to emit in parallel
ProcessAdditions (bool): whether new files should be processed
ProcessDeletions (bool): whether deleted files should be processed
ProcessUpdates (bool): whether updated files should be processed
CreatedUtc (datetime): creation timestamp, in UTC
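For client code, the properties above could be modeled roughly as follows. This is a minimal sketch, not an official SDK type: the `CrawlPlan` dataclass and `parse_crawl_plan` function are hypothetical names, and the sketch assumes every property is present and non-null in the response.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CrawlPlan:
    GUID: str
    TenantGUID: str
    DataRepositoryGUID: str
    CrawlScheduleGUID: str
    CrawlFilterGUID: str
    MetadataRuleGUID: str
    EmbeddingsRuleGUID: str
    Name: str
    EnumerationDirectory: str
    EnumerationsToRetain: int
    MaxDrainTasks: int
    ProcessAdditions: bool
    ProcessDeletions: bool
    ProcessUpdates: bool
    CreatedUtc: datetime


def parse_crawl_plan(text):
    """Decode a crawl plan JSON document into a CrawlPlan instance."""
    data = json.loads(text)
    # Replace the trailing "Z" so fromisoformat also works on older Python versions.
    data["CreatedUtc"] = datetime.fromisoformat(data["CreatedUtc"].replace("Z", "+00:00"))
    return CrawlPlan(**data)


sample = """{
  "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
  "TenantGUID": "default",
  "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
  "CrawlScheduleGUID": "oneminute",
  "CrawlFilterGUID": "default",
  "MetadataRuleGUID": "example-metadata-rule",
  "EmbeddingsRuleGUID": "crawler-embeddings-rule",
  "Name": "Local files",
  "EnumerationDirectory": "./enumerations/",
  "EnumerationsToRetain": 16,
  "MaxDrainTasks": 4,
  "ProcessAdditions": true,
  "ProcessDeletions": true,
  "ProcessUpdates": true,
  "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}"""

plan = parse_crawl_plan(sample)
print(plan.Name, plan.EnumerationsToRetain, plan.CreatedUtc.isoformat())
```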
Create
To create, call PUT /v1.0/tenants/[tenant-guid]/crawlplans with the following properties using the configuration server:
DataRepositoryGUID
CrawlScheduleGUID
CrawlFilterGUID
MetadataRuleGUID
EmbeddingsRuleGUID
EnumerationDirectory
EnumerationsToRetain
MaxDrainTasks
ProcessAdditions
ProcessDeletions
ProcessUpdates
curl -X PUT http://localhost:8601/v1.0/tenants/[tenant-guid]/crawlplans \
-H "Content-Type: application/json" \
-H "Authorization: Bearer [accesskey]" \
-d '
{
"DataRepositoryGUID": "e9068089-4c90-4ef7-b4bb-bafccb771a9c",
"CrawlScheduleGUID": "default",
"CrawlFilterGUID": "default",
"MetadataRuleGUID": "example-metadata-rule",
"EmbeddingsRuleGUID": "example-embeddings-rule",
"Name": "My crawl plan",
"EnumerationDirectory": "./enumerations/",
"EnumerationsToRetain": 30,
"MaxDrainTasks": 4,
"ProcessAdditions": true,
"ProcessDeletions": true,
"ProcessUpdates": true
}'
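The same create call could be issued from Python roughly as follows, using only the standard library. This is a sketch under assumptions: `build_create_request` is a hypothetical helper, the base URL and access key are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_create_request(base_url, tenant_guid, access_key, plan):
    """Build (but do not send) the PUT request that creates a crawl plan."""
    url = f"{base_url}/v1.0/tenants/{tenant_guid}/crawlplans"
    return urllib.request.Request(
        url,
        data=json.dumps(plan).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_key}",
        },
    )


plan = {
    "DataRepositoryGUID": "e9068089-4c90-4ef7-b4bb-bafccb771a9c",
    "CrawlScheduleGUID": "default",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "example-embeddings-rule",
    "Name": "My crawl plan",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 30,
    "MaxDrainTasks": 4,
    "ProcessAdditions": True,
    "ProcessDeletions": True,
    "ProcessUpdates": True,
}

# Placeholder host, tenant, and key; substitute values for your deployment.
request = build_create_request("http://localhost:8601", "default", "my-access-key", plan)
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     created = json.load(response)
```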
Enumerate
Refer to the Enumeration page in REST API for details about the use of enumeration APIs.
Enumerate objects by using GET /v2.0/tenants/[tenant-guid]/crawlplans. The resultant object will appear as:
{
"Success": true,
"Timestamp": {
"Start": "2024-10-21T02:36:37.677751Z",
"TotalMs": 23.58,
"Messages": {}
},
"MaxResults": 10,
"IterationsRequired": 1,
"EndOfResults": true,
"RecordsRemaining": 16,
"Objects": [
{
"GUID": "example-crawlplan",
... crawlplan details ...
},
{ ... }
],
"ContinuationToken": "[continuation-token]"
}
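A client that wants every crawl plan may need to follow the paging fields in this response. The sketch below shows one way to loop until EndOfResults is true. It deliberately leaves the HTTP call to a caller-supplied function, `fetch_page` (a hypothetical name), because how the continuation token is passed back to the server is described on the Enumeration page rather than here.

```python
def iter_crawl_plans(fetch_page):
    """Yield every crawl plan, calling fetch_page(token) until the results end.

    fetch_page is a caller-supplied function that performs the GET and returns
    the decoded JSON page. It receives None for the first page and the previous
    ContinuationToken afterwards.
    """
    token = None
    while True:
        page = fetch_page(token)
        if not page.get("Success", False):
            raise RuntimeError("enumeration request was not successful")
        yield from page.get("Objects", [])
        token = page.get("ContinuationToken")
        # Stop when the server says so, or when there is no token to continue with.
        if page.get("EndOfResults", True) or not token:
            break


# Canned pages standing in for server responses, keyed by the token that requests them.
pages = {
    None: {"Success": True, "EndOfResults": False,
           "Objects": [{"GUID": "plan-1"}], "ContinuationToken": "t1"},
    "t1": {"Success": True, "EndOfResults": True,
           "Objects": [{"GUID": "plan-2"}], "ContinuationToken": None},
}

guids = [plan["GUID"] for plan in iter_crawl_plans(pages.__getitem__)]
print(guids)
```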
Read
To read an object by GUID, call GET /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid]. If the object exists, it will be returned as a JSON object in the response body. If it does not exist, a 404 will be returned with a NotFound error response.
{
"GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
"TenantGUID": "default",
"DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
"CrawlScheduleGUID": "oneminute",
"CrawlFilterGUID": "default",
"MetadataRuleGUID": "example-metadata-rule",
"EmbeddingsRuleGUID": "crawler-embeddings-rule",
"Name": "Local files",
"EnumerationDirectory": "./enumerations/",
"EnumerationsToRetain": 16,
"MaxDrainTasks": 4,
"ProcessAdditions": true,
"ProcessDeletions": true,
"ProcessUpdates": true,
"CreatedUtc": "2024-10-23T15:14:26.000000Z"
}
Note: the HEAD method can be used as an alternative to GET to simply check whether the object exists. HEAD requests return 200/OK if the object exists, or 404/Not Found if it does not. No response body is returned with a HEAD request.
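The existence check can be sketched in Python as below. The helper names `build_head_request` and `crawl_plan_exists` are hypothetical, the host and key are placeholders, and the sketch assumes the same Bearer authorization header used in the create example also applies to HEAD requests.

```python
import urllib.error
import urllib.request


def build_head_request(base_url, tenant_guid, crawlplan_guid, access_key):
    """Build (but do not send) a HEAD request for a single crawl plan."""
    url = f"{base_url}/v1.0/tenants/{tenant_guid}/crawlplans/{crawlplan_guid}"
    return urllib.request.Request(
        url, method="HEAD", headers={"Authorization": f"Bearer {access_key}"}
    )


def crawl_plan_exists(request, opener=urllib.request.urlopen):
    """Return True on 200, False on 404, and re-raise any other HTTP error."""
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise


head_request = build_head_request(
    "http://localhost:8601", "default", "4292118d-3397-4090-88c6-90f1886a3e35", "my-access-key"
)
print(head_request.get_method(), head_request.full_url)

# Against a running server: exists = crawl_plan_exists(head_request)
```

The `opener` parameter exists only so the function can be exercised without a live server; in normal use the default `urllib.request.urlopen` is sufficient.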
Update
To update an object by GUID, call PUT /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid] with a fully populated object in the request body. The updated object will be returned to you.
Note: certain fields cannot be modified and will be preserved across updates.
Request body:
{
"GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
"TenantGUID": "default",
"DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
"CrawlScheduleGUID": "oneminute",
"CrawlFilterGUID": "default",
"MetadataRuleGUID": "example-metadata-rule",
"EmbeddingsRuleGUID": "crawler-embeddings-rule",
"Name": "My updated local files",
"EnumerationDirectory": "./enumerations/",
"EnumerationsToRetain": 16,
"MaxDrainTasks": 4,
"ProcessAdditions": true,
"ProcessDeletions": true,
"ProcessUpdates": true,
"CreatedUtc": "2024-10-23T15:14:26.000000Z"
}
Response body:
{
"GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
"TenantGUID": "default",
"DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
"CrawlScheduleGUID": "oneminute",
"CrawlFilterGUID": "default",
"MetadataRuleGUID": "example-metadata-rule",
"EmbeddingsRuleGUID": "crawler-embeddings-rule",
"Name": "My updated local files",
"EnumerationDirectory": "./enumerations/",
"EnumerationsToRetain": 16,
"MaxDrainTasks": 4,
"ProcessAdditions": true,
"ProcessDeletions": true,
"ProcessUpdates": true,
"CreatedUtc": "2024-10-23T15:14:26.000000Z"
}
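Because the update expects a fully populated object, one workable pattern is to start from the object as last read and overlay only the changed fields. The sketch below illustrates that pattern; `build_update_request` is a hypothetical helper, the host and key are placeholders, and the request is built but not sent.

```python
import json
import urllib.request


def build_update_request(base_url, access_key, current, **changes):
    """Build the PUT for an update from the object as last read plus the changes."""
    updated = {**current, **changes}  # keep every field, override only what changed
    # The URL is taken from the object as read, so changes cannot retarget the request.
    url = f"{base_url}/v1.0/tenants/{current['TenantGUID']}/crawlplans/{current['GUID']}"
    return urllib.request.Request(
        url,
        data=json.dumps(updated).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_key}",
        },
    )


current = {
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "Local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": True,
    "ProcessDeletions": True,
    "ProcessUpdates": True,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z",
}

update_request = build_update_request(
    "http://localhost:8601", "my-access-key", current, Name="My updated local files"
)
print(update_request.get_method(), update_request.full_url)
```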
Delete
To delete an object by GUID, call DELETE /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid].