This page covers configuration and management of View crawl plan objects.

Object Overview

A crawl plan maps a data repository to a crawl schedule and a crawl filter, telling View the parameters by which that data repository should be crawled.

Endpoint, URL, and Supported Methods

Objects are managed via the crawler server API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/crawlplans

By default, the crawler server is accessible on port 8101.

Supported methods include: GET, HEAD, PUT, DELETE
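As an illustration of the URL layout above, the following sketch assembles the collection URL and the per-object URL. The function name crawlplans_url and its parameters are hypothetical helpers, not part of View; the default port of 8101 is taken from the text above.

```python
from urllib.parse import quote


def crawlplans_url(hostname, tenant_guid, crawlplan_guid=None,
                   scheme="http", port=8101, version="v1.0"):
    """Build the crawl plan collection URL, or a single-object URL when a GUID is given.

    Hypothetical helper: it only mirrors the URL pattern shown on this page.
    """
    base = (
        f"{scheme}://{hostname}:{port}/{version}"
        f"/tenants/{quote(tenant_guid, safe='')}/crawlplans"
    )
    if crawlplan_guid is None:
        return base
    return f"{base}/{quote(crawlplan_guid, safe='')}"


collection_url = crawlplans_url("localhost", "default")
object_url = crawlplans_url(
    "localhost", "default", "4292118d-3397-4090-88c6-90f1886a3e35"
)
```

The GUID values are percent-encoded so that an unexpected character in an identifier cannot change the path structure.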

Structure

Objects have the following structure:

{
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "Local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}

Properties:

  • GUID (string): globally unique identifier for the object
  • TenantGUID (string): globally unique identifier for the tenant
  • DataRepositoryGUID (string): globally unique identifier for the data repository
  • CrawlScheduleGUID (string): globally unique identifier for the crawl schedule
  • CrawlFilterGUID (string): globally unique identifier for the crawl filter
  • MetadataRuleGUID (string): globally unique identifier for the metadata rule
  • EmbeddingsRuleGUID (string): globally unique identifier for the embeddings rule
  • Name (string): the name of the object
  • EnumerationDirectory (string): directory in which previous enumerations of the repository are stored
  • EnumerationsToRetain (int): the number of enumerations to retain
  • MaxDrainTasks (int): the maximum number of objects to emit in parallel
  • ProcessAdditions (bool): whether new files should be processed
  • ProcessDeletions (bool): whether deleted files should be processed
  • ProcessUpdates (bool): whether updated files should be processed
  • CreatedUtc (datetime): creation timestamp, in UTC

Create

To create a crawl plan, call PUT /v1.0/tenants/[tenant-guid]/crawlplans using the configuration server, with the following properties in the request body:

  • DataRepositoryGUID
  • CrawlScheduleGUID
  • CrawlFilterGUID
  • MetadataRuleGUID
  • EmbeddingsRuleGUID
  • EnumerationDirectory
  • EnumerationsToRetain
  • MaxDrainTasks
  • ProcessAdditions
  • ProcessDeletions
  • ProcessUpdates

curl -X PUT http://localhost:8601/v1.0/tenants/[tenant-guid]/crawlplans \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer [accesskey]" \
     -d '
{
    "DataRepositoryGUID": "e9068089-4c90-4ef7-b4bb-bafccb771a9c",
    "CrawlScheduleGUID": "default",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "example-embeddings-rule",
    "Name": "My crawl plan",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 30,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true
}'
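The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: build_create_request is a hypothetical helper, and the tenant GUID "default" and the access key "my-access-key" are placeholder assumptions. The host, port, headers, and body mirror the curl example above.

```python
import json
import urllib.request


def build_create_request(base_url, tenant_guid, access_key, plan):
    """Return a PUT request that would create a crawl plan (hypothetical helper)."""
    url = f"{base_url}/v1.0/tenants/{tenant_guid}/crawlplans"
    return urllib.request.Request(
        url,
        data=json.dumps(plan).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_key}",
        },
    )


plan = {
    "DataRepositoryGUID": "e9068089-4c90-4ef7-b4bb-bafccb771a9c",
    "CrawlScheduleGUID": "default",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "example-embeddings-rule",
    "Name": "My crawl plan",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 30,
    "MaxDrainTasks": 4,
    "ProcessAdditions": True,
    "ProcessDeletions": True,
    "ProcessUpdates": True,
}

# Placeholder tenant GUID and access key; substitute real values before sending.
request = build_create_request("http://localhost:8601", "default", "my-access-key", plan)

# The request is only built here. To send it against a running server:
#     with urllib.request.urlopen(request) as response:
#         body = response.read()
```

Building the request separately from sending it makes it easy to inspect the URL, method, and payload before anything reaches the server.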

Enumerate

Refer to the Enumeration page in REST API for details about the use of enumeration APIs.

Enumerate objects by calling GET /v2.0/tenants/[tenant-guid]/crawlplans. The response will appear as:

{
    "Success": true,
    "Timestamp": {
        "Start": "2024-10-21T02:36:37.677751Z",
        "TotalMs": 23.58,
        "Messages": {}
    },
    "MaxResults": 10,
    "IterationsRequired": 1,
    "EndOfResults": true,
    "RecordsRemaining": 16,
    "Objects": [
        {
            "GUID": "example-crawlplan",
            ... crawlplan details ...
        },
        { ... }
    ],
    "ContinuationToken": "[continuation-token]"
}
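A paged enumeration can be consumed with a loop along the following lines. This is a sketch under stated assumptions: iter_crawl_plans and the fetch_page callable are hypothetical, and how the continuation token is passed back to the server is deliberately left to fetch_page, since that detail is covered on the Enumeration page rather than here. Only the Success, Objects, EndOfResults, and ContinuationToken fields from the sample response are used.

```python
def iter_crawl_plans(fetch_page):
    """Yield every crawl plan across all pages of an enumeration.

    fetch_page(token) is a caller-supplied function (an assumption of this
    sketch) that returns one parsed enumeration response as a dict. It is
    called with None for the first page and with the previous
    ContinuationToken afterwards.
    """
    token = None
    while True:
        page = fetch_page(token)
        if not page.get("Success", False):
            raise RuntimeError("enumeration request reported failure")
        yield from page.get("Objects", [])
        token = page.get("ContinuationToken")
        # Stop when the server signals the end, or when there is no token to continue with.
        if page.get("EndOfResults", True) or not token:
            return


# Stand-in for real HTTP calls: two canned pages keyed by continuation token.
fake_pages = {
    None: {
        "Success": True,
        "EndOfResults": False,
        "Objects": [{"GUID": "plan-a"}, {"GUID": "plan-b"}],
        "ContinuationToken": "token-1",
    },
    "token-1": {
        "Success": True,
        "EndOfResults": True,
        "Objects": [{"GUID": "plan-c"}],
        "ContinuationToken": None,
    },
}

all_guids = [plan["GUID"] for plan in iter_crawl_plans(fake_pages.__getitem__)]
```

Injecting fetch_page keeps the paging logic independent of any particular HTTP library and makes it testable without a server.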

Read

To read an object by GUID, call GET /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid]. If the object exists, it will be returned as a JSON object in the response body. If it does not exist, a 404 will be returned with a NotFound error response.

{
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "Local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}

Note: the HEAD method can be used as an alternative to GET when you only need to check whether the object exists. A HEAD request returns 200/OK if the object exists, or 404/Not Found if it does not. No response body is returned with a HEAD request.
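The existence check described above might be sketched as follows. The function names are hypothetical, and sending a Bearer token in the Authorization header is an assumption carried over from the create example. Any status other than 200 or 404 is treated as an error rather than guessed at.

```python
import urllib.error
import urllib.request


def exists_from_status(status):
    """Map a HEAD response status to an existence flag: 200 is True, 404 is False."""
    if status == 200:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"unexpected status for HEAD request: {status}")


def crawl_plan_exists(url, access_key):
    """Send a HEAD request to a crawl plan URL and report whether the object exists."""
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"Authorization": f"Bearer {access_key}"},  # assumed auth scheme
    )
    try:
        with urllib.request.urlopen(request) as response:
            return exists_from_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx and 5xx responses, including 404.
        return exists_from_status(err.code)
```

Calling crawl_plan_exists requires a reachable server; exists_from_status can be exercised on its own.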

Update

To update an object by GUID, call PUT /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid] with a fully populated object in the request body. The updated object will be returned to you.

Note: certain fields cannot be modified and will be preserved across updates.

Request body:

{
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "My updated local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}

Response body:

{
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "My updated local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}
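Because the update call expects a fully populated object, one common pattern is to read the current object, apply the changes locally, and send the whole result back. The sketch below shows only the local merge step; build_update_body is a hypothetical helper, and rejecting unknown property names is a design choice of this sketch rather than documented server behavior.

```python
import copy


def build_update_body(current, changes):
    """Return a fully populated update body: the current object with changes applied.

    Raises KeyError if changes names a property the current object does not have,
    which helps catch typos before the request is sent.
    """
    unknown = set(changes) - set(current)
    if unknown:
        raise KeyError(f"unknown properties: {sorted(unknown)}")
    body = copy.deepcopy(current)  # leave the caller's copy untouched
    body.update(changes)
    return body


current_plan = {
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "Local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": True,
    "ProcessDeletions": True,
    "ProcessUpdates": True,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z",
}

update_body = build_update_body(current_plan, {"Name": "My updated local files"})
```

The resulting update_body matches the request body shown above and would be sent with PUT to the object's URL.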

Delete

To delete an object by GUID, call DELETE /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid].