This page covers configuration and management of View crawl plan objects.

Object Overview

Crawl plans map a data repository to a crawl schedule and a crawl filter, indicating to View the parameters by which that repository should be crawled.

Endpoint, URL, and Supported Methods

Objects are managed via the crawler server API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/crawlplans

Supported methods: GET, HEAD, PUT, DELETE

Structure

Objects have the following structure:

{
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "Local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}

Properties:

  • GUID (GUID): globally unique identifier for the object
  • TenantGUID (GUID): globally unique identifier for the tenant
  • DataRepositoryGUID (GUID): globally unique identifier for the data repository
  • CrawlScheduleGUID (GUID): globally unique identifier for the crawl schedule
  • CrawlFilterGUID (GUID): globally unique identifier for the crawl filter
  • MetadataRuleGUID (GUID): globally unique identifier for the metadata rule
  • EmbeddingsRuleGUID (GUID): globally unique identifier for the embeddings rule
  • Name (string): the name of the object
  • EnumerationDirectory (string): directory in which previous enumerations of the repository are stored
  • EnumerationsToRetain (int): the number of enumerations to retain
  • MaxDrainTasks (int): the maximum number of objects to emit in parallel
  • ProcessAdditions (bool): whether new files should be processed
  • ProcessDeletions (bool): whether deleted files should be processed
  • ProcessUpdates (bool): whether updated files should be processed
  • CreatedUtc (datetime): timestamp of creation, in UTC
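
When working with crawl plans in typed code, it can help to model the structure above explicitly. The following is a minimal TypeScript interface sketch mirroring the documented properties; the interface name and comments are illustrative and not part of the View SDK:

interface CrawlPlan {
  GUID: string;                 // globally unique identifier for the object
  TenantGUID: string;           // tenant identifier
  DataRepositoryGUID: string;   // data repository to crawl
  CrawlScheduleGUID: string;    // schedule governing when crawls run
  CrawlFilterGUID: string;      // filter applied during the crawl
  MetadataRuleGUID: string;     // metadata rule to apply
  EmbeddingsRuleGUID: string;   // embeddings rule to apply
  Name: string;                 // display name of the crawl plan
  EnumerationDirectory: string; // where prior enumerations are stored
  EnumerationsToRetain: number; // number of enumerations to keep
  MaxDrainTasks: number;        // maximum objects emitted in parallel
  ProcessAdditions: boolean;    // process newly added files
  ProcessDeletions: boolean;    // process deleted files
  ProcessUpdates: boolean;      // process updated files
  CreatedUtc: string;           // creation timestamp, in UTC
}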

Create

To create a crawl plan, call PUT /v1.0/tenants/[tenant-guid]/crawlplans against the crawler server with the following properties: DataRepositoryGUID, CrawlScheduleGUID, CrawlFilterGUID, MetadataRuleGUID, EmbeddingsRuleGUID, Name, EnumerationDirectory, EnumerationsToRetain, MaxDrainTasks, ProcessAdditions, ProcessDeletions, ProcessUpdates

curl -X PUT http://localhost:8601/v1.0/tenants/[tenant-guid]/crawlplans \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer [accesskey]" \
     -d '
{
    "DataRepositoryGUID": "e9068089-4c90-4ef7-b4bb-bafccb771a9c",
    "CrawlScheduleGUID": "default",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "example-embeddings-rule",
    "Name": "My crawl plan",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 30,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true
}'
import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "00000000-0000-0000-0000-000000000000", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);

const createCrawlPlan = async () => {
  try {
    const response = await crawler.createCrawlPlan({
      DataRepositoryGUID: "00000000-0000-0000-0000-000000000000",
      CrawlScheduleGUID: "00000000-0000-0000-0000-000000000000",
      CrawlFilterGUID: "00000000-0000-0000-0000-000000000000",
      Name: "My crawl plan [ASH]",
      EnumerationDirectory: "./enumerations/",
      EnumerationsToRetain: 30,
      MetadataRuleGUID: "00000000-0000-0000-0000-000000000000",
      ProcessingEndpoint:
        "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing",
      ProcessingAccessKey: "default",
      CleanupEndpoint:
        "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/cleanup",
      CleanupAccessKey: "default",
    });
    console.log(response, "Crawl plan created successfully");
  } catch (err) {
    console.log("Error creating Crawl plan:", err);
  }
};

createCrawlPlan();

Enumerate

Refer to the Enumeration page in the REST API documentation for details about the use of enumeration APIs.

Enumerate objects by calling GET /v2.0/tenants/[tenant-guid]/crawlplans. The resulting object will appear as:

{
    "Success": true,
    "Timestamp": {
        "Start": "2024-10-21T02:36:37.677751Z",
        "TotalMs": 23.58,
        "Messages": {}
    },
    "MaxResults": 10,
    "IterationsRequired": 1,
    "EndOfResults": true,
    "RecordsRemaining": 16,
    "Objects": [
        {
            "GUID": "example-crawlplan",
            ... crawlplan details ...
        },
        { ... }
    ],
    "ContinuationToken": "[continuation-token]"
}
curl --location 'http://view.homedns.org:8000/v2.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "00000000-0000-0000-0000-000000000000", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);

const enumerateCrawlPlans = async () => {
  try {
    const response = await crawler.enumerateCrawlPlans();
    console.log(response, "Crawl plans fetched successfully");
  } catch (err) {
    console.log("Error fetching Crawl plans:", err);
  }
};

enumerateCrawlPlans();
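
When EndOfResults is false, the returned ContinuationToken can be supplied on a follow-up request to fetch the next page of results. The sketch below illustrates that loop with plain fetch; the max-keys and token query parameter names are assumptions here, so consult the Enumeration page for the parameters the API actually accepts.

// Illustrative pagination loop. The max-keys and token query parameter
// names are assumptions; see the Enumeration page for the real parameters.
const endpoint =
  "http://localhost:8000/v2.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/";
const headers = { Authorization: "Bearer default" };

const enumerateAllCrawlPlans = async () => {
  const objects: unknown[] = [];
  let token: string | null = null;
  while (true) {
    const url = token
      ? `${endpoint}?max-keys=10&token=${token}`
      : `${endpoint}?max-keys=10`;
    const response = await fetch(url, { headers });
    const page = await response.json();
    objects.push(...page.Objects);  // accumulate this page's crawl plans
    if (page.EndOfResults) break;   // no further pages remain
    token = page.ContinuationToken; // request the next page
  }
  return objects;
};

enumerateAllCrawlPlans().then((plans) => console.log(plans.length, "crawl plans"));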

Read

To read an object by GUID, call GET /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid]. If the object exists, it will be returned as a JSON object in the response body. If it does not exist, a 404 will be returned with a NotFound error response.

{
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "Local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "00000000-0000-0000-0000-000000000000", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);

const readCrawlPlan = async () => {
  try {
    const response = await crawler.retrieveCrawlPlan(
      "418cd284-4a30-4a9b-9e2a-b36645cbc6d7"
    );
    console.log(response, "Crawl plan fetched successfully");
  } catch (err) {
    console.log("Error fetching Crawl plan:", err);
  }
};

readCrawlPlan();

Note: the HEAD method can be used as an alternative to GET to simply check for the existence of the object. HEAD requests return either a 200/OK if the object exists, or a 404/Not Found if it does not. No response body is returned with a HEAD request.

Read all

To read all objects, call GET /v1.0/tenants/[tenant-guid]/crawlplans/. Matching objects will be returned as an array of JSON objects in the response body.

curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "default", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);

const readAllCrawlPlans = async () => {
  try {
    const response = await crawler.retrieveCrawlPlans();
    console.log(response, "All crawl plans fetched successfully");
  } catch (err) {
    console.log("Error fetching All crawl plans:", err);
  }
};

readAllCrawlPlans();


Update

To update an object by GUID, call PUT /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid] with a fully populated object in the request body. The updated object will be returned to you.

Note: certain fields cannot be modified and will be preserved across updates.

Request body:

{
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "My updated local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}
curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/00000000-0000-0000-0000-000000000000' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "DataRepositoryGUID": "00000000-0000-0000-0000-000000000000",
    "CrawlScheduleGUID": "00000000-0000-0000-0000-000000000000",
    "CrawlFilterGUID": "00000000-0000-0000-0000-000000000000",
    "Name": "My updated crawl plan",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 30,
    "MetadataRuleGUID": "00000000-0000-0000-0000-000000000000",
    "ProcessingEndpoint": "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing",
    "ProcessingAccessKey": "default",
    "CleanupEndpoint": "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/cleanup",
    "CleanupAccessKey": "default"
}'
import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "00000000-0000-0000-0000-000000000000", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);


const updateCrawlPlan = async () => {
  try {
    const response = await crawler.updateCrawlPlan({
      GUID: "418cd284-4a30-4a9b-9e2a-b36645cbc6d7",
      TenantGUID: "00000000-0000-0000-0000-000000000000",
      DataRepositoryGUID: "2dc3ae2f-200c-4f5f-8c5a-9bedd7b6447c",
      CrawlScheduleGUID: "00000000-0000-0000-0000-000000000001",
      CrawlFilterGUID: "00000000-0000-0000-0000-000000000000",
      MetadataRuleGUID: "00000000-0000-0000-0000-000000000000",
      EmbeddingsRuleGUID: "00000000-0000-0000-0000-000000000001",
      Name: "Traeger Recipe Forums [UPDATED]",
      EnumerationDirectory: "./enumerations/",
      EnumerationsToRetain: 16,
      MaxDrainTasks: 4,
      ProcessAdditions: true,
      ProcessDeletions: true,
      ProcessUpdates: true,
      CreatedUtc: "2025-03-25T21:50:09.230321Z",
    });
    console.log(response, "Crawl plan updated successfully");
  } catch (err) {
    console.log("Error updating Crawl plan:", err);
  }
};

updateCrawlPlan();

Response body:

{
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "My updated local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}
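
Because the update call expects a fully populated object, a common pattern is read-modify-write: fetch the existing crawl plan, change only what you need, and submit the whole object back so the remaining fields are preserved. Below is a minimal sketch using the SDK calls shown above, assuming retrieveCrawlPlan resolves to the plain crawl plan object.

import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "00000000-0000-0000-0000-000000000000", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);

const renameCrawlPlan = async (guid: string, newName: string) => {
  try {
    // Read the full object first, then write it back with one field changed.
    const existing = await crawler.retrieveCrawlPlan(guid);
    const response = await crawler.updateCrawlPlan({
      ...existing,   // carry over every other field unchanged
      Name: newName, // the only modification
    });
    console.log(response, "Crawl plan renamed successfully");
  } catch (err) {
    console.log("Error renaming Crawl plan:", err);
  }
};

renameCrawlPlan("4292118d-3397-4090-88c6-90f1886a3e35", "My updated local files");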

Delete

To delete an object by GUID, call DELETE /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid].

curl --location --request DELETE 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••' 
import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "00000000-0000-0000-0000-000000000000", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);


const deleteCrawlPlan = async () => {
  try {
    const response = await crawler.deleteCrawlPlan(
      "418cd284-4a30-4a9b-9e2a-b36645cbc6d7"
    );
    console.log(response, "Crawl plan deleted successfully");
  } catch (err) {
    console.log("Error deleting Crawl plan:", err);
  }
};
deleteCrawlPlan();

Check Existence

To check whether an object exists, call HEAD /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid]. HEAD requests return either a 200/OK if the object exists, or a 404/Not Found if it does not. No response body is returned with a HEAD request.

curl --location --head 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "00000000-0000-0000-0000-000000000000", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);

const existsCrawlPlan = async () => {
  try {
    const response = await crawler.existsCrawlPlan(
      "418cd284-4a30-4a9b-9e2a-b36645cbc6d7"
    );
    console.log(response, "Crawl plan exists");
  } catch (err) {
    console.log("Error checking Crawl plan:", err);
  }
};

existsCrawlPlan();