Crawl Plans

Comprehensive guide to View's crawl plan management system, including automated data ingestion workflows, repository mapping, schedule integration, and processing configuration for efficient content discovery and data processing.

Overview

The View Crawl Plan management system provides comprehensive configuration for automated data ingestion workflows. Crawl plans serve as orchestration templates that map data repositories to crawl schedules and filters, defining the complete parameters for how data should be discovered, processed, and ingested into the View platform.

Key Features

  • Repository Mapping: Complete integration with data repositories for source data access
  • Schedule Integration: Automated execution based on configurable crawl schedules
  • Filter Application: Content filtering and size constraints through crawl filters
  • Processing Configuration: Integration with metadata and embeddings processing rules
  • Enumeration Management: Configurable enumeration storage and retention policies
  • Parallel Processing: Optimized parallel task execution for efficient data processing
  • Change Detection: Automated processing of additions, updates, and deletions
  • Workflow Orchestration: Complete automation of data ingestion and processing pipelines

Supported Operations

  • Create: Create new crawl plan configurations with repository and processing mappings
  • Read: Retrieve individual crawl plan configurations and metadata
  • Enumerate: List all crawl plans with pagination support
  • Update: Modify existing crawl plan configurations and settings
  • Delete: Remove crawl plan configurations and associated workflows
  • Existence Check: Verify crawl plan presence without retrieving details

API Endpoints

Crawl plans are managed via the Crawler server API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/crawlplans

Supported HTTP Methods: GET, HEAD, PUT, DELETE

Important: All crawl plan operations require appropriate authentication tokens.
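
To make the URL pattern and authentication concrete, here is a minimal sketch in Python using the requests library. The hostname, port, tenant GUID, and access key below are placeholder assumptions; substitute the values for your deployment.

import requests

# Placeholder deployment values -- substitute your own.
BASE_URL = "http://localhost:8601"   # [http|https]://[hostname]:[port]
TENANT_GUID = "00000000-0000-0000-0000-000000000000"
ACCESS_KEY = "default"

# Base endpoint for crawl plan operations.
url = f"{BASE_URL}/v1.0/tenants/{TENANT_GUID}/crawlplans"

# Every crawl plan operation carries an authentication token.
headers = {"Authorization": f"Bearer {ACCESS_KEY}"}

# List all crawl plans for the tenant.
response = requests.get(url, headers=headers)
print(response.status_code, response.json())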

Crawl Plan Object Structure

Crawl plan objects contain comprehensive configuration for automated data ingestion workflows. Here's the complete structure:

{
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "Local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}

Field Descriptions

  • GUID (GUID): Globally unique identifier for the crawl plan object
  • TenantGUID (GUID): Globally unique identifier for the tenant
  • DataRepositoryGUID (GUID): Globally unique identifier for the data repository to crawl
  • CrawlScheduleGUID (GUID): Globally unique identifier for the crawl schedule
  • CrawlFilterGUID (GUID): Globally unique identifier for the crawl filter
  • MetadataRuleGUID (GUID): Globally unique identifier for the metadata processing rule
  • EmbeddingsRuleGUID (GUID): Globally unique identifier for the embeddings processing rule
  • Name (string): Display name for the crawl plan
  • EnumerationDirectory (string): Directory path for storing previous enumerations
  • EnumerationsToRetain (integer): Number of enumeration snapshots to retain
  • MaxDrainTasks (integer): Maximum number of parallel processing tasks
  • ProcessAdditions (boolean): Whether to process newly added files
  • ProcessDeletions (boolean): Whether to process deleted files
  • ProcessUpdates (boolean): Whether to process updated files
  • CreatedUtc (datetime): UTC timestamp when the crawl plan was created
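
For readers who model the payload in code, the following is a minimal, illustrative Python representation of the fields above. It is a local convenience sketch, not a class provided by the View SDK; GUID values are kept as strings for simplicity.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CrawlPlanModel:
    # Identifiers (GUIDs represented as strings for simplicity).
    guid: str
    tenant_guid: str
    data_repository_guid: str
    crawl_schedule_guid: str
    crawl_filter_guid: str
    metadata_rule_guid: str
    embeddings_rule_guid: str
    # Display name and enumeration storage.
    name: str
    enumeration_directory: str = "./enumerations/"
    enumerations_to_retain: int = 16
    # Parallelism and change-detection behavior.
    max_drain_tasks: int = 4
    process_additions: bool = True
    process_deletions: bool = True
    process_updates: bool = True
    created_utc: datetime = field(default_factory=lambda: datetime.now(timezone.utc))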

Important Notes

  • Workflow Orchestration: Crawl plans define complete data ingestion workflows
  • Repository Integration: Plans must reference valid data repositories for source data access
  • Schedule Dependencies: Plans require valid crawl schedules for automated execution
  • Processing Rules: Integration with metadata and embeddings rules enables advanced data processing

Create Crawl Plan

Creates a new crawl plan configuration using PUT /v1.0/tenants/[tenant-guid]/crawlplans. This endpoint allows you to define automated data ingestion workflows by mapping data repositories to crawl schedules and filters.

Request Parameters

Required Parameters

  • DataRepositoryGUID (string, Body, Required): GUID of the data repository to crawl
  • CrawlScheduleGUID (string, Body, Required): GUID of the crawl schedule for execution timing
  • CrawlFilterGUID (string, Body, Required): GUID of the crawl filter for content filtering
  • Name (string, Body, Required): Display name for the crawl plan

Optional Parameters

  • MetadataRuleGUID (string, Body, Optional): GUID of the metadata processing rule
  • EmbeddingsRuleGUID (string, Body, Optional): GUID of the embeddings processing rule
  • EnumerationDirectory (string, Body, Optional): Directory path for storing enumerations (defaults to "./enumerations/")
  • EnumerationsToRetain (integer, Body, Optional): Number of enumeration snapshots to retain (defaults to 16)
  • MaxDrainTasks (integer, Body, Optional): Maximum parallel processing tasks (defaults to 4)
  • ProcessAdditions (boolean, Body, Optional): Whether to process new files (defaults to true)
  • ProcessDeletions (boolean, Body, Optional): Whether to process deleted files (defaults to true)
  • ProcessUpdates (boolean, Body, Optional): Whether to process updated files (defaults to true)

Important Notes

  • Repository Dependencies: Ensure the data repository exists and is accessible before creating the plan
  • Schedule Configuration: Verify the crawl schedule is properly configured for your requirements
  • Filter Application: Configure appropriate crawl filters to optimize processing performance
  • Processing Rules: Set up metadata and embeddings rules for advanced data processing capabilities
curl -X PUT http://localhost:8601/v1.0/tenants/[tenant-guid]/crawlplans \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer [accesskey]" \
     -d '
{
    "DataRepositoryGUID": "e9068089-4c90-4ef7-b4bb-bafccb771a9c",
    "CrawlScheduleGUID": "default",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "example-embeddings-rule",
    "Name": "My crawl plan",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 30,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/" //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const createCrawlPlan = async () => {
  try {
    const response = await api.CrawlPlan.create({
      DataRepositoryGUID: "<datarepository-guid>",
      CrawlScheduleGUID: "<crawlschedule-guid>",
      CrawlFilterGUID: "<crawlfilter-guid>",
      Name: "My crawl plan [ASH]",
      EnumerationDirectory: "./enumerations/",
      EnumerationsToRetain: 30,
      MetadataRuleGUID: "<metadatarule-guid>",
      ProcessingEndpoint:
        "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing",
      ProcessingAccessKey: "default",
      CleanupEndpoint:
        "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/cleanup",
      CleanupAccessKey: "default",
    });
    console.log(response, "Crawl plan created successfully");
  } catch (err) {
    console.log("Error creating Crawl plan:", err);
  }
};

createCrawlPlan();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)


def createCrawlPlan():
    crawlPlan = crawler.CrawlPlan.create(
        DataRepositoryGUID="00000000-0000-0000-0000-000000000000",
        CrawlScheduleGUID="00000000-0000-0000-0000-000000000000",
        CrawlFilterGUID="00000000-0000-0000-0000-000000000000",
        Name="My crawl plan",
        EnumerationDirectory="./enumerations/",
        EnumerationsToRetain=30,
        MetadataRuleGUID="00000000-0000-0000-0000-000000000000",
        EmbeddingsRuleGUID="00000000-0000-0000-0000-000000000000",
        MaxDrainTasks=10,
        ProcessAdditions=True,
        ProcessDeletions=True,
        ProcessUpdates=True,
        ProcessingEndpoint="http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing",
        ProcessingAccessKey="default",
        CleanupEndpoint="http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/cleanup",
        CleanupAccessKey="default",
    )
    print(crawlPlan)

createCrawlPlan()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

CrawlPlan plan = new CrawlPlan
{
    DataRepositoryGUID = "<datarepository-guid>",
    CrawlScheduleGUID = "<crawlschedule-guid>",
    CrawlFilterGUID = "<crawlfilter-guid>",
    Name = "My crawl plan",
    EnumerationDirectory = "./enumerations/",
    EnumerationsToRetain = 30,
    MetadataRuleGUID = "<metadatarule-guid>",
    ProcessingEndpoint = "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing",
    ProcessingAccessKey = "default",
    CleanupEndpoint = "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/cleanup",
    CleanupAccessKey = "default"
};

CrawlPlan response = await sdk.CrawlPlan.Create(plan);

Response

Returns the created crawl plan object with all configuration details:

{
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "My crawl plan",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 30,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}

Enumerate Crawl Plans

Retrieves a paginated list of all crawl plan objects in the tenant using GET /v2.0/tenants/[tenant-guid]/crawlplans. This endpoint provides comprehensive enumeration with pagination support for managing multiple crawl plan configurations.

Request Parameters

No additional parameters required beyond authentication.

curl --location 'http://view.homedns.org:8000/v2.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/" //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const enumerateCrawlPlans = async () => {
  try {
    const response = await api.CrawlPlan.enumerate();
    console.log(response, "Crawl plans fetched successfully");
  } catch (err) {
    console.log("Error fetching Crawl plans:", err);
  }
};

enumerateCrawlPlans();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def enumerateCrawlPlans():
    crawlPlans = crawler.CrawlPlan.enumerate()
    print(crawlPlans)

enumerateCrawlPlans()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
EnumerationResult<CrawlPlan> response = await sdk.CrawlPlan.Enumerate();

Response Structure

The enumeration response includes pagination metadata and crawl plan objects with complete configuration details:

Response

Returns a paginated list of crawl plan objects:

{
    "Success": true,
    "Timestamp": {
        "Start": "2024-10-21T02:36:37.677751Z",
        "TotalMs": 23.58,
        "Messages": {}
    },
    "MaxResults": 10,
    "IterationsRequired": 1,
    "EndOfResults": true,
    "RecordsRemaining": 0,
    "Objects": [
        {
            "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
            "TenantGUID": "default",
            "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
            "CrawlScheduleGUID": "oneminute",
            "CrawlFilterGUID": "default",
            "MetadataRuleGUID": "example-metadata-rule",
            "EmbeddingsRuleGUID": "crawler-embeddings-rule",
            "Name": "Local files",
            "EnumerationDirectory": "./enumerations/",
            "EnumerationsToRetain": 16,
            "MaxDrainTasks": 4,
            "ProcessAdditions": true,
            "ProcessDeletions": true,
            "ProcessUpdates": true,
            "CreatedUtc": "2024-10-23T15:14:26.000000Z"
        }
    ],
    "ContinuationToken": null
}
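
As an illustration of how this envelope might be consumed, the sketch below (Python with the requests library; hostname, tenant GUID, and access key are placeholder assumptions) walks the Objects array and checks the pagination flags:

import requests

# Placeholder deployment values -- substitute your own.
BASE_URL = "http://view.homedns.org:8000"
TENANT_GUID = "00000000-0000-0000-0000-000000000000"
ACCESS_KEY = "default"

url = f"{BASE_URL}/v2.0/tenants/{TENANT_GUID}/crawlplans/"
headers = {"Authorization": f"Bearer {ACCESS_KEY}"}

result = requests.get(url, headers=headers).json()

# Each entry in Objects is a full crawl plan record as documented above.
for plan in result.get("Objects", []):
    print(plan["GUID"], plan["Name"], plan["CrawlScheduleGUID"])

# EndOfResults and ContinuationToken indicate whether further pages exist.
if not result.get("EndOfResults", True):
    print("More results remain; continuation token:", result.get("ContinuationToken"))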

Read Crawl Plan

Retrieves crawl plan configuration and metadata by GUID using GET /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid]. Returns the complete crawl plan configuration including repository mappings, schedule settings, and processing rules. If the plan doesn't exist, a 404 error is returned.

Request Parameters

  • crawlplan-guid (string, Path, Required): GUID of the crawl plan object to retrieve
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/" //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const readCrawlPlan = async () => {
  try {
    const response = await api.CrawlPlan.read(
      "<crawlplan-guid>"
    );
    console.log(response, "Crawl plan fetched successfully");
  } catch (err) {
    console.log("Error fetching Crawl plan:", err);
  }
};

readCrawlPlan();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def readCrawlPlan():
    crawlPlan = crawler.CrawlPlan.retrieve("<crawlplan-guid>")
    print(crawlPlan)

readCrawlPlan()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
CrawlPlan response = await sdk.CrawlPlan.Retrieve(Guid.Parse("<crawlPlan-guid>"));

Response

Returns the complete crawl plan configuration:

{
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "Local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}

Note: the HEAD method can be used as an alternative to GET to simply check for the existence of the object. HEAD requests return either a 200/OK if the object exists or a 404/Not Found if it does not. No response body is returned with a HEAD request.

Read All Crawl Plans

Retrieves all crawl plan objects in the tenant using GET /v1.0/tenants/[tenant-guid]/crawlplans/. Returns an array of crawl plan objects with complete configuration details for all plans in the tenant.

Request Parameters

No additional parameters required beyond authentication.

curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/" //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const readAllCrawlPlans = async () => {
  try {
    const response = await api.CrawlPlan.readAll();
    console.log(response, "All crawl plans fetched successfully");
  } catch (err) {
    console.log("Error fetching All crawl plans:", err);
  }
};

readAllCrawlPlans();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def readAllCrawlPlans():
    crawlPlans = crawler.CrawlPlan.retrieve_all()
    print(crawlPlans)

readAllCrawlPlans()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
List<CrawlPlan> response = await sdk.CrawlPlan.RetrieveMany();

Response

Returns an array of all crawl plan objects:

[
    {
        "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
        "TenantGUID": "default",
        "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
        "CrawlScheduleGUID": "oneminute",
        "CrawlFilterGUID": "default",
        "MetadataRuleGUID": "example-metadata-rule",
        "EmbeddingsRuleGUID": "crawler-embeddings-rule",
        "Name": "Local files",
        "EnumerationDirectory": "./enumerations/",
        "EnumerationsToRetain": 16,
        "MaxDrainTasks": 4,
        "ProcessAdditions": true,
        "ProcessDeletions": true,
        "ProcessUpdates": true,
        "CreatedUtc": "2024-10-23T15:14:26.000000Z"
    },
    {
        "GUID": "another-crawl-plan",
        "TenantGUID": "default",
        "DataRepositoryGUID": "another-repository-guid",
        "CrawlScheduleGUID": "hourly",
        "CrawlFilterGUID": "large-files-filter",
        "MetadataRuleGUID": "production-metadata-rule",
        "EmbeddingsRuleGUID": "production-embeddings-rule",
        "Name": "Production crawl plan",
        "EnumerationDirectory": "./enumerations/production/",
        "EnumerationsToRetain": 30,
        "MaxDrainTasks": 8,
        "ProcessAdditions": true,
        "ProcessDeletions": true,
        "ProcessUpdates": true,
        "CreatedUtc": "2024-10-24T10:30:15.123456Z"
    }
]

Update Crawl Plan

Updates an existing crawl plan configuration using PUT /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid]. This endpoint allows you to modify crawl plan parameters while preserving certain immutable fields.

Request Parameters

  • crawlplan-guid (string, Path, Required): GUID of the crawl plan object to update

Updateable Fields

All configuration parameters can be updated except for:

  • GUID: Immutable identifier
  • TenantGUID: Immutable tenant association
  • CreatedUtc: Immutable creation timestamp

Important Notes

  • Field Preservation: Certain fields cannot be modified and will be preserved across updates
  • Complete Object: Provide a fully populated object in the request body
  • Configuration Validation: All updated parameters will be validated before applying changes
  • Workflow Impact: Consider the impact of plan changes on existing crawl operations
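
Because the endpoint expects a fully populated object, a common pattern is read-modify-write: retrieve the current plan, change only the fields you need, and submit the whole object back. Below is a minimal sketch against the raw REST endpoints documented on this page; the hostname, GUIDs, and access key are placeholder assumptions.

import requests

# Placeholder deployment values -- substitute your own.
BASE_URL = "http://view.homedns.org:8000"
TENANT_GUID = "00000000-0000-0000-0000-000000000000"
PLAN_GUID = "00000000-0000-0000-0000-000000000000"
ACCESS_KEY = "default"

url = f"{BASE_URL}/v1.0/tenants/{TENANT_GUID}/crawlplans/{PLAN_GUID}"
headers = {"Authorization": f"Bearer {ACCESS_KEY}", "Content-Type": "application/json"}

# Read the current configuration so the request body is fully populated.
plan = requests.get(url, headers=headers).json()

# Change only what you need; immutable fields are preserved by the server.
plan["Name"] = "My updated crawl plan"
plan["EnumerationsToRetain"] = 30

# Submit the complete object back via PUT.
updated = requests.put(url, headers=headers, json=plan)
print(updated.status_code, updated.json())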

Request body:

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/00000000-0000-0000-0000-000000000000' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "DataRepositoryGUID": "00000000-0000-0000-0000-000000000000",
    "CrawlScheduleGUID": "00000000-0000-0000-0000-000000000000",
    "CrawlFilterGUID": "00000000-0000-0000-0000-000000000000",
    "Name": "My updated crawl plan",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 30,
    "MetadataRuleGUID": "00000000-0000-0000-0000-000000000000",
    "ProcessingEndpoint": "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing",
    "ProcessingAccessKey": "default",
    "CleanupEndpoint": "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/cleanup",
    "CleanupAccessKey": "default"
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/" //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);


const updateCrawlPlan = async () => {
  try {
    const response = await api.CrawlPlan.update({
      GUID: "<crawlplan-guid>",
      TenantGUID: "<tenant-guid>",
      DataRepositoryGUID: "<datarepository-guid>",
      CrawlScheduleGUID: "<crawlschedule-guid>",
      CrawlFilterGUID: "<crawlfilter-guid>",
      MetadataRuleGUID: "<metadatarule-guid>",
      EmbeddingsRuleGUID: "<embeddingsrule-guid>",
      Name: "Traeger Recipe Forums [UPDATED]",
      EnumerationDirectory: "./enumerations/",
      EnumerationsToRetain: 16,
      MaxDrainTasks: 4,
      ProcessAdditions: true,
      ProcessDeletions: true,
      ProcessUpdates: true,
      CreatedUtc: "2025-03-25T21:50:09.230321Z",
    });
    console.log(response, "Crawl plan updated successfully");
  } catch (err) {
    console.log("Error updating Crawl plan:", err);
  }
};

updateCrawlPlan();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def updateCrawlPlan():
    crawlPlan = crawler.CrawlPlan.update(
        "<crawlplan-guid>",
        DataRepositoryGUID="<datarepository-guid>",
        CrawlScheduleGUID="<crawlschedule-guid>",
        CrawlFilterGUID="<crawlfilter-guid>",
        Name="My crawl plan [updated]",
        EnumerationDirectory="./enumerations/",
        EnumerationsToRetain=30,
        MetadataRuleGUID="<metadatarule-guid>",
        ProcessingEndpoint="http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing",
        ProcessingAccessKey="default",
        CleanupEndpoint="http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/cleanup",
        CleanupAccessKey="default"
    )
    print(crawlPlan)

updateCrawlPlan()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

CrawlPlan plan = new CrawlPlan
{
    GUID = "<crawlplan-guid>",
    TenantGUID = "<tenant-guid>",
    DataRepositoryGUID = "<datarepository-guid>",
    CrawlScheduleGUID = "<crawlschedule-guid>",
    CrawlFilterGUID = "<crawlfilter-guid>",
    MetadataRuleGUID = "<metadatarule-guid>",
    EmbeddingsRuleGUID = "<embeddingsrule-guid>",
    Name = "My updated crawl plan",
    EnumerationDirectory = "./enumerations/",
    EnumerationsToRetain = 30,
    ProcessingEndpoint = "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing",
    ProcessingAccessKey = "default",
    CleanupEndpoint = "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/cleanup",
    CleanupAccessKey = "default"
};

CrawlPlan response = await sdk.CrawlPlan.Update(plan);

Response

Returns the updated crawl plan object with all configuration details:

{
    "GUID": "4292118d-3397-4090-88c6-90f1886a3e35",
    "TenantGUID": "default",
    "DataRepositoryGUID": "c854f5f2-68f6-44c4-813e-9c1dea51676a",
    "CrawlScheduleGUID": "oneminute",
    "CrawlFilterGUID": "default",
    "MetadataRuleGUID": "example-metadata-rule",
    "EmbeddingsRuleGUID": "crawler-embeddings-rule",
    "Name": "My updated local files",
    "EnumerationDirectory": "./enumerations/",
    "EnumerationsToRetain": 16,
    "MaxDrainTasks": 4,
    "ProcessAdditions": true,
    "ProcessDeletions": true,
    "ProcessUpdates": true,
    "CreatedUtc": "2024-10-23T15:14:26.000000Z"
}

Delete Crawl Plan

Deletes a crawl plan object by GUID using DELETE /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid]. This operation permanently removes the crawl plan configuration from the system. Use with caution as this action cannot be undone.

Important Note: Ensure no active crawl operations are using this plan before deletion, as this will break ongoing crawl executions.

Request Parameters

  • crawlplan-guid (string, Path, Required): GUID of the crawl plan object to delete
curl --location --request DELETE 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••' 
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/" //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);


const deleteCrawlPlan = async () => {
  try {
    const response = await api.CrawlPlan.delete(
      "<crawlplan-guid>"
    );
    console.log(response, "Crawl plan deleted successfully");
  } catch (err) {
    console.log("Error deleting Crawl plan:", err);
  }
};
deleteCrawlPlan();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def deleteCrawlPlan():
    crawlPlan = crawler.CrawlPlan.delete("<crawlplan-guid>")
    print(crawlPlan)

deleteCrawlPlan()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
bool deleted = await sdk.CrawlPlan.Delete(Guid.Parse("<crawlPlan-guid>"));

Response

Returns 204 No Content on successful deletion. No response body is returned.

Check Crawl Plan Existence

Verifies if a crawl plan object exists without retrieving its configuration using HEAD /v1.0/tenants/[tenant-guid]/crawlplans/[crawlplan-guid]. This is an efficient way to check plan presence before performing operations.

Request Parameters

  • crawlplan-guid (string, Path, Required): GUID of the crawl plan object to check
curl --location --head 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlplans/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/" //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const existsCrawlPlan = async () => {
  try {
    const response = await api.CrawlPlan.exists(
      "<crawlplan-guid>"
    );
    console.log(response, "Crawl plan exists");
  } catch (err) {
    console.log("Error checking Crawl plan:", err);
  }
};

existsCrawlPlan();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def existsCrawlPlan():
    crawlPlan = crawler.CrawlPlan.exists("<crawlplan-guid>")
    print(crawlPlan)

existsCrawlPlan()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
bool exists = await sdk.CrawlPlan.Exists(Guid.Parse("<crawlPlan-guid>"));

Response

  • 200 OK: Crawl plan exists
  • 404 Not Found: Crawl plan does not exist
  • No response body: Only HTTP status code is returned

Note: HEAD requests do not return a response body, only the HTTP status code indicating whether the crawl plan exists.
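
Putting the existence check to use, the sketch below (Python with the requests library; placeholder hostname, GUIDs, and access key) verifies that a plan exists before issuing the DELETE described in the previous section:

import requests

# Placeholder deployment values -- substitute your own.
BASE_URL = "http://view.homedns.org:8000"
TENANT_GUID = "00000000-0000-0000-0000-000000000000"
PLAN_GUID = "00000000-0000-0000-0000-000000000000"
ACCESS_KEY = "default"

url = f"{BASE_URL}/v1.0/tenants/{TENANT_GUID}/crawlplans/{PLAN_GUID}"
headers = {"Authorization": f"Bearer {ACCESS_KEY}"}

# HEAD returns only a status code: 200 if the plan exists, 404 if it does not.
exists = requests.head(url, headers=headers).status_code == 200

if exists:
    # DELETE permanently removes the plan and returns 204 No Content.
    status = requests.delete(url, headers=headers).status_code
    print("Deleted" if status == 204 else f"Delete failed with status {status}")
else:
    print("Crawl plan not found; nothing to delete.")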

Best Practices

When managing crawl plans in the View platform, consider the following recommendations for optimal data ingestion workflow configuration:

  • Repository Validation: Ensure data repositories are properly configured and accessible before creating crawl plans
  • Schedule Optimization: Configure crawl schedules based on data update frequency and processing requirements
  • Filter Configuration: Use appropriate crawl filters to optimize processing performance and reduce unnecessary data handling
  • Processing Rules: Set up metadata and embeddings rules for comprehensive data processing and analysis
  • Performance Tuning: Monitor and adjust parallel processing settings based on system resources and data volume

Next Steps

After successfully configuring crawl plans, you can:

  • Crawl Operations: Monitor crawl plan executions and track processing performance through crawl operations
  • Data Repositories: Set up additional data repositories to expand your data ingestion capabilities
  • Crawl Schedules: Create and configure crawl schedules to define automated execution timing
  • Crawl Filters: Develop specialized crawl filters for different content types and processing requirements
  • Integration: Integrate crawl plans with other View platform services for comprehensive data processing workflows