Comprehensive guide to View's crawl schedule management system, including automated execution timing, interval configuration, and scheduling templates for efficient data ingestion and content discovery workflows.
Overview
The View Crawl Schedule management system provides comprehensive configuration for automated execution timing of data ingestion workflows. Crawl schedules serve as reusable templates that define when and how frequently crawl plans should be executed, enabling precise control over data discovery and processing automation.
Key Features
- Interval Configuration: Flexible scheduling with support for seconds, minutes, hours, and days intervals
- Template Reusability: Reusable schedule templates that can be referenced by multiple crawl plans
- One-Time Execution: Support for one-time crawl execution without repetition
- Automated Execution: Seamless integration with crawl plans for automated data ingestion
- Flexible Timing: Configurable interval values to match data update frequencies
- Resource Optimization: Efficient scheduling to balance data freshness with system resources
- Integration Support: Works alongside crawl plans and data repositories
- Template Management: Centralized management of scheduling templates across the platform
Supported Operations
- Create: Create new crawl schedule configurations with interval settings
- Read: Retrieve individual crawl schedule configurations and metadata
- Enumerate: List all crawl schedules with pagination support
- Update: Modify existing crawl schedule configurations and settings
- Delete: Remove crawl schedule configurations and associated templates
- Existence Check: Verify crawl schedule presence without retrieving details
API Endpoints
Crawl schedules are managed via the Crawler server API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/crawlschedules
Supported HTTP Methods: GET, HEAD, PUT, DELETE
Important: All crawl schedule operations require appropriate authentication tokens.
Crawl Schedule Object Structure
Crawl schedule objects contain comprehensive configuration for automated execution timing. Here's the complete structure:
{
"GUID": "oneminute",
"TenantGUID": "default",
"Name": "Every minute",
"Schedule": "MinutesInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
Field Descriptions
- GUID (GUID): Globally unique identifier for the crawl schedule object
- TenantGUID (GUID): Globally unique identifier for the tenant
- Name (string): Display name for the crawl schedule
- Schedule (enum): Interval type of the schedule (OneTime, SecondsInterval, MinutesInterval, HoursInterval, DaysInterval)
- Interval (integer): Number of instances of the interval type (e.g., Interval=10 with Schedule=MinutesInterval means every 10 minutes); see the sketch after this list
- CreatedUtc (datetime): UTC timestamp when the crawl schedule was created
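To make the Schedule and Interval pairing concrete, here is a small client-side sketch (an illustration only, not the crawler's internal scheduling logic) that converts the two fields into a Python timedelta representing the gap between executions:
from datetime import timedelta

# Illustrative mapping of interval-type enum values to timedelta keyword arguments.
_UNITS = {
    "SecondsInterval": "seconds",
    "MinutesInterval": "minutes",
    "HoursInterval": "hours",
    "DaysInterval": "days",
}

def recurrence_delta(schedule: str, interval: int):
    # Return the gap between executions implied by a crawl schedule,
    # or None for OneTime schedules, which execute once and do not repeat.
    if schedule == "OneTime":
        return None
    return timedelta(**{_UNITS[schedule]: interval})

print(recurrence_delta("MinutesInterval", 10))  # 0:10:00, i.e. every 10 minutes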
Important Notes
- Interval Configuration: Use appropriate interval types and values based on your data update frequency
- One-Time Execution: OneTime schedules execute once and do not repeat
- Template Reusability: Schedules can be referenced by multiple crawl plans for consistent timing
- Resource Considerations: Balance execution frequency with system resources and data freshness requirements
Create Crawl Schedule
Creates a new crawl schedule configuration using PUT /v1.0/tenants/[tenant-guid]/crawlschedules. This endpoint allows you to define automated execution timing for crawl plans with flexible interval configurations.
Request Parameters
Required Parameters
- Name (string, Body, Required): Display name for the crawl schedule
- Schedule (enum, Body, Required): Interval type (OneTime, SecondsInterval, MinutesInterval, HoursInterval, DaysInterval)
- Interval (integer, Body, Required): Number of interval instances (e.g., 10 with MinutesInterval means every 10 minutes)
Important Notes
- Interval Types: Choose appropriate interval types based on your data update frequency and processing requirements
- One-Time Execution: Use OneTime schedule type for single execution without repetition
- Template Usage: Created schedules can be referenced by multiple crawl plans for consistent timing
- Resource Optimization: Balance execution frequency with system resources and data freshness needs
curl -X PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
"Name": "My schedule",
"Schedule": "DaysInterval",
"Interval": 1
}'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const createCrawlSchedules = async () => {
try {
const response = await api.CrawlSchedule.create({
Name: "My schedule",
Schedule: "DaysInterval",
Interval: 1,
});
console.log(response, "Data crawler created successfully");
} catch (err) {
console.log("Error creating Data crawler:", err);
}
};
createCrawlSchedules();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def createCrawlSchedule():
crawlSchedule = crawler.CrawlSchedule.create(
Name="My schedule",
Schedule="DaysInterval",
Interval=1
)
print(crawlSchedule)
createCrawlSchedule()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
CrawlSchedule schedule = new CrawlSchedule
{
Name = "My schedule",
Schedule = "DaysInterval",
Interval = 1
};
CrawlSchedule createdSchedule = await sdk.CrawlSchedule.Create(schedule);
Response
Returns the created crawl schedule object with all configuration details:
{
"GUID": "oneminute",
"TenantGUID": "default",
"Name": "My schedule",
"Schedule": "DaysInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
Enumerate Crawl Schedules
Retrieves a paginated list of all crawl schedule objects in the tenant using GET /v2.0/tenants/[tenant-guid]/crawlschedules. This endpoint provides comprehensive enumeration with pagination support for managing multiple crawl schedule configurations.
Request Parameters
No additional parameters required beyond authentication.
curl -X GET 'http://view.homedns.org:8000/v2.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules?enumerate' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const enumerateCrawlSchedules = async () => {
try {
const response = await api.CrawlSchedule.enumerate();
console.log(response, "Crawl schedules fetched successfully");
} catch (err) {
console.log("Error fetching Crawl schedules:", err);
}
};
enumerateCrawlSchedules();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def enumerateCrawlSchedules():
crawlSchedules = crawler.CrawlSchedule.enumerate()
print(crawlSchedules)
enumerateCrawlSchedules()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
EnumerationResult<CrawlSchedule> response = await sdk.CrawlSchedule.Enumerate();
Response
Returns a paginated list of crawl schedule objects:
{
"Success": true,
"Timestamp": {
"Start": "2024-10-21T02:36:37.677751Z",
"TotalMs": 23.58,
"Messages": {}
},
"MaxResults": 10,
"IterationsRequired": 1,
"EndOfResults": true,
"RecordsRemaining": 0,
"Objects": [
{
"GUID": "oneminute",
"TenantGUID": "default",
"Name": "Every minute",
"Schedule": "MinutesInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:21:00.000000Z"
},
{
"GUID": "hourly",
"TenantGUID": "default",
"Name": "Every hour",
"Schedule": "HoursInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:22:00.000000Z"
}
],
"ContinuationToken": null
}
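The payload above can be consumed directly. The short sketch below (plain Python over the response fields shown here, independent of any SDK) prints each schedule and reports whether further pages remain:
def summarize_schedules(enumeration: dict) -> None:
    # Walk the Objects array of an enumeration response and report pagination state.
    for schedule in enumeration.get("Objects", []):
        print(f'{schedule["Name"]}: Schedule={schedule["Schedule"]}, Interval={schedule["Interval"]}')
    if not enumeration.get("EndOfResults", True):
        print(f'{enumeration["RecordsRemaining"]} records remaining; request the next page')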
Read Crawl Schedule
Retrieves crawl schedule configuration and metadata by GUID using GET /v1.0/tenants/[tenant-guid]/crawlschedules/[crawlschedule-guid]. Returns the complete crawl schedule configuration including interval settings and timing details. If the schedule doesn't exist, a 404 error is returned.
Request Parameters
- crawlschedule-guid (string, Path, Required): GUID of the crawl schedule object to retrieve
curl -X GET 'http://192.168.101.63:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••' \
--header 'Content-Type: application/json'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const readCrawlSchedule = async () => {
try {
const response = await api.CrawlSchedule.read(
"<crawlschedule-guid>"
);
console.log(response, "Crawl schedule fetched successfully");
} catch (err) {
console.log("Error fetching Crawl schedule:", err);
}
};
readCrawlSchedule();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def readCrawlSchedule():
crawlSchedule = crawler.CrawlSchedule.retrieve("<crawlschedule-guid>")
print(crawlSchedule)
readCrawlSchedule()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
CrawlSchedule response = await sdk.CrawlSchedule.Retrieve(Guid.Parse("<crawlschedule-guid>"));
Response
Returns the complete crawl schedule configuration:
{
"GUID": "oneminute",
"TenantGUID": "default",
"Name": "Every minute",
"Schedule": "MinutesInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
Note: The HEAD method can be used as a lightweight alternative to GET to check whether the object exists. HEAD requests return a 200/OK if the object exists, or a 404/Not Found if it does not. No response body is returned with a HEAD request.
Read All Crawl Schedules
Retrieves all crawl schedule objects in the tenant using GET /v1.0/tenants/[tenant-guid]/crawlschedules/. Returns an array of crawl schedule objects with complete configuration details for all schedules in the tenant.
Request Parameters
No additional parameters required beyond authentication.
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules/' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const readAllCrawlSchedules = async () => {
try {
const response = await api.CrawlSchedule.readAll();
console.log(response, "All crawl schedules fetched successfully");
} catch (err) {
console.log("Error fetching All crawl schedules:", err);
}
};
readAllCrawlSchedules();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def readAllCrawlSchedules():
crawlSchedules = crawler.CrawlSchedule.retrieve_all()
print(crawlSchedules)
readAllCrawlSchedules()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
List<CrawlSchedule> schedules = await sdk.CrawlSchedule.RetrieveMany();
Response
Returns an array of all crawl schedule objects:
[
{
"GUID": "oneminute",
"TenantGUID": "default",
"Name": "Every minute",
"Schedule": "MinutesInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:21:00.000000Z"
},
{
"GUID": "hourly",
"TenantGUID": "default",
"Name": "Every hour",
"Schedule": "HoursInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:22:00.000000Z"
},
{
"GUID": "daily",
"TenantGUID": "default",
"Name": "Every day",
"Schedule": "DaysInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:23:00.000000Z"
}
]
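If you only need one entry from this array, a simple client-side lookup (an illustrative helper over the structure shown above) avoids a second request:
def find_schedule_by_name(schedules: list, name: str):
    # Return the first crawl schedule whose display name matches, or None if absent.
    return next((s for s in schedules if s.get("Name") == name), None)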
Update Crawl Schedule
Updates an existing crawl schedule configuration using PUT /v1.0/tenants/[tenant-guid]/crawlschedules/[crawlschedule-guid]. This endpoint allows you to modify schedule parameters while preserving certain immutable fields.
Request Parameters
- crawlschedule-guid (string, Path, Required): GUID of the crawl schedule object to update
Updateable Fields
All configuration parameters can be updated except for:
- GUID: Immutable identifier
- TenantGUID: Immutable tenant association
- CreatedUtc: Immutable creation timestamp
Important Notes
- Field Preservation: Certain fields cannot be modified and will be preserved across updates
- Complete Object: Provide a fully populated object in the request body
- Configuration Validation: All updated parameters will be validated before applying changes
- Impact Assessment: Consider the impact of schedule changes on existing crawl plans
curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules/00000000-0000-0000-0000-000000000000' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
"Name": "My updated schedule",
"Schedule": "DaysInterval",
"Interval": 1
}'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const updateCrawlSchedule = async () => {
try {
const response = await api.CrawlSchedule.update({
GUID: "<crawlschedule-guid>",
TenantGUID: "<tenant-guid>",
Name: "My schedule [UPDATED]",
Schedule: "DaysInterval",
Interval: 1,
});
console.log(response, "Crawl schedule updated successfully");
} catch (err) {
console.log("Error updating Crawl schedule:", err);
}
};
updateCrawlSchedule();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def updateCrawlSchedule():
crawlSchedule = crawler.CrawlSchedule.update(
"<crawlschedule-guid>",
Name="My schedule [updated]",
Schedule="DaysInterval",
Interval=1
)
print(crawlSchedule)
updateCrawlSchedule()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
CrawlSchedule schedule = new CrawlSchedule
{
GUID = "<crawlschedule-guid>",
TenantGUID = "<tenant-guid>",
Name = "My schedule",
Schedule = "DaysInterval",
Interval = 1
};
CrawlSchedule updatedSchedule = await sdk.CrawlSchedule.Update(schedule);
Response
Returns the updated crawl schedule object with all configuration details:
{
"GUID": "oneminute",
"TenantGUID": "default",
"Name": "Updated every minute schedule",
"Schedule": "MinutesInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
Delete Crawl Schedule
Deletes a crawl schedule object by GUID using DELETE /v1.0/tenants/[tenant-guid]/crawlschedules/[crawlschedule-guid]. This operation permanently removes the crawl schedule configuration from the system. Use with caution as this action cannot be undone.
Important Note: Ensure no active crawl plans are using this schedule before deletion, as this will break crawl plan execution. A simple existence check before deletion is sketched at the end of this section.
Request Parameters
- crawlschedule-guid (string, Path, Required): GUID of the crawl schedule object to delete
curl --location --request DELETE 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const deleteCrawlSchedule = async () => {
try {
const response = await api.CrawlSchedule.delete(
"<crawlschedule-guid>"
);
console.log(response, "Crawl schedule deleted successfully");
} catch (err) {
console.log("Error deleting Crawl schedule:", err);
}
};
deleteCrawlSchedule();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def deleteCrawlSchedule():
crawlSchedule = crawler.CrawlSchedule.delete("<crawlschedule-guid>")
print(crawlSchedule)
deleteCrawlSchedule()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
bool deleted = await sdk.CrawlSchedule.Delete(Guid.Parse("<crawlschedule-guid>"));
Response
Returns 204 No Content on successful deletion. No response body is returned.
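As a guard against deleting a schedule that does not exist, a minimal sketch combining the Python SDK calls shown on this page (exists and delete) might look like the following. Checking for crawl plans that still reference the schedule is a separate step and is not shown here:
from view_sdk import crawler  # assumes view_sdk.configure(...) has already run as in the earlier examples

def delete_schedule_safely(schedule_guid: str) -> None:
    # Verify the schedule exists before attempting deletion.
    if not crawler.CrawlSchedule.exists(schedule_guid):
        print(f"Crawl schedule {schedule_guid} not found; nothing to delete")
        return
    crawler.CrawlSchedule.delete(schedule_guid)
    print(f"Crawl schedule {schedule_guid} deleted")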
Check Crawl Schedule Existence
Verifies if a crawl schedule object exists without retrieving its configuration using HEAD /v1.0/tenants/[tenant-guid]/crawlschedules/[crawlschedule-guid]. This is an efficient way to check schedule presence before performing operations.
Request Parameters
- crawlschedule-guid (string, Path, Required): GUID of the crawl schedule object to check
curl --location --head 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const existsCrawlSchedule = async () => {
try {
const response = await api.CrawlSchedule.exists(
"<crawlschedule-guid>"
);
console.log(response, "Crawl schedule exists");
} catch (err) {
console.log("Error checking Crawl schedule:", err);
}
};
existsCrawlSchedule();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def existsCrawlSchedule():
crawlSchedule = crawler.CrawlSchedule.exists("<crawlschedule-guid>")
print(crawlSchedule)
existsCrawlSchedule()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
bool exists = await sdk.CrawlSchedule.Exists(Guid.Parse("<crawlschedule-guid>"));
Response
- 200 OK: Crawl schedule exists
- 404 Not Found: Crawl schedule does not exist
- No response body: Only HTTP status code is returned
Note: HEAD requests do not return a response body, only the HTTP status code indicating whether the crawl schedule exists.
Best Practices
When managing crawl schedules in the View platform, consider the following recommendations for optimal execution timing and resource management:
- Frequency Optimization: Configure crawl frequencies based on data update patterns and processing requirements
- Resource Management: Balance execution frequency with system resources and network bandwidth
- Template Reusability: Create reusable schedule templates that can be shared across multiple crawl plans
- Interval Selection: Choose appropriate interval types (seconds, minutes, hours, days) based on data freshness needs; illustrative examples follow this list
- Performance Monitoring: Monitor crawl execution performance and adjust schedules accordingly
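As an illustration of interval selection, the example definitions below (names and values are illustrative, not prescribed) map common data-freshness requirements onto the Schedule and Interval fields described earlier:
# Illustrative schedule bodies for common freshness requirements.
EXAMPLE_SCHEDULES = [
    {"Name": "Near-real-time", "Schedule": "MinutesInterval", "Interval": 5},
    {"Name": "Hourly refresh", "Schedule": "HoursInterval", "Interval": 1},
    {"Name": "Nightly sync", "Schedule": "DaysInterval", "Interval": 1},
    {"Name": "Initial one-off crawl", "Schedule": "OneTime", "Interval": 1},
]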
Next Steps
After successfully configuring crawl schedules, you can:
- Crawl Plans: Create crawl plans that reference your configured schedules for automated data ingestion
- Data Repositories: Set up data repositories to define source data locations for crawling
- Crawl Filters: Configure crawl filters to optimize content discovery and processing
- Crawl Operations: Monitor crawl plan executions and track performance through crawl operations
- Integration: Integrate crawl schedules with other View platform services for comprehensive data processing workflows