Comprehensive guide to View's crawl schedule management system, including automated execution timing, interval configuration, and scheduling templates for efficient data ingestion and content discovery workflows.
Overview
The View Crawl Schedule management system provides comprehensive configuration for automated execution timing of data ingestion workflows. Crawl schedules serve as reusable templates that define when and how frequently crawl plans should be executed, enabling precise control over data discovery and processing automation.
Key Features
- Interval Configuration: Flexible scheduling with support for seconds, minutes, hours, and days intervals
- Template Reusability: Reusable schedule templates that can be referenced by multiple crawl plans
- One-Time Execution: Support for one-time crawl execution without repetition
- Automated Execution: Seamless integration with crawl plans for automated data ingestion
- Flexible Timing: Configurable interval values to match data update frequencies
- Resource Optimization: Efficient scheduling to balance data freshness with system resources
- Integration Support: Works alongside crawl plans and data repositories
- Template Management: Centralized management of scheduling templates across the platform
Supported Operations
- Create: Create new crawl schedule configurations with interval settings
- Read: Retrieve individual crawl schedule configurations and metadata
- Enumerate: List all crawl schedules with pagination support
- Update: Modify existing crawl schedule configurations and settings
- Delete: Remove crawl schedule configurations and associated templates
- Existence Check: Verify crawl schedule presence without retrieving details
API Endpoints
Crawl schedules are managed via the Crawler server API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/crawlschedules
Supported HTTP Methods: GET, HEAD, PUT, DELETE
Important: All crawl schedule operations require appropriate authentication tokens.
Crawl Schedule Object Structure
Crawl schedule objects contain comprehensive configuration for automated execution timing. Here's the complete structure:
{
"GUID": "oneminute",
"TenantGUID": "default",
"Name": "Every minute",
"Schedule": "MinutesInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
Field Descriptions
- GUID (GUID): Globally unique identifier for the crawl schedule object
- TenantGUID (GUID): Globally unique identifier for the tenant
- Name (string): Display name for the crawl schedule
- Schedule (enum): Interval type of the schedule (OneTime, SecondsInterval, MinutesInterval, HoursInterval, DaysInterval)
- Interval (integer): Number of instances of the interval type (e.g., Interval=10 with Schedule=MinutesInterval means every 10 minutes); see the sketch after this list
- CreatedUtc (datetime): UTC timestamp when the crawl schedule was created
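To make the Schedule and Interval pairing concrete, here is a small client-side sketch (an illustration only, not the crawler's internal scheduling logic) that converts the two fields into a Python timedelta representing the gap between executions:
from datetime import timedelta

# Illustrative mapping of interval-type enum values to timedelta keyword arguments.
_UNITS = {
    "SecondsInterval": "seconds",
    "MinutesInterval": "minutes",
    "HoursInterval": "hours",
    "DaysInterval": "days",
}

def recurrence_delta(schedule: str, interval: int):
    # Return the gap between executions implied by a crawl schedule,
    # or None for OneTime schedules, which execute once and do not repeat.
    if schedule == "OneTime":
        return None
    return timedelta(**{_UNITS[schedule]: interval})

print(recurrence_delta("MinutesInterval", 10))  # 0:10:00, i.e. every 10 minutes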
Important Notes
- Interval Configuration: Use appropriate interval types and values based on your data update frequency
- One-Time Execution: OneTime schedules execute once and do not repeat
- Template Reusability: Schedules can be referenced by multiple crawl plans for consistent timing
- Resource Considerations: Balance execution frequency with system resources and data freshness requirements
Create Crawl Schedule
Creates a new crawl schedule configuration using PUT /v1.0/tenants/[tenant-guid]/crawlschedules. This endpoint allows you to define automated execution timing for crawl plans with flexible interval configurations.
Request Parameters
Required Parameters
- Name (string, Body, Required): Display name for the crawl schedule
- Schedule (enum, Body, Required): Interval type (OneTime, SecondsInterval, MinutesInterval, HoursInterval, DaysInterval)
- Interval (integer, Body, Required): Number of interval instances (e.g., 10 with MinutesInterval means every 10 minutes)
Important Notes
- Interval Types: Choose appropriate interval types based on your data update frequency and processing requirements
- One-Time Execution: Use OneTime schedule type for single execution without repetition
- Template Usage: Created schedules can be referenced by multiple crawl plans for consistent timing
- Resource Optimization: Balance execution frequency with system resources and data freshness needs
curl -X PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
"Name": "My schedule",
"Schedule": "DaysInterval",
"Interval": 1
}'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const createCrawlSchedules = async () => {
try {
const response = await api.CrawlSchedule.create({
Name: "My schedule",
Schedule: "DaysInterval",
Interval: 1,
});
console.log(response, "Data crawler created successfully");
} catch (err) {
console.log("Error creating Data crawler:", err);
}
};
createCrawlSchedules();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def createCrawlSchedule():
crawlSchedule = crawler.CrawlSchedule.create(
Name="My schedule",
Schedule="DaysInterval",
Interval=1
)
print(crawlSchedule)
createCrawlSchedule()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
CrawlSchedule schedule = new CrawlSchedule
{
Name = "My schedule",
Schedule = "DaysInterval",
Interval = 1
};
CrawlSchedule createdSchedule = await sdk.CrawlSchedule.Create(schedule);
Response
Returns the created crawl schedule object with all configuration details:
{
"GUID": "oneminute",
"TenantGUID": "default",
"Name": "My schedule",
"Schedule": "DaysInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
Enumerate Crawl Schedules
Retrieves a paginated list of all crawl schedule objects in the tenant using GET /v2.0/tenants/[tenant-guid]/crawlschedules. This endpoint provides comprehensive enumeration with pagination support for managing multiple crawl schedule configurations.
Request Parameters
No additional parameters required beyond authentication.
curl -X GET 'http://view.homedns.org:8000/v2.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules?enumerate' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const enumerateCrawlSchedules = async () => {
try {
const response = await api.CrawlSchedule.enumerate();
console.log(response, "Crawl schedules fetched successfully");
} catch (err) {
console.log("Error fetching Crawl schedules:", err);
}
};
enumerateCrawlSchedules();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def enumerateCrawlSchedules():
crawlSchedules = crawler.CrawlSchedule.enumerate()
print(crawlSchedules)
enumerateCrawlSchedules()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
EnumerationResult<CrawlSchedule> response = await sdk.CrawlSchedule.Enumerate();
Response
Returns a paginated list of crawl schedule objects:
{
"Success": true,
"Timestamp": {
"Start": "2024-10-21T02:36:37.677751Z",
"TotalMs": 23.58,
"Messages": {}
},
"MaxResults": 10,
"IterationsRequired": 1,
"EndOfResults": true,
"RecordsRemaining": 0,
"Objects": [
{
"GUID": "oneminute",
"TenantGUID": "default",
"Name": "Every minute",
"Schedule": "MinutesInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:21:00.000000Z"
},
{
"GUID": "hourly",
"TenantGUID": "default",
"Name": "Every hour",
"Schedule": "HoursInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:22:00.000000Z"
}
],
"ContinuationToken": null
}
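The payload above can be consumed directly. The short sketch below (plain Python over the response fields shown here, independent of any SDK) prints each schedule and reports whether further pages remain:
def summarize_schedules(enumeration: dict) -> None:
    # Walk the Objects array of an enumeration response and report pagination state.
    for schedule in enumeration.get("Objects", []):
        print(f'{schedule["Name"]}: Schedule={schedule["Schedule"]}, Interval={schedule["Interval"]}')
    if not enumeration.get("EndOfResults", True):
        print(f'{enumeration["RecordsRemaining"]} records remaining; request the next page')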
Read Crawl Schedule
Retrieves crawl schedule configuration and metadata by GUID using GET /v1.0/tenants/[tenant-guid]/crawlschedules/[crawlschedule-guid]. Returns the complete crawl schedule configuration including interval settings and timing details. If the schedule doesn't exist, a 404 error is returned.
Request Parameters
- crawlschedule-guid (string, Path, Required): GUID of the crawl schedule object to retrieve
curl -X GET 'http://192.168.101.63:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••' \
--header 'Content-Type: application/json'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const readCrawlSchedule = async () => {
try {
const response = await api.CrawlSchedule.read(
"<crawlschedule-guid>"
);
console.log(response, "Crawl schedule fetched successfully");
} catch (err) {
console.log("Error fetching Crawl schedule:", err);
}
};
readCrawlSchedule();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def readCrawlSchedule():
crawlSchedule = crawler.CrawlSchedule.retrieve("<crawlschedule-guid>")
print(crawlSchedule)
readCrawlSchedule()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
CrawlSchedule response = await sdk.CrawlSchedule.Retrieve(Guid.Parse("<crawlschedule-guid>"));
Response
Returns the complete crawl schedule configuration:
{
"GUID": "oneminute",
"TenantGUID": "default",
"Name": "Every minute",
"Schedule": "MinutesInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
Note: The HEAD method can be used as a lightweight alternative to GET to check whether the object exists. HEAD requests return a 200/OK if the object exists, or a 404/Not Found if it does not. No response body is returned with a HEAD request.
Read All Crawl Schedules
Retrieves all crawl schedule objects in the tenant using GET /v1.0/tenants/[tenant-guid]/crawlschedules/. Returns an array of crawl schedule objects with complete configuration details for all schedules in the tenant.
Request Parameters
No additional parameters required beyond authentication.
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules/' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const readAllCrawlSchedules = async () => {
try {
const response = await api.CrawlSchedule.readAll();
console.log(response, "All crawl schedules fetched successfully");
} catch (err) {
console.log("Error fetching All crawl schedules:", err);
}
};
readAllCrawlSchedules();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def readAllCrawlSchedules():
crawlSchedules = crawler.CrawlSchedule.retrieve_all()
print(crawlSchedules)
readAllCrawlSchedules()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
List<CrawlSchedule> schedules = await sdk.CrawlSchedule.RetrieveMany();
Response
Returns an array of all crawl schedule objects:
[
{
"GUID": "oneminute",
"TenantGUID": "default",
"Name": "Every minute",
"Schedule": "MinutesInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:21:00.000000Z"
},
{
"GUID": "hourly",
"TenantGUID": "default",
"Name": "Every hour",
"Schedule": "HoursInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:22:00.000000Z"
},
{
"GUID": "daily",
"TenantGUID": "default",
"Name": "Every day",
"Schedule": "DaysInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:23:00.000000Z"
}
]
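If you only need one entry from this array, a simple client-side lookup (an illustrative helper over the structure shown above) avoids a second request:
def find_schedule_by_name(schedules: list, name: str):
    # Return the first crawl schedule whose display name matches, or None if absent.
    return next((s for s in schedules if s.get("Name") == name), None)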
Update Crawl Schedule
Updates an existing crawl schedule configuration using PUT /v1.0/tenants/[tenant-guid]/crawlschedules/[crawlschedule-guid]. This endpoint allows you to modify schedule parameters while preserving certain immutable fields.
Request Parameters
- crawlschedule-guid (string, Path, Required): GUID of the crawl schedule object to update
Updateable Fields
All configuration parameters can be updated except for:
- GUID: Immutable identifier
- TenantGUID: Immutable tenant association
- CreatedUtc: Immutable creation timestamp
Important Notes
- Field Preservation: Certain fields cannot be modified and will be preserved across updates
- Complete Object: Provide a fully populated object in the request body
- Configuration Validation: All updated parameters will be validated before applying changes
- Impact Assessment: Consider the impact of schedule changes on existing crawl plans
curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules/00000000-0000-0000-0000-000000000000' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
"Name": "My updated schedule",
"Schedule": "DaysInterval",
"Interval": 1
}'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const updateCrawlSchedule = async () => {
try {
const response = await api.CrawlSchedule.update({
GUID: "<crawlschedule-guid>",
TenantGUID: "<tenant-guid>",
Name: "My schedule [UPDATED]",
Schedule: "DaysInterval",
Interval: 1,
});
console.log(response, "Crawl schedule updated successfully");
} catch (err) {
console.log("Error updating Crawl schedule:", err);
}
};
updateCrawlSchedule();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def updateCrawlSchedule():
crawlSchedule = crawler.CrawlSchedule.update(
"<crawlschedule-guid>",
Name="My schedule [updated]",
Schedule="DaysInterval",
Interval=1
)
print(crawlSchedule)
updateCrawlSchedule()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
CrawlSchedule schedule = new CrawlSchedule
{
GUID = "<crawlschedule-guid>",
TenantGUID = "<tenant-guid>",
Name = "My schedule",
Schedule = "DaysInterval",
Interval = 1
};
CrawlSchedule updatedSchedule = await sdk.CrawlSchedule.Update(schedule);
Response
Returns the updated crawl schedule object with all configuration details:
{
"GUID": "oneminute",
"TenantGUID": "default",
"Name": "Updated every minute schedule",
"Schedule": "MinutesInterval",
"Interval": 1,
"CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
Delete Crawl Schedule
Deletes a crawl schedule object by GUID using DELETE /v1.0/tenants/[tenant-guid]/crawlschedules/[crawlschedule-guid]. This operation permanently removes the crawl schedule configuration from the system. Use with caution as this action cannot be undone.
Important Note: Ensure no active crawl plans are using this schedule before deletion, as this will break crawl plan execution. A simple existence check before deletion is sketched at the end of this section.
Request Parameters
- crawlschedule-guid (string, Path, Required): GUID of the crawl schedule object to delete
curl --location --request DELETE 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const deleteCrawlSchedule = async () => {
try {
const response = await api.CrawlSchedule.delete(
"<crawlschedule-guid>"
);
console.log(response, "Crawl schedule deleted successfully");
} catch (err) {
console.log("Error deleting Crawl schedule:", err);
}
};
deleteCrawlSchedule();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def deleteCrawlSchedule():
crawlSchedule = crawler.CrawlSchedule.delete("<crawlschedule-guid>")
print(crawlSchedule)
deleteCrawlSchedule()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
bool deleted = await sdk.CrawlSchedule.Delete(Guid.Parse("<crawlschedule-guid>"));
Response
Returns 204 No Content on successful deletion. No response body is returned.
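As a guard against deleting a schedule that does not exist, a minimal sketch combining the Python SDK calls shown on this page (exists and delete) might look like the following. Checking for crawl plans that still reference the schedule is a separate step and is not shown here:
from view_sdk import crawler  # assumes view_sdk.configure(...) has already run as in the earlier examples

def delete_schedule_safely(schedule_guid: str) -> None:
    # Verify the schedule exists before attempting deletion.
    if not crawler.CrawlSchedule.exists(schedule_guid):
        print(f"Crawl schedule {schedule_guid} not found; nothing to delete")
        return
    crawler.CrawlSchedule.delete(schedule_guid)
    print(f"Crawl schedule {schedule_guid} deleted")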
Check Crawl Schedule Existence
Verifies if a crawl schedule object exists without retrieving its configuration using HEAD /v1.0/tenants/[tenant-guid]/crawlschedules/[crawlschedule-guid]. This is an efficient way to check schedule presence before performing operations.
Request Parameters
- crawlschedule-guid (string, Path, Required): GUID of the crawl schedule object to check
curl --location --head 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlschedules/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default", //access key
);
const existsCrawlSchedule = async () => {
try {
const response = await api.CrawlSchedule.exists(
"<crawlschedule-guid>"
);
console.log(response, "Crawl schedule exists");
} catch (err) {
console.log("Error checking Crawl schedule:", err);
}
};
existsCrawlSchedule();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def existsCrawlSchedule():
crawlSchedule = crawler.CrawlSchedule.exists("<crawlschedule-guid>")
print(crawlSchedule)
existsCrawlSchedule()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
bool exists = await sdk.CrawlSchedule.Exists(Guid.Parse("<crawlschedule-guid>"));
Response
- 200 OK: Crawl schedule exists
- 404 Not Found: Crawl schedule does not exist
- No response body: Only HTTP status code is returned
Note: HEAD requests do not return a response body, only the HTTP status code indicating whether the crawl schedule exists.
Best Practices
When managing crawl schedules in the View platform, consider the following recommendations for optimal execution timing and resource management:
- Frequency Optimization: Configure crawl frequencies based on data update patterns and processing requirements
- Resource Management: Balance execution frequency with system resources and network bandwidth
- Template Reusability: Create reusable schedule templates that can be shared across multiple crawl plans
- Interval Selection: Choose appropriate interval types (seconds, minutes, hours, days) based on data freshness needs; illustrative examples follow this list
- Performance Monitoring: Monitor crawl execution performance and adjust schedules accordingly
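As an illustration of interval selection, the example definitions below (names and values are illustrative, not prescribed) map common data-freshness requirements onto the Schedule and Interval fields described earlier:
# Illustrative schedule bodies for common freshness requirements.
EXAMPLE_SCHEDULES = [
    {"Name": "Near-real-time", "Schedule": "MinutesInterval", "Interval": 5},
    {"Name": "Hourly refresh", "Schedule": "HoursInterval", "Interval": 1},
    {"Name": "Nightly sync", "Schedule": "DaysInterval", "Interval": 1},
    {"Name": "Initial one-off crawl", "Schedule": "OneTime", "Interval": 1},
]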
Next Steps
After successfully configuring crawl schedules, you can:
- Crawl Plans: Create crawl plans that reference your configured schedules for automated data ingestion
- Data Repositories: Set up data repositories to define source data locations for crawling
- Crawl Filters: Configure crawl filters to optimize content discovery and processing
- Crawl Operations: Monitor crawl plan executions and track performance through crawl operations
- Integration: Integrate crawl schedules with other View platform services for comprehensive data processing workflows