Comprehensive guide to View's crawl operation management system, including operation monitoring, execution tracking, performance metrics, and status management for automated data ingestion and content discovery workflows.
Overview
The View Crawl Operation management system provides comprehensive monitoring and tracking of crawl plan executions. Crawl operations serve as metadata containers that capture detailed information about each invocation of a crawl plan, including execution status, performance metrics, object counts, and timing information for complete visibility into data ingestion processes.
Key Features
- Execution Tracking: Complete monitoring of crawl plan invocations and execution status
- Performance Metrics: Detailed statistics on objects processed, bytes transferred, and processing times
- Status Management: Real-time tracking of operation states including enumeration, retrieval, and completion
- Timing Information: Comprehensive timestamps for all operation phases and milestones
- Error Monitoring: Tracking of failed objects and error conditions during crawl execution
- Resource Utilization: Monitoring of processing endpoints and cleanup operations
- Historical Data: Retention of operation metadata for analysis and troubleshooting
- Integration Support: Seamless integration with crawl plans, schedules, and data repositories
Supported Operations
- Read: Retrieve individual crawl operation metadata and execution details
- Enumerate: List all crawl operations with pagination support
- Read All: Retrieve all crawl operations in the tenant
- Retrieve Enumeration: Access detailed enumeration data for specific operations
- Stop: Terminate running crawl operations
- Delete: Remove crawl operation records and associated metadata
- Existence Check: Verify crawl operation presence without retrieving details
API Endpoints
Crawl operations are managed via the Crawler server API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/crawloperations
Supported HTTP Methods: GET, HEAD, DELETE
Important: All crawl operation endpoints require appropriate authentication tokens.
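For example, a raw HTTP request against this endpoint (a minimal sketch using Python's requests library; the hostname, tenant GUID, and Authorization value are placeholders, and the exact token scheme depends on your deployment) looks like this:
import requests

# Placeholder connection details -- substitute your own deployment values.
BASE_URL = "http://view.homedns.org:8000"
TENANT_GUID = "00000000-0000-0000-0000-000000000000"
ACCESS_TOKEN = "••••••"  # supplied via the Authorization header, as in the curl examples below

# List all crawl operations in the tenant (GET returns a JSON array of operation objects).
url = f"{BASE_URL}/v1.0/tenants/{TENANT_GUID}/crawloperations"
response = requests.get(url, headers={"Authorization": ACCESS_TOKEN})
response.raise_for_status()

for operation in response.json():
    print(operation["GUID"], operation["State"])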
Crawl Operation Object Structure
Crawl operation objects contain comprehensive metadata about crawl plan executions. Here's the complete structure:
{
"GUID": "9ced1af3-2e19-4cd6-81ce-a8c90a9ed32d",
"TenantGUID": "default",
"CrawlPlanGUID": "e9a7d61e-7cbd-46e4-9956-533e22008978",
"CrawlScheduleGUID": "oneminute",
"CrawlFilterGUID": "default",
"DataRepositoryGUID": "1a56c067-9e6d-4f7b-85bf-eb6b04aeda3f",
"MetadataRuleGUID": "example-metadata-rule",
"EmbeddingsRuleGUID": "crawler-embeddings-rule",
"ProcessingEndpoint": "http://nginx-orchestrator:8501/processor",
"CleanupEndpoint": "http://nginx-orchestrator:8501/processor/cleanup",
"Name": "Alienware CIFS (started 2024-10-25T22:29:31 UTC)",
"ObjectsEnumerated": 123,
"BytesEnumerated": 61052490,
"ObjectsAdded": 0,
"BytesAdded": 0,
"ObjectsUpdated": 0,
"BytesUpdated": 0,
"ObjectsDeleted": 0,
"BytesDeleted": 0,
"ObjectsSuccess": 0,
"BytesSuccess": 0,
"ObjectsFailed": 2,
"BytesFailed": 16384,
"State": "Success",
"CreatedUtc": "2024-10-25T22:29:31.000000Z",
"StartUtc": "2024-10-25T22:29:31.000000Z",
"StartEnumerationUtc": "2024-10-25T22:29:31.000000Z",
"StartRetrievalUtc": "2024-10-25T22:29:39.000000Z",
"FinishEnumerationUtc": "2024-10-25T22:29:39.000000Z",
"FinishRetrievalUtc": "2024-10-25T22:29:39.000000Z",
"FinishUtc": "2024-10-25T22:29:39.000000Z",
"AdditionalData": "No objects detected during enumeration"
}
Field Descriptions
- GUID (GUID): Globally unique identifier for the crawl operation object
- TenantGUID (GUID): Globally unique identifier for the tenant
- CrawlPlanGUID (GUID): Globally unique identifier for the associated crawl plan
- CrawlScheduleGUID (GUID): Globally unique identifier for the crawl schedule
- CrawlFilterGUID (GUID): Globally unique identifier for the crawl filter
- DataRepositoryGUID (GUID): Globally unique identifier for the data repository
- MetadataRuleGUID (GUID): Globally unique identifier for the metadata rule
- EmbeddingsRuleGUID (GUID): Globally unique identifier for the embeddings rule
- ProcessingEndpoint (string): URL endpoint for processing new and changed objects
- CleanupEndpoint (string): URL endpoint for processing deleted objects
- Name (string): Display name for the crawl operation
- ObjectsEnumerated (integer): Total number of objects enumerated during the operation
- BytesEnumerated (integer): Total number of bytes enumerated during the operation
- ObjectsAdded (integer): Number of objects added since the latest enumeration
- BytesAdded (integer): Number of bytes added since the latest enumeration
- ObjectsUpdated (integer): Number of objects updated since the latest enumeration
- BytesUpdated (integer): Number of bytes updated since the latest enumeration
- ObjectsDeleted (integer): Number of objects deleted from the latest enumeration
- BytesDeleted (integer): Number of bytes deleted from the latest enumeration
- ObjectsSuccess (integer): Number of objects successfully processed
- BytesSuccess (integer): Number of bytes successfully processed
- ObjectsFailed (integer): Number of objects that failed processing
- BytesFailed (integer): Number of bytes that failed processing
- State (enum): Current state of the crawl operation (NotStarted, Starting, Stopped, Canceled, Enumerating, Retrieving, Deleting, Success, Failed)
- CreatedUtc (datetime): UTC timestamp when the crawl operation was created
- StartUtc (datetime): UTC timestamp when the crawl operation started
- StartEnumerationUtc (datetime): UTC timestamp when enumeration phase began
- StartRetrievalUtc (datetime): UTC timestamp when retrieval phase began
- FinishEnumerationUtc (datetime): UTC timestamp when enumeration phase completed
- FinishRetrievalUtc (datetime): UTC timestamp when retrieval phase completed
- FinishUtc (datetime): UTC timestamp when the crawl operation completed
- AdditionalData (string): Additional information or notes about the crawl operation execution
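As an illustrative sketch (not an API call), the counter and timestamp fields above can be combined into simple derived metrics such as run time and failure rate. The helper below uses hypothetical names and assumes a crawl operation record that has already been parsed into a Python dictionary and has finished (FinishUtc is populated):
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # matches the UTC timestamps shown above

def summarize_operation(op: dict) -> dict:
    """Derive simple metrics from a finished crawl operation record (illustrative only)."""
    start = datetime.strptime(op["StartUtc"], TIMESTAMP_FORMAT)
    finish = datetime.strptime(op["FinishUtc"], TIMESTAMP_FORMAT)
    processed = op["ObjectsSuccess"] + op["ObjectsFailed"]
    return {
        "duration_seconds": (finish - start).total_seconds(),
        "objects_enumerated": op["ObjectsEnumerated"],
        "bytes_enumerated": op["BytesEnumerated"],
        "failure_rate": op["ObjectsFailed"] / processed if processed else 0.0,
    }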
Important Notes
- Read-Only Objects: Crawl operations are automatically created and managed by the system
- Execution Tracking: Operations provide complete visibility into crawl plan execution
- Performance Metrics: Detailed statistics help monitor and optimize crawl performance
- State Management: Real-time status tracking enables proactive operation management
Enumerate Crawl Operations
Retrieves a paginated list of all crawl operation objects in the tenant using GET /v2.0/tenants/[tenant-guid]/crawloperations. This endpoint provides comprehensive enumeration with pagination support for monitoring multiple crawl operation executions.
Request Parameters
No additional parameters required beyond authentication.
curl --location 'http://view.homedns.org:8000/v2.0/tenants/00000000-0000-0000-0000-000000000000/crawloperations/' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default" //access key
);
const enumerateCrawlOperations = async () => {
try {
const response = await api.CrawlOperation.enumerate();
console.log(response, "Crawl operations fetched successfully");
} catch (err) {
console.log("Error fetching Crawl operations:", err);
}
};
enumerateCrawlOperations();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def enumerateCrawlOperations():
    crawlOperations = crawler.CrawlOperation.enumerate()
    print(crawlOperations)

enumerateCrawlOperations()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
EnumerationResult<CrawlOperation> response = await sdk.CrawlOperation.Enumerate();
Response Structure
The enumeration response includes pagination metadata and crawl operation objects with complete execution details:
Response
Returns a paginated list of crawl operation objects:
{
"Success": true,
"Timestamp": {
"Start": "2024-10-21T02:36:37.677751Z",
"TotalMs": 23.58,
"Messages": {}
},
"MaxResults": 10,
"IterationsRequired": 1,
"EndOfResults": true,
"RecordsRemaining": 0,
"Objects": [
{
"GUID": "9ced1af3-2e19-4cd6-81ce-a8c90a9ed32d",
"TenantGUID": "default",
"CrawlPlanGUID": "e9a7d61e-7cbd-46e4-9956-533e22008978",
"CrawlScheduleGUID": "oneminute",
"CrawlFilterGUID": "default",
"DataRepositoryGUID": "1a56c067-9e6d-4f7b-85bf-eb6b04aeda3f",
"MetadataRuleGUID": "example-metadata-rule",
"EmbeddingsRuleGUID": "crawler-embeddings-rule",
"ProcessingEndpoint": "http://nginx-orchestrator:8501/processor",
"CleanupEndpoint": "http://nginx-orchestrator:8501/processor/cleanup",
"Name": "Alienware CIFS (started 2024-10-25T22:29:31 UTC)",
"ObjectsEnumerated": 123,
"BytesEnumerated": 61052490,
"ObjectsAdded": 0,
"BytesAdded": 0,
"ObjectsUpdated": 0,
"BytesUpdated": 0,
"ObjectsDeleted": 0,
"BytesDeleted": 0,
"ObjectsSuccess": 0,
"BytesSuccess": 0,
"ObjectsFailed": 2,
"BytesFailed": 16384,
"State": "Success",
"CreatedUtc": "2024-10-25T22:29:31.000000Z",
"StartUtc": "2024-10-25T22:29:31.000000Z",
"StartEnumerationUtc": "2024-10-25T22:29:31.000000Z",
"StartRetrievalUtc": "2024-10-25T22:29:39.000000Z",
"FinishEnumerationUtc": "2024-10-25T22:29:39.000000Z",
"FinishRetrievalUtc": "2024-10-25T22:29:39.000000Z",
"FinishUtc": "2024-10-25T22:29:39.000000Z",
"AdditionalData": "No objects detected during enumeration"
}
],
"ContinuationToken": null
}
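As a sketch of how this response can be consumed (field names follow the structure above; the continuation mechanism for additional pages is not shown because it depends on the SDK/API version in use), the Objects array can be scanned to surface operations that reported failures:
def operations_with_failures(enumeration_response: dict) -> list:
    """Return (GUID, ObjectsFailed) for each operation in one page that reported failures."""
    failed = []
    for op in enumeration_response.get("Objects", []):
        if op.get("ObjectsFailed", 0) > 0:
            failed.append((op["GUID"], op["ObjectsFailed"]))
    # If EndOfResults is false, fetch the next page before treating this summary as complete.
    return failed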
Read Crawl Operation
Retrieves crawl operation metadata and execution details by GUID using GET /v1.0/tenants/[tenant-guid]/crawloperations/[crawloperation-guid]. Returns the complete crawl operation information, including execution status, performance metrics, and timing details. If the operation doesn't exist, a 404 error is returned.
Request Parameters
- crawloperation-guid (string, Path, Required): GUID of the crawl operation object to retrieve
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawloperations/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default" //access key
);
const readCrawlOperation = async () => {
try {
const response = await api.CrawlOperation.read(
"<crawloperation-guid>"
);
console.log(response, "Crawl operation fetched successfully");
} catch (err) {
console.log("Error fetching Crawl operation:", err);
}
};
readCrawlOperation();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def readCrawlOperation():
    crawlOperation = crawler.CrawlOperation.retrieve("<crawloperation-guid>")
    print(crawlOperation)

readCrawlOperation()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
CrawlOperation response = await sdk.CrawlOperation.Retrieve(Guid.Parse("<crawloperation-guid>"));
Response
Returns the complete crawl operation metadata:
{
"GUID": "9ced1af3-2e19-4cd6-81ce-a8c90a9ed32d",
"TenantGUID": "default",
"CrawlPlanGUID": "e9a7d61e-7cbd-46e4-9956-533e22008978",
"CrawlScheduleGUID": "oneminute",
"CrawlFilterGUID": "default",
"DataRepositoryGUID": "1a56c067-9e6d-4f7b-85bf-eb6b04aeda3f",
"MetadataRuleGUID": "example-metadata-rule",
"EmbeddingsRuleGUID": "crawler-embeddings-rule",
"ProcessingEndpoint": "http://nginx-orchestrator:8501/processor",
"CleanupEndpoint": "http://nginx-orchestrator:8501/processor/cleanup",
"Name": "Alienware CIFS (started 2024-10-25T22:29:31 UTC)",
"ObjectsEnumerated": 123,
"BytesEnumerated": 61052490,
"ObjectsAdded": 0,
"BytesAdded": 0,
"ObjectsUpdated": 0,
"BytesUpdated": 0,
"ObjectsDeleted": 0,
"BytesDeleted": 0,
"ObjectsSuccess": 0,
"BytesSuccess": 0,
"ObjectsFailed": 2,
"BytesFailed": 16384,
"State": "Success",
"CreatedUtc": "2024-10-25T22:29:31.000000Z",
"StartUtc": "2024-10-25T22:29:31.000000Z",
"StartEnumerationUtc": "2024-10-25T22:29:31.000000Z",
"StartRetrievalUtc": "2024-10-25T22:29:39.000000Z",
"FinishEnumerationUtc": "2024-10-25T22:29:39.000000Z",
"FinishRetrievalUtc": "2024-10-25T22:29:39.000000Z",
"FinishUtc": "2024-10-25T22:29:39.000000Z",
"AdditionalData": "No objects detected during enumeration"
}
Note: the HEAD method can be used as an alternative to GET to simply check the existence of the object. HEAD requests return either a 200/OK in the event the object exists, or a 404/Not Found if not. No response body is returned with a HEAD request.
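For example, a raw HEAD request (a minimal sketch using Python's requests library; the URL and Authorization value are placeholders) only needs to inspect the returned status code:
import requests

url = (
    "http://view.homedns.org:8000/v1.0/tenants/"
    "00000000-0000-0000-0000-000000000000/crawloperations/"
    "00000000-0000-0000-0000-000000000000"
)
# The Authorization value is a placeholder; supply your deployment's access token.
response = requests.head(url, headers={"Authorization": "••••••"})

exists = response.status_code == 200  # 404 means the crawl operation does not exist
print("Crawl operation exists:", exists)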
Retrieve Crawl Operation Enumeration
Retrieves detailed enumeration data for a specific crawl operation using GET /v1.0/tenants/[tenant-guid]/crawloperations/[crawloperation-guid]/enumeration. This endpoint provides access to the complete enumeration results and object details discovered during the crawl operation.
Request Parameters
- crawloperation-guid (string, Path, Required): GUID of the crawl operation object to retrieve enumeration data for
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawloperations/78d881af-ad82-48bc-8097-1fccd2787624/enumeration' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default" //access key
);
const retrieveEnumerationCrawlOperations = async () => {
try {
const response = await api.CrawlOperation.readEnumeration(
"<crawloperation-guid>"
);
console.log(response, "Crawl operation enumeration fetched successfully");
} catch (err) {
console.log("Error fetching Crawl operation enumeration:", err);
}
};
retrieveEnumerationCrawlOperations();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def enumerateCrawlOperation():
    enumeration = crawler.CrawlOperation.enumerateCrawlOperation("<crawloperation-guid>")
    print(enumeration)

enumerateCrawlOperation()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
CrawlEnumeration response = await sdk.CrawlOperation.RetrieveEnumeration(Guid.Parse("<crawloperation-guid>"));
Response
Returns the detailed enumeration data for the crawl operation:
{
"GUID": "9ced1af3-2e19-4cd6-81ce-a8c90a9ed32d",
"TenantGUID": "default",
"CrawlPlanGUID": "e9a7d61e-7cbd-46e4-9956-533e22008978",
"DataRepositoryGUID": "1a56c067-9e6d-4f7b-85bf-eb6b04aeda3f",
"Name": "Alienware CIFS (started 2024-10-25T22:29:31 UTC)",
"ObjectsEnumerated": 123,
"BytesEnumerated": 61052490,
"EnumerationDirectory": "./enumerations/",
"EnumerationsToRetain": 16,
"MaxDrainTasks": 4,
"ProcessAdditions": true,
"ProcessDeletions": true,
"ProcessUpdates": true,
"CreatedUtc": "2024-10-25T22:29:31.000000Z",
"StartUtc": "2024-10-25T22:29:31.000000Z",
"FinishUtc": "2024-10-25T22:29:39.000000Z"
}
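As a small illustrative sketch (not an API call; the function name is hypothetical), the enumeration record above can be summarized to show average object size and which change types the operation was configured to process:
def describe_enumeration(enumeration: dict) -> None:
    """Print a short summary of a crawl operation enumeration record (illustrative only)."""
    objects = enumeration.get("ObjectsEnumerated", 0)
    size = enumeration.get("BytesEnumerated", 0)
    average = size / objects if objects else 0
    print(f"{enumeration['Name']}: {objects} objects, {size} bytes "
          f"(average {average:.0f} bytes per object)")
    print("Processes additions:", enumeration.get("ProcessAdditions"))
    print("Processes updates:", enumeration.get("ProcessUpdates"))
    print("Processes deletions:", enumeration.get("ProcessDeletions"))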
Read All Crawl Operations
Retrieves all crawl operation objects in the tenant using GET /v1.0/tenants/[tenant-guid]/crawloperations/. Returns an array of crawl operation objects with complete execution details for all operations in the tenant.
Request Parameters
No additional parameters required beyond authentication.
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawloperations/' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default" //access key
);
const readAllCrawlOperations = async () => {
try {
const response = await api.CrawlOperation.readAll();
console.log(response, "All crawl operations fetched successfully");
} catch (err) {
console.log("Error fetching All crawl operations:", err);
}
};
readAllCrawlOperations();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def readAllCrawlOperations():
    crawlOperations = crawler.CrawlOperation.retrieve_all()
    print(crawlOperations)

readAllCrawlOperations()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
List<CrawlOperation> response = await sdk.CrawlOperation.RetrieveAll();
Response
Returns an array of all crawl operation objects:
[
{
"GUID": "9ced1af3-2e19-4cd6-81ce-a8c90a9ed32d",
"TenantGUID": "default",
"CrawlPlanGUID": "e9a7d61e-7cbd-46e4-9956-533e22008978",
"CrawlScheduleGUID": "oneminute",
"CrawlFilterGUID": "default",
"DataRepositoryGUID": "1a56c067-9e6d-4f7b-85bf-eb6b04aeda3f",
"MetadataRuleGUID": "example-metadata-rule",
"EmbeddingsRuleGUID": "crawler-embeddings-rule",
"ProcessingEndpoint": "http://nginx-orchestrator:8501/processor",
"CleanupEndpoint": "http://nginx-orchestrator:8501/processor/cleanup",
"Name": "Alienware CIFS (started 2024-10-25T22:29:31 UTC)",
"ObjectsEnumerated": 123,
"BytesEnumerated": 61052490,
"ObjectsAdded": 0,
"BytesAdded": 0,
"ObjectsUpdated": 0,
"BytesUpdated": 0,
"ObjectsDeleted": 0,
"BytesDeleted": 0,
"ObjectsSuccess": 0,
"BytesSuccess": 0,
"ObjectsFailed": 2,
"BytesFailed": 16384,
"State": "Success",
"CreatedUtc": "2024-10-25T22:29:31.000000Z",
"StartUtc": "2024-10-25T22:29:31.000000Z",
"StartEnumerationUtc": "2024-10-25T22:29:31.000000Z",
"StartRetrievalUtc": "2024-10-25T22:29:39.000000Z",
"FinishEnumerationUtc": "2024-10-25T22:29:39.000000Z",
"FinishRetrievalUtc": "2024-10-25T22:29:39.000000Z",
"FinishUtc": "2024-10-25T22:29:39.000000Z",
"AdditionalData": "No objects detected during enumeration"
}
]
Stop Crawl Operation
Terminates a running crawl operation by GUID using DELETE /v1.0/tenants/[tenant-guid]/crawloperations/[crawloperation-guid]/stop. This operation gracefully stops the crawl execution and updates the operation state to indicate termination.
Request Parameters
- crawloperation-guid (string, Path, Required): GUID of the crawl operation object to stop
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawloperations/00000000-0000-0000-0000-000000000000/stop' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
"Name": "My tenant"
}'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default" //access key
);
const stopCrawlOperation = async () => {
try {
const response = await api.CrawlOperation.stop(
"<crawloperation-guid>",
{
Name: "My crawl operation [ASH]",
}
);
console.log(response, "Crawl operation stopped successfully");
} catch (err) {
console.log("Error stopping Crawl operation:", err);
}
};
stopCrawlOperation();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def stopCrawlOperation():
    crawlOperation = crawler.CrawlOperation.stop("<crawloperation-guid>", Name="My crawl operation")
    print(crawlOperation)

stopCrawlOperation()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
CrawlOperationRequest request = new CrawlOperationRequest
{
    GUID = "<crawloperation-guid>",
    Name = "My First Crawl Operation"
};
CrawlOperation response = await sdk.CrawlOperation.Stop(request);
Response
Returns the updated crawl operation object with the stopped state:
{
"GUID": "9ced1af3-2e19-4cd6-81ce-a8c90a9ed32d",
"TenantGUID": "default",
"CrawlPlanGUID": "e9a7d61e-7cbd-46e4-9956-533e22008978",
"CrawlScheduleGUID": "oneminute",
"CrawlFilterGUID": "default",
"DataRepositoryGUID": "1a56c067-9e6d-4f7b-85bf-eb6b04aeda3f",
"MetadataRuleGUID": "example-metadata-rule",
"EmbeddingsRuleGUID": "crawler-embeddings-rule",
"ProcessingEndpoint": "http://nginx-orchestrator:8501/processor",
"CleanupEndpoint": "http://nginx-orchestrator:8501/processor/cleanup",
"Name": "My crawl operation [ASH]",
"ObjectsEnumerated": 123,
"BytesEnumerated": 61052490,
"ObjectsAdded": 0,
"BytesAdded": 0,
"ObjectsUpdated": 0,
"BytesUpdated": 0,
"ObjectsDeleted": 0,
"BytesDeleted": 0,
"ObjectsSuccess": 0,
"BytesSuccess": 0,
"ObjectsFailed": 2,
"BytesFailed": 16384,
"State": "Stopped",
"CreatedUtc": "2024-10-25T22:29:31.000000Z",
"StartUtc": "2024-10-25T22:29:31.000000Z",
"StartEnumerationUtc": "2024-10-25T22:29:31.000000Z",
"StartRetrievalUtc": "2024-10-25T22:29:39.000000Z",
"FinishEnumerationUtc": "2024-10-25T22:29:39.000000Z",
"FinishRetrievalUtc": "2024-10-25T22:29:39.000000Z",
"FinishUtc": "2024-10-25T22:29:39.000000Z",
"AdditionalData": "No objects detected during enumeration"
}
Delete Crawl Operation
Deletes a crawl operation object by GUID using DELETE /v1.0/tenants/[tenant-guid]/crawloperations/[crawloperation-guid]. This operation permanently removes the crawl operation record and associated metadata from the system.
Request Parameters
- crawloperation-guid (string, Path, Required): GUID of the crawl operation object to delete
curl --location --request DELETE 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawloperations/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••' \
--data ''
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default" //access key
);
const deleteCrawlOperation = async () => {
try {
const response = await api.CrawlOperation.delete(
"<crawloperation-guid>"
);
console.log(response, "Crawl operation deleted successfully");
} catch (err) {
console.log("Error deleting Crawl operation:", err);
}
};
deleteCrawlOperation();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def deleteCrawlOperation():
    crawlOperation = crawler.CrawlOperation.delete("<crawloperation-guid>")
    print(crawlOperation)

deleteCrawlOperation()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
bool deleted = await sdk.CrawlOperation.Delete(Guid.Parse("<crawloperation-guid>"));
Response
Returns a 200 OK response with no body on successful deletion.
Check Crawl Operation Existence
Verifies whether a crawl operation object exists without retrieving its details using HEAD /v1.0/tenants/[tenant-guid]/crawloperations/[crawloperation-guid]. This is an efficient way to confirm an operation's presence before acting on it.
Request Parameters
- crawloperation-guid (string, Path, Required): GUID of the crawl operation object to check
curl --location --head 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawloperations/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";
const api = new ViewCrawlerSdk(
"http://localhost:8000/", //endpoint
"default", //tenant Id
"default" //access key
);
const existsCrawlOperation = async () => {
try {
const response = await api.CrawlOperation.exists(
"<crawloperation-guid>"
);
console.log(response, "Crawl operation exists");
} catch (err) {
console.log("Error checking Crawl operation:", err);
}
};
existsCrawlOperation();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service
sdk = view_sdk.configure(
access_key="default",
base_url="localhost",
tenant_guid="default",
service_ports={Service.CRAWLER: 8000},
)
def existsCrawlOperation():
    crawlOperation = crawler.CrawlOperation.exists("<crawloperation-guid>")
    print(crawlOperation)

existsCrawlOperation()
using View.Sdk;
using View.Crawler;
ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"),
"default",
"http://view.homedns.org:8000/");
bool exists = await sdk.CrawlOperation.Exists(Guid.Parse("<crawloperation-guid>"));
Response
- 200 OK: Crawl operation exists
- 404 Not Found: Crawl operation does not exist
- No response body: Only HTTP status code is returned
Note: HEAD requests do not return a response body, only the HTTP status code indicating whether the crawl operation exists.
Best Practices
When managing crawl operations in the View platform, consider the following recommendations for optimal operation monitoring and management:
- Regular Monitoring: Monitor crawl operation status and performance metrics to ensure optimal execution (see the polling sketch after this list)
- Error Analysis: Review failed objects and error conditions to identify and resolve processing issues
- Performance Optimization: Use timing information and object counts to optimize crawl plan configurations
- Resource Management: Monitor processing endpoints and cleanup operations for efficient resource utilization
- Historical Analysis: Retain operation metadata for trend analysis and performance improvement
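For example, a simple monitoring loop (a sketch built on the Python SDK configuration shown earlier; the polling interval, function name, and the set of terminal states are assumptions based on the State enumeration documented above) can poll an operation until it reaches a terminal state:
import time

import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost",
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

# States assumed to be terminal for monitoring purposes (see the State field description).
TERMINAL_STATES = {"Success", "Failed", "Stopped", "Canceled"}

def wait_for_completion(crawl_operation_guid: str, poll_seconds: int = 30):
    """Poll a crawl operation until it reaches a terminal state, then return it (sketch)."""
    while True:
        operation = crawler.CrawlOperation.retrieve(crawl_operation_guid)
        # Field access assumes a dictionary-like record; adjust if the SDK returns a model object.
        if operation["State"] in TERMINAL_STATES:
            return operation
        time.sleep(poll_seconds)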
Next Steps
After successfully monitoring crawl operations, you can:
- Crawl Plans: Create and configure crawl plans to define automated data ingestion workflows
- Data Repositories: Set up data repositories to define source data locations for crawling
- Crawl Schedules: Configure crawl schedules to define when and how frequently crawling occurs
- Performance Tuning: Optimize crawl performance based on operation metrics and execution patterns
- Integration: Integrate crawl operations with other View platform services for comprehensive data processing