This page covers configuration and management of View crawl filter objects.

Object Overview

Crawl filters provide a reusable template that can be referenced by a crawl plan to define what content from a given data repository is crawled.

Endpoint, URL, and Supported Methods

Objects are managed via the crawler server API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/crawlfilters

Supported methods include: GET, HEAD, PUT, and DELETE.
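As a quick illustration of the URL pattern above, the sketch below assembles a crawl filter URL from its parts. The `crawlFilterUrl` helper and its parameter names are hypothetical and are not part of the View SDK; the sample values are placeholders.

```javascript
// Hypothetical helper (not part of the View SDK): builds the collection URL,
// or the URL of a single crawl filter when filterGuid is supplied.
function crawlFilterUrl({ scheme = "http", hostname, port, tenantGuid, filterGuid }) {
  const base =
    `${scheme}://${hostname}:${port}/v1.0/tenants/` +
    `${encodeURIComponent(tenantGuid)}/crawlfilters`;
  return filterGuid ? `${base}/${encodeURIComponent(filterGuid)}` : base;
}

console.log(crawlFilterUrl({ hostname: "localhost", port: 8000, tenantGuid: "default" }));
// http://localhost:8000/v1.0/tenants/default/crawlfilters
```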

Structure

Objects have the following structure:

{
    "GUID": "defaultfilter",
    "TenantGUID": "default",
    "Name": "My filter",
    "MinimumSize": 1,
    "MaximumSize": 134217728,
    "IncludeSubdirectories": true,
    "ContentType": "*",
    "CreatedUtc": "2024-07-10T05:21:00.000000Z"
}

Properties:

  • GUID (GUID): globally unique identifier for the object
  • TenantGUID (GUID): globally unique identifier for the tenant
  • Name (string): name of the object
  • MinimumSize (int): the minimum size of objects considered candidates for retrieval
  • MaximumSize (int): the maximum size of objects considered candidates for retrieval
  • IncludeSubdirectories (bool): indicates whether subdirectories should be crawled
  • ContentType (string): content types that should be considered candidates for retrieval. An asterisk * represents all content types
  • CreatedUtc (datetime): creation timestamp, in UTC time
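To make the size and content-type properties more concrete, the sketch below shows one way a candidate object could be tested against a filter. It is an illustration only and not View's actual matching logic: the `matchesFilter` function and the shape of the candidate are hypothetical, and it assumes sizes are byte counts, both bounds are inclusive, and ContentType is either `*` or a single exact content-type string.

```javascript
// Hypothetical helper, not part of the View SDK.
// Assumptions: sizes are byte counts, both bounds are inclusive, and
// ContentType is either "*" or one exact content-type string.
function matchesFilter(filter, candidate) {
  const sizeOk =
    candidate.size >= filter.MinimumSize && candidate.size <= filter.MaximumSize;
  const typeOk =
    filter.ContentType === "*" || filter.ContentType === candidate.contentType;
  return sizeOk && typeOk;
}

const exampleFilter = { MinimumSize: 1, MaximumSize: 134217728, ContentType: "*" };
console.log(matchesFilter(exampleFilter, { size: 2048, contentType: "application/pdf" })); // true
console.log(matchesFilter(exampleFilter, { size: 0, contentType: "text/plain" })); // false
```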

Create

To create, call PUT /v1.0/tenants/[tenant-guid]/crawlfilters with the following properties using the configuration server: Name, MinimumSize, MaximumSize, IncludeSubdirectories, ContentType, Prefix, Suffix.

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlfilters' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "Name": "My filter",
    "MinimumSize": 1,
    "MaximumSize": 134217728,
    "IncludeSubdirectories": true,
    "ContentType": "*"
}'

import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "default", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);

const createCrawlFilter = async () => {
  try {
    const response = await crawler.createCrawlFilter({
      Name: "My filter [ASH]",
      MinimumSize: 1,
      MaximumSize: 134217728,
      IncludeSubdirectories: true,
      ContentType: "*",
    });
    console.log(response, "Crawl filter created successfully");
  } catch (err) {
    console.log("Error creating Crawl filter:", err);
  }
};

createCrawlFilter();

Enumerate

Refer to the Enumeration page in REST API for details about the use of enumeration APIs.

To enumerate objects, call GET /v2.0/tenants/[tenant-guid]/crawlfilters. The response object has the following form:

{
    "Success": true,
    "Timestamp": {
        "Start": "2024-10-21T02:36:37.677751Z",
        "TotalMs": 23.58,
        "Messages": {}
    },
    "MaxResults": 10,
    "IterationsRequired": 1,
    "EndOfResults": true,
    "RecordsRemaining": 16,
    "Objects": [
        {
            "GUID": "example-crawlfilter",
            ... crawlfilter details ...
        },
        { ... }
    ],
    "ContinuationToken": "[continuation-token]"
}

curl --location 'http://view.homedns.org:8000/v2.0/tenants/00000000-0000-0000-0000-000000000000/crawlfilters/' \
--header 'Authorization: ••••••'

import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "default", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);

const enumerateCrawlFilters = async () => {
  try {
    const response = await crawler.enumerateCrawlFilters();
    console.log(response, "Crawl filters fetched successfully");
  } catch (err) {
    console.log("Error fetching Crawl filters:", err);
  }
};

enumerateCrawlFilters();
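The enumeration response carries `Objects`, `EndOfResults`, and `ContinuationToken`, which suggests a loop that keeps requesting pages until the results are exhausted. The sketch below shows that loop under stated assumptions: `collectAllCrawlFilters` and the caller-supplied `fetchPage(token)` function are hypothetical, and how the token is sent back to the server is deliberately left to `fetchPage` (see the Enumeration page). Canned pages stand in for real responses.

```javascript
// Hypothetical helper, not part of the View SDK. fetchPage(token) is supplied
// by the caller and must resolve to one enumeration response object.
async function collectAllCrawlFilters(fetchPage) {
  const all = [];
  let token = null; // null requests the first page
  for (;;) {
    const page = await fetchPage(token);
    all.push(...(page.Objects || []));
    // Stop when the server reports the end, or when no token is available.
    if (page.EndOfResults || !page.ContinuationToken) break;
    token = page.ContinuationToken;
  }
  return all;
}

// Demo with canned pages standing in for real responses.
const demoPages = {
  start: { Objects: [{ GUID: "a" }, { GUID: "b" }], EndOfResults: false, ContinuationToken: "next" },
  next: { Objects: [{ GUID: "c" }], EndOfResults: true, ContinuationToken: null },
};
collectAllCrawlFilters(async (token) => demoPages[token || "start"])
  .then((filters) => console.log(filters.map((f) => f.GUID))); // [ 'a', 'b', 'c' ]
```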

Read

To read an object by GUID, call GET /v1.0/tenants/[tenant-guid]/crawlfilters/[crawlfilter-guid]. If the object exists, it will be returned as a JSON object in the response body. If it does not exist, a 404 will be returned with a NotFound error response.

{
    "GUID": "default",
    "TenantGUID": "default",
    "Name": "My filter",
    "MinimumSize": 1,
    "MaximumSize": 134217728,
    "IncludeSubdirectories": true,
    "Prefix": "myprefix",
    "Suffix": ".pptx",
    "ContentType": "*",
    "CreatedUtc": "2024-07-10T05:21:00.000000Z"
}

import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "default", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);


const readCrawlFilter = async () => {
  try {
    const response = await crawler.retrieveCrawlFilter(
      "d3490b1a-3219-4691-9587-61e6380a9551"
    );
    console.log(response, "Crawl filter fetched successfully");
  } catch (err) {
    console.log("Error fetching Crawl filter:", err);
  }
};

readCrawlFilter();

Note: the HEAD method can be used as an alternative to GET to check whether the object exists. A HEAD request returns 200/OK if the object exists, or 404/Not Found if it does not. No response body is returned with a HEAD request.
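When calling the read endpoint directly rather than through the SDK, the 404 case described above can be mapped to a `null` result. The following is a minimal sketch under assumptions: `readCrawlFilterOrNull` is a hypothetical helper, the authorization value is passed through unchanged, and `fetchFn` defaults to the global `fetch` (available in Node 18+ and browsers) so that a stub can be substituted.

```javascript
// Hypothetical helper, not part of the View SDK.
// fetchFn defaults to the global fetch and can be replaced with a stub.
async function readCrawlFilterOrNull(url, authorization, fetchFn = fetch) {
  const res = await fetchFn(url, { headers: { Authorization: authorization } });
  if (res.status === 404) return null; // the object does not exist
  if (!res.ok) throw new Error(`Unexpected status ${res.status}`);
  return res.json(); // the crawl filter object
}
```

Returning `null` for a missing object keeps "not found" separate from genuine failures, which still surface as thrown errors.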

Read all

To read all objects, call GET /v1.0/tenants/[tenant-guid]/crawlfilters/. The objects will be returned as an array of JSON objects in the response body.

curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlfilters/' \
--header 'Authorization: ••••••'

import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "default", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);

const readAllCrawlFilters = async () => {
  try {
    const response = await crawler.retrieveCrawlFilters();
    console.log(response, "All crawl filters fetched successfully");
  } catch (err) {
    console.log("Error fetching All crawl filters:", err);
  }
};

readAllCrawlFilters();

Update

To update an object by GUID, call PUT /v1.0/tenants/[tenant-guid]/crawlfilters/[crawlfilter-guid] with a fully populated object in the request body. The updated object will be returned in the response body.

Note: certain fields cannot be modified and will be preserved across updates.

Request body:

{
    "GUID": "default",
    "TenantGUID": "default",
    "Name": "My updated filter",
    "MinimumSize": 1,
    "MaximumSize": 134217728,
    "IncludeSubdirectories": true,
    "ContentType": "*"
}

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlfilters/00000000-0000-0000-0000-000000000000' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "Name": "My updated filter",
    "MinimumSize": 1,
    "MaximumSize": 134217728,
    "IncludeSubdirectories": true,
    "ContentType": "*"
}'

import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "default", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);

const updateCrawlFilter = async () => {
  try {
    const response = await crawler.updateCrawlFilter({
      GUID: "d3490b1a-3219-4691-9587-61e6380a9551",
      TenantGUID: "00000000-0000-0000-0000-000000000000",
      Name: "My filter [ASH] [UPDATED]",
      MinimumSize: 1,
      MaximumSize: 134217728,
      IncludeSubdirectories: true,
      ContentType: "*",
      CreatedUtc: "2025-04-01T10:47:14.382138Z",
    });
    console.log(response, "Crawl filter updated successfully");
  } catch (err) {
    console.log("Error updating Crawl filter:", err);
  }
};

updateCrawlFilter();

Response body:

{
    "GUID": "default",
    "TenantGUID": "default",
    "Name": "My updated filter",
    "MinimumSize": 1,
    "MaximumSize": 134217728,
    "IncludeSubdirectories": true,
    "ContentType": "*",
    "CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
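Because an update expects a fully populated object, a common pattern is to read the current object, apply the changes, and send the merged result. The sketch below shows only the merge step. `buildUpdateBody` is a hypothetical helper, and treating GUID, TenantGUID, and CreatedUtc as the fields that cannot be modified is an assumption, since the page does not list them.

```javascript
// Hypothetical helper, not part of the View SDK.
// Assumption: GUID, TenantGUID, and CreatedUtc are the non-modifiable fields,
// so they are always taken from the existing object.
function buildUpdateBody(existing, changes) {
  return {
    ...existing,
    ...changes,
    GUID: existing.GUID,
    TenantGUID: existing.TenantGUID,
    CreatedUtc: existing.CreatedUtc,
  };
}

const storedFilter = {
  GUID: "default",
  TenantGUID: "default",
  Name: "My filter",
  MinimumSize: 1,
  MaximumSize: 134217728,
  IncludeSubdirectories: true,
  ContentType: "*",
  CreatedUtc: "2024-07-10T05:21:00.000000Z",
};
console.log(buildUpdateBody(storedFilter, { Name: "My updated filter" }).Name);
// My updated filter
```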

Delete

To delete an object by GUID, call DELETE /v1.0/tenants/[tenant-guid]/crawlfilters/[crawlfilter-guid].

curl --location --request DELETE 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlfilters/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'

import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "default", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);

const deleteCrawlFilter = async () => {
  try {
    const response = await crawler.deleteCrawlFilter(
      "d3490b1a-3219-4691-9587-61e6380a9551"
    );
    console.log(response, "Crawl filter deleted successfully");
  } catch (err) {
    console.log("Error deleting Crawl filter:", err);
  }
};

deleteCrawlFilter();

Check Existence

To check whether an object exists without retrieving it, call HEAD /v1.0/tenants/[tenant-guid]/crawlfilters/[crawlfilter-guid] as an alternative to GET. A HEAD request returns 200/OK if the object exists, or 404/Not Found if it does not. No response body is returned with a HEAD request.

curl --location --head 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/crawlfilters/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'

import { ViewCrawlerSdk } from "view-sdk";

const crawler = new ViewCrawlerSdk(
  "default", //tenant Id
  "default", //access token
  "http://localhost:8000/" //endpoint
);

const existsCrawlFilter = async () => {
  try {
    const response = await crawler.existsCrawlFilter(
      "d3490b1a-3219-4691-9587-61e6380a9551"
    );
    console.log(response, "Crawl filter exists");
  } catch (err) {
    console.log("Error checking Crawl filter:", err);
  }
};

existsCrawlFilter();
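For callers issuing the HEAD request themselves, the two documented status codes map naturally onto a boolean. This is a minimal sketch rather than the SDK's implementation: `crawlFilterExists` is a hypothetical helper, the authorization value is passed through unchanged, and `fetchFn` defaults to the global `fetch` (Node 18+ or a browser).

```javascript
// Hypothetical helper, not part of the View SDK.
// Maps the documented HEAD responses to a boolean: 200 -> true, 404 -> false.
async function crawlFilterExists(url, authorization, fetchFn = fetch) {
  const res = await fetchFn(url, {
    method: "HEAD",
    headers: { Authorization: authorization },
  });
  if (res.status === 200) return true;
  if (res.status === 404) return false;
  // Any other status is neither "exists" nor "missing", so report it.
  throw new Error(`Unexpected status ${res.status}`);
}
```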