This page covers configuration and management of View crawl filter objects.
Object Overview
Crawl filters provide a reusable template that can be referenced by a crawl plan to define what content from a given data repository is crawled.
Endpoint, URL, and Supported Methods
Objects are managed via the crawler server API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/crawlfilters
By default, the crawler server is accessible on port 8101.
Supported methods include: GET, HEAD, PUT, DELETE
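As a minimal sketch, the endpoint pattern above can be composed with two small shell helpers. The function names and the VIEW_ENDPOINT variable are hypothetical, and the default value is only borrowed from the curl example in the Create section; substitute the scheme, hostname, and port of your own deployment.

```shell
# Hypothetical base URL; replace with your own scheme, hostname, and port.
VIEW_ENDPOINT="${VIEW_ENDPOINT:-http://localhost:8601}"

# crawlfilters_url TENANT_GUID
# Prints the collection URL for a tenant's crawl filters.
crawlfilters_url() {
  printf '%s/v1.0/tenants/%s/crawlfilters' "$VIEW_ENDPOINT" "$1"
}

# crawlfilter_url TENANT_GUID CRAWLFILTER_GUID
# Prints the URL of a single crawl filter.
crawlfilter_url() {
  printf '%s/%s' "$(crawlfilters_url "$1")" "$2"
}
```

For example, `crawlfilter_url default defaultfilter` prints the URL used by the read, update, and delete calls for that object.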
Structure
Objects have the following structure:
{
    "GUID": "defaultfilter",
    "TenantGUID": "default",
    "Name": "My filter",
    "MinimumSize": 1,
    "MaximumSize": 134217728,
    "IncludeSubdirectories": true,
    "ContentType": "*",
    "Prefix": "myprefix",
    "Suffix": ".pptx",
    "CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
Properties:
GUID (string): globally unique identifier for the object
TenantGUID (string): globally unique identifier for the tenant
Name (string): name of the object
MinimumSize (int): the minimum size of objects considered candidates for retrieval
MaximumSize (int): the maximum size of objects considered candidates for retrieval
IncludeSubdirectories (bool): boolean indicating whether subdirectories should be crawled
ContentType (string): content types that should be considered candidates for retrieval. An asterisk (*) represents all content types
Prefix (string): object key prefix that must match before an object is considered a candidate for retrieval
Suffix (string): object key suffix that must match before an object is considered a candidate for retrieval
CreatedUtc (datetime): timestamp from creation, in UTC time
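To illustrate how these properties might combine, the sketch below checks a candidate object against the values from the example object above. This is one plausible reading of the property descriptions, not the crawler's actual logic: it assumes the size bounds are inclusive, that ContentType is compared as an exact string unless it is an asterisk, and that Prefix and Suffix are literal matches on the object key. The function name matches_filter is hypothetical.

```shell
# Filter values copied from the example object; adjust as needed.
MINIMUM_SIZE=1
MAXIMUM_SIZE=134217728
CONTENT_TYPE="*"
PREFIX="myprefix"
SUFFIX=".pptx"

# matches_filter KEY SIZE CONTENT_TYPE
# Returns 0 if the object would be a candidate under the assumptions above.
matches_filter() {
  # Size must fall within the (assumed inclusive) bounds.
  [ "$2" -ge "$MINIMUM_SIZE" ] || return 1
  [ "$2" -le "$MAXIMUM_SIZE" ] || return 1
  # An asterisk accepts every content type; otherwise require an exact match.
  if [ "$CONTENT_TYPE" != "*" ] && [ "$3" != "$CONTENT_TYPE" ]; then
    return 1
  fi
  # The key must start with the prefix and end with the suffix.
  case $1 in "$PREFIX"*) ;; *) return 1 ;; esac
  case $1 in *"$SUFFIX") ;; *) return 1 ;; esac
  return 0
}
```

Under these assumptions, `matches_filter "myprefix/deck.pptx" 2048 "application/octet-stream"` succeeds, while a key that lacks the prefix or suffix, or a size of 0, is rejected.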
Create
To create, call PUT /v1.0/tenants/[tenant-guid]/crawlfilters with the following properties using the configuration server: Name, MinimumSize, MaximumSize, IncludeSubdirectories, ContentType, Prefix, Suffix
curl -X PUT http://localhost:8601/v1.0/tenants/[tenant-guid]/crawlfilters \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer [accesskey]" \
    -d '
{
    "Name": "My filter",
    "MinimumSize": 1,
    "MaximumSize": 134217728,
    "IncludeSubdirectories": true,
    "ContentType": "*",
    "Prefix": "myprefix",
    "Suffix": ".pptx"
}'
Enumerate
Refer to the Enumeration page in REST API for details about the use of enumeration APIs.
Enumerate objects by using GET /v2.0/tenants/[tenant-guid]/crawlfilters. The resultant object will appear as:
{
    "Success": true,
    "Timestamp": {
        "Start": "2024-10-21T02:36:37.677751Z",
        "TotalMs": 23.58,
        "Messages": {}
    },
    "MaxResults": 10,
    "IterationsRequired": 1,
    "EndOfResults": true,
    "RecordsRemaining": 16,
    "Objects": [
        {
            "GUID": "example-crawlfilter",
            ... crawlfilter details ...
        },
        { ... }
    ],
    "ContinuationToken": "[continuation-token]"
}
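A minimal sketch of calling the enumeration endpoint with curl follows. The function names and the VIEW_ENDPOINT and ACCESS_KEY variables are hypothetical, and the host and port are assumed from the Create example. The EndOfResults check uses grep rather than a JSON parser purely to keep the example dependency-free. How a continuation token is passed back to the server is not described on this page, so it is left out; see the Enumeration page.

```shell
VIEW_ENDPOINT="${VIEW_ENDPOINT:-http://localhost:8601}"  # assumed host and port
ACCESS_KEY="${ACCESS_KEY:-[accesskey]}"                  # placeholder access key

# enumerate_crawlfilters TENANT_GUID
# Prints the enumeration result for the tenant's crawl filters.
enumerate_crawlfilters() {
  curl -sS -X GET "${VIEW_ENDPOINT}/v2.0/tenants/$1/crawlfilters" \
    -H "Authorization: Bearer ${ACCESS_KEY}"
}

# is_end_of_results RESPONSE_JSON
# Returns 0 when the response body reports "EndOfResults": true.
is_end_of_results() {
  printf '%s' "$1" | grep -Eq '"EndOfResults"[[:space:]]*:[[:space:]]*true'
}
```

A caller could store the output of `enumerate_crawlfilters default` in a variable and pass it to `is_end_of_results` to decide whether another page needs to be requested.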
Read
To read an object by GUID, call GET /v1.0/tenants/[tenant-guid]/crawlfilters/[crawlfilter-guid]. If the object exists, it will be returned as a JSON object in the response body. If it does not exist, a 404 will be returned with a NotFound error response.
{
    "GUID": "default",
    "TenantGUID": "default",
    "Name": "My filter",
    "MinimumSize": 1,
    "MaximumSize": 134217728,
    "IncludeSubdirectories": true,
    "Prefix": "myprefix",
    "Suffix": ".pptx",
    "ContentType": "*",
    "CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
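The read call can be sketched with curl as follows. The function name and the VIEW_ENDPOINT and ACCESS_KEY variables are hypothetical, and the host and port are assumed from the Create example. The --fail flag makes curl exit with a non-zero status on a 404, at the cost of suppressing the error response body.

```shell
VIEW_ENDPOINT="${VIEW_ENDPOINT:-http://localhost:8601}"  # assumed host and port
ACCESS_KEY="${ACCESS_KEY:-[accesskey]}"                  # placeholder access key

# read_crawlfilter TENANT_GUID CRAWLFILTER_GUID
# Prints the crawl filter as JSON; exits non-zero on an HTTP error such as 404.
read_crawlfilter() {
  curl -sS --fail -X GET "${VIEW_ENDPOINT}/v1.0/tenants/$1/crawlfilters/$2" \
    -H "Authorization: Bearer ${ACCESS_KEY}"
}
```

Drop --fail if you want to inspect the NotFound error response instead of relying on the exit status.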
Note: the HEAD method can be used as an alternative to GET to simply check the existence of the object. HEAD requests return either a 200/OK in the event the object exists, or a 404/Not Found if not. No response body is returned with a HEAD request.
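An existence check based on the HEAD behavior described above might look like the following sketch. The function name and the VIEW_ENDPOINT and ACCESS_KEY variables are hypothetical, and the host and port are assumed from the Create example. The curl -I flag issues a HEAD request, and -w '%{http_code}' prints only the status code.

```shell
VIEW_ENDPOINT="${VIEW_ENDPOINT:-http://localhost:8601}"  # assumed host and port
ACCESS_KEY="${ACCESS_KEY:-[accesskey]}"                  # placeholder access key

# crawlfilter_exists TENANT_GUID CRAWLFILTER_GUID
# Returns 0 if the object exists, 1 if it does not, 2 on any other status.
crawlfilter_exists() {
  code=$(curl -s -o /dev/null -I -w '%{http_code}' \
    -H "Authorization: Bearer ${ACCESS_KEY}" \
    "${VIEW_ENDPOINT}/v1.0/tenants/$1/crawlfilters/$2")
  case $code in
    200) return 0 ;;  # object exists
    404) return 1 ;;  # object does not exist
    *)   echo "unexpected status: $code" >&2; return 2 ;;
  esac
}
```

This lets a script branch with `if crawlfilter_exists default defaultfilter; then ...; fi` without downloading the object.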
Update
To update an object by GUID, call PUT /v1.0/tenants/[tenant-guid]/crawlfilters/[crawlfilter-guid] with a fully populated object in the request body. The updated object will be returned to you.
Note: certain fields cannot be modified and will be preserved across updates.
Request body:
{
    "GUID": "default",
    "TenantGUID": "default",
    "Name": "My updated filter",
    "MinimumSize": 1,
    "MaximumSize": 134217728,
    "IncludeSubdirectories": true,
    "Prefix": "myprefix",
    "Suffix": ".pptx",
    "ContentType": "*",
    "CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
Response body:
{
    "GUID": "default",
    "TenantGUID": "default",
    "Name": "My updated filter",
    "MinimumSize": 1,
    "MaximumSize": 134217728,
    "IncludeSubdirectories": true,
    "Prefix": "myprefix",
    "Suffix": ".pptx",
    "ContentType": "*",
    "CreatedUtc": "2024-07-10T05:21:00.000000Z"
}
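The update call can be sketched with curl as shown below, reading the fully populated object from a file. The function name, the file argument, and the VIEW_ENDPOINT and ACCESS_KEY variables are hypothetical, and the host and port are assumed from the Create example.

```shell
VIEW_ENDPOINT="${VIEW_ENDPOINT:-http://localhost:8601}"  # assumed host and port
ACCESS_KEY="${ACCESS_KEY:-[accesskey]}"                  # placeholder access key

# update_crawlfilter TENANT_GUID CRAWLFILTER_GUID JSON_FILE
# Sends the file contents as the request body and prints the updated object.
update_crawlfilter() {
  curl -sS --fail -X PUT "${VIEW_ENDPOINT}/v1.0/tenants/$1/crawlfilters/$2" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${ACCESS_KEY}" \
    --data-binary "@$3"
}
```

For example, `update_crawlfilter default default filter.json` would send the request body shown above if it were saved as filter.json.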
Delete
To delete an object by GUID, call DELETE /v1.0/tenants/[tenant-guid]/crawlfilters/[crawlfilter-guid].
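A minimal curl sketch of the delete call follows. The function name and the VIEW_ENDPOINT and ACCESS_KEY variables are hypothetical, and the host and port are assumed from the Create example. The response to a successful delete is not described on this page, so the sketch relies only on curl's exit status.

```shell
VIEW_ENDPOINT="${VIEW_ENDPOINT:-http://localhost:8601}"  # assumed host and port
ACCESS_KEY="${ACCESS_KEY:-[accesskey]}"                  # placeholder access key

# delete_crawlfilter TENANT_GUID CRAWLFILTER_GUID
# Exits non-zero if the server answers with an HTTP error status.
delete_crawlfilter() {
  curl -sS --fail -X DELETE "${VIEW_ENDPOINT}/v1.0/tenants/$1/crawlfilters/$2" \
    -H "Authorization: Bearer ${ACCESS_KEY}"
}
```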