Data Repositories

Comprehensive guide to View's data repository management system, including multi-protocol support, cloud storage integration, network file systems, and local storage configuration for efficient data ingestion and content discovery.

Overview

The View Data Repository management system provides comprehensive configuration for accessing and ingesting data from diverse storage systems. Data repositories serve as connection definitions that enable the View platform to discover, access, and process data from various sources including local filesystems, network file systems, and cloud storage platforms.

Key Features

  • Multi-Protocol Support: Support for File, NFS, CIFS, Amazon S3, and Azure Blob storage systems
  • Cloud Integration: Seamless integration with major cloud storage providers (AWS S3, Azure Blob)
  • Network File Systems: Support for NFS and CIFS network file system protocols
  • Local Storage: Direct access to local filesystem directories and files
  • Authentication Management: Secure credential handling for various storage systems
  • Directory Traversal: Configurable subdirectory crawling and hierarchical data discovery
  • Flexible Configuration: Support for various storage configurations and access patterns
  • Integration Support: Seamless integration with crawl plans, filters, and processing workflows

Supported Repository Types

  • File: Local filesystem directories and files
  • NFS: Network File System (NFS) shares and exports
  • CIFS: Common Internet File System (CIFS/SMB) shares
  • AmazonS3: Amazon S3 buckets and compatible object storage
  • AzureBlob: Azure Blob Storage containers and objects

Supported Operations

  • Create: Create new data repository configurations for various storage types
  • Read: Retrieve individual data repository configurations and metadata
  • Enumerate: List all data repositories with pagination support
  • Update: Modify existing data repository configurations and settings
  • Delete: Remove data repository configurations and associated connections
  • Existence Check: Verify data repository presence without retrieving details

API Endpoints

Data repositories are managed via the Crawler server API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/datarepositories

Supported HTTP Methods: GET, HEAD, PUT, DELETE

Important: All data repository operations require appropriate authentication tokens.
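
As a concrete illustration of the endpoint pattern, here is a minimal sketch that lists repositories over plain HTTP using Python's requests library (the hostname, port, tenant GUID, and bearer token are placeholders):

import requests

# Placeholder values; substitute your own hostname, port, tenant GUID, and token.
base_url = "http://view.homedns.org:8000"
tenant_guid = "00000000-0000-0000-0000-000000000000"
token = "default"

url = f"{base_url}/v1.0/tenants/{tenant_guid}/datarepositories"
response = requests.get(url, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()

# The endpoint returns an array of data repository objects (see below).
for repo in response.json():
    print(repo["GUID"], repo["RepositoryType"], repo["Name"])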

Data Repository Object Structure

Data repository objects contain comprehensive configuration for accessing various storage systems. The structure varies based on the repository type. Here are complete examples for all supported repository types:

[
    {
        "GUID": "4ae4294d-d135-4b21-a75d-3df5e1c84d2b",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "Local filesystem",
        "RepositoryType": "File",
        "IncludeSubdirectories": true,
        "DiskDirectory": "./files/",
        "CreatedUtc": "2024-10-22T13:57:54.000000Z"
    },
    {
        "GUID": "876c139e-e57f-44ed-b2e6-4dcb5d3677e6",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "NFS file server",
        "RepositoryType": "NFS",
        "IncludeSubdirectories": true,
        "NfsHostname": "nfsserver",
        "NfsUserId": 0,
        "NfsGroupId": 0,
        "NfsShareName": "export1",
        "NfsVersion": "V3",
        "CreatedUtc": "2024-10-22T13:58:18.000000Z"
    },
    {
        "GUID": "8fface0d-9514-4cf6-b260-827dc1c180f4",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "CIFS file server",
        "RepositoryType": "CIFS",
        "IncludeSubdirectories": true,
        "CifsHostname": "windowshost",
        "CifsUsername": "[email protected]",
        "CifsPassword": "password",
        "CifsShareName": "share3",
        "CreatedUtc": "2024-10-22T13:58:43.000000Z"
    },
    {
        "GUID": "e37d0a94-e7e3-447c-9eab-489d1baaad49",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "S3 compatible object store",
        "RepositoryType": "AmazonS3",
        "IncludeSubdirectories": true,
        "S3EndpointUrl": "http://s3storage.company.com/",
        "S3BaseUrl": "http://s3storage.company.com/{bucket}/{key}/",
        "S3AccessKey": "myaccesskey",
        "S3SecretKey": "mysecretkey",
        "S3BucketName": "bucket1",
        "S3Region": "us-west-1",
        "CreatedUtc": "2024-10-22T14:02:14.000000Z"
    },
    {
        "GUID": "c28df7e3-28c2-40a6-8203-c3ac433992c1",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "S3 bucket",
        "RepositoryType": "AmazonS3",
        "IncludeSubdirectories": true,
        "S3EndpointUrl": "https://mybucket.us-west-1.s3.amazonaws.com/",
        "S3BaseUrl": "https://{bucket}.us-west-1.s3.amazonaws.com/{key}/",
        "S3AccessKey": "myaccesskey",
        "S3SecretKey": "mysecretkey",
        "S3BucketName": "mybucket",
        "S3Region": "us-west-1",
        "CreatedUtc": "2024-10-22T14:03:13.000000Z"
    },
    {
        "GUID": "21d149e2-f405-41fe-a20a-e9a3d6073783",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "Azure BLOB storage",
        "RepositoryType": "AzureBlob",
        "IncludeSubdirectories": true,
        "AzureEndpointUrl": "https://myblobcontainer.blob.core.windows.net/",
        "AzureAccountName": "myazureaccount",
        "AzureContainerName": "myblobcontainer",
        "AzureAccessKey": "myaccesskey",
        "CreatedUtc": "2024-10-22T14:04:08.000000Z"
    }
]

Field Descriptions

Common Fields (All Repository Types)

  • GUID (GUID): Globally unique identifier for the data repository object
  • TenantGUID (GUID): Globally unique identifier for the tenant
  • OwnerGUID (GUID): GUID of the user who created the repository
  • RepositoryType (enum): Type of repository (File, NFS, CIFS, AmazonS3, AzureBlob)
  • Name (string): Display name for the data repository
  • IncludeSubdirectories (boolean): Whether to crawl subdirectories recursively
  • CreatedUtc (datetime): UTC timestamp when the repository was created

File Repository Fields

  • DiskDirectory (string): Local directory path to crawl

NFS Repository Fields

  • NfsHostname (string): Hostname or IP address of the NFS server
  • NfsUserId (integer): NFS user ID for authentication
  • NfsGroupId (integer): NFS group ID for authentication
  • NfsShareName (string): Full path of the NFS export/share
  • NfsVersion (enum): NFS protocol version (V2, V3, V4)

CIFS Repository Fields

  • CifsHostname (string): Hostname or IP address of the CIFS server
  • CifsUsername (string): Username for CIFS authentication
  • CifsPassword (string): Password for CIFS authentication
  • CifsShareName (string): Name of the CIFS share

S3 Repository Fields

  • S3EndpointUrl (string): S3-compatible endpoint URL
  • S3BaseUrl (string): Base URL template for object access (see the sketch after this list)
    • Virtual-hosted: https://{bucket}.{hostname}/{key}
    • Path-style: https://{hostname}/{bucket}/{key}
  • S3AccessKey (string): S3 access key for authentication
  • S3SecretKey (string): S3 secret key for authentication
  • S3BucketName (string): Name of the S3 bucket
  • S3Region (string): AWS region (e.g., "us-west-1")
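
As a minimal sketch (the bucket and key values are hypothetical, for illustration only), the {bucket} and {key} tokens in S3BaseUrl are substituted to produce the URL of an individual object; virtual-hosted and path-style templates differ only in where the bucket name appears:

# Hypothetical example values for illustration only.
bucket = "mybucket"
key = "documents/report.pdf"

virtual_hosted = "https://{bucket}.us-west-1.s3.amazonaws.com/{key}"
path_style = "http://s3storage.company.com/{bucket}/{key}"

print(virtual_hosted.format(bucket=bucket, key=key))
# https://mybucket.us-west-1.s3.amazonaws.com/documents/report.pdf

print(path_style.format(bucket=bucket, key=key))
# http://s3storage.company.com/mybucket/documents/report.pdf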

Azure Blob Repository Fields

  • AzureEndpointUrl (string): Azure Blob Storage endpoint URL
  • AzureAccountName (string): Azure storage account name
  • AzureContainerName (string): Name of the blob container
  • AzureAccessKey (string): Azure storage access key

Important Notes

  • Repository Types: Each repository type has specific configuration requirements and authentication methods
  • Network Access: Ensure proper network connectivity and firewall configuration for remote repositories
  • Authentication: Secure credential management is essential for accessing protected storage systems
  • Directory Traversal: Configure subdirectory crawling based on your data organization and processing needs

Create Data Repository

Creates a new data repository configuration using PUT /v1.0/tenants/[tenant-guid]/datarepositories. This endpoint supports multiple repository types with specific configuration requirements for each storage system.

Create CIFS Data Repository

Creates a CIFS (Common Internet File System) data repository for accessing Windows shares and SMB/CIFS file systems.

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "Name": "My CIFS repository",
    "RepositoryType": "CIFS",
    "CifsHostname": "localhost",
    "CifsUsername": "domain\\username",
    "CifsPassword": "password",
    "CifsShareName": "share",
    "CifsIncludeSubdirectories": true
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const createDataRepository = async () => {
  try {
    const response = await api.DataRepository.create({
    "Name": "My CIFS repository",
    "RepositoryType": "CIFS",
    "CifsHostname": "localhost",
    "CifsUsername": "domain\\username",
    "CifsPassword": "******rd",
    "CifsShareName": "share",
    "CifsIncludeSubdirectories": true
});
    console.log(response, "Data repository created successfully");
  } catch (err) {
    console.log("Error creating Data repository:", err);
  }
};

createDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def createDataRepository():
    dataRepository = crawler.DataRepository.create(
        Name="My CIFS repository",
        RepositoryType="CIFS",
        CifsHostname="localhost",
        CifsUsername="domain\\username",
        CifsPassword="******rd",
        CifsShareName="share",
        CifsIncludeSubdirectories=True
    )
    print(dataRepository)

createDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

DataRepository repository = new DataRepository
{
   Name = "My CIFS repository",
   RepositoryType = "CIFS",
   CifsHostname = "localhost",
   CifsUsername = @"domain\username",
   CifsPassword = "******rd",
   CifsShareName = "share",
   CifsIncludeSubdirectories = true
};

DataRepository createdRepository = await sdk.DataRepository.Create(repository);

Response

Returns the created CIFS data repository object with all configuration details:

{
    "GUID": "8fface0d-9514-4cf6-b260-827dc1c180f4",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "My CIFS repository",
    "RepositoryType": "CIFS",
    "IncludeSubdirectories": true,
    "CifsHostname": "localhost",
    "CifsUsername": "domain\\username",
    "CifsPassword": "***word",
    "CifsShareName": "share",
    "CreatedUtc": "2024-10-22T13:58:43.000000Z"
}

Create File Data Repository

Creates a local filesystem data repository for accessing files and directories on the local system.

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "Name": "My file repository",
    "RepositoryType": "File",
    "BaseUrl": "./files/",
    "DiskDirectory": "./files/",
    "DiskIncludeSubdirectories": true
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const createDataRepository = async () => {
  try {
    const response = await api.DataRepository.create({
      Name: "My file repository [ASH]",
      RepositoryType: "File",
      BaseUrl: "./files/",
      DiskDirectory: "./files/",
      DiskIncludeSubdirectories: true,
    });
    console.log(response, "Data repository created successfully");
  } catch (err) {
    console.log("Error creating Data repository:", err);
  }
};

createDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def createFileDataRepository():
    dataRepository = crawler.DataRepository.create(
        Name="My file repository",
        RepositoryType="File",
        DiskDirectory="./files/",
        DiskIncludeSubdirectories=True
    )
    print(dataRepository)

createFileDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

DataRepository repository = new DataRepository
{
   Name = "My file repository",
   RepositoryType = "File",
   DiskDirectory = "./files/",
   DiskIncludeSubdirectories = true
};

DataRepository createdRepository = await sdk.DataRepository.Create(repository);

Response

Returns the created file data repository object:

{
    "GUID": "4ae4294d-d135-4b21-a75d-3df5e1c84d2b",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "My file repository",
    "RepositoryType": "File",
    "IncludeSubdirectories": true,
    "DiskDirectory": "./files/",
    "CreatedUtc": "2024-10-22T13:57:54.000000Z"
}

Create S3 Data Repository

Creates an Amazon S3 or S3-compatible data repository for accessing cloud object storage.

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "Name": "My S3 repository",
    "RepositoryType": "AmazonS3",
    "S3EndpointUrl": null,
    "S3BaseUrl": null,
    "S3AccessKey": "accesskey",
    "S3SecretKey": "secretkey",
    "S3BucketName": "bucket",
    "S3Region": "us-west-1"
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const createDataRepository = async () => {
  try {
    const response = await api.DataRepository.create({
    "Name": "My S3 repository",
    "RepositoryType": "AmazonS3",
    "S3EndpointUrl": null,
    "S3BaseUrl": null,
    "S3AccessKey": "*******ey",
    "S3SecretKey": "*******ey",
    "S3BucketName": "bucket",
    "S3Region": "us-west-1"
	});
    console.log(response, "Data repository created successfully");
  } catch (err) {
    console.log("Error creating Data repository:", err);
  }
};

createDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def createDataRepository():
    dataRepository = crawler.DataRepository.create(
        Name="My S3 repository",
        RepositoryType="AmazonS3",
        S3EndpointUrl=None,
        S3BaseUrl="https://{bucket}.us-west-1.s3.amazonaws.com/{key}",
        S3AccessKey="*******ey",
        S3SecretKey="*******ey",
        S3BucketName="bucket",
        S3Region="us-west-1"
    )
    print(dataRepository)

createDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

DataRepository repository = new DataRepository
{
   Name = "My S3 repository",
   RepositoryType = "AmazonS3",
   S3EndpointUrl = null,
   S3BaseUrl = "https://{bucket}.us-west-1.s3.amazonaws.com/{key}",
   S3AccessKey = "*******ey",
   S3SecretKey = "*******ey",
   S3BucketName = "bucket",
   S3Region = "us-west-1"
};

DataRepository createdRepository = await sdk.DataRepository.Create(repository);

Response

Returns the created S3 data repository object:

{
    "GUID": "e37d0a94-e7e3-447c-9eab-489d1baaad49",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "My S3 repository",
    "RepositoryType": "AmazonS3",
    "IncludeSubdirectories": true,
    "S3EndpointUrl": null,
    "S3BaseUrl": "https://{bucket}.us-west-1.s3.amazonaws.com/{key}",
    "S3AccessKey": "***key",
    "S3SecretKey": "***key",
    "S3BucketName": "bucket",
    "S3Region": "us-west-1",
    "CreatedUtc": "2024-10-22T14:02:14.000000Z"
}

Create Azure Blob Data Repository

Creates an Azure Blob Storage data repository for accessing Microsoft Azure cloud storage.

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "Name": "My Azure BLOB repository",
    "RepositoryType": "AzureBlob",
    "AzureEndpointUrl": "https://accountname.blob.core.windows.net",
    "AzureAccountName": "accountname",
    "AzureContainerName": "containername",
    "AzureAccessKey": "accesskey"
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const createDataRepository = async () => {
  try {
    const response = await api.DataRepository.create({
    "Name": "My Azure BLOB repository",
    "RepositoryType": "AzureBlob",
    "AzureEndpointUrl": "https://accountname.blob.core.windows.net",
    "AzureAccountName": "accountname",
    "AzureContainerName": "containername",
    "AzureAccessKey": "*******ey"
	});
    console.log(response, "Data repository created successfully");
  } catch (err) {
    console.log("Error creating Data repository:", err);
  }
};

createDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def createDataRepository():
    dataRepository = crawler.DataRepository.create(
        Name="My Azure BLOB repository",
        RepositoryType="AzureBlob",
        AzureEndpointUrl="https://accountname.blob.core.windows.net",
        AzureAccountName="accountname",
        AzureContainerName="containername",
        AzureAccessKey="*******ey"
    )
    print(dataRepository)

createDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

DataRepository repository = new DataRepository
{
   Name="My Azure BLOB repository",
   RepositoryType="AzureBlob",
   AzureEndpointUrl="https://accountname.blob.core.windows.net",
   AzureAccountName="accountname",
   AzureContainerName="containername",
   AzureAccessKey="*******ey"
};

DataRepository createdRepository = await sdk.DataRepository.Create(repository);

Response

Returns the created Azure Blob data repository object:

{
    "GUID": "21d149e2-f405-41fe-a20a-e9a3d6073783",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "My Azure BLOB repository",
    "RepositoryType": "AzureBlob",
    "IncludeSubdirectories": true,
    "AzureEndpointUrl": "https://accountname.blob.core.windows.net",
    "AzureAccountName": "accountname",
    "AzureContainerName": "containername",
    "AzureAccessKey": "***key",
    "CreatedUtc": "2024-10-22T14:04:08.000000Z"
}

Create NFS Data Repository

Creates an NFS (Network File System) data repository for accessing Unix/Linux network file shares.

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "Name": "My NFS repository",
    "RepositoryType": "NFS",
    "NfsHostname": "localhost",
    "NfsUserId": 0,
    "NfsGroupId": 0,
    "NfsShareName": "share",
    "NfsIncludeSubdirectories": true
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const createDataRepository = async () => {
  try {
    const response = await api.DataRepository.create({
      "Name": "My NFS repository",
      "RepositoryType": "NFS",
      "NfsHostname": "localhost",
      "NfsUserId": 0,
      "NfsGroupId": 0,
      "NfsShareName": "share",
      "NfsIncludeSubdirectories": true
  });
    console.log(response, "Data repository created successfully");
  } catch (err) {
    console.log("Error creating Data repository:", err);
  }
};

createDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def createDataRepository():
    dataRepository = crawler.DataRepository.create(
        Name="My NFS repository",
        RepositoryType="NFS",
        NfsVersion="V3",
        NfsHostname="192.168.86.248",
        NfsUserId=0,
        NfsGroupId=0,
        NfsShareName="/nfs",
        NfsIncludeSubdirectories=True
    )
    print(dataRepository)

createDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

DataRepository repository = new DataRepository
{
   Name = "My NFS repository",
   RepositoryType = "NFS",
   NfsVersion = "V3",
   NfsHostname = "192.168.86.248",
   NfsUserId = 0,
   NfsGroupId = 0,
   NfsShareName = "/nfs",
   NfsIncludeSubdirectories = true
};

DataRepository createdRepository = await sdk.DataRepository.Create(repository);

Response

Returns the created NFS data repository object:

{
    "GUID": "876c139e-e57f-44ed-b2e6-4dcb5d3677e6",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "My NFS repository",
    "RepositoryType": "NFS",
    "IncludeSubdirectories": true,
    "NfsHostname": "192.168.86.248",
    "NfsUserId": 0,
    "NfsGroupId": 0,
    "NfsShareName": "/nfs",
    "NfsVersion": "V3",
    "CreatedUtc": "2024-10-22T13:58:18.000000Z"
}

Update Data Repository

Updates an existing data repository configuration using PUT /v1.0/tenants/[tenant-guid]/datarepositories/[datarepository-guid]. This endpoint allows you to modify repository parameters while preserving certain immutable fields.

Request Parameters

  • datarepository-guid (string, Path, Required): GUID of the data repository object to update

Updateable Fields

All configuration parameters can be updated except for:

  • GUID: Immutable identifier
  • TenantGUID: Immutable tenant association
  • OwnerGUID: Immutable owner association
  • CreatedUtc: Immutable creation timestamp

Important Notes

  • Field Preservation: Certain fields cannot be modified and will be preserved across updates
  • Complete Object: Provide a fully populated object in the request body
  • Configuration Validation: All updated parameters will be validated before applying changes
  • Connection Impact: Consider the impact of repository changes on existing crawl plans
curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/00000000-0000-0000-0000-000000000000' \
--header 'content-type: application/json' \
--header 'Authorization: Bearer default' \
--data '{
    "Name": "My updated file repository",
    "RepositoryType": "File",
    "IncludeSubdirectories": true,
    "DiskDirectory": "./files/"
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const updateDataRepository = async () => {
  try {
    const response = await api.DataRepository.update({
      GUID: "<datarepository-guid>",
      TenantGUID: "<tenant-guid>",
      OwnerGUID: "<owner-guid>",
      Name: "My S3 repository [UPDATED]",
      RepositoryType: "AmazonS3",
      IncludeSubdirectories: true,
      S3BaseUrl: "https://{bucket}.us-west-1.s3.amazonaws.com/{key}/",
      S3AccessKey: "*******ey",
      S3SecretKey: "*******ey",
      S3BucketName: "bucket",
      S3Region: "us-west-1",
      CreatedUtc: "2025-05-01T11:46:48.671117Z",
    });
    console.log(response, "Data repository updated successfully");
  } catch (err) {
    console.log("Error updating Data repository:", err);
  }
};
updateDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def updateDataRepository():
    dataRepository = crawler.DataRepository.update(
        "<datarepository-guid>",
        Name="My NFS repository [updated]",
        RepositoryType="NFS",
        NfsVersion="V3",
        NfsHostname="192.168.86.248",
        NfsUserId=0,
        NfsGroupId=0,
        NfsShareName="/nfs",
        NfsIncludeSubdirectories=True
    )
    print(dataRepository)

updateDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

DataRepository repository = new DataRepository
{
   GUID = "<datarepository-guid>",
   TenantGUID = "<tenant-guid>",
   OwnerGUID = "<owner-guid>",
   Name = "My S3 repository [UPDATED]",
   RepositoryType = "AmazonS3",
   IncludeSubdirectories = true,
   S3BaseUrl = "https://{bucket}.us-west-1.s3.amazonaws.com/{key}/",
   S3AccessKey = "*******ey",
   S3SecretKey = "*******ey",
   S3BucketName = "bucket",
   S3Region = "us-west-1"
};

DataRepository updatedRepository = await sdk.DataRepository.Update(repository);

Response

Returns the updated data repository object with all configuration details:

{
    "GUID": "8fface0d-9514-4cf6-b260-827dc1c180f4",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "My S3 repository [UPDATED]",
    "RepositoryType": "AmazonS3",
    "IncludeSubdirectories": true,
    "S3BaseUrl": "https://{bucket}.us-west-1.s3.amazonaws.com/{key}/",
    "S3AccessKey": "***key",
    "S3SecretKey": "***key",
    "S3BucketName": "bucket",
    "S3Region": "us-west-1",
    "CreatedUtc": "2024-10-22T13:58:43.000000Z"
}
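
Because the request body must be a fully populated object, a common pattern is read-modify-write: retrieve the current configuration, change only the fields you need, and resubmit the rest unchanged. Below is a minimal sketch using the Python SDK calls shown elsewhere in this guide; the GUID is a placeholder, and attribute-style access on the retrieved object is an assumption, so adapt it to how your SDK version exposes fields.

import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost",
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def renameFileRepository(guid, new_name):
    # Fetch the existing configuration so unchanged fields can be resubmitted.
    existing = crawler.DataRepository.retrieve(guid)
    updated = crawler.DataRepository.update(
        guid,
        Name=new_name,
        RepositoryType=existing.RepositoryType,                # carry over the existing type
        IncludeSubdirectories=existing.IncludeSubdirectories,
        DiskDirectory=existing.DiskDirectory,                  # type-specific fields must be carried over too
    )
    print(updated)

renameFileRepository("<datarepository-guid>", "My file repository [renamed]")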

Enumerate Data Repositories

Retrieves a paginated list of all data repository objects in the tenant using GET /v2.0/tenants/[tenant-guid]/datarepositories. This endpoint provides comprehensive enumeration with pagination support for managing multiple data repository configurations.

Request Parameters

No additional parameters required beyond authentication.

curl --location 'http://view.homedns.org:8000/v2.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const enumerateDataRepositories = async () => {
  try {
    const response = await api.DataRepository.enumerate();
    console.log(response, "Data repositories fetched successfully");
  } catch (err) {
    console.log("Error fetching Data repositories:", err);
  }
};

enumerateDataRepositories();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def enumerateDataRepositories():
    dataRepositories = crawler.DataRepository.enumerate()
    print(dataRepositories)

enumerateDataRepositories()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
EnumerationResult<DataRepository> response = await sdk.DataRepository.Enumerate();

Response

Returns a paginated list of data repository objects:

{
    "Success": true,
    "Timestamp": {
        "Start": "2024-10-21T02:36:37.677751Z",
        "TotalMs": 23.58,
        "Messages": {}
    },
    "MaxResults": 10,
    "IterationsRequired": 1,
    "EndOfResults": true,
    "RecordsRemaining": 0,
    "Objects": [
        {
            "GUID": "4ae4294d-d135-4b21-a75d-3df5e1c84d2b",
            "TenantGUID": "default",
            "OwnerGUID": "default",
            "Name": "Local filesystem",
            "RepositoryType": "File",
            "IncludeSubdirectories": true,
            "DiskDirectory": "./files/",
            "CreatedUtc": "2024-10-22T13:57:54.000000Z"
        },
        {
            "GUID": "8fface0d-9514-4cf6-b260-827dc1c180f4",
            "TenantGUID": "default",
            "OwnerGUID": "default",
            "Name": "CIFS file server",
            "RepositoryType": "CIFS",
            "IncludeSubdirectories": true,
            "CifsHostname": "windowshost",
            "CifsUsername": "username@domain.com",
            "CifsPassword": "***word",
            "CifsShareName": "share3",
            "CreatedUtc": "2024-10-22T13:58:43.000000Z"
        }
    ],
    "ContinuationToken": null
}
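
A minimal sketch of consuming this envelope over plain HTTP follows (the hostname, tenant GUID, and token are placeholders; only fields shown in the response above are referenced):

import requests

# Placeholder host, tenant GUID, and token; the v2.0 endpoint returns the envelope shown above.
url = "http://view.homedns.org:8000/v2.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/"
result = requests.get(url, headers={"Authorization": "Bearer default"}).json()

if result.get("Success"):
    for repo in result.get("Objects", []):
        print(repo["GUID"], repo["RepositoryType"], repo["Name"])
    if not result.get("EndOfResults"):
        # Additional pages remain; RecordsRemaining indicates how many objects are left.
        print("Records remaining:", result.get("RecordsRemaining"))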

Read Data Repository

Retrieves data repository configuration and metadata by GUID using GET /v1.0/tenants/[tenant-guid]/datarepositories/[datarepository-guid]. Returns the complete repository configuration including connection details and authentication settings. If the repository doesn't exist, a 404 error is returned.

Request Parameters

  • datarepository-guid (string, Path, Required): GUID of the data repository object to retrieve
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const readDataRepository = async () => {
  try {
    const response = await api.DataRepository.read(
      "<datarepository-guid>"
    );
    console.log(response, "Data repository fetched successfully");
  } catch (err) {
    console.log("Error fetching Data repository:", err);
  }
};

readDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def readDataRepository():
    dataRepository = crawler.DataRepository.retrieve("<datarepository-guid>")
    print(dataRepository)

readDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
DataRepository response = await sdk.DataRepository.Retrieve(Guid.Parse("<datarepository-guid>"));

Response

Returns the complete data repository configuration:

{
    "GUID": "8fface0d-9514-4cf6-b260-827dc1c180f4",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "CIFS file server",
    "RepositoryType": "CIFS",
    "IncludeSubdirectories": true,
    "CifsHostname": "windowshost",
    "CifsUsername": "[email protected]",
    "CifsPassword": "***word",
    "CifsShareName": "share3",
    "CreatedUtc": "2024-10-22T13:58:43.000000Z"
}

Note: the HEAD method can be used as an alternative to GET to simply check for the existence of the object. HEAD requests return a 200 OK if the object exists, or a 404 Not Found if it does not. No response body is returned by a HEAD request.

Read All Data Repositories

Retrieves all data repository objects in the tenant using GET /v1.0/tenants/[tenant-guid]/datarepositories/. Returns an array of data repository objects with complete configuration details for all repositories in the tenant.

Request Parameters

No additional parameters required beyond authentication.

curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const readAllDataRepositories = async () => {
  try {
    const response = await api.DataRepository.readAll();
    console.log(response, "All data repositories fetched successfully");
  } catch (err) {
    console.log("Error fetching All data repositories:", err);
  }
};

readAllDataRepositories();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def readAllDataRepositories():
    dataRepositories = crawler.DataRepository.retrieve_all()
    print(dataRepositories)

readAllDataRepositories()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
List<DataRepository> response = await sdk.DataRepository.RetrieveMany();

Response

Returns an array of all data repository objects:

[
    {
        "GUID": "4ae4294d-d135-4b21-a75d-3df5e1c84d2b",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "Local filesystem",
        "RepositoryType": "File",
        "IncludeSubdirectories": true,
        "DiskDirectory": "./files/",
        "CreatedUtc": "2024-10-22T13:57:54.000000Z"
    },
    {
        "GUID": "8fface0d-9514-4cf6-b260-827dc1c180f4",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "CIFS file server",
        "RepositoryType": "CIFS",
        "IncludeSubdirectories": true,
        "CifsHostname": "windowshost",
        "CifsUsername": "[email protected]",
        "CifsPassword": "***word",
        "CifsShareName": "share3",
        "CreatedUtc": "2024-10-22T13:58:43.000000Z"
    },
    {
        "GUID": "e37d0a94-e7e3-447c-9eab-489d1baaad49",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "S3 compatible object store",
        "RepositoryType": "AmazonS3",
        "IncludeSubdirectories": true,
        "S3EndpointUrl": "http://s3storage.company.com/",
        "S3BaseUrl": "http://s3storage.company.com/{bucket}/{key}/",
        "S3AccessKey": "***key",
        "S3SecretKey": "***key",
        "S3BucketName": "bucket1",
        "S3Region": "us-west-1",
        "CreatedUtc": "2024-10-22T14:02:14.000000Z"
    }
]

Delete Data Repository

Deletes a data repository object by GUID using DELETE /v1.0/tenants/[tenant-guid]/datarepositories/[datarepository-guid]. This operation permanently removes the data repository configuration from the system. Use with caution as this action cannot be undone.

Important Note: Ensure no active crawl plans are using this repository before deletion, as this will break crawl plan execution.

Request Parameters

  • datarepository-guid (string, Path, Required): GUID of the data repository object to delete
curl --location --request DELETE 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const deleteDataRepository = async () => {
  try {
    const response = await api.DataRepository.delete(
      "<datarepository-guid>"
    );
    console.log(response, "Data repository deleted successfully");
  } catch (err) {
    console.log("Error deleting Data repository:", err);
  }
};

deleteDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def deleteDataRepository():
    dataRepository = crawler.DataRepository.delete("<datarepository-guid>")
    print(dataRepository)

deleteDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
bool deleted = await sdk.DataRepository.Delete(Guid.Parse("<datarepository-guid>"));

Response

Returns 204 No Content on successful deletion. No response body is returned.


Check Data Repository Existence

Verifies if a data repository object exists without retrieving its configuration using HEAD /v1.0/tenants/[tenant-guid]/datarepositories/[datarepository-guid]. This is an efficient way to check repository presence before performing operations.

Request Parameters

  • datarepository-guid (string, Path, Required): GUID of the data repository object to check
curl --location --head 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const existsDataRepository = async () => {
  try {
    const response = await api.DataRepository.exists(
      "<datarepository-guid>"
    );
    console.log(response, "Data repository exists");
  } catch (err) {
    console.log("Error checking Data repository:", err);
  }
};

existsDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def existsDataRepository():
    dataRepository = crawler.DataRepository.exists("<datarepository-guid>")
    print(dataRepository)

existsDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
bool exists = await sdk.DataRepository.Exists(Guid.Parse("<datarepository-guid>"));

Response

  • 200 OK: Data repository exists
  • 404 Not Found: Data repository does not exist
  • No response body: Only HTTP status code is returned

Note: HEAD requests do not return a response body, only the HTTP status code indicating whether the data repository exists.
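
A minimal sketch that combines the existence check with a follow-up delete, using the Python SDK calls shown in this guide (the GUID is a placeholder, and treating the exists result as a boolean follows the C# example above):

import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost",
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def deleteIfExists(guid):
    # Check presence first so a missing repository does not surface as a 404 from delete.
    if crawler.DataRepository.exists(guid):
        crawler.DataRepository.delete(guid)
        print("Deleted data repository:", guid)
    else:
        print("Data repository not found:", guid)

deleteIfExists("<datarepository-guid>")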

Best Practices

When managing data repositories in the View platform, consider the following recommendations for optimal data access and security:

  • Security Configuration: Use secure authentication methods and protect credentials for all repository types
  • Network Access: Ensure proper network connectivity and firewall configuration for remote repositories
  • Repository Types: Choose appropriate repository types based on your data storage infrastructure and access requirements
  • Directory Structure: Configure subdirectory crawling based on your data organization and processing needs
  • Performance Optimization: Consider network latency and bandwidth when configuring remote repositories

Next Steps

After successfully configuring data repositories, you can:

  • Crawl Plans: Create crawl plans that reference your configured repositories for automated data ingestion
  • Crawl Schedules: Set up crawl schedules to define when and how frequently repositories should be crawled
  • Crawl Filters: Configure crawl filters to optimize content discovery and processing for your repositories
  • Crawl Operations: Monitor repository crawling operations and track processing performance
  • Integration: Integrate data repositories with other View platform services for comprehensive data processing workflows