Data Repositories

Comprehensive guide to View's data repository management system, including multi-protocol support, cloud storage integration, network file systems, and local storage configuration for efficient data ingestion and content discovery.

Overview

The View Data Repository management system provides comprehensive configuration for accessing and ingesting data from diverse storage systems. Data repositories serve as connection definitions that enable the View platform to discover, access, and process data from various sources including local filesystems, network file systems, and cloud storage platforms.

Key Features

  • Multi-Protocol Support: Support for File, NFS, CIFS, Amazon S3, and Azure Blob storage systems
  • Cloud Integration: Seamless integration with major cloud storage providers (AWS S3, Azure Blob)
  • Network File Systems: Support for NFS and CIFS network file system protocols
  • Local Storage: Direct access to local filesystem directories and files
  • Authentication Management: Secure credential handling for various storage systems
  • Directory Traversal: Configurable subdirectory crawling and hierarchical data discovery
  • Flexible Configuration: Support for various storage configurations and access patterns
  • Integration Support: Seamless integration with crawl plans, filters, and processing workflows

Supported Repository Types

  • File: Local filesystem directories and files
  • NFS: Network File System (NFS) shares and exports
  • CIFS: Common Internet File System (CIFS/SMB) shares
  • AmazonS3: Amazon S3 buckets and compatible object storage
  • AzureBlob: Azure Blob Storage containers and objects

Supported Operations

  • Create: Create new data repository configurations for various storage types
  • Read: Retrieve individual data repository configurations and metadata
  • Enumerate: List all data repositories with pagination support
  • Update: Modify existing data repository configurations and settings
  • Delete: Remove data repository configurations and associated connections
  • Existence Check: Verify data repository presence without retrieving details

API Endpoints

Data repositories are managed via the Crawler server API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/datarepositories

Supported HTTP Methods: GET, HEAD, PUT, DELETE

Important: All data repository operations require appropriate authentication tokens.
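
As a concrete illustration of the endpoint pattern, here is a minimal sketch that lists repositories over plain HTTP using Python's requests library (the hostname, port, tenant GUID, and bearer token are placeholders):

import requests

# Placeholder values; substitute your own hostname, port, tenant GUID, and token.
base_url = "http://view.homedns.org:8000"
tenant_guid = "00000000-0000-0000-0000-000000000000"
token = "default"

url = f"{base_url}/v1.0/tenants/{tenant_guid}/datarepositories"
response = requests.get(url, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()

# The endpoint returns an array of data repository objects (see below).
for repo in response.json():
    print(repo["GUID"], repo["RepositoryType"], repo["Name"])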

Data Repository Object Structure

Data repository objects contain comprehensive configuration for accessing various storage systems. The structure varies based on the repository type. Here are complete examples for all supported repository types:

[
    {
        "GUID": "4ae4294d-d135-4b21-a75d-3df5e1c84d2b",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "Local filesystem",
        "RepositoryType": "File",
        "IncludeSubdirectories": true,
        "DiskDirectory": "./files/",
        "CreatedUtc": "2024-10-22T13:57:54.000000Z"
    },
    {
        "GUID": "876c139e-e57f-44ed-b2e6-4dcb5d3677e6",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "NFS file server",
        "RepositoryType": "NFS",
        "IncludeSubdirectories": true,
        "NfsHostname": "nfsserver",
        "NfsUserId": 0,
        "NfsGroupId": 0,
        "NfsShareName": "export1",
        "NfsVersion": "V3",
        "CreatedUtc": "2024-10-22T13:58:18.000000Z"
    },
    {
        "GUID": "8fface0d-9514-4cf6-b260-827dc1c180f4",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "CIFS file server",
        "RepositoryType": "CIFS",
        "IncludeSubdirectories": true,
        "CifsHostname": "windowshost",
        "CifsUsername": "[email protected]",
        "CifsPassword": "password",
        "CifsShareName": "share3",
        "CreatedUtc": "2024-10-22T13:58:43.000000Z"
    },
    {
        "GUID": "e37d0a94-e7e3-447c-9eab-489d1baaad49",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "S3 compatible object store",
        "RepositoryType": "AmazonS3",
        "IncludeSubdirectories": true,
        "S3EndpointUrl": "http://s3storage.company.com/",
        "S3BaseUrl": "http://s3storage.company.com/{bucket}/{key}/",
        "S3AccessKey": "myaccesskey",
        "S3SecretKey": "mysecretkey",
        "S3BucketName": "bucket1",
        "S3Region": "us-west-1",
        "CreatedUtc": "2024-10-22T14:02:14.000000Z"
    },
    {
        "GUID": "c28df7e3-28c2-40a6-8203-c3ac433992c1",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "S3 bucket",
        "RepositoryType": "AmazonS3",
        "IncludeSubdirectories": true,
        "S3EndpointUrl": "https://mybucket.us-west-1.s3.amazonaws.com/",
        "S3BaseUrl": "https://{bucket}.us-west-1.s3.amazonaws.com/{key}/",
        "S3AccessKey": "myaccesskey",
        "S3SecretKey": "mysecretkey",
        "S3BucketName": "mybucket",
        "S3Region": "us-west-1",
        "CreatedUtc": "2024-10-22T14:03:13.000000Z"
    },
    {
        "GUID": "21d149e2-f405-41fe-a20a-e9a3d6073783",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "Azure BLOB storage",
        "RepositoryType": "AzureBlob",
        "IncludeSubdirectories": true,
        "AzureEndpointUrl": "https://myblobcontainer.blob.core.windows.net/",
        "AzureAccountName": "myazureaccount",
        "AzureContainerName": "myblobcontainer",
        "AzureAccessKey": "myaccesskey",
        "CreatedUtc": "2024-10-22T14:04:08.000000Z"
    }
]

Field Descriptions

Common Fields (All Repository Types)

  • GUID (GUID): Globally unique identifier for the data repository object
  • TenantGUID (GUID): Globally unique identifier for the tenant
  • OwnerGUID (GUID): GUID of the user who created the repository
  • RepositoryType (enum): Type of repository (File, NFS, CIFS, AmazonS3, AzureBlob)
  • Name (string): Display name for the data repository
  • IncludeSubdirectories (boolean): Whether to crawl subdirectories recursively
  • CreatedUtc (datetime): UTC timestamp when the repository was created

File Repository Fields

  • DiskDirectory (string): Local directory path to crawl

NFS Repository Fields

  • NfsHostname (string): Hostname or IP address of the NFS server
  • NfsUserId (integer): NFS user ID for authentication
  • NfsGroupId (integer): NFS group ID for authentication
  • NfsShareName (string): Full path of the NFS export/share
  • NfsVersion (enum): NFS protocol version (V2, V3, V4)

CIFS Repository Fields

  • CifsHostname (string): Hostname or IP address of the CIFS server
  • CifsUsername (string): Username for CIFS authentication
  • CifsPassword (string): Password for CIFS authentication
  • CifsShareName (string): Name of the CIFS share

S3 Repository Fields

  • S3EndpointUrl (string): S3-compatible endpoint URL
  • S3BaseUrl (string): Base URL template for object access (see the sketch after this list)
    • Virtual-hosted: https://{bucket}.{hostname}/{key}
    • Path-style: https://{hostname}/{bucket}/{key}
  • S3AccessKey (string): S3 access key for authentication
  • S3SecretKey (string): S3 secret key for authentication
  • S3BucketName (string): Name of the S3 bucket
  • S3Region (string): AWS region (e.g., "us-west-1")
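
As a minimal sketch (the bucket and key values are hypothetical, for illustration only), the {bucket} and {key} tokens in S3BaseUrl are substituted to produce the URL of an individual object; virtual-hosted and path-style templates differ only in where the bucket name appears:

# Hypothetical example values for illustration only.
bucket = "mybucket"
key = "documents/report.pdf"

virtual_hosted = "https://{bucket}.us-west-1.s3.amazonaws.com/{key}"
path_style = "http://s3storage.company.com/{bucket}/{key}"

print(virtual_hosted.format(bucket=bucket, key=key))
# https://mybucket.us-west-1.s3.amazonaws.com/documents/report.pdf

print(path_style.format(bucket=bucket, key=key))
# http://s3storage.company.com/mybucket/documents/report.pdf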

Azure Blob Repository Fields

  • AzureEndpointUrl (string): Azure Blob Storage endpoint URL
  • AzureAccountName (string): Azure storage account name
  • AzureContainerName (string): Name of the blob container
  • AzureAccessKey (string): Azure storage access key

Important Notes

  • Repository Types: Each repository type has specific configuration requirements and authentication methods
  • Network Access: Ensure proper network connectivity and firewall configuration for remote repositories
  • Authentication: Secure credential management is essential for accessing protected storage systems
  • Directory Traversal: Configure subdirectory crawling based on your data organization and processing needs

Create Data Repository

Creates a new data repository configuration using PUT /v1.0/tenants/[tenant-guid]/datarepositories. This endpoint supports multiple repository types with specific configuration requirements for each storage system.

Create CIFS Data Repository

Creates a CIFS (Common Internet File System) data repository for accessing Windows shares and SMB/CIFS file systems.

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "Name": "My CIFS repository",
    "RepositoryType": "CIFS",
    "CifsHostname": "localhost",
    "CifsUsername": "domain\\username",
    "CifsPassword": "password",
    "CifsShareName": "share",
    "CifsIncludeSubdirectories": true
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const createDataRepository = async () => {
  try {
    const response = await api.DataRepository.create({
    "Name": "My CIFS repository",
    "RepositoryType": "CIFS",
    "CifsHostname": "localhost",
    "CifsUsername": "domain\\username",
    "CifsPassword": "******rd",
    "CifsShareName": "share",
    "CifsIncludeSubdirectories": true
});
    console.log(response, "Data repository created successfully");
  } catch (err) {
    console.log("Error creating Data repository:", err);
  }
};

createDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def createDataRepository():
    dataRepository = crawler.DataRepository.create(
        Name="My CIFS repository",
        RepositoryType="CIFS",
        CifsHostname="localhost",
        CifsUsername="domain\\username",
        CifsPassword="******rd",
        CifsShareName="share",
        CifsIncludeSubdirectories=True
    )
    print(dataRepository)

createDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

DataRepository repository = new DataRepository
{
   Name = "My CIFS repository",
   RepositoryType = "CIFS",
   CifsHostname = "localhost",
   CifsUsername = @"domain\username",
   CifsPassword = "******rd",
   CifsShareName = "share",
   CifsIncludeSubdirectories = true
};

DataRepository createdRepository = await sdk.DataRepository.Create(repository);

Response

Returns the created CIFS data repository object with all configuration details:

{
    "GUID": "8fface0d-9514-4cf6-b260-827dc1c180f4",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "My CIFS repository",
    "RepositoryType": "CIFS",
    "IncludeSubdirectories": true,
    "CifsHostname": "localhost",
    "CifsUsername": "domain\\username",
    "CifsPassword": "***word",
    "CifsShareName": "share",
    "CreatedUtc": "2024-10-22T13:58:43.000000Z"
}

Create File Data Repository

Creates a local filesystem data repository for accessing files and directories on the local system.

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "Name": "My file repository",
    "RepositoryType": "File",
    "BaseUrl": "./files/",
    "DiskDirectory": "./files/",
    "DiskIncludeSubdirectories": true
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const createDataRepository = async () => {
  try {
    const response = await api.DataRepository.create({
      Name: "My file repository [ASH]",
      RepositoryType: "File",
      BaseUrl: "./files/",
      DiskDirectory: "./files/",
      DiskIncludeSubdirectories: true,
    });
    console.log(response, "Data repository created successfully");
  } catch (err) {
    console.log("Error creating Data repository:", err);
  }
};

createDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def createFileDataRepository():
    dataRepository = crawler.DataRepository.create(
        Name="My file repository",
        RepositoryType="File",
        DiskDirectory="./files/",
        DiskIncludeSubdirectories=True
    )
    print(dataRepository)

createFileDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

DataRepository repository = new DataRepository
{
   Name = "My file repository",
   RepositoryType = "File",
   DiskDirectory = "./files/",
   DiskIncludeSubdirectories = true
};

DataRepository createdRepository = await sdk.DataRepository.Create(repository);

Response

Returns the created file data repository object:

{
    "GUID": "4ae4294d-d135-4b21-a75d-3df5e1c84d2b",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "My file repository",
    "RepositoryType": "File",
    "IncludeSubdirectories": true,
    "DiskDirectory": "./files/",
    "CreatedUtc": "2024-10-22T13:57:54.000000Z"
}

Create S3 Data Repository

Creates an Amazon S3 or S3-compatible data repository for accessing cloud object storage.

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "Name": "My S3 repository",
    "RepositoryType": "AmazonS3",
    "S3EndpointUrl": null,
    "S3BaseUrl": null,
    "S3AccessKey": "accesskey",
    "S3SecretKey": "secretkey",
    "S3BucketName": "bucket",
    "S3Region": "us-west-1"
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const createDataRepository = async () => {
  try {
    const response = await api.DataRepository.create({
    "Name": "My S3 repository",
    "RepositoryType": "AmazonS3",
    "S3EndpointUrl": null,
    "S3BaseUrl": null,
    "S3AccessKey": "*******ey",
    "S3SecretKey": "*******ey",
    "S3BucketName": "bucket",
    "S3Region": "us-west-1"
	});
    console.log(response, "Data repository created successfully");
  } catch (err) {
    console.log("Error creating Data repository:", err);
  }
};

createDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def createDataRepository():
    dataRepository = crawler.DataRepository.create(
        Name="My S3 repository",
        RepositoryType="AmazonS3",
        S3EndpointUrl=None,
        S3BaseUrl="https://{bucket}.us-west-1.s3.amazonaws.com/{key}",
        S3AccessKey="*******ey",
        S3SecretKey="*******ey",
        S3BucketName="bucket",
        S3Region="us-west-1"
    )
    print(dataRepository)

createDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

DataRepository repository = new DataRepository
{
   Name = "My S3 repository",
   RepositoryType = "AmazonS3",
   S3EndpointUrl = null,
   S3BaseUrl = "https://{bucket}.us-west-1.s3.amazonaws.com/{key}",
   S3AccessKey = "*******ey",
   S3SecretKey = "*******ey",
   S3BucketName = "bucket",
   S3Region = "us-west-1"
};

DataRepository createdRepository = await sdk.DataRepository.Create(repository);

Response

Returns the created S3 data repository object:

{
    "GUID": "e37d0a94-e7e3-447c-9eab-489d1baaad49",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "My S3 repository",
    "RepositoryType": "AmazonS3",
    "IncludeSubdirectories": true,
    "S3EndpointUrl": null,
    "S3BaseUrl": "https://{bucket}.us-west-1.s3.amazonaws.com/{key}",
    "S3AccessKey": "***key",
    "S3SecretKey": "***key",
    "S3BucketName": "bucket",
    "S3Region": "us-west-1",
    "CreatedUtc": "2024-10-22T14:02:14.000000Z"
}

Create Azure Blob Data Repository

Creates an Azure Blob Storage data repository for accessing Microsoft Azure cloud storage.

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "Name": "My Azure BLOB repository",
    "RepositoryType": "AzureBlob",
    "AzureEndpointUrl": "https://accountname.blob.core.windows.net",
    "AzureAccountName": "accountname",
    "AzureContainerName": "containername",
    "AzureAccessKey": "accesskey"
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const createDataRepository = async () => {
  try {
    const response = await api.DataRepository.create({
    "Name": "My Azure BLOB repository",
    "RepositoryType": "AzureBlob",
    "AzureEndpointUrl": "https://accountname.blob.core.windows.net",
    "AzureAccountName": "accountname",
    "AzureContainerName": "containername",
    "AzureAccessKey": "*******ey"
	});
    console.log(response, "Data repository created successfully");
  } catch (err) {
    console.log("Error creating Data repository:", err);
  }
};

createDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def createDataRepository():
    dataRepository = crawler.DataRepository.create(
        Name="My Azure BLOB repository",
        RepositoryType="AzureBlob",
        AzureEndpointUrl="https://accountname.blob.core.windows.net",
        AzureAccountName="accountname",
        AzureContainerName="containername",
        AzureAccessKey="*******ey"
    )
    print(dataRepository)

createDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

DataRepository repository = new DataRepository
{
   Name="My Azure BLOB repository",
   RepositoryType="AzureBlob",
   AzureEndpointUrl="https://accountname.blob.core.windows.net",
   AzureAccountName="accountname",
   AzureContainerName="containername",
   AzureAccessKey="*******ey"
};

DataRepository createdRepository = await sdk.DataRepository.Create(repository);

Response

Returns the created Azure Blob data repository object:

{
    "GUID": "21d149e2-f405-41fe-a20a-e9a3d6073783",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "My Azure BLOB repository",
    "RepositoryType": "AzureBlob",
    "IncludeSubdirectories": true,
    "AzureEndpointUrl": "https://accountname.blob.core.windows.net",
    "AzureAccountName": "accountname",
    "AzureContainerName": "containername",
    "AzureAccessKey": "***key",
    "CreatedUtc": "2024-10-22T14:04:08.000000Z"
}

Create NFS Data Repository

Creates an NFS (Network File System) data repository for accessing Unix/Linux network file shares.

curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories' \
--header 'content-type: application/json' \
--header 'Authorization: ••••••' \
--data '{
    "Name": "My NFS repository",
    "RepositoryType": "NFS",
    "NfsHostname": "localhost",
    "NfsUserId": 0,
    "NfsGroupId": 0,
    "NfsShareName": "share",
    "NfsIncludeSubdirectories": true
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const createDataRepository = async () => {
  try {
    const response = await api.DataRepository.create({
      "Name": "My NFS repository",
      "RepositoryType": "NFS",
      "NfsHostname": "localhost",
      "NfsUserId": 0,
      "NfsGroupId": 0,
      "NfsShareName": "share",
      "NfsIncludeSubdirectories": true
  });
    console.log(response, "Data repository created successfully");
  } catch (err) {
    console.log("Error creating Data repository:", err);
  }
};

createDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def createDataRepository():
    dataRepository = crawler.DataRepository.create(
        Name="My NFS repository",
        RepositoryType="NFS",
        NfsVersion="V3",
        NfsHostname="192.168.86.248",
        NfsUserId=0,
        NfsGroupId=0,
        NfsShareName="/nfs",
        NfsIncludeSubdirectories=True
    )
    print(dataRepository)

createDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

DataRepository repository = new DataRepository
{
   Name = "My NFS repository",
   RepositoryType = "NFS",
   NfsVersion = "V3",
   NfsHostname = "192.168.86.248",
   NfsUserId = 0,
   NfsGroupId = 0,
   NfsShareName = "/nfs",
   NfsIncludeSubdirectories = true
};

DataRepository createdRepository = await sdk.DataRepository.Create(repository);

Response

Returns the created NFS data repository object:

{
    "GUID": "876c139e-e57f-44ed-b2e6-4dcb5d3677e6",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "My NFS repository",
    "RepositoryType": "NFS",
    "IncludeSubdirectories": true,
    "NfsHostname": "192.168.86.248",
    "NfsUserId": 0,
    "NfsGroupId": 0,
    "NfsShareName": "/nfs",
    "NfsVersion": "V3",
    "CreatedUtc": "2024-10-22T13:58:18.000000Z"
}

Update Data Repository

Updates an existing data repository configuration using PUT /v1.0/tenants/[tenant-guid]/datarepositories/[datarepository-guid]. This endpoint allows you to modify repository parameters while preserving certain immutable fields.

Request Parameters

  • datarepository-guid (string, Path, Required): GUID of the data repository object to update

Updateable Fields

All configuration parameters can be updated except for:

  • GUID: Immutable identifier
  • TenantGUID: Immutable tenant association
  • OwnerGUID: Immutable owner association
  • CreatedUtc: Immutable creation timestamp

Important Notes

  • Field Preservation: Certain fields cannot be modified and will be preserved across updates
  • Complete Object: Provide a fully populated object in the request body
  • Configuration Validation: All updated parameters will be validated before applying changes
  • Connection Impact: Consider the impact of repository changes on existing crawl plans
curl --location --request PUT 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/00000000-0000-0000-0000-000000000000' \
--header 'content-type: application/json' \
--header 'Authorization: Bearer default' \
--data '{
    "Name": "My updated file repository",
    "RepositoryType": "File",
    "IncludeSubdirectories": true,
    "DiskDirectory": "./files/"
}'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const updateDataRepository = async () => {
  try {
    const response = await api.DataRepository.update({
      GUID: "<datarepository-guid>",
      TenantGUID: "<tenant-guid>",
      OwnerGUID: "<owner-guid>",
      Name: "My S3 repository [UPDATED]",
      RepositoryType: "AmazonS3",
      IncludeSubdirectories: true,
      S3BaseUrl: "https://{bucket}.us-west-1.s3.amazonaws.com/{key}/",
      S3AccessKey: "*******ey",
      S3SecretKey: "*******ey",
      S3BucketName: "bucket",
      S3Region: "us-west-1",
      CreatedUtc: "2025-05-01T11:46:48.671117Z",
    });
    console.log(response, "Data repository updated successfully");
  } catch (err) {
    console.log("Error updating Data repository:", err);
  }
};
updateDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def updateDataRepository():
    dataRepository = crawler.DataRepository.update(
        "<datarepository-guid>",
        Name="My NFS repository [updated]",
        RepositoryType="NFS",
        NfsVersion="V3",
        NfsHostname="192.168.86.248",
        NfsUserId=0,
        NfsGroupId=0,
        NfsShareName="/nfs",
        NfsIncludeSubdirectories=True
    )
    print(dataRepository)

updateDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");

DataRepository repository = new DataRepository
{
   GUID = "<datarepository-guid>",
   TenantGUID = "<tenant-guid>",
   OwnerGUID = "<owner-guid>",
   Name = "My S3 repository [UPDATED]",
   RepositoryType = "AmazonS3",
   IncludeSubdirectories = true,
   S3BaseUrl = "https://{bucket}.us-west-1.s3.amazonaws.com/{key}/",
   S3AccessKey = "*******ey",
   S3SecretKey = "*******ey",
   S3BucketName = "bucket",
   S3Region = "us-west-1"
};

DataRepository updatedRepository = await sdk.DataRepository.Update(repository);

Response

Returns the updated data repository object with all configuration details:

{
    "GUID": "8fface0d-9514-4cf6-b260-827dc1c180f4",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "My S3 repository [UPDATED]",
    "RepositoryType": "AmazonS3",
    "IncludeSubdirectories": true,
    "S3BaseUrl": "https://{bucket}.us-west-1.s3.amazonaws.com/{key}/",
    "S3AccessKey": "***key",
    "S3SecretKey": "***key",
    "S3BucketName": "bucket",
    "S3Region": "us-west-1",
    "CreatedUtc": "2024-10-22T13:58:43.000000Z"
}
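
Because the request body must be a fully populated object, a common pattern is read-modify-write: retrieve the current configuration, change only the fields you need, and resubmit the rest unchanged. Below is a minimal sketch using the Python SDK calls shown elsewhere in this guide; the GUID is a placeholder, and attribute-style access on the retrieved object is an assumption, so adapt it to how your SDK version exposes fields.

import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost",
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def renameFileRepository(guid, new_name):
    # Fetch the existing configuration so unchanged fields can be resubmitted.
    existing = crawler.DataRepository.retrieve(guid)
    updated = crawler.DataRepository.update(
        guid,
        Name=new_name,
        RepositoryType=existing.RepositoryType,                # carry over the existing type
        IncludeSubdirectories=existing.IncludeSubdirectories,
        DiskDirectory=existing.DiskDirectory,                  # type-specific fields must be carried over too
    )
    print(updated)

renameFileRepository("<datarepository-guid>", "My file repository [renamed]")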

Enumerate Data Repositories

Retrieves a paginated list of all data repository objects in the tenant using GET /v2.0/tenants/[tenant-guid]/datarepositories. This endpoint provides comprehensive enumeration with pagination support for managing multiple data repository configurations.

Request Parameters

No additional parameters required beyond authentication.

curl --location 'http://view.homedns.org:8000/v2.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const enumerateDataRepositories = async () => {
  try {
    const response = await api.DataRepository.enumerate();
    console.log(response, "Data repositories fetched successfully");
  } catch (err) {
    console.log("Error fetching Data repositories:", err);
  }
};

enumerateDataRepositories();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def enumerateDataRepositories():
    dataRepositories = crawler.DataRepository.enumerate()
    print(dataRepositories)

enumerateDataRepositories()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
EnumerationResult<DataRepository> response = await sdk.DataRepository.Enumerate();

Response

Returns a paginated list of data repository objects:

{
    "Success": true,
    "Timestamp": {
        "Start": "2024-10-21T02:36:37.677751Z",
        "TotalMs": 23.58,
        "Messages": {}
    },
    "MaxResults": 10,
    "IterationsRequired": 1,
    "EndOfResults": true,
    "RecordsRemaining": 0,
    "Objects": [
        {
            "GUID": "4ae4294d-d135-4b21-a75d-3df5e1c84d2b",
            "TenantGUID": "default",
            "OwnerGUID": "default",
            "Name": "Local filesystem",
            "RepositoryType": "File",
            "IncludeSubdirectories": true,
            "DiskDirectory": "./files/",
            "CreatedUtc": "2024-10-22T13:57:54.000000Z"
        },
        {
            "GUID": "8fface0d-9514-4cf6-b260-827dc1c180f4",
            "TenantGUID": "default",
            "OwnerGUID": "default",
            "Name": "CIFS file server",
            "RepositoryType": "CIFS",
            "IncludeSubdirectories": true,
            "CifsHostname": "windowshost",
            "CifsUsername": "username@domain.com",
            "CifsPassword": "***word",
            "CifsShareName": "share3",
            "CreatedUtc": "2024-10-22T13:58:43.000000Z"
        }
    ],
    "ContinuationToken": null
}
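
A minimal sketch of consuming this envelope over plain HTTP follows (the hostname, tenant GUID, and token are placeholders; only fields shown in the response above are referenced):

import requests

# Placeholder host, tenant GUID, and token; the v2.0 endpoint returns the envelope shown above.
url = "http://view.homedns.org:8000/v2.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/"
result = requests.get(url, headers={"Authorization": "Bearer default"}).json()

if result.get("Success"):
    for repo in result.get("Objects", []):
        print(repo["GUID"], repo["RepositoryType"], repo["Name"])
    if not result.get("EndOfResults"):
        # Additional pages remain; RecordsRemaining indicates how many objects are left.
        print("Records remaining:", result.get("RecordsRemaining"))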

Read Data Repository

Retrieves data repository configuration and metadata by GUID using GET /v1.0/tenants/[tenant-guid]/datarepositories/[datarepository-guid]. Returns the complete repository configuration including connection details and authentication settings. If the repository doesn't exist, a 404 error is returned.

Request Parameters

  • datarepository-guid (string, Path, Required): GUID of the data repository object to retrieve
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const readDataRepository = async () => {
  try {
    const response = await api.DataRepository.read(
      "<datarepository-guid>"
    );
    console.log(response, "Data repository fetched successfully");
  } catch (err) {
    console.log("Error fetching Data repository:", err);
  }
};

readDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def readDataRepository():
    dataRepository = crawler.DataRepository.retrieve("<datarepository-guid>")
    print(dataRepository)

readDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
DataRepository response = await sdk.DataRepository.Retrieve(Guid.Parse("<datarepository-guid>"));

Response

Returns the complete data repository configuration:

{
    "GUID": "8fface0d-9514-4cf6-b260-827dc1c180f4",
    "TenantGUID": "default",
    "OwnerGUID": "default",
    "Name": "CIFS file server",
    "RepositoryType": "CIFS",
    "IncludeSubdirectories": true,
    "CifsHostname": "windowshost",
    "CifsUsername": "[email protected]",
    "CifsPassword": "***word",
    "CifsShareName": "share3",
    "CreatedUtc": "2024-10-22T13:58:43.000000Z"
}

Note: the HEAD method can be used as an alternative to GET to simply check for the existence of the object. HEAD requests return a 200 OK if the object exists, or a 404 Not Found if it does not. No response body is returned by a HEAD request.

Read All Data Repositories

Retrieves all data repository objects in the tenant using GET /v1.0/tenants/[tenant-guid]/datarepositories/. Returns an array of data repository objects with complete configuration details for all repositories in the tenant.

Request Parameters

No additional parameters required beyond authentication.

curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const readAllDataRepositories = async () => {
  try {
    const response = await api.DataRepository.readAll();
    console.log(response, "All data repositories fetched successfully");
  } catch (err) {
    console.log("Error fetching All data repositories:", err);
  }
};

readAllDataRepositories();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def readAllDataRepositories():
    dataRepositories = crawler.DataRepository.retrieve_all()
    print(dataRepositories)

readAllDataRepositories()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
List<DataRepository> response = await sdk.DataRepository.RetrieveMany();

Response

Returns an array of all data repository objects:

[
    {
        "GUID": "4ae4294d-d135-4b21-a75d-3df5e1c84d2b",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "Local filesystem",
        "RepositoryType": "File",
        "IncludeSubdirectories": true,
        "DiskDirectory": "./files/",
        "CreatedUtc": "2024-10-22T13:57:54.000000Z"
    },
    {
        "GUID": "8fface0d-9514-4cf6-b260-827dc1c180f4",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "CIFS file server",
        "RepositoryType": "CIFS",
        "IncludeSubdirectories": true,
        "CifsHostname": "windowshost",
        "CifsUsername": "[email protected]",
        "CifsPassword": "***word",
        "CifsShareName": "share3",
        "CreatedUtc": "2024-10-22T13:58:43.000000Z"
    },
    {
        "GUID": "e37d0a94-e7e3-447c-9eab-489d1baaad49",
        "TenantGUID": "default",
        "OwnerGUID": "default",
        "Name": "S3 compatible object store",
        "RepositoryType": "AmazonS3",
        "IncludeSubdirectories": true,
        "S3EndpointUrl": "http://s3storage.company.com/",
        "S3BaseUrl": "http://s3storage.company.com/{bucket}/{key}/",
        "S3AccessKey": "***key",
        "S3SecretKey": "***key",
        "S3BucketName": "bucket1",
        "S3Region": "us-west-1",
        "CreatedUtc": "2024-10-22T14:02:14.000000Z"
    }
]

Delete Data Repository

Deletes a data repository object by GUID using DELETE /v1.0/tenants/[tenant-guid]/datarepositories/[datarepository-guid]. This operation permanently removes the data repository configuration from the system. Use with caution as this action cannot be undone.

Important Note: Ensure no active crawl plans are using this repository before deletion, as this will break crawl plan execution.

Request Parameters

  • datarepository-guid (string, Path, Required): GUID of the data repository object to delete
curl --location --request DELETE 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const deleteDataRepository = async () => {
  try {
    const response = await api.DataRepository.delete(
      "<datarepository-guid>"
    );
    console.log(response, "Data repository deleted successfully");
  } catch (err) {
    console.log("Error deleting Data repository:", err);
  }
};

deleteDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def deleteDataRepository():
    dataRepository = crawler.DataRepository.delete("<datarepository-guid>")
    print(dataRepository)

deleteDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
bool deleted = await sdk.DataRepository.Delete(Guid.Parse("<datarepository-guid>"));

Response

Returns 204 No Content on successful deletion. No response body is returned.


Check Data Repository Existence

Verifies if a data repository object exists without retrieving its configuration using HEAD /v1.0/tenants/[tenant-guid]/datarepositories/[datarepository-guid]. This is an efficient way to check repository presence before performing operations.

Request Parameters

  • datarepository-guid (string, Path, Required): GUID of the data repository object to check
curl --location --head 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/datarepositories/00000000-0000-0000-0000-000000000000' \
--header 'Authorization: ••••••'
import { ViewCrawlerSdk } from "view-sdk";

const api = new ViewCrawlerSdk(
  "http://localhost:8000/", //endpoint
  "default", //tenant Id
  "default", //access key
);

const existsDataRepository = async () => {
  try {
    const response = await api.DataRepository.exists(
      "<datarepository-guid>"
    );
    console.log(response, "Data repository exists");
  } catch (err) {
    console.log("Error checking Data repository:", err);
  }
};

existsDataRepository();
import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost", 
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def existsDataRepository():
    dataRepository = crawler.DataRepository.exists("<datarepository-guid>")
    print(dataRepository)

existsDataRepository()
using View.Sdk;
using View.Crawler;

ViewCrawlerSdk sdk = new ViewCrawlerSdk(Guid.Parse("00000000-0000-0000-0000-000000000000"), 
                                        "default", 
                                        "http://view.homedns.org:8000/");
bool exists = await sdk.DataRepository.Exists(Guid.Parse("<datarepository-guid>"));

Response

  • 200 OK: Data repository exists
  • 404 Not Found: Data repository does not exist
  • No response body: Only HTTP status code is returned

Note: HEAD requests do not return a response body, only the HTTP status code indicating whether the data repository exists.
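
A minimal sketch that combines the existence check with a follow-up delete, using the Python SDK calls shown in this guide (the GUID is a placeholder, and treating the exists result as a boolean follows the C# example above):

import view_sdk
from view_sdk import crawler
from view_sdk.sdk_configuration import Service

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost",
    tenant_guid="default",
    service_ports={Service.CRAWLER: 8000},
)

def deleteIfExists(guid):
    # Check presence first so a missing repository does not surface as a 404 from delete.
    if crawler.DataRepository.exists(guid):
        crawler.DataRepository.delete(guid)
        print("Deleted data repository:", guid)
    else:
        print("Data repository not found:", guid)

deleteIfExists("<datarepository-guid>")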

Best Practices

When managing data repositories in the View platform, consider the following recommendations for optimal data access and security:

  • Security Configuration: Use secure authentication methods and protect credentials for all repository types
  • Network Access: Ensure proper network connectivity and firewall configuration for remote repositories
  • Repository Types: Choose appropriate repository types based on your data storage infrastructure and access requirements
  • Directory Structure: Configure subdirectory crawling based on your data organization and processing needs
  • Performance Optimization: Consider network latency and bandwidth when configuring remote repositories

Next Steps

After successfully configuring data repositories, you can:

  • Crawl Plans: Create crawl plans that reference your configured repositories for automated data ingestion
  • Crawl Schedules: Set up crawl schedules to define when and how frequently repositories should be crawled
  • Crawl Filters: Configure crawl filters to optimize content discovery and processing for your repositories
  • Crawl Operations: Monitor repository crawling operations and track processing performance
  • Integration: Integrate data repositories with other View platform services for comprehensive data processing workflows