Guide to managing cleanup pipeline operations in the View Processing platform, which remove processed data and release the resources associated with it.
Overview
The cleanup pipeline removes processed data and its associated resources within the View Processing platform: processed objects, metadata, embeddings, and related records held in storage systems, vector repositories, and graph databases.
Cleanup operations are accessible via the View Processing API at [http|https]://[hostname]:[port]/[apiversion]/tenants/[tenantguid]/processing/cleanup and support both storage-based and crawler-based cleanup workflows.
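The URL template can be filled in programmatically. This is a minimal sketch: the helper name `cleanup_url` is hypothetical, and it defaults to `v1.0` as the API version because that is the value used in the endpoint list and examples on this page. It only builds the string and does not contact the server.

```python
from urllib.parse import quote


def cleanup_url(hostname: str, port: int, tenant_guid: str,
                api_version: str = "v1.0", use_https: bool = False) -> str:
    """Fill in the documented cleanup URL template (hypothetical helper)."""
    scheme = "https" if use_https else "http"
    # Percent-encode the tenant GUID so it is always a valid single path segment.
    tenant = quote(tenant_guid, safe="")
    return f"{scheme}://{hostname}:{port}/{api_version}/tenants/{tenant}/processing/cleanup"


print(cleanup_url("view.homedns.org", 8000, "00000000-0000-0000-0000-000000000000"))
# http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/cleanup
```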
API Endpoints
- POST /v1.0/tenants/[tenant-guid]/processing/cleanup: Execute cleanup pipeline operations for storage or crawler data
Cleanup Pipeline (Storage)
Executes cleanup operations for storage-based data using POST /v1.0/tenants/[tenant-guid]/processing/cleanup. Removes processed objects, metadata, embeddings, and associated resources from storage systems, vector repositories, and graph databases.
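The endpoint takes a JSON body plus the Content-Type and Authorization headers used in the cURL example on this page. The sketch below prepares such a request with the Python standard library without sending it. The helper name is hypothetical, and the credential format for the Authorization header is an assumption (the example on this page masks it), so the value is passed through unchanged.

```python
import json
import urllib.request


def build_cleanup_request(url: str, payload: dict, authorization: str) -> urllib.request.Request:
    """Prepare, but do not send, a POST to the cleanup endpoint (hypothetical helper).

    `authorization` is copied verbatim into the Authorization header; the
    credential format your deployment expects is an assumption here.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": authorization,
        },
    )


# Trimmed body for illustration; a real storage cleanup needs every required field.
req = build_cleanup_request(
    "http://localhost:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/cleanup",
    {"Async": True, "Object": {"Key": "hello1.txt"}},
    "<access-key>",
)
print(req.get_method())  # POST
# Sending it: urllib.request.urlopen(req) performs the call; json.load() on the response reads the JSON result.
```

Note that `urllib` normalizes header names with `str.capitalize()`, so the content type is stored under `Content-type` when read back with `req.get_header`.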
Request Parameters
Required Parameters
- Async (boolean, Body, Required): Whether to execute the cleanup operation asynchronously
- Tenant (object, Body, Required): Tenant metadata for the cleanup operation
- Collection (object, Body, Required): Collection metadata for the cleanup operation
- Bucket (object, Body, Required): Bucket metadata for the cleanup operation
- Pool (object, Body, Required): Storage pool metadata for the cleanup operation
- Object (object, Body, Required): Object metadata for the cleanup operation
- MetadataRule (object, Body, Required): Metadata rule configuration for cleanup
- EmbeddingsRule (object, Body, Required): Embeddings rule configuration for cleanup
- VectorRepository (object, Body, Required): Vector repository configuration for cleanup
- GraphRepository (object, Body, Required): Graph repository configuration for cleanup
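Because every field in this list is required, a client can check a body locally before sending it. A minimal sketch with a hypothetical helper name: it checks only presence and the documented types (boolean for Async, object for everything else), not the contents of each object.

```python
# Top-level body fields documented as required for storage-based cleanup.
STORAGE_REQUIRED_FIELDS = (
    "Async", "Tenant", "Collection", "Bucket", "Pool", "Object",
    "MetadataRule", "EmbeddingsRule", "VectorRepository", "GraphRepository",
)


def validate_storage_payload(payload: dict) -> list:
    """Return the problems found in a storage cleanup body (empty list if none)."""
    problems = [f"missing {name}" for name in STORAGE_REQUIRED_FIELDS
                if payload.get(name) is None]
    # Async is documented as a boolean.
    if payload.get("Async") is not None and not isinstance(payload["Async"], bool):
        problems.append("Async must be a boolean")
    # Every other required field is documented as an object (a JSON object decodes to a dict).
    for name in STORAGE_REQUIRED_FIELDS[1:]:
        value = payload.get(name)
        if value is not None and not isinstance(value, dict):
            problems.append(f"{name} must be an object")
    return problems


body = {name: {} for name in STORAGE_REQUIRED_FIELDS}
body["Async"] = True
del body["Pool"]
print(validate_storage_payload(body))  # ['missing Pool']
```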
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/cleanup' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data '{
"Async": true,
"Tenant": {
"GUID": "00000000-0000-0000-0000-000000000000",
"Name": "Default Tenant",
"Region": "us-west-1",
"S3BaseDomain": "localhost",
"DefaultPoolGUID": "d00000000-0000-0000-0000-000000000000fault",
"Active": true
},
"Collection": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My first collection",
"AllowOverwrites": true,
"AdditionalData": "Created by setup"
},
"Bucket": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"PoolGUID": "00000000-0000-0000-0000-000000000000",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Name": "example-data-bucket",
"RegionString": "us-west-1",
"Versioning": true,
"MaxMultipartUploadSeconds": 604800
},
"Pool": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"Name": "default",
"Provider": "Disk",
"WriteMode": "GUID",
"UseSsl": false,
"DiskDirectory": "./disk/",
"Compress": "None",
"EnableReadCaching": false
},
"Object": {
"GUID": "00000000-0000-0000-0000-000000000000",
"ParentGUID": null,
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"TenantName": "My default tenant",
"NodeGUID": null,
"PoolGUID": "00000000-0000-0000-0000-000000000000",
"BucketGUID": "00000000-0000-0000-0000-000000000000",
"BucketName": "data",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Key": "hello1.txt",
"Version": "1",
"ContentType": "text/plain",
"DocumentType": "Text",
"ContentLength": 13,
"Data": "VGhpcyBpcyBhIHNhbXBsZSBkb2N1bWVudCB3aXRoIGp1c3QgYSBoYW5kZnVsIG9mIHdvcmRzIHRoYXQgd2lsbCBiZSBwcm9jZXNzZWQgYnkgVmlldw=="
},
"MetadataRule": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"BucketGUID": "00000000-0000-0000-0000-000000000000",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Name": "example-metadata-rule",
"ContentType": "*",
"MaxContentLength": 16777216,
"DataFlowEndpoint": "http://localhost:8501/processor",
"TypeDetectorEndpoint": "http://localhost:8501/processor/typedetector",
"SemanticCellEndpoint": "http://localhost:8341/",
"MaxChunkContentLength": 512,
"ShiftSize": 448,
"UdrEndpoint": "http://localhost:8321/",
"TopTerms": 25,
"CaseInsensitive": true,
"IncludeFlattened": true,
"DataCatalogEndpoint": "http://localhost:8201/",
"DataCatalogType": "Lexi",
"DataCatalogCollection": "default",
"GraphRepositoryGUID": "00000000-0000-0000-0000-000000000000"
},
"EmbeddingsRule": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"BucketGUID": "00000000-0000-0000-0000-000000000000",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My storage server embeddings rule",
"ContentType": "*",
"GraphRepositoryGUID": "00000000-0000-0000-0000-000000000000",
"VectorRepositoryGUID": "00000000-0000-0000-0000-000000000000",
"DataFlowEndpoint": "http://localhost:8501/processor",
"EmbeddingsGenerator": "LCProxy",
"GeneratorUrl": "http://localhost:8301/",
"GeneratorApiKey": "",
"VectorStoreUrl": "http://localhost:8311/",
"MaxContentLength": 16777216
},
"VectorRepository": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My vector repository",
"RepositoryType": "Pgvector",
"Model": "all-MiniLM-L6-v2",
"Dimensionality": 384,
"DatabaseHostname": "localhost",
"DatabaseName": "vectordb",
"DatabaseTable": "minilm",
"DatabasePort": 5432,
"DatabaseUser": "postgres",
"DatabasePassword": "password"
},
"GraphRepository": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My LiteGraph instance",
"RepositoryType": "LiteGraph",
"EndpointUrl": "http://localhost:8701/",
"ApiKey": "default",
"GraphIdentifier": "00000000-0000-0000-0000-000000000000"
}
}
'
import { ViewProcessorSdk } from "view-sdk";
const api = new ViewProcessorSdk(
"http://localhost:8000/", //endpoint
"<tenant-guid>", //tenant Id
"default" //access key
);
const cleanupPipeline = async () => {
try {
const response = await api.processSdk.cleanupPipeline({
Async: true,
Tenant: {
GUID: "<tenant-guid>",
Name: "Default Tenant",
Region: "us-west-1",
S3BaseDomain: "localhost",
DefaultPoolGUID: "<pool-guid>",
Active: true,
},
Collection: {
GUID: "<collection-guid>",
TenantGUID: "<tenant-guid>",
Name: "My first collection",
AllowOverwrites: true,
AdditionalData: "Created by setup",
},
Bucket: {
GUID: "<bucket-guid>",
TenantGUID: "<tenant-guid>",
PoolGUID: "<pool-guid>",
OwnerGUID: "<owner-guid>",
Name: "example-data-bucket",
RegionString: "us-west-1",
Versioning: true,
MaxMultipartUploadSeconds: 604800,
},
Pool: {
GUID: "<pool-guid>",
TenantGUID: "<tenant-guid>",
Name: "default",
Provider: "Disk",
WriteMode: "GUID",
UseSsl: false,
DiskDirectory: "./disk/",
Compress: "None",
EnableReadCaching: false,
},
Object: {
GUID: "<object-guid>",
ParentGUID: null,
TenantGUID: "<tenant-guid>",
TenantName: "My default tenant",
NodeGUID: null,
PoolGUID: "<pool-guid>",
BucketGUID: "<bucket-guid>",
BucketName: "data",
OwnerGUID: "<owner-guid>",
Key: "hello1.txt",
Version: "1",
ContentType: "text/plain",
DocumentType: "Text",
ContentLength: 85,
Data: "VGhpcyBpcyBhIHNhbXBsZSBkb2N1bWVudCB3aXRoIGp1c3QgYSBoYW5kZnVsIG9mIHdvcmRzIHRoYXQgd2lsbCBiZSBwcm9jZXNzZWQgYnkgVmlldw==",
},
MetadataRule: {
GUID: "<metadatarule-guid>",
TenantGUID: "<tenant-guid>",
BucketGUID: "<bucket-guid>",
OwnerGUID: "<owner-guid>",
Name: "example-metadata-rule",
ContentType: "*",
MaxContentLength: 16777216,
DataFlowEndpoint: "http://localhost:8501/processor",
TypeDetectorEndpoint: "http://localhost:8501/processor/typedetector",
SemanticCellEndpoint: "http://localhost:8341/",
MaxChunkContentLength: 512,
ShiftSize: 448,
UdrEndpoint: "http://localhost:8321/",
TopTerms: 25,
CaseInsensitive: true,
IncludeFlattened: true,
DataCatalogEndpoint: "http://localhost:8201/",
DataCatalogType: "Lexi",
DataCatalogCollection: "default",
GraphRepositoryGUID: "<graph-repository-guid>",
},
EmbeddingsRule: {
GUID: "<embeddingrule-guid>",
TenantGUID: "<tenant-guid>",
BucketGUID: "<bucket-guid>",
OwnerGUID: "<owner-guid>",
Name: "My storage server embeddings rule",
ContentType: "*",
GraphRepositoryGUID: "<graph-repository-guid>",
VectorRepositoryGUID: "<vector-repository-guid>",
DataFlowEndpoint: "http://localhost:8501/processor",
EmbeddingsGenerator: "LCProxy",
GeneratorUrl: "http://localhost:8301/",
GeneratorApiKey: "",
VectorStoreUrl: "http://localhost:8311/",
MaxContentLength: 16777216,
},
VectorRepository: {
GUID: "<vector-repository-guid>",
TenantGUID: "<tenant-guid>",
Name: "My vector repository",
RepositoryType: "Pgvector",
Model: "all-MiniLM-L6-v2",
Dimensionality: 384,
DatabaseHostname: "localhost",
DatabaseName: "vectordb",
DatabaseTable: "minilm",
DatabasePort: 5432,
DatabaseUser: "postgres",
DatabasePassword: "password",
},
GraphRepository: {
GUID: "<graph-repository-guid>",
TenantGUID: "<tenant-guid>",
Name: "My LiteGraph instance",
RepositoryType: "LiteGraph",
EndpointUrl: "http://localhost:8701/",
ApiKey: "default",
GraphIdentifier: "<graph-identifier>",
},
});
console.log(response);
} catch (err) {
console.log("Error", err);
}
};
cleanupPipeline();
import view_sdk
from view_sdk import processor
from view_sdk.sdk_configuration import Service  # import path assumed; it may differ between SDK versions
sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost",
    tenant_guid="default",
    service_ports={Service.LEXI: 8000},
)
def cleanup():
    result = processor.Cleanup.cleanup_pipeline(
        Async=True,
        Tenant={
            "GUID": "<tenant-guid>",
            "Name": "Default Tenant",
            "Region": "us-west-1",
            "S3BaseDomain": "localhost",
            "DefaultPoolGUID": "<pool-guid>",
            "Active": True
        },
        Collection={
            "GUID": "<collection-guid>",
            "TenantGUID": "<tenant-guid>",
            "Name": "My first collection",
            "AllowOverwrites": True,
            "AdditionalData": "Created by setup"
        },
        Bucket={
            "GUID": "<bucket-guid>",
            "TenantGUID": "<tenant-guid>",
            "PoolGUID": "<pool-guid>",
            "OwnerGUID": "<owner-guid>",
            "Category": "Data",
            "Name": "example-data-bucket",
            "RegionString": "us-west-1",
            "Versioning": True,
            "MaxMultipartUploadSeconds": 604800
        },
        Pool={
            "GUID": "<pool-guid>",
            "TenantGUID": "<tenant-guid>",
            "Name": "default",
            "Provider": "Disk",
            "WriteMode": "GUID",
            "UseSsl": False,
            "DiskDirectory": "./disk/",
            "Compress": "None",
            "EnableReadCaching": False
        },
        Object={
            "GUID": "<object-guid>",
            "ParentGUID": None,
            "TenantGUID": "<tenant-guid>",
            "TenantName": "My default tenant",
            "PoolGUID": "<pool-guid>",
            "BucketGUID": "<bucket-guid>",
            "BucketName": "data",
            "OwnerGUID": "<owner-guid>",
            "Key": "hello1.txt",
            "Version": "1",
            "ContentType": "text/plain",
            "DocumentType": "Text",
            "ContentLength": 85
        },
        MetadataRule={
            "GUID": "<metadatarule-guid>",
            "TenantGUID": "<tenant-guid>",
            "BucketGUID": "<bucket-guid>",
            "OwnerGUID": "<owner-guid>",
            "Name": "example-metadata-rule",
            "ContentType": "*",
            "MaxContentLength": 16777216,
            "DataFlowEndpoint": "http://localhost:8501/processor",
            "TypeDetectorEndpoint": "http://localhost:8501/processor/typedetector",
            "SemanticCellEndpoint": "http://localhost:8341/",
            "MaxChunkContentLength": 512,
            "ShiftSize": 448,
            "UdrEndpoint": "http://localhost:8321/",
            "TopTerms": 25,
            "CaseInsensitive": True,
            "IncludeFlattened": True,
            "DataCatalogEndpoint": "http://localhost:8201/",
            "DataCatalogType": "Lexi",
            "DataCatalogCollection": "default",
            "GraphRepositoryGUID": "<graph-repository-guid>",
            "TargetBucketGUID": "<target-bucket-guid>"
        },
        EmbeddingsRule={
            "GUID": "<embeddingrule-guid>",
            "TenantGUID": "<tenant-guid>",
            "BucketGUID": "<bucket-guid>",
            "OwnerGUID": "<owner-guid>",
            "Name": "My storage server embeddings rule",
            "ContentType": "*",
            "GraphRepositoryGUID": "<graph-repository-guid>",
            "VectorRepositoryGUID": "<vector-repository-guid>",
            "DataFlowEndpoint": "http://localhost:8501/processor",
            "EmbeddingsGenerator": "LCProxy",
            "GeneratorUrl": "http://localhost:8301/",
            "GeneratorApiKey": "",
            "VectorStoreUrl": "http://localhost:8311/",
            "MaxContentLength": 16777216
        },
        VectorRepository={
            "GUID": "<vector-repository-guid>",
            "Name": "My vector repository",
            "RepositoryType": "Pgvector",
            "Model": "all-MiniLM-L6-v2",
            "Dimensionality": 384,
            "DatabaseHostname": "localhost",
            "DatabaseName": "vectordb",
            "DatabaseTable": "minilm",
            "DatabasePort": 5432,
            "DatabaseUser": "postgres",
            "DatabasePassword": "password"
        },
        GraphRepository={
            "GUID": "<graph-repository-guid>",
            "TenantGUID": "<tenant-guid>",
            "Name": "My LiteGraph instance",
            "RepositoryType": "LiteGraph",
            "EndpointUrl": "http://localhost:8701/",
            "ApiKey": "default",
            "GraphIdentifier": "<graph-identifier>"
        },
    )
    print(result)
cleanup()
using View.Sdk;
using View.Sdk.Processor;
ViewProcessorSdk sdk = new ViewProcessorSdk(Guid.Parse("<tenant-guid>"),"default", "http://localhost:8000/");
TenantMetadata tenant = new TenantMetadata
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "Default Tenant",
Region = "us-west-1",
S3BaseDomain = "localhost",
DefaultPoolGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Active = true
};
Collection collection = new Collection
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "My first collection",
AllowOverwrites = true,
AdditionalData = "Created by setup"
};
StoragePool pool = new StoragePool
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "default",
Provider = "Disk",
WriteMode = "GUID",
UseSsl = false,
DiskDirectory = "./disk/",
Compress = "None",
EnableReadCaching = false
};
BucketMetadata bucket = new BucketMetadata
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
PoolGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
OwnerGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "example-data-bucket",
RegionString = "us-west-1",
Versioning = true,
MaxMultipartUploadSeconds = 604800
};
ObjectMetadata obj = new ObjectMetadata
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
ParentGUID = (Guid?)null,
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantName = "My default tenant",
NodeGUID = (Guid?)null,
PoolGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
BucketGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
BucketName = "data",
OwnerGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Key = "hello1.txt",
Version = "1",
ContentType = "text/plain",
DocumentType = "Text",
ContentLength = 85,
Data = "VGhpcyBpcyBhIHNhbXBsZSBkb2N1bWVudCB3aXRoIGp1c3QgYSBoYW5kZnVsIG9mIHdvcmRzIHRoYXQgd2lsbCBiZSBwcm9jZXNzZWQgYnkgVmlldw=="
};
MetadataRule mdRule = new MetadataRule
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
BucketGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
OwnerGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "example-metadata-rule",
ContentType = "*",
MaxContentLength = 16777216,
DataFlowEndpoint = "http://localhost:8501/processor",
TypeDetectorEndpoint = "http://localhost:8501/processor/typedetector",
SemanticCellEndpoint = "http://localhost:8341/",
MaxChunkContentLength = 512,
ShiftSize = 448,
UdrEndpoint = "http://localhost:8321/",
TopTerms = 25,
CaseInsensitive = true,
IncludeFlattened = true,
DataCatalogEndpoint = "http://localhost:8201/",
DataCatalogType = "Lexi",
DataCatalogCollection = "default",
GraphRepositoryGUID = Guid.Parse("00000000-0000-0000-0000-000000000000")
};
EmbeddingsRule embedRule = new EmbeddingsRule
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
BucketGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
OwnerGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "My storage server embeddings rule",
ContentType = "*",
GraphRepositoryGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
VectorRepositoryGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
DataFlowEndpoint = "http://localhost:8501/processor",
EmbeddingsGenerator = "LCProxy",
GeneratorUrl = "http://localhost:8301/",
GeneratorApiKey = "",
VectorStoreUrl = "http://localhost:8311/",
MaxContentLength = 16777216
};
VectorRepository vectorRepo = new VectorRepository
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "My vector repository",
RepositoryType = "Pgvector",
Model = "all-MiniLM-L6-v2",
Dimensionality = 384,
DatabaseHostname = "localhost",
DatabaseName = "vectordb",
DatabaseTable = "minilm",
DatabasePort = 5432,
DatabaseUser = "postgres",
DatabasePassword = "password"
};
GraphRepository graphRepo = new GraphRepository
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "My LiteGraph instance",
RepositoryType = "LiteGraph",
EndpointUrl = "http://localhost:8701/",
ApiKey = "default",
GraphIdentifier = Guid.Parse("00000000-0000-0000-0000-000000000000")
};
bool async = true;
CleanupResult response = await sdk.Cleanup.Process(tenant,
collection,
pool,
bucket,
obj,
mdRule,
embedRule,
vectorRepo,
graphRepo,
async);
Response
Returns cleanup operation results with execution status and timing information.
{
"GUID": "3292d8eb-642b-40f4-a2de-9b81e66de288",
"Success": true,
"Async": true,
"Timestamp": {
"Start": "2025-04-30T13:19:30.096373Z",
"TotalMs": 34.2,
"Messages": {}
}
}
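A client usually needs only a few fields from this response. The sketch below reads the fields shown in the sample into a typed record; the helper and class names are hypothetical. This page does not say whether, for an `Async: true` request, `Success` reports acceptance of the request or completion of the cleanup, so the sketch simply reports the field as returned.

```python
import json
from dataclasses import dataclass


@dataclass
class CleanupSummary:
    guid: str
    success: bool
    is_async: bool
    start: str
    total_ms: float


def summarize_cleanup_response(text: str) -> CleanupSummary:
    """Extract the fields shown in the sample cleanup response (hypothetical helper)."""
    data = json.loads(text)
    timestamp = data.get("Timestamp", {})
    return CleanupSummary(
        guid=data["GUID"],
        success=bool(data["Success"]),
        is_async=bool(data.get("Async", False)),
        start=timestamp.get("Start", ""),
        total_ms=float(timestamp.get("TotalMs", 0.0)),
    )


sample = """{"GUID": "3292d8eb-642b-40f4-a2de-9b81e66de288", "Success": true, "Async": true,
 "Timestamp": {"Start": "2025-04-30T13:19:30.096373Z", "TotalMs": 34.2, "Messages": {}}}"""
summary = summarize_cleanup_response(sample)
print(summary.success, summary.total_ms)  # True 34.2
```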
Cleanup Pipeline (Crawler)
Executes cleanup operations for crawler-based data using POST /v1.0/tenants/[tenant-guid]/processing/cleanup. Removes processed objects, metadata, embeddings, and associated resources from data repositories, vector repositories, and graph databases.
Request Parameters
Required Parameters
- Async (boolean, Body, Required): Whether to execute the cleanup operation asynchronously
- Tenant (object, Body, Required): Tenant metadata for the cleanup operation
- Collection (object, Body, Required): Collection metadata for the cleanup operation
- DataRepository (object, Body, Required): Data repository metadata for the cleanup operation
- Object (object, Body, Required): Object metadata for the cleanup operation
- MetadataRule (object, Body, Required): Metadata rule configuration for cleanup
- EmbeddingsRule (object, Body, Required): Embeddings rule configuration for cleanup
- VectorRepository (object, Body, Required): Vector repository configuration for cleanup
- GraphRepository (object, Body, Required): Graph repository configuration for cleanup
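Comparing this list with the storage variant, the crawler body replaces Bucket and Pool with a single DataRepository object; both variants use the same endpoint. A client that builds both kinds of request could tell them apart with a check like the sketch below. The helper name is hypothetical, and the rule is inferred from the two parameter lists on this page rather than from a published schema.

```python
# Body fields that distinguish the two documented cleanup variants.
STORAGE_ONLY = {"Bucket", "Pool"}
CRAWLER_ONLY = {"DataRepository"}


def cleanup_variant(payload: dict) -> str:
    """Classify a cleanup body as 'storage' or 'crawler' from its non-null top-level keys."""
    keys = {k for k, v in payload.items() if v is not None}
    if STORAGE_ONLY <= keys and not (CRAWLER_ONLY & keys):
        return "storage"
    if CRAWLER_ONLY <= keys and not (STORAGE_ONLY & keys):
        return "crawler"
    raise ValueError("body must contain either Bucket and Pool, or DataRepository, but not both")


print(cleanup_variant({"Bucket": {}, "Pool": {}}))  # storage
print(cleanup_variant({"DataRepository": {}}))  # crawler
```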
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/cleanup' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data '{
"Async": true,
"Tenant": {
"GUID": "00000000-0000-0000-0000-000000000000",
"Name": "Default Tenant",
"Region": "us-west-1",
"S3BaseDomain": "localhost",
"DefaultPoolGUID": "00000000-0000-0000-0000-000000000000",
"Active": true
},
"Collection": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My first collection",
"AllowOverwrites": true,
"AdditionalData": "Created by setup"
},
"DataRepository": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My disk data repository",
"RepositoryType": "File",
"DiskDirectory": "./disk/"
},
"Object": {
"GUID": "00000000-0000-0000-0000-000000000001",
"ParentGUID": null,
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"TenantName": "My default tenant",
"NodeGUID": null,
"PoolGUID": "00000000-0000-0000-0000-000000000000",
"BucketGUID": "00000000-0000-0000-0000-000000000000",
"BucketName": "data",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Key": "hello2.txt",
"Version": "1",
"ContentType": "text/plain",
"DocumentType": "Text",
"ContentLength": 13,
"Data": "VGhpcyBpcyBhIHNhbXBsZSBkb2N1bWVudCB3aXRoIGp1c3QgYSBoYW5kZnVsIG9mIHdvcmRzIHRoYXQgd2lsbCBiZSBwcm9jZXNzZWQgYnkgVmlldw=="
},
"MetadataRule": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"BucketGUID": "00000000-0000-0000-0000-000000000000",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Name": "example-metadata-rule",
"ContentType": "*",
"MaxContentLength": 16777216,
"DataFlowEndpoint": "http://localhost:8501/processor",
"TypeDetectorEndpoint": "http://localhost:8501/processor/typedetector",
"SemanticCellEndpoint": "http://localhost:8341/",
"MaxChunkContentLength": 512,
"ShiftSize": 448,
"UdrEndpoint": "http://localhost:8321/",
"TopTerms": 25,
"CaseInsensitive": true,
"IncludeFlattened": true,
"DataCatalogEndpoint": "http://localhost:8201/",
"DataCatalogType": "Lexi",
"DataCatalogCollection": "00000000-0000-0000-0000-000000000000",
"GraphRepositoryGUID": "00000000-0000-0000-0000-000000000000"
},
"EmbeddingsRule": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"BucketGUID": "00000000-0000-0000-0000-000000000000",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My storage server embeddings rule",
"ContentType": "*",
"GraphRepositoryGUID": "00000000-0000-0000-0000-000000000000",
"VectorRepositoryGUID": "00000000-0000-0000-0000-000000000000",
"DataFlowEndpoint": "http://localhost:8501/processor",
"EmbeddingsGenerator": "LCProxy",
"GeneratorUrl": "http://localhost:8301/",
"GeneratorApiKey": "",
"VectorStoreUrl": "http://localhost:8311/",
"MaxContentLength": 16777216
},
"VectorRepository": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My vector repository",
"RepositoryType": "Pgvector",
"Model": "all-MiniLM-L6-v2",
"Dimensionality": 384,
"DatabaseHostname": "localhost",
"DatabaseName": "vectordb",
"DatabaseTable": "minilm",
"DatabasePort": 5432,
"DatabaseUser": "postgres",
"DatabasePassword": "password"
},
"GraphRepository": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My LiteGraph instance",
"RepositoryType": "LiteGraph",
"EndpointUrl": "http://localhost:8701/",
"ApiKey": "default",
"GraphIdentifier": "00000000-0000-0000-0000-000000000000"
}
}
'
import { ViewProcessorSdk } from "view-sdk";
const api = new ViewProcessorSdk(
"http://localhost:8000/", //endpoint
"<tenant-guid>", //tenant Id
"default" //access key
);
const cleanupPipeline = async () => {
try {
const response = await api.processSdk.cleanupPipeline({
Async: true,
Tenant: {
GUID: "<tenant-guid>",
Name: "Default Tenant",
Region: "us-west-1",
S3BaseDomain: "localhost",
DefaultPoolGUID: "<pool-guid>",
Active: true,
},
Collection: {
GUID: "<collection-guid>",
TenantGUID: "<tenant-guid>",
Name: "My first collection",
AllowOverwrites: true,
AdditionalData: "Created by setup",
},
DataRepository: {
GUID: "<datarepository-guid>",
TenantGUID: "<tenant-guid>",
OwnerGUID: "<owner-guid>",
Name: "My disk data repository",
RepositoryType: "File",
DiskDirectory: "./disk/",
},
Object: {
GUID: "<object-guid>",
ParentGUID: null,
TenantGUID: "<tenant-guid>",
TenantName: "My default tenant",
NodeGUID: null,
PoolGUID: "<pool-guid>",
BucketGUID: "<bucket-guid>",
BucketName: "data",
OwnerGUID: "<owner-guid>",
Key: "hello2.txt",
Version: "1",
ContentType: "text/plain",
DocumentType: "Text",
ContentLength: 85,
Data: "VGhpcyBpcyBhIHNhbXBsZSBkb2N1bWVudCB3aXRoIGp1c3QgYSBoYW5kZnVsIG9mIHdvcmRzIHRoYXQgd2lsbCBiZSBwcm9jZXNzZWQgYnkgVmlldw==",
},
MetadataRule: {
GUID: "<metadatarule-guid>",
TenantGUID: "<tenant-guid>",
BucketGUID: "<bucket-guid>",
OwnerGUID: "<owner-guid>",
Name: "example-metadata-rule",
ContentType: "*",
MaxContentLength: 16777216,
DataFlowEndpoint: "http://localhost:8501/processor",
TypeDetectorEndpoint: "http://localhost:8501/processor/typedetector",
SemanticCellEndpoint: "http://localhost:8341/",
MaxChunkContentLength: 512,
ShiftSize: 448,
UdrEndpoint: "http://localhost:8321/",
TopTerms: 25,
CaseInsensitive: true,
IncludeFlattened: true,
DataCatalogEndpoint: "http://localhost:8201/",
DataCatalogType: "Lexi",
DataCatalogCollection: "<collection-guid>",
GraphRepositoryGUID: "<graph-repository-guid>",
},
EmbeddingsRule: {
GUID: "<embeddingrule-guid>",
TenantGUID: "<tenant-guid>",
BucketGUID: "<bucket-guid>",
OwnerGUID: "<owner-guid>",
Name: "My storage server embeddings rule",
ContentType: "*",
GraphRepositoryGUID: "<graph-repository-guid>",
VectorRepositoryGUID: "<vector-repository-guid>",
DataFlowEndpoint: "http://localhost:8501/processor",
EmbeddingsGenerator: "LCProxy",
GeneratorUrl: "http://localhost:8301/",
GeneratorApiKey: "",
VectorStoreUrl: "http://localhost:8311/",
MaxContentLength: 16777216,
},
VectorRepository: {
GUID: "<vector-repository-guid>",
TenantGUID: "<tenant-guid>",
Name: "My vector repository",
RepositoryType: "Pgvector",
Model: "all-MiniLM-L6-v2",
Dimensionality: 384,
DatabaseHostname: "localhost",
DatabaseName: "vectordb",
DatabaseTable: "minilm",
DatabasePort: 5432,
DatabaseUser: "postgres",
DatabasePassword: "password",
},
GraphRepository: {
GUID: "<graph-repository-guid>",
TenantGUID: "<tenant-guid>",
Name: "My LiteGraph instance",
RepositoryType: "LiteGraph",
EndpointUrl: "http://localhost:8701/",
ApiKey: "default",
GraphIdentifier: "<graph-identifier>",
},
});
console.log(response);
} catch (err) {
console.log("Error", err);
}
};
cleanupPipeline();
import view_sdk
from view_sdk import processor
from view_sdk.sdk_configuration import Service  # import path assumed; it may differ between SDK versions
sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost",
    tenant_guid="default",
    service_ports={Service.LEXI: 8000},
)
def cleanup():
    result = processor.Cleanup.cleanup_pipeline(
        Async=True,
        Tenant={
            "GUID": "<tenant-guid>",
            "Name": "Default Tenant",
            "Region": "us-west-1",
            "S3BaseDomain": "localhost",
            "DefaultPoolGUID": "<pool-guid>",
            "Active": True
        },
        Collection={
            "GUID": "<collection-guid>",
            "TenantGUID": "<tenant-guid>",
            "Name": "My first collection",
            "AllowOverwrites": True,
            "AdditionalData": "Created by setup"
        },
        DataRepository={
            "GUID": "<datarepository-guid>",
            "TenantGUID": "<tenant-guid>",
            "OwnerGUID": "<owner-guid>",
            "Name": "My disk data repository",
            "RepositoryType": "File",
            "DiskDirectory": "./disk/"
        },
        Object={
            "GUID": "<object-guid>",
            "ParentGUID": None,
            "TenantGUID": "<tenant-guid>",
            "TenantName": "My default tenant",
            "NodeGUID": None,
            "PoolGUID": "<pool-guid>",
            "BucketGUID": "<bucket-guid>",
            "BucketName": "data",
            "OwnerGUID": "<owner-guid>",
            "Key": "hello2.txt",
            "Version": "1",
            "ContentType": "text/plain",
            "DocumentType": "Text",
            "ContentLength": 85,
            "Data": "VGhpcyBpcyBhIHNhbXBsZSBkb2N1bWVudCB3aXRoIGp1c3QgYSBoYW5kZnVsIG9mIHdvcmRzIHRoYXQgd2lsbCBiZSBwcm9jZXNzZWQgYnkgVmlldw=="
        },
        MetadataRule={
            "GUID": "<metadatarule-guid>",
            "TenantGUID": "<tenant-guid>",
            "BucketGUID": "<bucket-guid>",
            "OwnerGUID": "<owner-guid>",
            "Name": "example-metadata-rule",
            "ContentType": "*",
            "MaxContentLength": 16777216,
            "DataFlowEndpoint": "http://localhost:8501/processor",
            "TypeDetectorEndpoint": "http://localhost:8501/processor/typedetector",
            "SemanticCellEndpoint": "http://localhost:8341/",
            "MaxChunkContentLength": 512,
            "ShiftSize": 448,
            "UdrEndpoint": "http://localhost:8321/",
            "TopTerms": 25,
            "CaseInsensitive": True,
            "IncludeFlattened": True,
            "DataCatalogEndpoint": "http://localhost:8201/",
            "DataCatalogType": "Lexi",
            "DataCatalogCollection": "<collection-guid>",
            "GraphRepositoryGUID": "<graph-repository-guid>"
        },
        EmbeddingsRule={
            "GUID": "<embeddingrule-guid>",
            "TenantGUID": "<tenant-guid>",
            "BucketGUID": "<bucket-guid>",
            "OwnerGUID": "<owner-guid>",
            "Name": "My storage server embeddings rule",
            "ContentType": "*",
            "GraphRepositoryGUID": "<graph-repository-guid>",
            "VectorRepositoryGUID": "<vector-repository-guid>",
            "DataFlowEndpoint": "http://localhost:8501/processor",
            "EmbeddingsGenerator": "LCProxy",
            "GeneratorUrl": "http://localhost:8301/",
            "GeneratorApiKey": "",
            "VectorStoreUrl": "http://localhost:8311/",
            "MaxContentLength": 16777216
        },
        VectorRepository={
            "GUID": "<vector-repository-guid>",
            "TenantGUID": "<tenant-guid>",
            "Name": "My vector repository",
            "RepositoryType": "Pgvector",
            "Model": "all-MiniLM-L6-v2",
            "Dimensionality": 384,
            "DatabaseHostname": "localhost",
            "DatabaseName": "vectordb",
            "DatabaseTable": "minilm",
            "DatabasePort": 5432,
            "DatabaseUser": "postgres",
            "DatabasePassword": "password"
        },
        GraphRepository={
            "GUID": "<graph-repository-guid>",
            "TenantGUID": "<tenant-guid>",
            "Name": "My LiteGraph instance",
            "RepositoryType": "LiteGraph",
            "EndpointUrl": "http://localhost:8701/",
            "ApiKey": "default",
            "GraphIdentifier": "<graph-identifier>"
        },
    )
    print(result)
cleanup()
using View.Sdk;
using View.Sdk.Processor;
ViewProcessorSdk sdk = new ViewProcessorSdk(Guid.Parse("<tenant-guid>"),"default", "http://localhost:8000/");
// Tenant
TenantMetadata tenant = new TenantMetadata
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "Default Tenant",
Region = "us-west-1",
S3BaseDomain = "localhost",
DefaultPoolGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Active = true
};
// Collection
Collection collection = new Collection
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "My first collection",
AllowOverwrites = true,
AdditionalData = "Created by setup"
};
DataRepository dataRepository = new DataRepository
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
OwnerGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "My disk data repository",
RepositoryType = "File",
DiskDirectory = "./disk/"
};
// Object
ObjectMetadata obj = new ObjectMetadata
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000001"),
ParentGUID = (Guid?)null,
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantName = "My default tenant",
NodeGUID = (Guid?)null,
PoolGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
BucketGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
BucketName = "data",
OwnerGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Key = "hello2.txt",
Version = "1",
ContentType = "text/plain",
DocumentType = "Text",
ContentLength = 85,
Data = "VGhpcyBpcyBhIHNhbXBsZSBkb2N1bWVudCB3aXRoIGp1c3QgYSBoYW5kZnVsIG9mIHdvcmRzIHRoYXQgd2lsbCBiZSBwcm9jZXNzZWQgYnkgVmlldw=="
};
// MetadataRule
MetadataRule mdRule = new MetadataRule
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
BucketGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
OwnerGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "example-metadata-rule",
ContentType = "*",
MaxContentLength = 16777216,
DataFlowEndpoint = "http://localhost:8501/processor",
TypeDetectorEndpoint = "http://localhost:8501/processor/typedetector",
SemanticCellEndpoint = "http://localhost:8341/",
MaxChunkContentLength = 512,
ShiftSize = 448,
UdrEndpoint = "http://localhost:8321/",
TopTerms = 25,
CaseInsensitive = true,
IncludeFlattened = true,
DataCatalogEndpoint = "http://localhost:8201/",
DataCatalogType = "Lexi",
DataCatalogCollection = "00000000-0000-0000-0000-000000000000",
GraphRepositoryGUID = Guid.Parse("00000000-0000-0000-0000-000000000000")
};
// EmbeddingsRule
EmbeddingsRule embedRule = new EmbeddingsRule
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
BucketGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
OwnerGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "My storage server embeddings rule",
ContentType = "*",
GraphRepositoryGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
VectorRepositoryGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
DataFlowEndpoint = "http://localhost:8501/processor",
EmbeddingsGenerator = "LCProxy",
GeneratorUrl = "http://localhost:8301/",
GeneratorApiKey = "",
VectorStoreUrl = "http://localhost:8311/",
MaxContentLength = 16777216
};
// VectorRepository
VectorRepository vectorRepo = new VectorRepository
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "My vector repository",
RepositoryType = "Pgvector",
Model = "all-MiniLM-L6-v2",
Dimensionality = 384,
DatabaseHostname = "localhost",
DatabaseName = "vectordb",
DatabaseTable = "minilm",
DatabasePort = 5432,
DatabaseUser = "postgres",
DatabasePassword = "password"
};
// GraphRepository
GraphRepository graphRepo = new GraphRepository
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "My LiteGraph instance",
RepositoryType = "LiteGraph",
EndpointUrl = "http://localhost:8701/",
ApiKey = "default",
GraphIdentifier = Guid.Parse("00000000-0000-0000-0000-000000000000")
};
bool async = true;
CleanupResult response = await sdk.Cleanup.Process(tenant,
collection,
dataRepository, // assumed overload: crawler cleanup passes the data repository in place of pool and bucket
obj,
mdRule,
embedRule,
vectorRepo,
graphRepo,
async);
Response
Returns cleanup operation results with execution status and timing information.
{
"GUID": "3292d8eb-642b-40f4-a2de-9b81e66de288",
"Success": true,
"Async": true,
"Timestamp": {
"Start": "2025-04-30T13:19:30.096373Z",
"TotalMs": 34.2,
"Messages": {}
}
}
Best Practices
When managing cleanup pipeline operations in the View Processing platform, consider the following recommendations for optimal data cleanup, resource management, and processing efficiency:
- Cleanup Strategy: Implement systematic cleanup strategies based on data lifecycle, retention policies, and resource utilization patterns
- Resource Management: Monitor and manage storage, vector repository, and graph database resources to prevent resource exhaustion
- Data Integrity: Ensure cleanup operations maintain data integrity and consistency across all affected systems and repositories
- Performance Optimization: Use asynchronous cleanup operations for large-scale data cleanup to optimize processing performance
- Monitoring and Logging: Implement comprehensive monitoring and logging for cleanup operations to track progress and identify issues
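As documented on this page, each cleanup request carries a single Object, so a large-scale cleanup becomes many requests. The sketch below, with a hypothetical helper name, builds those bodies from one shared template and sets Async on each, in line with the Performance Optimization recommendation; sending each body and logging each response is left to the caller.

```python
import copy


def per_object_payloads(template: dict, objects: list, run_async: bool = True) -> list:
    """Build one cleanup body per object from a shared template (hypothetical helper).

    Everything except Object and Async is reused from the template.
    """
    payloads = []
    for obj in objects:
        body = copy.deepcopy(template)  # avoid sharing nested dicts between request bodies
        body["Object"] = copy.deepcopy(obj)
        body["Async"] = run_async
        payloads.append(body)
    return payloads


template = {"Async": False, "Tenant": {"GUID": "<tenant-guid>"}, "Object": None}
bodies = per_object_payloads(template, [{"Key": "hello1.txt"}, {"Key": "hello2.txt"}])
print(len(bodies), bodies[1]["Object"]["Key"], bodies[1]["Async"])  # 2 hello2.txt True
```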
Next Steps
After successfully managing cleanup pipeline operations, you can:
- Processing Pipeline: Implement comprehensive processing pipeline operations for data ingestion and processing workflows
- Metadata Generation: Generate and manage metadata using UDR (Universal Data Representation) for enhanced search capabilities
- Semantic Processing: Extract semantic cells and generate embeddings for AI-powered content analysis and search
- Type Detection: Implement automated type detection for various document formats and content types
- Resource Optimization: Monitor and optimize resource utilization across storage, vector, and graph repositories