Comprehensive guide to generating metadata using Universal Data Representation (UDR) in the View Processing platform for enhanced search capabilities.
Overview
Universal Data Representation (UDR) provides metadata generation within the View Processing platform. UDR serves as the foundational data representation for Lexi, enabling both simple and advanced search through automated content analysis, term extraction, and schema inference.
UDR metadata generation is accessible via the View Processing API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/processing/udr and supports content analysis for various document types and formats.
API Endpoints
- POST /v1.0/tenants/[tenant-guid]/processing/udr: Generate UDR metadata for data assets
UDR Metadata Components
UDR documents contain comprehensive metadata including:
- Key Terms: List of key terms identified within the data asset and their frequencies
- Full Terms: Complete list of terms identified within the data asset
- Inferred Schema: Schema inference when content type implies structured data
- Flattened Representation: Simplified document structure for enhanced query capabilities
- Postings: Inverted index over the document including terms, frequencies, and positions
- Semantic Cells: Results from semantic cell extraction processes (appended separately)
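To make the Postings component concrete, the following is a minimal sketch of an inverted index that records each term with its count and positions. It is an illustration only, not the platform's implementation: the function name build_postings, the tokenization rule (runs of word characters), and the choice of zero-based token indexes as positions are all assumptions. The output keys mirror the Term, Count, and AbsolutePositions fields that appear in a UDR document.

```python
import re
from collections import defaultdict


def build_postings(text: str, case_insensitive: bool = False) -> list[dict]:
    """Build a small inverted index with one entry per distinct term."""
    positions = defaultdict(list)
    # Assumption: a term is a run of word characters, and a position is the
    # zero-based index of the token within the document.
    for index, token in enumerate(re.findall(r"\w+", text)):
        term = token.lower() if case_insensitive else token
        positions[term].append(index)
    # Entries come out in order of first appearance (dicts keep insertion order).
    return [
        {"Term": term, "Count": len(hits), "AbsolutePositions": hits}
        for term, hits in positions.items()
    ]


postings = build_postings("Artificial intelligence is intelligence exhibited by machines")
for entry in postings[:2]:
    print(entry)
```

Keeping positions alongside counts is what lets an index answer phrase or proximity queries rather than only "does this term occur".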
Generate UDR Metadata
Generates UDR metadata for a data asset using POST /v1.0/tenants/[tenant-guid]/processing/udr. Type detection must be run first, because the detected type determines how the data asset is processed.
Request Parameters
Required Parameters
- GUID (string, Body, Required): Unique identifier for the data asset
- Key (string, Body, Required): Key/filename of the data asset
- ContentType (string, Body, Required): MIME type of the content
- Type (string, Body, Required): Detected document type from type detection
- Data (string, Body, Required): Base64-encoded content of the data asset
- MetadataRule (object, Body, Required): Metadata rule configuration for UDR generation
The example request below also sets IncludeFlattened, CaseInsensitive, TopTerms, AdditionalData, and Metadata, which are not listed as required.
{
"GUID": "00000000-0000-0000-0000-000000000000",
"Key": "testfile.text",
"ContentType": "text/plain",
"Type": "Text",
"IncludeFlattened": true,
"CaseInsensitive": true,
"TopTerms": 10,
"AdditionalData": "The body below is simple sample text, base64 encoded, taken from https://en.wikipedia.org/wiki/Artificial_intelligence.",
"Metadata": {
"foo": "bar"
},
"MetadataRule": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"BucketGUID": "00000000-0000-0000-0000-000000000000",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My metadata rule",
"ContentType": "text/plain",
"UdrEndpoint": "http://localhost:8000/",
"DataCatalogType": "Lexi",
"DataCatalogEndpoint": "http://localhost:8000/",
"DataCatalogCollection": "00000000-0000-0000-0000-000000000000",
"TopTerms": 10,
"CaseInsensitive": true,
"IncludeFlattened": true
},
"Data": "QXJ0aWZpY2..."
}
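The request body above can also be assembled programmatically. The sketch below fills in the endpoint template and Base64-encodes raw content for the Data field using only the Python standard library. The helper names build_udr_url and build_udr_body are hypothetical and not part of any View SDK, and the shortened MetadataRule is for illustration only. Sending the request is left out so the example runs without a server.

```python
import base64
import json

# The six fields the parameter list marks as required.
REQUIRED_FIELDS = ("GUID", "Key", "ContentType", "Type", "Data", "MetadataRule")


def build_udr_url(base_url: str, tenant_guid: str) -> str:
    """Fill in the endpoint template for one tenant."""
    return f"{base_url.rstrip('/')}/v1.0/tenants/{tenant_guid}/processing/udr"


def build_udr_body(guid: str, key: str, content_type: str, doc_type: str,
                   raw: bytes, metadata_rule: dict) -> dict:
    """Assemble the request body, Base64-encoding the raw content for Data."""
    body = {
        "GUID": guid,
        "Key": key,
        "ContentType": content_type,
        "Type": doc_type,  # value reported by the earlier type detection step
        "Data": base64.b64encode(raw).decode("ascii"),
        "MetadataRule": metadata_rule,
    }
    missing = [name for name in REQUIRED_FIELDS if not body[name]]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return body


url = build_udr_url("http://localhost:8000/", "00000000-0000-0000-0000-000000000000")
body = build_udr_body(
    guid="00000000-0000-0000-0000-000000000000",
    key="testfile.text",
    content_type="text/plain",
    doc_type="Text",
    raw=b"Artificial intelligence",
    # Shortened rule for illustration; a real rule carries the full field set.
    metadata_rule={"Name": "My metadata rule", "ContentType": "text/plain"},
)
payload = json.dumps(body)  # JSON string to send as the POST body
print(url)
print(body["Data"])
```

The resulting payload can be sent with any HTTP client, using the same Content-Type and Authorization headers as the curl example.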
curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/udr' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data '{
"GUID": "00000000-0000-0000-0000-000000000000",
"Key": "testfile.text",
"ContentType": "text/plain",
"Type": "Text",
"IncludeFlattened": true,
"CaseInsensitive": true,
"TopTerms": 10,
"AdditionalData": "The body below is simple sample text, base64 encoded, taken from https://en.wikipedia.org/wiki/Artificial_intelligence.",
"Metadata": {
"foo": "bar"
},
"MetadataRule": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"BucketGUID": "00000000-0000-0000-0000-000000000000",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My metadata rule",
"ContentType": "text/plain",
"UdrEndpoint": "http://localhost:8000/",
"DataCatalogType": "Lexi",
"DataCatalogEndpoint": "http://localhost:8000/",
"DataCatalogCollection": "00000000-0000-0000-0000-000000000000",
"TopTerms": 10,
"CaseInsensitive": true,
"IncludeFlattened": true
},
"Data": "QXJ0aWZpY2..."
}'
import { ViewProcessorSdk } from "view-sdk";
const api = new ViewProcessorSdk(
"http://localhost:8000/", // endpoint
"<tenant-guid>", // tenant GUID
"default" // access token
);
const generateUDR = async () => {
try {
const response = await api.processSdk.generateUdr({
GUID: "<object-guid>",
Key: "testfile.text",
ContentType: "text/plain",
Type: "Text",
IncludeFlattened: true,
CaseInsensitive: true,
TopTerms: 10,
AdditionalData:
"The body below is simple sample text, base64 encoded, taken from https://en.wikipedia.org/wiki/Artificial_intelligence.",
Metadata: {
foo: "bar",
},
MetadataRule: {
GUID: "<metadatarule-guid>",
TenantGUID: "<tenant-guid>",
BucketGUID: "<bucket-guid>",
OwnerGUID: "<owner-guid>",
Name: "My metadata rule",
ContentType: "text/plain",
UdrEndpoint: "http://localhost:8000/",
DataCatalogType: "Lexi",
DataCatalogEndpoint: "http://localhost:8000/",
DataCatalogCollection: "<collection-guid>",
TopTerms: 10,
CaseInsensitive: true,
IncludeFlattened: true,
},
Data: "QXJ0aWZpY2lh...",
});
console.log(response);
} catch (err) {
console.log("Error", err);
}
};
generateUDR();
import view_sdk
from view_sdk import processor
# Configure the SDK once before calling the processor.
sdk = view_sdk.configure(access_key="default", base_url="localhost", tenant_guid="<tenant-guid>")
def udrGeneration():
    # Submit the data asset and its metadata rule for UDR generation.
    result = processor.UdrGenerator.generate(
        GUID="<object-guid>",
        Key="testfile.text",
        ContentType="text/plain",
        Type="Text",
        IncludeFlattened=True,
        CaseInsensitive=True,
        TopTerms=10,
        AdditionalData="The body below is simple sample text, base64 encoded, taken from https://en.wikipedia.org/wiki/Artificial_intelligence.",
        Metadata={
            "foo": "bar"
        },
        MetadataRule={
            "GUID": "<metadatarule-guid>",
            "TenantGUID": "<tenant-guid>",
            "BucketGUID": "<bucket-guid>",
            "OwnerGUID": "<owner-guid>",
            "Name": "My metadata rule",
            "ContentType": "text/plain",
            "UdrEndpoint": "http://localhost:8000/",
            "DataCatalogType": "Lexi",
            "DataCatalogEndpoint": "http://localhost:8000/",
            "DataCatalogCollection": "<collection-guid>",
            "TopTerms": 10,
            "CaseInsensitive": True,
            "IncludeFlattened": True
        },
        Data="QXJ0aWZpY2lhbCBp..."
    )
    print(result)
udrGeneration()
using System;
using System.Collections.Generic;
using View.Sdk;
using View.Sdk.Processor;
ViewProcessorSdk sdk = new ViewProcessorSdk(Guid.Parse("<tenant-guid>"), "default", "http://localhost:8000/");
UdrDocumentRequest request = new UdrDocumentRequest
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Key = "testfile.text",
ContentType = "text/plain",
Type = "Text",
IncludeFlattened = true,
CaseInsensitive = true,
TopTerms = 10,
AdditionalData = "The body below is simple sample text, base64 encoded, taken from https://en.wikipedia.org/wiki/Artificial_intelligence.",
Metadata = new Dictionary<string, object>(StringComparer.InvariantCultureIgnoreCase)
{
["foo"] = "bar"
},
MetadataRule = new MetadataRule
{
GUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
TenantGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
BucketGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
OwnerGUID = Guid.Parse("00000000-0000-0000-0000-000000000000"),
Name = "My metadata rule",
ContentType = "text/plain",
UdrEndpoint = "http://localhost:8000/",
DataCatalogType = "Lexi",
DataCatalogEndpoint = "http://localhost:8000/",
DataCatalogCollection = "00000000-0000-0000-0000-000000000000",
TopTerms = 10,
CaseInsensitive = true,
IncludeFlattened = true
},
Data = "QXJ0aWZpY2lhbCBpbnRlbG..."
};
UdrDocument response = await sdk.UdrGenerator.GenerateUdr(request);
Response
Returns a fully populated UDR document containing the extracted terms, top-term frequencies, inferred schema, postings, and semantic cells.
{
"GUID": "00000000-0000-0000-0000-000000000000",
"Success": true,
"Timestamp": {
"Start": "2025-04-30T12:54:18.561659Z",
"End": "2025-04-30T12:54:18.618885Z",
"TotalMs": 57.23,
"Messages": {}
},
"AdditionalData": "The body below is simple sample text, base64 encoded, taken from https://en.wikipedia.org/wiki/Artificial_intelligence.",
"Metadata": {
"foo": "bar"
},
"Key": "testfile.text",
"Type": "Text",
"Terms": [
"Artificial",
"intelligence",
"broadest"
],
"TopTerms": {
"intelligence": 3,
"machines": 3,
"applications": 3
},
"Schema": {
"Type": "Text",
"Schema": {},
"Metadata": {},
"Flattened": []
},
"Postings": [
{
"Term": "Artificial",
"Count": 1,
"AbsolutePositions": [
0
]
},
{
"Term": "anymore",
"Count": 1,
"AbsolutePositions": [
96
]
}
],
"SemanticCells": []
}
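Once a response like the sample above has been parsed, the fields can be read directly. The sketch below works on a trimmed copy of that sample and shows one way to check the Success flag, rank TopTerms, and look up the posting for a single term. The helper name summarize_udr is hypothetical, and the field names are taken from the sample rather than from a formal schema.

```python
# Trimmed copy of the sample response, keeping only the fields used here.
sample = {
    "Success": True,
    "Timestamp": {"TotalMs": 57.23},
    "TopTerms": {"intelligence": 3, "machines": 3, "applications": 3},
    "Postings": [
        {"Term": "Artificial", "Count": 1, "AbsolutePositions": [0]},
        {"Term": "anymore", "Count": 1, "AbsolutePositions": [96]},
    ],
}


def summarize_udr(doc: dict, term: str) -> dict:
    """Pull out timing, ranked top terms, and the posting for one term."""
    if not doc.get("Success"):
        raise RuntimeError("UDR generation did not report success")
    # Rank by descending count, breaking ties alphabetically.
    ranked = sorted(doc.get("TopTerms", {}).items(), key=lambda kv: (-kv[1], kv[0]))
    # Postings are a list, so a lookup scans for the matching Term.
    posting = next((p for p in doc.get("Postings", []) if p["Term"] == term), None)
    return {
        "TotalMs": doc["Timestamp"]["TotalMs"],
        "TopTerms": ranked,
        "Posting": posting,
    }


summary = summarize_udr(sample, "anymore")
print(summary)
```

For repeated lookups, building a dictionary keyed by Term from the Postings list avoids scanning the list each time.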
Best Practices
When generating UDR metadata in the View Processing platform, consider the following recommendations:
- Type Detection: Always perform type detection before UDR generation to ensure appropriate processing for different document formats
- Term Processing: Configure appropriate term processing settings (case sensitivity, top terms count) based on your content and search requirements
- Schema Inference: Enable schema inference and flattened representation for structured documents to enhance query capabilities
- Content Analysis: Use comprehensive content analysis settings to maximize metadata extraction and search optimization
- Performance Optimization: Monitor UDR generation performance and optimize processing parameters for large-scale content analysis
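To illustrate the Term Processing recommendation, the sketch below shows how case folding and a top-terms limit can change term counts. It is a conceptual model only: the function top_terms and its tokenization rule are assumptions and do not describe how the service itself counts terms. The parameter names echo the CaseInsensitive and TopTerms request fields.

```python
import re
from collections import Counter


def top_terms(text: str, top_n: int = 10, case_insensitive: bool = True) -> dict:
    """Count terms and keep the top_n most frequent ones."""
    # Assumption: a term is a run of word characters.
    tokens = re.findall(r"\w+", text)
    if case_insensitive:
        # Fold case so "Machines" and "MACHINES" count as the same term.
        tokens = [token.lower() for token in tokens]
    return dict(Counter(tokens).most_common(top_n))


text = "Machines learn. machines plan. Intelligence in MACHINES."
folded = top_terms(text, top_n=2, case_insensitive=True)
exact = top_terms(text, top_n=2, case_insensitive=False)
print(folded)
print(exact)
```

With case folding the three spellings of "machines" merge into a single term with a count of 3, while exact matching treats them as three separate terms with a count of 1 each.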
Next Steps
After successfully generating UDR metadata, you can:
- Semantic Cell Extraction: Extract semantic cells from processed documents for enhanced content understanding
- Embeddings Generation: Generate vector embeddings for AI-powered search and content analysis
- Processing Pipeline: Implement comprehensive processing pipeline operations for automated data processing workflows
- Search Integration: Integrate UDR metadata with Lexi search capabilities for enhanced document discovery
- Content Optimization: Optimize content processing and metadata generation based on search performance and user requirements