Type Detection

A guide to type detection in the View Processing platform, which identifies a document's format automatically from its content.

Overview

Type detection analyzes submitted content to determine its document type, MIME type, and file extension, so that each document can be routed to the processing workflow suited to its format.

Type detection is accessible via the View Processing API at [http|https]://[hostname]:[port]/[apiversion]/tenants/[tenantguid]/processing/typedetector.
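As a minimal sketch of how the URL template above is filled in, the helper below assembles the endpoint from its parts. The function name `type_detector_url` and its defaults are hypothetical and not part of any View SDK; the path follows the template as written in this guide.

```python
from urllib.parse import quote


def type_detector_url(hostname: str, port: int, tenant_guid: str,
                      scheme: str = "http", api_version: str = "v1.0") -> str:
    """Fill in the URL template for the type detection endpoint."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    # Percent-encode the tenant GUID so it is safe to use as a path segment.
    tenant = quote(tenant_guid, safe="")
    return (f"{scheme}://{hostname}:{port}/{api_version}"
            f"/tenants/{tenant}/processing/typedetector")


print(type_detector_url("localhost", 8000,
                        "00000000-0000-0000-0000-000000000000"))
```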

API Endpoints

  • POST /v1.0/tenants/[tenant-guid]/processing/typedetector - Detect document type and format

Important Notes

  • CSV Detection: CSV detection currently relies on a hint (the content-type header), because an irregular CSV file could otherwise be detected as plain text
  • Content Analysis: Type detection analyzes actual content rather than file extensions for accurate format identification
  • Processing Prerequisites: Type detection is typically performed before other processing operations to ensure appropriate handling
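The CSV note above could be applied as in the following sketch, which prepares a request carrying a `text/csv` content-type hint without sending it. It assumes the hint is read from the request's Content-Type header; the function name is hypothetical, and the Authorization value is passed through verbatim because the expected scheme is not specified here.

```python
import urllib.request


def build_csv_detection_request(url: str, csv_text: str,
                                authorization: str) -> urllib.request.Request:
    """Prepare (but do not send) a type detection request for CSV content."""
    headers = {
        # Assumption: the CSV hint is carried by the Content-Type header.
        "Content-Type": "text/csv",
        # Passed through as-is; the required scheme depends on the deployment.
        "Authorization": authorization,
    }
    return urllib.request.Request(
        url, data=csv_text.encode("utf-8"), headers=headers, method="POST"
    )


request = build_csv_detection_request(
    "http://localhost:8000/v1.0/tenants/"
    "00000000-0000-0000-0000-000000000000/processing/typedetector",
    "id,name\n1,Alice\n2,Bob\n",
    "<access-token>",
)
print(request.get_method(), request.get_header("Content-type"))
```

Sending is left to the caller, for example with `urllib.request.urlopen(request)`.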

Detect Document Type

Send the content to POST /v1.0/tenants/[tenant-guid]/processing/typedetector. The service analyzes the data and returns its MIME type, file extension, and document type.

Request Parameters

Required Parameters

  • Data (object/string, Body, Required): Content data to analyze for type detection (can be JSON object, string, or other content format)

Example request (cURL):

curl --location 'http://view.homedns.org:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/typedetection' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data '{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}'

JavaScript:

import { ViewProcessorSdk } from "view-sdk";

const api = new ViewProcessorSdk(
  "http://localhost:8000/", //endpoint
  "<tenant-guid>", //tenant Id
  "default" //access key
);

const typeDetection = async () => {
  try {
    const response = await api.processSdk.typeDetection({
      menu: {
        id: "file",
        value: "File",
        popup: {
          menuitem: [
            { value: "New", onclick: "CreateNewDoc()" },
            { value: "Open", onclick: "OpenDoc()" },
            { value: "Close", onclick: "CloseDoc()" },
          ],
        },
      },
    });
    console.log(response);
  } catch (error) {
    console.error("Error type detection:", error);
  }
};

typeDetection();

Python:

import view_sdk
from view_sdk import processor

sdk = view_sdk.configure(
    access_key="default",
    base_url="localhost",
    tenant_guid="<tenant-guid>",
)

def typeDetection():
    result = processor.TypeDetector.type_detection(
        data={
            "menu": {
                "id": "file",
                "value": "File",
                "popup": {
                    "menuitem": [
                        {"value": "New", "onclick": "CreateNewDoc()"},
                        {"value": "Open", "onclick": "OpenDoc()"},
                        {"value": "Close", "onclick": "CloseDoc()"}
                    ]
                }
            }
        }
    )
    print(result)
    
typeDetection()

C#:

using View.Sdk;
using View.Sdk.Processor;

ViewProcessorSdk sdk = new ViewProcessorSdk(Guid.Parse("<tenant-guid>"), "default", "http://localhost:8000/");

// JSON payload to analyze, written as a verbatim string literal
string req = @"{
  ""menu"": {
    ""id"": ""file"",
    ""value"": ""File"",
    ""popup"": {
      ""menuitem"": [
        {""value"": ""New"", ""onclick"": ""CreateNewDoc()""},
        {""value"": ""Open"", ""onclick"": ""OpenDoc()""},
        {""value"": ""Close"", ""onclick"": ""CloseDoc()""}
      ]
    }
  }
}";

TypeResult response = await sdk.TypeDetector.DetectType(req);

Response

Returns type detection results with MIME type, file extension, and document type information. The response structure is consistent across all input data types.

{
  "MimeType": "application/json",
  "Extension": "json",
  "Type": "Json"
}
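A client might map this response onto a small typed structure, as sketched below. The Python class `TypeResult` and its `from_json` helper are illustrative only and are not taken from the View SDK; the field names come from the sample response above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeResult:
    """Illustrative container for the three fields in the response."""
    mime_type: str
    extension: str
    type: str

    @classmethod
    def from_json(cls, text: str) -> "TypeResult":
        body = json.loads(text)
        # A KeyError here means one of the three documented fields is missing.
        return cls(
            mime_type=body["MimeType"],
            extension=body["Extension"],
            type=body["Type"],
        )


sample = '{"MimeType": "application/json", "Extension": "json", "Type": "Json"}'
result = TypeResult.from_json(sample)
print(result.mime_type, result.extension, result.type)
```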

Best Practices

When using type detection in the View Processing platform, consider the following recommendations:

  • Content Analysis: Provide sufficient content data for accurate type detection, as the system analyzes actual content rather than file extensions
  • CSV Handling: Use content-type headers or explicit hints for CSV documents to ensure accurate detection of irregular CSV formats
  • Processing Workflow: Perform type detection before other processing operations to ensure appropriate handling for different document formats
  • Content Validation: Validate detected types against expected formats to ensure processing accuracy and reliability
  • Performance Optimization: Monitor type detection performance when processing documents at large scale
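The content validation recommendation above could look like the following sketch, which checks the `Type` field of a detection response against a caller-supplied list of expected types. The function name is hypothetical, and `"Json"` is the only type value that appears in this guide; any other value in the list would be an assumption.

```python
from typing import Iterable


def validate_detected_type(result: dict, expected_types: Iterable[str]) -> str:
    """Return the detected type, or raise if it is missing or unexpected."""
    detected = result.get("Type")
    if not detected:
        raise ValueError("response has no 'Type' field")
    # Compare case-insensitively so "Json" and "JSON" are treated alike.
    allowed = {t.lower() for t in expected_types}
    if detected.lower() not in allowed:
        raise ValueError(f"unexpected document type: {detected!r}")
    return detected


response = {"MimeType": "application/json", "Extension": "json", "Type": "Json"}
print(validate_detected_type(response, ["Json"]))
```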

Next Steps

After successfully detecting document types, you can:

  • Semantic Cell Extraction: Extract semantic cells from detected documents for enhanced content understanding and analysis
  • UDR Generation: Generate UDR metadata for detected documents to enable comprehensive search capabilities
  • Processing Pipeline: Integrate type detection into a processing pipeline to automate document handling
  • Content Analysis: Use the detected document type to choose an appropriate content analysis and processing strategy
  • Search Optimization: Use document type information to improve search and content discovery
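One way to act on a detected type in a pipeline is a dispatch table keyed on the `Type` value, sketched below. The handler functions are hypothetical placeholders and do not call any View API; only the `"Json"` key is taken from the sample response in this guide.

```python
from typing import Callable, Dict


def handle_json(content: str) -> str:
    # Placeholder for a JSON-specific processing step.
    return "parsed as JSON"


def handle_unknown(content: str) -> str:
    # Fallback for any type without a registered handler.
    return "held for manual review"


# Keys are "Type" values from the detection response.
HANDLERS: Dict[str, Callable[[str], str]] = {"Json": handle_json}


def route(detection: dict, content: str) -> str:
    """Pick a handler from the detected type, falling back to a default."""
    handler = HANDLERS.get(detection.get("Type", ""), handle_unknown)
    return handler(content)


print(route({"Type": "Json"}, '{"menu": {}}'))
```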