Comprehensive guide to document chunking in the View Processing platform for enhanced content analysis, semantic understanding, and vector embeddings generation.
Overview
Document chunking in the View Processing platform provides intelligent content segmentation for semantic cells, preparing content for vector embeddings generation and enhanced search. The chunking service processes semantic cells extracted from documents and creates appropriately sized chunks based on configurable parameters, including token limits, content length constraints, and overlap settings.
The chunking service is accessible via the View Processing API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/processing/chunking and supports content segmentation for various document types, including text, tables, lists, and structured data.
API Endpoints
- POST /v1.0/tenants/[tenant-guid]/processing/chunking - Process semantic cells and generate optimized chunks for embeddings
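For orientation, the sketch below shows what a raw HTTP call to this endpoint might look like; the localhost URL, placeholder GUID, and authorization header value are illustrative assumptions, and the full request body is documented in the sections that follow.
// Minimal sketch of a raw chunking request; URL, GUID, and credential are placeholders.
const submitChunking = async (body) => {
  const response = await fetch(
    "http://localhost:8000/v1.0/tenants/<tenant-guid>/processing/chunking",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "<access-token>", // supply whatever credential your deployment expects
      },
      // body is { EmbeddingsRule: {...}, SemanticCells: [...] } as documented below
      body: JSON.stringify(body),
    }
  );
  return response.json(); // the semantic cells, returned with populated Chunks arrays
};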
Chunking Components
The chunking process creates structured chunks containing:
- Content Segmentation: Intelligent splitting of semantic cells into appropriately sized chunks
- Token Management: Configurable token limits and overlap settings for optimal embeddings generation
- Position Tracking: Precise start/end positions and length tracking for each chunk
- Hash Generation: MD5, SHA1, and SHA256 hashes for content integrity verification
- Metadata Preservation: Maintains semantic cell metadata and relationships during chunking
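As an illustration of the hash generation and position tracking described above, the sketch below verifies a returned chunk locally using Node's crypto module. It assumes Length counts the characters of Content and that hashes are reported as uppercase hex, which matches the sample responses in this guide.
// Hedged sketch: recompute a chunk's hashes and length and compare them to the returned fields.
const crypto = require("crypto");

function verifyChunk(chunk) {
  const digest = (algo) =>
    crypto.createHash(algo).update(chunk.Content).digest("hex").toUpperCase();
  return (
    digest("md5") === chunk.MD5Hash &&
    digest("sha1") === chunk.SHA1Hash &&
    digest("sha256") === chunk.SHA256Hash &&
    chunk.Content.length === chunk.Length
  );
}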
Process Semantic Cells
Processes semantic cells and generates optimized chunks using POST /v1.0/tenants/[tenant-guid]/processing/chunking. Requires an embeddings rule configuration and semantic cells extracted from documents.
Request Parameters
Required Parameters
- EmbeddingsRule (object, Body, Required): Embeddings rule configuration containing chunking parameters and processing settings
- SemanticCells (array, Body, Required): Array of semantic cells to be processed and chunked
{
"EmbeddingsRule": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"BucketGUID": "00000000-0000-0000-0000-000000000000",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My storage server embeddings rule",
"ContentType": "*",
"GraphRepositoryGUID": "00000000-0000-0000-0000-000000000000",
"VectorRepositoryGUID": "00000000-0000-0000-0000-000000000000",
"ProcessingEndpoint": "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing",
"ProcessingAccessKey": "***ault",
"ChunkingServerUrl": "http://nginx-chunker:8000/",
"ChunkingServerApiKey": "***ault",
"MaxChunkingTasks": 4,
"MinChunkContentLength": 1,
"MaxChunkContentLength": 4096,
"MaxTokensPerChunk": 256,
"TokenOverlap": 32,
"TokenizationModel": "sentence-transformers/all-MiniLM-L6-v2",
"EmbeddingsServerUrl": "http://nginx-embeddings:8000/",
"EmbeddingsServerApiKey": "***ault",
"EmbeddingsGenerator": "LCProxy",
"EmbeddingsGeneratorUrl": "http://nginx-lcproxy:8000/",
"EmbeddingsGeneratorApiKey": "***ault",
"EmbeddingsBatchSize": 512,
"MaxEmbeddingsTasks": 32,
"MaxEmbeddingsRetries": 3,
"MaxEmbeddingsFailures": 3,
"VectorStoreUrl": "http://nginx-vector:8000/",
"VectorStoreAccessKey": "***ault",
"MaxContentLength": 16777216,
"CreatedUtc": "2025-05-12T20:50:09.462219Z"
},
"SemanticCells": [
{
"GUID": "5fb67dcf-2b80-4b24-928c-f9e659abc770",
"CellType": "Table",
"MD5Hash": "328A482C9D1C0F87B7EF5AA424B0A378",
"SHA1Hash": "A8E2B4E01E86E7BEF14A3274064C75E268694EDB",
"SHA256Hash": "03400563FEA89D3458D4304179F2E2690ACDC6E598B23F984B3A99737E9C5A26",
"Position": 0,
"Length": 0,
"Table": {
"Name": "",
"Columns": [
{
"Name": "Column1",
"Type": "String"
},
{
"Name": "Column2",
"Type": "String"
},
{
"Name": "Column3",
"Type": "String"
}
],
"Rows": [
{
"Column1": "Row names",
"Column2": "Column 1",
"Column3": "Column 2"
},
{
"Column1": "Row 1",
"Column2": "Value 1,1",
"Column3": "Value 1,2"
},
{
"Column1": "Row 2",
"Column2": "Value 1,2",
"Column3": "Value 2,2"
},
{
"Column1": "Row 3",
"Column2": "Value 1,3",
"Column3": "Value 3,2"
}
]
},
"Chunks": [],
"Children": []
}
]
}
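In the rule above, MinChunkContentLength and MaxChunkContentLength constrain each chunk's content length, while MaxTokensPerChunk and TokenOverlap control the token window: each chunk carries up to MaxTokensPerChunk tokens and repeats the last TokenOverlap tokens of the previous chunk so that adjacent chunks share context. The sketch below illustrates that sliding-window idea with naive whitespace tokenization; the platform tokenizes with the configured TokenizationModel, so this is an approximation rather than the service's actual algorithm.
// Conceptual sketch of token-window chunking with overlap (whitespace tokens, not the configured TokenizationModel).
function chunkTokens(text, maxTokensPerChunk = 256, tokenOverlap = 32) {
  const tokens = text.split(/\s+/).filter(Boolean);
  const step = maxTokensPerChunk - tokenOverlap; // assumes maxTokensPerChunk > tokenOverlap
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + maxTokensPerChunk).join(" "));
    if (start + maxTokensPerChunk >= tokens.length) break; // last window reached the end
  }
  return chunks;
}
With the values above (256 tokens per chunk, 32 tokens of overlap), consecutive windows begin 224 tokens apart, which lines up with the Start positions of 0, 224, and 448 in the sample response below.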
curl --location 'http://localhost:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/chunking' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data '{
"EmbeddingsRule": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"BucketGUID": "00000000-0000-0000-0000-000000000000",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My storage server embeddings rule",
"ContentType": "*",
"GraphRepositoryGUID": "00000000-0000-0000-0000-000000000000",
"VectorRepositoryGUID": "00000000-0000-0000-0000-000000000000",
"ProcessingEndpoint": "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing",
"ProcessingAccessKey": "***ault",
"ChunkingServerUrl": "http://nginx-chunker:8000/",
"ChunkingServerApiKey": "***ault",
"MaxChunkingTasks": 4,
"MinChunkContentLength": 1,
"MaxChunkContentLength": 4096,
"MaxTokensPerChunk": 256,
"TokenOverlap": 32,
"TokenizationModel": "sentence-transformers/all-MiniLM-L6-v2",
"EmbeddingsServerUrl": "http://nginx-embeddings:8000/",
"EmbeddingsServerApiKey": "***ault",
"EmbeddingsGenerator": "LCProxy",
"EmbeddingsGeneratorUrl": "http://nginx-lcproxy:8000/",
"EmbeddingsGeneratorApiKey": "***ault",
"EmbeddingsBatchSize": 512,
"MaxEmbeddingsTasks": 32,
"MaxEmbeddingsRetries": 3,
"MaxEmbeddingsFailures": 3,
"VectorStoreUrl": "http://nginx-vector:8000/",
"VectorStoreAccessKey": "***ault",
"MaxContentLength": 16777216,
"CreatedUtc": "2025-05-12T20:50:09.462219Z"
},
"SemanticCells": [
{
"GUID": "5fb67dcf-2b80-4b24-928c-f9e659abc770",
"CellType": "Table",
"MD5Hash": "328A482C9D1C0F87B7EF5AA424B0A378",
"SHA1Hash": "A8E2B4E01E86E7BEF14A3274064C75E268694EDB",
"SHA256Hash": "03400563FEA89D3458D4304179F2E2690ACDC6E598B23F984B3A99737E9C5A26",
"Position": 0,
"Length": 0,
"Table": {
"Name": "",
"Columns": [
{
"Name": "Column1",
"Type": "String"
},
{
"Name": "Column2",
"Type": "String"
},
{
"Name": "Column3",
"Type": "String"
}
],
"Rows": [
{
"Column1": "Row names",
"Column2": "Column 1",
"Column3": "Column 2"
},
{
"Column1": "Row 1",
"Column2": "Value 1,1",
"Column3": "Value 1,2"
},
{
"Column1": "Row 2",
"Column2": "Value 1,2",
"Column3": "Value 2,2"
},
{
"Column1": "Row 3",
"Column2": "Value 1,3",
"Column3": "Value 3,2"
}
]
},
"Chunks": [],
"Children": []
},
{
"GUID": "f4386f25-ea7c-4785-85f1-6b4c940a15b5",
"CellType": "List",
"MD5Hash": "CECA959DE15881953B8752F0EBF349E0",
"SHA1Hash": "1CF7F4E3BF58B6D17D5A12A87643535EC526A3DD",
"SHA256Hash": "783953D669985425418A39D45B78ACFB426B6A52745E677A2072A6EF0613F9FE",
"Position": 1,
"Length": 0,
"UnorderedList": [
"Item 1",
"Item 2",
"Item 3"
],
"Chunks": [],
"Children": []
},
{
"GUID": "acc2eee6-d676-4071-84cf-9bf2c988d40e",
"CellType": "Text",
"MD5Hash": "0E8C870D0EBFA77DBD48AE497B47A60F",
"SHA1Hash": "813CC76A674EE67B9BD794552E1251C68A15C27F",
"SHA256Hash": "76F66994ECBFB99B9D22A6C1EDD3962858EFA8AA4D08B1B921948A32BAB59A4D",
"Position": 2,
"Length": 0,
"Content": "This is a sample PDF document.",
"Chunks": [],
"Children": []
},
{
"GUID": "d27836e3-1ba9-4eae-aba3-5523c6bdbe08",
"CellType": "Text",
"MD5Hash": "6CD3556DEB0DA54BCA060B4C39479839",
"SHA1Hash": "943A702D06F34599AEE1F8DA8EF9F7296031D699",
"SHA256Hash": "315F5BDB76D078C43B8AC0064E4A0164612B1FCE77C869345BFC94C75894EDD3",
"Position": 3,
"Length": 0,
"Content": "Hello, world!",
"Chunks": [],
"Children": []
}
]
}'
import { ViewProcessorSdk } from "view-sdk";

const api = new ViewProcessorSdk(
  "http://localhost:8000/", // endpoint URL
  "<tenant-guid>", // tenant GUID
  "default" // access token
);

const chunking = async () => {
  try {
    const response = await api.process.chunking({
      EmbeddingsRule: {
        GUID: "<embeddings-rule-guid>",
        TenantGUID: "<tenant-guid>",
        BucketGUID: "<bucket-guid>",
        OwnerGUID: "<owner-guid>",
        Name: "My storage server embeddings rule",
        ContentType: "*",
        GraphRepositoryGUID: "<graph-repository-guid>",
        VectorRepositoryGUID: "<vector-repository-guid>",
        ProcessingEndpoint: "http://nginx-processor:8000/v1.0/tenants/<tenant-guid>/processing",
        ProcessingAccessKey: "***ault",
        ChunkingServerUrl: "http://nginx-chunker:8000/",
        ChunkingServerApiKey: "***ault",
        MaxChunkingTasks: 4,
        MinChunkContentLength: 1,
        MaxChunkContentLength: 4096,
        MaxTokensPerChunk: 256,
        TokenOverlap: 32,
        TokenizationModel: "sentence-transformers/all-MiniLM-L6-v2",
        EmbeddingsServerUrl: "http://nginx-embeddings:8000/",
        EmbeddingsServerApiKey: "***ault",
        EmbeddingsGenerator: "LCProxy",
        EmbeddingsGeneratorUrl: "http://nginx-lcproxy:8000/",
        EmbeddingsGeneratorApiKey: "***ault",
        EmbeddingsBatchSize: 512,
        MaxEmbeddingsTasks: 32,
        MaxEmbeddingsRetries: 3,
        MaxEmbeddingsFailures: 3,
        VectorStoreUrl: "http://nginx-vector:8000/",
        VectorStoreAccessKey: "***ault",
        MaxContentLength: 16777216
      },
      SemanticCells: [
        {
          GUID: "<semantic-cell-guid>",
          CellType: "Text",
          Content: "Sample text content to be chunked",
          Chunks: [],
          Children: []
        }
      ]
    });
    console.log(response);
  } catch (error) {
    console.error('Error chunking:', error);
  }
};

chunking();
Response
[
{
"GUID": "5fb67dcf-2b80-4b24-928c-f9e659abc770",
"CellType": "Table",
"MD5Hash": "328A482C9D1C0F87B7EF5AA424B0A378",
"SHA1Hash": "A8E2B4E01E86E7BEF14A3274064C75E268694EDB",
"SHA256Hash": "03400563FEA89D3458D4304179F2E2690ACDC6E598B23F984B3A99737E9C5A26",
"Position": 0,
"Length": 346,
"Table": {
"Columns": [
{
"Name": "Column1",
"Type": "String"
},
{
"Name": "Column2",
"Type": "String"
},
{
"Name": "Column3",
"Type": "String"
}
],
"Rows": [
{
"Column1": "Row names",
"Column2": "Column 1",
"Column3": "Column 2"
},
{
"Column1": "Row 1",
"Column2": "Value 1,1",
"Column3": "Value 1,2"
},
{
"Column1": "Row 2",
"Column2": "Value 1,2",
"Column3": "Value 2,2"
},
{
"Column1": "Row 3",
"Column2": "Value 1,3",
"Column3": "Value 3,2"
}
]
},
"Chunks": [
{
"GUID": "5ee15429-1b62-468d-809e-c7141938ff89",
"MD5Hash": "24122271F12B374F3063B862A6760910",
"SHA1Hash": "B11B311C8DA67D94ECB96F3C3E007AC5F168B9D0",
"SHA256Hash": "3FF94178D1FF71B447213A1107E87A28FCF95FFF4C1C56BB5BC1E160CD831352",
"Position": 0,
"Start": 0,
"End": 88,
"Length": 88,
"Content": "| Column1 | Column2 | Column3 |\n| --- | --- | --- |\n| Row names | Column 1 | Column 2 |\n",
"Embeddings": []
},
{
"GUID": "f4af12e2-3471-4346-a52a-b5ab39fed84d",
"MD5Hash": "76683CF2E648B7AB1D93E30EE62639E1",
"SHA1Hash": "7C3C82CB6D6E638A4F38B096C372E1A0B114FC23",
"SHA256Hash": "1084DF649B4F40478B81688709412A4C36CBC592A78578EBD8767E1F06C18E38",
"Position": 1,
"Start": 88,
"End": 174,
"Length": 86,
"Content": "| Column1 | Column2 | Column3 |\n| --- | --- | --- |\n| Row 1 | Value 1,1 | Value 1,2 |\n",
"Embeddings": []
},
{
"GUID": "1e93f264-1959-41e4-bb90-cca849a7915c",
"MD5Hash": "0572CBCBA0401696B8B589A77B7EABAE",
"SHA1Hash": "1F0460E5D96B66288C81ABDAF32E8A7380F613BC",
"SHA256Hash": "C41D16155AEC23EC0CDCED1E3587B8750D786DCCC4115A61E5D37890B7E57136",
"Position": 2,
"Start": 174,
"End": 260,
"Length": 86,
"Content": "| Column1 | Column2 | Column3 |\n| --- | --- | --- |\n| Row 2 | Value 1,2 | Value 2,2 |\n",
"Embeddings": []
},
{
"GUID": "51f8b3e9-4bc7-4f0b-a365-0fb7ae76a292",
"MD5Hash": "4A3C51A2FE593342DC2AEA745614F574",
"SHA1Hash": "E7F8F855DC40762B15E90D38F3654B37FBA0484E",
"SHA256Hash": "4F7B475F316A6A269F050BCDB226DDD4191557670431BB3F8CD451AF96D09738",
"Position": 3,
"Start": 260,
"End": 346,
"Length": 86,
"Content": "| Column1 | Column2 | Column3 |\n| --- | --- | --- |\n| Row 3 | Value 1,3 | Value 3,2 |\n",
"Embeddings": []
}
],
"Children": []
},
{
"GUID": "f4386f25-ea7c-4785-85f1-6b4c940a15b5",
"CellType": "List",
"MD5Hash": "CECA959DE15881953B8752F0EBF349E0",
"SHA1Hash": "1CF7F4E3BF58B6D17D5A12A87643535EC526A3DD",
"SHA256Hash": "783953D669985425418A39D45B78ACFB426B6A52745E677A2072A6EF0613F9FE",
"Position": 1,
"Length": 19,
"UnorderedList": [
"Item 1",
"Item 2",
"Item 3"
],
"Chunks": [
{
"GUID": "287721a2-1eda-472c-a82e-ddba1be6e541",
"MD5Hash": "5B83415D8D003EE33EC1B30D86F6E249",
"SHA1Hash": "F4CB2BE5A4175F964FAF6C057E8A022D7E02C252",
"SHA256Hash": "8B5B410B7110D6BCDE1844CE42A99E99372EE24A28F9C7CE375E9DED8596281D",
"Position": 0,
"Start": 0,
"End": 19,
"Length": 19,
"Content": "\nItem 1Item 2Item 3",
"Embeddings": []
}
],
"Children": []
},
{
"GUID": "acc2eee6-d676-4071-84cf-9bf2c988d40e",
"CellType": "Text",
"MD5Hash": "0E8C870D0EBFA77DBD48AE497B47A60F",
"SHA1Hash": "813CC76A674EE67B9BD794552E1251C68A15C27F",
"SHA256Hash": "76F66994ECBFB99B9D22A6C1EDD3962858EFA8AA4D08B1B921948A32BAB59A4D",
"Position": 2,
"Length": 31,
"Content": "This is a sample PDF document.",
"Chunks": [
{
"GUID": "edf3e5b1-bf1c-4388-8452-af287f93cbc6",
"MD5Hash": "6C533E89A67F27C8E673144AF9E315F4",
"SHA1Hash": "32AA918A464CAAE5A05DC0F3A3CE18B930EB0F8E",
"SHA256Hash": "599114DB5F943C34493A4CAB0D325F31F4CCF5815F7AA25FE0AB4FA57AFBA58C",
"Position": 0,
"Start": 0,
"End": 6,
"Length": 31,
"Content": "This is a sample PDF document .",
"Embeddings": []
}
],
"Children": []
},
{
"GUID": "d27836e3-1ba9-4eae-aba3-5523c6bdbe08",
"CellType": "Text",
"MD5Hash": "6CD3556DEB0DA54BCA060B4C39479839",
"SHA1Hash": "943A702D06F34599AEE1F8DA8EF9F7296031D699",
"SHA256Hash": "315F5BDB76D078C43B8AC0064E4A0164612B1FCE77C869345BFC94C75894EDD3",
"Position": 3,
"Length": 14,
"Content": "Hello, world!",
"Chunks": [
{
"GUID": "261e5733-7468-47bc-8b24-a73a5a0279de",
"MD5Hash": "B78CE4F8ABFA36E99230881237FCA3CE",
"SHA1Hash": "C3FB9F01453F15D5BC91625E5A8B7D3A34DC9CC7",
"SHA256Hash": "216F2EB16E159AAE2EB7A6448599D22F4B336DDDB8BD63F39ACEB6E85163EDF6",
"Position": 0,
"Start": 0,
"End": 2,
"Length": 14,
"Content": "Hello , world!",
"Embeddings": []
}
],
"Children": []
},
{
"GUID": "ee46f6a0-6682-47ca-b562-fa91349f206f",
"CellType": "Text",
"MD5Hash": "B2DDFDC0D5ED913DB6DCCE6AE4ABD79E",
"SHA1Hash": "47631108BB5E6C51E2712797C743CA777AF49F3B",
"SHA256Hash": "F525F71A38AE651D4639D8CCA907985B731704217D0CE91349AD1287A379EF41",
"Position": 4,
"Length": 3290,
"Content": "Artificial intelligence (AI) refers to the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals. Such machines may be called AIs.\nHigh-profile applications of AI include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); virtual assistants (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., ChatGPT and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go). However, many AI applications are not perceived as AI: 'A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore.'\nVarious subfields of AI research are centered around particular goals and the use of particular tools. The traditional goals of AI research include learning, reasoning, knowledge representation, planning, natural language processing, perception, and support for robotics.[a] General intelligence—the ability to complete any task performed by a human on an at least equal level—is among the field's long-term goals. To reach these goals, AI researchers have adapted and integrated a wide range of techniques, including search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics.[b] AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields.\nArtificial intelligence was founded as an academic discipline in 1956, and the field went through multiple cycles of optimism throughout its history, followed by periods of disappointment and loss of funding, known as AI winters. Funding and interest vastly increased after 2012 when deep learning outperformed previous AI techniques. This growth accelerated further after 2017 with the transformer architecture, and by the early 2020s many billions of dollars were being invested in AI and the field experienced rapid ongoing progress in what has become known as the AI boom. The emergence of advanced generative AI in the midst of the AI boom and its ability to create and modify content exposed several unintended consequences and harms in the present and raised concerns about the risks of AI and its long-term effects in the future, prompting discussions about regulatory policies to ensure the safety and benefits of the technology.",
"Chunks": [
{
"GUID": "c971808b-085b-4fcf-8d33-4aa8d731d5d0",
"MD5Hash": "D051C6F784313D0854D114C7D77AC3C5",
"SHA1Hash": "95AF3CAA603F21A585D968660A90CD4A99A5ACA6",
"SHA256Hash": "49DFC276826BBC5D8940EEC8996A6A5E4AC896308F122888ABCF55ECD456E1DA",
"Position": 0,
"Start": 0,
"End": 255,
"Length": 1365,
"Content": "Artificial intelligence ( AI ) refers to the capability of computational systems to perform tasks typically associated with human intelligence , such as learning , reasoning , problem - solving , perception , and decision - making . It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals . Such machines may be called AIs . High - profile applications of AI include advanced web search engines ( e . g . , Google Search ) ; recommendation systems ( used by YouTube , Amazon , and Netflix ) ; virtual assistants ( e . g . , Google Assistant , Siri , and Alexa ) ; autonomous vehicles ( e . g . , Waymo ) ; generative and creative tools ( e . g . , ChatGPT and AI art ) ; and superhuman play and analysis in strategy games ( e . g . , chess and Go ) . However , many AI applications are not perceived as AI : ' A lot of cutting edge AI has filtered into general applications , often without being called AI because once something becomes useful enough and common enough it ' s not labeled AI anymore . ' Various subfields of AI research are centered around particular goals and the use of particular tools . The traditional goals of AI research include learning , reasoning",
"Embeddings": []
},
{
"GUID": "2aa4a98e-a392-4de6-abb2-cffc8b8ba3cb",
"MD5Hash": "8AB83A89E07A09076FA178EA40F13C8B",
"SHA1Hash": "AF9BBC0E11CF455A0B912EEBF30A7516A4B236ED",
"SHA256Hash": "8953557648F1115202178F3342B668E4084B93F42182DC48EB4EE0870AAFFEE0",
"Position": 1,
"Start": 224,
"End": 479,
"Length": 1501,
"Content": "labeled AI anymore . ' Various subfields of AI research are centered around particular goals and the use of particular tools . The traditional goals of AI research include learning , reasoning , knowledge representation , planning , natural language processing , perception , and support for robotics . [ a ] General intelligence—the ability to complete any task performed by a human on an at least equal level—is among the field ' s long - term goals . To reach these goals , AI researchers have adapted and integrated a wide range of techniques , including search and mathematical optimization , formal logic , artificial neural networks , and methods based on statistics , operations research , and economics . [ b ] AI also draws upon psychology , linguistics , philosophy , neuroscience , and other fields . Artificial intelligence was founded as an academic discipline in 1956 , and the field went through multiple cycles of optimism throughout its history , followed by periods of disappointment and loss of funding , known as AI winters . Funding and interest vastly increased after 2012 when deep learning outperformed previous AI techniques . This growth accelerated further after 2017 with the transformer architecture , and by the early 2020s many billions of dollars were being invested in AI and the field experienced rapid ongoing progress in what has become known as the AI boom . The emergence of advanced generative AI in the midst of the AI boom and its ability to create and modify",
"Embeddings": []
},
{
"GUID": "86d733c5-c577-4cf4-9134-663642c16582",
"MD5Hash": "D77E3ACBC8277B15D577CB2F83BF2EAF",
"SHA1Hash": "50C6069CF0F64299D36BE4680F38D43A8416518C",
"SHA256Hash": "04D32AAEB1A6E4EE5EF48A9CFFD005238061EB6134FE86C224E41264D3505D22",
"Position": 2,
"Start": 448,
"End": 522,
"Length": 424,
"Content": "ongoing progress in what has become known as the AI boom . The emergence of advanced generative AI in the midst of the AI boom and its ability to create and modify content exposed several unintended consequences and harms in the present and raised concerns about the risks of AI and its long - term effects in the future , prompting discussions about regulatory policies to ensure the safety and benefits of the technology .",
"Embeddings": []
}
],
"Children": []
}
]
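Each returned cell now carries a populated Chunks array and an Embeddings array that remains empty until the embeddings generation step. A small sketch for walking the response and collecting chunk contents, for example to hand to an embeddings generator, might look like the following; the recursive handling of Children assumes nested cells use the same shape.
// Hedged sketch: flatten chunk contents from the chunking response, including nested child cells.
function collectChunkContents(cells) {
  const contents = [];
  for (const cell of cells) {
    for (const chunk of cell.Chunks || []) {
      contents.push(chunk.Content);
    }
    if (Array.isArray(cell.Children) && cell.Children.length > 0) {
      contents.push(...collectChunkContents(cell.Children));
    }
  }
  return contents;
}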
Next Steps
After successfully processing semantic cells through chunking, you can:
- Embeddings Generation: Generate vector embeddings for the created chunks using the embeddings generation service
- Vector Storage: Store generated embeddings in the vector database for enhanced search capabilities
- Search Integration: Integrate chunked content with Lexi search capabilities for semantic document discovery
- Processing Pipeline: Chain chunking with the platform's other processing operations to automate end-to-end document processing workflows
- Content Optimization: Optimize chunking parameters based on search performance and user requirements
- Metadata Enhancement: Combine chunking results with UDR metadata generation for comprehensive content analysis