Comprehensive guide to document chunking in the View Processing platform for enhanced content analysis, semantic understanding, and vector embeddings generation.
Overview
Document chunking in the View Processing platform provides intelligent content segmentation for semantic cells, preparing content for vector embeddings generation and enhanced search. The chunking service processes semantic cells extracted from documents and creates appropriately sized chunks based on configurable parameters, including token limits, content length constraints, and overlap settings.
The chunking service is accessible via the View Processing API at [http|https]://[hostname]:[port]/v1.0/tenants/[tenant-guid]/processing/chunking and supports content segmentation for various document types, including text, tables, lists, and structured data.
API Endpoints
- POST /v1.0/tenants/[tenant-guid]/processing/chunking - Process semantic cells and generate optimized chunks for embeddings
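For orientation, the sketch below shows what a raw HTTP call to this endpoint might look like; the localhost URL, placeholder GUID, and authorization header value are illustrative assumptions, and the full request body is documented in the sections that follow.
// Minimal sketch of a raw chunking request; URL, GUID, and credential are placeholders.
const submitChunking = async (body) => {
  const response = await fetch(
    "http://localhost:8000/v1.0/tenants/<tenant-guid>/processing/chunking",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "<access-token>", // supply whatever credential your deployment expects
      },
      // body is { EmbeddingsRule: {...}, SemanticCells: [...] } as documented below
      body: JSON.stringify(body),
    }
  );
  return response.json(); // the semantic cells, returned with populated Chunks arrays
};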
Chunking Components
The chunking process creates structured chunks containing:
- Content Segmentation: Intelligent splitting of semantic cells into appropriately sized chunks
- Token Management: Configurable token limits and overlap settings for optimal embeddings generation
- Position Tracking: Precise start/end positions and length tracking for each chunk
- Hash Generation: MD5, SHA1, and SHA256 hashes for content integrity verification
- Metadata Preservation: Maintains semantic cell metadata and relationships during chunking
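As an illustration of the hash generation and position tracking described above, the sketch below verifies a returned chunk locally using Node's crypto module. It assumes Length counts the characters of Content and that hashes are reported as uppercase hex, which matches the sample responses in this guide.
// Hedged sketch: recompute a chunk's hashes and length and compare them to the returned fields.
const crypto = require("crypto");

function verifyChunk(chunk) {
  const digest = (algo) =>
    crypto.createHash(algo).update(chunk.Content).digest("hex").toUpperCase();
  return (
    digest("md5") === chunk.MD5Hash &&
    digest("sha1") === chunk.SHA1Hash &&
    digest("sha256") === chunk.SHA256Hash &&
    chunk.Content.length === chunk.Length
  );
}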
Process Semantic Cells
Processes semantic cells and generates optimized chunks using POST /v1.0/tenants/[tenant-guid]/processing/chunking. Requires an embeddings rule configuration and semantic cells extracted from documents.
Request Parameters
Required Parameters
- EmbeddingsRule (object, Body, Required): Embeddings rule configuration containing chunking parameters and processing settings
- SemanticCells (array, Body, Required): Array of semantic cells to be processed and chunked
{
"EmbeddingsRule": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"BucketGUID": "00000000-0000-0000-0000-000000000000",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My storage server embeddings rule",
"ContentType": "*",
"GraphRepositoryGUID": "00000000-0000-0000-0000-000000000000",
"VectorRepositoryGUID": "00000000-0000-0000-0000-000000000000",
"ProcessingEndpoint": "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing",
"ProcessingAccessKey": "***ault",
"ChunkingServerUrl": "http://nginx-chunker:8000/",
"ChunkingServerApiKey": "***ault",
"MaxChunkingTasks": 4,
"MinChunkContentLength": 1,
"MaxChunkContentLength": 4096,
"MaxTokensPerChunk": 256,
"TokenOverlap": 32,
"TokenizationModel": "sentence-transformers/all-MiniLM-L6-v2",
"EmbeddingsServerUrl": "http://nginx-embeddings:8000/",
"EmbeddingsServerApiKey": "***ault",
"EmbeddingsGenerator": "LCProxy",
"EmbeddingsGeneratorUrl": "http://nginx-lcproxy:8000/",
"EmbeddingsGeneratorApiKey": "***ault",
"EmbeddingsBatchSize": 512,
"MaxEmbeddingsTasks": 32,
"MaxEmbeddingsRetries": 3,
"MaxEmbeddingsFailures": 3,
"VectorStoreUrl": "http://nginx-vector:8000/",
"VectorStoreAccessKey": "***ault",
"MaxContentLength": 16777216,
"CreatedUtc": "2025-05-12T20:50:09.462219Z"
},
"SemanticCells": [
{
"GUID": "5fb67dcf-2b80-4b24-928c-f9e659abc770",
"CellType": "Table",
"MD5Hash": "328A482C9D1C0F87B7EF5AA424B0A378",
"SHA1Hash": "A8E2B4E01E86E7BEF14A3274064C75E268694EDB",
"SHA256Hash": "03400563FEA89D3458D4304179F2E2690ACDC6E598B23F984B3A99737E9C5A26",
"Position": 0,
"Length": 0,
"Table": {
"Name": "",
"Columns": [
{
"Name": "Column1",
"Type": "String"
},
{
"Name": "Column2",
"Type": "String"
},
{
"Name": "Column3",
"Type": "String"
}
],
"Rows": [
{
"Column1": "Row names",
"Column2": "Column 1",
"Column3": "Column 2"
},
{
"Column1": "Row 1",
"Column2": "Value 1,1",
"Column3": "Value 1,2"
},
{
"Column1": "Row 2",
"Column2": "Value 1,2",
"Column3": "Value 2,2"
},
{
"Column1": "Row 3",
"Column2": "Value 1,3",
"Column3": "Value 3,2"
}
]
},
"Chunks": [],
"Children": []
}
]
}
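In the rule above, MinChunkContentLength and MaxChunkContentLength constrain each chunk's content length, while MaxTokensPerChunk and TokenOverlap control the token window: each chunk carries up to MaxTokensPerChunk tokens and repeats the last TokenOverlap tokens of the previous chunk so that adjacent chunks share context. The sketch below illustrates that sliding-window idea with naive whitespace tokenization; the platform tokenizes with the configured TokenizationModel, so this is an approximation rather than the service's actual algorithm.
// Conceptual sketch of token-window chunking with overlap (whitespace tokens, not the configured TokenizationModel).
function chunkTokens(text, maxTokensPerChunk = 256, tokenOverlap = 32) {
  const tokens = text.split(/\s+/).filter(Boolean);
  const step = maxTokensPerChunk - tokenOverlap; // assumes maxTokensPerChunk > tokenOverlap
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + maxTokensPerChunk).join(" "));
    if (start + maxTokensPerChunk >= tokens.length) break; // last window reached the end
  }
  return chunks;
}
With the values above (256 tokens per chunk, 32 tokens of overlap), consecutive windows begin 224 tokens apart, which lines up with the Start positions of 0, 224, and 448 in the sample response below.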
curl --location 'http://localhost:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing/chunking' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data '{
"EmbeddingsRule": {
"GUID": "00000000-0000-0000-0000-000000000000",
"TenantGUID": "00000000-0000-0000-0000-000000000000",
"BucketGUID": "00000000-0000-0000-0000-000000000000",
"OwnerGUID": "00000000-0000-0000-0000-000000000000",
"Name": "My storage server embeddings rule",
"ContentType": "*",
"GraphRepositoryGUID": "00000000-0000-0000-0000-000000000000",
"VectorRepositoryGUID": "00000000-0000-0000-0000-000000000000",
"ProcessingEndpoint": "http://nginx-processor:8000/v1.0/tenants/00000000-0000-0000-0000-000000000000/processing",
"ProcessingAccessKey": "***ault",
"ChunkingServerUrl": "http://nginx-chunker:8000/",
"ChunkingServerApiKey": "***ault",
"MaxChunkingTasks": 4,
"MinChunkContentLength": 1,
"MaxChunkContentLength": 4096,
"MaxTokensPerChunk": 256,
"TokenOverlap": 32,
"TokenizationModel": "sentence-transformers/all-MiniLM-L6-v2",
"EmbeddingsServerUrl": "http://nginx-embeddings:8000/",
"EmbeddingsServerApiKey": "***ault",
"EmbeddingsGenerator": "LCProxy",
"EmbeddingsGeneratorUrl": "http://nginx-lcproxy:8000/",
"EmbeddingsGeneratorApiKey": "***ault",
"EmbeddingsBatchSize": 512,
"MaxEmbeddingsTasks": 32,
"MaxEmbeddingsRetries": 3,
"MaxEmbeddingsFailures": 3,
"VectorStoreUrl": "http://nginx-vector:8000/",
"VectorStoreAccessKey": "***ault",
"MaxContentLength": 16777216,
"CreatedUtc": "2025-05-12T20:50:09.462219Z"
},
"SemanticCells": [
{
"GUID": "5fb67dcf-2b80-4b24-928c-f9e659abc770",
"CellType": "Table",
"MD5Hash": "328A482C9D1C0F87B7EF5AA424B0A378",
"SHA1Hash": "A8E2B4E01E86E7BEF14A3274064C75E268694EDB",
"SHA256Hash": "03400563FEA89D3458D4304179F2E2690ACDC6E598B23F984B3A99737E9C5A26",
"Position": 0,
"Length": 0,
"Table": {
"Name": "",
"Columns": [
{
"Name": "Column1",
"Type": "String"
},
{
"Name": "Column2",
"Type": "String"
},
{
"Name": "Column3",
"Type": "String"
}
],
"Rows": [
{
"Column1": "Row names",
"Column2": "Column 1",
"Column3": "Column 2"
},
{
"Column1": "Row 1",
"Column2": "Value 1,1",
"Column3": "Value 1,2"
},
{
"Column1": "Row 2",
"Column2": "Value 1,2",
"Column3": "Value 2,2"
},
{
"Column1": "Row 3",
"Column2": "Value 1,3",
"Column3": "Value 3,2"
}
]
},
"Chunks": [],
"Children": []
},
{
"GUID": "f4386f25-ea7c-4785-85f1-6b4c940a15b5",
"CellType": "List",
"MD5Hash": "CECA959DE15881953B8752F0EBF349E0",
"SHA1Hash": "1CF7F4E3BF58B6D17D5A12A87643535EC526A3DD",
"SHA256Hash": "783953D669985425418A39D45B78ACFB426B6A52745E677A2072A6EF0613F9FE",
"Position": 1,
"Length": 0,
"UnorderedList": [
"Item 1",
"Item 2",
"Item 3"
],
"Chunks": [],
"Children": []
},
{
"GUID": "acc2eee6-d676-4071-84cf-9bf2c988d40e",
"CellType": "Text",
"MD5Hash": "0E8C870D0EBFA77DBD48AE497B47A60F",
"SHA1Hash": "813CC76A674EE67B9BD794552E1251C68A15C27F",
"SHA256Hash": "76F66994ECBFB99B9D22A6C1EDD3962858EFA8AA4D08B1B921948A32BAB59A4D",
"Position": 2,
"Length": 0,
"Content": "This is a sample PDF document.",
"Chunks": [],
"Children": []
},
{
"GUID": "d27836e3-1ba9-4eae-aba3-5523c6bdbe08",
"CellType": "Text",
"MD5Hash": "6CD3556DEB0DA54BCA060B4C39479839",
"SHA1Hash": "943A702D06F34599AEE1F8DA8EF9F7296031D699",
"SHA256Hash": "315F5BDB76D078C43B8AC0064E4A0164612B1FCE77C869345BFC94C75894EDD3",
"Position": 3,
"Length": 0,
"Content": "Hello, world!",
"Chunks": [],
"Children": []
}
]
}'
import { ViewProcessorSdk } from "view-sdk";

const api = new ViewProcessorSdk(
  "http://localhost:8000/", // endpoint URL
  "<tenant-guid>", // tenant GUID
  "default" // access token
);

const chunking = async () => {
  try {
    const response = await api.process.chunking({
      EmbeddingsRule: {
        GUID: "<embeddings-rule-guid>",
        TenantGUID: "<tenant-guid>",
        BucketGUID: "<bucket-guid>",
        OwnerGUID: "<owner-guid>",
        Name: "My storage server embeddings rule",
        ContentType: "*",
        GraphRepositoryGUID: "<graph-repository-guid>",
        VectorRepositoryGUID: "<vector-repository-guid>",
        ProcessingEndpoint: "http://nginx-processor:8000/v1.0/tenants/<tenant-guid>/processing",
        ProcessingAccessKey: "***ault",
        ChunkingServerUrl: "http://nginx-chunker:8000/",
        ChunkingServerApiKey: "***ault",
        MaxChunkingTasks: 4,
        MinChunkContentLength: 1,
        MaxChunkContentLength: 4096,
        MaxTokensPerChunk: 256,
        TokenOverlap: 32,
        TokenizationModel: "sentence-transformers/all-MiniLM-L6-v2",
        EmbeddingsServerUrl: "http://nginx-embeddings:8000/",
        EmbeddingsServerApiKey: "***ault",
        EmbeddingsGenerator: "LCProxy",
        EmbeddingsGeneratorUrl: "http://nginx-lcproxy:8000/",
        EmbeddingsGeneratorApiKey: "***ault",
        EmbeddingsBatchSize: 512,
        MaxEmbeddingsTasks: 32,
        MaxEmbeddingsRetries: 3,
        MaxEmbeddingsFailures: 3,
        VectorStoreUrl: "http://nginx-vector:8000/",
        VectorStoreAccessKey: "***ault",
        MaxContentLength: 16777216
      },
      SemanticCells: [
        {
          GUID: "<semantic-cell-guid>",
          CellType: "Text",
          Content: "Sample text content to be chunked",
          Chunks: [],
          Children: []
        }
      ]
    });
    console.log(response);
  } catch (error) {
    console.error('Error chunking:', error);
  }
};

chunking();
Response
[
{
"GUID": "5fb67dcf-2b80-4b24-928c-f9e659abc770",
"CellType": "Table",
"MD5Hash": "328A482C9D1C0F87B7EF5AA424B0A378",
"SHA1Hash": "A8E2B4E01E86E7BEF14A3274064C75E268694EDB",
"SHA256Hash": "03400563FEA89D3458D4304179F2E2690ACDC6E598B23F984B3A99737E9C5A26",
"Position": 0,
"Length": 346,
"Table": {
"Columns": [
{
"Name": "Column1",
"Type": "String"
},
{
"Name": "Column2",
"Type": "String"
},
{
"Name": "Column3",
"Type": "String"
}
],
"Rows": [
{
"Column1": "Row names",
"Column2": "Column 1",
"Column3": "Column 2"
},
{
"Column1": "Row 1",
"Column2": "Value 1,1",
"Column3": "Value 1,2"
},
{
"Column1": "Row 2",
"Column2": "Value 1,2",
"Column3": "Value 2,2"
},
{
"Column1": "Row 3",
"Column2": "Value 1,3",
"Column3": "Value 3,2"
}
]
},
"Chunks": [
{
"GUID": "5ee15429-1b62-468d-809e-c7141938ff89",
"MD5Hash": "24122271F12B374F3063B862A6760910",
"SHA1Hash": "B11B311C8DA67D94ECB96F3C3E007AC5F168B9D0",
"SHA256Hash": "3FF94178D1FF71B447213A1107E87A28FCF95FFF4C1C56BB5BC1E160CD831352",
"Position": 0,
"Start": 0,
"End": 88,
"Length": 88,
"Content": "| Column1 | Column2 | Column3 |\n| --- | --- | --- |\n| Row names | Column 1 | Column 2 |\n",
"Embeddings": []
},
{
"GUID": "f4af12e2-3471-4346-a52a-b5ab39fed84d",
"MD5Hash": "76683CF2E648B7AB1D93E30EE62639E1",
"SHA1Hash": "7C3C82CB6D6E638A4F38B096C372E1A0B114FC23",
"SHA256Hash": "1084DF649B4F40478B81688709412A4C36CBC592A78578EBD8767E1F06C18E38",
"Position": 1,
"Start": 88,
"End": 174,
"Length": 86,
"Content": "| Column1 | Column2 | Column3 |\n| --- | --- | --- |\n| Row 1 | Value 1,1 | Value 1,2 |\n",
"Embeddings": []
},
{
"GUID": "1e93f264-1959-41e4-bb90-cca849a7915c",
"MD5Hash": "0572CBCBA0401696B8B589A77B7EABAE",
"SHA1Hash": "1F0460E5D96B66288C81ABDAF32E8A7380F613BC",
"SHA256Hash": "C41D16155AEC23EC0CDCED1E3587B8750D786DCCC4115A61E5D37890B7E57136",
"Position": 2,
"Start": 174,
"End": 260,
"Length": 86,
"Content": "| Column1 | Column2 | Column3 |\n| --- | --- | --- |\n| Row 2 | Value 1,2 | Value 2,2 |\n",
"Embeddings": []
},
{
"GUID": "51f8b3e9-4bc7-4f0b-a365-0fb7ae76a292",
"MD5Hash": "4A3C51A2FE593342DC2AEA745614F574",
"SHA1Hash": "E7F8F855DC40762B15E90D38F3654B37FBA0484E",
"SHA256Hash": "4F7B475F316A6A269F050BCDB226DDD4191557670431BB3F8CD451AF96D09738",
"Position": 3,
"Start": 260,
"End": 346,
"Length": 86,
"Content": "| Column1 | Column2 | Column3 |\n| --- | --- | --- |\n| Row 3 | Value 1,3 | Value 3,2 |\n",
"Embeddings": []
}
],
"Children": []
},
{
"GUID": "f4386f25-ea7c-4785-85f1-6b4c940a15b5",
"CellType": "List",
"MD5Hash": "CECA959DE15881953B8752F0EBF349E0",
"SHA1Hash": "1CF7F4E3BF58B6D17D5A12A87643535EC526A3DD",
"SHA256Hash": "783953D669985425418A39D45B78ACFB426B6A52745E677A2072A6EF0613F9FE",
"Position": 1,
"Length": 19,
"UnorderedList": [
"Item 1",
"Item 2",
"Item 3"
],
"Chunks": [
{
"GUID": "287721a2-1eda-472c-a82e-ddba1be6e541",
"MD5Hash": "5B83415D8D003EE33EC1B30D86F6E249",
"SHA1Hash": "F4CB2BE5A4175F964FAF6C057E8A022D7E02C252",
"SHA256Hash": "8B5B410B7110D6BCDE1844CE42A99E99372EE24A28F9C7CE375E9DED8596281D",
"Position": 0,
"Start": 0,
"End": 19,
"Length": 19,
"Content": "\nItem 1Item 2Item 3",
"Embeddings": []
}
],
"Children": []
},
{
"GUID": "acc2eee6-d676-4071-84cf-9bf2c988d40e",
"CellType": "Text",
"MD5Hash": "0E8C870D0EBFA77DBD48AE497B47A60F",
"SHA1Hash": "813CC76A674EE67B9BD794552E1251C68A15C27F",
"SHA256Hash": "76F66994ECBFB99B9D22A6C1EDD3962858EFA8AA4D08B1B921948A32BAB59A4D",
"Position": 2,
"Length": 31,
"Content": "This is a sample PDF document.",
"Chunks": [
{
"GUID": "edf3e5b1-bf1c-4388-8452-af287f93cbc6",
"MD5Hash": "6C533E89A67F27C8E673144AF9E315F4",
"SHA1Hash": "32AA918A464CAAE5A05DC0F3A3CE18B930EB0F8E",
"SHA256Hash": "599114DB5F943C34493A4CAB0D325F31F4CCF5815F7AA25FE0AB4FA57AFBA58C",
"Position": 0,
"Start": 0,
"End": 6,
"Length": 31,
"Content": "This is a sample PDF document .",
"Embeddings": []
}
],
"Children": []
},
{
"GUID": "d27836e3-1ba9-4eae-aba3-5523c6bdbe08",
"CellType": "Text",
"MD5Hash": "6CD3556DEB0DA54BCA060B4C39479839",
"SHA1Hash": "943A702D06F34599AEE1F8DA8EF9F7296031D699",
"SHA256Hash": "315F5BDB76D078C43B8AC0064E4A0164612B1FCE77C869345BFC94C75894EDD3",
"Position": 3,
"Length": 14,
"Content": "Hello, world!",
"Chunks": [
{
"GUID": "261e5733-7468-47bc-8b24-a73a5a0279de",
"MD5Hash": "B78CE4F8ABFA36E99230881237FCA3CE",
"SHA1Hash": "C3FB9F01453F15D5BC91625E5A8B7D3A34DC9CC7",
"SHA256Hash": "216F2EB16E159AAE2EB7A6448599D22F4B336DDDB8BD63F39ACEB6E85163EDF6",
"Position": 0,
"Start": 0,
"End": 2,
"Length": 14,
"Content": "Hello , world!",
"Embeddings": []
}
],
"Children": []
},
{
"GUID": "ee46f6a0-6682-47ca-b562-fa91349f206f",
"CellType": "Text",
"MD5Hash": "B2DDFDC0D5ED913DB6DCCE6AE4ABD79E",
"SHA1Hash": "47631108BB5E6C51E2712797C743CA777AF49F3B",
"SHA256Hash": "F525F71A38AE651D4639D8CCA907985B731704217D0CE91349AD1287A379EF41",
"Position": 4,
"Length": 3290,
"Content": "Artificial intelligence (AI) refers to the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals. Such machines may be called AIs.\nHigh-profile applications of AI include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); virtual assistants (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., ChatGPT and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go). However, many AI applications are not perceived as AI: 'A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore.'\nVarious subfields of AI research are centered around particular goals and the use of particular tools. The traditional goals of AI research include learning, reasoning, knowledge representation, planning, natural language processing, perception, and support for robotics.[a] General intelligence—the ability to complete any task performed by a human on an at least equal level—is among the field's long-term goals. To reach these goals, AI researchers have adapted and integrated a wide range of techniques, including search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics.[b] AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields.\nArtificial intelligence was founded as an academic discipline in 1956, and the field went through multiple cycles of optimism throughout its history, followed by periods of disappointment and loss of funding, known as AI winters. Funding and interest vastly increased after 2012 when deep learning outperformed previous AI techniques. This growth accelerated further after 2017 with the transformer architecture, and by the early 2020s many billions of dollars were being invested in AI and the field experienced rapid ongoing progress in what has become known as the AI boom. The emergence of advanced generative AI in the midst of the AI boom and its ability to create and modify content exposed several unintended consequences and harms in the present and raised concerns about the risks of AI and its long-term effects in the future, prompting discussions about regulatory policies to ensure the safety and benefits of the technology.",
"Chunks": [
{
"GUID": "c971808b-085b-4fcf-8d33-4aa8d731d5d0",
"MD5Hash": "D051C6F784313D0854D114C7D77AC3C5",
"SHA1Hash": "95AF3CAA603F21A585D968660A90CD4A99A5ACA6",
"SHA256Hash": "49DFC276826BBC5D8940EEC8996A6A5E4AC896308F122888ABCF55ECD456E1DA",
"Position": 0,
"Start": 0,
"End": 255,
"Length": 1365,
"Content": "Artificial intelligence ( AI ) refers to the capability of computational systems to perform tasks typically associated with human intelligence , such as learning , reasoning , problem - solving , perception , and decision - making . It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals . Such machines may be called AIs . High - profile applications of AI include advanced web search engines ( e . g . , Google Search ) ; recommendation systems ( used by YouTube , Amazon , and Netflix ) ; virtual assistants ( e . g . , Google Assistant , Siri , and Alexa ) ; autonomous vehicles ( e . g . , Waymo ) ; generative and creative tools ( e . g . , ChatGPT and AI art ) ; and superhuman play and analysis in strategy games ( e . g . , chess and Go ) . However , many AI applications are not perceived as AI : ' A lot of cutting edge AI has filtered into general applications , often without being called AI because once something becomes useful enough and common enough it ' s not labeled AI anymore . ' Various subfields of AI research are centered around particular goals and the use of particular tools . The traditional goals of AI research include learning , reasoning",
"Embeddings": []
},
{
"GUID": "2aa4a98e-a392-4de6-abb2-cffc8b8ba3cb",
"MD5Hash": "8AB83A89E07A09076FA178EA40F13C8B",
"SHA1Hash": "AF9BBC0E11CF455A0B912EEBF30A7516A4B236ED",
"SHA256Hash": "8953557648F1115202178F3342B668E4084B93F42182DC48EB4EE0870AAFFEE0",
"Position": 1,
"Start": 224,
"End": 479,
"Length": 1501,
"Content": "labeled AI anymore . ' Various subfields of AI research are centered around particular goals and the use of particular tools . The traditional goals of AI research include learning , reasoning , knowledge representation , planning , natural language processing , perception , and support for robotics . [ a ] General intelligence—the ability to complete any task performed by a human on an at least equal level—is among the field ' s long - term goals . To reach these goals , AI researchers have adapted and integrated a wide range of techniques , including search and mathematical optimization , formal logic , artificial neural networks , and methods based on statistics , operations research , and economics . [ b ] AI also draws upon psychology , linguistics , philosophy , neuroscience , and other fields . Artificial intelligence was founded as an academic discipline in 1956 , and the field went through multiple cycles of optimism throughout its history , followed by periods of disappointment and loss of funding , known as AI winters . Funding and interest vastly increased after 2012 when deep learning outperformed previous AI techniques . This growth accelerated further after 2017 with the transformer architecture , and by the early 2020s many billions of dollars were being invested in AI and the field experienced rapid ongoing progress in what has become known as the AI boom . The emergence of advanced generative AI in the midst of the AI boom and its ability to create and modify",
"Embeddings": []
},
{
"GUID": "86d733c5-c577-4cf4-9134-663642c16582",
"MD5Hash": "D77E3ACBC8277B15D577CB2F83BF2EAF",
"SHA1Hash": "50C6069CF0F64299D36BE4680F38D43A8416518C",
"SHA256Hash": "04D32AAEB1A6E4EE5EF48A9CFFD005238061EB6134FE86C224E41264D3505D22",
"Position": 2,
"Start": 448,
"End": 522,
"Length": 424,
"Content": "ongoing progress in what has become known as the AI boom . The emergence of advanced generative AI in the midst of the AI boom and its ability to create and modify content exposed several unintended consequences and harms in the present and raised concerns about the risks of AI and its long - term effects in the future , prompting discussions about regulatory policies to ensure the safety and benefits of the technology .",
"Embeddings": []
}
],
"Children": []
}
]
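Each returned cell now carries a populated Chunks array and an Embeddings array that remains empty until the embeddings generation step. A small sketch for walking the response and collecting chunk contents, for example to hand to an embeddings generator, might look like the following; the recursive handling of Children assumes nested cells use the same shape.
// Hedged sketch: flatten chunk contents from the chunking response, including nested child cells.
function collectChunkContents(cells) {
  const contents = [];
  for (const cell of cells) {
    for (const chunk of cell.Chunks || []) {
      contents.push(chunk.Content);
    }
    if (Array.isArray(cell.Children) && cell.Children.length > 0) {
      contents.push(...collectChunkContents(cell.Children));
    }
  }
  return contents;
}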
Next Steps
After successfully processing semantic cells through chunking, you can:
- Embeddings Generation: Generate vector embeddings for the created chunks using the embeddings generation service
- Vector Storage: Store generated embeddings in the vector database for enhanced search capabilities
- Search Integration: Integrate chunked content with Lexi search capabilities for semantic document discovery
- Processing Pipeline: Chain chunking with the platform's other processing operations to automate end-to-end document processing workflows
- Content Optimization: Optimize chunking parameters based on search performance and user requirements
- Metadata Enhancement: Combine chunking results with UDR metadata generation for comprehensive content analysis