Overview
The Contextual Document Loader node enhances document chunking by generating a succinct contextual description for each chunk. It processes input documents, splits them into smaller chunks using a connected text splitter, and then uses a connected AI language model to generate a short description that situates each chunk within the whole document. Because that description is prepended to the chunk text and stored in its metadata, vector search and other document retrieval systems can match each chunk more accurately.
Typical use cases include:
- Preparing documents for semantic search or retrieval-augmented generation (RAG) workflows.
- Enhancing vector store nodes with contextual information to improve search relevance.
- Automatically annotating document chunks with meaningful summaries to aid downstream AI tasks.
For example, when processing a large report, this node can generate brief contextual snippets for each section, helping a search engine understand where each chunk fits in the overall document structure.
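The flow described above (split, describe, combine) can be sketched in a few lines. This is only an illustration under stated assumptions, not the node's source code: `contextualize`, `split_on_blank_lines`, and `fake_describe` are hypothetical names, and the toy describer stands in for the connected language model.

```python
from typing import Callable, List


def contextualize(
    document: str,
    split: Callable[[str], List[str]],
    describe: Callable[[str, str], str],
    prefix: str = "Context: ",
    separator: str = "\n\n",
) -> List[str]:
    """Split a document, then prepend a generated context line to each chunk."""
    enriched = []
    for chunk in split(document):
        # Stand-in for the language model call; it sees the full document and the chunk.
        context = describe(document, chunk)
        enriched.append(f"{prefix}{context}{separator}{chunk}")
    return enriched


# Toy stand-ins (hypothetical) so the sketch runs without a real splitter or model.
def split_on_blank_lines(text: str) -> List[str]:
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def fake_describe(document: str, chunk: str) -> str:
    position = document.index(chunk)
    return f"Excerpt starting at character {position} of the report."


report = "Revenue grew 12% in Q3.\n\nCosts were flat year over year."
chunks = contextualize(report, split_on_blank_lines, fake_describe)
```

In the real node, the splitter and the model are supplied by the connected sub-nodes rather than passed in as functions.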
Properties
| Name | Meaning |
|---|---|
| Context Prompt | Template prompt used by the AI model to generate a succinct context description for each document chunk. The full document and chunk are provided automatically. |
| Options | Collection of additional settings: |
| - Batch Size | Number of chunks processed in parallel when generating context (default 10). |
| - Context Prefix | Text prefix added before the generated contextual description (default "Context: "). |
| - Context Separator | Separator string placed between the context and the chunk content in the output (default: two newlines). |
| - Max Retries | Maximum number of retry attempts if context generation fails (default 3). |
| - Metadata | JSON object containing extra metadata fields to add to all output documents. |
Note: There is also a notice property indicating that this node can be connected to Vector Store nodes, but it has no user-configurable value.
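To make the defaults and the Batch Size option concrete, here is a minimal sketch. `LoaderOptions`, `batches`, and `describe_all` are hypothetical names chosen for illustration. The sketch also assumes that chunks within a batch run concurrently while batches run one after another, which the documentation above does not spell out.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List


@dataclass
class LoaderOptions:
    """Hypothetical container mirroring the documented options and defaults."""
    batch_size: int = 10
    context_prefix: str = "Context: "
    context_separator: str = "\n\n"
    max_retries: int = 3
    metadata: Dict[str, Any] = field(default_factory=dict)


def batches(items: List[str], size: int) -> List[List[str]]:
    """Group items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def describe_all(
    chunks: List[str],
    describe: Callable[[str], Awaitable[str]],
    options: LoaderOptions,
) -> List[str]:
    """Generate one context per chunk, preserving the original chunk order."""
    results: List[str] = []
    for batch in batches(chunks, options.batch_size):
        # Assumption: calls inside a batch run concurrently, batches run in sequence.
        results.extend(await asyncio.gather(*(describe(c) for c in batch)))
    return results
```

Under this reading, a larger batch size trades more simultaneous model requests for fewer sequential rounds, so it is worth keeping within the rate limits of the connected model.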
Output
The node outputs an array of document objects, each representing a chunk of the original input document enriched with contextual information. Each output document contains:
- pageContent: A string combining the generated context (prefixed and separated as configured) followed by the original chunk text.
- metadata: An object including:
  - Any user-provided metadata from the options.
  - chunkIndex: The index of the chunk within the document.
  - originalChunk: The raw chunk text before adding context.
  - hasContext: Boolean indicating whether context was successfully generated.
  - context: The generated contextual description string (if any).
  - Additional metadata fields such as source, fileName, and fileType if present in the input.
This enriched output is designed to be consumed by downstream nodes like vector stores for improved semantic search.
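The shape of one output document can be sketched as a plain dictionary. `build_output_document` is a hypothetical helper, and falling back to the raw chunk text when no context was generated is an assumption; the documentation only states that hasContext records whether generation succeeded.

```python
from typing import Any, Dict, Optional


def build_output_document(
    chunk: str,
    chunk_index: int,
    context: Optional[str],
    extra_metadata: Optional[Dict[str, Any]] = None,
    prefix: str = "Context: ",
    separator: str = "\n\n",
) -> Dict[str, Any]:
    """Assemble pageContent and metadata for a single chunk."""
    has_context = bool(context)
    # Assumption: without a generated context, pageContent is just the raw chunk.
    page_content = f"{prefix}{context}{separator}{chunk}" if has_context else chunk

    metadata: Dict[str, Any] = dict(extra_metadata or {})
    metadata.update(
        {"chunkIndex": chunk_index, "originalChunk": chunk, "hasContext": has_context}
    )
    if has_context:
        metadata["context"] = context
    return {"pageContent": page_content, "metadata": metadata}


doc = build_output_document(
    chunk="Costs were flat year over year.",
    chunk_index=1,
    context="Second paragraph of the Q3 financial summary.",
    extra_metadata={"source": "q3-report.txt"},
)
```

Keeping originalChunk in the metadata lets a downstream node display or re-embed the unmodified text even though pageContent carries the context line.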
Dependencies
- Requires a connected AI Language Model node to generate contextual descriptions.
- Requires a connected Text Splitter node to split input documents into chunks.
- Invokes the connected language model to generate the context for each chunk.
- No direct external API keys or environment variables are managed by this node itself; these must be configured in the connected AI model node.
Troubleshooting
- No language model connected error: The node throws an error if no AI language model is connected. Ensure you connect a compatible AI model node before execution.
- No text splitter connected error: The node requires a text splitter node connection to function. Connect a text splitter node to provide chunking functionality.
- Invalid JSON in metadata field: If the metadata option contains invalid JSON, the node will throw an error. Validate your JSON syntax before saving.
- Context generation failures: The node retries context generation up to the configured max retries. Persistent failures may indicate issues with the AI model or network connectivity.
- Direct execution error: This node is intended as a sub-node and cannot be executed standalone. Attempting to run it directly results in an error.
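Two of the failure modes above, invalid metadata JSON and repeated context generation failures, can be reproduced outside the node with a short sketch. `parse_metadata` and `generate_with_retries` are hypothetical helpers, not the node's functions. The sketch assumes that an empty metadata field means no extra metadata, and that Max Retries counts the total number of attempts.

```python
import json
from typing import Any, Callable, Dict, Optional


def parse_metadata(raw: str) -> Dict[str, Any]:
    """Parse the Metadata option, rejecting anything that is not a JSON object."""
    if not raw.strip():
        return {}  # Assumption: an empty field means no extra metadata.
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON in metadata field: {exc.msg}") from exc
    if not isinstance(value, dict):
        raise ValueError("Metadata must be a JSON object")
    return value


def generate_with_retries(
    generate: Callable[[], str], max_retries: int = 3
) -> Optional[str]:
    """Call `generate` up to `max_retries` times; return None if every attempt fails."""
    for _ in range(max_retries):
        try:
            return generate()
        except Exception:
            # Swallow the error and try again until the attempts run out.
            continue
    return None
```

Running the metadata string through a check like `parse_metadata` before saving the node catches syntax problems early, and a `None` result from the retry helper corresponds to a chunk whose hasContext flag would be false.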
Links and References
- GitHub Repository for Contextual Document Loader: Contains source code and documentation for this node.