
Knowledge Base

Overview

The Knowledge Base lets you upload documents and content so your agents can search and reference them when answering questions. Instead of relying solely on general training data, an agent with a Knowledge Base can find specific information from your own files and give more accurate, grounded answers.

There are two ways an agent can retrieve knowledge:

  • RAG (Retrieval-Augmented Generation): finds the most semantically similar passages to the user’s question and injects them into the agent’s context.
  • GraphRAG: builds a knowledge graph from your documents — extracting entities and their relationships — and traverses it to answer complex, multi-hop questions that plain similarity search may miss.

Both can be active at the same time and complement each other.


Table of Contents

  1. Documents
  2. Processing Pipeline
  3. Tags
  4. Settings
  5. Using Knowledge in Agents
  6. Inspecting Processed Knowledge
  7. Limits & Constraints

Documents

Uploading Documents

You can upload one or more files at a time. Supported formats include PDF, Word documents, plain text, HTML, and many others — text is extracted automatically from the file contents.

During upload you can:

  • Assign tags to the documents for later filtering
  • Choose whether to replace an existing document with the same name

Once uploaded, a document is queued for processing. The document record is created immediately; embedding and graph creation happen asynchronously in the background.

Uploading Raw Content

Instead of a file, you can push plain text directly — useful when the content comes from a database, external API, or any programmatic source. You provide a name and the text body; the rest of the pipeline is identical to a file upload.
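
A programmatic push like this is typically a small JSON request. The sketch below only assembles such a request body; the field names (`name`, `content`, `tags`, `replace_existing`) are hypothetical placeholders, not the documented API schema — check the API reference for the actual endpoint and fields.

```python
import json

def build_raw_content_payload(name, text, tags=None, replace_existing=False):
    """Assemble a JSON body for pushing raw text into the knowledge base.

    All field names here are illustrative placeholders, not the real
    API schema. A name and a text body are required, mirroring the
    upload flow described above."""
    if not name or not text:
        raise ValueError("both a name and a text body are required")
    return json.dumps({
        "name": name,
        "content": text,
        "tags": tags or [],
        "replace_existing": replace_existing,
    })
```

From here on, the pushed text goes through the same pipeline as an uploaded file.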

Searching & Filtering Documents

The document list supports:

  • Full-text search by document name
  • Filtering by tag
  • Pagination for large libraries

Each row shows the document name, assigned tags, and the current processing status for embeddings and graph creation.

Updating Documents

You can rename a document or change its tags at any time. These changes do not trigger reprocessing.

Deleting Documents

Deleting a document removes the file, its vector embeddings, and its knowledge graph nodes and relationships. This action cannot be undone. Bulk deletion is supported.

Reprocessing Documents

If you change the embedding model or graph settings after a document was already processed, you can reprocess it selectively:

  • Reprocess embeddings — deletes existing vector chunks and re-runs the embedding pipeline with the current settings.
  • Reprocess graph — deletes existing graph nodes/relationships and re-runs entity extraction.

Both options can be combined or run independently.


Processing Pipeline

Text Extraction

When a file is uploaded, text is extracted from its contents automatically. The raw text is then passed to the rest of the pipeline.

Chunking

The extracted text is split into overlapping chunks of a configurable size (measured in tokens). Chunks that are too short are discarded. Each chunk keeps metadata about which document it came from and its position in the document.
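
The chunking step can be sketched as follows. This is a minimal illustration, not the product's implementation: tokens are approximated by whitespace-split words (the real pipeline uses the embedding model's tokenizer), and the size parameters are stand-ins for the settings described later.

```python
def chunk_text(text, chunk_size=450, overlap=50, min_chunk_size=20):
    """Split text into overlapping chunks of roughly chunk_size tokens.

    Tokens are approximated by whitespace words. Chunks shorter than
    min_chunk_size are discarded, and each chunk keeps its position so
    retrieval hits can be traced back to the source document."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for position, start in enumerate(range(0, len(words), step)):
        piece = words[start:start + chunk_size]
        if len(piece) < min_chunk_size:
            continue  # too short to be a useful retrieval unit
        chunks.append({"position": position, "text": " ".join(piece)})
    return chunks
```

The overlap means the tail of each chunk is repeated at the head of the next one, so a sentence that straddles a chunk boundary is still retrievable in full from at least one chunk.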

Embeddings (RAG)

Each chunk is converted into a vector embedding using the configured embedding model. These vectors are stored in a vector database indexed for fast similarity search. When an agent queries the knowledge base, the user’s message is embedded the same way and the most similar chunks are retrieved and injected into the prompt.
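
Conceptually, retrieval is a nearest-neighbour search over those vectors. The sketch below does it with a linear scan and cosine similarity for clarity; an actual vector database uses an approximate-nearest-neighbour index to make this fast at scale.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def retrieve(query_vector, indexed_chunks, top_k=5):
    """Return the top_k chunk texts most similar to the query vector.

    indexed_chunks is a list of (vector, chunk_text) pairs. A real
    vector store replaces this linear scan with an ANN index."""
    ranked = sorted(indexed_chunks,
                    key=lambda pair: cosine_similarity(query_vector, pair[0]),
                    reverse=True)
    return [text for _, text in ranked[:top_k]]
```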

Optional enrichment can be applied to each chunk before embedding:

  • Keywords — a short list of keywords is extracted and prepended to the chunk text, improving retrieval precision.
  • Summary — an LLM-generated summary of the chunk (and optionally its adjacent chunks) is prepended, helping the model understand context.
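
The enrichment step amounts to prepending this extra text before the chunk is embedded. In the sketch below the keywords and summary are passed in directly so the assembly is easy to see; in the real pipeline they are produced by an LLM.

```python
def enrich_chunk(chunk_text, keywords=None, summary=None):
    """Prepend optional keyword and summary enrichment to a chunk.

    The enriched text (not the original chunk alone) is what gets
    embedded, so the extra context influences similarity search."""
    parts = []
    if keywords:
        parts.append("Keywords: " + ", ".join(keywords))
    if summary:
        parts.append("Summary: " + summary)
    parts.append(chunk_text)
    return "\n".join(parts)
```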

Knowledge Graph (GraphRAG)

When graph creation is enabled, each chunk is also analysed by an LLM to extract entities (people, places, concepts, organisations, etc.) and relationships between them. The results are stored in a graph database.

At query time, the user’s question is used to search for relevant entities by name. The graph is then traversed to gather related nodes and relationships, which are injected into the agent’s prompt alongside or instead of vector search results.
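
The traversal can be illustrated with a plain breadth-first walk. This is a conceptual sketch, not the product's query engine: the graph is a dict mapping entity names to (relation, target) pairs, whereas the real store is a graph database queried with a traversal language.

```python
from collections import deque

def traverse(graph, start_entities, max_hops=2):
    """Collect (source, relation, target) facts within max_hops of the
    starting entities, breadth-first.

    Multi-hop questions are answered by chaining these facts, which is
    exactly what plain similarity search cannot do."""
    seen = set(start_entities)
    facts = []
    frontier = deque((entity, 0) for entity in start_entities)
    while frontier:
        entity, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted for this branch
        for relation, target in graph.get(entity, []):
            facts.append((entity, relation, target))
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return facts
```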

GraphRAG is especially useful for:

  • Questions that require following chains of relationships (“Who works with whom?”, “What depends on what?”)
  • Fact-checking across multiple documents
  • Structured domain knowledge (e.g., ontologies, regulatory frameworks)

Processing Status

Each document tracks two independent statuses — one for embeddings and one for the knowledge graph:

  • Not started: the pipeline has not begun yet
  • Processing: currently running
  • Completed: finished successfully
  • Failed: an error occurred; a reason is provided
  • Skipped: processing was skipped (e.g., the document is already up to date)

Token usage is tracked per document for billing and observability purposes.


Tags

Tags are free-form labels you attach to documents. They serve two purposes:

  1. Organisation — filter and find documents in the document library.
  2. Scoping retrieval — when configuring an agent, you specify which tags it should retrieve from. This allows one agent to search only “product manuals” while another searches only “legal documents”, even if they share the same knowledge base.

Tags can be assigned at upload time and updated at any time afterwards.
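
Tag scoping is essentially a set-intersection filter before retrieval. A minimal sketch, assuming documents carry a plain list of tag strings (this mirrors the behaviour described above, not an actual product API):

```python
def scope_documents(documents, agent_tags):
    """Return the documents an agent may retrieve from: those sharing
    at least one tag with the agent's configured tags.

    documents is a list of dicts with a "tags" list."""
    wanted = set(agent_tags)
    return [doc for doc in documents if wanted & set(doc["tags"])]
```

With this, an agent configured for "product manuals" never sees chunks from documents tagged only "legal documents", even though both live in the same knowledge base.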


Settings

Embedding Settings

  • Embedding model: the model used to create vector embeddings (e.g. text-embedding-3-small)
  • Chunk size: target number of tokens per chunk (default: 450)
  • Min chunk size: minimum character length; shorter chunks are discarded
  • Max chunks: hard cap on the total number of chunks per document
  • Top K: how many chunks to retrieve per query
  • Distance type: similarity metric used for retrieval (Cosine, Euclidean, or Manhattan)

Changing the embedding model or chunk size requires reprocessing existing documents to take effect.
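
For intuition, the three distance types behave quite differently. A small self-contained comparison (illustrative only; the vector database computes these internally):

```python
import math

def euclidean(a, b):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of per-dimension differences; also magnitude-sensitive."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus cosine similarity; ignores magnitude, compares direction.

    Two vectors pointing the same way are at distance 0 even if one
    is twice as long, which is why cosine is the usual default for
    text embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms
```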

Enrichment Settings

  • Keywords enabled: extract keywords per chunk and include them in the embedding
  • Keywords per chunk: number of keywords to extract (default: 5)
  • Summary enabled: prepend an LLM-generated summary to each chunk before embedding
  • Adjacent summary: also include summaries from the preceding and following chunks

Graph Settings

  • Graph creation enabled: whether to run the GraphRAG extraction pipeline on new documents
  • LLM model: the language model used to extract entities and relationships
  • Fulltext index: whether to create a fulltext index on entity names for fast search

Using Knowledge in Agents

To give an agent access to your knowledge base:

  1. Make sure at least one document has been successfully processed (status: Completed).
  2. On the agent’s configuration, enable RAG and/or GraphRAG.
  3. Specify which tags the agent should retrieve from to scope the search to the relevant documents.

The agent will then automatically search the knowledge base for every user message and include relevant passages in its context before generating a response.
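
The retrieve-then-generate flow amounts to assembling retrieved material ahead of the user's message. The sketch below shows one plausible assembly; the actual prompt layout the product uses is not documented here, so treat the section labels as placeholders.

```python
def build_prompt(user_message, retrieved_chunks, graph_facts=None):
    """Assemble the context an agent sees before generating a reply.

    retrieved_chunks come from vector search; graph_facts (optional
    (source, relation, target) triples) come from GraphRAG. Section
    labels are illustrative, not the product's real prompt format."""
    sections = ["Relevant knowledge:"]
    sections += ["- " + chunk for chunk in retrieved_chunks]
    if graph_facts:
        sections.append("Known relationships:")
        sections += ["- {} {} {}".format(*fact) for fact in graph_facts]
    sections.append("User question: " + user_message)
    return "\n".join(sections)
```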


Inspecting Processed Knowledge

Embeddings

You can view the chunks and their embeddings stored for a document. This is useful for verifying that the document was chunked and enriched as expected, and for debugging retrieval quality.

Knowledge Graph

You can inspect the graph extracted from a document:

  • Graph view — full list of nodes and relationships
  • Entities view — entity list with types and properties
  • Graph status — processing metadata including token usage and timing

Limits & Constraints

  • Maximum file size per upload: 200 MB
  • Maximum chunks per document: 10,000
  • Processing is asynchronous — large documents may take several minutes
  • Reprocessing deletes existing embeddings or graph data before re-running
  • Changing the embedding model requires reprocessing all affected documents
  • Deleting a document is permanent and cannot be undone