Technical Specification: AI Assistant
This document details the technical architecture, RAG pipeline, and infrastructure for the AI Assistant module.
1. High-Level Architecture
The AI Assistant is a Retrieval-Augmented Generation (RAG) system. It uses a vector database to perform semantic search over organization-scoped data before generating responses via a Large Language Model (LLM).
Component Diagram
2. Technology Stack
Backend (API)
- Runtime: Bun
- Framework: Ignis Framework
- Router: Hono
- Vector DB: PostgreSQL with pgvector extension.
- LLM Engine: OpenAI (GPT-4o / GPT-4o-mini).
- Embeddings: text-embedding-3-small (1536 dimensions); a schema sketch tying these pieces together follows this list.
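The stack above implies a pgvector-backed table holding 1536-dimension embeddings. A minimal bootstrap sketch follows; the `postgres` client, the column names, and the HNSW index choice are illustrative assumptions, not part of this spec (the kb_chunks table itself is described in section 3.1).

```ts
// Hypothetical schema bootstrap for the kb_chunks table described in
// section 3.1. Column names and the HNSW index are assumptions.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

await sql`CREATE EXTENSION IF NOT EXISTS vector`;
await sql`
  CREATE TABLE IF NOT EXISTS kb_chunks (
    id        bigserial PRIMARY KEY,
    org_id    text NOT NULL,          -- isolation key (see section 4.1)
    source_id text NOT NULL,          -- originating Drive file or Document
    content   text NOT NULL,          -- raw chunk text
    embedding vector(1536) NOT NULL   -- text-embedding-3-small output
  )
`;
// Approximate-nearest-neighbor index for cosine similarity search.
await sql`
  CREATE INDEX IF NOT EXISTS kb_chunks_embedding_idx
  ON kb_chunks USING hnsw (embedding vector_cosine_ops)
`;
```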
Frontend (UI)
- Library: React 18
- Build Tool: Vite
- Styling: Tailwind CSS v4 + ARDOR UI Kit
- Streaming: Server-Sent Events (SSE) for real-time response rendering (see the client sketch after this list).
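On the client, the stream can be consumed with fetch and a streaming reader. The sketch below is illustrative: the endpoint path and the callback wiring are assumptions; only the SSE framing (data: lines separated by blank lines) is standard.

```ts
// Minimal sketch of consuming the assistant's SSE stream in the UI.
// The /api/assistant/stream path is an assumption.
async function streamAnswer(
  question: string,
  onToken: (token: string) => void, // e.g. appends to a React state value
): Promise<void> {
  const res = await fetch("/api/assistant/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE events are separated by a blank line; keep the last partial
    // event in the buffer and process the complete ones.
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";
    for (const event of events) {
      for (const line of event.split("\n")) {
        if (line.startsWith("data: ")) onToken(line.slice(6));
      }
    }
  }
}
```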
3. RAG Implementation Details
3.1 Indexing Pipeline (Knowledge Base)
- Extraction: Raw text is pulled from Drive (via S3 streaming) or Document (via SQL).
- Chunking: Text is split into overlapping chunks (e.g., 500 tokens with a 50-token overlap).
- Embedding: Chunks are sent to OpenAI's embedding API.
- Persistence: Vectors and text chunks are stored in the kb_chunks table, isolated by orgId (see the sketch after this list).
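A minimal end-to-end sketch of these four steps follows. It assumes the official openai and postgres clients and the hypothetical kb_chunks schema from section 2; the character-based splitter stands in for a token-aware one (e.g., tiktoken).

```ts
// Sketch of the indexing steps above; names are assumptions.
import OpenAI from "openai";
import postgres from "postgres";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const sql = postgres(process.env.DATABASE_URL!);

// Split text into overlapping windows (size 500, overlap 50). Character
// counts stand in for tokens in this sketch.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

async function indexSource(orgId: string, sourceId: string, rawText: string) {
  const pieces = chunkText(rawText);
  // One batched embeddings call for all chunks of this source.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: pieces,
  });
  for (let i = 0; i < pieces.length; i++) {
    await sql`
      INSERT INTO kb_chunks (org_id, source_id, content, embedding)
      VALUES (${orgId}, ${sourceId}, ${pieces[i]},
              ${JSON.stringify(data[i].embedding)}::vector)
    `;
  }
}
```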
3.2 Retrieval Pipeline (Q&A)
- Query Embedding: The user's question is embedded into a 1536-dim vector.
- Vector Search: A cosine similarity search is performed using pgvector's
<=>operator. - Context Assembly: The top
Kmost relevant chunks are retrieved and formatted into the LLM system prompt. - Generation: The LLM generates the answer, following strict instructions to cite sources only from the provided context.
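The retrieval path can be sketched as follows. Model choice, K, and the prompt wording are illustrative; table and column names repeat the assumptions made in the indexing sketch.

```ts
// Sketch of the four retrieval steps above.
import OpenAI from "openai";
import postgres from "postgres";

const openai = new OpenAI();
const sql = postgres(process.env.DATABASE_URL!);

async function answerQuestion(orgId: string, question: string, k = 5) {
  // 1. Embed the question into a 1536-dim vector.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: question,
  });
  const queryVec = JSON.stringify(data[0].embedding);

  // 2. Cosine similarity search; <=> is pgvector's cosine distance
  //    operator, and the WHERE clause enforces org isolation (section 4.1).
  const rows = await sql<{ id: number; content: string }[]>`
    SELECT id, content
    FROM kb_chunks
    WHERE org_id = ${orgId}
    ORDER BY embedding <=> ${queryVec}::vector
    LIMIT ${k}
  `;

  // 3. Assemble the top-K chunks into the system prompt.
  const context = rows
    .map((r, i) => `[${i + 1}] (chunk ${r.id}) ${r.content}`)
    .join("\n\n");

  // 4. Generate, instructing the model to cite only provided context.
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "Answer using ONLY the context below. Cite sources as [n]. " +
          "If the context is insufficient, say so.\n\n" + context,
      },
      { role: "user", content: question },
    ],
  });
  return completion.choices[0].message.content;
}
```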
4. Security & Privacy
4.1 Data Isolation
Every knowledge base source and its corresponding chunks are tagged with an organizationId. The vector search queries are strictly scoped to the user's organization.
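One way to enforce this rule at the routing layer is to derive the organization scope from the authenticated session rather than from request input. The sketch below is an assumption about the wiring, not the actual middleware; getSession stands in for whatever auth helper exists.

```ts
import { Hono, type Context } from "hono";

// Hypothetical auth helper; stands in for the real session mechanism.
declare function getSession(c: Context): Promise<{ orgId: string } | null>;

const app = new Hono<{ Variables: { orgId: string } }>();

app.use("/assistant/*", async (c, next) => {
  const session = await getSession(c);
  if (!session) return c.json({ error: "unauthorized" }, 401);
  c.set("orgId", session.orgId); // trusted, server-derived scope
  await next();
});
```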
4.2 Training Opt-out
VENI-AI relies on Enterprise API agreements with LLM providers, which ensure that data sent for completions or embeddings is never used to train public models.
4.3 Citation Integrity
The system implements a "Contextual Highlight" feature that fetches the source chunk directly from the database when a citation is clicked, ensuring the user sees exactly what the AI read.
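A sketch of the lookup behind this feature: return the stored chunk verbatim, still scoped by organization. The route path and response shape are assumptions; the handler reuses the hypothetical schema and middleware from the earlier sketches.

```ts
import { Hono } from "hono";
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);
const app = new Hono<{ Variables: { orgId: string } }>();

app.get("/assistant/citations/:chunkId", async (c) => {
  const orgId = c.get("orgId"); // set by the auth middleware (section 4.1)
  const [row] = await sql<{ content: string }[]>`
    SELECT content FROM kb_chunks
    WHERE id = ${c.req.param("chunkId")} AND org_id = ${orgId}
  `;
  if (!row) return c.json({ error: "not found" }, 404);
  // The chunk text is served unchanged, so the user sees exactly
  // what was placed in the model's context.
  return c.json({ content: row.content });
});
```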