Database Schema: AI Assistant
This document details the database structure for the AI Assistant module, featuring pgvector support.
1. Overview
The AI Assistant module uses a PostgreSQL database with the pgvector extension. It follows a multi-tenant structure where all knowledge base content, vectors, and chat history are isolated by orgId.
Core Domain Model
2. Table Definitions
kb_sources
Primary registry for files and links added to the knowledge base.
| Field | Type | Description |
|---|---|---|
id | uuid | PK. |
org_id | uuid | Organization scope. |
name | text | Display name of the source. |
type | enum | drive, document, url. |
status | enum | indexing, synced, error. |
external_id | text | ID in the target system (e.g., Drive File ID). |
kb_chunks
Individual text fragments with vector embeddings.
| Field | Type | Description |
|---|---|---|
id | uuid | PK. |
source_id | uuid | FK to kb_sources.id. |
content | text | The raw text fragment. |
embedding | vector(1536) | OpenAI-compatible embedding. |
chunk_index | integer | Sequence order within the source. |
chat_sessions
Persistent user conversation threads.
| Field | Type | Description |
|---|---|---|
id | uuid | PK. |
org_id | uuid | Organization scope. |
user_id | uuid | Owner of the session. |
title | text | Auto-generated title for the chat. |
chat_messages
Individual exchanges within a session.
| Field | Type | Description |
|---|---|---|
id | uuid | PK. |
session_id | uuid | FK to chat_sessions.id. |
role | enum | user, assistant. |
content | text | Message content (Markdown). |
citations | jsonb | Array of { sourceId, excerpt, score }. |
3. Vector Search Indexing
The kb_chunks table uses an IVFFlat or HNSW index to optimize cosine similarity searches.
sql
-- Example Index Creation
CREATE INDEX ON kb_chunks
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);Retrieval Query Example
sql
SELECT content, source_id, 1 - (embedding <=> :query_vector) AS score
FROM kb_chunks
WHERE org_id = :org_id
ORDER BY embedding <=> :query_vector
LIMIT 5;4. Multi-tenant Isolation
All queries MUST filter by org_id. This is enforced at the service layer by extracting the organizationId from the Platform JWT.
typescript
// Enforced Isolation
const relevantChunks = await db.select()
.from(kbChunks)
.where(and(
eq(kbChunks.orgId, orgId),
cosineDistance(kbChunks.embedding, queryVector).lt(0.2)
))
.limit(5);