Feature Specification: Data Connectors
1. Overview & Vision
Data Connectors are the pipeline through which organizational knowledge flows into the report engine. They provide a standardized way to fetch, clean, and compress data from satellite modules (HRM, Drive, Docs), ensuring that the LLM has the most relevant context to generate accurate reports.
2. Personas & Stakeholders
| Persona | Goal |
|---|---|
| Admin | Configure which data points are included in a report. |
| Developer | Maintain and extend the connector library for new modules. |
3. User Stories
- As an admin, I want to ingest the last 7 days of "Leave Requests" to generate a weekly attendance summary.
- As an admin, I want to ingest "Project Documents" from Drive to generate a monthly project health report.
4. Functional Requirements (FR)
- REQ-CON-001: Support for internal module connectors (HRM, Drive, Document).
- REQ-CON-002: Recursive data fetching for complex objects (e.g., Folders + Files).
- REQ-CON-003: In-flight data cleaning (stripping HTML, PII masking).
- REQ-CON-004: Context window management (automatic truncation/summarization).
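REQ-CON-002 and REQ-CON-003 can be illustrated together. The sketch below walks a folder tree depth-first, flattens it into cleaned documents, and strips HTML as the files are visited. The `DriveNode` shape, `flattenTree`, and `stripHtml` are hypothetical names, because the spec does not define the satellite schema. The regex-based stripping is a deliberate simplification; a production connector would use a proper HTML parser.

```typescript
// Hypothetical shape of a Drive-like hierarchy; the real satellite schema is not defined here.
interface DriveNode {
  id: string;
  name: string;
  kind: "folder" | "file";
  content?: string;        // raw file body, possibly HTML
  children?: DriveNode[];  // present only on folders
}

interface IngestedDoc {
  path: string;  // slash-joined folder path, e.g. "Projects/Alpha/status.html"
  text: string;  // cleaned text handed to the report engine
}

// REQ-CON-003 (simplified): drop script/style blocks and tags, decode a few common
// entities (&amp; last, to avoid double-decoding), then collapse whitespace.
function stripHtml(html: string): string {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")
    .replace(/\s+/g, " ")
    .trim();
}

// REQ-CON-002: depth-first walk that turns folders + files into a flat list.
// maxDepth bounds recursion on unexpectedly deep (or accidentally cyclic) trees;
// anything below that depth is silently skipped in this sketch.
function flattenTree(node: DriveNode, parentPath = "", depth = 0, maxDepth = 10): IngestedDoc[] {
  const path = parentPath ? `${parentPath}/${node.name}` : node.name;
  if (node.kind === "file") {
    return [{ path, text: stripHtml(node.content ?? "") }];
  }
  if (depth >= maxDepth) return [];
  return (node.children ?? []).flatMap((child) => flattenTree(child, path, depth + 1, maxDepth));
}
```

In practice each `children` array would come from a separate satellite call, so the real walk would be asynchronous. The synchronous version keeps the traversal logic easy to test against mocked trees.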
5. Non-Functional Requirements (NFR)
- Security: Ingestion MUST use Service-to-Service (S2S) tokens with restricted scopes.
- Performance: Data fetching from internal modules < 2 seconds.
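One way to enforce the 2-second target is to race each satellite call against a timer. In the sketch below, `withTimeout` is a hypothetical helper, and it assumes the budget applies per module call rather than to the whole ingestion run. `Promise.race` only stops *waiting*; it does not cancel the underlying request, so a real HTTP call should also receive an abort signal.

```typescript
// Rejects if `work` has not settled within budgetMs. The 2,000 ms default mirrors
// the "< 2 seconds" NFR; applying it per satellite call is an assumption.
function withTimeout<T>(work: Promise<T>, budgetMs = 2000, label = "satellite call"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} exceeded ${budgetMs} ms`)), budgetMs);
  });
  // Always clear the timer so a fast response does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```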
6. Business Logic & Rules
- Snapshotting: Fetched data is saved as a JSON snapshot alongside the report.
- Compression: Payloads exceeding 10,000 tokens are automatically summarized by a smaller model before being passed to the main generation model.
- Tenant Integrity: Connectors MUST pass the `organizationId` header to every satellite API call.
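The Compression and Tenant Integrity rules reduce to two small helpers. The sketch below makes three assumptions. It estimates tokens with a rough 4-characters-per-token heuristic, where a real implementation would count tokens with the generation model's own tokenizer. It uses hypothetical function names (`planContext`, `satelliteHeaders`). Apart from `organizationId`, its header names are illustrative.

```typescript
const SUMMARIZE_THRESHOLD_TOKENS = 10_000; // from the Compression rule

// Rough heuristic only; not the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

type ContextPlan =
  | { action: "pass-through"; tokens: number }
  | { action: "summarize"; tokens: number }; // route through the smaller model first

// Payloads *exceeding* the threshold are summarized; exactly 10,000 passes through.
function planContext(payload: string): ContextPlan {
  const tokens = estimateTokens(payload);
  return tokens > SUMMARIZE_THRESHOLD_TOKENS
    ? { action: "summarize", tokens }
    : { action: "pass-through", tokens };
}

// Tenant Integrity: refuse to build a satellite request without an organizationId.
function satelliteHeaders(organizationId: string, s2sToken: string): Record<string, string> {
  if (!organizationId) {
    throw new Error("organizationId is required for every satellite call");
  }
  return {
    Authorization: `Bearer ${s2sToken}`, // S2S token (see Security NFR)
    organizationId,
    "Content-Type": "application/json",
  };
}
```

Failing loudly on a missing `organizationId` keeps a misconfigured connector from silently issuing an unscoped request.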
7. User Interface (UI/UX)
- Source Configurator: Tree-style UI for selecting specific module endpoints.
- Mapping Table: Visual list showing which fields are being ingested from each source.
8. Information Architecture
- Part of the "Template Configuration" workflow.
9. Data Model & Persistence
- Table: `report_results` (stores the `data_snapshot` `jsonb` column).
- Config: Defined in `report_templates.sources`.
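The two persisted shapes can be sketched as TypeScript types, using the weekly "Leave Requests" user story as the sample config. Every field name here (`lookbackDays`, `mandatory`, `capturedAt`, and so on) and the endpoint string are hypothetical, because the spec names only the column and the config path.

```typescript
// Hypothetical shape of one entry in report_templates.sources.
interface SourceConfig {
  module: "hrm" | "drive" | "document";
  endpoint: string;       // an entry from the connector registry
  fields: string[];       // what the Mapping Table would display
  lookbackDays?: number;  // e.g. 7 for a weekly attendance summary
  mandatory: boolean;     // drives fail-fast vs. partial context
}

// Hypothetical shape of the JSON stored in report_results.data_snapshot.
interface DataSnapshot {
  capturedAt: string; // ISO-8601 timestamp
  sources: Array<{ endpoint: string; window?: { from: string; to: string }; records: unknown[] }>;
}

// Fixed 24-hour days counted back from `now` (UTC milliseconds, so DST does not apply).
function resolveWindow(lookbackDays: number, now: Date): { from: string; to: string } {
  const from = new Date(now.getTime() - lookbackDays * 24 * 60 * 60 * 1000);
  return { from: from.toISOString(), to: now.toISOString() };
}

const weeklyLeaveSource: SourceConfig = {
  module: "hrm",
  endpoint: "hrm/leave-requests", // hypothetical endpoint name
  fields: ["employeeId", "type", "startDate", "endDate", "status"],
  lookbackDays: 7,
  mandatory: true,
};

// Records exactly what was fetched, so a report can later be traced to its inputs.
function buildSnapshot(config: SourceConfig, records: unknown[], now: Date): DataSnapshot {
  return {
    capturedAt: now.toISOString(),
    sources: [{
      endpoint: config.endpoint,
      window: config.lookbackDays !== undefined ? resolveWindow(config.lookbackDays, now) : undefined,
      records,
    }],
  };
}
```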
10. API & Service Layer
- `GET /internal/sources`: Internal registry of available connector endpoints.
- `IngestionService`: Orchestrates the multi-module fetch.
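A registry like the one behind `GET /internal/sources` might look like the sketch below. The `Connector` interface and the `ConnectorRegistry` class are assumptions, since the spec names `IngestionService` but not its contract. `list()` returns metadata only, which is roughly what the endpoint would serialize.

```typescript
// Hypothetical connector contract.
interface Connector {
  id: string; // e.g. "hrm.leave-requests"
  module: "hrm" | "drive" | "document";
  description: string;
  fetch(organizationId: string): Promise<unknown[]>;
}

class ConnectorRegistry {
  private readonly connectors = new Map<string, Connector>();

  // Duplicate ids are rejected so two modules cannot shadow each other.
  register(connector: Connector): void {
    if (this.connectors.has(connector.id)) {
      throw new Error(`Connector already registered: ${connector.id}`);
    }
    this.connectors.set(connector.id, connector);
  }

  // Metadata only (no fetch functions), suitable for GET /internal/sources.
  list(): Array<{ id: string; module: string; description: string }> {
    return Array.from(this.connectors.values()).map(({ id, module, description }) => ({ id, module, description }));
  }

  resolve(id: string): Connector {
    const connector = this.connectors.get(id);
    if (!connector) throw new Error(`Unknown connector: ${id}`);
    return connector;
  }
}
```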
11. Integration Patterns
- gRPC/Connect: Preferred protocol for internal data ingestion.
- Platform JWT: S2S tokens issued by the Shell for each connector request.
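Before a connector uses a Shell-issued token, it can check that the token matches the tenant and carries only read scopes, as the Scoping rule in Security & Permissions requires. The sketch below assumes the JWT signature has *already* been verified by a JWT library. The `org` claim and the `<module>:<resource>:read` scope format are hypothetical. `sub` and `exp` are registered JWT claims, and the space-delimited `scope` follows OAuth 2.0 convention.

```typescript
// Hypothetical claim set of a verified S2S token.
interface S2SClaims {
  sub: string;   // calling service
  org: string;   // organization the token is bound to (assumed claim name)
  scope: string; // space-delimited scopes
  exp: number;   // expiry, seconds since the Unix epoch
}

// Throws if the token is expired, bound to another tenant, or grants anything beyond read.
function checkConnectorToken(claims: S2SClaims, organizationId: string, nowSeconds: number): void {
  if (claims.exp <= nowSeconds) throw new Error("S2S token expired");
  if (claims.org !== organizationId) throw new Error("S2S token bound to a different organization");
  const scopes = claims.scope.split(" ").filter(Boolean);
  const nonRead = scopes.filter((s) => !s.endsWith(":read"));
  if (scopes.length === 0 || nonRead.length > 0) {
    throw new Error(`Connector tokens must be read-only; offending scopes: ${nonRead.join(", ") || "(none granted)"}`);
  }
}
```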
12. Security & Permissions
- Scoping: Connectors are only granted `read` access to the required resources.
- Data Privacy: PII (Personally Identifiable Information) can optionally be masked during ingestion (planned for v1.2).
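Since PII masking is still planned, the sketch below only shows where a masking pass could sit in the cleaning step. The patterns are illustrative and far from complete. They cover email addresses and phone numbers written in international `+` format. The phone pattern requires a leading `+`, which keeps dates such as `2024-05-01` from being masked. Names, national IDs, and addresses are not handled.

```typescript
// Illustrative patterns only; real PII detection needs much broader coverage.
const PII_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // International numbers only (leading "+"), so ISO dates and plain IDs are left alone.
  { label: "PHONE", pattern: /\+\d[\d\s-]{7,}\d/g },
];

// Replaces each match with a placeholder such as "[EMAIL]".
function maskPii(text: string): string {
  return PII_PATTERNS.reduce((acc, { label, pattern }) => acc.replace(pattern, `[${label}]`), text);
}
```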
13. Error Handling & Resilience
- Fail-fast: If a mandatory data source fails, the entire report run is marked as "Error".
- Partial Context: (Optional) Allow generation even if non-essential sources are unreachable.
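The fail-fast and partial-context rules can share one orchestration step. The sketch below fetches all sources in parallel with `Promise.allSettled` and then classifies the outcome. `SourceTask`, `ingest`, and the `allowPartial` flag are hypothetical names. This variant waits for every fetch to settle so that the error record lists all failing sources. If stopping at the first mandatory failure matters more, `Promise.all` over the mandatory subset would reject as soon as one fails.

```typescript
// Hypothetical source descriptor; `mandatory` mirrors the mandatory vs. non-essential split.
interface SourceTask {
  id: string;
  mandatory: boolean;
  fetch: () => Promise<unknown[]>;
}

interface IngestionResult {
  status: "ok" | "partial" | "error";
  data: Record<string, unknown[]>;
  failures: Array<{ id: string; reason: string }>;
}

async function ingest(tasks: SourceTask[], allowPartial: boolean): Promise<IngestionResult> {
  const settled = await Promise.allSettled(tasks.map((t) => t.fetch()));
  const data: Record<string, unknown[]> = {};
  const failures: IngestionResult["failures"] = [];
  let runFailed = false;

  settled.forEach((result, i) => {
    const task = tasks[i];
    if (result.status === "fulfilled") {
      data[task.id] = result.value;
      return;
    }
    failures.push({ id: task.id, reason: String(result.reason) });
    // A mandatory failure always fails the run; an optional one only if partial context is off.
    if (task.mandatory || !allowPartial) runFailed = true;
  });

  if (runFailed) return { status: "error", data: {}, failures };
  return { status: failures.length > 0 ? "partial" : "ok", data, failures };
}
```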
14. Performance & Scalability
- Parallel fetching from multiple satellite modules using `Promise.all`.
- Result caching for recurring daily reports.
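Result caching for daily reports could key on organization, template, and UTC calendar day, so each tenant gets at most one fetch per template per day. `DailyResultCache` is a hypothetical, single-process sketch. A deployment with several instances would need a shared store. This version also does not deduplicate concurrent misses, so two simultaneous requests can both trigger a fetch.

```typescript
class DailyResultCache<T> {
  private readonly entries = new Map<string, T>();

  private static day(now: Date): string {
    return now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  }

  // organizationId is part of the key so cached data never crosses tenants.
  static key(templateId: string, organizationId: string, now: Date): string {
    return `${organizationId}:${templateId}:${DailyResultCache.day(now)}`;
  }

  async getOrFetch(templateId: string, organizationId: string, now: Date, fetcher: () => Promise<T>): Promise<T> {
    const key = DailyResultCache.key(templateId, organizationId, now);
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;
    const value = await fetcher();
    this.prune(DailyResultCache.day(now));
    this.entries.set(key, value);
    return value;
  }

  // Drop entries from other days so memory stays bounded to one day's results.
  private prune(day: string): void {
    for (const key of Array.from(this.entries.keys())) {
      if (!key.endsWith(`:${day}`)) this.entries.delete(key);
    }
  }
}
```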
15. Globalization & i18n
- Support for ingesting multi-language content (UTF-8).
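Multi-language content interacts with truncation under REQ-CON-004. JavaScript strings are UTF-16, so slicing by `.length` can cut an emoji or other astral character in half and leave a lone surrogate. The hypothetical helper below truncates by Unicode code points instead. Grapheme clusters (for example, some emoji sequences or combining marks) can still be split. `Intl.Segmenter` with `granularity: "grapheme"` is the stricter option where the runtime supports it.

```typescript
// Truncates to at most maxCodePoints code points; never leaves a lone surrogate.
function truncateCodePoints(text: string, maxCodePoints: number): string {
  const codePoints = Array.from(text); // iterates by code point, not UTF-16 unit
  return codePoints.length <= maxCodePoints ? text : codePoints.slice(0, maxCodePoints).join("");
}
```

For comparison, `"ab😀c".slice(0, 3)` keeps only the first half of the emoji's surrogate pair.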
16. Accessibility (a11y)
- Accessible tree view for source selection.
17. Observability & Analytics
- Tracking of "Payload Size" per connector to monitor token costs.
- Latency monitoring for satellite module responses.
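Payload size and latency can both be captured by wrapping each connector call. In the sketch below, `measured` and the in-memory `metrics` array are hypothetical stand-ins for a real metrics backend. "Payload size" is approximated as the JSON character count plus the same rough 4-characters-per-token estimate, not a tokenizer count.

```typescript
interface ConnectorMetric {
  connectorId: string;
  latencyMs: number;
  payloadChars: number;
  estimatedTokens: number; // ~4 chars/token heuristic
  ok: boolean;
}

const metrics: ConnectorMetric[] = []; // stand-in for a metrics backend

// Records latency for every call and payload size for successful ones, then re-throws failures.
async function measured<T>(connectorId: string, fetcher: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    const result = await fetcher();
    const serialized = JSON.stringify(result);
    const payloadChars = typeof serialized === "string" ? serialized.length : 0;
    metrics.push({ connectorId, latencyMs: Date.now() - start, payloadChars, estimatedTokens: Math.ceil(payloadChars / 4), ok: true });
    return result;
  } catch (err) {
    metrics.push({ connectorId, latencyMs: Date.now() - start, payloadChars: 0, estimatedTokens: 0, ok: false });
    throw err;
  }
}
```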
18. Testing & Quality
- Mock satellite APIs for ingestion unit tests.
- Validation of JSON-to-Text transformation logic.
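A JSON-to-Text step is easiest to validate when it is a pure function. The hypothetical `jsonToText` below flattens a value into `path: value` lines. That output format is an assumption, not the engine's actual prompt format. Empty arrays and objects produce no lines, which is one edge case worth covering in the validation tests.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Flattens nested JSON into "path: value" lines (e.g. "leave[0].days: 2").
function jsonToText(value: Json, path = ""): string[] {
  if (value === null || typeof value !== "object") {
    return [`${path || "value"}: ${String(value)}`];
  }
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => jsonToText(item, `${path}[${i}]`));
  }
  return Object.entries(value).flatMap(([key, child]) => jsonToText(child, path ? `${path}.${key}` : key));
}
```

For example, `jsonToText({ employee: "E1", leave: [{ type: "sick", days: 2 }] })` yields the lines `employee: E1`, `leave[0].type: sick`, and `leave[0].days: 2`.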
19. Constraints & Assumptions
- Assumes satellite modules follow the standard VENI-AI API response format.
20. Future Enhancements
- External connectors (Google Sheets, Salesforce, HubSpot).
- Real-time data streaming (v2).