Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
368 changes: 31 additions & 337 deletions docs/app-mode-ai-chat-review.md
Original file line number Diff line number Diff line change
@@ -1,354 +1,48 @@
# App Mode AI Chat Review

## Purpose
This note only tracks the highest-value next steps for making app-mode AI chat
safer and more efficient.

This document reviews the current app-mode AI chat design with a focus on:
## Recommended Next Steps

- keeping prompts and context as small as possible;
- requiring user confirmation for important actions;
- making datatable integration smooth and safe for users.
1. Add confirmation for dangerous app tools.

## Short verdict
Require explicit user confirmation before file writes, file deletes, backend
runnable writes, backend runnable deletes, and datatable SQL execution. Show a
useful diff or exact SQL before applying the action.

The app-mode AI chat has a solid foundation: mode-specific helpers, explicit `@` context, app snapshots/revert, datatable whitelisting, and generic confirmation UI already exist.
2. Enforce datatable SQL safety in code.

However, it is not yet optimal for minimal context and user-safe automation:
Do not rely on prompt instructions for SQL safety. Classify statements before
execution, block DDL unless table creation is allowed, and require
confirmation for DDL, DML, and row-returning reads that would expose data back
to the model.

1. **Context is still too large by default**, especially the app system prompt, broad file-discovery guidance, full datatable schemas, and persistent `@` context. (`get_files()` has since been replaced by metadata-only `list_files()`.)
2. **Important app/datatable actions are not consistently confirmed**. The confirmation infrastructure exists, but app tools mostly bypass it.
3. **Datatables UX is promising but has rough edges**: stale cached table context, weak SQL safety, policy persistence issues, and too-heavy full-schema fetching.
3. Keep default context demand-driven.

## Relevant files
Prefer selected context and targeted reads before broad discovery. Keep file
listings metadata-only, avoid sending full datatable schemas by default, and
keep SDK/reference material out of the base prompt unless it is requested or
needed for the task.

### AI chat orchestration
4. Improve app context lifecycle.

- `frontend/src/lib/components/copilot/chat/AIChatManager.svelte.ts`
- `frontend/src/lib/components/copilot/chat/chatLoop.ts`
- `frontend/src/lib/components/copilot/chat/shared.ts`
- `frontend/src/lib/components/copilot/chat/AIChat.svelte`
- `frontend/src/lib/components/copilot/chat/AIChatDisplay.svelte`
- `frontend/src/lib/components/copilot/chat/AIChatInput.svelte`
- `frontend/src/lib/components/copilot/chat/ToolExecutionDisplay.svelte`
Treat `@` context as per-message by default, with an explicit pinning affordance
for context that should persist. Lazy-load file and runnable contents, and add
a visible approximate context-size indicator so users can spot prompt bloat.

### App mode
5. Refresh datatable context after mutations.

- `frontend/src/lib/components/copilot/chat/app/core.ts`
- `frontend/src/lib/components/copilot/chat/AppAvailableContextList.svelte`
- `frontend/src/lib/components/copilot/chat/ContextElementBadge.svelte`
- `frontend/src/lib/components/copilot/chat/DatatableCreationPolicy.svelte`
Refresh table metadata after data-panel changes and after AI-created tables so
follow-up tool calls and user-visible context do not use stale schema data.

### Raw app editor and datatables
6. Persist table creation policy explicitly.

- `frontend/src/lib/components/raw_apps/RawAppEditor.svelte`
- `frontend/src/lib/components/raw_apps/RawAppDataTableList.svelte`
- `frontend/src/lib/components/raw_apps/RawAppDataTableDrawer.svelte`
- `frontend/src/lib/components/raw_apps/DefaultDatabaseSelector.svelte`
- `frontend/src/lib/components/raw_apps/dataTableRefUtils.ts`
- `frontend/src/lib/components/raw_apps/datatableUtils.svelte.ts`
- `frontend/src/routes/(root)/(logged)/apps_raw/add/+page.svelte`
- `frontend/src/routes/(root)/(logged)/apps_raw/edit/[...path]/+page.svelte`
Store whether AI table creation is enabled as an explicit app setting instead
of inferring it from the presence of datatable configuration.

### Backend datatable APIs
7. Add focused eval coverage for these behaviors.

- `backend/windmill-api-workspaces/src/workspaces.rs`
- `list_datatables`
- `list_datatable_schemas`
- `get_datatable_schema`
- `edit_datatable_config`

### System prompts

- `system_prompts/README.md`
- `system_prompts/auto-generated/index.ts`
- `system_prompts/auto-generated/sdks/datatable-typescript.md`
- `system_prompts/auto-generated/sdks/datatable-python.md`

## How app mode works today

In the raw app editor, `RawAppEditor.svelte` initializes app-mode AI chat on mount:

- calls `aiChatManager.saveAndClear()`;
- calls `aiChatManager.changeMode(AIMode.APP)`;
- registers app helpers through `aiChatManager.setAppHelpers(...)`.

Those app helpers expose operations for:

- frontend files;
- backend runnables;
- current selected editor context;
- linting;
- app snapshots and revert;
- datatable schema loading;
- SQL execution;
- app table whitelisting.

When app mode is active, `AIChatManager.changeMode(AIMode.APP)` sets:

- system prompt: `prepareAppSystemMessage(...)`;
- tools: `getAppTools()`;
- helpers: `appAiChatHelpers`.

When the user sends a message, `prepareAppUserMessage(...)` builds the user prompt from:

- current frontend/backend file selection, unless excluded;
- inspector-selected DOM element;
- editor code selection;
- additional `@`-mentioned context;
- the user instructions.

`runChatLoop(...)` then sends the system message, history, user message, and tool definitions to the selected model. Tool calls go through `processToolCall(...)`, which supports confirmation only when a tool opts into `requiresConfirmation`.

## Current app tools

### Read and discovery tools

These are generally safe without confirmation:

- `list_files`
- `get_frontend_file`
- `get_backend_runnable`
- `get_selected_context`
- `lint`
- `search_workspace`
- `get_runnable_details`
- `search_hub_scripts`
- `list_datatables`
- `get_datatable_table_schema`

### Mutating tools

These currently execute directly in app mode:

- `set_frontend_file`
- `patch_file`
- `delete_frontend_file`
- `set_backend_runnable`
- `delete_backend_runnable`
- `exec_datatable_sql`

This is the biggest mismatch with the requirement that every important action should be confirmed by the user.

## System prompt assessment

The app system prompt is useful but heavier than ideal.

### Strengths

- Clearly explains raw app structure.
- Explains the frontend/backend runnable split.
- Encourages `patch_file` for small edits.
- Pushes datatables for persisted app storage.
- Explains that datatable DDL should go through `exec_datatable_sql`.
- Includes table creation policy context.

### Concerns

1. It always includes broad app-building instructions, even for small localized edits.
2. The previous prompt included the datatable SDK reference for both TypeScript and Python every time. This has since been removed; concise examples remain in the prompt.
3. The previous prompt told the model to start with `get_files()`, which encouraged loading all files even when selected context was sufficient. This is now improved by `list_files()`, but the prompt still needs to stay demand-driven.
4. It relies heavily on prompt instructions for datatable safety instead of enforcing safety in tools.
5. Custom workspace/user prompts are appended as `USER GIVEN INSTRUCTIONS`, which is flexible but can further increase context.

### Recommendation

The base app prompt should be shorter and more demand-driven:

- Keep file discovery demand-driven: use selected and explicitly provided context first; call `list_files()` only when a broader metadata overview is needed.
- Keep full SDK details out of the default prompt; concise examples are usually enough. Add an on-demand SDK reference only if it does not cause unnecessary extra tool turns.
- Keep only minimal datatable rules in the base prompt:
- use datatables for persistence;
- call `list_datatables()` before schema work;
- DDL must use `exec_datatable_sql`;
- non-read SQL requires confirmation.

## Additional context assessment

The `@` context system is a good UX foundation.

App mode exposes categories for:

- frontend files;
- backend runnables;
- datatables.

Selecting a datatable context includes its columns and also calls `addTableToWhitelist(...)`, adding the table to the app data panel.

### Strengths

- Context is explicit and user-controllable.
- Datatable table selection is naturally integrated into the chat input.
- Selected app file/runnable chips are visible and can be excluded.
- Inspector and code-selection context are compact and useful.

### Concerns

1. `@` context persists across messages until manually removed, which can silently bloat follow-up prompts.
2. Available app context currently includes file contents/runnable configs in memory before selection.
3. Each selected context item is truncated, but there is no overall context budget indicator.
4. Current file/runnable selection is included by default unless excluded, which is convenient but not minimal.

### Recommendation

- Make app `@` context per-message by default.
- Add an explicit “pin” option for context that should persist across messages.
- Lazy-load file/runnable content when selected or when a message is sent.
- Show an approximate context-size/token budget indicator.
- Prefer sending path/name and selected code first; fetch full files only when necessary.

## Confirmation assessment

The generic confirmation mechanism already exists:

- `processToolCall(...)` checks `tool.requiresConfirmation`.
- `ToolExecutionDisplay.svelte` renders Run/Cancel controls.
- Script test runs, flow test runs, and mutating API calls already use confirmation.

App mode should use the same infrastructure for important actions.

### Suggested confirmation policy

#### No confirmation required

- `list_files`, as a metadata-only response;
- `get_frontend_file`;
- `get_backend_runnable`;
- `get_selected_context`;
- `list_datatables`, as table-name metadata only;
- `get_datatable_table_schema`, as a targeted schema read;
- `lint`;
- search tools.

#### Confirmation required

- `set_frontend_file`;
- `patch_file`;
- `delete_frontend_file`;
- `set_backend_runnable`;
- `delete_backend_runnable`;
- `exec_datatable_sql` for any DDL or DML;
- `exec_datatable_sql` for `SELECT` if it returns real row data that will be sent back to the model.

### Recommended UX

For files/runnables:

- Prefer batched proposed edits.
- Show a diff.
- Let the user click “Apply changes”.
- Run lint after applying.

For SQL:

- Show the exact SQL.
- Classify the query as:
- schema read;
- data read;
- insert/update/delete;
- DDL.
- Require confirmation before data reads and all mutations.
- For table creation, require both:
- table creation policy enabled;
- explicit confirmation of the `CREATE TABLE` SQL.

## Datatables integration assessment

The datatable integration is directionally good and already has several strong user-facing pieces.

### Current strengths

The new app setup lets the user choose:

- default datatable;
- schema mode: none, new, existing;
- whether AI can create tables;
- pre-whitelisted existing tables.

The raw app data panel lets users:

- add datatable table references;
- inspect tables through the DB manager drawer;
- configure the default datatable/schema for new tables.

The AI chat integration lets users:

- mention datatable tables through `@` context;
- add mentioned tables to the app whitelist;
- list datatable/schema/table names with `list_datatables()`;
- retrieve one table's columns with `get_datatable_table_schema()`;
- create tables through `exec_datatable_sql(..., new_table)`.

### Concerns

1. **`exec_datatable_sql` is too powerful without confirmation.**
It can run `SELECT`, `INSERT`, `UPDATE`, `DELETE`, `CREATE`, `DROP`, `ALTER`, etc.

2. **Table creation policy is not fully enforced in code.**
The tool blocks `new_table` when policy is disabled, but it does not block DDL if the model omits `new_table`.

3. **Table creation disabled state may not persist cleanly.**
`RawAppData` stores `datatable` and `schema`, but not an explicit `enabled` value. `RawAppEditor` infers enabled from `data.datatable !== undefined`, which can re-enable table creation after reopening.

4. **Datatable context cache can become stale.**
`AIChatManager.refreshDatatables()` runs when app helpers are set, but may not refresh immediately after data panel changes or after AI creates a new table.

5. **Full schema loading can still be too expensive internally.**
`list_datatables()` and `get_datatable_table_schema()` reduce what is sent to the model, but they still currently rely on app helpers that fetch full schema data before filtering.

6. **Auto-whitelisting from `@table` is convenient but silent.**
It mutates app data without an obvious confirmation or undo affordance.

### Recommended datatable tool design

Instead of one broad schema tool and one unrestricted SQL tool, prefer smaller tools:

- `list_datatables()`
- `list_datatable_tables(datatable, schema?, search?)` (optional backend/API optimization if table lists need server-side filtering)
- `get_datatable_table_schema(datatable, schema, table)`
- `preview_datatable_rows(datatable, schema, table, limit)` with confirmation
- `execute_datatable_sql(datatable, sql)` with query classification and confirmation
- `create_datatable_table(datatable, schema, table, columns)` as a structured safe path for table creation

## Priority recommendations

1. **Add confirmation to dangerous app tools**
- file/runnable writes;
- file/runnable deletes;
- datatable SQL;
- especially DDL/DML.

2. **Enforce SQL safety in code, not only in prompts**
- block DDL unless `new_table` is provided and policy allows it;
- confirm all non-`SELECT` statements;
- consider confirming `SELECT` row reads too.

3. **Reduce default prompt/tool context**
- keep `list_files()` metadata-only and demand-driven;
- use selected context first;
- keep full SDK references out of the default prompt;
- keep datatable tools split into smaller schema/table lookups.

4. **Refresh datatable context reliably**
- refresh after data panel changes;
- refresh after `exec_datatable_sql(..., new_table)`;
- remove debug logging from datatable refresh.

5. **Persist table creation policy explicitly**
- store a boolean such as `tableCreationEnabled` in raw app data;
- do not infer enabled solely from `data.datatable`.

6. **Improve `@` context lifecycle**
- make app `@` context per-message by default;
- add pinning for persistent context;
- lazy-load file/runnable contents;
- show approximate context size.

## Overall opinion

The current architecture is good and extensible, but it should become more demand-driven and safer before being considered efficient and user-safe.

The highest-impact changes are:

- add confirmation for app mutations and datatable SQL;
- enforce datatable SQL policy programmatically;
- reduce the app system prompt and avoid automatic broad context loading;
- split datatable schema access into smaller, targeted tools.
Cover confirmation requirements, datatable SQL policy enforcement, selected
context minimization, and stale-schema refresh behavior with targeted app-mode
evals or lower-level tests where practical.
Loading
Loading