Skip to content

Latest commit

 

History

History
288 lines (196 loc) · 9.61 KB

File metadata and controls

288 lines (196 loc) · 9.61 KB

Integration Test Status (Nov 2025)

Integration tests for natural language automation were disabled due to persistent log file path issues and unreliable automation. The tests were not providing actionable value and were causing workflow confusion. All test methods in IntegrationTests.cs have been commented out.

Future testing should focus on manual validation and targeted unit tests for core logic.

If integration tests are needed again, ensure log file paths are unified and automation is robust before re-enabling.

Natural Language Windows Control – Phase 1 Plan

Goal

Add a "natural" mode to the existing .NET assistant so that:

  • Talon calls the app like:
    ExecuteCommands.exe natural <dictation>
  • The app interprets the text and directly performs:
    • Window management
    • App launch / switch
    • Basic key sequences
    • Opening common folders

No callback into Talon for this phase.


High-Level Flow

  1. Talon

    • Command: natural <dictation>
    • Runs: NaturalCommands.exe natural "<dictation>"
  2. .NET app (Program.Main)

    • Parses args[0] as the mode (natural, sharp, etc.).
    • Joins the remaining args into a single text string.
    • If mode is natural, calls HandleNaturalAsync(text).
  3. HandleNaturalAsync(text)

    • Calls InterpretAsync(text) → returns an ActionBase (one of several action types).
    • Calls ExecuteActionAsync(action) to actually perform the automation.
  4. Automation layer

    • Handles actions such as:
      • Move active window (via Win32)
      • Launch app (via Process.Start)
      • Send key sequences (via existing input simulator)
      • Open known folders (via Environment.GetFolderPath)

CLI Contract

  • Existing:
    NaturalCommands.exe sharp <dictation> → existing “sharp” behaviour.

  • New:
    NaturalCommands.exe natural <dictation> → natural language mode.

Program.Main outline

  • Read args[0] as mode.
  • text = string.Join(" ", args.Skip(1)).
  • switch(mode):
    • "natural"HandleNaturalAsync(text)
    • "sharp" → existing handler
    • default → treat as natural for now.

Action Model

Create a small set of action types (records) to represent what the interpreter can do:

  • MoveWindowAction

    • Target ("active")
    • Monitor ("current" | "next" | "previous")
    • Position ("left" | "right" | "top" | "bottom" | "center" | null)
    • WidthPercent (1–100, nullable)
    • HeightPercent (1–100, nullable)
  • LaunchAppAction

    • AppIdOrPath (e.g. "msedge.exe", "code.exe", or full path)
  • SendKeysAction

    • KeysText – raw text like "control shift b" to be parsed by the input simulator layer.
  • OpenFolderAction

    • KnownFolder – e.g. "Downloads", "Documents".

These are used as the “internal API” between the interpreter and executor.


Natural Language Interpreter (Phase 1 – Rule-based)

Implement InterpretAsync(string text) as a simple, deterministic rules engine first, with fallback to an AI-based interpreter (OpenAI) for unhandled commands.

Window Management Examples

  • "move this window to the other screen"

    • Detect: text contains move, window, other screen.
    • Return MoveWindowAction(Target: "active", Monitor: "next", Position: null, WidthPercent: null, HeightPercent: null).
  • "make this window full screen" / "maximize this window"

    • Detect: text contains window and either full screen or maximize.
    • Return MoveWindowAction(Target: "active", Monitor: "current", Position: "center", WidthPercent: 100, HeightPercent: 100).
  • "put this window on the left half"

    • Detect: text contains window, left, half.
    • Return MoveWindowAction(Target: "active", Monitor: "current", Position: "left", WidthPercent: 50, HeightPercent: 100).
  • "put this window on the right half"

    • Similar to above but Position: "right".

App Launch Examples

  • Pattern: "open <something>".

  • Extract <something> and map to known EXEs:

    • "edge" / "microsoft edge""msedge.exe"
    • "chrome""chrome.exe"
    • "visual studio""devenv.exe" (or full path)
    • "visual studio code" / "code""code.exe"
    • "outlook""outlook.exe"
    • default: use the raw string as AppIdOrPath.

Return LaunchAppAction(AppIdOrPath).

Send Keys Examples

  • Pattern: "press <keys>".

  • Strip "press " and keep the rest as KeysText:

    • "press control shift b"SendKeysAction("control shift b").
    • "press alt f4"SendKeysAction("alt f4").

The KeySimulator will parse and execute these.

Open Folder Examples

  • "open downloads"OpenFolderAction("Downloads").
  • "open documents"OpenFolderAction("Documents").

Fallback

If nothing matches:

  • The system now falls back to an AI-based interpreter (OpenAI) that attempts to parse the command and return the closest matching action type.
  • Logging for unhandled commands is still performed.

Execution Layer

Implement ExecuteActionAsync(ActionBase action) to dispatch to helper classes.

MoveWindowAction → WindowAutomation.MoveActiveWindow

Responsibilities:

  • Get active window handle (GetForegroundWindow).

  • For full screen:

    • Call ShowWindow(hWnd, SW_MAXIMIZE).
  • For left/right half on current monitor:

    • Use MonitorFromWindow + GetMonitorInfo to get working area.
    • Compute new rectangle (half width, full height).
    • Use SetWindowPos to resize/move the window.
  • (Future) For Monitor == "next":

    • Enumerate monitors and move the window rect to the “next” one.

LaunchAppAction → AppLauncher.Launch

  • Use ProcessStartInfo + Process.Start.
  • UseShellExecute = true so Windows can resolve paths/short names.

SendKeysAction → KeySimulator.Send

  • Parse KeysText like "control shift b":
    • Map to the key codes you already use (e.g. InputSimulator).
    • Support modifiers: control, shift, alt, windows.
    • Support letters / function keys: b, f4, etc.

OpenFolderAction → FolderOpener.OpenKnownFolder

  • Map KnownFolder to paths:
    • "Downloads"<UserProfile>\Downloads
    • "Documents"Environment.GetFolderPath(MyDocuments)
  • Process.Start("explorer.exe", path).

Talon Wiring (Phase 1)

In Talon:

  • Define a command:

    • natural <dictation>

    • Action: run your .NET app with mode + dictation:

      • Example pseudo-action:
        run("C:\\path\\to\\myassistant.exe", "natural", "{dictation}")
  • Keep the existing sharp <dictation> wiring unchanged.

Result:

  • You say:
    "natural move this window to the other screen"
  • Talon runs:
    myassistant.exe natural move this window to the other screen
  • Program:
    • mode = "natural", text = "move this window to the other screen".
    • InterpretAsyncMoveWindowAction(...).
    • ExecuteActionAsyncWindowAutomation.MoveActiveWindow(...).

Visual Studio MVP Integration – Status & Next Steps

Current Status

  • VisualStudioHelper for COM/EnvDTE command execution is implemented.
  • Interpreter can detect if Visual Studio is the active window.
  • New action type ExecuteVSCommandAction is defined.

Next Steps

  • Map natural language like "build solution" to ExecuteVSCommandAction when VS is active.
  • Update executor to call VisualStudioHelper.ExecuteCommand for ExecuteVSCommandAction.
  • Add logging and error handling for VS command execution.
  • Test with "build solution" and other canonical VS commands.

Visual Studio Command Export & Dynamic Lookup (Nov 2025)

To support the vast number of Visual Studio commands without hardcoding them, we implemented a dynamic lookup system.

1. Export Command

A new CLI command was added to export all available Visual Studio commands and their keyboard shortcuts to a JSON file.

Command:

dotnet run --project NaturalCommands.csproj -- export-vs-commands

Output: Generates vs_commands.json in the current directory. This file contains an array of command objects:

[
  {
    "Name": "File.NewProject",
    "Bindings": ["Global::Ctrl+Shift+N"]
  },
  ...
]

2. Dynamic Command Loader

  • Created Helpers/VisualStudioCommandLoader.cs to load vs_commands.json.
  • Implemented a fuzzy matching algorithm (FindCommand) to map natural language input (e.g., "build solution") to canonical command names (e.g., "Build.BuildSolution").

3. Interpreter Integration

  • NaturalLanguageInterpreter.cs was updated to use the command loader when Visual Studio is the active window.
  • It looks for vs_commands.json in the execution directory or project root.
  • If a match is found, it returns an ExecuteVSCommandAction, which is then executed via DTE.

4. Fixes

  • SendKeysAction: Fixed parsing to support + as a separator (e.g., "control+d").
  • Path Resolution: Updated the loader to search for vs_commands.json in parent directories during development.

Future Phases (Notes)

  • Phase 2 – Improved AI interpretation

    • The fallback LLM (OpenAI) interpreter is already implemented and active.
    • Future improvements may focus on prompt/tool calling, more robust schema validation, and safety rules (e.g., no destructive actions without explicit confirmation).
  • Phase 3 – Talon callback (if needed)

    • For code editing / Cursorless-type actions, add a RunTalonVoiceCommandAction:
      • Contains a phrase like "sharp select camel foo".
      • Sent to Talon via a small IPC bridge that calls simulate() with that phrase.

For now, the focus is on making direct .NET automation for windows/apps/keys/folders feel smooth and reliable, with both rule-based and AI-based interpretation available.