Integration tests for natural language automation were disabled due to persistent log file path issues and unreliable automation. The tests were not providing actionable value and were causing workflow confusion. All test methods in IntegrationTests.cs have been commented out.
Future testing should focus on manual validation and targeted unit tests for core logic.
If integration tests are needed again, ensure log file paths are unified and automation is robust before re-enabling.
Add a "natural" mode to the existing .NET assistant so that:
- Talon calls the app like `NaturalCommands.exe natural <dictation>`.
- The app interprets the text and directly performs:
  - Window management
  - App launch / switch
  - Basic key sequences
  - Opening common folders
No callback into Talon for this phase.
- Talon
  - Command: `natural <dictation>`
  - Runs: `NaturalCommands.exe natural "<dictation>"`
- .NET app (`Program.Main`)
  - Parses `args[0]` as the mode (`natural`, `sharp`, etc.).
  - Joins the remaining args into a single text string.
  - If the mode is `natural`, calls `HandleNaturalAsync(text)`.
- `HandleNaturalAsync(text)`
  - Calls `InterpretAsync(text)`, which returns an `ActionBase` (one of several action types).
  - Calls `ExecuteActionAsync(action)` to actually perform the automation.
- Automation layer
  - Handles actions such as:
    - Move active window (via Win32)
    - Launch app (via `Process.Start`)
    - Send key sequences (via the existing input simulator)
    - Open known folders (via `Environment.GetFolderPath`)
Command-line modes:
- Existing: `NaturalCommands.exe sharp <dictation>` → existing "sharp" behaviour.
- New: `NaturalCommands.exe natural <dictation>` → natural language mode.
In `Program.Main`:
- Read `args[0]` as `mode`.
- `text = string.Join(" ", args.Skip(1))`.
- `switch (mode)`:
  - `"natural"` → `HandleNaturalAsync(text)`
  - `"sharp"` → existing handler
  - default → treat as natural for now.
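The dispatch above can be sketched as follows. This is a minimal sketch, not the project's actual `Program.Main`: `ModeDispatch`, `Parse`, and `DispatchAsync` are hypothetical names, and the two handlers are passed in as delegates so the routing can be exercised without Talon or any automation.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch of the mode dispatch in Program.Main; real handlers are passed in as delegates.
public static class ModeDispatch
{
    // args[0] is the mode; everything after it is joined back into one dictation string.
    public static (string Mode, string Text) Parse(string[] args)
    {
        string mode = args.Length > 0 ? args[0].ToLowerInvariant() : "natural";
        string text = string.Join(" ", args.Skip(1));
        return (mode, text);
    }

    public static Task DispatchAsync(string[] args,
                                     Func<string, Task> handleNatural,
                                     Func<string, Task> handleSharp)
    {
        var (mode, text) = Parse(args);
        return mode switch
        {
            "sharp" => handleSharp(text),
            _ => handleNatural(text), // "natural", and for now any unknown mode too
        };
    }
}
```

In `Main`, this would be awaited with the existing natural and sharp handlers passed in as the two delegates.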
Create a small set of action types (records) to represent what the interpreter can do:
- `MoveWindowAction`
  - `Target` ("active")
  - `Monitor` ("current" | "next" | "previous")
  - `Position` ("left" | "right" | "top" | "bottom" | "center" | null)
  - `WidthPercent` (1–100, nullable)
  - `HeightPercent` (1–100, nullable)
- `LaunchAppAction`
  - `AppIdOrPath` (e.g. "msedge.exe", "code.exe", or a full path)
- `SendKeysAction`
  - `KeysText`: raw text like "control shift b", to be parsed by the input simulator layer.
- `OpenFolderAction`
  - `KnownFolder`: e.g. "Downloads", "Documents".
These are used as the “internal API” between the interpreter and executor.
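A minimal sketch of these types as C# positional records, assuming plain strings for the mode values (enums would work equally well). `ActionBase` is the name used in this document; the exact property types and the reading of `null` as "not specified" are assumptions.

```csharp
#nullable enable

// Base type for everything the interpreter can return.
public abstract record ActionBase;

// Move/resize a window. Strings mirror the allowed values listed above.
public sealed record MoveWindowAction(
    string Target,         // "active"
    string Monitor,        // "current" | "next" | "previous"
    string? Position,      // "left" | "right" | "top" | "bottom" | "center" | null
    int? WidthPercent,     // 1–100, or null if not specified
    int? HeightPercent     // 1–100, or null if not specified
) : ActionBase;

public sealed record LaunchAppAction(string AppIdOrPath) : ActionBase;   // e.g. "msedge.exe" or a full path
public sealed record SendKeysAction(string KeysText) : ActionBase;       // e.g. "control shift b"
public sealed record OpenFolderAction(string KnownFolder) : ActionBase;  // e.g. "Downloads"
```

Records provide value equality and `with` expressions, which keeps interpreter tests short: two `MoveWindowAction`s with the same values compare equal.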
Implement InterpretAsync(string text) as a simple, deterministic rules engine first, with fallback to an AI-based interpreter (OpenAI) for unhandled commands.
- "move this window to the other screen"
  - Detect: text contains `move`, `window`, `other screen`.
  - Return `MoveWindowAction(Target: "active", Monitor: "next", Position: null, WidthPercent: null, HeightPercent: null)`.
- "make this window full screen" / "maximize this window"
  - Detect: text contains `window` and either `full screen` or `maximize`.
  - Return `MoveWindowAction(Target: "active", Monitor: "current", Position: "center", WidthPercent: 100, HeightPercent: 100)`.
- "put this window on the left half"
  - Detect: text contains `window`, `left`, `half`.
  - Return `MoveWindowAction(Target: "active", Monitor: "current", Position: "left", WidthPercent: 50, HeightPercent: 100)`.
- "put this window on the right half"
  - Same as above, but with `Position: "right"`.
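The four window rules can be sketched as plain, case-insensitive keyword checks. `WindowRules.TryMatch` is a hypothetical name, and `MoveWindowAction` is repeated here so the snippet compiles on its own. Substring checks are deliberately naive (for example, "move" also matches "remove"), so tighter word matching may be needed as the rule set grows.

```csharp
#nullable enable

// Minimal copy of MoveWindowAction so this snippet compiles on its own.
public sealed record MoveWindowAction(string Target, string Monitor, string? Position, int? WidthPercent, int? HeightPercent);

public static class WindowRules
{
    // Returns a MoveWindowAction for the rules above, or null so later rules / the AI fallback can try.
    public static MoveWindowAction? TryMatch(string text)
    {
        string t = text.ToLowerInvariant();
        bool Has(string s) => t.Contains(s);

        if (Has("move") && Has("window") && Has("other screen"))
            return new MoveWindowAction("active", "next", null, null, null);

        if (Has("window") && (Has("full screen") || Has("maximize")))
            return new MoveWindowAction("active", "current", "center", 100, 100);

        if (Has("window") && Has("half"))
        {
            if (Has("left")) return new MoveWindowAction("active", "current", "left", 50, 100);
            if (Has("right")) return new MoveWindowAction("active", "current", "right", 50, 100);
        }

        return null;
    }
}
```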
- Pattern: "open <something>".
- Extract `<something>` and map it to a known EXE:
  - "edge" / "microsoft edge" → "msedge.exe"
  - "chrome" → "chrome.exe"
  - "visual studio" → "devenv.exe" (or full path)
  - "visual studio code" / "code" → "code.exe"
  - "outlook" → "outlook.exe"
  - default: use the raw string as `AppIdOrPath`.
Return LaunchAppAction(AppIdOrPath).
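A sketch of the "open <something>" rule, assuming an exact, case-insensitive alias lookup so that "visual studio code" and "visual studio" cannot shadow each other the way substring matching would. `AppAliases` and `TryResolve` are hypothetical names; unknown names fall through unchanged, matching the default described above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class AppAliases
{
    private static readonly Dictionary<string, string> Map = new(StringComparer.OrdinalIgnoreCase)
    {
        ["edge"] = "msedge.exe",
        ["microsoft edge"] = "msedge.exe",
        ["chrome"] = "chrome.exe",
        ["visual studio"] = "devenv.exe",
        ["visual studio code"] = "code.exe",
        ["code"] = "code.exe",
        ["outlook"] = "outlook.exe",
    };

    // Returns the AppIdOrPath for "open <something>", or null if the text is not an "open" command.
    public static string? TryResolve(string text)
    {
        const string prefix = "open ";
        string t = text.Trim();
        if (!t.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)) return null;

        string target = t.Substring(prefix.Length).Trim();
        return Map.TryGetValue(target, out string? exe) ? exe : target; // default: raw string
    }
}
```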
- Pattern: "press <keys>".
- Strip "press " and keep the rest as `KeysText`:
  - "press control shift b" → `SendKeysAction("control shift b")`
  - "press alt f4" → `SendKeysAction("alt f4")`
The KeySimulator will parse and execute these.
"open downloads"→OpenFolderAction("Downloads")."open documents"→OpenFolderAction("Documents").
If nothing matches:
- The system falls back to an AI-based interpreter (OpenAI) that attempts to parse the command and return the closest matching action type.
- Unhandled commands are still logged.
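The "rules first, AI fallback second" flow can be sketched as an ordered list of rule functions followed by a fallback delegate. This is a sketch under assumptions: `RuleChainInterpreter` is a hypothetical name, `TAction` stands in for `ActionBase`, and the fallback delegate is where the OpenAI call would go (its prompt and client code are not described in this document, so they are not shown).

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class RuleChainInterpreter<TAction> where TAction : class
{
    private readonly IReadOnlyList<Func<string, TAction?>> _rules;
    private readonly Func<string, Task<TAction?>> _fallback;
    private readonly Action<string> _logUnhandled;

    public RuleChainInterpreter(IReadOnlyList<Func<string, TAction?>> rules,
                                Func<string, Task<TAction?>> fallback,
                                Action<string> logUnhandled)
    {
        _rules = rules;
        _fallback = fallback;
        _logUnhandled = logUnhandled;
    }

    public async Task<TAction?> InterpretAsync(string text)
    {
        foreach (var rule in _rules)
        {
            TAction? action = rule(text); // deterministic rules, checked in order
            if (action is not null) return action;
        }

        _logUnhandled(text);          // commands the rules miss are still logged
        return await _fallback(text); // the AI interpreter may also return null
    }
}
```

Keeping the rules as an ordered list makes ordering constraints explicit, such as checking the folder rules before the generic "open" rule.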
Implement ExecuteActionAsync(ActionBase action) to dispatch to helper classes.
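A sketch of `ExecuteActionAsync` as a type switch over the action records. `AutomationHandlers` is a hypothetical bundle of delegates standing in for the helper classes, and the records are repeated compactly so the snippet compiles on its own. Unknown action types fail loudly rather than being silently ignored.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

public abstract record ActionBase;
public sealed record MoveWindowAction(string Target, string Monitor, string? Position, int? WidthPercent, int? HeightPercent) : ActionBase;
public sealed record LaunchAppAction(string AppIdOrPath) : ActionBase;
public sealed record SendKeysAction(string KeysText) : ActionBase;
public sealed record OpenFolderAction(string KnownFolder) : ActionBase;

// Hypothetical entry points into the window, launcher, key, and folder helpers.
public sealed record AutomationHandlers(
    Func<MoveWindowAction, Task> MoveWindow,
    Func<string, Task> LaunchApp,
    Func<string, Task> SendKeys,
    Func<string, Task> OpenFolder);

public static class ActionExecutor
{
    // One switch arm per action type.
    public static Task ExecuteActionAsync(ActionBase action, AutomationHandlers h) => action switch
    {
        MoveWindowAction m => h.MoveWindow(m),
        LaunchAppAction a => h.LaunchApp(a.AppIdOrPath),
        SendKeysAction k => h.SendKeys(k.KeysText),
        OpenFolderAction f => h.OpenFolder(f.KnownFolder),
        _ => throw new NotSupportedException($"No executor for {action.GetType().Name}"),
    };
}
```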
Responsibilities:

Window management (`MoveWindowAction`):
- Get the active window handle (`GetForegroundWindow`).
- For full screen:
  - Call `ShowWindow(hWnd, SW_MAXIMIZE)`.
- For left/right half on the current monitor:
  - Use `MonitorFromWindow` + `GetMonitorInfo` to get the working area.
  - Compute the new rectangle (half width, full height).
  - Use `SetWindowPos` to resize/move the window.
- (Future) For `Monitor == "next"`:
  - Enumerate monitors and move the window rect to the "next" one.
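The left/right-half step can be sketched with the Win32 calls listed above via P/Invoke. This assumes Windows at run time; `WindowAutomationSketch`, `HalfRect`, and `SnapActiveWindow` are hypothetical names. The geometry is kept in a pure function so it can be checked without a desktop, and the window is restored first on the assumption that a maximized window should leave the maximized state before being snapped.

```csharp
using System;
using System.Runtime.InteropServices;

public static class WindowAutomationSketch
{
    [StructLayout(LayoutKind.Sequential)]
    public struct RECT { public int Left, Top, Right, Bottom; }

    [StructLayout(LayoutKind.Sequential)]
    public struct MONITORINFO { public int cbSize; public RECT rcMonitor; public RECT rcWork; public uint dwFlags; }

    const uint MONITOR_DEFAULTTONEAREST = 2;
    const uint SWP_NOZORDER = 0x0004, SWP_NOACTIVATE = 0x0010;
    const int SW_RESTORE = 9;

    [DllImport("user32.dll")] static extern IntPtr GetForegroundWindow();
    [DllImport("user32.dll")] static extern IntPtr MonitorFromWindow(IntPtr hwnd, uint dwFlags);
    [DllImport("user32.dll")] static extern bool GetMonitorInfo(IntPtr hMonitor, ref MONITORINFO lpmi);
    [DllImport("user32.dll")] static extern bool ShowWindow(IntPtr hWnd, int nCmdShow);
    [DllImport("user32.dll")] static extern bool SetWindowPos(IntPtr hWnd, IntPtr hWndInsertAfter, int x, int y, int cx, int cy, uint uFlags);

    // Pure geometry: left or right half of a work area as (x, y, width, height).
    public static (int X, int Y, int W, int H) HalfRect(RECT work, string position)
    {
        int width = work.Right - work.Left;
        int height = work.Bottom - work.Top;
        int half = width / 2;
        return position == "right"
            ? (work.Left + half, work.Top, width - half, height) // right half absorbs an odd pixel
            : (work.Left, work.Top, half, height);
    }

    // Windows-only: snaps the foreground window to the left/right half of its monitor's work area.
    public static void SnapActiveWindow(string position)
    {
        IntPtr hWnd = GetForegroundWindow();
        var info = new MONITORINFO { cbSize = Marshal.SizeOf<MONITORINFO>() };
        if (hWnd == IntPtr.Zero || !GetMonitorInfo(MonitorFromWindow(hWnd, MONITOR_DEFAULTTONEAREST), ref info)) return;

        ShowWindow(hWnd, SW_RESTORE);
        var (x, y, w, h) = HalfRect(info.rcWork, position);
        SetWindowPos(hWnd, IntPtr.Zero, x, y, w, h, SWP_NOZORDER | SWP_NOACTIVATE);
    }
}
```

Using `rcWork` rather than `rcMonitor` keeps the window clear of the taskbar.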
App launch (`LaunchAppAction`):
- Use `ProcessStartInfo` + `Process.Start` with `UseShellExecute = true` so Windows can resolve paths and short names.
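A small sketch of the launch step. Setting `UseShellExecute` explicitly matters on .NET Core / .NET 5+, where it defaults to `false` (it defaulted to `true` on .NET Framework). `AppLauncher` and `BuildStartInfo` are hypothetical names.

```csharp
#nullable enable
using System.Diagnostics;

public static class AppLauncher
{
    // UseShellExecute = true hands the name to the Windows shell, which can resolve
    // registered short names such as "msedge.exe" as well as full paths.
    public static ProcessStartInfo BuildStartInfo(string appIdOrPath) =>
        new ProcessStartInfo(appIdOrPath) { UseShellExecute = true };

    public static void Launch(string appIdOrPath)
    {
        // Process.Start can return null (e.g. when an existing process is reused), so dispose defensively.
        using Process? process = Process.Start(BuildStartInfo(appIdOrPath));
    }
}
```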
Key sequences (`SendKeysAction`):
- Parse `KeysText` like "control shift b":
  - Map it to the key codes you already use (e.g. InputSimulator).
  - Support modifiers: `control`, `shift`, `alt`, `windows`.
  - Support letters / function keys: `b`, `f4`, etc.
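A sketch of the `KeysText` parsing step, assuming the output is Win32 virtual-key codes (letters map to their uppercase ASCII value, `VK_F1` is `0x70`, and so on). Passing the result to InputSimulator or another input API is left to the existing simulator layer. `KeysTextParser` is a hypothetical name; it also accepts `+` as a separator.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Parses KeysText such as "control shift b" or "alt+f4" into Win32 virtual-key codes.
public static class KeysTextParser
{
    private static readonly Dictionary<string, ushort> ModifierKeys = new(StringComparer.OrdinalIgnoreCase)
    {
        ["control"] = 0x11, ["ctrl"] = 0x11, // VK_CONTROL
        ["shift"] = 0x10,                     // VK_SHIFT
        ["alt"] = 0x12,                       // VK_MENU
        ["windows"] = 0x5B, ["win"] = 0x5B,   // VK_LWIN
    };

    // Returns the modifiers plus exactly one main key, or null if the text can't be parsed.
    public static (List<ushort> Modifiers, ushort Key)? Parse(string keysText)
    {
        var modifiers = new List<ushort>();
        ushort? key = null;
        foreach (string token in keysText.Split(new[] { ' ', '+' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (ModifierKeys.TryGetValue(token, out ushort vk)) { modifiers.Add(vk); continue; }
            if (key is not null) return null; // a second main key: not a single chord
            key = ToVirtualKey(token);
            if (key is null) return null;     // unknown key name
        }
        if (key is null) return null;         // modifiers only
        return (modifiers, key.Value);
    }

    private static ushort? ToVirtualKey(string token)
    {
        string t = token.ToLowerInvariant();
        if (t.Length == 1 && t[0] >= 'a' && t[0] <= 'z') return (ushort)char.ToUpperInvariant(t[0]); // 'A'..'Z' = 0x41..0x5A
        if (t.Length == 1 && t[0] >= '0' && t[0] <= '9') return (ushort)t[0];                        // '0'..'9' = 0x30..0x39
        if (t.Length >= 2 && t[0] == 'f' && int.TryParse(t.Substring(1), out int n) && n >= 1 && n <= 24)
            return (ushort)(0x70 + n - 1);                                                           // VK_F1 = 0x70
        return null;
    }
}
```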
Folders (`OpenFolderAction`):
- Map `KnownFolder` to paths:
  - "Downloads" → `<UserProfile>\Downloads`
  - "Documents" → `Environment.GetFolderPath(MyDocuments)`
- Open the folder with `Process.Start("explorer.exe", path)`.
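A sketch of the folder step. `Environment.SpecialFolder` has no Downloads value, which is why the mapping builds it from the user profile; this assumes the Downloads folder has not been relocated (`SHGetKnownFolderPath` with the Downloads known-folder ID would cover that case). The path is quoted before being passed to `explorer.exe` so folders containing spaces arrive as one argument. `FolderOpener` is a hypothetical name.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.IO;

public static class FolderOpener
{
    // Maps a KnownFolder name to a path, or null if the name is not supported.
    public static string? ResolveKnownFolder(string knownFolder) => knownFolder.ToLowerInvariant() switch
    {
        "downloads" => Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), "Downloads"),
        "documents" => Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments),
        _ => null,
    };

    public static void Open(string knownFolder)
    {
        string path = ResolveKnownFolder(knownFolder)
            ?? throw new ArgumentException($"Unknown known folder: {knownFolder}", nameof(knownFolder));
        // Quote the path so a folder with spaces is passed to explorer.exe as a single argument.
        Process.Start("explorer.exe", $"\"{path}\"");
    }
}
```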
In Talon:
- Define a command:
  - `natural <dictation>`
  - Action: run your .NET app with mode + dictation. Example pseudo-action:
    - `run("C:\\path\\to\\myassistant.exe", "natural", "{dictation}")`
- Keep the existing `sharp <dictation>` wiring unchanged.
Result:
- You say: "natural move this window to the other screen"
- Talon runs: `myassistant.exe natural move this window to the other screen`
- Program:
  - `mode = "natural"`, `text = "move this window to the other screen"`
  - `InterpretAsync` → `MoveWindowAction(...)`
  - `ExecuteActionAsync` → `WindowAutomation.MoveActiveWindow(...)`
- VisualStudioHelper for COM/EnvDTE command execution is implemented.
- Interpreter can detect if Visual Studio is the active window.
- New action type ExecuteVSCommandAction is defined.
- Map natural language like "build solution" to ExecuteVSCommandAction when VS is active.
- Update executor to call VisualStudioHelper.ExecuteCommand for ExecuteVSCommandAction.
- Add logging and error handling for VS command execution.
- Test with "build solution" and other canonical VS commands.
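One way the "is Visual Studio the active window" check could work, assuming detection by process name: get the foreground window, look up the process that owns it, and compare against `devenv` (Visual Studio's main executable is `devenv.exe`, and `Process.ProcessName` omits the extension). This is Windows-only at run time; `ActiveWindowInfo` is a hypothetical name, and the project's actual detection may differ.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class ActiveWindowInfo
{
    [DllImport("user32.dll")] private static extern IntPtr GetForegroundWindow();
    [DllImport("user32.dll")] private static extern uint GetWindowThreadProcessId(IntPtr hWnd, out uint processId);

    // Name of the process owning the foreground window (e.g. "devenv"), or null. Windows-only.
    public static string? GetForegroundProcessName()
    {
        IntPtr hWnd = GetForegroundWindow();
        if (hWnd == IntPtr.Zero) return null;

        GetWindowThreadProcessId(hWnd, out uint pid);
        try
        {
            using Process process = Process.GetProcessById((int)pid);
            return process.ProcessName; // no ".exe" suffix
        }
        catch (ArgumentException)
        {
            return null; // the process exited in the meantime
        }
    }

    public static bool IsVisualStudio(string? processName) =>
        string.Equals(processName, "devenv", StringComparison.OrdinalIgnoreCase);
}
```

Usage: `ActiveWindowInfo.IsVisualStudio(ActiveWindowInfo.GetForegroundProcessName())`. Note that VS Code runs as `Code`, so it is not mistaken for Visual Studio.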
To support the vast number of Visual Studio commands without hardcoding them, we implemented a dynamic lookup system.
A new CLI command was added to export all available Visual Studio commands and their keyboard shortcuts to a JSON file.
Command:

```
dotnet run --project NaturalCommands.csproj -- export-vs-commands
```

Output: generates `vs_commands.json` in the current directory. This file contains an array of command objects:

```json
[
  {
    "Name": "File.NewProject",
    "Bindings": ["Global::Ctrl+Shift+N"]
  },
  ...
]
```

- Created `Helpers/VisualStudioCommandLoader.cs` to load `vs_commands.json`.
- Implemented a fuzzy matching algorithm (`FindCommand`) to map natural language input (e.g., "build solution") to canonical command names (e.g., "Build.BuildSolution").
- `NaturalLanguageInterpreter.cs` was updated to use the command loader when Visual Studio is the active window.
  - It looks for `vs_commands.json` in the execution directory or the project root.
  - If a match is found, it returns an `ExecuteVSCommandAction`, which is then executed via DTE.
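The document does not spell out the `FindCommand` algorithm, so the following is only one possible fuzzy match: split both the spoken text and each command name into lowercase words (breaking on dots and camel case), score each command by the fraction of spoken words its name contains, and prefer shorter names on ties. `VsCommand`, `VsCommandMatcher`, and `minScore` are assumptions; the JSON shape matches the sample above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.Json;

// One entry of vs_commands.json, matching the sample shape above.
public sealed record VsCommand(string Name, string[] Bindings);

public static class VsCommandMatcher
{
    public static List<VsCommand> Load(string json) =>
        JsonSerializer.Deserialize<List<VsCommand>>(json) ?? new List<VsCommand>();

    // "Build.BuildSolution" and "build solution" both become {"build", "solution"}.
    private static HashSet<string> Tokens(string s)
    {
        var words = new HashSet<string>();
        var current = new StringBuilder();
        foreach (char c in s)
        {
            bool startsNewWord = !char.IsLetterOrDigit(c) || char.IsUpper(c);
            if (startsNewWord && current.Length > 0)
            {
                words.Add(current.ToString().ToLowerInvariant());
                current.Clear();
            }
            if (char.IsLetterOrDigit(c)) current.Append(c);
        }
        if (current.Length > 0) words.Add(current.ToString().ToLowerInvariant());
        return words;
    }

    // Scores commands by the fraction of spoken words found in the name; ties go to the shorter name.
    public static VsCommand? FindCommand(IEnumerable<VsCommand> commands, string spoken, double minScore = 1.0)
    {
        HashSet<string> query = Tokens(spoken);
        if (query.Count == 0) return null;

        return commands
            .Select(c =>
            {
                HashSet<string> nameWords = Tokens(c.Name);
                double score = (double)query.Count(w => nameWords.Contains(w)) / query.Count;
                return (Command: c, Score: score);
            })
            .Where(x => x.Score >= minScore)
            .OrderByDescending(x => x.Score)
            .ThenBy(x => x.Command.Name.Length)
            .Select(x => x.Command)
            .FirstOrDefault();
    }
}
```

With the default `minScore` of 1.0, every spoken word must appear in the command name, so "build solution" picks `Build.BuildSolution` over the longer `Build.RebuildSolution`.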
- `SendKeysAction`: fixed parsing to support `+` as a separator (e.g., "control+d").
- Path resolution: updated the loader to search for `vs_commands.json` in parent directories during development.
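The parent-directory search can be sketched as an upward walk from a start directory, such as `AppContext.BaseDirectory` (typically somewhere under `bin/` during development), until the file is found or the filesystem root is reached. `CommandFileLocator` and `FindUpwards` are hypothetical names.

```csharp
#nullable enable
using System.IO;

public static class CommandFileLocator
{
    // Walks up from startDir until fileName exists, returning its full path, or null at the root.
    public static string? FindUpwards(string startDir, string fileName = "vs_commands.json")
    {
        for (DirectoryInfo? dir = new DirectoryInfo(startDir); dir is not null; dir = dir.Parent)
        {
            string candidate = Path.Combine(dir.FullName, fileName);
            if (File.Exists(candidate)) return candidate;
        }
        return null;
    }
}
```

Usage: `CommandFileLocator.FindUpwards(AppContext.BaseDirectory)` finds a `vs_commands.json` sitting in the project root while the app runs from its build output folder.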
- Phase 2 – Improved AI interpretation
  - The fallback LLM (OpenAI) interpreter is already implemented and active.
  - Future improvements may focus on prompt/tool calling, more robust schema validation, and safety rules (e.g., no destructive actions without explicit confirmation).
- Phase 3 – Talon callback (if needed)
  - For code editing / Cursorless-type actions, add a `RunTalonVoiceCommandAction`:
    - Contains a phrase like "sharp select camel foo".
    - Sent to Talon via a small IPC bridge that calls `simulate()` with that phrase.
For now, the focus is on making direct .NET automation for windows/apps/keys/folders feel smooth and reliable, with both rule-based and AI-based interpretation available.