
OllamaVibeDesk

Modern offline-first PyQt6 desktop GUI for a local Ollama instance, with an automatic answer function, different speakers, streaming responses, and optional local TTS playback/export through Windows SAPI or a VibeVoice-compatible built-in server.

Main features

  • real Ollama integration via the local API
  • streamed assistant output
  • model list refresh from the local Ollama API
  • persistent chat sessions stored inside the app folder
  • theme switching
  • cancel/stop generation
  • per-message "Vorlesen" (read aloud) action for assistant replies
  • optional WAV export via a local TTS endpoint
  • Windows installer and launcher batch files
  • no cloud requirement for normal chat usage once local services are present
  • multiple language support (currently German, English, French, Spanish, Russian, and more)
  • different voice output speakers for human and AI
  • auto-answer function that acts as the human, based on the ELIZA algorithm, to automatically continue a chat dialog
  • automatic code generation based on the dialogs in auto-answer mode (Python, C#, PHP, HTML, whatever), stored in the subfolder ..\app_data\generated_code
  • automatic switching to "thinking" mode for advanced questions such as code generation, jokes, or abstract questions
  • short- and long-term "brain" function available to ANY chat as a knowledge source (based on a locally working wiki, or something similar if unavailable)
  • RAG function to add/remove/change/view/modify external knowledge bases
  • function to add media files and data files to be analysed by a selected LLM
  • buttons in Settings for the "Brain", plus buttons in the chat windows to view generated output code and the current "Brain Memory"
[Screenshot: main UI (ovv_v1_16b)]

Example with SAPI output (German language):

[Screenshot: ui1_1_16]

Voice output example (VibeVoice mode):

[Video: followup_answer_DE.mp4]

Note: VibeVoice mode takes noticeably longer to generate voice output than the default Windows SAPI.

Basic usage instructions / Notes

Make sure you have installed the requirements: Ollama, Python 3.13 (3.12.9 also works), pip if it is not already included, and Git.

Start Ollama and close/minimize its GUI. Make sure you have at least one model to use; if you don't have any, get a model via the Ollama GUI (suggestions: Dolphin3 or Qwen3.5:9b).

Extract the archive (or clone the repository) to a new folder, e.g. c:\AI\OllamaVibeDesk, and start install_windows.bat; it installs all the required files for the base setup.

If nothing goes wrong, the GUI starts automatically after installation; otherwise, start it with run_windows.bat.

If you are from Germany and want native German language, you don't have to change anything for a first test: just go to the prompt field, enter something like "erstell ein 3d Maze erkundungs-Spiel das man mit WSAD und der maus steuern kann aus der ego perspektive" (create a first-person 3D maze exploration game controlled with WASD and the mouse), press Send (or Ctrl/Strg + Enter), and listen to the endless automatic dialog.

If you are from a different country or want a different language, go to Settings (Einstellungen), select Language (Sprache), and save the config afterwards. Back in the main window, enter something in the prompt box like "How old is the Towerwatch in London?"

However, that is just the base version (about 240 MB in total). If you want to use MS VibeVoice, change the speaker mode in Settings from TTS/SAPI to VibeVoice.

The app includes a locally working VibeVoice server and can fetch all its requirements, including optional voice models for different languages (male/female), through a simple one-click built-in installer (VibeVoice setup), so it finally works offline.

Note: Make sure to start the VibeVoice Server after the installation.

The full installation including VibeVoice takes about 3 GB in total (just wait for the setup to complete the installation procedure). Note that with VibeVoice it can take MUCH longer to generate a single answer/question in VibeVoice quality, because the full answer is first sent to the model to render a wave file of it, which produces more human-like contextual speech. (I have tested this on different systems; it even works on a 3rd-gen Intel i5, yes, I really mean an Intel CPU from 2012, paired with an NVIDIA RTX 4xxx.) You can disable the network connection now if you like and have fun listening to it.

Note: Even while auto-answer mode is active, you can send a prompt of your own to the current dialog; it will be used as the next prompt after the current one, and the ELIZA/random-answer mode continues afterwards. If a dialog gets too long, the app automatically switches to a new chat but carries some exchanges over from the previous chat to keep the context flowing (there are many options in the settings to modify this behavior).
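To illustrate the ELIZA-style auto-answer idea mentioned above, here is a toy sketch of pattern-based reply rules. The rule set and function names are purely illustrative assumptions; the app's actual rules are not shown in this README.

```python
# Toy ELIZA-style responder: match a pattern in the last message and
# answer with a templated follow-up to keep the dialog going.
import random
import re

# Hypothetical rule table: (regex, list of reply templates).
RULES = [
    (r"\bI need (.+)", ["Why do you need {0}?", "Would {0} really help you?"]),
    (r"\bbecause\b", ["Is that the real reason?"]),
]

def eliza_reply(text: str) -> str:
    """Return an auto-answer for the given message, or a generic prompt."""
    for pattern, answers in RULES:
        m = re.search(pattern, text, re.IGNORECASE)
        if m:
            # Substitute captured groups into the chosen template.
            return random.choice(answers).format(*m.groups())
    return "Tell me more."
```

In the real app such a responder feeds the generated reply back into the chat as the "human" turn, so the LLM and the rule engine keep the conversation running on their own.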

By the way, the idea and description for the new project https://github.com/zeittresor/dark_matrix (a Firefox Matrix-style extension working locally) were 99% created from the output of a single whole night of auto-answering while I was sleeping. :-)

Requirements

  • Windows 10/11
  • Python 3.10+ for the GUI
  • A running local Ollama instance
  • One or more Ollama models already pulled locally

Folder layout

  • install_windows.bat – creates a local virtual environment and installs Python packages for the GUI
  • run_windows.bat – starts the GUI from the local venv
  • requirements.txt – Python requirements for the GUI
  • app/ – application source
  • app_data/ – created on first run; contains config, chats, audio, cache, optional TTS helper data

Install

Double-click:

install_windows.bat

Then start with:

run_windows.bat

Ollama setup

Default base URL:

http://127.0.0.1:11434

The app reads models from Ollama and lets you pick one from the dropdown.
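As a sketch of how such a model refresh can work (assuming the standard Ollama /api/tags endpoint; the app's internal code may differ):

```python
# Query a local Ollama instance for its pulled models via /api/tags.
import json
import urllib.request

def extract_model_names(tags_payload: dict) -> list[str]:
    """Pull the model names out of an /api/tags JSON response."""
    return [m["name"] for m in tags_payload.get("models", [])]

def list_ollama_models(base_url: str = "http://127.0.0.1:11434") -> list[str]:
    """Fetch and parse the model list from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return extract_model_names(json.load(resp))
```

The returned names are what a GUI would show in its model dropdown.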

TTS / "Vorlesen" (read aloud)

The GUI supports these modes:

  • Microsoft SAPI-compatible
  • Microsoft VibeVoice OpenAI-compatible
  • Disabled

If you enable vibevoice_openai, the GUI talks to a local endpoint like:

http://127.0.0.1:8880/v1

Generated audio is stored in:

app_data/audio/
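A request against such a wrapper could look roughly like the sketch below. The endpoint path and payload fields follow the common OpenAI audio/speech convention and are assumptions here, not the wrapper's documented API; the model and voice names are placeholders.

```python
# Sketch: ask a local OpenAI-compatible TTS endpoint for a WAV file.
import json
import urllib.request
from pathlib import Path

def build_speech_request(text: str, voice: str = "default") -> bytes:
    """Assemble an OpenAI-style audio/speech JSON body (assumed schema)."""
    return json.dumps({
        "model": "vibevoice",        # placeholder model name
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }).encode("utf-8")

def synthesize(text: str, out_dir: str = "app_data/audio",
               base_url: str = "http://127.0.0.1:8880/v1") -> Path:
    """POST the text and store the returned WAV next to the app."""
    req = urllib.request.Request(
        f"{base_url}/audio/speech",
        data=build_speech_request(text),
        headers={"Content-Type": "application/json"})
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / "reply.wav"
    with urllib.request.urlopen(req, timeout=300) as resp:
        target.write_bytes(resp.read())
    return target
```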

New: TTS setup assistant

The setup assistant is now opened from Einstellungen (Settings) so the main header stays cleaner.

The assistant can:

  • run one combined automatic setup flow (status check, FFmpeg check, wrapper download, setup where possible)
  • download the VibeVoice OpenAI-compatible wrapper into app_data/tts/vibevoice_openai/
  • create a dedicated wrapper venv when the Python version is suitable
  • install wrapper requirements
  • start and stop the wrapper server
  • open the TTS folder and its log file

Important behavior

  • The GUI itself still does not embed VibeVoice directly.
  • It manages a separate local wrapper server for cleaner isolation.
  • On the first actual server start, the wrapper may download model files and voice presets into the local models folder. This can take quite a while.
  • The setup assistant writes server output to:
    • app_data/tts/vibevoice_openai/server.log

Expected local wrapper paths

  • wrapper repo: app_data/tts/vibevoice_openai/repo/
  • wrapper models: app_data/tts/vibevoice_openai/models/
  • wrapper log: app_data/tts/vibevoice_openai/server.log

Data placement

To respect low-space system drives and portable usage, the app stores its own data next to the application:

  • config: app_data/config.json
  • chat history: app_data/chats/
  • audio files: app_data/audio/
  • cache env vars: HF_HOME, TRANSFORMERS_CACHE point into app_data/cache/ when launched through run_windows.bat
  • TTS helper files: app_data/tts/
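The cache redirection that run_windows.bat performs can be sketched in Python as follows (an assumption about what the launcher does; the exact batch commands are not shown in this README):

```python
# Point the Hugging Face caches into app_data/cache/ next to the app,
# so model downloads never land on a low-space system drive.
import os
from pathlib import Path

def redirect_hf_cache(app_root: str) -> dict[str, str]:
    """Create app_data/cache and export the cache env vars to it."""
    cache = Path(app_root) / "app_data" / "cache"
    cache.mkdir(parents=True, exist_ok=True)
    env = {"HF_HOME": str(cache), "TRANSFORMERS_CACHE": str(cache)}
    os.environ.update(env)   # must happen before the libraries are imported
    return env
```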

Practical notes for VibeVoice on Windows

The currently targeted wrapper expects a fairly specific environment. In practice that means:

  • Python 3.13 is recommended/expected for the wrapper itself
  • ffmpeg should be available
  • an NVIDIA GPU / CUDA-capable setup is the intended fast path
  • first-run model downloads can be large

The assistant tries to make that visible and less confusing, but depending on the machine, manual adjustments may still be necessary.

For a zero-download local fallback, switch the TTS backend in Einstellungen (Settings) to windows_sapi.

Stability notes in this build

  • Ollama chat now forces think: false by default for a simpler chat flow with thinking-capable models.
  • If the streaming endpoint returns no visible text, the GUI performs one non-stream fallback request instead of leaving an empty bubble.
  • Windows SAPI helper calls now decode PowerShell output more defensively on Windows systems with non-UTF-8 console output.
  • After install_windows.bat, the GUI auto-starts after 10 seconds unless you cancel it.
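The "stream, then one non-stream fallback" behavior can be sketched against the Ollama /api/chat endpoint like this (payload shape per the public Ollama API; the app's internal implementation may differ):

```python
# Stream a chat response; if no visible text arrives, retry once non-streamed.
import json
import urllib.request

def build_chat_payload(model: str, messages: list[dict], stream: bool) -> dict:
    # think: False suppresses visible chain-of-thought on thinking-capable models.
    return {"model": model, "messages": messages,
            "stream": stream, "think": False}

def chat_with_fallback(model: str, messages: list[dict],
                       base: str = "http://127.0.0.1:11434") -> str:
    def post(stream: bool):
        req = urllib.request.Request(
            f"{base}/api/chat",
            data=json.dumps(build_chat_payload(model, messages, stream)).encode(),
            headers={"Content-Type": "application/json"})
        return urllib.request.urlopen(req, timeout=120)

    text = ""
    with post(stream=True) as resp:
        for line in resp:                       # NDJSON chunks, one per line
            chunk = json.loads(line)
            text += chunk.get("message", {}).get("content", "")
    if not text.strip():                        # empty bubble -> single retry
        with post(stream=False) as resp:
            text = json.load(resp).get("message", {}).get("content", "")
    return text
```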

Windows SAPI pronunciation optimization

When the windows_sapi backend is active in the settings, the checkbox Windows-SAPI Aussprache-Optimierung verwenden ("use Windows SAPI pronunciation optimization") can optionally be enabled.

Before reading aloud, the local JSON lexicon at app_data/tts/sapi_lexicon.json is then applied to the already cleaned-up TTS text. This lets you deliberately change the pronunciation of problematic words, abbreviations, or product names.

Example entry:

{
  "type": "word",
  "from": "Ollama",
  "to": "Olama",
  "case_sensitive": false
}

Both word (whole words) and phrase (whole word sequences) are supported.
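Applying such lexicon entries to TTS text can be sketched as follows (a hypothetical helper, not the app's actual code; it assumes the entry fields shown in the example above):

```python
# Apply sapi_lexicon.json-style entries to text before it is spoken.
import re

def apply_lexicon(text: str, entries: list[dict]) -> str:
    """Replace words/phrases per lexicon entries with type/from/to fields."""
    for e in entries:
        flags = 0 if e.get("case_sensitive") else re.IGNORECASE
        if e.get("type") == "word":
            # Whole-word match only, so "Ollama" does not hit "Ollamas".
            pattern = r"\b" + re.escape(e["from"]) + r"\b"
        else:  # "phrase": match the whole word sequence verbatim
            pattern = re.escape(e["from"])
        text = re.sub(pattern, e["to"], text, flags=flags)
    return text
```

With the example entry from above, "Ollama is ready" would be spoken as "Olama is ready".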

Source

https://github.com/zeittresor/OllamaVibeDesk/