
🦜 VieNeu-TTS


VieNeu-TTS is an advanced on-device Vietnamese Text-to-Speech (TTS) model with instant voice cloning and English-Vietnamese bilingual support.

Important

🚀 VieNeu-TTS-v2 Turbo: optimized for edge devices and very fast inference on CPUs and low-end hardware.
Note: quality is lower than the Standard VieNeu-TTS, and the model may struggle with very short segments (< 5 words).
The VieNeu-TTS-v2 (Non-Turbo) release is coming soon!

✨ Key Features

  • Bilingual (English-Vietnamese): Smooth and natural transitions between languages powered by sea-g2p.
  • Instant Voice Cloning: Clone any voice with just 3-5 seconds of reference audio (Turbo v2 & GPU modes).
  • Ultra-Fast Turbo Mode: Optimized for both CPU (GGUF) and GPU (LMDeploy), offering the fastest inference in the VieNeu family.
  • AI Identification: Built-in audio watermarking for responsible AI content creation.
  • Production-Ready: High-quality 24 kHz waveform generation, fully offline.
Demo-VieNeu-TTS.mp4

📌 Table of Contents

  1. 🦜 Installation & Web UI
  2. 📦 Using the Python SDK
  3. 🐳 High-Quality Server (Standard Mode)
  4. 🔬 Model Overview
  5. 🚀 Roadmap
  6. 🤝 Support & Contact
  7. 📑 Citation

🦜 1. Installation & Web UI

Setup with uv (Recommended)

uv is a fast Python package manager and the recommended way to set up this project's dependencies.

# Windows:
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Linux/macOS:
curl -LsSf https://astral.sh/uv/install.sh | sh
  1. Clone the Repo:

    git clone https://github.com/pnnbao97/VieNeu-TTS.git
    cd VieNeu-TTS
  2. Install Dependencies:

    • Option 1: Minimal (Turbo/CPU) - Fast & Lightweight

      ⚠️ Note: This mode only supports VieNeu-TTS-v2-Turbo (CPU) — runs on any machine without a GPU, but audio quality is lower than Standard VieNeu-TTS (especially for short phrases < 5 words). Recommended for quick testing or deployment on low-end devices.

      uv sync
    • Option 2: Full (GPU/Standard) - High Quality & Cloning (For GPU users)

      💡 Note: Requires a CUDA-compatible NVIDIA GPU (CUDA version >= 12.8) or Apple Silicon MPS. NVIDIA Toolkit is required for maximum speed. Enables the full Standard VieNeu-TTS backbone for maximum audio quality and high-fidelity voice cloning.

      uv sync --group gpu
  3. Start the Web UI:

    uv run vieneu-web

    Access the UI at http://127.0.0.1:7860. The Turbo v2 model is selected by default for immediate use.


📦 2. Using the Python SDK (vieneu)

When used locally, the vieneu SDK defaults to Turbo mode to prioritize speed and real-time performance. For maximum audio quality (Standard VieNeu-TTS), set up a remote server and use the SDK in remote mode.

Quick Start

# Minimal installation (Builds llama-cpp from source - may take a while)
pip install vieneu

# Optional: For Windows users (CPU pre-built)
pip install vieneu --extra-index-url https://pnnbao97.github.io/llama-cpp-python-v0.3.16/cpu/

# Optional: For macOS users (ARM64/Apple Silicon - Enables Metal GPU acceleration)
pip install vieneu --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal/

Once installed, basic usage looks like this:

from vieneu import Vieneu

# Initialize in Turbo mode (Default - Minimal dependencies)
tts = Vieneu()

# 1. Simple synthesis (uses default Southern Male voice 'Xuân Vĩnh')
text = "Hệ thống điện chủ yếu sử dụng alternating current because it is more efficient."
audio = tts.infer(text=text)

# Save to file
tts.save(audio, "output_Xuân Vĩnh.wav")
print("💾 Saved to output_Xuân Vĩnh.wav")

# 2. Using a specific Preset Voice
voices = tts.list_preset_voices()
for desc, voice_id in voices:
    print(f"Voice: {desc} (ID: {voice_id})")

my_voice_id = voices[1][1] if len(voices) > 1 else voices[0][1]  # Phạm Tuyên voice
voice_data = tts.get_preset_voice(my_voice_id)

audio_custom = tts.infer(text="Tôi đang nói bằng giọng của Bác sĩ Tuyên.", voice=voice_data)

# 3. Save to file
tts.save(audio_custom, "output_Phạm Tuyên.wav")
print("💾 Saved to output_Phạm Tuyên.wav")

🦜 Zero-shot Voice Cloning (SDK)

Clone any voice with only 3-5 seconds of audio using the local Turbo engine:

from vieneu import Vieneu

tts = Vieneu() # Defaults to Turbo mode

# 1. Encode the reference audio
# Supported formats: .wav, .mp3, .flac (5-10 seconds recommended)
my_voice = tts.encode_reference("examples/audio_ref/example.wav")

# 2. Synthesize with the cloned voice
# No reference text required for Turbo v2!
audio = tts.infer(
    text="Đây là giọng nói được clone trực tiếp bằng SDK của VieNeu-TTS.", 
    voice=my_voice  # accepts numpy array from encode_reference() or preset dict from get_preset_voice()
)

tts.save(audio, "cloned_voice.wav")
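Since `encode_reference()` returns a plain numpy array, the encoded voice can be cached to disk and reused across runs instead of re-encoding the reference audio every time. The helper below is a minimal sketch of that pattern; the `load_or_encode` name and the `.npy` caching scheme are our own illustration, not part of the vieneu SDK:

```python
import os
import numpy as np

def load_or_encode(tts, audio_path, cache_path):
    """Reuse a previously encoded reference voice if one is cached on disk.

    `tts` is any object exposing encode_reference() (e.g. a Vieneu instance).
    The caching scheme itself is illustrative, not part of the vieneu SDK.
    """
    if os.path.exists(cache_path):
        return np.load(cache_path)            # reuse the cached embedding
    voice = tts.encode_reference(audio_path)  # encode once (the slow step)
    np.save(cache_path, voice)                # persist for future runs
    return voice
```

Calling `tts.infer(text=..., voice=load_or_encode(tts, "ref.wav", "ref_voice.npy"))` then behaves like the example above, minus the repeated encoding.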

🐳 3. High-Quality Server (Standard Mode)

Deploy VieNeu-TTS as a high-performance API Server (powered by LMDeploy) with a single command.

1. Run with Docker (Recommended)

Requirement: NVIDIA Container Toolkit is required for GPU support.

Start the Server with a Public Tunnel (No port forwarding needed):

docker run --gpus all -p 23333:23333 pnnbao/vieneu-tts:serve --tunnel
  • Default: The server loads the VieNeu-TTS model for maximum quality.
  • Tunneling: The Docker image includes a built-in bore tunnel. Check the container logs to find your public address (e.g., bore.pub:31631).

2. Using the SDK (Remote Mode)

Once the server is running, you can connect from anywhere (Colab, Web Apps, etc.) without loading heavy models locally:

from vieneu import Vieneu
import os

# Configuration
REMOTE_API_BASE = 'http://your-server-ip:23333/v1'  # Or bore tunnel URL
REMOTE_MODEL_ID = "pnnbao-ump/VieNeu-TTS"

# Initialization (LIGHTWEIGHT - only loads small codec locally)
tts = Vieneu(mode='remote', api_base=REMOTE_API_BASE, model_name=REMOTE_MODEL_ID)
os.makedirs("outputs", exist_ok=True)

# List remote voices
available_voices = tts.list_preset_voices()
for desc, name in available_voices:
    print(f"   - {desc} (ID: {name})")

# Use specific voice (dynamically select second voice)
if len(available_voices) > 1:
    _, my_voice_id = available_voices[1]
    voice_data = tts.get_preset_voice(my_voice_id)
    audio_spec = tts.infer(text="Chào bạn, tôi đang nói bằng giọng của bác sĩ Tuyên.", voice=voice_data)
    tts.save(audio_spec, f"outputs/remote_{my_voice_id}.wav")
    print(f"💾 Saved synthesis to: outputs/remote_{my_voice_id}.wav")

# Standard synthesis (uses default voice)
text_input = "Chế độ remote giúp tích hợp VieNeu vào ứng dụng Web hoặc App cực nhanh mà không cần GPU tại máy khách."
audio = tts.infer(text=text_input)
tts.save(audio, "outputs/remote_output.wav")
print("💾 Saved remote synthesis to: outputs/remote_output.wav")

# Zero-shot voice cloning (encodes audio locally, sends codes to server)
if os.path.exists("examples/audio_ref/example_ngoc_huyen.wav"):
    cloned_audio = tts.infer(
        text="Đây là giọng nói được clone và xử lý thông qua VieNeu Server.",
        ref_audio="examples/audio_ref/example_ngoc_huyen.wav",
        ref_text="Tác phẩm dự thi bảo đảm tính khoa học, tính đảng, tính chiến đấu, tính định hướng."
    )
    tts.save(cloned_audio, "outputs/remote_cloned_output.wav")
    print("💾 Saved remote cloned voice to: outputs/remote_cloned_output.wav")

For full implementation details, see: examples/main_remote.py

Voice Preset Specification (v1.0)

VieNeu-TTS uses the official vieneu.voice.presets specification to define reusable voice assets. Only voices.json files following this spec are guaranteed to be compatible with the VieNeu-TTS SDK (v1.x and later).
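The exact schema of the vieneu.voice.presets spec is defined by the project; the snippet below only illustrates the general pattern of validating a voices.json payload before loading it. The field names (`id`, `description`) and the `spec`/`version` keys are assumptions for illustration, not the official schema:

```python
import json

# Hypothetical preset file; the real field names are defined by the
# official vieneu.voice.presets specification, not by this sketch.
EXAMPLE_VOICES_JSON = """
{
  "spec": "vieneu.voice.presets",
  "version": "1.0",
  "voices": [
    {"id": "xuan-vinh", "description": "Southern Male"}
  ]
}
"""

def check_preset_file(raw: str) -> list:
    """Parse a voices.json payload and return its voice entries,
    failing loudly if the declared spec/version is unexpected."""
    data = json.loads(raw)
    if data.get("spec") != "vieneu.voice.presets":
        raise ValueError(f"unknown spec: {data.get('spec')!r}")
    if data.get("version") != "1.0":
        raise ValueError(f"unsupported version: {data.get('version')!r}")
    return data["voices"]

voices = check_preset_file(EXAMPLE_VOICES_JSON)
```

Rejecting files with an unknown spec or version up front gives a clear error instead of a confusing failure deep inside synthesis.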

3. Advanced Configuration

Customize the server to run specific versions or your own fine-tuned models.

Run the 0.3B Model (Faster):

docker run --gpus all pnnbao/vieneu-tts:serve --model pnnbao-ump/VieNeu-TTS-0.3B --tunnel

Serve a Local Fine-tuned Model: If you have merged a LoRA adapter, mount your output directory to the container:

# Linux / macOS
docker run --gpus all \
  -v $(pwd)/finetune/output:/workspace/models \
  pnnbao/vieneu-tts:serve \
  --model /workspace/models/merged_model --tunnel

🔬 4. Model Overview

| Model | Format | Device | Bilingual | Cloning | Speed |
|---|---|---|---|---|---|
| VieNeu-v2-Turbo | GGUF/ONNX | CPU/Edge | ✅ | Yes | Extreme (Fastest) |
| VieNeu-TTS-v2 | PyTorch | GPU | ✅ | Yes | Standard (Coming soon) |
| VieNeu-TTS 0.3B | PyTorch | GPU/CPU | ✅ | Yes | Very Fast |
| VieNeu-TTS | PyTorch | GPU/CPU | ✅ | Yes | Standard |

Tip

Use Turbo v2 for AI assistants, chatbots, and real-time edge applications where speed is critical. Note: It may have stability issues with very short phrases (< 5 words). Use GPU/Standard (VieNeu-TTS v1/v2) for maximum audio quality and high-fidelity voice cloning.
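The rule of thumb above can be written down as a tiny selection helper. The function name and its mapping are our own illustration, using only the model names from the overview table:

```python
def pick_model(need_max_quality: bool, has_gpu: bool) -> str:
    """Illustrative model picker based on the overview table above."""
    if not has_gpu:
        return "VieNeu-v2-Turbo"   # GGUF/ONNX, runs on CPU/edge, fastest
    if need_max_quality:
        return "VieNeu-TTS"        # Standard quality, best cloning fidelity
    return "VieNeu-TTS 0.3B"       # Smaller backbone, very fast on GPU
```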


🚀 5. Roadmap

  • VieNeu-TTS-v2 Turbo: English-Vietnamese code-switching support.
  • VieNeu-Codec: Optimized neural codec for Vietnamese (ONNX).
  • VieNeu-TTS-v2 (Non-Turbo): Full high-fidelity bilingual architecture with instant Voice Cloning and LMDeploy GPU acceleration support.
  • Turbo Voice Cloning: Bringing instant cloning to the lightweight Turbo engine.
  • Mobile SDK: Official support for Android/iOS deployment.

🤝 6. Support & Contact


📑 7. Citation

@misc{vieneutts2026,
  title        = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
  author       = {Pham Nguyen Ngoc Bao},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}



🤝 Contributors

Thanks to all the amazing people who have contributed to this project!


🙏 Acknowledgements

This project uses neucodec for audio decoding and sea-g2p for text normalization and phonemization.

Made with ❤️ for the Vietnamese TTS community
