ChatTTS converts written text into natural-sounding spoken audio, with support for English and Chinese. You run it locally through a web interface, so no coding is required.
Clone the ChatTTS repository
This downloads all the project files to your computer. You'll run every command from inside the new folder it creates.
'git' is not recognized
Install Git from git-scm.com, then reopen your terminal and try again.
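If the clone fails, it helps to confirm that Git is visible from the same terminal you'll use for the rest of the setup. The sketch below uses only the Python standard library and assumes Python is already installed, which the later steps need anyway. The helper name `git_version` is hypothetical and not part of ChatTTS. The upstream repository is assumed to be https://github.com/2noise/ChatTTS.

```python
import shutil
import subprocess
from typing import Optional


def git_version() -> Optional[str]:
    """Return the output of `git --version`, or None if Git is not on PATH."""
    # Search the directories listed in PATH for a `git` executable.
    git = shutil.which("git")
    if git is None:
        return None
    result = subprocess.run(
        [git, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = git_version()
    if version is None:
        print("Git not found: install it from git-scm.com, then reopen your terminal.")
    else:
        print(f"Found {version}")
```

A `None` result corresponds to the "'git' is not recognized" error: the terminal cannot find Git on its PATH.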
Create a virtual environment
A virtual environment keeps ChatTTS's dependencies separate from everything else on your machine, which prevents version conflicts with other Python projects.
conda command not found
If you don't have conda, use plain Python instead: run python -m venv chattts_env, then activate it with source chattts_env/bin/activate (macOS/Linux) or chattts_env\Scripts\activate (Windows).
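To confirm that an environment is actually active before installing anything, a small standard-library check like the one below can help. It is a sketch that relies on two assumptions: a `venv` makes `sys.prefix` differ from `sys.base_prefix`, and `conda activate` sets the `CONDA_DEFAULT_ENV` variable. The function name `active_environment` is hypothetical.

```python
import os
import sys
from typing import Optional


def active_environment() -> Optional[str]:
    """Describe the active venv or conda env, or return None if neither is active."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the interpreter it was created from.
    if sys.prefix != sys.base_prefix:
        return f"venv: {sys.prefix}"
    # `conda activate <name>` exports CONDA_DEFAULT_ENV=<name>.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda: {conda_env}"
    return None


if __name__ == "__main__":
    env = active_environment()
    if env is None:
        print("No environment is active: activate one before running pip install.")
    elif env == "conda: base":
        print("Only conda's base environment is active; activate your ChatTTS environment.")
    else:
        print(f"Active environment -> {env}")
```

Run it with the same `python` you will use for the install. Conda's auto-activated `base` environment is reported separately because ChatTTS belongs in its own environment.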
Install the dependencies
This installs PyTorch, torchaudio, and every other library ChatTTS needs. The download can be large, so give it time.
ERROR: Could not find a version that satisfies the requirement
Make sure you are inside the ChatTTS folder and your environment is activated. Run 'pip install --upgrade pip' first, then retry.
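That error usually comes down to running pip from the wrong folder or in the wrong environment. The rough check below is a sketch built on a heuristic assumption: the root of a ChatTTS checkout contains both `requirements.txt` and a `ChatTTS` package folder. After installation, it also reports whether `torch` and `torchaudio` can be found. Both helper names are hypothetical and not part of ChatTTS.

```python
import importlib.util
from pathlib import Path
from typing import Iterable, List


def looks_like_chattts_checkout(folder: str = ".") -> bool:
    """Heuristic: the repo root is assumed to hold requirements.txt and a ChatTTS/ folder."""
    root = Path(folder)
    return (root / "requirements.txt").is_file() and (root / "ChatTTS").is_dir()


def missing_packages(names: Iterable[str] = ("torch", "torchaudio")) -> List[str]:
    """Return the top-level packages that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not looks_like_chattts_checkout():
        print("This doesn't look like the ChatTTS folder; cd into it first.")
    missing = missing_packages()
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("torch and torchaudio are importable.")
```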
Launch the web UI
This starts a local web app that you open in your browser. The very first launch downloads the speech model from Hugging Face (about 1-2 GB). Subsequent launches are fast.
Port 7860 is already in use
Another app is using that port. Stop it, or add '--server-port 7861' to the command.
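Two things commonly trip up the first launch: not enough free disk space for the 1-2 GB model download, and another program already listening on port 7860. The standard-library sketch below checks both. It assumes the model is stored on the same drive as the current folder, although the actual cache location may differ. It also assumes 7860 is the web UI's port, as stated in this step. The helper names are hypothetical.

```python
import shutil
import socket


def has_free_space(path: str = ".", needed_gb: float = 2.0) -> bool:
    """True if the drive holding `path` has at least `needed_gb` gigabytes free."""
    return shutil.disk_usage(path).free >= needed_gb * 1_000_000_000


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 7860, attempts: int = 20) -> int:
    """Return the first port from `start` upward that nothing is listening on."""
    stop = min(start + attempts, 65536)  # TCP ports end at 65535
    for port in range(start, stop):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"No free port found in {start}-{stop - 1}")


if __name__ == "__main__":
    if not has_free_space():
        print("Less than 2 GB free here; the first launch may fail to download the model.")
    if port_in_use(7860):
        print(f"Port 7860 is busy; try port {first_free_port()} instead.")
    else:
        print("Port 7860 is free.")
```

After the web UI has started, `port_in_use(7860)` should return `True`. This gives you a quick way to confirm that the UI came up before you open the browser.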
Generate speech
Open http://127.0.0.1:7860 in your browser. Type or paste English or Chinese text into the input box, then click the Generate button. The model synthesizes the speech and plays it back. Generation time depends on the length of the text and on whether you have a GPU.
Generation is very slow (several minutes)
Without a GPU, inference runs on the CPU, which is slow. Keep your text short (1-2 sentences) for faster results. A CUDA-capable NVIDIA GPU speeds up generation significantly.
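Following that tip, you can break longer passages into short pieces and generate them one at a time. The sketch below is a hypothetical helper and is not part of ChatTTS or its web UI. It splits text on English (`.`, `!`, `?`) and Chinese (`。`, `！`, `？`) sentence-ending punctuation, then groups the sentences into chunks of one or two. It is only a heuristic: for example, it also splits at the decimal point in a number like 3.5.

```python
import re
from typing import List

# One sentence = a run of non-terminal characters, then any terminal
# punctuation (English . ! ? or Chinese 。！？), then trailing whitespace.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*\s*")


def split_into_chunks(text: str, sentences_per_chunk: int = 2) -> List[str]:
    """Group `text` into chunks of at most `sentences_per_chunk` sentences."""
    if sentences_per_chunk < 1:
        raise ValueError("sentences_per_chunk must be at least 1")
    sentences = _SENTENCE.findall(text)
    chunks = []
    for i in range(0, len(sentences), sentences_per_chunk):
        # Join without a separator: each sentence keeps its own trailing
        # whitespace, and Chinese text has no spaces between sentences.
        chunk = "".join(sentences[i:i + sentences_per_chunk]).strip()
        if chunk:
            chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    sample = "ChatTTS runs locally. Short inputs are faster! Try one chunk at a time."
    for n, chunk in enumerate(split_into_chunks(sample), start=1):
        print(f"Chunk {n}: {chunk}")
```

Paste each chunk into the input box in turn. With a chunk size of one sentence, each generation stays as short as possible.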
The end result is a locally running web UI where you type text and get a downloadable MP3 audio file of natural-sounding speech in English or Chinese.
The open-source model is licensed for academic and research use only — not commercial projects. Audio quality is intentionally limited (MP3 compression, added noise) as an anti-misuse measure. Only English and Chinese are supported right now. CPU-only machines will be slow; a GPU is strongly recommended for anything beyond short test clips.
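To see whether your machine has a usable GPU, you can ask PyTorch (installed with the other requirements) whether it detects a CUDA device. The sketch below uses the standard PyTorch calls `torch.cuda.is_available()` and `torch.cuda.get_device_name()`. It does not check anything ChatTTS-specific, and the helper name `cuda_status` is hypothetical. Run it from inside the activated environment.

```python
import importlib.util


def cuda_status() -> str:
    """Report whether PyTorch in this environment can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment; activate it and install the requirements."
    import torch  # imported lazily so the check still runs without PyTorch

    if torch.cuda.is_available():
        return f"CUDA GPU detected: {torch.cuda.get_device_name(0)}"
    return "No CUDA GPU detected by PyTorch."


if __name__ == "__main__":
    print(cuda_status())
```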