// Before you start

What you need

Python 3.11 installed (python.org)
pip and Git available in your terminal (Git from git-scm.com)
4 GB+ free disk space for model download
A working internet connection for first run
Windows, Mac, or Linux machine (GPU optional but faster)
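
The checklist above can be verified before you start. The following is a minimal sketch using only the Python standard library; the function name preflight and the result keys are made up for illustration, and the Git check is included because Step 1 uses git. It does not test your internet connection.

```python
import shutil
import sys

MIN_FREE_GB = 4  # free disk space the checklist asks for


def preflight(path="."):
    """Return a dict mapping each prerequisite to True or False."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        # The guide targets Python 3.11 specifically.
        "python_3_11": sys.version_info[:2] == (3, 11),
        # shutil.which looks the command up on PATH, like the terminal does.
        "pip": shutil.which("pip") is not None or shutil.which("pip3") is not None,
        "git": shutil.which("git") is not None,
        "disk_space": free_gb >= MIN_FREE_GB,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Save it under any file name and run it with python; every line should start with OK before you move on to Step 1.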

Step 1 of 5

Clone the ChatTTS repository

2 min

This downloads all the project files to your computer. You'll run every command from inside the new folder it creates.

Terminal · mac

$ git clone https://github.com/2noise/ChatTTS

$ cd ChatTTS

What you should see

A folder called ChatTTS appears. Your terminal prompt now ends with ChatTTS.

This might happen

'git' is not recognized

Install Git from git-scm.com, then reopen your terminal and try again.

Step 2 of 5

Create an isolated Python environment

3 min

A virtual environment keeps ChatTTS's dependencies separate from everything else on your machine. This prevents version conflicts.

Terminal · mac

$ conda create -n chattts python=3.11 -y

$ conda activate chattts

What you should see

Your terminal prompt changes to show (chattts) at the start.

This might happen

conda command not found

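If you don't have conda, use Python's built-in venv module instead. Run python -m venv chattts_env, then activate it with source chattts_env/bin/activate (Mac/Linux) or chattts_env\Scripts\activate (Windows). Your prompt will then show (chattts_env) instead of (chattts).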
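
If you are unsure whether an environment is actually active, you can ask Python itself. This is a small sketch, not part of ChatTTS: the function name active_environment is hypothetical, and the conda check assumes conda's usual behaviour of setting the CONDA_DEFAULT_ENV variable on activation.

```python
import os
import sys


def active_environment():
    """Describe the environment the current interpreter is running in."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    if sys.prefix != sys.base_prefix:
        return f"venv: {sys.prefix}"
    # Assumption: conda exports CONDA_DEFAULT_ENV when an environment is active.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda: {conda_env}"
    return "none"


if __name__ == "__main__":
    print(active_environment())
```

If it prints none, or a conda environment other than chattts, activate the environment again before installing anything.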

Step 3 of 5

Install all required packages

5-10 min

This installs PyTorch, torchaudio, and every other library ChatTTS needs. The download can be large — give it time.

Terminal · mac

$ pip install --upgrade -r requirements.txt

What you should see

Download and install messages scroll by, ending with a 'Successfully installed ...' line. No red ERROR lines at the end.

This might happen

ERROR: Could not find a version that satisfies the requirement

Make sure you are inside the ChatTTS folder and your environment is activated. Run 'python -m pip install --upgrade pip' first, then retry.
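
To confirm the install worked without scrolling back through the output, you can check that the key libraries are importable. A minimal sketch: the helper name missing_packages is made up, and it only checks the two packages this step names (torch and torchaudio), not everything in requirements.txt.

```python
from importlib.util import find_spec


def missing_packages(names=("torch", "torchaudio")):
    """Return the package names that Python cannot find in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All found" if not missing else f"Missing: {', '.join(missing)}")
```

Run it with the environment activated. A package listed as missing usually means the install ran in a different environment than the one you are in now.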

Step 4 of 5

Launch the web interface

3-8 min (first run downloads the model)

This starts a local web server that you open in your browser. The very first launch downloads the speech model from Hugging Face (about 1-2 GB), so it takes longer. Subsequent launches are fast.

Terminal · mac

$ python examples/web/webui.py

What you should see

Terminal shows 'Running on local URL: http://127.0.0.1:7860'. A browser tab may open automatically.

This might happen

Port 7860 is already in use

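Another app is using that port. Stop it, or pass a different port to the command, for example '--server_port 7861'. Run 'python examples/web/webui.py --help' to confirm the exact option name in your version.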
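
You can check whether the port is taken before launching. The sketch below uses only the standard socket module; port_in_use and first_free_port are illustrative names, and 7860 is the port shown in this guide.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds.
        return s.connect_ex((host, port)) == 0


def first_free_port(start=7860, tries=20):
    """Return the first port from `start` upward that nothing is listening on."""
    for port in range(start, start + tries):
        if not port_in_use(port):
            return port
    return None  # every port in the range was busy


if __name__ == "__main__":
    print("7860 in use:", port_in_use(7860))
    print("First free port:", first_free_port())
```

This only tells you whether something is listening right now; another program could still claim the port between the check and the launch.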

Step 5 of 5

Generate your first audio clip

1-3 min

Open http://127.0.0.1:7860 in your browser. Type or paste English or Chinese text into the input box, then click the Generate button. The model will synthesize speech and play it back. Generation time depends on text length and whether you have a GPU.

Terminal · mac

$ open http://127.0.0.1:7860

What you should see

A web page with a text input and a Generate button. After clicking Generate, an audio player appears with your spoken output.

This might happen

Generation is very slow (several minutes)

Without a GPU, inference runs on the CPU, which is slow. Keep text short (1-2 sentences) for faster results. A CUDA-capable NVIDIA GPU speeds things up significantly.
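
For longer passages, one way to follow the "1-2 sentences" advice is to split the text first and paste one chunk at a time. This is a rough sketch with a made-up helper, split_into_chunks. The splitting rule is deliberately naive: it breaks after ., ! or ? followed by whitespace, and after the Chinese marks 。！？, so abbreviations such as "e.g. this" will be split too.

```python
import re

# Break after Western end punctuation followed by whitespace,
# or directly after Chinese end punctuation (no whitespace needed).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def split_into_chunks(text, sentences_per_chunk=2):
    """Split text into chunks of at most `sentences_per_chunk` sentences."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


if __name__ == "__main__":
    for chunk in split_into_chunks("Hello there. How are you? I am fine."):
        print(chunk)
```

Each printed line is one chunk to paste into the input box and generate separately.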

// Status

cooked. baked. worked.

A locally running web UI where you type text and get a downloadable MP3 audio file of natural-sounding speech in English or Chinese.

// the honest bit

The honest part

The open-source model is licensed for academic and research use only — not commercial projects. Audio quality is intentionally limited (MP3 compression, added noise) as an anti-misuse measure. Only English and Chinese are supported right now. CPU-only machines will be slow; a GPU is strongly recommended for anything beyond short test clips.

Turn Text into Natural Speech with ChatTTS
