Medium · github.com/2noise/ChatTTS · 2026-05-19

Turn Text into Natural Speech with ChatTTS

ChatTTS converts written text into natural-sounding spoken audio, with support for English and Chinese. Run it locally via a web interface — no coding required.

// Build stats

  • Total time: 8 min
  • Number of steps: 5
  • Difficulty: Medium
  • Worked first time: 65%
// Before you start

What you need

  • Python 3.11 installed (python.org)
  • pip available in your terminal
  • 4 GB+ free disk space for model download
  • A working internet connection for first run
  • Windows, Mac, or Linux machine (GPU optional but faster)
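
If you would rather check the list above in one go, here is a minimal pre-flight sketch in Python. It is not part of ChatTTS; the function name `preflight` and the 4 GB threshold constant are illustrative, taken from the requirements listed above. It uses only the standard library.

```python
import shutil
import sys

MIN_FREE_GB = 4  # assumption: mirrors the "4 GB+ free disk space" item above


def preflight(path="."):
    """Return a dict mapping each requirement to True (met) or False."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        # This guide targets Python 3.11 specifically.
        "python_3_11": sys.version_info[:2] == (3, 11),
        # pip may be exposed as either "pip" or "pip3" on the PATH.
        "pip": shutil.which("pip") is not None or shutil.which("pip3") is not None,
        "git": shutil.which("git") is not None,
        "disk_space": free_gb >= MIN_FREE_GB,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Save it as any .py file and run it with python before Step 1. A MISSING line tells you which item to sort out first.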
01
Step 1 of 5

Clone the ChatTTS repository

2 min

This downloads all the project files to your computer. You'll run every command from inside the new folder it creates.

Terminal · mac
$ git clone https://github.com/2noise/ChatTTS
$ cd ChatTTS
What you should see
A folder called ChatTTS appears. Your terminal prompt now ends with ChatTTS.
This might happen

'git' is not recognized

Install Git from git-scm.com, then reopen your terminal and try again.

02
Step 2 of 5

Create an isolated Python environment

3 min

A virtual environment keeps ChatTTS's dependencies separate from everything else on your machine. This prevents version conflicts.

Terminal · mac
$ conda create -n chattts python=3.11 -y
$ conda activate chattts
What you should see
Your terminal prompt changes to show (chattts) at the start.
This might happen

conda command not found

If you don't have conda, use Python's built-in venv module instead: run python -m venv chattts_env, then activate it with source chattts_env/bin/activate (Mac/Linux) or chattts_env\Scripts\activate (Windows).
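
If you are unsure whether the environment is actually active, this small Python sketch can tell you. It assumes conda exports the CONDA_DEFAULT_ENV variable on activation and that a venv changes sys.prefix. The function name `active_environment` is made up for illustration.

```python
import os
import sys


def active_environment():
    """Return 'conda:<name>' or 'venv:<path>' if isolated, else None."""
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    # "base" is conda's default environment, not the one created in this step.
    if conda_env and conda_env != "base":
        return f"conda:{conda_env}"
    # Inside a venv, sys.prefix points at the venv folder instead of the
    # base Python installation.
    if sys.prefix != sys.base_prefix:
        return f"venv:{sys.prefix}"
    return None


if __name__ == "__main__":
    print(active_environment() or "No isolated environment is active")
```

If it prints that no isolated environment is active, repeat the activate command before moving on to Step 3.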

03
Step 3 of 5

Install all required packages

5-10 min

This installs PyTorch, torchaudio, and every other library ChatTTS needs. The download can be large — give it time.

Terminal · mac
$ pip install --upgrade -r requirements.txt
What you should see
Download and install progress scrolls by, ending with a line that begins 'Successfully installed ...'. No red ERROR lines at the end.
This might happen

ERROR: Could not find a version that satisfies the requirement

Make sure you are inside the ChatTTS folder and your environment is activated. Run 'pip install --upgrade pip' first, then retry.
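
To confirm the install worked without launching anything, you can ask Python whether the key packages are importable. This is a rough sketch: it only checks the two libraries named in this step (torch and torchaudio), not everything in requirements.txt, and `missing_packages` is an illustrative name.

```python
from importlib.util import find_spec


def missing_packages(names=("torch", "torchaudio")):
    """Return the packages from `names` that Python cannot locate."""
    # find_spec returns None for a top-level package that is not installed,
    # and it does not import the package, so the check is quick.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Core packages found")
```

Run it inside the activated environment. Anything listed as not installed means the pip command did not finish cleanly.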

04
Step 4 of 5

Launch the web interface

3-8 min (first run downloads the model)

This starts a local web app that you open in your browser. The very first launch downloads the speech model from Hugging Face (about 1-2 GB). Subsequent launches are fast.

Terminal · mac
$ python examples/web/webui.py
What you should see
Terminal shows 'Running on local URL: http://127.0.0.1:7860'. A browser tab may open automatically.
This might happen

Port 7860 is already in use

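Another app is using that port. Stop it, or start the web UI on a different port by adding '--server-port 7861' to the command. If the script rejects that flag, run 'python examples/web/webui.py --help' to see the exact spelling in your checkout; some versions use an underscore ('--server_port 7861').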
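
To find out in advance whether the port is taken, a short standard-library sketch like the one below can help. It is independent of ChatTTS. The helper names `port_is_free` and `first_free_port` are hypothetical, and 7860 is simply the port used in this guide.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Binding fails when another process already holds the port.
            return False
        return True


def first_free_port(start=7860, attempts=20):
    """Return the first free port at or after `start`, or None."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    print("Suggested port:", first_free_port())
```

Use the suggested number in place of 7861 if that one is also busy.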

05
Step 5 of 5

Generate your first audio clip

1-3 min

Open the URL printed in your terminal (http://127.0.0.1:7860 in this walkthrough) in your browser. Type or paste English or Chinese text into the input box, then click the Generate button. The model synthesizes the speech and plays it back. Generation time depends on text length and whether you have a GPU.

Terminal · mac
$ open http://127.0.0.1:7860
What you should see
A web page with a text input and a Generate button. After clicking Generate, an audio player appears with your spoken output.
This might happen

Generation is very slow (several minutes)

Without a GPU, the model runs on your CPU, which is slow. Keep text short (1-2 sentences) for faster results. A CUDA-capable NVIDIA GPU speeds things up significantly.
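
If you have a longer passage, one way to follow the "1-2 sentences" advice is to cut it into short pieces first and paste them in one at a time. The sketch below is a simple illustration, not something ChatTTS provides. `split_into_chunks` is a made-up helper, and its sentence detection is deliberately naive (it splits on ., ! and ? followed by a space, and on the Chinese 。！？ marks).

```python
import re

# English punctuation must be followed by whitespace (so "3.14" stays whole);
# Chinese full-width punctuation needs no following space.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def split_into_chunks(text, sentences_per_chunk=2):
    """Split text into chunks of at most `sentences_per_chunk` sentences."""
    sentences = [s.strip() for s in _SENTENCE_BREAK.split(text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


if __name__ == "__main__":
    for chunk in split_into_chunks("Hello there. How are you? Fine!"):
        print(chunk)
```

Each printed line is short enough to generate comfortably on a CPU-only machine.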

// Status

cooked. baked. worked.

A locally running web UI where you type text and get a downloadable MP3 audio file of natural-sounding speech in English or Chinese.

// the honest bit

The honest part

The open-source model is licensed for academic and research use only — not commercial projects. Audio quality is intentionally limited (MP3 compression, added noise) as an anti-misuse measure. Only English and Chinese are supported right now. CPU-only machines will be slow; a GPU is strongly recommended for anything beyond short test clips.
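
To see which camp your machine falls into, you can ask PyTorch directly once Step 3 has finished. This is a minimal sketch that assumes only the standard `torch.cuda.is_available()` call. The function name `detect_device` is illustrative, and it does not check for other accelerators.

```python
def detect_device():
    """Return 'cuda', 'cpu', or 'torch not installed'."""
    try:
        import torch  # installed by requirements.txt in Step 3
    except ImportError:
        return "torch not installed"
    # True only when PyTorch can see a usable CUDA (NVIDIA) GPU.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Inference device:", detect_device())
```

A result of cpu means generation will work but slowly, so stick to short clips.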