Add HTTP service, MCP adapter, systemd autostart; fix bugs and docs

- chatterbox_cli_v4.py: cooperative stop/interrupt via threading.Event; fix force_split_sentence (word boundary instead of mid-word cut); fix synthesize_streaming normalization order (split before preprocess)
- tts_service.py: FastAPI service with job queue, model cache, worker thread; LAN-accessible on 0.0.0.0:9999; audio_device default None (auto)
- mcp_adapter.py: MCP adapter (stdio + streamable-http) wrapping REST API; update docstring and default TTS_URL to port 9999
- requirements.txt: add fastapi, uvicorn, httpx, mcp
- README.md, BEDIENUNGSANLEITUNG.md: document service, MCP, AI integrations (Claude, Ollama, Open WebUI, llama.cpp, Home Assistant), systemd autostart
- CLAUDE.md: reflect current architecture (service + adapter now implemented)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:19:00 +02:00 · 2026-05-16 10:19:00 +02:00 · d1971049ce
commit d1971049ce
parent bcf6374c29
7 changed files with 494 additions and 146 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -31,56 +31,105 @@ python chatterbox_cli_v4.py --lang de --pronunciation-dict aussprache.json --inp

 No build step, no test suite, no linter configuration — this is a single-file script.

+## Running the HTTP Service
+
+```bash
+# Runs as a systemd user service (autostart at login):
+systemctl --user status chatterbox-tts
+systemctl --user restart chatterbox-tts
+journalctl --user -u chatterbox-tts -f
+
+# Start manually (port 9999, reachable across the LAN):
+uvicorn tts_service:app --host 0.0.0.0 --port 9999
+
+# Health check:
+curl http://127.0.0.1:9999/health
+```
+
+Endpoints: `POST /speak`, `POST /stop`, `GET /health`, `GET /status`, `GET /voices`
+
+## Running the MCP Adapter
+
+```bash
+# stdio (Claude Code / Claude Desktop), already configured in ~/.claude.json:
+python mcp_adapter.py --stdio
+
+# HTTP transport (port 8001):
+python mcp_adapter.py
+
+# Point the adapter at a different TTS service:
+TTS_URL=http://192.168.1.10:9999 python mcp_adapter.py --stdio
+```
+
 ## Architecture

-Everything lives in `chatterbox_cli_v4.py`. The processing pipeline is:
+### Files

-**Text input → normalization → chunking → TTS generation → audio output**
+| File | Purpose |
+|------|---------|
+| `chatterbox_cli_v4.py` | Core CLI and all helper functions; imported by `tts_service.py` |
+| `tts_service.py` | FastAPI service with job queue and worker thread |
+| `mcp_adapter.py` | MCP wrapper around the REST API |
+
+### CLI pipeline (`chatterbox_cli_v4.py`)
+
+**Text input → `clean_raw_text` → chunking → `preprocess_tts_text` per chunk → TTS generation → audio output**
+
+Order is critical: split first (detect sentence boundaries on the raw text), then normalize (otherwise acronym periods would create false sentence boundaries).
+
+### Stop/Interrupt
+
+Module-global `threading.Event`:
+```python
+STOP_REQUESTED = threading.Event()
+request_stop()   # sets the event
+clear_stop()     # clears it before every new job
+stop_requested() # queries it
+```
+`PlaybackWorker` and both synthesize functions check the event at chunk boundaries. A running `model.generate()` cannot be aborted mid-call (a limitation of Python threads), so the stop takes effect at the next chunk.

 ### Text normalization (`preprocess_tts_text`)
-Applied per chunk before synthesis. Order matters:
-1. Pronunciation dict substitutions (before acronym expansion, so proper names are caught first)
+
+1. Pronunciation dict (before acronym expansion, so proper names are caught first)
 2. Unit normalization (120 km/h → "120 Kilometer pro Stunde")
 3. Time normalization (14:58 → "vierzehn Uhr achtundfünfzig")
 4. Year normalization (2026 → "zweitausendsechsundzwanzig")
-5. Acronym spelling (ARD → "Ah Er De"; skips entries in `NON_SPELLED_ACRONYMS`)
+5. Acronym spelling (ARD → "Ah Er De"; entries in `NON_SPELLED_ACRONYMS` are skipped)

-`DEFAULT_PRONUNCIATION_DE` contains built-in German phonetic approximations (e.g. Xi → "Schi").
+`DEFAULT_PRONUNCIATION_DE` contains built-in German phonetic approximations (e.g. Xi → "Schi").

 ### Text chunking
-Three modes (chosen by CLI flags):
- **sentence_mode** (default): `split_into_sentences()` — one sentence per TTS call, lowest latency to first audio
- **conversation_mode**: `split_for_conversation()` — first chunk is small (`--first-chunk-len`, default 80 chars), rest up to `--len` (400)
- **plain**: `split_long_text()` — paragraph-aware chunking up to `--len`

-`SENTENCE_END_RE` handles edge cases like ordinal numbers, ellipses, and CJK punctuation. `SEPARATOR_LINE_RE` silently drops lines like `--- Ende ---`.
+Three modes (CLI flags):
+- **sentence_mode** (default): `split_into_sentences()`, one sentence per TTS call, lowest latency
+- **conversation_mode**: `split_for_conversation()`, small first chunk (`--first-chunk-len`, default 80), rest up to `--len` (400)
+- **plain**: `split_long_text()`, paragraph-based chunking up to `--len`
+
+When a sentence is over-length, `force_split_sentence` first searches forward to the next word end, so it never cuts in the middle of a word.

 ### Model loading (`load_model`)
- `--lang en` → `ChatterboxTTS` (mono, always available)
- Other languages → `ChatterboxMultilingualTTS` (requires multilingual package; `HAS_MULTILINGUAL` flag guards import)
- `--t3-model v3` (default) or `v2` selects the multilingual T3 checkpoint
- Models are downloaded to `~/.cache/huggingface/` on first use (~2–3 GB)
- **Critical**: `attn_implementation = "eager"` is forced at import time because SDPA returns `None` attention weights, breaking the `AlignmentStreamAnalyzer` hook
+
+- `--lang en` → `ChatterboxTTS` (mono, always available)
+- Other languages → `ChatterboxMultilingualTTS` (the `HAS_MULTILINGUAL` flag guards the import)
+- `--t3-model v3` (default) or `v2` selects the multilingual T3 checkpoint
+- Models are cached in `~/.cache/huggingface/` (~2–3 GB)
+- **Critical**: `attn_implementation = "eager"` is forced at import time because SDPA returns `None` attention weights, which breaks the `AlignmentStreamAnalyzer` hook

 ### Audio output (`PlaybackWorker`)
- Uses `sounddevice.OutputStream` with a callback at 48 kHz (PipeWire/PulseAudio standard)
- Internal producer thread converts Torch tensors → `CALLBACK_BLOCK`-sized (2048 samples) numpy arrays
- If `--speed != 1.0`: pyrubberband R3-Engine (`--fine` flag) stretches time without pitch change before resampling
- Resampling: `torchaudio.functional.resample(chunk, model_sr, 48000)`
- `PlaybackWorker.stop()` sends `None` sentinel into the queue and joins the thread
+
+- `sounddevice.OutputStream` with a callback at 48 kHz (PipeWire/PulseAudio standard)
+- Internal producer thread: Torch tensors → `CALLBACK_BLOCK`-sized (2048 samples) numpy arrays
+- `--speed != 1.0`: pyrubberband R3 engine (`--fine`) stretches time without changing pitch, followed by resampling via `torchaudio.functional.resample(chunk, model_sr, 48000)`
+- `PlaybackWorker.stop()` sends a `None` sentinel into the queue and joins the thread

 ### Two synthesis paths
- **`synthesize_non_streaming`**: generates each chunk fully, feeds finished tensors to `PlaybackWorker`, concatenates all wavs for `--save`
- **`synthesize_streaming`**: calls `model.generate_stream()` with `chunk_size`; each yielded audio sub-chunk goes directly to `PlaybackWorker`; marked experimental in docs

-## Planned extensions (Ideen/)
+- **`synthesize_non_streaming`**: generates each chunk fully, feeds the finished tensors to `PlaybackWorker`, concatenates all WAVs for `--save`
+- **`synthesize_streaming`**: calls `model.generate_stream()` with `chunk_size`; each audio sub-chunk goes directly to `PlaybackWorker`; experimental

-The `Ideen/` folder documents a planned **REST/MCP bridge**:
- `tts_service.py` (FastAPI): `POST /speak`, `POST /stop`, `GET /health`, `GET /voices`
- `mcp_adapter.py`: thin MCP wrapper calling the REST API
- `chatterbox_backend.py`: imports `chatterbox_cli_v4.py` via `importlib` and calls `synthesize_non_streaming()` directly
+### HTTP Service (`tts_service.py`)

-Key gaps to address before building the service:
-1. **Stop/interrupt**: `PlaybackWorker.stop()` drains the audio queue, but a blocking `model.generate()` call cannot be interrupted mid-run. A `threading.Event`-based cancel token threaded through `synthesize_non_streaming` is the planned approach.
-2. **Model caching**: `load_model()` reloads from disk on every call; a service needs a per-language singleton.
-3. **Status object**: progress is `print()`-based; a service needs structured state.
+- **Model cache**: `_model_cache: dict[(lang, t3_model), (model, kind, sr)]`, loaded once and kept; thread-safe via `_model_lock`
+- **Job queue**: `queue.Queue[SpeakJob]` with a single worker thread; prevents parallel GPU/audio access
+- **`SpeakRequest.interrupt`**: calls `request_stop()` + `_drain_queue()` before enqueueing
+- **Status**: `_current_job`, `_recent_jobs` (max. 20), readable thread-safely via `_state_lock`
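
The cooperative stop and the single-worker queue described in this diff can be illustrated with a minimal sketch. It is not the code from `chatterbox_cli_v4.py` or `tts_service.py`: only `STOP_REQUESTED`, `request_stop`, `clear_stop` and `stop_requested` are names taken from the diff, while `jobs`, `spoken`, `drain_queue`, `fake_generate`, `run_job` and `worker` are hypothetical stand-ins, and a job is assumed to be a plain list of text chunks.

```python
import queue
import threading

# Names from the diff; the bodies are an assumption about how they behave.
STOP_REQUESTED = threading.Event()

def request_stop() -> None:
    STOP_REQUESTED.set()

def clear_stop() -> None:
    STOP_REQUESTED.clear()

def stop_requested() -> bool:
    return STOP_REQUESTED.is_set()

# Hypothetical stand-ins for the service's job queue and audio output.
jobs: queue.Queue = queue.Queue()
spoken: list[str] = []

def drain_queue() -> int:
    """Discard all pending jobs and return how many were dropped."""
    dropped = 0
    while True:
        try:
            jobs.get_nowait()
        except queue.Empty:
            return dropped
        dropped += 1

def fake_generate(chunk: str) -> str:
    """Placeholder for a blocking model.generate() call."""
    return chunk.upper()

def run_job(chunks: list[str], generate=fake_generate) -> None:
    """Process one job, checking the stop event only at chunk boundaries."""
    clear_stop()  # a new job always starts with a cleared event
    for chunk in chunks:
        if stop_requested():
            break  # a generate() call already running cannot be aborted
        spoken.append(generate(chunk))

def worker() -> None:
    """Single consumer: jobs run strictly one after another."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut the worker down
            break
        run_job(job)
```

Under these assumptions, an interrupting request would call `request_stop()` and `drain_queue()` and then put its own job on `jobs`. The job currently running stops at its next chunk boundary, and the new job clears the event before its first chunk.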