dschlueter 3e073250ca Add project skeleton: CLI pipeline for CD digitization
Modular Python package with Typer CLI (scan/apply/process commands),
Pydantic data models, OCR via Tesseract, LLM-based tracklist parsing,
mutagen audio tagging, M3U playlist generation, and cover processing.
Includes 8 passing tests and ruff lint config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 01:00:12 +01:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Musiksammlung is a Python CLI tool that automates the digitization of physical CD collections for use with Jellyfin. It orchestrates CD ripping (via abcde), OCR of cover and back images (via Tesseract), LLM-based tracklist extraction, file renaming and tagging, and M3U playlist generation.

Build & Development Commands

pip install -e ".[dev]"          # Install in editable mode with dev deps
pytest tests/ -v                 # Run all tests
pytest tests/test_models.py -v   # Run a single test module
ruff check src/ tests/           # Lint
musiksammlung --help             # CLI entry point

Architecture

The pipeline flows: OCR → LLM → Organize → Tag → Playlist

  • models.py — Pydantic models (Album, Disc, Track) shared across all modules; the LLM JSON output validates directly into Album
  • cli.py — Typer CLI with three commands: scan (OCR+LLM→JSON), apply (JSON→files), process (full pipeline)
  • ocr.py — Tesseract wrapper with Pillow-based image preprocessing
  • llm_parser.py — Sends OCR text to an LLM (Ollama or OpenAI-compatible), enforces JSON output, and retries on parse failure
  • organizer.py — Builds source→target file mapping, handles single-disc and multi-disc layouts
  • tagger.py — Sets audio tags via mutagen (format-agnostic), optional cover embedding for FLAC/MP3
  • playlist.py — Generates M3U playlists with relative paths
  • ripper.py — Drives abcde via subprocess for CD ripping
  • cover.py — Resizes/converts cover images to JPEG for Jellyfin
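llm_parser.py retries when the model's reply does not parse. One way to structure that loop is sketched below, with the LLM client abstracted as a plain `ask(prompt) -> str` callable (a hypothetical stand-in for the Ollama or OpenAI-compatible call) so the logic runs without a model. The function name `request_json` is also hypothetical.

```python
import json
from typing import Callable


def request_json(ask: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model until it returns a JSON object, up to max_attempts times."""
    current_prompt = prompt
    for _ in range(max_attempts):
        reply = ask(current_prompt)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError as exc:
            problem = f"invalid JSON ({exc.msg})"
        else:
            if isinstance(data, dict):
                return data
            problem = "valid JSON but not an object"
        # Re-ask with the original prompt plus a description of what went wrong
        current_prompt = (
            f"{prompt}\n\nYour previous answer was {problem}. "
            "Reply with exactly one JSON object and nothing else."
        )
    raise ValueError(f"no JSON object after {max_attempts} attempts")


# Fake model: chatty on the first call, compliant on the second
replies = iter(["Sure! Here is the tracklist: ...", '{"artist": "A", "title": "B", "discs": []}'])
prompts: list[str] = []


def fake_ask(p: str) -> str:
    prompts.append(p)
    return next(replies)


result = request_json(fake_ask, "Extract the tracklist as JSON.")
print(result["title"], len(prompts))  # B 2
```

Injecting `ask` keeps the retry logic independent of the backend, so the same loop works for Ollama and OpenAI-compatible servers and can be unit-tested with a fake.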
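organizer.py builds a source→target mapping for single-disc and multi-disc layouts, and playlist.py writes M3U files with relative paths. The sketch below illustrates both ideas under an assumed naming scheme (`NN - Title.ext`, plus one `Disc N` subfolder per disc on multi-disc albums) and hypothetical helper names; the project's actual scheme may differ.

```python
import os
from pathlib import Path

INVALID_CHARS = '<>:"/\\|?*'


def target_name(track_no: int, title: str, suffix: str) -> str:
    # Replace characters that are not allowed in file names on common filesystems
    safe = "".join("_" if c in INVALID_CHARS else c for c in title).strip()
    return f"{track_no:02d} - {safe}{suffix}"


def build_mapping(sources: list[Path], discs: list[list[str]], album_dir: Path) -> dict[Path, Path]:
    """Map ripped files, given in rip order, to target paths.

    `discs` holds one list of track titles per disc. A single disc goes
    directly into album_dir; multi-disc albums get one "Disc N" folder each.
    """
    expected = sum(len(titles) for titles in discs)
    if len(sources) != expected:
        raise ValueError(f"{len(sources)} files but {expected} tracks")
    files = iter(sources)
    mapping: dict[Path, Path] = {}
    for disc_no, titles in enumerate(discs, start=1):
        folder = album_dir if len(discs) == 1 else album_dir / f"Disc {disc_no}"
        for track_no, title in enumerate(titles, start=1):
            src = next(files)
            mapping[src] = folder / target_name(track_no, title, src.suffix)
    return mapping


def m3u_lines(playlist: Path, tracks: list[Path]) -> list[str]:
    # Entries are relative to the playlist's folder, so the album folder can be moved
    base = playlist.parent
    return [Path(os.path.relpath(t, base)).as_posix() for t in tracks]


album_dir = Path("/music/Beispiel-Band/Erstes Album")
rips = [Path(f"/tmp/rip/track{i:02d}.flac") for i in (1, 2, 3)]
mapping = build_mapping(rips, [["Anfang", "Mitte"], ["Ende"]], album_dir)
for src, dst in mapping.items():
    print(src.name, "->", dst.relative_to(album_dir))
print(m3u_lines(album_dir / "Erstes Album.m3u", list(mapping.values())))
```

Checking the file count against the track count up front means a mismatched tracklist fails before any file is renamed, rather than leaving the album half-organized.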

Conventions

  • Python 3.11+; German variable names and comments are acceptable
  • Pydantic for data models, Typer for CLI, mutagen for audio tagging
  • External tools required at runtime: tesseract, abcde
  • The two-step workflow (scan → review JSON → apply) is the recommended default over the one-shot process command
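The two-step workflow depends on a JSON file that a person can read and correct between scan and apply. A minimal stdlib sketch of that hand-off, with hypothetical function names and plain dicts standing in for the Album model:

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = {"artist", "title", "discs"}


def write_for_review(album: dict, path: Path) -> None:
    # ensure_ascii=False keeps umlauts as-is; indent=2 makes the file easy to edit
    path.write_text(json.dumps(album, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")


def read_reviewed(path: Path) -> dict:
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path}: top level must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"{path}: missing keys {sorted(missing)}")
    return data


with tempfile.TemporaryDirectory() as tmp:
    review_file = Path(tmp) / "album.json"
    # Step 1 (scan): write what OCR + LLM produced
    write_for_review(
        {"artist": "Ludwig van Beethoven", "title": "Klavierstücke",
         "discs": [{"tracks": [{"number": 1, "title": "Für Elise"}]}]},
        review_file,
    )
    raw_text = review_file.read_text(encoding="utf-8")
    # ... a person checks and corrects the file here ...
    # Step 2 (apply): load the reviewed file before touching any audio files
    album = read_reviewed(review_file)
```

With the default `ensure_ascii=True`, `json.dumps` would write "ü" as `\u00fc`, which is exactly the kind of noise a reviewer should not have to decode in German titles.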