
46 commits

Author SHA1 Message Date
7ced974279 Add CDDB confirmation, cd-discid hint, CD-counter increment, check command
- interactive_rip: after CDDB lookup, show album name + tracklist and ask
  'Treffer korrekt? (j/n)' ("Match correct? (y/n)") before renaming files; rip_disc gains rename=False
  option for deferred renaming
- interactive_rip: CD number prompt now shows disc_counter as default
  instead of always showing [1]
- _rip_with_abcde: when CDDB fails and cd-discid is not installed, print
  a visible hint with install command instead of silently doing nothing
- _stream_abcde: extract album name from CDDB header line (---- DTITLE ----)
  and return it as part of the result tuple
- _rename_files: early return when output directory does not exist
- check command (cli.py): already present from previous session

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 10:02:01 +01:00
2f80cb2693 Remove MusicBrainz retry logic — HTTP 200 means no data, not transient error
MusicBrainz always returns HTTP 200; an empty result set is definitive.
Retrying would never yield a different outcome.

- lookup_by_barcode(): retries parameter removed, random import removed
- Removed 3 retry-related tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 09:48:07 +01:00
09c01c9370 Fix CDDB parser for compilations and add grab-progress fallback
- _parse_cddb_lines now handles both 'Artist - Title' and 'Artist / Title'
  (slash separator used by abcde for compilation albums like Various Artists)
- _stream_abcde collects grab-progress lines (track N: Artist / Title)
  as a fallback TrackInfo source when no CDDB lines are found
- New _parse_grab_tracks() splits grab titles on ' / ' into artist+title
- 5 new tests (TestParseCddbLines.test_compilation_slash_separator,
  TestParseGrabTracks.*)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 09:42:03 +01:00
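The separator handling described in 09c01c9370 might look roughly like the sketch below. It is not the project's code: ParsedTrack and split_artist_title are hypothetical names standing in for TrackInfo and the real parsers, and trying ' / ' before ' - ' is an assumption.

```python
from typing import NamedTuple, Optional


class ParsedTrack(NamedTuple):
    """Hypothetical stand-in for the project's TrackInfo model."""
    artist: Optional[str]
    title: str


def split_artist_title(text: str) -> ParsedTrack:
    """Split 'Artist / Title' or 'Artist - Title' into artist and title.

    Assumption: the slash separator is tried first. If neither separator
    is present, the whole string is treated as the title.
    """
    for sep in (" / ", " - "):
        artist, found, title = text.partition(sep)
        if found and artist.strip() and title.strip():
            return ParsedTrack(artist.strip(), title.strip())
    return ParsedTrack(None, text.strip())
```

Because both separators include surrounding spaces, an artist name such as "AC/DC" is not split on its own slash.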
e75e5d7de0 feat: GnuDB fallback with retries when abcde CDDB lookup returns nothing
- New module cddb.py: direct GnuDB/FreeDB HTTP lookup using CDDB protocol,
  with same retry+random-delay logic as MusicBrainz barcode lookup
- get_discid() reads disc fingerprint via cd-discid before ripping
- If abcde returns no CDDB track data, lookup_by_discid() queries GnuDB
  directly (up to 3 retries, 2-6 s random pause between attempts)
- TrackInfo moved from ripper.py to models.py to break circular import
  (cddb.py and ripper.py both use TrackInfo)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 07:24:16 +01:00
65164d428c feat: retry MusicBrainz barcode lookup with random delay on empty result
Up to 3 retries with 2–6 s random wait between attempts, as MusicBrainz
occasionally returns no results on the first try. retries parameter is
configurable (default: 3).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 07:16:32 +01:00
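A minimal sketch of the retry-with-random-delay pattern from 65164d428c, assuming a generic lookup callable. The name lookup_with_retries and the injectable sleep parameter are illustrative and not taken from the repository; only the retry count and the 2 to 6 s range come from the commit message.

```python
import random
import time
from typing import Callable, List


def lookup_with_retries(
    lookup: Callable[[], List[dict]],
    retries: int = 3,
    min_wait: float = 2.0,
    max_wait: float = 6.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Call lookup() once, then up to `retries` more times while it returns nothing."""
    results = lookup()
    for _ in range(retries):
        if results:
            break
        # Random pause between attempts, as described in the commit message.
        sleep(random.uniform(min_wait, max_wait))
        results = lookup()
    return results
```

Passing sleep as a parameter keeps tests fast, since a test can record the waits instead of actually pausing.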
468fac6d2b docs: document auto-rename of album directory in apply --in-place
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 06:53:24 +01:00
7554cade50 feat: auto-rename album directory after in-place apply
After all file operations (rename, tag, cover, playlist), apply now
renames the album root directory to match album.json metadata:

- input_dir = CD1/CD2 etc.: parent directory is renamed automatically
  e.g. Kärntner_Doppelsextett/ → Du_Berührst_Mi_20_Jahre_Kärntner_Doppelsextett/
- input_dir = album root: a hint with the mv command is printed instead
  (avoids renaming an actively used path)
- Existing directory with target name: warning, no rename

Also: _sanitize_filename() in organizer.py made public (sanitize_filename),
used consistently across organizer, playlist and cli modules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 06:49:17 +01:00
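The three cases listed in 7554cade50 can be read as a small decision function. The sketch below is an interpretation, not the project's code: plan_album_rename, the CD<number> directory pattern, the action labels and the extra "unchanged" case are all assumptions.

```python
import re
from pathlib import Path
from typing import Tuple

# Assumption: disc subdirectories are named CD1, CD2, ...
_DISC_DIR = re.compile(r"CD\d+", re.IGNORECASE)


def plan_album_rename(input_dir: Path, target_name: str) -> Tuple[str, Path]:
    """Decide what to do with the album root without touching the filesystem.

    Returns (action, target) where action is one of:
      "rename"    - input_dir is a disc directory, so the parent can be renamed
      "hint"      - input_dir is the album root, so only an mv hint is printed
      "warn"      - a directory with the target name already exists
      "unchanged" - the album root already has the target name
    """
    is_disc_dir = _DISC_DIR.fullmatch(input_dir.name) is not None
    album_root = input_dir.parent if is_disc_dir else input_dir
    target = album_root.with_name(target_name)
    if album_root.name == target_name:
        return ("unchanged", target)
    if target.exists():
        return ("warn", target)
    return ("rename" if is_disc_dir else "hint", target)
```

Separating the decision from the actual rename makes each case easy to test on its own.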
c0e4d2aa85 fix: show clean error message when MusicBrainz barcode lookup fails
Catch ValueError (barcode not found) and httpx.HTTPError (network error)
in _scan_to_album and print a user-friendly message with hint instead of
a raw Python traceback. Also remove unused `call` import in test_ripper.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 06:18:12 +01:00
b70127e838 docs: document MusicBrainz barcode lookup in README and Bedienungsanleitung
- README: Schnellstart shows --barcode as fastest option
- Bedienungsanleitung:
  - Workflow diagram updated (EAN path, Varianten A-D)
  - Interactive rip example shows EAN prompt with MusicBrainz output
  - New Variante D: scan --barcode (no image, no OCR, no local LLM)
  - Variante C: corrected default model to qwen3-vl:235b-cloud
  - Tipps: barcode as first/fastest option, updated CDDB fallback hints

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 06:16:18 +01:00
b30aaa617d Add MusicBrainz barcode lookup (scan --barcode and interactive rip)
- New module musicbrainz.py: lookup_by_barcode() via EAN-13/UPC-12,
  two-step API (barcode search → release detail with recordings),
  respects 1 req/s rate limit with User-Agent header
- cli.py: scan command gets --barcode option as highest-priority mode
  (no images needed); _scan_to_album() dispatches to MusicBrainz first
- ripper.py: interactive_rip() prompts for optional EAN after album name;
  MusicBrainz data (incl. year) takes priority over CDDB for album.json;
  album_root.mkdir() added so JSON can be written even when MB changes dir
- tests: test_musicbrainz.py (16 tests), test_ripper.py +6 barcode tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 06:13:10 +01:00
6aba30c0e5 Set default vision model to qwen3-vl:235b-cloud
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 05:56:44 +01:00
db47aa4456 Fix: Album.album accepts null values from the LLM
When the LLM cannot identify an album title (e.g. only an ensemble name on
the back cover), it returns "album": null. Instead of aborting with a
ValidationError, null is now converted to "".
The user can fill in the empty title manually in album.json.

Changed:
- Album.album: str = "" (instead of str without a default)
- field_validator mode="before", None → ""

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 05:49:37 +01:00
795be8609a Opus/M4A cover embedding, cover.py tests and OCR tests
- tagger.py: embed_cover() now supports .opus (Vorbis comment
  METADATA_BLOCK_PICTURE) and .m4a (MP4Cover); imports added
- test_tagger.py: 2 new tests for Opus/M4A; minimal audio fixtures
  as base64 constants (176 B Opus, 856 B M4A)
- test_cover.py: TestPrepareCover (5 tests) and TestCopyCovers (6 tests)
  for prepare_cover() and copy_covers()
- test_ocr.py: 13 tests for run_ocr(), _detect_and_fix_rotation()
  and ocr_images(); Tesseract mocked via subprocess.run

144 tests, 0 failures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 04:50:13 +01:00
cfc2a2018e Tagger and CLI tests; bugfix for embed_cover on MP3 files without an ID3 header
- tests/test_tagger.py: 20 tests for tag_file, tag_album,
  _scale_cover_for_embed, embed_cover (FLAC + MP3), embed_album_cover
- tests/test_cli.py: 14 tests for apply (in-place, disc-mismatch,
  dry-run, playlist, multi-disc), check and scan (via mock)
- tagger.py: embed_cover for MP3 catches ID3NoHeaderError and
  creates a new ID3 tag when none is present

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 04:37:07 +01:00
70c096cde4 Lint fixes, disc validation for process, and Forgejo CI
- ruff: fixed import sorting, unused imports and line lengths
- cli.py: extracted _check_disc_counts_or_exit(); the process command
  now also checks disc counts before renaming
- .forgejo/workflows/ci.yml: ruff + pytest on push/PR against main

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 00:51:14 +01:00
88b89fbb50 LLM parser tests, check command and cover documentation
tests/test_llm_parser.py: 13 tests for _call_ollama, _call_openai_compatible
  and parse_tracklist (retry logic, Markdown block, track artist, mock)

cli: new check command shows tags and cover status of all audio files;
  ♪ marks files with an embedded cover

BEDIENUNGSANLEITUNG: new section 7 (check command), cover convention
  (frontcover.jpg/backcover.jpg, embedding, 500px) in step 3

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-18 00:45:49 +01:00
256be0ae33 Cover embedding: reduce resolution to 500px before embedding
New helper _scale_cover_for_embed() scales the cover image
with Pillow to at most 500px (EMBED_COVER_MAX_SIZE) and encodes it
in memory as JPEG quality=85. embed_cover() no longer reads the
raw bytes of the original file but uses the scaled image instead.

Result: embedded covers are ~40–100 KB instead of the 200–500 KB of the
1200px original, and still look sharp on HiDPI displays.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-18 00:40:11 +01:00
3fa6237f94 Cover embedding: frontcover.jpg/backcover.jpg as the standard convention
cover: find_cover() looks for frontcover.jpg/.png and backcover.jpg/.png;
  copy_covers() saves as frontcover.jpg / backcover.jpg
tagger: embed_album_cover() embeds the front cover in all tracks
cli: apply and process call embed_album_cover() after copy_covers()
tests: TestFindCover with 7 tests (jpg, png, symlink, priority, negative)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-18 00:35:28 +01:00
c9152cf19f Update Bedienungsanleitung: new filename scheme and CDDB→album.json
- Workflow diagram: CDDB saves album.json automatically
- Rip result: correct scheme 01_-_Titel_-_Kuenstler.flac
- apply results: filenames adjusted
- album.json: optional per-track artist field explained
- Filename scheme section: full description

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-18 00:26:09 +01:00
4e6d82a41d Uniform filename scheme: 01_-_Titel_-_Kuenstler.flac
organizer: align the separator before the title (was: 01_Titel_-_K., now: 01_-_Titel_-_K.)
playlist: glob pattern and fallback adapted to the new scheme
tests: assertions updated

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-18 00:22:54 +01:00
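As an illustration of the uniform scheme from 4e6d82a41d, a filename could be assembled as sketched below. track_filename is a hypothetical helper. It assumes the title and artist have already been sanitized, and it falls back to the album artist when no per-track artist is set, as 255496bd1b describes for the Track model.

```python
from typing import Optional


def track_filename(
    number: int,
    title: str,
    album_artist: str,
    track_artist: Optional[str] = None,
    ext: str = "flac",
) -> str:
    """Build '<NN>_-_<title>_-_<artist>.<ext>' from already-sanitized parts."""
    # None means "no per-track artist": fall back to the album artist.
    artist = track_artist or album_artist
    return f"{number:02d}_-_{title}_-_{artist}.{ext}"
```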
bafea5f335 CDDB→album.json + LLM prompts with track artist
ripper: after a successful CDDB rip, save album.json in the album directory
(artist, title, all discs with track artists), closing the workflow
gap between rip and apply.

llm_parser, vision_llm: prompts explain the optional per-track artist
field; the LLM sets it only when the track performer differs from the album artist.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-18 00:16:44 +01:00
255496bd1b Add per-track artist to filename: 09_Titel_-_Kuenstler.flac
- Track model: add optional artist field (None = fall back to album artist)
- organizer: append _-_<artist> to each filename
- tagger: use track.artist over album.artist for the 'artist' tag
- playlist: widen glob pattern to match new _-_<artist> suffix
- tests: update assertions + add test for track-artist override

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-18 00:11:07 +01:00
775f274d02 Fix and expand tests: 63 tests passing, covers all core modules
Remove tests/ from .gitignore (was accidentally excluded).

- test_ripper.py: rewrite for current API (_parse_cddb_lines,
  _extract_tracks, _rename_files, _clean_input); fix default quality
- test_organizer.py: update filename assertions (spaces→underscores);
  add TestSanitizeFilename, TestCheckDiscCounts, in-place mode
- test_playlist.py: fix dummy filenames to underscore scheme;
  add multi-disc, filename sanitization, EXTINF and fallback tests

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-18 00:00:44 +01:00
734bc80b79 Update Bedienungsanleitung: in-place mode, underscore scheme, disc validation
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 23:52:17 +01:00
070a0573ae Add --in-place mode to apply: rename and tag without moving files
When no output_dir is given (or --in-place is set), files are renamed
and tagged directly in the source directory instead of being moved into
a separate Jellyfin library hierarchy.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 23:41:29 +01:00
67b8653b3c Fix tagger and playlist to use underscore filename pattern
After the organizer was updated to use underscores in filenames,
the tagger (glob pattern "01 *") and playlist generator (pattern
"01 title.*") still used spaces and failed to find any files.
Updated both to use "01_*" / "01_title.*" patterns.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 23:34:16 +01:00
8a39f7c41e Sanitize filenames: replace spaces/punctuation with underscores
Replace all non-word characters (spaces, punctuation, brackets, etc.)
with underscores in track titles, album names and artist names.
Collapse consecutive underscores to one, strip leading/trailing ones.
Umlauts (ä, ö, ü, ß) and digits are preserved.

Also use underscore instead of space between track number and title.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 23:25:28 +01:00
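The sanitization rules in 8a39f7c41e can be approximated with a single regular expression. This is a sketch under the rules stated in the commit message, not the actual body of the function in organizer.py.

```python
import re


def sanitize_filename(name: str) -> str:
    """Replace runs of non-word characters with one underscore.

    For str patterns, \\W is Unicode-aware in Python 3, so precomposed
    umlauts, ß and digits count as word characters and are preserved.
    Including "_" in the character class collapses consecutive underscores.
    """
    collapsed = re.sub(r"[\W_]+", "_", name)
    # Strip leading and trailing underscores left over from punctuation.
    return collapsed.strip("_")
```

For example, sanitize_filename("Du Berührst Mi (20 Jahre)") gives "Du_Berührst_Mi_20_Jahre".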
f4e49a3df6 Add disc count validation before apply
Check audio file count vs JSON track count per disc before processing.
Aborts with a clear error showing which discs have discrepancies and
whether tracks are missing from the JSON or audio files are missing
from the directory.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 23:08:24 +01:00
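The per-disc check from f4e49a3df6 could be sketched as a pure function over two count tables. find_disc_mismatches and its message wording are hypothetical. The real check works on directories and album.json, which is left out here.

```python
from typing import Dict, List


def find_disc_mismatches(
    audio_counts: Dict[int, int], json_counts: Dict[int, int]
) -> List[str]:
    """Return one message per disc whose audio file count differs from the JSON.

    Both arguments map disc number -> count; a disc missing from one side
    is treated as having a count of 0 there.
    """
    problems = []
    for disc in sorted(set(audio_counts) | set(json_counts)):
        files = audio_counts.get(disc, 0)
        tracks = json_counts.get(disc, 0)
        if files > tracks:
            problems.append(
                f"Disc {disc}: {files - tracks} track(s) missing from the JSON"
            )
        elif tracks > files:
            problems.append(
                f"Disc {disc}: {tracks - files} audio file(s) missing from the directory"
            )
    return problems
```

An empty result means every disc matches, so processing can continue.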
b599c9eb8a Fix default model, increase timeout, improve multi-column prompt
- Change default text-LLM from llama3 (not installed) to gemma3:12b
- Increase LLM timeout from 120s to 300s (large models need longer)
- Add explicit multi-column layout instruction to vision prompt to
  prevent skipping columns on dense CD back-cover tracklists

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 22:56:02 +01:00
8765e991b0 Fix broken abcde command: build output_fmt before cmd construction
cmd[-2] was overwriting the -a action value instead of the -o format
value when -P was appended last. Now output_fmt is computed upfront and
the cmd list is built cleanly without post-hoc index manipulation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 21:10:11 +01:00
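The fix in 8765e991b0 amounts to computing every value before the argument list is assembled, so that no later append can shift an index. The sketch below illustrates the idea. build_abcde_command, its parameters and the default action list are assumptions; only the -o, -a and -P options are taken from the commit message.

```python
from typing import List, Optional


def build_abcde_command(
    fmt: str,
    quality_opts: Optional[str] = None,
    use_pipes: bool = False,
    actions: str = "cddb,read,encode,tag",
) -> List[str]:
    """Build the abcde argument list in one pass, without index patching."""
    # Compute the -o value up front (assumed "type:options" form).
    output_fmt = f"{fmt}:{quality_opts}" if quality_opts else fmt
    cmd = ["abcde", "-o", output_fmt, "-a", actions]
    if use_pipes:
        # Appending -P last is harmless: nothing refers to cmd[-2] afterwards.
        cmd.append("-P")
    return cmd
```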
430775adf8 Add live progress, progress bar and CDDB logging to ripper
- Replace capture_output=True with Popen+live streaming (_stream_abcde)
- Show track counter:  "Track 3/14  Title..."
- Show cdparanoia progress bar: [████████░░░░░░░░░░░░░░░░░░░░░░]  45.2%  12.3 MB
- Show CDDB album header and track list as they appear
- Show tagging progress: "Tagging 14/14"
- Print abcde command for full transparency
- Collect CDDB track lines while streaming for later parsing
- Log warnings when CDDB returns no data
- Print full renamed file list after ripping

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 20:25:30 +01:00
d0d64da12e Change default quality from medium to high
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 20:17:58 +01:00
f2d3684956 Fix file extraction: don't use abcde move, extract from temp dir ourselves
abcde's move action + OUTPUTFORMAT config failed because shell variables
like ${TRACKNUM} are evaluated immediately when the config is sourced.
Instead: skip move, search abcde's internal temp dir (abcde.XXXX/trackNN.flac)
and move files flat into output_dir ourselves.

- Replace _get_audio_files/_write_abcde_config with _extract_tracks()
- _rename_files() now matches track01.flac pattern (abcde naming)
- Fallback rename to 01.flac etc. when no CDDB data available

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 20:15:50 +01:00
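The extraction step described in f2d3684956 could look roughly like this. It is a sketch: extract_tracks here takes explicit directories and an extension, which may differ from the real _extract_tracks() signature, and the abcde.*/trackNN.<ext> layout is taken from the commit message.

```python
import shutil
from pathlib import Path
from typing import List


def extract_tracks(work_dir: Path, output_dir: Path, ext: str = "flac") -> List[Path]:
    """Move abcde.*/trackNN.<ext> files from work_dir flat into output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # Sorting keeps the track order stable (track01, track02, ...).
    for src in sorted(work_dir.glob(f"abcde.*/track[0-9][0-9].{ext}")):
        dest = output_dir / src.name
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Other files in abcde's temp directory are left untouched because they do not match the track pattern.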
096be97ba8 Fix three ripper bugs: input cleanup, file move, recursive search
- Add _clean_input(): strips ANSI escape codes (^[[D from arrow keys),
  control characters and surrounding quotes from user input
- Add _write_abcde_config(): writes temp abcde config with OUTPUTDIR
  and flat OUTPUTFORMAT so files land in the right directory
- Add 'move' action to abcde so encoded files are actually placed in
  OUTPUTDIR instead of staying in abcde's internal temp directory
- Change _get_audio_files() to use rglob() (recursive) so files in
  abcde subdirectories are found
- Improve error messages: include abcde output on failure

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 19:58:59 +01:00
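The input cleanup from 096be97ba8 might be implemented along these lines. This is a simplified sketch: the escape-sequence pattern only covers CSI sequences such as the arrow keys, and clean_input is not the project's _clean_input().

```python
import re

# CSI escape sequences such as ESC [ D (left arrow); a simplified pattern.
_ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
# Remaining ASCII control characters, including tab, newline and DEL.
_CONTROL = re.compile(r"[\x00-\x1f\x7f]")


def clean_input(raw: str) -> str:
    """Strip escape sequences, control characters and surrounding quotes."""
    text = _ANSI_CSI.sub("", raw)
    text = _CONTROL.sub("", text)
    text = text.strip()
    # Remove one pair of matching quotes around the whole value.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1].strip()
    return text
```

Escape sequences are removed before the generic control-character pass, so the trailing "[D" of an arrow key does not survive as literal text.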
3b6c37a32d Add work-in-progress warning to README and BEDIENUNGSANLEITUNG
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 17:47:41 +01:00
92af4eeb9c Add BEDIENUNGSANLEITUNG.md and update README.md
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 17:39:24 +01:00
851dbf3a46 Remove tests/ from repo, update .gitignore, improve ripper
- Remove tests/ directory from version control (added to .gitignore)
- Add .idea/ to .gitignore
- Ripper: CDDB lookup, non-interactive mode, English UI, file renaming
- Config: abcde format mapping, per-format quality options
- CLI: English help texts, new --no-cddb / --pipes / --parallel / --quality options

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 17:35:34 +01:00
8ecade5cdc Add --from-text mode and improve LLM parser robustness
- Add --from-text/-t option to scan and process commands for
  pre-formatted tracklists (e.g. from Perplexity)
- Refactor llm_parser to use Chat API instead of Generate API
- Reuse _extract_json() from vision_llm for robust JSON extraction
- Improve SYSTEM_PROMPT with strict rules (Various Artists, no
  invented years, no composer info in titles, /no_think)
- Remove format:"json" constraint that caused empty responses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 02:26:58 +01:00
3d91614e66 Add testdata/ to .gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 01:35:35 +01:00
1753ab204f Add Vision-LLM mode for direct image-to-JSON extraction
Tesseract OCR fails on rotated/low-contrast CD back covers.
New vision_llm module sends images directly to qwen3-vl via
Ollama chat API, bypassing OCR entirely. Robust JSON extraction
handles thinking tags, markdown blocks, and empty responses.
CLI scan/process commands gain --vision flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 01:35:05 +01:00
686c4317d1 Remove CLAUDE.md from version control
File is now in .gitignore and kept only locally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 01:02:49 +01:00
3e073250ca Add project skeleton: CLI pipeline for CD digitization
Modular Python package with Typer CLI (scan/apply/process commands),
Pydantic data models, OCR via Tesseract, LLM-based tracklist parsing,
mutagen audio tagging, M3U playlist generation, and cover processing.
Includes 8 passing tests and ruff lint config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 01:00:12 +01:00
225f6b3dbf initial commit 2026-02-15 01:00:12 +01:00
a55bd8eabb initial commit 2026-02-15 01:00:12 +01:00
c7d9a3f0dc initial commit 2026-02-15 01:00:12 +01:00
036678cd07 Initial commit 2026-02-15 00:53:30 +01:00