Commit graph

10 commits

Author SHA1 Message Date
cd6c0ae185 Fix crash on vinyl track positions like 'A1', 'B2' from MusicBrainz
MusicBrainz returns vinyl track numbers as 'A1', 'B3' etc. instead of
plain integers. int('A1') raised ValueError, which aborted processing of
the entire album.

metadata_resolver.py: parse vinyl positions with regex before int():
- 'A1' → track 1, disc 1 (side A)
- 'B3' → track 3, disc 1 (side B)
- 'C1' → track 1, disc 2 (side C)
- Non-vinyl: extract first digit group via re.search

hint_extractor.py: guard int(tl_track) in tracklist matching with
try/except + re.search so any non-numeric track position is skipped
gracefully instead of crashing.
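
The mapping listed above could be sketched roughly as follows. This is an illustration only, not the code in metadata_resolver.py: the name parse_position, the (track, disc) return shape, and the "two sides per disc" rule are assumptions inferred from the three examples.

```python
import re

# Hypothetical pattern: one side letter followed by the track number on that side.
_VINYL_RE = re.compile(r"^([A-Za-z])(\d+)$")


def parse_position(raw):
    """Return (track, disc) for a track position string, or None if unusable.

    disc is None when the position carries no side letter.
    """
    text = str(raw).strip()
    m = _VINYL_RE.match(text)
    if m:
        side_index = ord(m.group(1).upper()) - ord("A")  # A=0, B=1, C=2, ...
        disc = side_index // 2 + 1                       # assumed: two sides per disc
        return int(m.group(2)), disc
    m = re.search(r"\d+", text)                          # non-vinyl: first digit group
    if m:
        return int(m.group()), None
    return None                                          # nothing numeric: caller skips it
```

With this sketch, parse_position('A1') gives (1, 1), parse_position('B3') gives (3, 1), and parse_position('C1') gives (1, 2); a string with no digits returns None instead of raising.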

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 09:12:31 +02:00
aaa32622b2 Fix 'Unknown' track artists leaking from bad ID3 tags and classical schema
- hint_extractor: filter existing tags through _is_good() so 'Unknown',
  'Unknown Artist' etc. in existing ID3 tags don't override filename-parsed
  artist names
- executor: _is_classical() now returns False when track_artist is a placeholder
  ('unknown', 'unknown artist') — prevents pop albums from getting the
  Performer-Composer-Work filename schema
- executor/music_enricher: pass albumartist to _proposed_filename() so fallback
  works when track artist is missing; fix display to use albumartist fallback too

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:15:46 +02:00
701b05a75d Fix Jellyfin playlist integration and tracklist matching for single-CD albums
- hint_extractor: add _normalize_vertical_tracklist() to handle bare-number/
  title/duration format (Tufaranka-style tracklists)
- hint_extractor: fix level-1 tracklist match — allow disc_num=None (single-CD)
  by assuming disc=1; previously no tracklist title was ever applied to single-
  CD tracks because the guard required disc_num to be set
- music_enricher: register module in sys.modules before exec_module() so
  @dataclass definitions in jellyfin_playlist_generator work correctly
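
The sys.modules fix follows the usual importlib pattern for loading a module from a file path. A sketch, assuming a hypothetical helper name load_module_from_path (the actual loader code in music_enricher is not shown in this log):

```python
import importlib.util
import sys


def load_module_from_path(name, path):
    """Load a Python file as module `name`, registering it before execution."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # Register first: @dataclass may look up sys.modules[cls.__module__]
    # while the module body is still executing.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module
```

Without the registration step, the lookup can return None during exec_module(), which is one way dataclass definitions in a dynamically loaded module end up failing.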

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 07:58:41 +02:00
1960989eef Fix YouTube ID detection: use last _-token instead of broken lookbehind regex
The previous regex lookbehind (?<![A-Za-z0-9_-]) excluded _ as a valid preceding
character, so IDs following an underscore were never matched. New approach: split the
stem on _ and check whether the last token is an 11-char YouTube ID (mixed case + digit).
The ID token is also stripped from the stem before _parse_filename() so that it cannot
leak into the track title or be misread as an artist-title separator.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:57:27 +02:00
b6abfae16c Add YouTube ID detection and metadata lookup via yt-dlp
- Extract 11-char YouTube video IDs from audio filenames
- Fetch title, uploader, chapters via yt-dlp (--dump-json)
- Use chapters as tracklist when no .txt tracklist is available
- Store yt_title / yt_uploader in AlbumHints for LLM prompt context
- Fall back to YouTube video title as track title for single-file albums
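
A sketch of how chapters could be turned into a tracklist once the JSON from yt-dlp has been parsed. The helper name tracklist_from_youtube is hypothetical, and the field names "chapters" and "title" are assumed to match the info JSON that yt-dlp emits; the subprocess call itself is left out.

```python
def tracklist_from_youtube(info):
    """Build [(track_no, title), ...] from an already-parsed info dict."""
    chapters = info.get("chapters") or []
    if chapters:
        return [
            (n, (ch.get("title") or "").strip() or f"Chapter {n}")
            for n, ch in enumerate(chapters, start=1)
        ]
    # Single-file album without chapters: fall back to the video title.
    title = (info.get("title") or "").strip()
    return [(1, title)] if title else []
```

Chapters take priority here; the video title is only used when no chapters are present, mirroring the last bullet above.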

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
40a2ef3fb6 Add OCR fallback via Ollama Vision for albums without tracklist text
hint_extractor: _ocr_back_cover() sends back/inlay images to Ollama Vision
  when no tracklist .txt/.htm/.nfo is present. Model priority:
  qwen3-vl:latest → minicpm-v:latest → deepseek-ocr:latest (configurable
  via OLLAMA_OCR_MODEL env var). Timeout 180s. OCR text is fed into the
  same _parse_tracklist() pipeline as regular text files.
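
The model selection described above could be sketched as below. The function name pick_ocr_model and the `installed` argument are hypothetical, and treating OLLAMA_OCR_MODEL as a hard override is an assumption; the request to Ollama itself is not shown.

```python
import os

# Priority order as listed in the commit message.
DEFAULT_OCR_MODELS = ("qwen3-vl:latest", "minicpm-v:latest", "deepseek-ocr:latest")


def pick_ocr_model(installed, env=None):
    """Return the OCR model to use, or None when nothing suitable is available."""
    env = os.environ if env is None else env
    override = env.get("OLLAMA_OCR_MODEL")
    if override:                      # explicit configuration wins
        return override
    for name in DEFAULT_OCR_MODELS:   # otherwise first installed model by priority
        if name in installed:
            return name
    return None
```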

music_enricher: extract_hints(use_ocr=not args.no_api) — OCR is skipped
  with --no-api to allow fully offline/fast runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
28f716f8f2 Fix disc numbering consistency and false tracklist matches
executor: disc=1 now generates '1-01' prefix (same as disc=2 → '2-01'),
  so multi-disc albums have consistent D-TT scheme throughout.
  Single-disc tracks without disc tag stay as plain 'TT'.
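
As an illustration of the D-TT scheme (the helper name track_prefix is hypothetical, not taken from executor):

```python
def track_prefix(track, disc=None):
    """Return 'D-TT' when a disc number is known, plain 'TT' otherwise."""
    if disc is None:
        return f"{track:02d}"
    return f"{disc}-{track:02d}"
```

So track_prefix(1, 1) yields '1-01', track_prefix(1, 2) yields '2-01', and track_prefix(7) yields '07'.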

hint_extractor: tracklist pattern 2 now requires '.' ')' or ':' as separator
  (not bare whitespace) — prevents false-positive matches like
  '2 x CD, Compilation, Remastered' being parsed as track 2.
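
A sketch of a pattern with a mandatory separator. The regex and the name parse_numbered_line are assumptions; the actual "pattern 2" in hint_extractor is not reproduced in this log.

```python
import re

# Track number, then one of '.', ')' or ':' (bare whitespace is not enough), then the title.
_NUMBERED_LINE_RE = re.compile(r"^\s*(\d{1,3})\s*[.):]\s*(\S.*)$")


def parse_numbered_line(line):
    """Return (track_no, title) for lines like '2. Title', else None."""
    m = _NUMBERED_LINE_RE.match(line)
    if not m:
        return None
    return int(m.group(1)), m.group(2).rstrip()
```

With this pattern, '2. Song Title' parses as track 2, while '2 x CD, Compilation, Remastered' is rejected because no separator follows the number.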

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
d1391fc36a Robust tracklist matching: fuzzy titles, catalog numbers, correct disc/track
hint_extractor:
- _norm_for_match(): strips all non-alnum for punctuation-agnostic comparison
- _catalog_key(): extracts BWV/Op./K./HWV/... catalog number for matching
  (fixes abbreviated filenames like "Fantasia_Cm_BWV_562" vs "Fantasia In C Minor, BWV 562")
- Matching priority: exact number+disc → exact title → fuzzy title → catalog number
- Tracklist disc+track OVERRIDE M3U position when a match is found
  (M3U is only used as last fallback; fixes wrong alphabetical ordering)
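
The two helpers might be sketched like this. The bodies are illustrative guesses: the catalog prefixes are limited to the four named above (the "..." in the message is left unfilled), and the regex is an assumption.

```python
import re

# A catalog prefix must not be preceded by a letter; '_' and '.' may separate it from the number.
_CATALOG_RE = re.compile(r"(?<![A-Za-z])(BWV|HWV|Op|K)[\s._]*(\d+)", re.IGNORECASE)


def norm_for_match(title):
    """Lowercase and drop every non-alphanumeric character."""
    return "".join(c for c in title.casefold() if c.isalnum())


def catalog_key(text):
    """Return e.g. ('BWV', 562) when a catalog number is present, else None."""
    m = _CATALOG_RE.search(text)
    return (m.group(1).upper(), int(m.group(2))) if m else None
```

Both "Fantasia_Cm_BWV_562" and "Fantasia In C Minor, BWV 562" produce the key ('BWV', 562), so the abbreviated filename can still be matched after exact and fuzzy title comparison have failed.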

metadata_resolver:
- LLM prompt now defines artist/albumartist roles explicitly
  (artist = composer for classical; albumartist = performer/interpreter)
- LLM albumartist can override dir_artist when confidence < 0.4
- _build_track_proposals: when track artist == albumartist (performer from filename),
  composer (album-level artist) is used as track artist instead
- Tracklist header (first lines before tracks) included in LLM prompt
  for label/year/album-title discovery
- import re added (was missing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
d91eb36007 fix: correct track numbering, scanner recursion, M3U ordering
scanner: do not descend into subfolders when the root contains audio files
  (prevents double scans caused by accidental subfolder copies); for multi-CD
  albums, only disc folders (CD1, Disc 2…) are recursed into.

hint_extractor: M3U/playlist files as a source of track order; BOM
  stripping; tracklist matching by title as well (not only by number);
  tracknumber=0 is treated as 'no number'.

metadata_resolver: sequential fallback numbering (1,2,3…) for tracks
  without a track number, which prevents a '00' prefix on --rename; dir_artist
  takes precedence over the 'Various Artists' heuristic; at confidence <0.3 the
  LLM may also correct existing values (typos in the directory name).
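
The sequential fallback numbering (1,2,3…) can be illustrated with a short sketch. The function name assign_fallback_numbers and the rule of skipping numbers already in use are assumptions; 0 is treated as missing, in line with the hint_extractor note above.

```python
def assign_fallback_numbers(numbers):
    """Replace missing track numbers (None or 0) with the next unused 1, 2, 3, ..."""
    used = {n for n in numbers if n}
    result = []
    candidate = 1
    for n in numbers:
        if n:                      # keep existing numbers untouched
            result.append(n)
            continue
        while candidate in used:   # skip numbers that are already taken
            candidate += 1
        result.append(candidate)
        used.add(candidate)
    return result
```

For instance, assign_fallback_numbers([0, 2, None]) returns [1, 2, 3], so no track ends up with a '00' prefix.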

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
f7cf520dbe Initial implementation of Music Metadata Enricher
AI-powered per-album pipeline: scan → local hints → MusicBrainz/Discogs/Claude
resolve → cover art → interactive or auto review → tag write + rename + report.
All external dependencies optional; 17/17 unit tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00