Commit graph

32 commits

cd6c0ae185 Fix crash on vinyl track positions like 'A1', 'B2' from MusicBrainz
MusicBrainz returns vinyl track positions as 'A1', 'B3', etc. instead of
plain integers. int('A1') raised a ValueError, which aborted processing of the entire album.

metadata_resolver.py: parse vinyl positions with regex before int():
- 'A1' → track 1, disc 1 (side A)
- 'B3' → track 3, disc 1 (side B)
- 'C1' → track 1, disc 2 (side C)
- Non-vinyl: extract first digit group via re.search
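The position rules above can be sketched as a small pure function. This is a minimal illustration, not the code in metadata_resolver.py: `parse_position` is a hypothetical name, and pairing sides two per disc (A/B → 1, C/D → 2, E/F → 3) extrapolates the A/B/C examples in the commit.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper; the real metadata_resolver.py code is not shown in the commit.
_VINYL_RE = re.compile(r"^([A-Za-z])(\d+)$")

def parse_position(position: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (track, disc) for a MusicBrainz position string.

    Vinyl sides are paired onto discs: A/B -> disc 1, C/D -> disc 2, ...
    Non-vinyl strings fall back to the first digit group; disc stays None.
    """
    pos = position.strip()
    m = _VINYL_RE.match(pos)
    if m:
        side = m.group(1).upper()
        track = int(m.group(2))
        disc = (ord(side) - ord("A")) // 2 + 1  # two sides per physical disc
        return track, disc
    digits = re.search(r"\d+", pos)
    return (int(digits.group()) if digits else None), None
```

Returning `(None, None)` instead of raising lets the caller skip an unparseable position, which matches the "skip gracefully" intent of the hint_extractor change.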

hint_extractor.py: guard int(tl_track) in tracklist matching with
try/except + re.search so any non-numeric track position is skipped
gracefully instead of crashing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 09:12:31 +02:00
388a9ffd08 Add --skip-complete: skip already-enriched albums in batch runs
- _album_is_complete(album_dir): checks cover presence + sampled tag quality
  (first/last/middle files); returns (bool, problems_list).
  Sampling strategy: checks the first, the last and up to 3 middle files to
  catch albums where only some tracks were tagged
- _print_status() now uses _album_is_complete() internally (DRY)
- --skip-complete flag: filters album_dirs before the main loop, prints
  how many were skipped upfront
- Typical batch command:
    python3 music_enricher.py --auto --confidence 0.1 --rename --embed-cover \
        --no-fingerprint --skip-complete ~/nvme2n1p7_home/Musik
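The commit states only which files are sampled, not how the middle ones are chosen. A minimal sketch, assuming a hypothetical `sample_files` helper and evenly spaced middle picks (the spacing rule is an assumption):

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def sample_files(files: Sequence[T], middle: int = 3) -> List[T]:
    """Pick the first, the last and up to `middle` evenly spaced inner files."""
    n = len(files)
    if n <= middle + 2:
        return list(files)  # small album: checking everything is cheap
    idx = {0, n - 1}
    for k in range(1, middle + 1):
        # Integer spacing over the whole album; always lands on 1 .. n-2 here.
        idx.add((k * (n - 1)) // (middle + 1))
    return [files[i] for i in sorted(idx)]
```

Sampling keeps `--skip-complete` fast on large libraries, at the cost of occasionally missing a single badly tagged track between the sampled positions.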

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 09:05:51 +02:00
80472653b4 Add 4 new cover/tracklist sources: MB back cover, iTunes, Last.fm, Discogs tracklist
cover_handler.py:
- _download_image(): shared helper replaces duplicated download logic
- download_back_cover(): fetches back cover from MusicBrainz CAA (/back endpoint),
  saves as back.jpg; skips if already present
- _itunes_cover_url() / download_itunes_cover(): iTunes Search API (no auth),
  requests 600x600 artwork; fallback after Discogs
- _lastfm_cover_url() / download_lastfm_cover(): Last.fm album.getinfo
  (LASTFM_API_KEY env var); last cover fallback, skips placeholder images
- resolve_cover(): extended with iTunes → Last.fm fallback chain

metadata_resolver.py:
- _discogs_get_tracklist(): fetches full Discogs release via REST API,
  parses tracklist[] including heading-based disc detection
- _lastfm_tracklist(): fetches Last.fm album.getinfo tracks (LASTFM_API_KEY)
- resolve(): uses Discogs tracklist → Last.fm tracklist as fallback when
  MusicBrainz returns no tracks; LASTFM_API_KEY added to env var block

music_enricher.py:
- process_album(): calls download_back_cover() after execute_album() when MBID known

New cover priority:  local → MusicBrainz front → Discogs → iTunes → Last.fm
New tracklist priority: local → YouTube → MusicBrainz → Discogs → Last.fm → OCR
Test suite: 29 → 33 tests (all pass)
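Both priority lists follow the same pattern: try each source in order and stop at the first one that returns something. A minimal sketch, assuming each source is wrapped as a zero-argument callable (the real resolve_cover()/resolve() signatures are not shown in the commit; `first_hit` is hypothetical):

```python
from typing import Callable, Iterable, Optional, Tuple

# Each source is (name, fetch); fetch returns a path/URL, or None when it has nothing.
Source = Tuple[str, Callable[[], Optional[str]]]

def first_hit(sources: Iterable[Source]) -> Optional[Tuple[str, str]]:
    """Try sources in priority order and return (name, result) of the first hit."""
    for name, fetch in sources:
        try:
            result = fetch()
        except Exception:
            # A failing online source (timeout, HTTP error) must not abort the chain.
            continue
        if result:
            return name, result
    return None
```

Swallowing every exception is a deliberate choice here so one flaky API never blocks the cheaper sources further down; a real implementation would probably log the error.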

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:55:17 +02:00
071f4c5e1d Expand test suite from 17 to 29 tests covering all new features
New test cases:
- UNIT_18-20: vertical tracklist parser (basic, no-duration, no false-positives)
- UNIT_21:    single-CD tracklist match without disc_number
- UNIT_22-24: genre normalization (German, English variants, titlecase)
- UNIT_25-28: _is_classical() — correct triggers and false-positive prevention
- UNIT_29:    cover normalization (Front.jpg → folder.jpg rename)

All 29/29 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:45:28 +02:00
ec8a37f313 Improve _is_classical(): genre keywords + composer list as primary signals
Previously any albumartist≠track_artist triggered classical naming, causing
false positives for jazz compilations, folk samplers, pop albums with
multiple featured artists. Now requires explicit confirmation:
- Genre contains a classical keyword (classical, baroque, opera, symphon …)
- OR track_artist name contains a known composer (Bach, Mozart, Beethoven …)
Pure name-inequality alone no longer triggers the Performer-Composer-Work schema.
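A minimal standalone stand-in for the new _is_classical() logic, which also folds in the placeholder guard from commit aaa32622b2. The keyword and composer lists are illustrative subsets only, not the project's real lists, and `is_classical` is a hypothetical name:

```python
from typing import Optional

# Illustrative subsets; the real keyword and composer lists are not in the commit.
CLASSICAL_GENRE_KEYWORDS = ("classical", "baroque", "opera", "symphon")
KNOWN_COMPOSERS = ("bach", "mozart", "beethoven")
PLACEHOLDERS = {"", "unknown", "unknown artist"}

def is_classical(genre: Optional[str], track_artist: Optional[str]) -> bool:
    """Classical naming needs explicit evidence, not just albumartist != artist."""
    artist = (track_artist or "").strip().lower()
    if artist in PLACEHOLDERS:
        return False  # placeholder artists never trigger the classical schema
    g = (genre or "").lower()
    if any(kw in g for kw in CLASSICAL_GENRE_KEYWORDS):
        return True
    return any(composer in artist for composer in KNOWN_COMPOSERS)
```

Substring matching keeps "symphon" covering "symphony" and "symphonic", but it will also fire on genres such as "symphonic metal".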

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:44:34 +02:00
031e595ff7 Add Discogs as cover source fallback after MusicBrainz
- _discogs_cover_url(): searches Discogs database/search API by artist+album,
  returns primary image URL; uses DISCOGS_TOKEN if set, else anonymous
- download_discogs_cover(): downloads and saves as folder.jpg (PNG→JPEG via PIL)
- resolve_cover() priority: local → MusicBrainz → Discogs
- music_enricher: pass artist/album to resolve_cover() for Discogs lookup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:44:04 +02:00
8212a918dd Add genre normalization: German/variant genres → canonical Jellyfin names
_GENRE_MAP translates common German genre names (Volksmusik, Schlager,
Marschmusik, Klassik …) and English variants (rhythm and blues, swing music …)
to consistent Jellyfin-friendly labels. All-upper or all-lower genres without
a mapping entry are title-cased. Applied in resolve() before building AlbumProposal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:43:12 +02:00
b54d83ecb5 Add --status flag: library health report (missing covers, bad tags)
Scans all album directories and reports:
- Albums without any cover image
- Albums where the first 3 audio files have missing/placeholder tags
  (title or artist empty, 'Unknown', 'AudioTrack')
Exits without writing anything.
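The placeholder check behind the report can be sketched as follows, assuming tags arrive as a plain dict and that placeholders are compared case-insensitively (both assumptions; `has_bad_tags` is hypothetical):

```python
from typing import Mapping, Optional

# Placeholder values named in the commit, compared case-insensitively (an assumption).
_PLACEHOLDERS = {"unknown", "audiotrack"}

def has_bad_tags(tags: Mapping[str, Optional[str]]) -> bool:
    """True when title or artist is missing, empty or a known placeholder."""
    for field in ("title", "artist"):
        value = (tags.get(field) or "").strip().lower()
        if not value or value in _PLACEHOLDERS:
            return True
    return False
```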

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:42:26 +02:00
d06c0bbcc9 Normalize all cover art to folder.jpg (Jellyfin standard)
- Add normalize_cover_to_folder_jpg(): renames/converts any local cover
  (Front.jpg, front.webp, cover.jpg, …) to folder.jpg in-place; WebP/PNG
  are converted to JPEG via PIL
- resolve_cover() now calls normalize_cover_to_folder_jpg() automatically
  after finding a local cover, so future enrichment runs always produce folder.jpg
- One-time batch: 38 existing library covers renamed/converted to folder.jpg

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:41:43 +02:00
7516de439f Save downloaded covers as folder.jpg (Jellyfin standard), PNG→JPEG on download
- download_cover() now writes folder.jpg instead of _cover_download{ext}
- PNG responses are converted to JPEG via PIL during download (avoids PNG
  in the album directory entirely)
- find_local_cover() priority: folder > front > cover > album (folder.jpg
  is now the canonical name for both downloaded and manually placed covers)
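A minimal sketch of the folder > front > cover > album priority. The name mirrors find_local_cover(), but the signature, the case-insensitive matching and the accepted extensions are assumptions:

```python
from pathlib import Path
from typing import Optional

# Stem priority from the commit; the accepted extensions are an assumption.
_STEMS = ("folder", "front", "cover", "album")
_EXTS = (".jpg", ".jpeg", ".png", ".webp")

def find_local_cover(album_dir: Path) -> Optional[Path]:
    """Return the highest-priority cover file in album_dir (names matched case-insensitively)."""
    by_name = {p.name.lower(): p for p in album_dir.iterdir() if p.is_file()}
    for stem in _STEMS:
        for ext in _EXTS:
            hit = by_name.get(stem + ext)
            if hit is not None:
                return hit
    return None
```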

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:26:33 +02:00
aaa32622b2 Fix 'Unknown' track artists leaking from bad ID3 tags and classical schema
- hint_extractor: filter existing tags through _is_good() so 'Unknown',
  'Unknown Artist' etc. in existing ID3 tags don't override filename-parsed
  artist names
- executor: _is_classical() now returns False when track_artist is a placeholder
  ('unknown', 'unknown artist') — prevents pop albums from getting the
  Performer-Composer-Work filename schema
- executor/music_enricher: pass albumartist to _proposed_filename() so fallback
  works when track artist is missing; fix display to use albumartist fallback too

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:15:46 +02:00
b64a4d0922 Fix disc prefix for single-CD albums (disc=1 must not produce '1-TT' filenames)
disc_number=1 is now treated identically to disc_number=None: no 'D-' prefix in
filenames, no discnumber tag written. The D-TT prefix and discnumber tag are
only applied for genuine multi-CD albums (disc_number > 1).
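The rule reduces to a small prefix function. This sketch (hypothetical `track_prefix`) implements the rule exactly as worded, so disc 1 of a multi-CD set also gets a plain 'TT' prefix; telling that case apart would need an album-level disc count, which the commit does not mention.

```python
from typing import Optional

def track_prefix(track: int, disc: Optional[int]) -> str:
    """'TT' when disc is None or 1; 'D-TT' only when disc > 1."""
    if disc is not None and disc > 1:
        return f"{disc}-{track:02d}"
    return f"{track:02d}"
```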

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:05:49 +02:00
701b05a75d Fix Jellyfin playlist integration and tracklist matching for single-CD albums
- hint_extractor: add _normalize_vertical_tracklist() to handle bare-number/
  title/duration format (Tufaranka-style tracklists)
- hint_extractor: fix level-1 tracklist match — allow disc_num=None (single-CD)
  by assuming disc=1; previously no tracklist title was ever applied to single-
  CD tracks because the guard required disc_num to be set
- music_enricher: register module in sys.modules before exec_module() so
  @dataclass definitions in jellyfin_playlist_generator work correctly
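The sys.modules fix follows the standard importlib recipe for loading a source file by path. A minimal sketch with a hypothetical `load_module_from_path` helper; the real music_enricher code may differ in naming and error handling:

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType

def load_module_from_path(name: str, path: Path) -> ModuleType:
    """Import a .py file by path, registering it in sys.modules before executing it.

    dataclasses may look up the defining module via sys.modules (e.g. to resolve
    string annotations); if the module is not registered yet, class creation can fail.
    """
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # must happen before exec_module()
    try:
        spec.loader.exec_module(module)
    except Exception:
        sys.modules.pop(name, None)  # don't leave a half-initialised module behind
        raise
    return module
```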

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 07:58:41 +02:00
776c977573 Recursive album discovery + Jellyfin Playlist Generator integration
scanner.py: collect_album_dirs() now recursively finds album dirs
- Dirs with audio files at root → album
- Dirs with disc subdirs (CD1/CD2) and no root audio → multi-CD album
- Container dirs without audio → recurse into subdirs
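The three discovery rules, together with the later "do not descend below a directory that already contains audio" fix from commit d91eb36007, can be sketched recursively. The name mirrors collect_album_dirs(), but the audio extension set and the disc-folder regex are assumptions:

```python
import re
from pathlib import Path
from typing import List

# Extension set and disc-folder pattern are assumptions, not the project's values.
AUDIO_EXTS = {".mp3", ".flac", ".m4a", ".ogg", ".opus", ".wav"}
DISC_RE = re.compile(r"^(cd|disc|disk)\s*\d+$", re.IGNORECASE)

def _has_audio(d: Path) -> bool:
    return any(p.is_file() and p.suffix.lower() in AUDIO_EXTS for p in d.iterdir())

def collect_album_dirs(root: Path) -> List[Path]:
    """Recursively collect album directories below (and including) root."""
    if _has_audio(root):
        return [root]  # audio at the root: album, do not descend further
    subdirs = sorted(p for p in root.iterdir() if p.is_dir())
    if any(DISC_RE.match(p.name) for p in subdirs):
        return [root]  # disc subfolders without root audio: multi-CD album
    albums: List[Path] = []
    for sub in subdirs:
        albums.extend(collect_album_dirs(sub))  # container dir: recurse
    return albums
```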

music_enricher.py:
- After execute_album(), auto-discovers jellyfin_playlist_generator.py
  in ../Jellyfin_Playlist_Generator/ (or via --playlist-generator PATH)
- Calls generate_playlist() directly via importlib — no subprocess,
  no destructive cleanup_all_playlists, targeted to the enriched album
- New --playlist-generator CLI option for custom generator path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 07:07:55 +02:00
1cb5a8fb8d Add BEDIENUNGSANLEITUNG.md (German user manual)
Covers: album directory structure, YouTube ID handling, typical workflows
(dry-run → live), all CLI options, filename schemas, confidence levels,
error handling, backup/restore, and environment variables.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 06:52:44 +02:00
5c7b6759ff Update README: Ollama/OpenRouter LLM, OCR, YouTube, WebP, underscore schema
- Replace Claude API references with Ollama → OpenRouter chain
- Add YouTube ID detection, OCR back cover, WebP cover support
- Fix filename schema examples (spaces → underscores, _-_ separator)
- Add classical naming schema with Performer/Composer distinction
- Add Ollama env vars (OLLAMA_HOST, OLLAMA_RESOLVE_MODEL, OLLAMA_OCR_MODEL)
- Update pipeline diagram with OCR and YouTube steps
- Add yt-dlp to prerequisites

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 06:03:39 +02:00
787803bb7b Fix file permissions after rebase (644 → 755)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 06:01:38 +02:00
1960989eef Fix YouTube ID detection: use last _-token instead of broken lookbehind regex
The previous regex lookbehind (?<![A-Za-z0-9_-]) excluded _ as a valid preceding
character, so IDs after underscores were never matched. New approach: split the stem
by _ and check if the last token is an 11-char YouTube ID (mixed case + digit).
Also strips the ID token from the stem before _parse_filename() to prevent it
from leaking into the track title or being misread as an artist-title separator.
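A minimal sketch of the last-token approach, with a hypothetical `split_youtube_id` helper. "Mixed case + digit" is read here as at least one uppercase letter, one lowercase letter and one digit, which is an interpretation. Real YouTube IDs may themselves contain "_", and splitting on "_" cannot detect those.

```python
import re
from typing import Optional, Tuple

# 11 chars from the URL-safe alphabet minus "_" (the stem is already split on "_").
_YT_TOKEN = re.compile(r"^[A-Za-z0-9-]{11}$")

def split_youtube_id(stem: str) -> Tuple[str, Optional[str]]:
    """Return (stem_without_id, video_id) based on the last _-separated token."""
    head, sep, last = stem.rpartition("_")
    if not sep:
        return stem, None
    looks_like_id = (
        _YT_TOKEN.match(last) is not None
        and any(c.isupper() for c in last)
        and any(c.islower() for c in last)
        and any(c.isdigit() for c in last)
    )
    # Strip the ID so it cannot leak into the title passed to _parse_filename().
    return (head, last) if looks_like_id else (stem, None)
```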

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:57:27 +02:00
f86db982a5 Support WebP cover images: convert to JPEG via PIL, correct MIME type fallback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:50:46 +02:00
b2dd0df052 Add project-specific .gitignore entries
Exclude: __pycache__ dirs, unnamed.git, report.csv, backups, generated playlists, temp files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:44:05 +02:00
b6abfae16c Add YouTube ID detection and metadata lookup via yt-dlp
- Extract 11-char YouTube video IDs from audio filenames
- Fetch title, uploader, chapters via yt-dlp (--dump-json)
- Use chapters as tracklist when no .txt tracklist is available
- Store yt_title / yt_uploader in AlbumHints for LLM prompt context
- Fall back to YouTube video title as track title for single-file albums

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
888464b4d0 Regenerate M3U playlist after rename with correct order and durations
_update_m3u(): writes #EXTM3U + #EXTINF:seconds,Artist - Title + filename
per track, in disc/track order (same order as the renamed files).
Duration is read from mutagen; -1 if unavailable.

execute_album(): after renaming, finds existing *.m3u / *.m3u8 in the
album directory and overwrites it. Only triggered when files_renamed > 0
and a playlist file exists — never creates a new one from scratch.
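The playlist text itself can be produced by a pure function, which keeps the "only overwrite an existing playlist after renames" decision separate from formatting. A minimal sketch with a hypothetical `render_m3u`; durations are assumed to be floats such as mutagen's info.length, rounded to whole seconds:

```python
from typing import Iterable, Optional, Tuple

# (filename, artist, title, duration_seconds or None)
Track = Tuple[str, str, str, Optional[float]]

def render_m3u(tracks: Iterable[Track]) -> str:
    """Build extended-M3U text; tracks must already be in disc/track order."""
    lines = ["#EXTM3U"]
    for filename, artist, title, seconds in tracks:
        duration = round(seconds) if seconds is not None else -1  # -1 = unknown
        lines.append(f"#EXTINF:{duration},{artist} - {title}")
        lines.append(filename)
    return "\n".join(lines) + "\n"
```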

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
40a2ef3fb6 Add OCR fallback via Ollama Vision for albums without tracklist text
hint_extractor: _ocr_back_cover() sends back/inlay images to Ollama Vision
  when no tracklist .txt/.htm/.nfo is present. Model priority:
  qwen3-vl:latest → minicpm-v:latest → deepseek-ocr:latest (configurable
  via OLLAMA_OCR_MODEL env var). Timeout 180s. OCR text is fed into the
  same _parse_tracklist() pipeline as regular text files.

music_enricher: extract_hints(use_ocr=not args.no_api) — OCR is skipped
  with --no-api to allow fully offline/fast runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
28f716f8f2 Fix disc numbering consistency and false tracklist matches
executor: disc=1 now generates '1-01' prefix (same as disc=2 → '2-01'),
  so multi-disc albums have consistent D-TT scheme throughout.
  Single-disc tracks without disc tag stay as plain 'TT'.

hint_extractor: tracklist pattern 2 now requires '.' ')' or ':' as separator
  (not bare whitespace) — prevents false-positive matches like
  '2 x CD, Compilation, Remastered' being parsed as track 2.
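The stricter separator rule can be illustrated with a regex that requires '.', ')' or ':' after the number. The exact pattern in hint_extractor is not shown; this one and the `parse_track_line` helper are illustrative assumptions.

```python
import re
from typing import Optional, Tuple

# Track number followed by '.', ')' or ':'; bare whitespace is not accepted.
_TRACK_LINE = re.compile(r"^\s*(\d{1,3})\s*[.):]\s*(\S.*?)\s*$")

def parse_track_line(line: str) -> Optional[Tuple[int, str]]:
    m = _TRACK_LINE.match(line)
    if not m:
        return None
    return int(m.group(1)), m.group(2)
```

Because ':' is accepted, a line that starts with a duration such as "3:45 Title" would still match as track 3; a real parser may need an extra guard for that case.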

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
d1391fc36a Robust tracklist matching: fuzzy titles, catalog numbers, correct disc/track
hint_extractor:
- _norm_for_match(): strips all non-alnum for punctuation-agnostic comparison
- _catalog_key(): extracts BWV/Op./K./HWV/... catalog number for matching
  (fixes abbreviated filenames like "Fantasia_Cm_BWV_562" vs "Fantasia In C Minor, BWV 562")
- Matching priority: exact number+disc → exact title → fuzzy title → catalog number
- Tracklist disc+track OVERRIDE M3U position when a match is found
  (M3U is only used as last fallback; fixes wrong alphabetical ordering)
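The two normalisation helpers can be sketched as below. The names mirror _norm_for_match() and _catalog_key(), but the catalogue regex is an assumption. It uses a letter-only lookbehind instead of \b because "_" counts as a word character, so \b would not match in "Cm_BWV_562".

```python
import re
from typing import Optional

def norm_for_match(title: str) -> str:
    """Lowercase and drop every non-alphanumeric character."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

# Catalogue prefixes from the commit (BWV, Op., K., HWV); the regex itself is an assumption.
_CATALOG_RE = re.compile(r"(?<![A-Za-z])(BWV|HWV|Op|K)\.?[\s_]*(\d+[a-z]?)", re.IGNORECASE)

def catalog_key(title: str) -> Optional[str]:
    m = _CATALOG_RE.search(title)
    if not m:
        return None
    return f"{m.group(1).upper()}{m.group(2).lower()}"
```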

metadata_resolver:
- LLM prompt now defines artist/albumartist roles explicitly
  (artist = composer for classical; albumartist = performer/interpreter)
- LLM albumartist can override dir_artist when confidence < 0.4
- _build_track_proposals: when track artist == albumartist (performer from filename),
  composer (album-level artist) is used as track artist instead
- Tracklist header (first lines before tracks) included in LLM prompt
  for label/year/album-title discovery
- import re added (was missing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
5011cef4db Underscore filename schema, classical detection, NameToUnix post-processing
Pop schema:       TT_-_Artist_-_Title.ext
Classical schema: TT_-_Performer_-_Composer_-_Work[-_Orchestra_Conductor].ext
  triggered when albumartist ≠ track artist (pianist vs composer)

All spaces in names → underscores; separator _-_ between parts.
Missing parts (orchestra, conductor) are omitted.
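A minimal sketch of the joining rule with a hypothetical `build_filename` helper. It treats every part, including orchestra and conductor, as its own `_-_`-joined segment; the exact form of the optional suffix in the real executor may differ, and NameToUnix/detox cleanup is not modelled.

```python
from typing import Optional, Sequence

SEP = "_-_"

def _part(text: str) -> str:
    # All whitespace runs -> single underscores.
    return "_".join(text.split())

def build_filename(prefix: str, parts: Sequence[Optional[str]], ext: str) -> str:
    """Join non-empty parts with '_-_'; missing parts (e.g. orchestra) are omitted."""
    kept = [_part(p) for p in parts if p and p.strip()]
    return SEP.join([prefix, *kept]) + ext
```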

models.py: added conductor/orchestra optional fields to TrackProposal.
executor.py: sanitize_dir_names() tries NameToUnix first, falls back to detox.
  Called after all renames in a directory are complete.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
8bd48cf166 Include albumartist in filename; remove Claude API from LLM chain
Filename schema now: TT - AlbumArtist - TrackArtist - Title when albumartist
differs from track artist (e.g. pianist vs. composer). Identical artist → old
two-part format unchanged.

metadata_resolver: removed Claude API fallback entirely from _claude_resolve.
Chain is now Ollama (local, free) → OpenRouter (DeepSeek V3, cheap) only.

music_enricher: updated status line and use_claude flag accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
460b92aab3 Fix Invalid ID3TimeStamp error when writing date tags
Strip non-timestamp characters (BOM, invisible chars) from date/year values
both when reading existing tags in metadata_resolver and when writing in
executor. Also harden the EasyID3 except block to not wipe existing tags
when adding a missing ID3 header, and add per-field try/except in MP3 tag
writing so one bad field doesn't abort the entire track.
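A minimal sketch of the date cleanup, assuming a hypothetical `clean_date` helper and assuming that only digits, '-', ':' and 'T' are kept (the characters that appear in timestamps such as 2003, 2003-05 or 2003-05-17T12:00):

```python
import re
from typing import Optional

# Drop everything that cannot appear in an ID3 timestamp (BOMs, zero-width chars, ...).
_NON_TS = re.compile(r"[^0-9T:\-]")

def clean_date(value: Optional[str]) -> Optional[str]:
    if not value:
        return None
    cleaned = _NON_TS.sub("", value).strip("-:T")
    return cleaned or None  # None means "skip this field" rather than write garbage
```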

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
d91eb36007 fix: correct track numbering, scanner recursion, M3U order
scanner: do not descend into subfolders when the root contains audio files
  (prevents double scans of accidental subfolder copies); only disc folders
  (CD1, Disc 2…) are recursed into for multi-CD albums.

hint_extractor: M3U/playlist files as a source for track order; BOM
  cleanup; tracklist matching by title as well (not only by number);
  tracknumber=0 is treated as 'no number'.

metadata_resolver: sequential fallback numbering (1, 2, 3…) for tracks
  without a track number, which prevents a '00' prefix with --rename;
  dir_artist takes precedence over the 'Various Artists' heuristic; at
  confidence <0.3 the LLM may also correct existing values (typos in the
  directory name).
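The sequential fallback numbering in metadata_resolver can be sketched as below (hypothetical `fill_track_numbers`). Treating 0 as missing follows the hint_extractor rule in the same commit. Filling gaps with the lowest unused number, rather than the list position, is an assumption chosen to avoid duplicate prefixes when only some tracks are numbered.

```python
from typing import List, Optional

def fill_track_numbers(numbers: List[Optional[int]]) -> List[int]:
    """Replace missing or zero track numbers with the lowest unused numbers, in order."""
    used = {n for n in numbers if n}
    next_free = 1
    out: List[int] = []
    for n in numbers:
        if n:
            out.append(n)
            continue
        while next_free in used:
            next_free += 1
        used.add(next_free)
        out.append(next_free)
    return out
```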

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
c205fa8943 feat: Ollama + OpenRouter as LLM reasoning backends
_claude_resolve() now uses local Ollama (free, RTX 3090) as the first
choice, then OpenRouter/DeepSeek V3 (very cheap) and finally the
Claude API. New env vars: OPENROUTER_API_KEY, OLLAMA_RESOLVE_MODEL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
f7cf520dbe Initial implementation of Music Metadata Enricher
AI-powered per-album pipeline: scan → local hints → MusicBrainz/Discogs/Claude
resolve → cover art → interactive or auto review → tag write + rename + report.
All external dependencies optional; 17/17 unit tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
b273052f68 first commit
2026-04-29 05:26:59 +02:00