Commit graph

27 commits

Author SHA1 Message Date
031e595ff7 Add Discogs as cover source fallback after MusicBrainz
- _discogs_cover_url(): searches Discogs database/search API by artist+album,
  returns primary image URL; uses DISCOGS_TOKEN if set, else anonymous
- download_discogs_cover(): downloads and saves as folder.jpg (PNG→JPEG via PIL)
- resolve_cover() priority: local → MusicBrainz → Discogs
- music_enricher: pass artist/album to resolve_cover() for Discogs lookup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:44:04 +02:00
8212a918dd Add genre normalization: German/variant genres → canonical Jellyfin names
_GENRE_MAP translates common German genre names (Volksmusik, Schlager,
Marschmusik, Klassik …) and English variants (rhythm and blues, swing music …)
to consistent Jellyfin-friendly labels. All-upper or all-lower genres without
a mapping entry are title-cased. Applied in resolve() before building AlbumProposal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:43:12 +02:00
b54d83ecb5 Add --status flag: library health report (missing covers, bad tags)
Scans all album directories and reports:
- Albums without any cover image
- Albums where the first 3 audio files have missing/placeholder tags
  (title or artist empty, 'Unknown', 'AudioTrack')
Exits without writing anything.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:42:26 +02:00
d06c0bbcc9 Normalize all cover art to folder.jpg (Jellyfin standard)
- Add normalize_cover_to_folder_jpg(): renames/converts any local cover
  (Front.jpg, front.webp, cover.jpg, …) to folder.jpg in-place; WebP/PNG
  are converted to JPEG via PIL
- resolve_cover() now calls normalize_cover_to_folder_jpg() automatically
  after finding a local cover, so future enrichment runs always produce folder.jpg
- One-time batch: 38 existing library covers renamed/converted to folder.jpg

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:41:43 +02:00
7516de439f Save downloaded covers as folder.jpg (Jellyfin standard), PNG→JPEG on download
- download_cover() now writes folder.jpg instead of _cover_download{ext}
- PNG responses are converted to JPEG via PIL during download (avoids PNG
  in the album directory entirely)
- find_local_cover() priority: folder > front > cover > album (folder.jpg
  is now the canonical name for both downloaded and manually placed covers)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:26:33 +02:00
aaa32622b2 Fix 'Unknown' track artists leaking from bad ID3 tags and classical schema
- hint_extractor: filter existing tags through _is_good() so 'Unknown',
  'Unknown Artist' etc. in existing ID3 tags don't override filename-parsed
  artist names
- executor: _is_classical() now returns False when track_artist is a placeholder
  ('unknown', 'unknown artist') — prevents pop albums from getting the
  Performer-Composer-Work filename schema
- executor/music_enricher: pass albumartist to _proposed_filename() so fallback
  works when track artist is missing; fix display to use albumartist fallback too

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:15:46 +02:00
b64a4d0922 Fix disc prefix for single-CD albums (disc=1 must not produce '1-TT' filenames)
disc_number=1 is now treated identical to disc_number=None: no 'D-' prefix in
filenames, no discnumber tag written. The D-TT prefix and discnumber tag are
only applied for genuine multi-CD albums (disc_number > 1).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:05:49 +02:00
701b05a75d Fix Jellyfin playlist integration and tracklist matching for single-CD albums
- hint_extractor: add _normalize_vertical_tracklist() to handle bare-number/
  title/duration format (Tufaranka-style tracklists)
- hint_extractor: fix level-1 tracklist match — allow disc_num=None (single-CD)
  by assuming disc=1; previously no tracklist title was ever applied to single-
  CD tracks because the guard required disc_num to be set
- music_enricher: register module in sys.modules before exec_module() so
  @dataclass definitions in jellyfin_playlist_generator work correctly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 07:58:41 +02:00
776c977573 Recursive album discovery + Jellyfin Playlist Generator integration
scanner.py: collect_album_dirs() now recursively finds album dirs
- Dirs with audio files at root → album
- Dirs with disc subdirs (CD1/CD2) and no root audio → multi-CD album
- Container dirs without audio → recurse into subdirs

music_enricher.py:
- After execute_album(), auto-discovers jellyfin_playlist_generator.py
  in ../Jellyfin_Playlist_Generator/ (or via --playlist-generator PATH)
- Calls generate_playlist() directly via importlib — no subprocess,
  no destructive cleanup_all_playlists, targeted to the enriched album
- New --playlist-generator CLI option for custom generator path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 07:07:55 +02:00
1cb5a8fb8d Add BEDIENUNGSANLEITUNG.md (German user manual)
Covers: album directory structure, YouTube ID handling, typical workflows
(dry-run → live), all CLI options, filename schemas, confidence levels,
error handling, backup/restore, and environment variables.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 06:52:44 +02:00
5c7b6759ff Update README: Ollama/OpenRouter LLM, OCR, YouTube, WebP, underscore schema
- Replace Claude API references with Ollama → OpenRouter chain
- Add YouTube ID detection, OCR back cover, WebP cover support
- Fix filename schema examples (spaces → underscores, _-_ separator)
- Add classical naming schema with Performer/Composer distinction
- Add Ollama env vars (OLLAMA_HOST, OLLAMA_RESOLVE_MODEL, OLLAMA_OCR_MODEL)
- Update pipeline diagram with OCR and YouTube steps
- Add yt-dlp to prerequisites

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 06:03:39 +02:00
787803bb7b Fix file permissions after rebase (644 → 755)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 06:01:38 +02:00
1960989eef Fix YouTube ID detection: use last _-token instead of broken lookbehind regex
The previous regex lookbehind (?<![A-Za-z0-9_-]) excluded _ as valid preceding
character, so IDs after underscores were never matched. New approach: split stem
by _ and check if the last token is an 11-char YouTube ID (mixed case + digit).
Also strips the ID token from the stem before _parse_filename() to prevent it
from leaking into the track title or being misread as an artist-title separator.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:57:27 +02:00
f86db982a5 Support WebP cover images: convert to JPEG via PIL, correct MIME type fallback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:50:46 +02:00
b2dd0df052 Add project-specific .gitignore entries
Exclude: pycache dirs, unnamed.git, report.csv, backups, generated playlists, temp files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:44:05 +02:00
b6abfae16c Add YouTube ID detection and metadata lookup via yt-dlp
- Extract 11-char YouTube video IDs from audio filenames
- Fetch title, uploader, chapters via yt-dlp (--dump-json)
- Use chapters as tracklist when no .txt tracklist is available
- Store yt_title / yt_uploader in AlbumHints for LLM prompt context
- Fall back to YouTube video title as track title for single-file albums

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
888464b4d0 Regenerate M3U playlist after rename with correct order and durations
_update_m3u(): writes #EXTM3U + #EXTINF:seconds,Artist - Title + filename
per track, in disc/track order (same order as the renamed files).
Duration is read from mutagen; -1 if unavailable.

execute_album(): after renaming, finds existing *.m3u / *.m3u8 in the
album directory and overwrites it. Only triggered when files_renamed > 0
and a playlist file exists — never creates a new one from scratch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
40a2ef3fb6 Add OCR fallback via Ollama Vision for albums without tracklist text
hint_extractor: _ocr_back_cover() sends back/inlay images to Ollama Vision
  when no tracklist .txt/.htm/.nfo is present. Model priority:
  qwen3-vl:latest → minicpm-v:latest → deepseek-ocr:latest (configurable
  via OLLAMA_OCR_MODEL env var). Timeout 180s. OCR text is fed into the
  same _parse_tracklist() pipeline as regular text files.

music_enricher: extract_hints(use_ocr=not args.no_api) — OCR is skipped
  with --no-api to allow fully offline/fast runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
28f716f8f2 Fix disc numbering consistency and false tracklist matches
executor: disc=1 now generates '1-01' prefix (same as disc=2 → '2-01'),
  so multi-disc albums have consistent D-TT scheme throughout.
  Single-disc tracks without disc tag stay as plain 'TT'.

hint_extractor: tracklist pattern 2 now requires '.' ')' or ':' as separator
  (not bare whitespace) — prevents false-positive matches like
  '2 x CD, Compilation, Remastered' being parsed as track 2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
d1391fc36a Robust tracklist matching: fuzzy titles, catalog numbers, correct disc/track
hint_extractor:
- _norm_for_match(): strips all non-alnum for punctuation-agnostic comparison
- _catalog_key(): extracts BWV/Op./K./HWV/... catalog number for matching
  (fixes abbreviated filenames like "Fantasia_Cm_BWV_562" vs "Fantasia In C Minor, BWV 562")
- Matching priority: exact number+disc → exact title → fuzzy title → catalog number
- Tracklist disc+track OVERRIDE M3U position when a match is found
  (M3U is only used as last fallback; fixes wrong alphabetical ordering)

metadata_resolver:
- LLM prompt now defines artist/albumartist roles explicitly
  (artist = composer for classical; albumartist = performer/interpreter)
- LLM albumartist can override dir_artist when confidence < 0.4
- _build_track_proposals: when track artist == albumartist (performer from filename),
  composer (album-level artist) is used as track artist instead
- Tracklist header (first lines before tracks) included in LLM prompt
  for label/year/album-title discovery
- import re added (was missing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
5011cef4db Underscore filename schema, classical detection, NameToUnix post-processing
Pop schema:      TT_-_Artist_-_Title.ext
Classical schema: TT_-_Performer_-_Komponist_-_Werk[-_Orchester_Dirigent].ext
  triggered when albumartist ≠ track artist (pianist vs composer)

All spaces in names → underscores; separator _-_ between parts.
Missing parts (orchestra, conductor) are omitted.

models.py: added conductor/orchestra optional fields to TrackProposal.
executor.py: sanitize_dir_names() tries NameToUnix first, falls back to detox.
  Called after all renames in a directory are complete.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
8bd48cf166 Include albumartist in filename; remove Claude API from LLM chain
Filename schema now: TT - AlbumArtist - TrackArtist - Title when albumartist
differs from track artist (e.g. pianist vs. composer). Identical artist → old
two-part format unchanged.

metadata_resolver: removed Claude API fallback entirely from _claude_resolve.
Chain is now Ollama (local, free) → OpenRouter (DeepSeek V3, cheap) only.

music_enricher: updated status line and use_claude flag accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
460b92aab3 Fix Invalid ID3TimeStamp error when writing date tags
Strip non-timestamp characters (BOM, invisible chars) from date/year values
both when reading existing tags in metadata_resolver and when writing in
executor. Also harden the EasyID3 except block to not wipe existing tags
when adding a missing ID3 header, and add per-field try/except in MP3 tag
writing so one bad field doesn't abort the entire track.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
d91eb36007 fix: korrekte Track-Nummerierung, Scanner-Rekursion, M3U-Reihenfolge
scanner: nicht in Unterordner wenn Root Audio-Dateien enthält (verhindert
  Doppel-Scan bei versehentlichen Unterordner-Kopien); nur Disc-Ordner
  (CD1, Disc 2…) werden bei Multi-CD-Alben rekursiert.

hint_extractor: M3U/Playlist-Dateien als Track-Reihenfolge-Quelle; BOM-
  Bereinigung; Tracklist-Matching auch per Titel (nicht nur per Nummer);
  tracknumber=0 wird als 'keine Nummer' gewertet.

metadata_resolver: sequenzielle Fallback-Nummerierung (1,2,3…) für Tracks
  ohne Tracknummer — verhindert '00'-Präfix beim --rename; dir_artist hat
  Vorrang vor 'Various Artists'-Heuristik; LLM darf bei Konfidenz <0.3
  auch bestehende Werte korrigieren (Tippfehler im Verzeichnisnamen).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
c205fa8943 feat: Ollama + OpenRouter als LLM-Reasoning-Backends
_claude_resolve() nutzt jetzt Ollama lokal (kostenlos, RTX 3090) als
erste Wahl, dann OpenRouter/DeepSeek V3 (sehr günstig) und zuletzt
Claude API. Neue ENV-Variablen: OPENROUTER_API_KEY, OLLAMA_RESOLVE_MODEL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
f7cf520dbe Initial implementation of Music Metadata Enricher
AI-powered per-album pipeline: scan → local hints → MusicBrainz/Discogs/Claude
resolve → cover art → interactive or auto review → tag write + rename + report.
All external dependencies optional; 17/17 unit tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:42:03 +02:00
b273052f68 first commit 2026-04-29 05:26:59 +02:00