dschlueter/Music_Metadata_Enricher

Author	SHA1	Message	Date
dschlueter	1960989eef	Fix YouTube ID detection: use last _-token instead of broken lookbehind regex The previous regex lookbehind (?<![A-Za-z0-9_-]) excluded _ as valid preceding character, so IDs after underscores were never matched. New approach: split stem by _ and check if the last token is an 11-char YouTube ID (mixed case + digit). Also strips the ID token from the stem before _parse_filename() to prevent it from leaking into the track title or being misread as an artist-title separator. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:57:27 +02:00
dschlueter	f86db982a5	Support WebP cover images: convert to JPEG via PIL, correct MIME type fallback Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:50:46 +02:00
dschlueter	b2dd0df052	Add project-specific .gitignore entries Exclude: pycache dirs, unnamed.git, report.csv, backups, generated playlists, temp files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:44:05 +02:00
dschlueter	b6abfae16c	Add YouTube ID detection and metadata lookup via yt-dlp - Extract 11-char YouTube video IDs from audio filenames - Fetch title, uploader, chapters via yt-dlp (--dump-json) - Use chapters as tracklist when no .txt tracklist is available - Store yt_title / yt_uploader in AlbumHints for LLM prompt context - Fall back to YouTube video title as track title for single-file albums Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:42:03 +02:00
dschlueter	888464b4d0	Regenerate M3U playlist after rename with correct order and durations _update_m3u(): writes #EXTM3U + #EXTINF:seconds,Artist - Title + filename per track, in disc/track order (same order as the renamed files). Duration is read from mutagen; -1 if unavailable. execute_album(): after renaming, finds existing .m3u / .m3u8 in the album directory and overwrites it. Only triggered when files_renamed > 0 and a playlist file exists — never creates a new one from scratch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:42:03 +02:00
dschlueter	40a2ef3fb6	Add OCR fallback via Ollama Vision for albums without tracklist text hint_extractor: _ocr_back_cover() sends back/inlay images to Ollama Vision when no tracklist .txt/.htm/.nfo is present. Model priority: qwen3-vl:latest → minicpm-v:latest → deepseek-ocr:latest (configurable via OLLAMA_OCR_MODEL env var). Timeout 180s. OCR text is fed into the same _parse_tracklist() pipeline as regular text files. music_enricher: extract_hints(use_ocr=not args.no_api) — OCR is skipped with --no-api to allow fully offline/fast runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:42:03 +02:00
dschlueter	28f716f8f2	Fix disc numbering consistency and false tracklist matches executor: disc=1 now generates '1-01' prefix (same as disc=2 → '2-01'), so multi-disc albums have consistent D-TT scheme throughout. Single-disc tracks without disc tag stay as plain 'TT'. hint_extractor: tracklist pattern 2 now requires '.' ')' or ':' as separator (not bare whitespace) — prevents false-positive matches like '2 x CD, Compilation, Remastered' being parsed as track 2. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:42:03 +02:00
dschlueter	d1391fc36a	Robust tracklist matching: fuzzy titles, catalog numbers, correct disc/track hint_extractor: - _norm_for_match(): strips all non-alnum for punctuation-agnostic comparison - _catalog_key(): extracts BWV/Op./K./HWV/... catalog number for matching (fixes abbreviated filenames like "Fantasia_Cm_BWV_562" vs "Fantasia In C Minor, BWV 562") - Matching priority: exact number+disc → exact title → fuzzy title → catalog number - Tracklist disc+track OVERRIDE M3U position when a match is found (M3U is only used as last fallback; fixes wrong alphabetical ordering) metadata_resolver: - LLM prompt now defines artist/albumartist roles explicitly (artist = composer for classical; albumartist = performer/interpreter) - LLM albumartist can override dir_artist when confidence < 0.4 - _build_track_proposals: when track artist == albumartist (performer from filename), composer (album-level artist) is used as track artist instead - Tracklist header (first lines before tracks) included in LLM prompt for label/year/album-title discovery - import re added (was missing) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:42:03 +02:00
dschlueter	5011cef4db	Underscore filename schema, classical detection, NameToUnix post-processing Pop schema: TT_-_Artist_-_Title.ext Classical schema: TT_-_Performer_-_Komponist_-_Werk[-_Orchester_Dirigent].ext triggered when albumartist ≠ track artist (pianist vs composer) All spaces in names → underscores; separator _-_ between parts. Missing parts (orchestra, conductor) are omitted. models.py: added conductor/orchestra optional fields to TrackProposal. executor.py: sanitize_dir_names() tries NameToUnix first, falls back to detox. Called after all renames in a directory are complete. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:42:03 +02:00
dschlueter	8bd48cf166	Include albumartist in filename; remove Claude API from LLM chain Filename schema now: TT - AlbumArtist - TrackArtist - Title when albumartist differs from track artist (e.g. pianist vs. composer). Identical artist → old two-part format unchanged. metadata_resolver: removed Claude API fallback entirely from _claude_resolve. Chain is now Ollama (local, free) → OpenRouter (DeepSeek V3, cheap) only. music_enricher: updated status line and use_claude flag accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:42:03 +02:00
dschlueter	460b92aab3	Fix Invalid ID3TimeStamp error when writing date tags Strip non-timestamp characters (BOM, invisible chars) from date/year values both when reading existing tags in metadata_resolver and when writing in executor. Also harden the EasyID3 except block to not wipe existing tags when adding a missing ID3 header, and add per-field try/except in MP3 tag writing so one bad field doesn't abort the entire track. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:42:03 +02:00
dschlueter	d91eb36007	fix: korrekte Track-Nummerierung, Scanner-Rekursion, M3U-Reihenfolge scanner: nicht in Unterordner wenn Root Audio-Dateien enthält (verhindert Doppel-Scan bei versehentlichen Unterordner-Kopien); nur Disc-Ordner (CD1, Disc 2…) werden bei Multi-CD-Alben rekursiert. hint_extractor: M3U/Playlist-Dateien als Track-Reihenfolge-Quelle; BOM- Bereinigung; Tracklist-Matching auch per Titel (nicht nur per Nummer); tracknumber=0 wird als 'keine Nummer' gewertet. metadata_resolver: sequenzielle Fallback-Nummerierung (1,2,3…) für Tracks ohne Tracknummer — verhindert '00'-Präfix beim --rename; dir_artist hat Vorrang vor 'Various Artists'-Heuristik; LLM darf bei Konfidenz <0.3 auch bestehende Werte korrigieren (Tippfehler im Verzeichnisnamen). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:42:03 +02:00
dschlueter	c205fa8943	feat: Ollama + OpenRouter als LLM-Reasoning-Backends _claude_resolve() nutzt jetzt Ollama lokal (kostenlos, RTX 3090) als erste Wahl, dann OpenRouter/DeepSeek V3 (sehr günstig) und zuletzt Claude API. Neue ENV-Variablen: OPENROUTER_API_KEY, OLLAMA_RESOLVE_MODEL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:42:03 +02:00
dschlueter	f7cf520dbe	Initial implementation of Music Metadata Enricher AI-powered per-album pipeline: scan → local hints → MusicBrainz/Discogs/Claude resolve → cover art → interactive or auto review → tag write + rename + report. All external dependencies optional; 17/17 unit tests passing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 05:42:03 +02:00
dschlueter	b273052f68	first commit	2026-04-29 05:26:59 +02:00

15 commits