dschlueter/Musiksammlung

Fork 0

Commit graph

Author	SHA1	Message	Date
dschlueter	b599c9eb8a	Fix default model, increase timeout, improve multi-column prompt - Change default text-LLM from llama3 (not installed) to gemma3:12b - Increase LLM timeout from 120s to 300s (large models need longer) - Add explicit multi-column layout instruction to vision prompt to prevent skipping columns on dense CD back-cover tracklists Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 22:56:02 +01:00
dschlueter	1753ab204f	Add Vision-LLM mode for direct image-to-JSON extraction Tesseract OCR fails on rotated/low-contrast CD back covers. New vision_llm module sends images directly to qwen3-vl via Ollama chat API, bypassing OCR entirely. Robust JSON extraction handles thinking tags, markdown blocks, and empty responses. CLI scan/process commands gain --vision flag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 01:35:05 +01:00

Author

SHA1

Message

Date

dschlueter

b599c9eb8a

Fix default model, increase timeout, improve multi-column prompt

- Change default text-LLM from llama3 (not installed) to gemma3:12b
- Increase LLM timeout from 120s to 300s (large models need longer)
- Add explicit multi-column layout instruction to vision prompt to
  prevent skipping columns on dense CD back-cover tracklists

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-17 22:56:02 +01:00

dschlueter

1753ab204f

Add Vision-LLM mode for direct image-to-JSON extraction

Tesseract OCR fails on rotated/low-contrast CD back covers.
New vision_llm module sends images directly to qwen3-vl via
Ollama chat API, bypassing OCR entirely. Robust JSON extraction
handles thinking tags, markdown blocks, and empty responses.
CLI scan/process commands gain --vision flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-15 01:35:05 +01:00

2 commits