Add Vision-LLM mode for direct image-to-JSON extraction

Tesseract OCR fails on rotated/low-contrast CD back covers.
New vision_llm module sends images directly to qwen3-vl via
Ollama chat API, bypassing OCR entirely. Robust JSON extraction
handles thinking tags, markdown blocks, and empty responses.
CLI scan/process commands gain --vision flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This commit is contained in:

Dieter Schlüter

2026-02-15 01:35:05 +01:00

parent 686c4317d1

commit 1753ab204f

5 changed files with 359 additions and 55 deletions

1

.gitignore vendored

View file

 @ -11,6 +11,7 @@ dist/
 *.egg
 idea/
 CLAUDE.md
 # Virtuelle Umgebungen
 .venv/

Rows
Columns

Add Vision-LLM mode for direct image-to-JSON extraction

1 .gitignore vendored Unescape Escape View file

1

.gitignore vendored

View file