Add Vision-LLM mode for direct image-to-JSON extraction
Tesseract OCR fails on rotated/low-contrast CD back covers. New vision_llm module sends images directly to qwen3-vl via Ollama chat API, bypassing OCR entirely. Robust JSON extraction handles thinking tags, markdown blocks, and empty responses. CLI scan/process commands gain --vision flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
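The JSON extraction described in the commit message could look roughly like the sketch below. It is not the code from the vision_llm module, which this page does not show: the function name `extract_json`, the `<think>` tag name, and the decision to return `None` on failure are all assumptions made for illustration.

```python
import json
import re
from typing import Any, Optional

# Assumption: the model wraps its reasoning in <think>...</think> tags.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# Matches a markdown code fence (three backticks), optionally labelled "json".
_FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL | re.IGNORECASE)


def extract_json(raw: Optional[str]) -> Optional[Any]:
    """Return the first JSON object found in a model reply, or None.

    Hypothetical helper: handles empty replies, thinking tags, and
    markdown-fenced JSON, the three cases named in the commit message.
    """
    # Empty or whitespace-only response: nothing to parse.
    if not raw or not raw.strip():
        return None

    # Drop any thinking block so stray braces inside it are ignored.
    text = _THINK_RE.sub("", raw)

    # Prefer the contents of a fenced block if one is present.
    fenced = _FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1)

    # Parse from the first opening brace and ignore trailing chatter.
    start = text.find("{")
    if start == -1:
        return None
    try:
        obj, _end = json.JSONDecoder().raw_decode(text[start:])
    except json.JSONDecodeError:
        return None
    return obj
```

Returning `None` instead of raising lets a caller decide whether to retry the request or skip the image; the real module may handle failures differently.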
parent 686c4317d1
commit 1753ab204f
5 changed files with 359 additions and 55 deletions
1
.gitignore
vendored
@@ -11,6 +11,7 @@ dist/
*.egg

idea/
CLAUDE.md

# Virtuelle Umgebungen
.venv/