feat: /plan, /cancel, /continue, /discard + context 262144 + KV cache q4_0

- New commands: /plan (planning mode, PLAN.md only), /cancel (abort the loop), /continue (resume after an interruption), /discard (discard PLAN.md)
- contextWindow in models.json and on the llama.cpp servers: 131072 → 262144
- KV cache: q8_0 → q4_0 (less VRAM; lets the 262k context fit on 2× 3090)
- parallel: 2 → 1 for the coder (more stable with a large context)
- Optimize status with an ASCII progress bar and a blocker preview
- cancelRequested flag is checked after every loop step

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
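
The cancelRequested check in the last bullet is a cooperative cancellation pattern: the flag is only inspected between steps, so a running step always finishes before the loop stops. The sketch below illustrates that pattern in POSIX shell. It is not the project's code; the names `cancel_requested`, `request_cancel`, and `run_loop`, and the `cancel_at` parameter used to simulate an incoming /cancel, are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a cooperative cancel check; not the project's code.

cancel_requested=0                 # stands in for the cancelRequested flag

request_cancel() {                 # what a /cancel command would trigger
  cancel_requested=1
}

# run_loop MAX_STEPS CANCEL_AT
# CANCEL_AT simulates a /cancel arriving during that step (0 = never).
run_loop() {
  max_steps=$1
  cancel_at=$2
  step=1
  while [ "$step" -le "$max_steps" ]; do
    # ... one unit of work would run here ...
    if [ "$step" -eq "$cancel_at" ]; then
      request_cancel
    fi
    # The flag is inspected only after the step has finished.
    if [ "$cancel_requested" -eq 1 ]; then
      echo "cancelled after step $step"
      return 1
    fi
    step=$((step + 1))
  done
  echo "completed $max_steps steps"
}
```

Checking only at step boundaries keeps each step atomic, at the cost that a long-running step delays the cancellation until it returns.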
2026-05-20 20:02:20 +02:00 · 4a31535b76
commit 4a31535b76
parent b19c189e2e
4 changed files with 126 additions and 26 deletions
--- a/start-coder.sh
+++ b/start-coder.sh
@@ -31,7 +31,7 @@ docker run -d \
  "$IMAGE" \
    -m "/hf_home/${MODEL_REL_PATH}" \
    --alias "${MODEL_ALIAS}" \
-    -c 131072 \
+    -c 262144 \
    -n 16384 \
    --jinja \
    --no-context-shift \
@@ -45,11 +45,11 @@ docker run -d \
    -ngl 999 \
    -fa on \
    --kv-unified \
-    --cache-type-k q8_0 \
-    --cache-type-v q8_0 \
+    --cache-type-k q4_0 \
+    --cache-type-v q4_0 \
    --batch-size 1024 \
    --ubatch-size 512 \
-    --parallel 2 \
+    --parallel 1 \
    --cont-batching \
    --host 0.0.0.0 \
    --port "$CONTAINER_PORT"
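
The context and cache-type changes above trade cache precision for context length. A rough way to see the effect is to estimate the KV cache size before and after. The sketch below assumes a plain transformer in which every layer keeps a full K and V cache. The model shape (`LAYERS`, `KV_HEADS`, `HEAD_DIM`) is hypothetical and not taken from this commit. The block sizes come from ggml's formats: q8_0 stores 32 elements in 34 bytes, q4_0 stores 32 elements in 18 bytes.

```shell
#!/bin/sh
# Rough KV cache estimate; the model shape below is HYPOTHETICAL.

# kv_cache_bytes LAYERS CTX KV_HEADS HEAD_DIM BLOCK_BYTES
# K and V each hold CTX * KV_HEADS * HEAD_DIM elements per layer,
# stored in quantized blocks of 32 elements (BLOCK_BYTES per block).
kv_cache_bytes() {
  echo $(( 2 * $1 * $2 * $3 * $4 * $5 / 32 ))
}

# Illustrative model shape, not read from the commit or the model file.
LAYERS=48
KV_HEADS=8
HEAD_DIM=128

old=$(kv_cache_bytes "$LAYERS" 131072 "$KV_HEADS" "$HEAD_DIM" 34)  # q8_0
new=$(kv_cache_bytes "$LAYERS" 262144 "$KV_HEADS" "$HEAD_DIM" 18)  # q4_0

echo "q8_0 @ 131072 tokens: $(( old / 1048576 )) MiB"
echo "q4_0 @ 262144 tokens: $(( new / 1048576 )) MiB"
```

Under these assumptions the ratio new/old is 2 × 18 / 34 ≈ 1.06 regardless of model shape, so doubling the context while moving from q8_0 to q4_0 costs only about 6% more cache memory. The estimate ignores model weights and compute buffers, and models with sliding-window or non-attention layers will deviate from it.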