phantom-ship: revert bon to 3B model (7B too slow on CPU)

A/B-tested 7B vs 3B on a real NETTO receipt. 7B took 3.6 min/receipt
vs ~30s for 3B. Accuracy gain was minimal — 7B still picked a line
item ('ARLA SEOMELK 1.') as merchant when the OCR header was missing,
just a different one than 3B picked ('REJESALAT'). The merchant
problem isn't a model-size problem; it's an OCR problem (Tesseract
missed the NETTO logo entirely on this receipt).

Keeping both models in loadModels so we can flip back via env var
without a fresh pull.
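
Flipping back for another A/B run should only need the one variable
changed. A minimal sketch, assuming it is edited in the same environment
block that this commit touches (the enclosing service name is not shown
in the diff, so it is left out here):

```nix
# Inside bon's service environment attrset, next to BON_OLLAMA_URL.
# 7B stays listed in services.ollama.loadModels, so no new pull is needed.
BON_OLLAMA_MODEL = "qwen2.5:7b-instruct"; # A/B only: ~3.6 min/receipt on phantom-ship CPU
```

The change takes effect after the usual system rebuild; reverting to
"qwen2.5:3b-instruct" restores the default.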
dannydannydanny 2026-05-08 20:39:31 +02:00
parent ccf9eb2859
commit 814993e66b

@@ -397,16 +397,17 @@ in
 # Ollama — local LLM runtime, used by bon's structured-data extraction
 # step. Listens on 127.0.0.1:11434 only (not exposed over ZT).
-# We pre-pull both 3B and 7B Qwen2.5; bon currently runs 7B for better
-# column-parsing accuracy on receipts (3B mis-conflates qty/price
-# columns and over-eagerly nominates line items as merchants).
+# 3B is bon's default — 7B was tested but ran ~3.6 min/receipt vs ~30s
+# for 3B on phantom-ship CPU, with no real accuracy gain (still picked
+# line items as merchant on header-less OCR; that's an OCR problem,
+# not a model problem). Both kept loaded so we can A/B without a pull.
 services.ollama = {
 enable = true;
 host = "127.0.0.1";
 port = 11434;
 loadModels = [
-"qwen2.5:3b-instruct" # ~2.5 GB — kept as fast fallback
-"qwen2.5:7b-instruct" # ~4.7 GB — current default, slower but better
+"qwen2.5:3b-instruct" # ~2.5 GB — current default
+"qwen2.5:7b-instruct" # ~4.7 GB — A/B testing only
 ];
 };
@@ -441,7 +442,7 @@ in
 BON_DB_PATH = "/home/danny/.local/share/bon/bon.db";
 BON_IMAGES_DIR = "/home/danny/.local/share/bon/images";
 BON_OLLAMA_URL = "http://127.0.0.1:11434";
-BON_OLLAMA_MODEL = "qwen2.5:7b-instruct";
+BON_OLLAMA_MODEL = "qwen2.5:3b-instruct";
 };
 serviceConfig = {
 WorkingDirectory = "/home/danny/bon";