Medium severity: Guidance templates with the llama.cpp Phi-3 backend

When Guidance templates are used with Phi-3 models through the llama.cpp backend, repeated insertion of deterministic template text accumulates leading whitespace in the rendered text and tokens. This both produces incorrect output and invalidates the KV cache, causing a performance regression.

Root cause

A bug in llama.cpp's implementation of the Phi-3 tokenizer (present around commit fd5ea0f) causes whitespace tokens to accumulate across repeated tokenize/detokenize cycles, diverging from the behavior of the Hugging Face Transformers tokenizer. Guidance's template-insertion workflow round-trips text through the tokenizer, so each insertion adds another leading space, invalidating the KV cache and degrading performance.
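The failure mode can be illustrated with a toy model (not the actual llama.cpp code): a tokenizer that prepends a SentencePiece-style dummy leading space on every encode, paired with a detokenizer that never strips it. Because Guidance re-tokenizes previously detokenized text on each template insertion, the leading whitespace grows with every cycle; the function names below are illustrative only.

```python
def buggy_tokenize(text: str) -> list[str]:
    # Toy stand-in for the buggy tokenizer: a dummy leading space
    # (SentencePiece convention) is prepended on every encode.
    # Each character serves as a "token" for illustration.
    return list(" " + text)

def detokenize(tokens: list[str]) -> str:
    # Decode never strips the dummy prefix, so it leaks into the text.
    return "".join(tokens)

text = "template part"
for cycle in range(3):
    text = detokenize(buggy_tokenize(text))
    print(repr(text))
# Each tokenize/detokenize round trip adds one more leading space:
# ' template part', '  template part', '   template part'
```

Because the token sequence changes at its very start on each cycle, any KV cache keyed on a shared token prefix no longer matches and must be recomputed, which is the performance regression described above. A correct detokenizer strips the dummy prefix so that `detokenize(tokenize(text)) == text`.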

Tags: llama.cpp, guidance-ai, phi-3, tokenizer, whitespace, tokenize, detokenize, kv-cache
