MemGPT (now letta-ai/letta)
Severity: high

The agent crashes with "Exception: Request exceeds maximum context length (e.g., 8465 > 8192 tokens)" during summarization once the conversation has grown (typically after 10-15+ minutes). Retries fail in a loop and the agent becomes unusable. Seen especially with local LLM backends such as KoboldCPP.

Root cause

During automatic message summarization, triggered by context overflow (e.g., at 70-75% of the LLM context limit), the summarization request itself exceeded the LLM's context window because: 1) the context_window parameter was not passed to summarization completion calls; 2) function_call=None raised a ValueError in the proxy layer; 3) the persistence_manager had an empty messages list; 4) summarization evicted too few messages, so the overflow immediately re-triggered and the agent looped.
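A minimal sketch of the fix for causes 1 and 4, not MemGPT's actual code: the message selector must respect the model's real context window (so the summarization request itself fits) and must evict enough messages to create genuine headroom (so summarization does not immediately re-trigger). The function names, the crude token counter, and the 0.5 target fraction are illustrative assumptions.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (e.g. tiktoken): roughly 4 chars/token.
    return max(1, len(text) // 4)

def select_messages_to_summarize(messages, context_window,
                                 summary_prompt_tokens=512,
                                 target_fraction=0.5):
    """Pick the oldest messages to fold into a summary.

    Two constraints address the reported failure modes:
    - the picked messages plus the summary prompt must fit in
      `context_window` (otherwise the summarization call overflows);
    - keep evicting until the remaining history is under
      `target_fraction` of the window, so overflow does not
      re-trigger right away (the retry loop).
    """
    budget = context_window - summary_prompt_tokens
    total = sum(count_tokens(m) for m in messages)
    target_remaining = int(context_window * target_fraction)

    to_summarize, freed = [], 0
    for msg in messages:  # oldest first
        tok = count_tokens(msg)
        if freed + tok > budget:
            break  # the summarization request itself would overflow
        to_summarize.append(msg)
        freed += tok
        if total - freed <= target_remaining:
            break  # enough headroom reclaimed
    return to_summarize

# Hypothetical 100-message history against an 8192-token window.
msgs = [f"message {i}: " + "x" * 400 for i in range(100)]
picked = select_messages_to_summarize(msgs, context_window=8192)
```

Passing the backend's true context_window (instead of a hard-coded default) into this selection, and into the summarization completion call itself, is what keeps local backends like KoboldCPP from rejecting the request.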

Tags: MemGPT, context window, overflow, summarization, local LLM, koboldcpp, ValueError function_call
