MemGPT (now letta-ai/letta): high severity
Agent crashes with "Exception: Request exceeds maximum context length (e.g., 8465 > 8192 tokens)" during summarization once a conversation grows long (10-15+ minutes). Retries fail in a loop and the agent becomes unusable. Seen especially with local LLM backends such as KoboldCPP.
Root cause
During automatic message summarization, triggered when the conversation reaches roughly 70-75% of the LLM's context limit, the summarization prompt itself exceeded the context window because:
1) the context_window parameter was not passed through to the summarization completion calls;
2) function_call=None raised a ValueError in the proxy;
3) the persistence_manager held an empty messages list;
4) summarization evicted too few messages, so the overflow recurred in a loop.
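Cause 4 can be sketched as an eviction loop that keeps removing the oldest messages until what remains fits a budget derived from the context window, instead of evicting a fixed small number. This is an illustrative reconstruction, not MemGPT's actual code; the names `evict_for_summarization`, `estimate_tokens`, and `MEMORY_WARNING_FRACTION` are hypothetical, and the character-based token estimate is a crude stand-in for a real tokenizer.

```python
# Hypothetical sketch of a summarization-safe eviction loop.
MEMORY_WARNING_FRACTION = 0.75  # assumed trigger point: ~75% of the window


def estimate_tokens(message: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(message) // 4)


def evict_for_summarization(messages, context_window):
    """Evict oldest messages until the remainder fits the token budget.

    Evicting only one or two messages at a time can leave the prompt still
    over the limit, producing the retry loop described above; evicting down
    to a fraction of context_window avoids that.
    """
    budget = int(context_window * MEMORY_WARNING_FRACTION)
    kept = list(messages)
    evicted = []
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        evicted.append(kept.pop(0))  # drop the oldest message first
    return evicted, kept
```

The same budget check is why the missing context_window parameter (cause 1) matters: without it, the summarization call has no way to size its prompt against the real window.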
Tags: MemGPT, context window, overflow, summarization, local LLM, koboldcpp, ValueError function_call