High severity
Google Gemini API (generate_content_stream, Live API)

A streaming response (text or audio) cuts off mid-sentence during generation. The stream still ends normally, with turnComplete (Live API) or a finish_reason of STOP or MAX_TOKENS, but the output is incomplete (e.g., a malformed JSON field, or a sentence with no final period). The cutoff occurs randomly, or consistently on preview models, and is worse with long prompts, tools, or JSON mode.
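Because the stream ends with a normal-looking finish signal, the cutoff has to be detected from the accumulated output itself. The following is a minimal sketch of such a check, not part of any Gemini SDK: `looks_truncated` and `SENTENCE_ENDINGS` are hypothetical names, the finish reason is assumed to be passed as its string name (e.g. "STOP"), and the sentence-ending test is only a heuristic.

```python
import json

# Heuristic: plain-text answers are assumed to end with one of these characters.
SENTENCE_ENDINGS = (".", "!", "?")


def looks_truncated(text: str, finish_reason: str, expect_json: bool = False) -> bool:
    """Return True if the accumulated stream output looks cut off.

    text          -- all streamed chunks joined together
    finish_reason -- string name of the final finish reason, e.g. "STOP"
    expect_json   -- set when the response was requested in JSON mode
    """
    # MAX_TOKENS means the output was stopped by a token limit.
    if finish_reason == "MAX_TOKENS":
        return True
    if expect_json:
        # A response cut off mid-field will not parse as JSON.
        try:
            json.loads(text)
        except json.JSONDecodeError:
            return True
        return False
    # Plain text: treat a missing final period (or similar) as a cutoff.
    return not text.rstrip().endswith(SENTENCE_ENDINGS)


# A JSON response that stopped in the middle of a string field.
print(looks_truncated('{"title": "Repo', "STOP", expect_json=True))  # True
```

A check like this only flags the symptom; a caller would typically retry or log the request when it returns True.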

Root cause

The model or API sends a premature turnComplete, or a finish_reason of STOP or MAX_TOKENS, during streaming (generate_content_stream, Live API) even though the configured limits still leave room. This is often because the default 'auto' thinking_budget exhausts the token allocation on internal reasoning, especially on gemini-2.5-flash preview models with structured/JSON output, tools, or long inputs. Client libraries (e.g., langchain) may omit the relevant config options, and preview models have bugs that cut off audio or text output.
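One way to rule out the 'auto' thinking budget as the cause is to set thinking_budget and the output limit explicitly rather than relying on defaults. The sketch below assumes the google-genai Python SDK (`genai.Client`, `types.GenerateContentConfig`, `types.ThinkingConfig`) and an API key in the environment. `build_generation_settings` and `stream_text` are hypothetical helper names, the numeric values are illustrative, and whether a given budget is accepted depends on the model.

```python
def build_generation_settings(thinking_budget: int, max_output_tokens: int) -> dict:
    """Collect explicit limits in one place (hypothetical helper)."""
    if max_output_tokens <= 0:
        raise ValueError("max_output_tokens must be positive")
    return {
        "thinking_config": {"thinking_budget": thinking_budget},
        "max_output_tokens": max_output_tokens,
    }


def stream_text(prompt: str, settings: dict, model: str = "gemini-2.5-flash"):
    """Stream a response with explicit limits; returns (text, finish_reason)."""
    # Imported lazily so the settings helper works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # assumes the API key is set in the environment
    config = types.GenerateContentConfig(
        max_output_tokens=settings["max_output_tokens"],
        thinking_config=types.ThinkingConfig(
            thinking_budget=settings["thinking_config"]["thinking_budget"]
        ),
    )
    parts, finish_reason = [], None
    for chunk in client.models.generate_content_stream(
        model=model, contents=prompt, config=config
    ):
        if chunk.text:
            parts.append(chunk.text)
        # The finish reason normally arrives on the last chunk.
        if chunk.candidates and chunk.candidates[0].finish_reason:
            finish_reason = chunk.candidates[0].finish_reason
    return "".join(parts), finish_reason


# Illustrative values only: a small fixed thinking budget and a generous output cap.
settings = build_generation_settings(thinking_budget=0, max_output_tokens=2048)
```

If the cutoff disappears with an explicit budget, the default allocation was the likely cause; if it persists, the preview-model bugs mentioned above are the more plausible explanation.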

Gemini API, streaming, generate_content_stream, Live API, truncation, thinking_budget, gemini-2.5-flash, json schema, langchain

Citations