Medium severity
Together AI Batch API
A batch job's status stays at 'IN_PROGRESS' for an extended period (more than 24 hours) with no progress updates. The job neither completes nor fails, and it appears stalled in the queue even though the input is valid.
Root cause
Batch jobs are processed asynchronously on a best-effort basis, using spare capacity during off-peak times, and are expected to finish within about 24 hours. Large or complex batches, popular models with heavy queue load, and temporary capacity constraints can delay or stall a job, so it stays in IN_PROGRESS past the expected window.
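Because completion is best-effort, a client may want to poll with its own deadline instead of waiting indefinitely. The following is a minimal sketch of that idea, not the Together AI client itself: `wait_for_batch` and its parameters are hypothetical, `get_status` is a callable you supply that returns the job's current status string, and every terminal status name other than IN_PROGRESS is an assumption that should be checked against the actual API.

```python
import time
from typing import Callable

# Assumed terminal status names; only IN_PROGRESS is confirmed by this entry.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "EXPIRED"}


def wait_for_batch(
    get_status: Callable[[], str],
    *,
    stall_after_s: float = 24 * 3600,  # treat the job as stalled after ~24 hours
    poll_every_s: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal status or the deadline passes.

    Returns the terminal status, or the local marker "STALLED" (not an API
    status) if the job is still non-terminal once stall_after_s has elapsed.
    """
    start = clock()
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() - start >= stall_after_s:
            return "STALLED"
        sleep(poll_every_s)
```

A caller would pass a small function that looks up the batch job and returns its status, then decide what to do on "STALLED", for example cancelling and resubmitting a smaller batch or contacting support. The clock and sleep functions are injectable only so the loop can be exercised without real waiting.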
Together AI, batch inference, queue stall, IN_PROGRESS, rate limits, async processing