Medium severity · Hugging Face Inference Endpoints
Requests to idle (scale-to-zero) endpoints return 502 Bad Gateway or 503 Service Unavailable errors, or fail or time out while the endpoint cold-starts, which can take several minutes. [Hugging Face Autoscaling Docs](https://huggingface.co/docs/inference-endpoints/en/guides/autoscaling)
Root cause
The scale-to-zero feature shuts down replicas after a period of inactivity to save costs. The next request triggers a scale-up, but loading the model and initializing the container can take minutes. Requests are not queued during this cold start, so the proxy returns 502/503 errors instead. [Hugging Face Autoscaling Docs](https://huggingface.co/docs/inference-endpoints/en/guides/autoscaling)
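Because requests are not queued during cold start, one common mitigation is to retry on the client side. The following is a minimal sketch, assuming it is acceptable for your workload to retry 502/503 responses (and client-side timeouts) with exponential backoff until a fixed wait budget runs out. `call_with_cold_start_retry`, `make_urllib_sender`, and the default budget and delay values are hypothetical choices for illustration, not part of any Hugging Face API; the endpoint URL, token, and JSON payload shape are placeholders you must supply.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

# Hypothetical: statuses this sketch treats as "endpoint is still cold-starting".
RETRYABLE_STATUSES = {502, 503}

Response = Tuple[int, bytes]


def call_with_cold_start_retry(
    send: Callable[[], Response],
    max_wait_s: float = 600.0,   # hypothetical total wait budget
    base_delay_s: float = 2.0,   # first backoff delay
    max_delay_s: float = 60.0,   # cap on a single backoff delay
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-retryable status or the budget is spent.

    Delays grow as base, 2*base, 4*base, ... capped at max_delay_s.
    If the next delay would exceed max_wait_s, the last response is returned
    (or, if the last attempt timed out, TimeoutError is raised).
    """
    waited = 0.0
    delay = base_delay_s
    while True:
        status: Optional[int]
        try:
            status, body = send()
            if status not in RETRYABLE_STATUSES:
                return status, body
        except TimeoutError:
            # A client timeout during cold start is retried like a 502/503.
            status, body = None, b""
        if waited + delay > max_wait_s:
            if status is None:
                raise TimeoutError("endpoint did not become ready within the wait budget")
            return status, body
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)


def make_urllib_sender(url: str, token: str, payload: dict) -> Callable[[], Response]:
    """Build a zero-argument POST callable for an endpoint URL (placeholder values)."""
    data = json.dumps(payload).encode("utf-8")

    def send() -> Response:
        req = urllib.request.Request(
            url,
            data=data,
            method="POST",
            headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(req, timeout=60) as resp:
                return resp.status, resp.read()
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses arrive as HTTPError; return them as ordinary responses.
            return err.code, err.read()
        except urllib.error.URLError as err:
            # Connect timeouts are wrapped in URLError; surface them as TimeoutError.
            if isinstance(err.reason, TimeoutError):
                raise TimeoutError(str(err)) from err
            raise

    return send
```

For example, `call_with_cold_start_retry(make_urllib_sender(url, token, {"inputs": "Hello"}))`, where `url` and `token` are your own endpoint URL and access token and the payload shape depends on the deployed model's task. A bounded `max_wait_s` keeps a client from waiting forever if the endpoint never becomes ready; set it above the cold-start time you actually observe. The `send` and `sleep` parameters are injectable so the retry logic can be tested without a network.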
huggingface, inference-endpoints, autoscaling, cold-start, 502, 503, scale-to-zero