For more information about our Incident Response and Communications please read this support article.

We also maintain a list of Known Product Issues separate from this site here.

[Critical] Issues with Multiple Box Services

Incident Report for Box

Postmortem

We recently addressed issues affecting folder and collection browsing, and related APIs. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

Between 05:22 PM PDT and 07:04 PM PDT on May 6, 2026, some users may have experienced difficulties while working in Box. During this time, users reported failures and elevated errors when loading folders and collections, intermittent service errors, and degraded AI-related functionality.

The issue was caused by runtime saturation in a backend API service that had been gradually approaching its capacity limits over several weeks. During peak traffic hours, multiple service instances simultaneously became unable to respond to internal health checks within the configured timeout. As those instances were restarted, traffic shifted to the remaining instances, increasing load on them and creating a cascading failure. We mitigated the issue by stopping the restart cycle, routing traffic to alternate paths, and increasing service capacity.

‌

Analysis

This incident revealed several contributing causes:

Insufficient scaling signals — autoscaling was primarily configured around CPU utilization. For this service, CPU utilization did not accurately reflect runtime saturation, so the fleet did not scale early enough in response to the actual bottleneck.
Health check sensitivity under load — health checks were configured with timeouts that did not account for the service’s behavior under sustained runtime saturation. Once instances became slow to respond, restarts amplified the problem by shifting more traffic to the remaining instances.
Gradual capacity erosion without alerting — the service had shown intermittent health check failures during prior peak traffic periods. Those earlier events were narrow enough to self-recover, but they were not surfaced as a clear trend requiring action.
Retry amplification — as service instances became unavailable, upstream retry behavior added additional load to the remaining healthy instances, accelerating the cascade.

This incident highlighted opportunities for improvement in how we detect and respond to runtime saturation that is not always visible from CPU metrics alone. It also reinforced the need for stronger end-to-end observability, clearer health-check visibility, and validated mitigation paths for service instability.

‌

Corrective Actions

Box has initiated the following corrective actions:

Improved runtime monitoring and alerting — we added alerts for runtime saturation metrics that detect this class of failure independently of CPU utilization, providing earlier warning before customer impact occurs.
Revised autoscaling strategy — we reconfigured autoscaling to trigger on signals that reflect actual service capacity and validate that the fleet scales correctly under realistic load conditions.
Health check and resilience tuning — we are reviewing health check configurations and retry behavior to reduce the risk that transient runtime saturation can turn into cascading restarts.
Capacity validation and load testing — we are establishing load testing against production-representative traffic patterns and maintaining capacity headroom until the service's efficiency improvements are validated.
Runtime performance improvements — we identified and are implementing changes to the service framework that significantly reduce per-request overhead, providing more headroom against future saturation.

‌

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter.

Sincerely,
The Box Team

Posted May 15, 2026 - 10:48 PDT

Resolved

After further monitoring, this incident is now considered resolved. If you continue to experience any issues, please contact Box Support at https://support.box.com.

Posted May 06, 2026 - 20:07 PDT

Monitoring

Our team has taken steps to remediate this issue and is seeing improvement in Box Folder, Collections, Box AI, Hubs, and Classification. We are continuing to monitor for any additional impact.

Posted May 06, 2026 - 19:32 PDT

Update

Our team has identified the underlying cause of this issue and is working to take remediating steps. We will provide additional updates as they become available.

Posted May 06, 2026 - 19:12 PDT

Identified

Our team has identified the underlying cause of this issue and is working to take remediating steps. We will provide additional updates as they become available.

Posted May 06, 2026 - 18:52 PDT

Investigating

We are investigating an ongoing issue affecting browsing folders, collections, Box AI, Classifications, and Hubs. Customers may experience seeing errors when loading folders, collections, Box AI and Hubs. We will provide more information as soon as it is available.

Posted May 06, 2026 - 18:20 PDT

This incident affected: Box Web Application (Box Shield (Auto Classification), Box Hubs, Box AI).