[Major] Some Users Unable to View All Files and API Pages

Incident Report for Box

Postmortem

We recently addressed issues affecting Box services. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

Between 1:48 AM PT and 2:25 AM PT on March 3, 2025, some users may have experienced difficulties while working in Box. Starting at 8:44 PM PT that same day, some users may have encountered issues again; this second disruption ended before 9:59 PM PT. During these periods, a subset of users experienced slowness and intermittent errors with Notes, the Public API, logins, and uploads/downloads. The issue was caused by a fragmented system table on a database cluster, which ultimately led to the database crashing. The first instance was triggered by increased traffic, while the second occurred because our manual remediation process put additional load on the database. Our database remediation service attempted to resolve the issue both times but was unsuccessful because the thread_cache_size setting was too low.

We resolved the immediate problem by manually redirecting traffic to a healthy database node. To maintain the medium-term stability of the database, the team rebuilt the cluster to eliminate the fragmented table. Additionally, we will split the database cluster into smaller databases to prevent future overloads, and we are improving our database remediation service to better handle this type of failure.

Analysis

The affected database cluster experienced gradual performance degradation before the issue became apparent. This degradation was caused by a system table that became increasingly fragmented as database size and traffic grew. However, the degradation went unnoticed because the existing alerting system did not flag any problems.
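
Because the degradation built up gradually, a periodic fragmentation check is one way such a trend could be surfaced. The sketch below is hypothetical and assumes a MySQL-compatible database (the report does not name the engine): it treats the `DATA_FREE` column of `information_schema.TABLES` as a rough proxy for fragmentation, and the 20% threshold and function names are illustrative, not Box's actual alerting.

```python
# Hypothetical fragmentation check, assuming a MySQL-compatible database.
# The inputs mirror the DATA_LENGTH, INDEX_LENGTH and DATA_FREE columns of
# information_schema.TABLES (all in bytes).

FRAGMENTATION_ALERT_THRESHOLD = 0.20  # hypothetical: alert when >20% of space is free


def fragmentation_ratio(data_length: int, index_length: int, data_free: int) -> float:
    """Return the share of a table's allocated space that is free (unused) space."""
    allocated = data_length + index_length + data_free
    if allocated == 0:
        return 0.0  # empty table: nothing to report
    return data_free / allocated


def should_alert(data_length: int, index_length: int, data_free: int,
                 threshold: float = FRAGMENTATION_ALERT_THRESHOLD) -> bool:
    """Return True when the fragmentation ratio exceeds the alert threshold."""
    return fragmentation_ratio(data_length, index_length, data_free) > threshold


if __name__ == "__main__":
    # Example: 600 MB data, 150 MB indexes, 250 MB free -> ratio 0.25, so alert.
    mb = 1024 * 1024
    print(should_alert(600 * mb, 150 * mb, 250 * mb))
```

Sampling this ratio on a schedule and alerting on its trend, rather than only on outright failures, is the kind of signal that can catch slow degradation before it becomes an outage.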

In addition, the auto-remediation system was unsuccessful because it encountered a case where two database configuration settings were incompatible. Specifically, max_connections had been increased without a corresponding adjustment to thread_cache_size, which caused frequent thread cache misses and left the failover procedure without the resources it needed to succeed.
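
Thread cache misses of this kind can be measured directly. MySQL's documentation gives the thread cache miss rate as `Threads_created / Connections`; since both status counters are cumulative, the minimal sketch below computes the rate between two samples. It assumes a MySQL-compatible database, and the sample values, threshold, and names are hypothetical.

```python
from dataclasses import dataclass

MISS_RATE_ALERT_THRESHOLD = 0.10  # hypothetical threshold, not taken from the report


@dataclass
class StatusSample:
    """Two cumulative MySQL status counters read at one point in time."""
    threads_created: int  # Threads_created: new threads created (each one is a cache miss)
    connections: int      # Connections: connection attempts since server start


def miss_rate_between(earlier: StatusSample, later: StatusSample) -> float:
    """Thread cache miss rate (Threads_created / Connections) over an interval."""
    new_connections = later.connections - earlier.connections
    if new_connections <= 0:
        return 0.0  # no new connections, or counters reset by a restart
    return (later.threads_created - earlier.threads_created) / new_connections


if __name__ == "__main__":
    before = StatusSample(threads_created=100, connections=10_000)
    after = StatusSample(threads_created=400, connections=11_000)
    rate = miss_rate_between(before, after)
    print(f"miss rate: {rate:.2f}, alert: {rate > MISS_RATE_ALERT_THRESHOLD}")
```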

Corrective Actions

Box has initiated the following corrective actions:

  • Rebuilding the database cluster to eliminate the table fragmentation and prevent medium-term performance degradation
  • Adding metrics and alerting for table fragmentation to proactively monitor issues
  • Adjusting database configurations so that thread_cache_size automatically scales with the max_connections setting
  • Improving the database remediation process by adjusting timeouts to accommodate large database clusters
  • Accelerating the database split process to quickly divide large clusters, reducing traffic overload and improving the success rate of routine maintenance
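
One way to keep thread_cache_size tied to max_connections is to derive it rather than set the two independently. The sketch below assumes a MySQL-compatible database and uses the formula MySQL documents for autosizing thread_cache_size (8 + max_connections / 100, capped at 100); the report does not say which rule Box adopted, and the function names are hypothetical.

```python
# Hypothetical config check, assuming a MySQL-compatible database. It derives
# thread_cache_size from max_connections using the formula MySQL documents
# for autosizing thread_cache_size: 8 + max_connections / 100, capped at 100.


def autosized_thread_cache_size(max_connections: int) -> int:
    """Target thread_cache_size for a given max_connections value."""
    return min(8 + max_connections // 100, 100)


def thread_cache_undersized(max_connections: int, thread_cache_size: int) -> bool:
    """True when thread_cache_size lags behind what max_connections implies."""
    return thread_cache_size < autosized_thread_cache_size(max_connections)


if __name__ == "__main__":
    # max_connections raised from 151 to 5000 while thread_cache_size stayed at 9.
    print(autosized_thread_cache_size(151))   # 9
    print(autosized_thread_cache_size(5000))  # 58
    print(thread_cache_undersized(5000, 9))   # True
```

Deriving one value from the other means raising max_connections can no longer silently leave the thread cache undersized, which is the mismatch described in the Analysis.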

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope this provides some clarity, and we would be happy to answer any questions you may still have regarding this matter.

Sincerely,
The Box Team

Posted Apr 24, 2025 - 05:51 PDT

Resolved

No additional impact has been observed and this issue is considered fully resolved. If you are still experiencing any issues, please contact us via https://support.box.com.
Posted Mar 03, 2025 - 03:03 PST

Monitoring

We have taken action to remediate this incident and are no longer seeing the issue occur. We are continuing to monitor to ensure there is no additional impact.
Posted Mar 03, 2025 - 02:41 PST

Investigating

We are currently investigating an issue where some users may be unable to view their All Files and API pages. We will provide additional information as it becomes available.
Posted Mar 03, 2025 - 02:27 PST
This incident affected: Box Platform / API (Content API) and Box Web Application (Login/SSO).