[Major] Issues with Uploads, Downloads and Logins
Incident Report for Box
Postmortem

We recently addressed issues affecting the Box web application. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

On March 16th, 2023, between 4:30pm and 7:00pm PDT, some users may have experienced higher than usual latencies and slightly elevated error rates when interacting with the Box website. The issue was caused by a mistake made during a maintenance operation aimed at improving the efficiency and reliability of one of our foundational services. The operation left fewer servers serving the Box web application than were needed to handle all incoming traffic without customer impact. We resolved the issue by restoring the necessary capacity for the impacted service. In addition, we are implementing autoscaling for this service to prevent similar issues from occurring in the future.

Analysis

The problem occurred over the course of a maintenance operation during which a legacy virtual-machine-based server fleet serving the Box web application was being replaced with a container-based fleet. Both fleets were sharing the load before the maintenance took place, and as the maintenance operation completed, only the container-based fleet remained. Although a load projection had been made prior to the maintenance and the container-based fleet was expected to be able to handle the full load of the site, it eventually became clear that our container-based fleet was struggling to keep up with demand. This resulted in higher latencies and slightly elevated error rates across the Box web application. In order to recover application performance, we put the virtual-machine-based fleet back in service to share the load with the container-based one.
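
As a rough illustration of the kind of check a load projection supports, the sketch below estimates whether one fleet can absorb the full load once the other is retired. It is not Box's projection model: the function names, the per-instance throughput figure and the headroom factor are all hypothetical assumptions chosen for the example.

```python
import math


def required_instances(peak_rps: float, rps_per_instance: float,
                       headroom: float = 0.5) -> int:
    """Instances needed to serve peak_rps while keeping `headroom` spare.

    All parameters are hypothetical: `rps_per_instance` is an assumed
    measured throughput per instance, and `headroom` is the assumed
    fraction of each instance's capacity kept in reserve.
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_rps = rps_per_instance * (1 - headroom)
    return math.ceil(peak_rps / usable_rps)


def can_retire_other_fleet(peak_rps: float, remaining_instances: int,
                           rps_per_instance: float,
                           headroom: float = 0.5) -> bool:
    """True if the remaining fleet alone covers the projected peak."""
    needed = required_instances(peak_rps, rps_per_instance, headroom)
    return remaining_instances >= needed
```

If the per-instance throughput is overestimated, a check like this passes even though the remaining fleet is too small, which is why the projection inputs matter as much as the formula.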

Corrective Actions

The following corrective actions have been completed or are planned:

  • The capacity projection model for the container-based fleet has been adjusted to reflect the outcome of this maintenance. This means that more capacity will be available in the container-based fleet as we retire the virtual-machine-based fleet.
  • We are improving our observability to detect early indicators of insufficient capacity, so we can react to them before customer experience is impacted.
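
One simple form an early indicator of insufficient capacity could take is a sustained-utilization check, sketched below. This is an assumption made for illustration only, not a description of Box's monitoring: the `CapacityWatch` class, the 0.8 threshold and the window size are all hypothetical.

```python
from collections import deque


class CapacityWatch:
    """Flags sustained high utilization over a sliding window (hypothetical)."""

    def __init__(self, threshold: float = 0.8, window: int = 5) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold = threshold
        # Keep only the most recent `window` utilization samples.
        self._samples = deque(maxlen=window)

    def observe(self, utilization: float) -> bool:
        """Record one sample (0.0 to 1.0); return True if an alert should fire.

        The alert fires only when the window is full and every sample in it
        is at or above the threshold, so a single spike does not trigger it.
        """
        self._samples.append(utilization)
        window_full = len(self._samples) == self._samples.maxlen
        return window_full and all(u >= self.threshold for u in self._samples)
```

Requiring the whole window to exceed the threshold trades a short detection delay for fewer false alarms; a shorter window or lower threshold reacts earlier at the cost of more noise.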

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter. 

Sincerely,

The Box Team

Posted Apr 12, 2023 - 15:38 PDT

Resolved
From approximately 4:30PM to 7:30PM US Pacific time, we had an issue impacting downloads, uploads and logins. During this time, users would have experienced increased latency navigating Box and timeouts for some upload, download and login requests. No further impact has been observed and we are considering this issue to be resolved. If you are currently seeing any issues, please let us know at https://support.box.com.
Posted Mar 16, 2023 - 20:30 PDT