For more information about our Incident Response and Communications please read this support article.

We also maintain a list of Known Product Issues separate from this site here.

[Critical] Issues with Multiple Box Services
Incident Report for Box
Postmortem

We recently addressed an issue affecting the public API, Downloads, Uploads, Login, and Sign. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

Between 12:40 am PDT and 03:20 am PDT on September 28, 2023, some users may have experienced difficulties while working in Box. During this time, users may have had a degraded experience or difficulty accessing the Public API, Downloads, Uploads, Login, and Sign. The issue was caused by latency observed in one of our internal services responsible for providing security recommendations to the application. We resolved the issue by deploying more instances of the impacted service. In addition, we are implementing improvements that give us greater control over the flow of traffic within our backend systems. We are also working to establish new alerts on the impacted processes that will decrease the time to detect and mitigate a similar issue if one takes place in the future.

Analysis

Box services are underpinned by a common service communication layer called “Service Mesh”, which enables Box’s services to communicate securely and to scale automatically in response to traffic bursts. During this incident, we observed that the Service Mesh layer did not behave as expected. Specifically, the service did not auto-scale as planned and, as a result, became overwhelmed, leading to the issue. A deeper analysis showed that the “Service Mesh” configuration, while optimal for on-premises deployments, was no longer suited to the traffic profile experienced in the public cloud.

We have been working with the technology vendor for our Service Mesh implementation to tune the configuration for the new environment, and we have been rolling out the changes incrementally. We have also increased the capacity of the critical services while rolling out the changes, so that we do not rely on auto-scaling during that period.

Corrective Actions

The following corrective actions have been completed or are planned:

  • Configure limits for the number of retries within internal systems to improve performance during peak traffic times.
  • Implement additional back pressure “circuit breakers” within our backend systems to protect the system from overloading in case of similar issues.
  • Improve error handling to better differentiate Service Mesh errors from application errors.
  • Improve monitoring measures to automatically detect, diagnose, and remediate such issues before customers are actually impacted.
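
As an illustration only, the first two corrective actions (bounded retries and a back pressure “circuit breaker”) could be sketched as follows. This is a minimal sketch and not Box's implementation or the Service Mesh vendor's API: the names `CircuitBreaker`, `TransientError`, `CircuitOpenError`, and `call_with_limits`, as well as the thresholds and delays, are assumptions chosen for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the backend."""


class CircuitBreaker:
    """Opens after a run of consecutive failures, then allows a trial call
    once a cool-down period has elapsed."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # assumed value
        self.reset_timeout = reset_timeout          # assumed value, in seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # Open: reject until the cool-down has passed, then permit a trial call.
        return self._clock() - self._opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # open, or re-open after a failed trial


def call_with_limits(fn: Callable[[], T], breaker: CircuitBreaker,
                     max_retries: int = 2, base_delay: float = 0.1,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn with at most max_retries retries, honoring the breaker."""
    attempts = max_retries + 1
    for attempt in range(attempts):
        if not breaker.allow():
            raise CircuitOpenError("backend is overloaded; call rejected")
        try:
            result = fn()
        except TransientError:
            breaker.record_failure()
            if attempt == attempts - 1:
                raise  # retry limit reached: surface the error to the caller
            sleep(base_delay * (2 ** attempt))  # exponential backoff
        else:
            breaker.record_success()
            return result
    raise AssertionError("unreachable")
```

Capping retries keeps a slow dependency from receiving a multiple of its normal traffic during an incident, and the breaker gives an overloaded service time to recover by failing fast instead of queuing more work behind it.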

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter. 

Sincerely,
The Box Team

Posted Sep 29, 2023 - 12:57 PDT

Resolved
After further monitoring, this incident is now considered resolved. Box services have been restored to full functionality. Please contact Box Support at https://support.box.com/ if you continue to experience any issues.
Posted Sep 28, 2023 - 04:55 PDT
Monitoring
A fix has been implemented and we are currently monitoring the results. Customers may experience slowness while interacting with Box, as the backlog is getting cleared.
Posted Sep 28, 2023 - 04:33 PDT
Identified
A fix has been implemented and we are currently monitoring the results. Customers may experience slowness while interacting with Box.
Posted Sep 28, 2023 - 03:29 PDT
Update
We are still investigating this issue, as it is affecting Login (Web/SSO), All Files Page, Uploads, Downloads, Content API, Box Notes, Box Sign and Box Mobile. Users may see errors or slowness with the affected services. We will provide more information as soon as it is available.
Posted Sep 28, 2023 - 02:24 PDT
Update
We are continuing to investigate this issue, as it is affecting Login (Web/SSO), All Files Page, Uploads, Downloads, Content API, Box Notes, Box Sign and Box Mobile. Users may see errors or slowness with the affected services. We will provide more information as soon as it is available.
Posted Sep 28, 2023 - 01:33 PDT
Investigating
We are currently investigating login issues on Box. We will provide more information as soon as it is available.
Posted Sep 28, 2023 - 01:16 PDT
This incident affected: Box Web Application (Login/SSO, Uploads/Downloads, Collaboration, Search, Preview, Sharing (Shared Links), Email Notifications, Admin Console & Functionality, Governance (Retention), Governance (Legal Holds), Workflows and Automations, Comments and Tasks, Accessible Site (a.box.com), Box Sign, Box Canvas, Box Shield (Threat Detection), Box Shield (Virus Detection), Box Shield (Auto Classification), Box Shuttle, Watermarking), Mobile Applications (Login/SSO), and Box Notes (Web Application).