[Critical] Issues Opening Box Notes
Incident Report for Box
Postmortem

We recently addressed issues affecting Box Notes. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

On Monday, Sept 18, 2023, between 07:26 AM PDT and 07:33 AM PDT, some users may have experienced difficulties while working in Box. During this time, users may have encountered errors while loading Box Notes documents. The issue was caused by a slow leader election in a coordination service. It resolved without manual intervention once our health automation processes recreated the unhealthy service. We plan to adjust our health check automation so that similar issues are resolved more quickly should they occur again.

Analysis

Box Notes uses multiple Apache ZooKeeper ensembles to coordinate distributed compute services. Each ensemble is composed of a leader and several followers. When a leader fails, the remaining members elect a new leader, which ensures that the followers’ data is up to date before the ensemble continues running. This process usually takes less than a second and causes no impact.
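For readers who want to see the leader and follower roles concretely, the following is a minimal sketch, not Box's tooling. It assumes the ZooKeeper `srvr` four-letter command is enabled and that its text output has already been collected from each member. The host names `zk1`, `zk2`, `zk3` and the helper names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_mode(srvr_output: str) -> Optional[str]:
    """Return the role from a 'Mode:' line (e.g. 'leader'), or None if absent."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    # A member that is still electing or synchronizing prints no Mode line.
    return None


def group_by_mode(outputs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group hosts by reported role; members without a role are 'not serving'."""
    groups: Dict[str, List[str]] = {}
    for host, text in outputs.items():
        mode = parse_mode(text) or "not serving"
        groups.setdefault(mode, []).append(host)
    return groups


# Hypothetical, abbreviated sample responses for a three-member ensemble.
sample = {
    "zk1": "Zxid: 0x100000002\nMode: leader\nNode count: 4",
    "zk2": "Zxid: 0x100000002\nMode: follower\nNode count: 4",
    "zk3": "This ZooKeeper instance is not currently serving requests",
}
roles = group_by_mode(sample)
```

In a healthy ensemble exactly one host appears under `leader`. During an election, members typically appear under `not serving` until a new leader has been chosen and followers have synchronized.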

In this incident, the leader failed, but electing a new leader took unusually long because an unexpectedly large amount of data needed to be synchronized. Because the election took so long, our health checks triggered an automated process that rebuilds unhealthy ensemble members. This disrupted the election and prolonged it further. Upon investigation, we determined that our health checks should have waited longer before deciding that corrective action was required.
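The effect of waiting longer before taking corrective action can be illustrated with a small sketch. This is an illustration under stated assumptions, not Box's actual health automation: the class `MemberHealth`, the function `first_rebuild_time`, and the grace periods of 5 and 60 seconds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class MemberHealth:
    """Tracks one ensemble member and decides when a rebuild is warranted."""
    grace_seconds: float                     # hypothetical tolerance for unhealthy probes
    unhealthy_since: Optional[float] = None  # time of the first failed probe in a row

    def observe(self, healthy: bool, now: float) -> bool:
        """Record a probe result; return True if the member should be rebuilt."""
        if healthy:
            self.unhealthy_since = None
            return False
        if self.unhealthy_since is None:
            self.unhealthy_since = now
        return now - self.unhealthy_since >= self.grace_seconds


def first_rebuild_time(grace_seconds: float,
                       probes: Iterable[Tuple[float, bool]]) -> Optional[float]:
    """Return the time of the first rebuild decision, or None if none occurs."""
    tracker = MemberHealth(grace_seconds)
    for now, healthy in probes:
        if tracker.observe(healthy, now):
            return now
    return None


# A member that is unhealthy for three probes during a slow election, then recovers.
probes = [(0.0, False), (10.0, False), (20.0, False), (30.0, True)]
aggressive = first_rebuild_time(5.0, probes)   # rebuilds while the election is running
patient = first_rebuild_time(60.0, probes)     # lets the election finish
```

With the short grace period the member is rebuilt at the second failed probe, which in a real ensemble could interrupt an election that was about to complete. With the longer one no rebuild is triggered and the member recovers on its own. The trade-off is that a genuinely failed member also takes longer to be replaced.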

Corrective Actions

The following corrective actions have been completed or are planned:

  • Tune our health checks to be less aggressive in deciding when a resource needs to be recreated.

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter. 

Sincerely,
The Box Team

Posted Oct 06, 2023 - 14:54 PDT

Resolved
After further monitoring, this incident is now considered resolved at 7:38am US Pacific Time. Box services have been restored to full functionality. Please contact Box Support at https://support.box.com/ if you continue to experience any issues.
Posted Sep 18, 2023 - 07:53 PDT
Monitoring
We are monitoring an issue affecting the loading of Box Notes. We will provide more information as soon as it is available.
Posted Sep 18, 2023 - 07:15 PDT
This incident affected: Box Notes (Web Application).