[Critical] Issue with Box User Events API
Incident Report for Box
Postmortem

We recently addressed issues affecting the internal Monolith service. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

Between 9:22 AM and 9:50 AM PDT on August 20, 2021, some users may have experienced difficulties while working in Box. During this time, responses from the Box User Events API endpoint may have been delayed. The issue occurred because an internal coordination and configuration system was in a degraded state. We resolved the issue by manually restarting the leader of the coordination system. To prevent similar issues in the future, we are implementing an automatic restart mechanism for this condition.

Analysis 

Our internal messaging system at Box uses a common orchestration system to control various aspects of the service. This orchestration system became degraded due to resource contention, which in turn negatively impacted other systems that depend on it. As a result, the database cluster that powers the User Events API was delayed in returning responses to user requests. After restarting the impacted systems, we were able to restore service.

Corrective Actions

The following corrective actions have been completed or are planned:

  • Automatic detection and remediation of the impacted coordination service.

  • Reduction of extraneous logging to reduce resource contention.
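As an illustration of the first corrective action above, the following is a minimal sketch of an automatic detect-and-restart loop. It is not Box's implementation, which this report does not describe. The names `LeaderWatchdog`, `is_healthy`, `restart_leader`, and `failure_threshold` are hypothetical, and the policy of restarting after a run of consecutive failed health checks is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class LeaderWatchdog:
    """Hypothetical watchdog: restarts a leader after repeated failed checks."""

    is_healthy: Callable[[], bool]       # assumed health probe for the leader
    restart_leader: Callable[[], None]   # assumed remediation hook
    failure_threshold: int = 3           # consecutive failures before a restart
    consecutive_failures: int = 0
    restarts: int = 0

    def tick(self) -> bool:
        """Run one health check; return True if a restart was triggered."""
        if self.is_healthy():
            # Any healthy probe resets the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.restart_leader()
            self.restarts += 1
            self.consecutive_failures = 0
            return True
        return False


def simulate(probe_results: List[bool],
             failure_threshold: int = 3) -> Tuple[List[bool], List[str]]:
    """Feed a fixed sequence of probe results through the watchdog."""
    results = iter(probe_results)
    restart_log: List[str] = []
    watchdog = LeaderWatchdog(
        is_healthy=lambda: next(results),
        restart_leader=lambda: restart_log.append("restart"),
        failure_threshold=failure_threshold,
    )
    triggered = [watchdog.tick() for _ in probe_results]
    return triggered, restart_log
```

Requiring several consecutive failures before acting is one way to avoid restarting the leader on a single transient probe failure. A real deployment would also need a scheduling interval and a limit on how often restarts may occur.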

We are continuously working to improve Box and to deliver the best product and user experience we can. We hope this report has provided some clarity, and we would be happy to answer any questions you may still have regarding this matter.

Sincerely,

The Box Team

Posted Sep 20, 2021 - 20:00 PDT

Resolved
The User Events API metrics have now recovered and no additional impact has been observed. We consider the issue to be fully resolved. If you are encountering any issues, please contact Box Support at https://support.box.com.
Posted Aug 20, 2021 - 11:28 PDT
Update
We are seeing a full recovery for the User Events API. We will continue to monitor the situation and provide an update for resolution soon.
Posted Aug 20, 2021 - 10:29 PDT
Monitoring
We have identified an issue that affected the User Events API endpoint from 9:22 AM to 9:50 AM PDT and have taken steps to remediate it. About 25% of User Events API users would have been affected. We are currently seeing recovery for the User Events API and will continue to monitor the situation.
Posted Aug 20, 2021 - 10:05 PDT
This incident affected: Box Platform / API (Content API).