[Minor] Customers may have experienced issues using the Public API

Incident Report for Box

Issue Summary

We recently addressed issues affecting the Box API. We would like to take this opportunity to explain these issues further and describe the steps we have taken to prevent them from happening again.

Between 10:00 PM and 10:23 PM PDT on April 24, 2024, some Box customers may have experienced elevated error rates or increased latency with Box API requests. The issue was caused by an unexpected failure during a MySQL instance failover that was executed in response to high load on one of the database instances. We resolved the issue by manually completing the remaining failover steps. We have also implemented a new discovery mechanism that reduces the time needed to detect and propagate a MySQL topology change, to help prevent similar issues in the future.

Analysis

During the incident, a failover was executed for a MySQL leader that had high CPU usage. The failover process failed to propagate the topology change, which reduced the Box API success rate. The issue was remediated by executing the remaining failover steps manually, and impact ended once the new topology had been propagated to the other components. The high CPU usage was later traced to an organic week-over-week increase in traffic, and the impacted instance was upsized to add more CPU cores.

Corrective Actions

The following corrective actions have been completed or are planned:

  • The database instance with high CPU usage was scaled up to add more CPU cores.
  • We are evaluating a remapping of database instances to even out the load on the impacted instance.
  • A MySQL topology discovery mechanism that significantly reduces the time needed to detect and propagate topology updates has been rolled out. This mechanism reduces the mean time to recovery (MTTR) for similar issues in the future.

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope this report provides some clarity, and we would be happy to answer any questions you may still have about this matter.

Sincerely,

The Box Team

Posted May 15, 2024 - 11:06 PDT

Resolved

Between 10:00 PM and 10:23 PM PDT on April 24, 2024, some users may have experienced issues using the Public API. No further impact has been observed, and we consider this issue resolved. If you are still experiencing any issues, please let us know at https://support.box.com.
Posted Apr 24, 2024 - 22:59 PDT
This incident affected: Box Platform / API (Content API).