[Major] Issues with Uploads, Downloads and metadata
Incident Report for Box
Postmortem

We recently addressed issues affecting Box activity, including Box Metadata, file uploads and downloads, Box Notes, and the public API. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

On January 23rd from 11:48am PT to 1:07pm PT, and on January 24th from 7:49am PT to 8:36am PT and from 5:07pm PT to 9:02pm PT, some users experienced higher-than-normal latency and/or timeouts when trying to access content in Box via one of our applications or our Public API. In particular, some users of the Metadata Query API experienced elevated latencies and timeouts.

Analysis

The issue was caused by an overload of a localized area of one of our datastores. The overload caused the request queue to back up, which degraded latency for Downloads, Box Notes, Metadata and other related services for some users. The additional load was induced by an internal data migration our engineering team was performing at that time. One of the changes needed to perform this migration caused an unexpected amplification of requests to a localized area of one of our datastores containing schema information. This change was tested prior to deployment and performed as expected after deployment, but it caused impact at peak load.

Requests to this server developed a backlog, which affected the latency of Box Metadata generally and resulted in elevated latencies and some timeouts for dependent Box services. We were able to resolve the issue by reducing load on this datastore via additional caching, shedding some load, and rolling back some recent changes. We are working on process and tooling improvements to prevent similar issues from occurring in the future.
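To illustrate how caching can reduce amplified reads against a small, frequently accessed area such as schema information, here is a minimal sketch. It is not Box's implementation: the `TTLCache` class, the `simulate_migration` helper, the `schema-1` key and the schema contents are all hypothetical names chosen for this example, and the only assumption is that many migration steps look up the same schema entry.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical read-through cache: lookups for a key within the TTL
    are served from memory instead of going to the datastore."""

    def __init__(self, loader: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # function that reads from the datastore
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}  # key -> (expiry, value)

    def get(self, key: str) -> dict:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # still fresh: no datastore read
        value = self._loader(key)      # missing or expired: one datastore read
        self._entries[key] = (now + self._ttl, value)
        return value


def simulate_migration(num_records: int, ttl_seconds: float) -> int:
    """Count the datastore reads issued when every migrated record
    looks up the same (hypothetical) schema entry."""
    reads = 0

    def load_schema(schema_id: str) -> dict:
        nonlocal reads
        reads += 1                     # stands in for a real datastore request
        return {"id": schema_id, "fields": ["name", "owner"]}

    cache = TTLCache(load_schema, ttl_seconds)
    for _ in range(num_records):
        cache.get("schema-1")
    return reads


if __name__ == "__main__":
    print("reads without caching (ttl=0):", simulate_migration(1000, 0.0))
    print("reads with a 60s cache:", simulate_migration(1000, 60.0))
```

With a TTL of zero every lookup reaches the datastore, so the read count grows with the number of records migrated. With a 60-second TTL the same run issues a single read, which is the kind of reduction that additional caching aims for.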

Corrective Actions

The following corrective actions have been completed or are planned:

  • Patched a bug in our data migration code that caused read request amplification, and validated the fix.
  • Closed a gap in observability so that we can identify similar load increases before they reach a tipping point and cause impact.
  • Increased capacity for the localized area that holds schema information so that it can handle future surges in traffic more gracefully.
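As an illustration of the observability item above, the following sketch shows one way to flag a load increase before it reaches capacity. It is an assumption-laden example, not a description of Box's tooling: the `LoadMonitor` class, the sliding-window approach, the capacity figure and the 70% warning threshold are all hypothetical.

```python
from collections import deque
from typing import Deque


class LoadMonitor:
    """Hypothetical sliding-window monitor that warns when the request rate
    reaches a chosen fraction of the capacity of a datastore area."""

    def __init__(self, window_seconds: float, capacity_per_second: float,
                 warn_fraction: float = 0.7) -> None:
        self._window = window_seconds
        self._capacity = capacity_per_second
        self._warn_fraction = warn_fraction
        self._timestamps: Deque[float] = deque()  # request times, oldest first

    def _evict(self, now: float) -> None:
        # Drop requests that have fallen out of the sliding window.
        cutoff = now - self._window
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

    def record(self, timestamp: float) -> None:
        """Record one request; timestamps must be non-decreasing."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def rate(self, now: float) -> float:
        """Average requests per second over the window ending at `now`."""
        self._evict(now)
        return len(self._timestamps) / self._window

    def should_warn(self, now: float) -> bool:
        """True once the rate reaches the warning fraction of capacity."""
        return self.rate(now) >= self._warn_fraction * self._capacity


if __name__ == "__main__":
    monitor = LoadMonitor(window_seconds=10.0, capacity_per_second=100.0)
    for i in range(500):               # steady traffic: 50 requests/second
        monitor.record(1.0 + i * 0.01)
    print("warn at steady load:", monitor.should_warn(6.0))
    for _ in range(300):               # sudden burst, e.g. amplified reads
        monitor.record(6.0)
    print("warn after burst:", monitor.should_warn(6.0))
```

Setting the warning threshold below full capacity is the design choice that matters here: the alert fires while there is still headroom, rather than after the request queue has already started to back up.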

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter. 

Sincerely,

The Box Team

Posted Jan 30, 2023 - 14:15 PST

Resolved
We have identified and resolved issues impacting uploads, downloads, metadata and Public API. Services should operate as normal at this time.

If you continue to experience issues with uploads, downloads, metadata and Public API, please reach out to our support team for further assistance at http://support.box.com.
Posted Jan 23, 2023 - 13:33 PST

Investigating
At approximately 01/23/2023 11:51 AM, we began observing issues impacting uploads, downloads, metadata and the Public API.
Engineering has identified and resolved the issue. We will continue to monitor for any further issues.
Posted Jan 23, 2023 - 13:21 PST

This incident affected: Box Platform / API (Content API, Uploads/Downloads) and Box Web Application (Uploads/Downloads).