[High] Issues with uploads and downloads
Incident Report for Box
Postmortem
Notice and disclaimer: Customer understands and agrees that this information is subject to change. To the best of our knowledge, this is the current state and will be updated as information is confirmed.
Last Updated: November 11, 2019

Summary Impact

Between 8:18 AM and 9:36 AM PST on November 4, 2019, some users may have experienced difficulties while working in Box. During this time, users were unable to upload or download files and experienced intermittent latency and errors with the Content API, Preview (Web UI and Mobile), Box Sync and Drive, and Box Edit. For some users, certain parts of the Admin Console also did not load.

Analysis

Our analysis concluded that this issue occurred when instances of the Policy Engine service became overloaded and stuck at high CPU levels. The service was then unable to serve traffic, which degraded the success rate for uploads and downloads. The issue was remediated by allocating additional capacity to Policy Engine to absorb the unexpected surge in traffic and bring CPU usage back down to acceptable levels.

Policy Engine is a service that, among other things, returns the appropriate storage policy to use for an upload. It is on the critical path for all upload and download operations at Box.

During the impact, CPU usage on Policy Engine was pegged at 100%. As a result, the service could not properly handle requests, which impacted uploads and downloads.

The high CPU usage was the result of overload. The overload was caused by an upgrade of our PHP infrastructure, which talks to Policy Engine, from an older version of PHP to PHP 7. The upgrade changed how connections and connection pooling are managed between services. This led to an uneven distribution of load across Policy Engine instances, eventually driving CPU usage to 100%.
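The report does not describe the exact pooling mechanism, so the following is only an illustrative sketch of one common way pooled connections can skew load: each client keeps a long-lived connection pinned to a single backend, so a few busy clients can saturate a few instances. The function names, client counts, and traffic volumes are all hypothetical and are not taken from Box's systems.

```python
import random
from collections import Counter


def simulate(num_clients, num_backends, requests_per_client,
             per_request_balancing, seed=0):
    """Return a Counter of requests handled by each backend (hypothetical model)."""
    rng = random.Random(seed)
    load = Counter({backend: 0 for backend in range(num_backends)})
    next_backend = 0
    for client in range(num_clients):
        # Assumption: every tenth client is "heavy" and sends 10x the traffic.
        volume = requests_per_client * (10 if client % 10 == 0 else 1)
        # With pooled connections, the client stays pinned to one backend.
        pinned = rng.randrange(num_backends)
        for _ in range(volume):
            if per_request_balancing:
                # Each request is routed independently, round-robin.
                backend = next_backend % num_backends
                next_backend += 1
            else:
                backend = pinned
            load[backend] += 1
    return load


def imbalance(load):
    """Ratio of the busiest backend's load to the mean load (1.0 is even)."""
    mean = sum(load.values()) / len(load)
    return max(load.values()) / mean


if __name__ == "__main__":
    pinned = simulate(40, 8, 100, per_request_balancing=False)
    spread = simulate(40, 8, 100, per_request_balancing=True)
    print("pinned connections imbalance:", round(imbalance(pinned), 2))
    print("per-request balancing imbalance:", round(imbalance(spread), 2))
```

In this toy model the total request volume is identical in both runs; only the routing differs. Balancing each request independently keeps every backend at the mean, while pinned connections leave at least one backend above it.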

While testing was performed on the PHP 7 upgrade, this issue presents itself only under extreme load and could not have been detected before the upgrade was deployed to production.

Defect Remediation

The following remediation actions have been completed or are planned:

  • Add additional capacity to Policy Engine (not needed after moving to HTTP endpoint in HAProxy)
  • Move Policy Engine client in PHP to use HTTP endpoint in HAProxy
  • Audit PHP for other services that require moving to use HTTP endpoint in HAProxy
  • Adjust CPU alert levels for Policy Engine
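The CPU alert adjustment in the list above could take many forms, and the report does not state the old or new alert levels. As a minimal sketch, assuming an alert that fires only when CPU usage stays at or above a threshold for several consecutive samples, the check might look like this. The class name, the 80% threshold, and the three-sample window are hypothetical.

```python
from collections import deque


class SustainedCpuAlert:
    """Fire when CPU usage stays at or above a threshold for a full window."""

    def __init__(self, threshold_pct, window):
        self.threshold_pct = threshold_pct
        # Keep only the most recent `window` samples.
        self.samples = deque(maxlen=window)

    def observe(self, cpu_pct):
        """Record one CPU sample and return True if the alert should fire."""
        self.samples.append(cpu_pct)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(
            sample >= self.threshold_pct for sample in self.samples
        )


if __name__ == "__main__":
    alert = SustainedCpuAlert(threshold_pct=80, window=3)
    for cpu in [50, 85, 90, 95, 100]:
        print(cpu, alert.observe(cpu))
```

Requiring a full window of high samples avoids paging on a single spike, while a threshold well below 100% leaves time to add capacity before the service saturates.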

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We are sorry for the disruption this caused for you and your organization. The security and availability of your content are our top priorities.

Sincerely,
The Box Team

Posted Nov 06, 2019 - 09:35 PST

Resolved
After further monitoring, this incident is now considered resolved. Please reach out to Box Support at https://support.box.com if you continue to have issues with your Box account.
Posted Nov 04, 2019 - 10:17 PST
Monitoring
Our teams have validated that the affected components have recovered at this time. We are continuing to monitor for any additional impact.
Posted Nov 04, 2019 - 09:52 PST
Update
Our Engineering Teams continue to investigate the issue. The next update will be in 15 minutes if not sooner.
Posted Nov 04, 2019 - 09:44 PST
Update
Our Engineering Teams continue to investigate the issue. We have amended this status post to include impact to Upload/Downloads/Preview in Box Webapp, Box Sync, Box Drive, API, Admin Console, and Mobile.
Posted Nov 04, 2019 - 09:22 PST
Update
We are continuing to investigate this issue.
Posted Nov 04, 2019 - 09:12 PST
Update
Our Engineering Teams continue to investigate the issue impacting uploads, downloads, and previews. We have amended this status post to include impact to additional endpoints including Box Sync, Box Drive, and our API.
Posted Nov 04, 2019 - 08:56 PST
Investigating
We are currently investigating an ongoing issue affecting uploads, downloads, and preview.
Posted Nov 04, 2019 - 08:51 PST
This incident affected: Box Notes (Web Application), Box Platform / API (Content Preview, Uploads/Downloads), Desktop Applications (Box Sync, Box Drive, Box Edit / Tools), Box Web Application (Uploads/Downloads, Preview), Mobile Applications (Preview, Uploads/Downloads), and FTP.