Between 8:18 AM and 9:36 AM PST on November 4th, some users may have experienced difficulties while working in Box. During this time, users were unable to upload or download files and experienced intermittent latency and errors with the Content API, Preview (Web UI and Mobile), Box Sync and Drive, and Box Edit. For some users, certain parts of the Admin Console also failed to load.
Our analysis concluded that this issue occurred when instances of the Policy Engine service were overloaded and stuck at high CPU levels. As a result, the service was unable to serve traffic, which degraded the success rate for uploads and downloads. We remediated the issue by allocating additional capacity to Policy Engine to handle the unexpected surge in traffic, which brought CPU usage back down to acceptable levels.
Policy Engine is a service that, among other things, returns the appropriate storage policy to use for an upload. It is on the critical path for all upload and download operations at Box.
During the incident, the CPU usage on Policy Engine was pegged at 100%. As a result, it was unable to properly service requests, which impacted uploads and downloads.
The high CPU usage was the result of Policy Engine being overloaded. The overload was caused by an upgrade of our PHP infrastructure, which talks to Policy Engine, from an older version of PHP to PHP7. This upgrade changed how connections and connection pooling are managed between services. The change led to an uneven distribution of load across Policy Engine instances, eventually increasing the CPU load to 100%.
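To illustrate in general terms how a change in connection handling can skew load, the following is a minimal sketch and not a description of Box's actual infrastructure. It assumes a hypothetical setup in which each client either picks a backend at random for every request or reuses one long-lived pooled connection pinned to a single backend. The names `simulate` and `imbalance`, and all parameter values, are illustrative assumptions.

```python
import random
from collections import Counter


def simulate(num_clients, num_backends, requests_per_client, pooled, seed=0):
    """Count requests per backend under two hypothetical connection strategies.

    pooled=False: every request picks a backend at random.
    pooled=True:  each client opens one long-lived connection to a random
                  backend and sends all of its requests over it.
    """
    rng = random.Random(seed)
    load = Counter({backend: 0 for backend in range(num_backends)})
    for _ in range(num_clients):
        pinned = rng.randrange(num_backends)  # backend chosen when the pool is created
        for _ in range(requests_per_client):
            backend = pinned if pooled else rng.randrange(num_backends)
            load[backend] += 1
    return load


def imbalance(load):
    """Ratio of the busiest backend's load to the mean load (1.0 is perfectly even)."""
    mean = sum(load.values()) / len(load)
    return max(load.values()) / mean


per_request = simulate(20, 10, 1000, pooled=False)
pinned = simulate(20, 10, 1000, pooled=True)
print("per-request imbalance:", round(imbalance(per_request), 2))
print("pooled imbalance:     ", round(imbalance(pinned), 2))
```

In this toy model the total number of requests is identical in both runs, yet the pinned strategy concentrates traffic on whichever backends happen to receive more connections. Under heavy traffic, the busiest instances in such a setup can saturate their CPU while others remain underused.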
While the PHP7 upgrade was tested, this issue presents itself only under extreme load and could not have been detected before the upgrade was deployed to production.
The following remediation actions have been completed or are planned:
We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We are sorry for the disruption this caused for you and your organization. The security and availability of your content is our top priority.
Sincerely,
The Box Team