Between 9:41 PM PST on November 15th and 2:51 AM PST on November 18th, customers were unable to search for newly uploaded or modified documents. This was caused by a combination of two factors. The first occurred on November 8th, when a new version of the search service was deployed that included a threading model change originating in a shared library. The second occurred in the hours leading up to the incident, when the search service experienced increased load due to an internal maintenance process. The increased load, in conjunction with the threading model change, caused thread starvation in the search service, which first slowed and ultimately prevented indexing of newly uploaded documents and of modifications to existing documents.
Multiple search service components use a shared library to perform critical actions. The threading model used by this library was changed, and when the change propagated to those components, they all became susceptible to spikes in load. The change ran without issues from the November 8th deployment until a maintenance workflow placed a high load on the search service, at which point the underlying problem with the threading model manifested. To restore the search service, the team rolled back to a previous version of the service that did not contain the threading model change. To address the root cause, the search team fixed the threading model used by this library.
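As an illustration only, thread starvation of this kind can be sketched with a small fixed-size worker pool shared by two kinds of work. The pool size, the job names, and the `simulate_starvation` function below are hypothetical; they do not represent the search service's actual code or the shared library's threading model.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def simulate_starvation(pool_size, maintenance_jobs, wait_seconds=0.2):
    """Return (starved, result) for one indexing job sharing a pool with maintenance jobs."""
    release = threading.Event()  # stays unset to model long-running maintenance work
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Maintenance jobs are queued first and block until released.
        for _ in range(maintenance_jobs):
            pool.submit(release.wait)
        # The indexing job is queued behind them on the same pool.
        index_job = pool.submit(lambda: "indexed")
        try:
            index_job.result(timeout=wait_seconds)
            starved = False
        except FutureTimeout:
            starved = True  # no free worker picked up the indexing job in time
        finally:
            release.set()  # let maintenance finish so the pool can drain
        return starved, index_job.result(timeout=5)


if __name__ == "__main__":
    # Maintenance occupies every worker, so indexing waits.
    print(simulate_starvation(pool_size=2, maintenance_jobs=2))
    # One worker stays free, so indexing proceeds.
    print(simulate_starvation(pool_size=2, maintenance_jobs=1))
```

In this sketch, the indexing job makes no progress while maintenance jobs hold every worker, and it completes as soon as a worker becomes free. This mirrors, in simplified form, how a load spike can stall indexing when unrelated work shares the same threads.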
The following remediation actions have been completed or are planned:
We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter.
The Box Team