We recently addressed issues affecting the Box webapp. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.
Between 5:00 pm PT on April 16, 2023 and 1:00 am PT on April 17, 2023, and again between 4:30 pm PT on April 17, 2023 and 2:00 am PT on April 18, 2023, some users experienced slowness and intermittent failures when interacting with certain parts of the Box webapp and public API, including when applying Box Shield Classifications and sending documents for signature via Box Sign. The issue occurred when the gradual rollout of a new feature overwhelmed a cache held inside one of our key middleware services. We resolved the issue by deploying a code change that prevents the feature rollout framework from overwhelming the cache, and by restarting the impacted service to clear the cache. In addition, we are improving monitoring and alerting around the impacted cache so that we can detect and address similar problems before they impact our users.
Analysis
Our investigation shows that the incident was caused by an inefficient configuration of the experimentation framework used to begin the gradual rollout of a new feature. The framework unexpectedly stored its decision results (whether the feature should be shown to a given user) in the cache. These stored decisions overran the cache, causing other entries to be evicted. As a result, the web application had to recompute frequently used, expensive-to-compute information that would ordinarily have been served from this cache. This dramatically reduced the effective processing capacity of our web application; that capacity was then exhausted, degrading several of our latency-sensitive services. Remediation was slowed significantly by the lack of explicit alerting on the web application server's cache.
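To illustrate the failure mode described above, here is a minimal sketch in Python. It is not Box's implementation: the LRU eviction policy, the cache capacity, the number of rollout decisions stored per request, and all names (LRUCache, simulate) are assumptions chosen only to show how a stream of small, one-off entries can push expensive entries out of a shared bounded cache, and how a hit-rate counter would make that visible.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny bounded LRU cache with hit/miss/eviction counters (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get_or_compute(self, key, compute):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = compute()
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
            self.evictions += 1
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(store_rollout_decisions, requests=1000, capacity=100, hot_keys=50):
    """Return (expensive recomputations, evictions, hit rate) for a request stream."""
    cache = LRUCache(capacity)
    expensive_computations = 0

    def expensive():
        nonlocal expensive_computations
        expensive_computations += 1
        return "expensive result"

    for i in range(requests):
        # Every request needs one of a small set of expensive-to-compute entries.
        cache.get_or_compute(("hot", i % hot_keys), expensive)
        if store_rollout_decisions:
            # Assumption: the rollout framework caches several decisions per
            # request under keys that are never reused.
            for j in range(5):
                cache.get_or_compute(("decision", i, j), lambda: True)
    return expensive_computations, cache.evictions, cache.hit_rate()


if __name__ == "__main__":
    print("without stored decisions:", simulate(False))
    print("with stored decisions:   ", simulate(True))
```

In this sketch, the hot entries fit comfortably in the cache until rollout decisions are stored alongside them. After that, every request recomputes its expensive entry and the hit rate collapses, which is the kind of signal that explicit alerting on hit rate or eviction count could surface early.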
Corrective Actions
The following corrective actions have been completed or are planned:
We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter.
Sincerely,
The Box Team