We also maintain a list of Known Product Issues separate from this site here.
Between 3:11 AM PT and 5:33 AM PT on April 11, 2024, some users may have experienced difficulties while working in Box. During this time, some users might not have been able to access Box Notes. The issue occurred as a result of a recent code change made as part of our ongoing effort to improve the user experience and stability of our application. We resolved the issue by releasing the missing configuration value to production. In addition, we are working to improve the safety and efficiency of our release process to prevent similar issues from occurring in the future.
Analysis
The code change responsible for this issue relied on a configuration value that was missing from production settings. As a result, all instances of Notes using the new image version were unable to open and experienced downtime.
During this incident, we identified several gaps that contributed to the issue or lengthened remediation, all of which we are working to address. First, the issue exposed a gap in our validation process: production configurations are not validated either before deployment or during server startup. Second, the issue did not trigger any alerts and was not visible in tracked metrics such as SLO and UA graphs, which left us without visibility into the problem. Finally, the deployment process for Notes servers does not allow for immediate rollbacks, because a StatefulSet rollout takes approximately 45 minutes to complete, and our current incremental deployment process leaves us without a fully safe data center to fall back on.
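To illustrate the kind of startup validation described above, here is a minimal sketch in Python. It is not the actual Notes implementation: the setting names, the `MissingConfigError` class, and both function names are hypothetical and chosen only for illustration. The idea is that a server checks its required configuration values before accepting traffic and fails immediately, with a clear message, if any are absent.

```python
# Hypothetical setting names, used only for illustration.
REQUIRED_SETTINGS = ("NOTES_DB_URL", "NOTES_FEATURE_ENDPOINT")


class MissingConfigError(RuntimeError):
    """Raised when required configuration values are absent or blank."""


def find_missing(settings, required=REQUIRED_SETTINGS):
    """Return the required keys that are absent or blank in `settings`."""
    return [
        key for key in required
        if not str(settings.get(key) or "").strip()
    ]


def validate_or_fail(settings, required=REQUIRED_SETTINGS):
    """Fail fast, before serving traffic, if any required value is missing."""
    missing = find_missing(settings, required)
    if missing:
        raise MissingConfigError(
            "missing required settings: " + ", ".join(missing)
        )
```

In this sketch, a server entry point would call `validate_or_fail(os.environ)` (or pass a loaded settings mapping) as its first step. The same `find_missing` check could also run in a pre-deployment pipeline against the production settings, so that a missing value is caught before any new image is rolled out.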
Corrective Actions
The following corrective actions have been completed or are planned:
We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter.
Sincerely,
The Box Team