We recently addressed issues affecting Box Logins and Webapp. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.
Between 3:52pm PST and 5:10pm PST on January 15th, 2025, some users may have experienced difficulties while working in Box. During this time, some of the load balancers handling customer traffic exhausted their memory, which caused intermittent issues logging in to Box and connecting to the Box Webapp. The issue was caused by a latent memory exhaustion problem in some of our public load balancer instances and was exacerbated by peak traffic levels. We resolved the issue by performing a rolling restart of the affected instances and increasing the number of load balancer instances available to support peak traffic levels. In addition, we are improving our observability into the latent memory issues and previously unknown signals on these systems to prevent similar issues from occurring in the future.
Analysis
Starting at 5:30am PST on January 15th, some external load balancer instances, which are responsible for routing all customer traffic, started to exhaust their shared memory allocation. This happened due to organic traffic growth and an increase in the number of backends to which the load balancers were proxying traffic. While the overall memory of these systems remained at an acceptable level throughout the incident window, the shared memory zone was not tracked as a separate metric; as a result, the team was not alerted to this resource exhaustion. As the day went on and traffic levels increased, at 3:52pm PST these instances started to experience intermittent problems passing traffic to their backends, which led to customers experiencing intermittent errors or slowdowns when accessing Box (including logins as well as the Webapp). Once the problem was identified during the investigation, we performed a rolling restart of the affected load balancer instances and increased the number of available instances. As a result of these efforts, overall site health improved immediately and was considered recovered at 5:10pm PST.
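To illustrate the monitoring gap described above, the following is a minimal sketch of checking shared memory zone utilization as its own metric, separately from overall host memory. It is not Box's actual tooling: the `ZoneSample` type, the `zones_over_threshold` function, the instance and zone names, the sizes, and the 80% threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ZoneSample:
    """One reading of a load balancer's shared memory zone (hypothetical shape)."""
    instance: str
    zone: str
    used_bytes: int
    total_bytes: int

    @property
    def utilization(self) -> float:
        # Fraction of the zone's own allocation in use, independent of host memory.
        return self.used_bytes / self.total_bytes if self.total_bytes else 0.0


def zones_over_threshold(samples, threshold=0.8):
    """Return the samples whose zone utilization is at or above the threshold."""
    return [s for s in samples if s.utilization >= threshold]


# Illustrative readings (assumed values): one zone nearly full, one healthy.
samples = [
    ZoneSample("lb-1", "upstreams", 60 * 1024, 64 * 1024),
    ZoneSample("lb-2", "upstreams", 20 * 1024, 64 * 1024),
]

alerts = zones_over_threshold(samples)
for s in alerts:
    print(f"ALERT {s.instance} zone={s.zone} at {s.utilization:.0%}")
```

The point of the sketch is that a fixed-size zone can fill up while the host still has plenty of free memory, so a host-level memory alert alone would not fire.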
Corrective Actions
Box has initiated the following corrective actions:
We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter.
Sincerely,
The Box Team