[Critical] Issues with Search
Incident Report for Box
Postmortem

We recently addressed issues affecting Search. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

Between 9:33 pm PDT on June 3, 2024 and 1:28 pm PDT on June 4, 2024, and again between 6:38 pm PDT and 7:52 pm PDT on June 6, 2024, some users may have experienced difficulties while working in Box. During these periods, users were intermittently unable to query Search or receive Search results. This issue impacted approximately 2% of enterprises. During the first period, the issue occurred as a result of a recent code change that was only partially rolled out. This code change was part of our ongoing effort to improve performance and stability. During the second period, the issue occurred as a result of a latent bug in the Search Shadow service, which was used to investigate the root cause of the initial issue. We were able to resolve the issue by fully rolling back the change in both cases. In addition, we have added alerting to detect partial rollouts of changes, added tests in pre-production environments, and addressed the latent bug in the Search Shadow service to prevent similar issues from occurring in the future.

Analysis

On June 3, 2024, the Search team released a change to how backend Search nodes are queried in order to improve performance and stability. This change had an unintended effect on query patterns that dramatically increased load for a small number of queries and only manifested at scale. Although the number of such queries was small, when a backend Search node processed them it would sporadically run out of memory and impact all traffic to that particular node. The Search release mechanism rolled out this change to a fraction of the fleet, but did not progress further. We initiated standard rollback procedures, but we did not detect that the change remained deployed on one of the nodes. This complicated the process of diagnosing and mitigating the impact. The issue was mitigated when the partial rollback was detected and the change was fully rolled back.
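The kind of check that would surface a change left behind on a single node can be sketched as follows. This is only an illustration under stated assumptions: the function name `deployment_state`, the node names, and the version strings are hypothetical and do not describe Box's actual tooling.

```python
def deployment_state(node_versions: dict[str, str], change_version: str) -> tuple[str, list[str]]:
    """Classify a fleet as 'full', 'partial', or 'absent' for one change.

    node_versions maps a node name to the version that node reports
    (hypothetical data). Returns the state and the sorted list of nodes
    that still carry the change.
    """
    carrying = sorted(n for n, v in node_versions.items() if v == change_version)
    if not carrying:
        return "absent", carrying
    if len(carrying) == len(node_versions):
        return "full", carrying
    return "partial", carrying


# Hypothetical fleet after a rollback that missed one node.
fleet = {"search-node-1": "v41", "search-node-2": "v42", "search-node-3": "v41"}
state, nodes = deployment_state(fleet, "v42")
print(state, nodes)
```

After a rollback, any result other than "absent" would indicate that the change is still live somewhere and list the nodes to inspect.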

On June 6, 2024, as part of the investigation to identify and verify the root cause of the initial issue, the Search team used a Shadow service in production that is not intended to serve live traffic. However, this Shadow service contained a latent bug that allowed a small percentage of its queries to be issued against the live backend Search nodes. Because the issue that occurred on June 3, 2024 could be triggered with just a handful of queries, the live serving nodes were inadvertently impacted by this Shadow service. The Shadow service change was rolled back to mitigate the impact.
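One way to guard against shadow queries reaching live nodes is to check every target against the set of live nodes before sending, and to record any attempt that is refused. The sketch below assumes a hypothetical `ShadowRouter` class and made-up node names; it is not the Shadow service's real design.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowRouter:
    """Refuses shadow queries aimed at live nodes and records each attempt."""

    live_nodes: frozenset[str]
    leaked: list[tuple[str, str]] = field(default_factory=list)

    def route(self, query_id: str, target_node: str) -> bool:
        """Return True if the shadow query may be sent to target_node."""
        if target_node in self.live_nodes:
            # Record the misrouted query so monitoring can alert on it.
            self.leaked.append((query_id, target_node))
            return False
        return True


router = ShadowRouter(live_nodes=frozenset({"live-1", "live-2"}))
allowed = router.route("q1", "shadow-1")   # target is not a live node
blocked = router.route("q2", "live-2")     # target is a live node
print(allowed, blocked, router.leaked)
```

A non-empty `leaked` list is the signal to alert on, since the incident showed that even a handful of misrouted queries could affect live serving.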

Corrective Actions

The following corrective actions have been completed or are planned:

  • Improve release mechanisms and procedures to fully roll out or roll back changes in a bounded period of time
  • Add tooling and alerting to determine whether a change is fully or only partially deployed
  • Add tests to pre-production environments to detect unintended Search query pattern changes that sporadically increase load on backend Search nodes
  • Fix the latent bug in the Shadow service and add monitoring to detect queries being routed incorrectly to live backend Search nodes
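The first corrective action, finishing a rollout or undoing it within a bounded time, could look roughly like the sketch below. The function `rollout_with_deadline`, its callbacks, and the node names are assumptions made for illustration; the deadline is checked between nodes only, so a single slow deploy call is not interrupted.

```python
import time
from typing import Callable, Iterable


def rollout_with_deadline(
    nodes: Iterable[str],
    deploy: Callable[[str], bool],
    rollback: Callable[[str], None],
    deadline_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Deploy to every node, or roll back every touched node.

    Returns 'rolled_out' if all deploys succeeded before the deadline,
    otherwise 'rolled_back'. The fleet is never left partially deployed
    on a normal return.
    """
    start = clock()
    touched: list[str] = []
    for node in nodes:
        if clock() - start > deadline_seconds:
            break  # out of time: stop and undo
        touched.append(node)
        if not deploy(node):
            break  # failed deploy: undo, including this node
    else:
        return "rolled_out"
    for node in reversed(touched):
        rollback(node)
    return "rolled_back"


# Hypothetical fleet where the third node rejects the change.
deployed: set[str] = set()


def fake_deploy(node: str) -> bool:
    deployed.add(node)
    return node != "search-node-3"


outcome = rollout_with_deadline(
    ["search-node-1", "search-node-2", "search-node-3"],
    fake_deploy,
    deployed.discard,
    deadline_seconds=60.0,
)
print(outcome, deployed)
```

The point of the design is that there are only two terminal states, so a rollout that stalls partway is treated as a failure and reverted rather than left in place.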

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter. 

Sincerely,

The Box Team

Posted Jun 10, 2024 - 16:58 PDT

Resolved
Following additional monitoring, we've verified that Search functionality has been fully restored and this incident is now considered resolved. If you continue to experience any issues, please contact Box Support at https://support.box.com.
Posted Jun 04, 2024 - 15:12 PDT
Monitoring
Our team has implemented measures to address this issue and is observing notable improvements to search functionality across Box. We are continuing to monitor for any additional impact.
Posted Jun 04, 2024 - 14:16 PDT
Update
Our teams are diligently addressing this matter with top priority. Expect the next update within 2 hours or upon the next status change.
Posted Jun 04, 2024 - 13:11 PDT
Update
Remediation efforts are still ongoing. Our teams are fully dedicated to resolving this matter as quickly as possible. Next update in 2 hours or at next status change.
Posted Jun 04, 2024 - 10:53 PDT
Update
Our teams are still actively working towards a fix. During this time, users will experience degraded search across Box: search attempts may return no results or inaccurate results, or the search bar may be missing entirely. Next update in 2 hours or at next status change.
Posted Jun 04, 2024 - 08:34 PDT
Update
We are actively working towards a fix. Further updates to come.
Posted Jun 04, 2024 - 07:47 PDT
Update
We are still investigating a possible cause in order to mitigate the issue.
Posted Jun 04, 2024 - 07:06 PDT
Update
We continue investigating a possible cause in order to mitigate the issue.
Posted Jun 04, 2024 - 06:26 PDT
Update
We continue investigating a possible cause in order to mitigate the issue.
Posted Jun 04, 2024 - 05:49 PDT
Update
We are investigating a possible cause in order to mitigate the issue.
Posted Jun 04, 2024 - 04:59 PDT
Update
We are investigating a possible cause in order to mitigate the issue.
Posted Jun 04, 2024 - 04:17 PDT
Update
We are investigating a possible cause in order to mitigate the issue.
Posted Jun 04, 2024 - 03:47 PDT
Update
We are diligently working on finishing up the fix to be deployed.
Posted Jun 04, 2024 - 03:13 PDT
Update
We are working towards deploying the fix. Thank you for your patience while we work on this.
Posted Jun 04, 2024 - 02:35 PDT
Update
We are actively working towards a fix. Further updates to come.
Posted Jun 04, 2024 - 02:05 PDT
Update
We continue working on a fix.
Posted Jun 04, 2024 - 01:33 PDT
Update
We continue working on a fix to mitigate this issue. Further updates to come as the remediation process takes place.
Posted Jun 04, 2024 - 01:03 PDT
Update
We are continuing to work on a fix for this issue. We will post further updates here as soon as one is available.
Posted Jun 03, 2024 - 23:58 PDT
Identified
Box continues remediation efforts to restore full functionality to Search.

Users of this service may continue to experience latency or failures when trying to search for content while remediation is in process.
Posted Jun 03, 2024 - 22:57 PDT
Investigating
We are investigating an ongoing issue affecting Search. Users may experience latency or failures when trying to search for content. We will provide more information as soon as it is available.
Posted Jun 03, 2024 - 22:49 PDT
This incident affected: Desktop Applications (Box Drive), Mobile Applications (Search), Box Web Application (Search), and Box Platform / API (Search).