Intermittent build errors

Incident Report for CloudCannon

Postmortem

Postmortem: S3 Storage Incident (May 3–5, 2026)

Root Cause

Between May 3 and May 5, 2026, a bug in CloudCannon's site-deletion logic caused a background job to delete a large portion of the files in our production S3 bucket.

When a site is deleted, a background job removes its associated files from S3, using the site's storage prefix as the deletion target. The incident began when a site with a blank storage prefix was deleted. A blank prefix should never have been permitted, but no validation prevented it. The deletion function had no guard against a blank prefix, and because a blank prefix matches every key, the deletion targeted every object in the bucket. The job timed out partway through each run, and its automatic retry behavior restarted it each time, so successive retries deleted progressively further through the bucket in alphabetical order. The resulting loss of source files caused builds to fail.
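
The failure mode described above can be illustrated with a small in-memory sketch. It is not CloudCannon's code: the key names, the `budget` parameter (standing in for the job's timeout), and every function name are assumptions made for illustration. It shows that a blank prefix matches every key, and that a job which times out and retries works further through the sorted keys on each attempt.

```python
# Hypothetical stand-in for a bucket. S3 lists keys in lexicographic order,
# so the toy bucket is kept sorted. All key names are made up.
BUCKET = sorted([
    "alpha-site/index.html",
    "alpha-site/about.html",
    "beta-site/index.html",
    "gamma-site/index.html",
])


def keys_with_prefix(keys, prefix):
    """Return the keys that start with `prefix`, in sorted order."""
    return [k for k in sorted(keys) if k.startswith(prefix)]


def delete_prefix(keys, prefix, budget):
    """Delete at most `budget` matching keys, simulating a run that times out.

    Returns (remaining_keys, finished).
    """
    matches = keys_with_prefix(keys, prefix)
    doomed = set(matches[:budget])
    remaining = [k for k in keys if k not in doomed]
    return remaining, len(matches) <= budget


def run_with_retries(keys, prefix, budget, max_attempts):
    """Re-run the deletion until it finishes or the attempts run out."""
    for _ in range(max_attempts):
        keys, finished = delete_prefix(keys, prefix, budget)
        if finished:
            break
    return keys
```

With a normal prefix such as `"beta-site/"`, only that site's keys match. With `""`, every key matches, and each retry removes the next slice of the sorted bucket.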

All affected files have been fully recovered using S3 versioning.

Impact

  • Sites whose storage prefix fell alphabetically before the job's timeout point lost all source files in S3.
  • All builds for affected sites failed.
  • Syncing diffs were also deleted and were unavailable during this window. These have been recovered.
  • Deployed sites, served from a separate bucket, were not affected.
  • Hosting was largely unaffected due to caching layers.
  • No customer data was permanently lost — S3 versioning was enabled, so all files were recoverable.

Resolution

Files were recovered by removing the S3 delete markers created by the bulk deletion, restoring all objects in place. No database changes or manual re-syncing were required.
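
Why removing delete markers restores objects in place can be seen in a toy model of a versioned bucket. This is a simplified sketch, not the recovery tooling that was used: the `VersionedBucket` class and its methods are hypothetical. In S3 itself, a delete on a versioned bucket adds a delete marker as the newest version, and removing that marker makes the previous version current again.

```python
class VersionedBucket:
    """Toy model of a versioned bucket (hypothetical, for illustration only)."""

    DELETE_MARKER = object()

    def __init__(self):
        # key -> list of versions, oldest first, newest last
        self._versions = {}

    def put(self, key, body):
        self._versions.setdefault(key, []).append(body)

    def delete(self, key):
        # A plain delete does not erase data; it stacks a marker on top.
        self._versions.setdefault(key, []).append(self.DELETE_MARKER)

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1] is self.DELETE_MARKER:
            return None  # the object appears deleted
        return versions[-1]

    def remove_delete_markers(self):
        """Strip trailing delete markers and return the keys restored."""
        restored = []
        for key, versions in self._versions.items():
            if versions and versions[-1] is self.DELETE_MARKER:
                while versions and versions[-1] is self.DELETE_MARKER:
                    versions.pop()
                if versions:  # an earlier version is current again
                    restored.append(key)
        return sorted(restored)
```

Because the earlier versions were never removed, nothing needs to be re-uploaded or re-synced once the markers are gone, which is consistent with the recovery described above.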

The following fixes have been shipped:

  • The S3 deletion function now returns immediately if given a blank prefix.
  • The site deletion job now validates the storage prefix before running.
  • The site model now enforces a non-blank storage prefix at the data layer.
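
The three fixes listed above form layered guards, which might look roughly like the following sketch. None of these names come from CloudCannon's codebase: `Site`, `BlankPrefixError`, `delete_objects_with_prefix`, and `run_site_deletion_job` are hypothetical, and the bucket is modelled as a plain list of keys.

```python
from dataclasses import dataclass


class BlankPrefixError(ValueError):
    """Raised when a storage prefix is missing or blank (hypothetical)."""


def is_blank(prefix):
    return prefix is None or prefix.strip() == ""


@dataclass
class Site:
    storage_prefix: str

    def __post_init__(self):
        # Data-layer guard: a site cannot exist with a blank prefix.
        if is_blank(self.storage_prefix):
            raise BlankPrefixError("storage_prefix must not be blank")


def delete_objects_with_prefix(keys, prefix):
    """Return (remaining, deleted). Does nothing for a blank prefix."""
    if is_blank(prefix):
        return list(keys), []  # function-level guard: return immediately
    deleted = [k for k in keys if k.startswith(prefix)]
    remaining = [k for k in keys if not k.startswith(prefix)]
    return remaining, deleted


def run_site_deletion_job(keys, storage_prefix):
    # Job-level guard: validate before any deletion is attempted.
    if is_blank(storage_prefix):
        raise BlankPrefixError("refusing to delete with a blank storage prefix")
    return delete_objects_with_prefix(keys, storage_prefix)
```

Each layer is redundant with the others by design, so a blank prefix that slips past one check is still stopped by the next.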
Posted May 05, 2026 - 22:55 UTC

Resolved

All files have been restored and sites have been rebuilt. Thanks for your patience with this issue. If you experience any further issues, please contact support.
Posted May 05, 2026 - 04:39 UTC

Update

Recovery is complete. We are working to rebuild sites whose builds failed.
Posted May 05, 2026 - 04:07 UTC

Update

The issue is patched and we are working through the recovery process.
Posted May 05, 2026 - 00:30 UTC

Update

We have identified the issue and we are working on a fix. Sites will be rebuilt when resolved.
Posted May 04, 2026 - 22:11 UTC

Update

Certain sites are stuck trying to sync at the start of a build. This is not as widespread as initially expected. We will continue our investigation during business hours. Thank you for your patience.
Posted May 04, 2026 - 15:12 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted May 04, 2026 - 14:38 UTC

Investigating

We are currently investigating this issue.
Posted May 04, 2026 - 14:07 UTC
This incident affected: API/App (App (app.cloudcannon.com)).