Red Hat Advanced Cluster Management / ACM-11283

thanos-compact pod is stuck in CrashLoopBackOff and marking blocks for deletion is not working (no deletion-mark.json in block folder)


      Description of problem:

      The thanos-compact pod is stuck in CrashLoopBackOff, reporting "invalid magic number" errors for specific block IDs. There are thousands of affected blocks in the customer environment. We attempted to mark these blocks for deletion and then run the bucket cleanup tool, but this does not appear to work: when we inspect the block folders, no deletion-mark.json is present, even though the marking tool reports that the blocks were marked.
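To check the symptom directly rather than relying on the tool's log output, one could inspect the block folders themselves. The following is a minimal sketch, assuming the bucket (or the relevant part of it) has been copied to a local directory where each block is a folder named by its block ID. The helper name `deletion_mark_status` is hypothetical and is not part of Thanos. For each requested ID it reports whether the block folder exists and whether it contains deletion-mark.json.

```python
from pathlib import Path


def deletion_mark_status(bucket_root, block_ids):
    """Map each block ID to 'marked', 'unmarked', or 'missing-block'.

    Assumption: bucket_root is a local copy of the object-storage bucket,
    with one folder per block named by its block ID (ULID).
    """
    root = Path(bucket_root)
    status = {}
    for block_id in block_ids:
        block_dir = root / block_id
        if not block_dir.is_dir():
            # No folder for this ID at all in the local copy.
            status[block_id] = "missing-block"
        elif (block_dir / "deletion-mark.json").is_file():
            # The deletion marker was written into the block folder.
            status[block_id] = "marked"
        else:
            # Block exists but carries no deletion marker.
            status[block_id] = "unmarked"
    return status
```

Passing the list of IDs that were given to the marking tool separates blocks that really are marked from those the tool reported as marked but that still lack the file.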

      level=info ts=2024-04-22T14:44:32.462377639Z caller=tools_bucket.go:1082 msg="marking done" marker=deletion-mark.json IDs=01HVY0BHNHM05K1SAMYVZTE66J
      level=warn ts=2024-04-22T14:44:32.530014271Z caller=block.go:185 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01HVZQ8VF5PSX09AFY9F6XNQS5/deletion-mark.json already exists in bucket"
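With thousands of blocks, the marking output can be long. The sketch below sorts the block IDs it mentions into those reported as "marking done" and those whose deletion-mark.json was reported as already existing. It is a hypothetical helper (`summarize_mark_logs` is not a Thanos API). The message strings it matches are copied from the two log lines above and may differ in other Thanos versions. Block IDs are picked out as 26-character ULIDs (Crockford base32, which excludes I, L, O and U).

```python
import re

# ULIDs: 26 Crockford base32 characters (digits and uppercase letters except I, L, O, U).
ULID_RE = re.compile(r"\b[0-9A-HJKMNP-TV-Z]{26}\b")


def summarize_mark_logs(lines):
    """Return (done, already_marked) sets of block IDs found in mark-tool log lines.

    Assumption: message texts match the log excerpt in this ticket.
    """
    done, already_marked = set(), set()
    for line in lines:
        ids = ULID_RE.findall(line)
        if 'msg="marking done"' in line:
            done.update(ids)
        elif "deletion-mark.json already exists" in line:
            already_marked.update(ids)
    return done, already_marked
```

The helper accepts any iterable of lines, such as an open log file. In the excerpt above, the two lines refer to different block IDs, so comparing the two sets against the requested IDs shows which blocks were newly marked and which were reported as already carrying a marker.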

      This is currently happening in a specific customer environment.

            rh-ee-doolivei Douglas Camata
            rhn-support-rspagnol Ryan Spagnola
            Xiang Yin Xiang Yin
