ModeShape / MODE-2698

Improve performance of garbage collection for S3 binary storage

Details

    • Enhancement
    • Resolution: Unresolved
    • Major
    • 5.5
    • None
    • Storage

    Description

      Given that listing all S3 objects in a bucket and checking for an "unused" header is not very efficient, I propose to use another mechanism to regularly clean unused objects.
      It would rely on 2 features provided by S3: object tagging, and lifecycle rules that filter on object tags.

      In practice, such a lifecycle rule would be set up for/by ModeShape:

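      <LifecycleConfiguration>
        <Rule>
          <ID>ModeShape Garbage Collection</ID>
          <Status>Enabled</Status>
          <Filter>
            <Tag>
              <Key>unused</Key>
              <Value>true</Value>
            </Tag>
          </Filter>
          <!-- A rule needs at least one action; the 1-day expiration is an assumed value -->
          <Expiration>
            <Days>1</Days>
          </Expiration>
        </Rule>
      </LifecycleConfiguration>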
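
      As a rough illustration of how such a rule might be installed programmatically, here is a minimal Python sketch. It assumes a client object that behaves like `boto3.client("s3")`; the function names, the bucket name and the 1-day expiration are hypothetical and not part of ModeShape. Python is used only for brevity.

```python
def build_gc_lifecycle(expire_after_days=1):
    """Build a lifecycle configuration that expires objects tagged unused=true.

    The expiration delay is an assumption; the proposal does not specify one.
    """
    return {
        "Rules": [
            {
                "ID": "ModeShape Garbage Collection",
                "Status": "Enabled",
                # Only objects carrying the tag unused=true match this rule.
                "Filter": {"Tag": {"Key": "unused", "Value": "true"}},
                # The action S3 applies to matching objects.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def install_gc_rule(s3_client, bucket, expire_after_days=1):
    """Apply the rule to a bucket.

    s3_client is expected to behave like boto3.client("s3").
    """
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_gc_lifecycle(expire_after_days),
    )
```

      Note that this call replaces the bucket's whole lifecycle configuration, so a real implementation would probably need to merge the rule with any existing ones first.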
      

      The main advantage would be delegating the cleanup process to S3, freeing ModeShape from iterating over a (possibly) huge list of objects.
      On the other hand, the interval at which this cleanup takes place would be controlled by S3: in practice, about every 24 hours, as far as I know.
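
      Under this proposal, the only work left on the ModeShape side would be tagging a binary when it becomes unused and clearing the tag if it is referenced again. A minimal Python sketch of that step follows, again assuming a client that behaves like `boto3.client("s3")`. The helper names are hypothetical, and the sketch assumes the stored objects carry no other tags, since both calls act on the object's whole tag set.

```python
# Tag matched by the lifecycle rule's filter.
UNUSED_TAG = {"Key": "unused", "Value": "true"}


def mark_unused(s3_client, bucket, key):
    """Flag an object so the lifecycle rule can expire it.

    Replaces the object's entire tag set with the single unused=true tag.
    """
    s3_client.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": [UNUSED_TAG]},
    )


def mark_used(s3_client, bucket, key):
    """Clear the flag when the binary is referenced again.

    Removes all tags from the object, so it no longer matches the rule.
    """
    s3_client.delete_object_tagging(Bucket=bucket, Key=key)
```

      Because S3 decides when expiration actually runs, a binary that is reused would need its tag cleared before the rule fires; how to handle that race is left open here.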

          People

            Assignee: Unassigned
            Reporter: Damiano Albani (dalbani, Inactive)