  1. Red Hat Developer Website
  2. DEVELOPER-1164

Purge old content from the DCP

This issue belongs to an archived project.

Details

    • Story
    • Resolution: Obsolete
    • None
    • None
    • Important

    Description

      Currently, when we index data for the DCP, we only add content. Removal of old content has to be done manually and is often forgotten.

      For example, if a quickstart is removed in a new release, it still stays in the DCP. Similarly, if an item is removed from one of the spreadsheets (an `event` or `connector`, for example), it doesn't get removed from the DCP.

      The difficulty is in knowing what to remove. We could delete everything before indexing. This would ensure that only the required documents are present and would solve the problem. However, it would create an outage window of up to 15 minutes on every production build whilst the data is re-added. That is clearly not acceptable.

      I need to check this, but I'm pretty sure we always recreate all JBoss Developer content with every build. It's only user-created content (like ratings) that doesn't get added by the build. Therefore, at the end of the build, we could simply remove all JBoss Developer content that was not updated during that build. I see two ways of doing this:

      1) Remove all content with a `sys_updated` field with a timestamp earlier than the build start time. This approach would work with the current data, as all documents have a `sys_updated` field. However, my concern would be that a clock-sync issue could wipe out too much data. This seems too error-prone IMO.

      2) Add a build number to all documents. Here we'd add an integer value that increases with every production build. Pruning the data is then simply a matter of removing all documents that do not contain the latest build number. We would need to update all documents to include this build number and (most likely) have CI specify the actual value. This is my preference as it seems safer.
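
To make the difference between the two options concrete, here is a minimal in-memory sketch. It does not talk to the DCP at all; documents are plain dictionaries. The `sys_updated` field comes from the description above, while the `build_number` field name, the document ids, the build numbers, and the five-minute clock skew are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_by_timestamp(docs, build_start):
    """Option 1: ids of documents whose sys_updated precedes the build start."""
    return [d["id"] for d in docs if d["sys_updated"] < build_start]


def stale_by_build_number(docs, latest_build):
    """Option 2: ids of documents not stamped with the latest build number."""
    # `build_number` is a hypothetical field name; documents lacking it
    # are also treated as stale.
    return [d["id"] for d in docs if d.get("build_number") != latest_build]


build_start = datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)
skew = timedelta(minutes=5)  # assumed: the indexing host's clock runs behind

docs = [
    # Re-indexed two minutes into build 42, but stamped by the skewed clock.
    {"id": "quickstart-kept",
     "sys_updated": build_start + timedelta(minutes=2) - skew,
     "build_number": 42},
    # Last indexed by an earlier build; genuinely stale.
    {"id": "quickstart-removed",
     "sys_updated": build_start - timedelta(days=30),
     "build_number": 41},
]

timestamp_stale = stale_by_timestamp(docs, build_start)
build_stale = stale_by_build_number(docs, latest_build=42)
```

In this sketch the timestamp check flags both documents, including the one that was just re-indexed, whereas the build-number check flags only `quickstart-removed`. That is the clock-sync failure mode described in option 1.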

      Just to be safe, we should make this opt-in, so that we only prune collections that we know are safe to prune in this manner.
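
The opt-in idea could be sketched as an explicit allow-list of content types that the pruning step consults before deleting anything. The `content_type` and `build_number` field names, the `PRUNE_ENABLED` set, and the sample documents are hypothetical; the real DCP field names may well differ.

```python
# Hypothetical allow-list: only these content types are ever pruned.
PRUNE_ENABLED = {"quickstart", "event", "connector"}


def select_for_pruning(docs, latest_build, prune_enabled=PRUNE_ENABLED):
    """Return ids of documents that are opted in AND not from the latest build."""
    return [
        d["id"]
        for d in docs
        if d["content_type"] in prune_enabled
        and d.get("build_number") != latest_build
    ]


docs = [
    {"id": "qs-current", "content_type": "quickstart", "build_number": 42},
    {"id": "qs-old", "content_type": "quickstart", "build_number": 41},
    # User-created content is never written by the build, so it carries
    # no build number; it must survive because its type is not opted in.
    {"id": "rating-1", "content_type": "rating"},
]

to_prune = select_for_pruning(docs, latest_build=42)
```

Here only `qs-old` is selected. The rating has no build number at all, yet it is left alone because its type was never added to the allow-list.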

            People

              Assignee: Unassigned
              Reporter: Paul Robinson (paul.robinson@redhat.com)
              Archiver: Clark Everson (rhn-support-ceverson)