MGDSTRM-8078 (Managed Service - Streams)

Alertmanager sent too much information - broke Pagerduty Incident


      During the incident https://issues.redhat.com/browse/OHSS-11275, SRE was paged with multiple alerts. The PagerDuty incident that contained these alerts was broken: it lacked all the expected information about the alerts (e.g. cluster, instance, namespace, SOP, triggered alerts). This forced the SRE on-call to scramble to find the actual SOP and the alerts that fired. This is not sustainable for SREs, so we need to investigate why the PagerDuty incident discarded the information.

      Apart from the PagerDuty incident title, the only information sent was the error shown below.

      Please investigate.

      error: Custom details have been removed because the original event exceeds the maximum size of 512KB
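
      The error suggests that the serialized event exceeded the 512KB limit. As a starting point for the investigation, here is a minimal sketch of how an oversized event could be detected and trimmed before it is sent. It is not how Alertmanager or PagerDuty behave internally. The function names, the "alerts" list inside custom_details, and the "alerts_dropped" field are hypothetical and used only for illustration; the outer "payload" / "custom_details" layout is assumed to follow the PagerDuty Events API v2 shape.

```python
import json

# Size limit quoted in the PagerDuty error message above.
MAX_EVENT_BYTES = 512 * 1024


def event_size(event: dict) -> int:
    """Return the size in bytes of the event serialized as UTF-8 JSON."""
    return len(json.dumps(event).encode("utf-8"))


def trim_custom_details(event: dict, limit: int = MAX_EVENT_BYTES) -> dict:
    """Return a copy of the event with trailing alerts dropped until it fits.

    Assumption (hypothetical layout): the per-alert data lives in
    event["payload"]["custom_details"]["alerts"] as a list of dicts.
    If the event is still too large with no alerts left, it is returned
    as-is and the caller has to handle it.
    """
    trimmed = json.loads(json.dumps(event))  # deep copy via JSON round trip
    details = trimmed["payload"]["custom_details"]
    alerts = details["alerts"]
    dropped = 0
    while alerts and event_size(trimmed) > limit:
        alerts.pop()
        dropped += 1
        # Record how many alerts were removed so the on-call knows to look further.
        details["alerts_dropped"] = dropped
    return trimmed
```

      Dropping whole alerts from the end keeps the cluster, namespace, and SOP fields of the remaining alerts intact, which is the information the on-call SRE was missing. The size computed here is only an approximation of what is sent on the wire, so a real limit should leave some headroom.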
      

            Unassigned
            Jose Cueto (jcueto@redhat.com)
            MK - Running the Service