Project Quay / PROJQUAY-2780

Quay upgrade hangs in quay-postgres-migration-cleanup container


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • z-stream
    • quay-v3.3.4
    • quay
    • Quay Enterprise

Upgrading Quay from 3.3.4 to 3.4.x or 3.6.x does not complete; the pod quay-enterprise-quay-postgres-migration-xxxxxx remains with only 1 of its 2 containers in Ready status:

~~~
quay-enterprise-quay-postgres-migration-xxxxxx   1/2   Running   0   8m6s

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: '2021-11-03T16:55:55Z'
    status: 'True'
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: '2021-11-03T16:53:37Z'
    message: 'containers with unready status: [quay-postgres-migration-cleanup]'
    reason: ContainersNotReady
    status: 'False'
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: '2021-11-03T16:53:37Z'
    message: 'containers with unready status: [quay-postgres-migration-cleanup]'
    reason: ContainersNotReady
    status: 'False'
    type: ContainersReady
  containerStatuses:
  - containerID: cri-o://8e7324eac11af69a911404170c43b805758f5d59efc97ccbc43d07bba16eb756
    image: registry.redhat.io/rhel8/postgresql-10@sha256:98ca35fdf08068b49216a35ed4e81507bf91c8babf30c92d5f200cbfb2df35ed
    imageID: registry.redhat.io/rhel8/postgresql-10@sha256:6d97b69c1dd606d5ca679bbae3d6e6a8073bb1c561fa8b0d78e37f0bcac84237
    lastState: {}
    name: quay-postgres-migration
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: '2021-11-03T16:55:55Z'
  - containerID: cri-o://67d375f710732a4f9afc4d35832cd9094fb08130d40a4a8450f590c668a72ccc
    image: registry.redhat.io/rhel8/postgresql-10@sha256:98ca35fdf08068b49216a35ed4e81507bf91c8babf30c92d5f200cbfb2df35ed
    imageID: registry.redhat.io/rhel8/postgresql-10@sha256:6d97b69c1dd606d5ca679bbae3d6e6a8073bb1c561fa8b0d78e37f0bcac84237
    lastState: {}
    name: quay-postgres-migration-cleanup
    ready: false
    restartCount: 0
    started: true
    state:
      running:
        startedAt: '2021-11-03T16:55:55Z'
~~~

The quay-postgres-migration-cleanup container in the postgres migration pod quay-enterprise-quay-postgres-migration starts at the same time as the quay-postgres-migration container.

The command that runs in the cleanup container is:

~~~
- command:
  - /bin/bash
  - -c
  - sleep 20; rm -f /tmp/change-username.sql /tmp/check-user.sql; echo "ALTER ROLE
    \"$OLD_DB_USERNAME\" RENAME TO \"$NEW_DB_USERNAME\"; ALTER DATABASE \"$OLD_DB_NAME\"
    RENAME TO \"$NEW_DB_NAME\";" > /tmp/change-username.sql; echo "SELECT 1 FROM
    pg_roles WHERE rolname = '$NEW_DB_USERNAME';" > /tmp/check-user.sql; psql -h
    localhost -f /tmp/check-user.sql | grep -q 1 || psql -h localhost -f /tmp/change-username.sql;
    sleep 600;
~~~

In this case, PostgreSQL did not start within the first 20 seconds in the `quay-postgres-migration` container, so the cleanup container logs:

~~~
psql: could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
psql: could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
~~~
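These messages come from the cleanup container's log. One way to pull them (the pod name and namespace here are placeholders):

~~~
oc logs quay-enterprise-quay-postgres-migration-xxxxxx -c quay-postgres-migration-cleanup -n <quay-namespace>
~~~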

In the quay-postgres-migration container we can see that the database does start later, but by then the script in the cleanup container has already failed and sleeps for 600 seconds before it starts over.

The user stated that waiting the 10 minutes does not resolve the issue: the cleanup container stays stuck in this loop and the upgrade process never completes.
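The race could be avoided by polling for PostgreSQL readiness instead of a fixed 20-second sleep. A minimal sketch of that idea, assuming pg_isready is available in the postgresql-10 image (this is only an illustration, not the actual fix shipped by the operator):

~~~
# poll until postgres accepts connections, instead of "sleep 20"
until pg_isready -h localhost -p 5432 -q; do
  echo "waiting for postgres..."
  sleep 2
done
psql -h localhost -f /tmp/check-user.sql | grep -q 1 || psql -h localhost -f /tmp/change-username.sql
~~~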

As a workaround, one can exec into the cleanup container and run the psql scripts manually; this changes the DB ownership and lets the upgrade process complete (see the sketch below).
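A rough sequence for this workaround (the pod name and namespace are placeholders; the SQL files are the ones the cleanup script itself writes to /tmp):

~~~
oc exec -it quay-enterprise-quay-postgres-migration-xxxxxx -c quay-postgres-migration-cleanup -n <quay-namespace> -- /bin/bash
# inside the container, once postgres is reachable:
psql -h localhost -f /tmp/check-user.sql | grep -q 1 || psql -h localhost -f /tmp/change-username.sql
~~~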

Another workaround is to wait for the quay-postgres-migration container to start the database, then access the node where the pod is scheduled and stop the cleanup container so that the pod restarts it after the DB is up. This also worked and the upgrade procedure completed (a sketch follows).
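For reference, a sketch of this second workaround on the node; the containerIDs above show CRI-O, so crictl is assumed to be available, and the container ID is a placeholder:

~~~
# on the node where the pod is scheduled
crictl ps --name quay-postgres-migration-cleanup   # find the running container ID
crictl stop <container-id>                         # kubelet restarts it; by then postgres is up
~~~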

Assignee: Jonathan King (jonathankingfc)
Reporter: Javier Coscia (rhn-support-jcoscia)
Votes: 2
Watchers: 8
