Details

Type: Task
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: 2.13.2 GA
Component/s: System
Labels:
- support

Blocked: False
Blocked Reason: None
Ready: False
3Scale PT Tested upstream: Not Started
3scale PT Docs: Not Started
3scale PT Product Specs: Not Started
3scale PT Product Update Ready: Not Started
3scale PT Released In Saas: Not Started
3scale PT Verified Product: Not Started

SFDC Cases Counter:
SFDC Cases Links:

Description

If system-app is experiencing slow response times from the database (say, Postgres) and those responses cause the unicorn worker processes to time out persistently for many hours, how should the connections be handled?

From the code, it looks like we kill any previous connection before forking the new child process. Is that understanding correct?

If so, I assume each worker process will have a pool of 5 connections, as defined by RAILS_MAX_THREADS. Is that right?

So for 1 replica of system-app with 1 CPU share available, can we expect 10 connections per container, for a total of 30?
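
The arithmetic behind that estimate can be sketched roughly as follows. Only the pool size of 5 comes from the RAILS_MAX_THREADS assumption above; the 2 workers per container and the 3 containers are values inferred from the "10 per container, 30 total" figures and are not confirmed anywhere in this ticket.

```python
def expected_connections(workers_per_container: int, pool_size: int, containers: int) -> int:
    """Upper bound on database connections if every worker fills its pool."""
    return workers_per_container * pool_size * containers


# Assumed values: workers_per_container=2 and containers=3 are inferred from
# the estimate in the question; pool_size=5 is the assumed RAILS_MAX_THREADS pool.
per_container = expected_connections(workers_per_container=2, pool_size=5, containers=1)
total = expected_connections(workers_per_container=2, pool_size=5, containers=3)
print(per_container, total)
```

This is only an upper bound for system-app itself; connections opened by other pods would come on top of it.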

Which other 3scale pods open connections against the database? Sidekiq and zync, maybe? I assumed these would do so via system, by way of the APIs, but according to THREESCALE-10157 zync does indeed open connections, and potentially too many of them.

The question, as stated in the summary, is more about how those connections are handled if a worker process is killed.

In this scenario there were many persistent timeouts, and after some hours the following errors started to appear in the master and provider containers where those timeouts were happening:

PG::ConnectionBad (FATAL:  remaining connection slots are reserved for non-replication superuser connections
)

From the monitoring, it was possible to see that over 100 connections were open concurrently and that the maximum query time for some of them was over 90s, which is more than double the timeout setting in system-app.

Why would those connections still be idle?

Is that expected and, if so, is the correct solution simply to increase the max_connections setting on the database?
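
To narrow down which sessions are idle and for how long, one option might be to group the rows of PostgreSQL's `pg_stat_activity` view by state. The sketch below is a minimal illustration, not something taken from the system-app code: the `summarize` helper, the 90-second threshold, and the sample rows are all hypothetical, and fetching the rows with a database client is left out.

```python
from collections import Counter
from datetime import datetime, timedelta

# Query that could be run against PostgreSQL; the columns belong to the
# built-in pg_stat_activity view.
ACTIVITY_QUERY = """
SELECT pid, application_name, state, state_change
FROM pg_stat_activity
WHERE datname = current_database()
"""


def summarize(rows, now, idle_threshold=timedelta(seconds=90)):
    """Count sessions per state and list pids idle longer than the threshold."""
    by_state = Counter(row["state"] for row in rows)
    long_idle = [
        row["pid"]
        for row in rows
        if row["state"] == "idle" and now - row["state_change"] > idle_threshold
    ]
    return by_state, long_idle


# Made-up sample rows, shaped like the result of ACTIVITY_QUERY.
now = datetime(2024, 4, 12, 16, 0, 0)
sample_rows = [
    {"pid": 101, "application_name": "app-a", "state": "idle",
     "state_change": now - timedelta(minutes=30)},
    {"pid": 102, "application_name": "app-a", "state": "active",
     "state_change": now - timedelta(seconds=5)},
    {"pid": 103, "application_name": "app-b", "state": "idle",
     "state_change": now - timedelta(seconds=10)},
]
by_state, long_idle = summarize(sample_rows, now)
print(dict(by_state), long_idle)
```

Grouping by `application_name` as well could help show whether the long-idle sessions belong to system-app or to another pod.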

People

Assignee: Unassigned

Reporter: Kevin Price

Votes: 0

Watchers: 4

Dates

Created: 2024/04/12 3:49 PM

Updated: 2024/04/16 10:20 PM