exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds
Steps to reproduce the behavior:
Hi, I just try to fix this issue with my scheduler which is not running.
The errors I get in Zammad monitoring are:
emails that could not be processed: 5
Channel: Email::Notification out No such file or directory - /usr/sbin/sendmail
scheduler may not run (last execution of CacheClearJob.perform_now 8 days ago) - please contact your system administrator
I use the standard docker compose file but for 8 days the scheduler is not really working correctly anymore with the described errors. I don’t know where to look anymore and I am clueless what the issue is.
I’ve already checked the active connections in postgres which were active 1 and idle 15 which sound okay and I’ve also increased the limit to 500 concurrent connections just in case but that did not help.
Any pointer in the right direction would be of enourmous help. Otherwise I will have to restore from backup and try without docker.
This is the postgres log:
PostgreSQL Database directory appears to contain a database; Skipping initialization
2024-05-14 08:22:28.645 UTC [1] LOG: starting PostgreSQL 15.3 on x86_64-pc-linux-musl, compiled by gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924, 64-bit
2024-05-14 08:22:28.645 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-05-14 08:22:28.645 UTC [1] LOG: listening on IPv6 address "::", port 5432
2024-05-14 08:22:28.660 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-05-14 08:22:28.685 UTC [24] LOG: database system was shut down at 2024-05-14 08:22:25 UTC
2024-05-14 08:22:28.721 UTC [1] LOG: database system is ready to accept connections
2024-05-14 08:22:29.010 UTC [28] LOG: incomplete startup packet
2024-05-14 08:27:28.780 UTC [22] LOG: checkpoint starting: time
2024-05-14 08:27:30.534 UTC [22] LOG: checkpoint complete: wrote 19 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=1.714 s, sync=0.014 s, total=1.754 s; sync files=17, longest=0.007 s, average=0.001 s; distance=157 kB, estimate=157 kB
2024-05-14 08:32:28.604 UTC [22] LOG: checkpoint starting: time
2024-05-14 08:32:30.957 UTC [22] LOG: checkpoint complete: wrote 24 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=2.311 s, sync=0.014 s, total=2.353 s; sync files=20, longest=0.007 s, average=0.001 s; distance=55 kB, estimate=147 kB
Same issue which encompasses a wide variety of jobs:
E, [2024-05-14T21:57:46.447411#1-150100] ERROR -- : execute Channel.fetch (try_count 10) exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use> in: 5.000439031 seconds.
E, [2024-05-14T21:57:47.446934#1-150120] ERROR -- : execute SessionTimeoutJob.perform_now (try_count 10) exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use> in: 5.000486994 seconds.
E, [2024-05-14T21:57:48.447597#1-150140] ERROR -- : execute Job.run (try_count 10) exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use> in: 5.000497121 seconds.
E, [2024-05-14T21:57:48.544974#1-150520] ERROR -- : execute ExternalCredential::Exchange.refresh_token (try_count 0) exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use> in: 5.00036161 seconds.
E, [2024-05-14T21:57:49.448330#1-150160] ERROR -- : execute Stats.generate (try_count 10) exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use> in: 5.000454126 seconds.
E, [2024-05-14T21:57:50.448750#1-150180] ERROR -- : execute Chat.cleanup_close (try_count 10) exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use> in: 5.000459974 seconds.
E, [2024-05-14T21:57:51.448111#1-150200] ERROR -- : execute DataPrivacyTaskJob.perform_now (try_count 10) exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use> in: 5.000463803 seconds.
E, [2024-05-14T21:57:51.448506#1-150100] ERROR -- : Can't reconnect to database #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.001 seconds); all pooled connections were in use>
E, [2024-05-14T21:57:51.448729#1-150100] ERROR -- : Failed to run Channel.fetch after 11 tries.
E, [2024-05-14T21:57:52.447392#1-150120] ERROR -- : Can't reconnect to database #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use>
E, [2024-05-14T21:57:52.447686#1-150120] ERROR -- : Failed to run SessionTimeoutJob.perform_now after 11 tries.
E, [2024-05-14T21:57:53.447999#1-150140] ERROR -- : Can't reconnect to database #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use>
E, [2024-05-14T21:57:53.448259#1-150140] ERROR -- : Failed to run Job.run after 11 tries.
E, [2024-05-14T21:57:53.473147#1-150220] ERROR -- : execute Ticket.process_escalation (try_count 8) exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use> in: 5.000269833 seconds.
E, [2024-05-14T21:57:53.545319#1-150520] ERROR -- : Can't reconnect to database #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use>
E, [2024-05-14T21:57:54.448716#1-150160] ERROR -- : Can't reconnect to database #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use>
E, [2024-05-14T21:57:54.448929#1-150160] ERROR -- : Failed to run Stats.generate after 11 tries.
I also had to modify docker-compose.yml from the one included on the docker-compose github because it’s missing the “/opt/zammad/var” folder which [POSTGRES_VERSION=15.6-alpine] requires. That version is specified in the github .env file.
Thank you very much, this makes sense and would’ve propably solved my issue but in the meantime I’ve migrated to a dedicated machine due to the better support.
I’ve had to reg-register elastics trough the rails console but after that it was a straight forward migration without big issues and it at least feels quicker than dockerized.
https:// community.zammad .org/t/scheduler-fails-with-activerecord-could-not-obtain-a-connection-from-the-pool-within-5-000-seconds-waited-5-000-seconds-all-pooled-connections-were-in-use/14320/3
(Can’t put properly formed links in posts here)
I’ve tried different things as described in that ticket but can’t put my finger on what the actual fix is.