Zammad Worker: could not obtain a connection from the pool within 5.000 seconds

Infos:

  • Used Zammad version: 5.4.1-1681892852.be633550.focal
  • Postgres Version: 12.14-0ubuntu0.20.04.1
  • Elasticsearch Version: 7.10.1 (Docker Image)
  • Used Zammad installation type: Debian Package
  • Operating system: Ubuntu 20.04
  • Browser + version: Chrome

Actual behavior:

The Zammad Monitoring page outputs:

scheduler may not run (last execution of Ticket.process_pending 24 minutes ago) - please contact your system administrator

We have already set max_connections as recommended in the documentation:

grep max_connections  /etc/postgresql/12/main/postgresql.conf 
max_connections = 2000			# (change requires restart)
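
(In case it is useful, we can also compare the number of open backends against that limit; a minimal sketch of such a check, using only the standard pg_stat_activity view and nothing Zammad-specific:)

SELECT count(*) AS open_connections,
       current_setting('max_connections') AS max_connections
FROM pg_stat_activity;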

Error log from journald of zammad-worker-1:

May 02 11:14:27 zammad zammad-worker-1.service[525475]: bundler: failed to load command: script/background-worker.rb (script/background-worker.rb)
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions.rb:534:in `block (3 levels) in jobs'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions.rb:579:in `thread_client'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions.rb:579:in `new'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions/client.rb:9:in `initialize'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions/client.rb:25:in `fetch'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions/client.rb:25:in `loop'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions/client.rb:36:in `block in fetch'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/app/models/application_model/can_lookup.rb:29:in `lookup'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/core.rb:386:in `find_by'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/core.rb:490:in `cached_find_by_statement'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_handling.rb:283:in `connection'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_handling.rb:327:in `retrieve_connection'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:1128:in `retrieve_connection'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:428:in `connection'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:588:in `checkout'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:870:in `acquire_connection'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:155:in `poll'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:164:in `synchronize'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/ruby-3.1.3/lib/ruby/3.1.0/monitor.rb:202:in `mon_synchronize'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/ruby-3.1.3/lib/ruby/3.1.0/monitor.rb:202:in `synchronize'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:155:in `block in poll'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:286:in `internal_poll'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:160:in `internal_poll'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:199:in `wait_poll'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:199:in `loop'
May 02 11:14:27 zammad zammad-worker-1.service[525475]: /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:210:in `block in wait_poll': could not obtain a connection from the pool within 5.000 seconds (waited 5.176 seconds); all pooled connections were in use (ActiveRecord::ConnectionTimeoutError)
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions.rb:534:in `block (3 levels) in jobs'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions.rb:578:in `thread_client'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions.rb:601:in `rescue in thread_client'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions.rb:579:in `thread_client'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions.rb:579:in `new'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions/client.rb:9:in `initialize'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions/client.rb:25:in `fetch'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions/client.rb:25:in `loop'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/lib/sessions/client.rb:36:in `block in fetch'
May 02 11:14:27 zammad zammad-worker-1.service[525475]:         from /opt/zammad/app/models/application_model/can_lookup.rb:29:in `lookup'

Steps to reproduce the behavior:

Currently we have no idea how to reproduce this error. We have 4-6 agents working in our Zammad instance and the error occurs sporadically. On some days we have to restart the worker 2-3 times, and sometimes a whole week passes without any error at all.

Our Zammad instance is pretty small:

du -sh /var/lib/postgresql/12/main/
6.8G	/var/lib/postgresql/12/main/

Attachments are currently saved into the database.

We have only one channel active: a Google account with one mail address.

We don’t know how to debug that error. Do you have any idea where we can take a look?

The scheduler_err.log is empty and the production.log file contains the same information as journald outputs.

I was not able to upload the pg_stat_activity output, so I have created a nopaste: https://nopaste.net/AC1hL2Slj5

My guess would be that your database requires a bit of tuning; in particular, PostgreSQL needs to be configured and tailored properly. Just setting max_connections to a really high value usually isn’t the solution.

Could you post your PostgreSQL logfile from a period in which you encountered the behaviour? Do you have I/O and CPU statistics on your instance? Metrics collected on your disk I/O as well as CPU load/usage during the problems might prove useful.

Also, could you verify that you have the following configured in postgresql.conf, and if not, please configure it:

log_lock_waits = on
log_temp_files = 0
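
Both of these settings can be changed with a configuration reload; a full PostgreSQL restart should not be needed. A minimal sketch, assuming you run it as the postgres superuser after editing postgresql.conf:

-- reload the configuration so the new logging settings take effect
SELECT pg_reload_conf();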

The output of this query might be helpful as well; alternatively, post the entries in your postgresql.conf that you have changed manually:

SELECT name, setting, source, boot_val, reset_val FROM pg_settings WHERE boot_val != reset_val;

You could also start logging slow queries, but this might produce a lot of log output, slowing down your system even more. You could start by logging queries that take longer than 5 seconds, which is already a really long time:

log_min_duration_statement = 5000
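
If you want to limit that extra logging to Zammad's database only, you could also set it per database instead of globally; a sketch, assuming the database is actually called zammad (adjust the name to your setup):

ALTER DATABASE zammad SET log_min_duration_statement = 5000;
-- revert later with:
-- ALTER DATABASE zammad RESET log_min_duration_statement;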

The output of SELECT * FROM pg_stat_activity; is very much appreciated, but it needs to have been run during the error conditions for it to show possible problem areas. It is a snapshot of current database activity, and this one just shows a lot of idle client connections waiting for something to do.
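
When you capture that snapshot during the problem window, it can also help to filter out the idle backends right away; a sketch of such a query, using only standard pg_stat_activity columns available in PostgreSQL 12:

SELECT pid, state, wait_event_type, wait_event,
       now() - query_start AS running_for,
       left(query, 100) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start;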