Scheduler stuck in endless "ThreadError: deadlock; recursive locking" loop

Infos:

  • Used Zammad version: 3.2
  • Used Zammad installation source: docker-compose
  • Operating system: Debian 10
  • Browser + version: Chrome 79

Expected behavior:

  • Scheduler shouldn’t get stuck in an endless loop

Actual behavior:

Colleagues noticed today that it seemed to get rather quiet in Zammad. The monitoring revealed serious trouble:

Channel: Email::Account in Can't use Channel::Driver::Imap: #<ThreadError: deadlock; recursive locking>
Channel: Email::Account in channel is active but not fetched for about 2 hours
unprocessable mails: 2
Failed to run scheduled job 'Check Channels'. Cause: Failed to run Channel.fetch after 10 tries #<ThreadError: deadlock; recursive locking>

I checked the scheduler log and noticed that it was running in an endless loop:

zammad ~ # tail b143f5d45b2304d57cf0dea6c4f3f57a079d564494046afafcc23dd6137fe865-json.log 
{"log":"I, [2020-01-07T14:32:26.863616 #1-47223853094220]  INFO -- : 2020-01-07T14:32:26+0000: [Worker(host:b143f5d45b23 pid:1)] Job SearchIndexJob [1a042da0-02d4-4560-9f13-c90163fceed3] from DelayedJob(default) with arguments: [\"Ticket\", 46647] (id=3745853) (queue=default) RUNNING\n","stream":"stdout","time":"2020-01-07T14:32:26.863887941Z"}
{"log":"E, [2020-01-07T14:32:26.888004 #1-47223853094220] ERROR -- : 2020-01-07T14:32:26+0000: [Worker(host:b143f5d45b23 pid:1)] Job SearchIndexJob [1a042da0-02d4-4560-9f13-c90163fceed3] from DelayedJob(default) with arguments: [\"Ticket\", 46647] (id=3745853) (queue=default) FAILED (0 prior attempts) with ThreadError: deadlock; recursive locking\n","stream":"stdout","time":"2020-01-07T14:32:26.888163408Z"}
{"log":"E, [2020-01-07T14:32:26.888080 #1-47223853094220] ERROR -- : 2020-01-07T14:32:26+0000: [Worker(host:b143f5d45b23 pid:1)] Job SearchIndexJob [1a042da0-02d4-4560-9f13-c90163fceed3] from DelayedJob(default) with arguments: [\"Ticket\", 46647] (id=3745853) (queue=default) FAILED permanently because of 1 consecutive failures\n","stream":"stdout","time":"2020-01-07T14:32:26.888179272Z"}
{"log":"I, [2020-01-07T14:32:26.891901 #1-47223853094220]  INFO -- : 2020-01-07T14:32:26+0000: [Worker(host:b143f5d45b23 pid:1)] Job SearchIndexJob [1a042da0-02d4-4560-9f13-c90163fceed3] from DelayedJob(default) with arguments: [\"Ticket\", 46647] (id=3745853) (queue=default) RUNNING\n","stream":"stdout","time":"2020-01-07T14:32:26.892187003Z"}
{"log":"E, [2020-01-07T14:32:26.912909 #1-47223853094220] ERROR -- : 2020-01-07T14:32:26+0000: [Worker(host:b143f5d45b23 pid:1)] Job SearchIndexJob [1a042da0-02d4-4560-9f13-c90163fceed3] from DelayedJob(default) with arguments: [\"Ticket\", 46647] (id=3745853) (queue=default) FAILED (0 prior attempts) with ThreadError: deadlock; recursive locking\n","stream":"stdout","time":"2020-01-07T14:32:26.913105321Z"}
{"log":"E, [2020-01-07T14:32:26.912999 #1-47223853094220] ERROR -- : 2020-01-07T14:32:26+0000: [Worker(host:b143f5d45b23 pid:1)] Job SearchIndexJob [1a042da0-02d4-4560-9f13-c90163fceed3] from DelayedJob(default) with arguments: [\"Ticket\", 46647] (id=3745853) (queue=default) FAILED permanently because of 1 consecutive failures\n","stream":"stdout","time":"2020-01-07T14:32:26.913128349Z"}
{"log":"I, [2020-01-07T14:32:26.918182 #1-47223853094220]  INFO -- : 2020-01-07T14:32:26+0000: [Worker(host:b143f5d45b23 pid:1)] Job SearchIndexJob [1a042da0-02d4-4560-9f13-c90163fceed3] from DelayedJob(default) with arguments: [\"Ticket\", 46647] (id=3745853) (queue=default) RUNNING\n","stream":"stdout","time":"2020-01-07T14:32:26.918426813Z"}
{"log":"E, [2020-01-07T14:32:26.937791 #1-47223853094220] ERROR -- : 2020-01-07T14:32:26+0000: [Worker(host:b143f5d45b23 pid:1)] Job SearchIndexJob [1a042da0-02d4-4560-9f13-c90163fceed3] from DelayedJob(default) with arguments: [\"Ticket\", 46647] (id=3745853) (queue=default) FAILED (0 prior attempts) with ThreadError: deadlock; recursive locking\n","stream":"stdout","time":"2020-01-07T14:32:26.937975203Z"}
{"log":"E, [2020-01-07T14:32:26.937882 #1-47223853094220] ERROR -- : 2020-01-07T14:32:26+0000: [Worker(host:b143f5d45b23 pid:1)] Job SearchIndexJob [1a042da0-02d4-4560-9f13-c90163fceed3] from DelayedJob(default) with arguments: [\"Ticket\", 46647] (id=3745853) (queue=default) FAILED permanently because of 1 consecutive failures\n","stream":"stdout","time":"2020-01-07T14:32:26.937992784Z"}
{"log":"I, [2020-01-07T14:32:26.941200 #1-47223853094220]  INFO -- : 2020-01-07T14:32:26+0000: [Worker(host:b143f5d45b23 pid:1)] Job SearchIndexJob [1a042da0-02d4-4560-9f13-c90163fceed3] from DelayedJob(default) with arguments: [\"Ticket\", 46647] (id=3745853) (queue=default) RUNNING\n","stream":"stdout","time":"2020-01-07T14:32:26.941438874Z"}
zammad ~ # grep '1a042da0-02d4-4560-9f13-c90163fceed3' b143f5d45b2304d57cf0dea6c4f3f57a079d564494046afafcc23dd6137fe865-json.log | wc -l
445508

This is the section from the log that immediately precedes the loop: https://pastebin.com/sNfw5neK

Not sure if and how the unprocessable mails caused the issue; several of those were actually processed (at least I found the articles in the database), and the two that didn’t make it were processed without issues after I had restarted the scheduler and the database.

Steps to reproduce the behavior:

  • don’t know :confused:

That’s very odd. Maybe the scheduler temporary died or aborted at a specific point.
Restarting Zammad should solve this issue if it still persists (and is a burden).

Maybe @thorsteneckel can help better than I do.

There are still ongoing problems. The scheduler hasn’t become stuck again, but today we encountered a ticket that hasn’t had any articles, and the scheduler reported this mail and some other ~30 mails as unprocessable:

2020-01-08T13:47:40.597743530Z "ERROR: Can't process email, you will find it for bug reporting under /opt/zammad/tmp/unprocessable_mail/14fdebfba70e44e9b02c37baac010345.eml, please create an issue at https://github.com/zammad/zammad/issues"
2020-01-08T13:47:40.597770777Z "ERROR: #<Timeout::Error: execution expired>"
2020-01-08T13:47:40.613479449Z E, [2020-01-08T13:47:40.613080 #1-70216144869880] ERROR -- : ERROR: Can't process email, you will find it for bug reporting under /opt/zammad/tmp/unprocessable_mail/14fdebfba70e44e9b02c37baac010345.eml, please create an issue at https://github.com/zammad/zammad/issues
2020-01-08T13:47:40.613935299Z E, [2020-01-08T13:47:40.613242 #1-70216144869880] ERROR -- : execution expired (Timeout::Error)
2020-01-08T13:47:40.613961236Z /usr/local/lib/ruby/2.5.0/monitor.rb:185:in `lock'
2020-01-08T13:47:40.613969785Z /usr/local/lib/ruby/2.5.0/monitor.rb:185:in `mon_enter'
2020-01-08T13:47:40.613977124Z /usr/local/lib/ruby/2.5.0/monitor.rb:224:in `mon_synchronize'
2020-01-08T13:47:40.614011259Z /usr/local/bundle/gems/dalli-2.7.10/lib/dalli/options.rb:24:in `alive?'
2020-01-08T13:47:40.614020024Z /usr/local/bundle/gems/dalli-2.7.10/lib/dalli/ring.rb:43:in `server_for_key'
2020-01-08T13:47:40.614026292Z /usr/local/bundle/gems/dalli-2.7.10/lib/dalli/client.rb:367:in `perform'
2020-01-08T13:47:40.614032603Z /usr/local/bundle/gems/dalli-2.7.10/lib/dalli/client.rb:148:in `delete'
2020-01-08T13:47:40.614047375Z /usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:333:in `block in delete_entry'
2020-01-08T13:47:40.614056553Z /usr/local/bundle/gems/dalli-2.7.10/lib/dalli/client.rb:266:in `with'
2020-01-08T13:47:40.614064057Z /usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:87:in `with'
2020-01-08T13:47:40.614072352Z /usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:333:in `delete_entry'
2020-01-08T13:47:40.614079566Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/cache/strategy/local_cache.rb:155:in `delete_entry'
2020-01-08T13:47:40.614086885Z /usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:167:in `block in delete'
2020-01-08T13:47:40.614095953Z /usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:389:in `block in instrument_with_log'
2020-01-08T13:47:40.614104580Z /usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:398:in `block in instrument'
2020-01-08T13:47:40.614112280Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/notifications.rb:170:in `instrument'
2020-01-08T13:47:40.614120172Z /usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:397:in `instrument'
2020-01-08T13:47:40.614127312Z /usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:389:in `instrument_with_log'
2020-01-08T13:47:40.614134785Z /usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:166:in `delete'
2020-01-08T13:47:40.614142341Z /opt/zammad/lib/cache.rb:12:in `delete'
2020-01-08T13:47:40.614149634Z /opt/zammad/app/models/application_model/has_cache.rb:33:in `block in cache_delete'
2020-01-08T13:47:40.614157005Z /opt/zammad/app/models/application_model/has_cache.rb:32:in `each'
2020-01-08T13:47:40.614175715Z /opt/zammad/app/models/application_model/has_cache.rb:32:in `cache_delete'
2020-01-08T13:47:40.614184921Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:426:in `block in make_lambda'
2020-01-08T13:47:40.614192444Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:198:in `block (2 levels) in halting'
2020-01-08T13:47:40.614199601Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:606:in `block (2 levels) in default_terminator'
2020-01-08T13:47:40.614215200Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:605:in `catch'
2020-01-08T13:47:40.614230670Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:605:in `block in default_terminator'
2020-01-08T13:47:40.614247320Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:199:in `block in halting'
2020-01-08T13:47:40.614255709Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:513:in `block in invoke_before'
2020-01-08T13:47:40.614261680Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:513:in `each'
2020-01-08T13:47:40.614268705Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:513:in `invoke_before'
2020-01-08T13:47:40.614274695Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:131:in `run_callbacks'
2020-01-08T13:47:40.614279291Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:816:in `_run_create_callbacks'
2020-01-08T13:47:40.614292936Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/callbacks.rb:346:in `_create_record'
2020-01-08T13:47:40.614305791Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/timestamp.rb:102:in `_create_record'
2020-01-08T13:47:40.614315098Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/persistence.rb:705:in `create_or_update'
2020-01-08T13:47:40.614339235Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/callbacks.rb:342:in `block in create_or_update'
2020-01-08T13:47:40.614348963Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:98:in `run_callbacks'
2020-01-08T13:47:40.614355403Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:816:in `_run_save_callbacks'
2020-01-08T13:47:40.614361385Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/callbacks.rb:342:in `create_or_update'
2020-01-08T13:47:40.614380155Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/persistence.rb:308:in `save!'
2020-01-08T13:47:40.614387737Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/validations.rb:52:in `save!'
2020-01-08T13:47:40.614394477Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:315:in `block in save!'
2020-01-08T13:47:40.614400786Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:387:in `block in with_transaction_returning_status'
2020-01-08T13:47:40.614407009Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:265:in `transaction'
2020-01-08T13:47:40.614413489Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:212:in `transaction'
2020-01-08T13:47:40.614419364Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:385:in `with_transaction_returning_status'
2020-01-08T13:47:40.614425289Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:315:in `save!'
2020-01-08T13:47:40.614431322Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/suppressor.rb:48:in `save!'
2020-01-08T13:47:40.614437450Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/persistence.rb:53:in `create!'
2020-01-08T13:47:40.614443550Z /opt/zammad/app/jobs/concerns/has_active_job_lock.rb:60:in `block in ensure_active_job_lock_for_enqueue!'
2020-01-08T13:47:40.614455818Z /opt/zammad/app/jobs/concerns/has_active_job_lock.rb:86:in `in_active_job_lock_transaction'
2020-01-08T13:47:40.614462352Z /opt/zammad/app/jobs/concerns/has_active_job_lock.rb:57:in `ensure_active_job_lock_for_enqueue!'
2020-01-08T13:47:40.614470146Z /opt/zammad/app/jobs/concerns/has_active_job_lock.rb:6:in `block (2 levels) in <module:HasActiveJobLock>'
2020-01-08T13:47:40.614478354Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:426:in `instance_exec'
2020-01-08T13:47:40.614486208Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:426:in `block in make_lambda'
2020-01-08T13:47:40.614493428Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:198:in `block (2 levels) in halting'
2020-01-08T13:47:40.614501853Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:606:in `block (2 levels) in default_terminator'
2020-01-08T13:47:40.614509766Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:605:in `catch'
2020-01-08T13:47:40.614517327Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:605:in `block in default_terminator'
2020-01-08T13:47:40.614523491Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:199:in `block in halting'
2020-01-08T13:47:40.614530781Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:513:in `block in invoke_before'
2020-01-08T13:47:40.614542433Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:513:in `each'
2020-01-08T13:47:40.614567292Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:513:in `invoke_before'
2020-01-08T13:47:40.614577722Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:107:in `block in run_callbacks'
2020-01-08T13:47:40.614585795Z /usr/local/bundle/gems/activejob-5.2.4.1/lib/active_job/logging.rb:17:in `block (3 levels) in <module:Logging>'
2020-01-08T13:47:40.614593504Z /usr/local/bundle/gems/activejob-5.2.4.1/lib/active_job/logging.rb:48:in `tag_logger'
2020-01-08T13:47:40.614601438Z /usr/local/bundle/gems/activejob-5.2.4.1/lib/active_job/logging.rb:16:in `block (2 levels) in <module:Logging>'
2020-01-08T13:47:40.614610054Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:118:in `instance_exec'
2020-01-08T13:47:40.614617451Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
2020-01-08T13:47:40.614625579Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:136:in `run_callbacks'
2020-01-08T13:47:40.614631702Z /usr/local/bundle/gems/activejob-5.2.4.1/lib/active_job/enqueuing.rb:49:in `enqueue'
2020-01-08T13:47:40.614637419Z /usr/local/bundle/gems/activejob-5.2.4.1/lib/active_job/enqueuing.rb:20:in `perform_later'
2020-01-08T13:47:40.614643433Z /opt/zammad/app/models/concerns/has_search_index_backend.rb:27:in `search_index_update'
2020-01-08T13:47:40.614656310Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:426:in `block in make_lambda'
2020-01-08T13:47:40.614665266Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:236:in `block in halting_and_conditional'
2020-01-08T13:47:40.614672644Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:517:in `block in invoke_after'
2020-01-08T13:47:40.614680892Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:517:in `each'
2020-01-08T13:47:40.614688395Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:517:in `invoke_after'
2020-01-08T13:47:40.614706163Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:133:in `run_callbacks'
2020-01-08T13:47:40.614712228Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:816:in `_run_touch_callbacks'
2020-01-08T13:47:40.614718738Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/callbacks.rb:332:in `touch'
2020-01-08T13:47:40.614741208Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:319:in `block in touch'
2020-01-08T13:47:40.614747108Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:387:in `block in with_transaction_returning_status'
2020-01-08T13:47:40.614761610Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:265:in `transaction'
2020-01-08T13:47:40.614770157Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:212:in `transaction'
2020-01-08T13:47:40.614777498Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:385:in `with_transaction_returning_status'
2020-01-08T13:47:40.614784769Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:319:in `touch'
2020-01-08T13:47:40.614791347Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/touch_later.rb:39:in `touch'
2020-01-08T13:47:40.614798415Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/no_touching.rb:55:in `touch'
2020-01-08T13:47:40.614815747Z /opt/zammad/app/models/observer/ticket/ref_object_touch.rb:32:in `ref_object_touch'
2020-01-08T13:47:40.614820444Z /opt/zammad/app/models/observer/ticket/ref_object_touch.rb:7:in `after_create'
2020-01-08T13:47:40.614824976Z /usr/local/bundle/gems/rails-observers-0.1.5/lib/rails/observers/active_model/observing.rb:352:in `update'
2020-01-08T13:47:40.614829520Z /usr/local/bundle/gems/rails-observers-0.1.5/lib/rails/observers/activerecord/observer.rb:118:in `block (2 levels) in define_callbacks'
2020-01-08T13:47:40.614834082Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:426:in `block in make_lambda'
2020-01-08T13:47:40.614838572Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:236:in `block in halting_and_conditional'
2020-01-08T13:47:40.614843101Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:517:in `block in invoke_after'
2020-01-08T13:47:40.614847508Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:517:in `each'
2020-01-08T13:47:40.614856677Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:517:in `invoke_after'
2020-01-08T13:47:40.614861556Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:133:in `run_callbacks'
2020-01-08T13:47:40.614866135Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:816:in `_run_create_callbacks'
2020-01-08T13:47:40.614871330Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/callbacks.rb:346:in `_create_record'
2020-01-08T13:47:40.614875965Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/timestamp.rb:102:in `_create_record'
2020-01-08T13:47:40.614889565Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/persistence.rb:705:in `create_or_update'
2020-01-08T13:47:40.614898659Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/callbacks.rb:342:in `block in create_or_update'
2020-01-08T13:47:40.614903201Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:132:in `run_callbacks'
2020-01-08T13:47:40.614907619Z /usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/callbacks.rb:816:in `_run_save_callbacks'
2020-01-08T13:47:40.614912089Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/callbacks.rb:342:in `create_or_update'
2020-01-08T13:47:40.614916563Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/persistence.rb:308:in `save!'
2020-01-08T13:47:40.614921055Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/validations.rb:52:in `save!'
2020-01-08T13:47:40.614925645Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:315:in `block in save!'
2020-01-08T13:47:40.614930191Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:387:in `block in with_transaction_returning_status'
2020-01-08T13:47:40.614934759Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:265:in `transaction'
2020-01-08T13:47:40.614939330Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:212:in `transaction'
2020-01-08T13:47:40.614947280Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:385:in `with_transaction_returning_status'
2020-01-08T13:47:40.614951841Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:315:in `save!'
2020-01-08T13:47:40.614956405Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/suppressor.rb:48:in `save!'
2020-01-08T13:47:40.614960889Z /opt/zammad/app/models/channel/email_parser.rb:245:in `block in _process'
2020-01-08T13:47:40.614965343Z /opt/zammad/app/models/transaction.rb:15:in `block in execute'
2020-01-08T13:47:40.614969967Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:267:in `block in transaction'
2020-01-08T13:47:40.614981399Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/transaction.rb:239:in `block in within_new_transaction'
2020-01-08T13:47:40.614994226Z /usr/local/lib/ruby/2.5.0/monitor.rb:226:in `mon_synchronize'
2020-01-08T13:47:40.614999384Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/transaction.rb:236:in `within_new_transaction'
2020-01-08T13:47:40.615004065Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:267:in `transaction'
2020-01-08T13:47:40.615008706Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/transactions.rb:212:in `transaction'
2020-01-08T13:47:40.615013160Z /opt/zammad/app/models/transaction.rb:13:in `execute'
2020-01-08T13:47:40.615017523Z /opt/zammad/app/models/channel/email_parser.rb:171:in `_process'
2020-01-08T13:47:40.615021932Z /opt/zammad/app/models/channel/email_parser.rb:115:in `block in process'
2020-01-08T13:47:40.615026362Z /usr/local/lib/ruby/2.5.0/timeout.rb:108:in `timeout'
2020-01-08T13:47:40.615030728Z /opt/zammad/app/models/channel/email_parser.rb:114:in `process'
2020-01-08T13:47:40.615035143Z /opt/zammad/app/models/channel/driver/imap.rb:278:in `block in fetch'
2020-01-08T13:47:40.615039630Z /opt/zammad/app/models/channel/driver/imap.rb:213:in `each'
2020-01-08T13:47:40.615043974Z /opt/zammad/app/models/channel/driver/imap.rb:213:in `fetch'
2020-01-08T13:47:40.615048301Z /opt/zammad/app/models/channel.rb:56:in `fetch'
2020-01-08T13:47:40.615057508Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/relation/delegation.rb:71:in `each'
2020-01-08T13:47:40.615062758Z /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/relation/delegation.rb:71:in `each'
2020-01-08T13:47:40.615067274Z /opt/zammad/app/models/channel.rb:30:in `fetch'
2020-01-08T13:47:40.615071753Z (eval):1:in `_start_job'
2020-01-08T13:47:40.615076105Z /opt/zammad/app/models/scheduler.rb:287:in `eval'
2020-01-08T13:47:40.615080357Z /opt/zammad/app/models/scheduler.rb:287:in `_start_job'
2020-01-08T13:47:40.615084734Z /opt/zammad/app/models/scheduler.rb:239:in `block (2 levels) in start_job'
2020-01-08T13:47:40.615089120Z /opt/zammad/app/models/scheduler.rb:237:in `loop'
2020-01-08T13:47:40.615093394Z /opt/zammad/app/models/scheduler.rb:237:in `block in start_job'
2020-01-08T13:47:40.615097803Z /usr/local/bundle/gems/logging-2.2.2/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'

But several of the other unprocessable mails were in Zammad in looked just fine.

Also the scheduler seems to have crashed several times today:

zammad ~ # zammad logs scheduler --since '2020-01-08' -t 2>/dev/null | grep 'Scheduler started'
[...]
2020-01-08T12:43:15.500560033Z I, [2020-01-08T12:43:15.500277 #1-47106541479360]  INFO -- : Scheduler started.
2020-01-08T12:57:43.311653293Z I, [2020-01-08T12:57:43.311423 #1-47238914434500]  INFO -- : Scheduler started.
2020-01-08T13:16:44.064927095Z I, [2020-01-08T13:16:44.064699 #1-47024953681340]  INFO -- : Scheduler started.
2020-01-08T13:29:59.811907021Z I, [2020-01-08T13:29:59.811616 #1-47233392231860]  INFO -- : Scheduler started.
2020-01-08T14:13:29.829025142Z I, [2020-01-08T14:13:29.828849 #1-46973571155380]  INFO -- : Scheduler started.
[...]

The first four with an “could not obtain a connection from the pool within 5.000 seconds” error right before:

2020-01-08T12:42:23.060929356Z E, [2020-01-08T12:42:23.060716 #1-70115245995340] ERROR -- : execute Ticket.process_auto_unassign (try_count 0) exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 6.031 se
conds); all pooled connections were in use> in: 6.031102158 seconds.
2020-01-08T12:42:27.030985110Z I, [2020-01-08T12:42:27.030664 #1-47451862771140]  INFO -- : Running job thread for 'Check Channels' (Channel.fetch) status is: sleep
2020-01-08T12:42:28.939494687Z E, [2020-01-08T12:42:28.939145 #1-70115245995340] ERROR -- : Can't reconnect to database #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.872 seconds); all pooled connections were in us
e>
2020-01-08T12:42:32.865976521Z E, [2020-01-08T12:42:32.865770 #1-70116210135100] ERROR -- : execute OnlineNotification.cleanup (try_count 0) exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.834 seco
nds); all pooled connections were in use> in: 5.834263398 seconds.
2020-01-08T12:42:36.454926454Z E, [2020-01-08T12:42:36.454737 #1-70115245995340] ERROR -- : execute Ticket.process_auto_unassign (try_count 1) exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.789 se
conds); all pooled connections were in use> in: 5.789693835 seconds.
2020-01-08T12:42:36.455103179Z E, [2020-01-08T12:42:36.454981 #1-70116103435220] ERROR -- : thread_client 47400689250760 exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.477 seconds); all pooled con
nections were in use>
2020-01-08T12:42:36.455265998Z E, [2020-01-08T12:42:36.455157 #1-70116103435220] ERROR -- : /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:201:in `block in wait_poll'
2020-01-08T12:42:36.455277407Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:190:in `loop'
2020-01-08T12:42:36.455283111Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:190:in `wait_poll'
2020-01-08T12:42:36.455288437Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:151:in `internal_poll'
2020-01-08T12:42:36.455293666Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:277:in `internal_poll'
2020-01-08T12:42:36.455298831Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:145:in `block in poll'
2020-01-08T12:42:36.455304016Z   /usr/local/lib/ruby/2.5.0/monitor.rb:226:in `mon_synchronize'
2020-01-08T12:42:36.455336486Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:155:in `synchronize'
2020-01-08T12:42:36.455342763Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:145:in `poll'
2020-01-08T12:42:36.455347842Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:818:in `acquire_connection'
2020-01-08T12:42:36.455352931Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:538:in `checkout'
2020-01-08T12:42:36.455358039Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:382:in `connection'
2020-01-08T12:42:36.455363180Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:1033:in `retrieve_connection'
2020-01-08T12:42:36.455368326Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_handling.rb:118:in `retrieve_connection'
2020-01-08T12:42:36.455373388Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/connection_handling.rb:90:in `connection'
2020-01-08T12:42:36.455379377Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/relation/calculations.rb:197:in `block in pluck'
2020-01-08T12:42:36.455384795Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/relation.rb:584:in `skip_query_cache_if_necessary'
2020-01-08T12:42:36.455389923Z   /usr/local/bundle/gems/activerecord-5.2.4.1/lib/active_record/relation/calculations.rb:197:in `pluck'
2020-01-08T12:42:36.455398387Z   /opt/zammad/app/models/ticket/overviews.rb:25:in `all'
2020-01-08T12:42:36.455412022Z   /opt/zammad/app/models/ticket/overviews.rb:96:in `index'
2020-01-08T12:42:36.455422146Z   /opt/zammad/lib/sessions/backend/ticket_overview_list.rb:47:in `load'
2020-01-08T12:42:36.455430834Z   /opt/zammad/lib/sessions/backend/ticket_overview_list.rb:109:in `push'
2020-01-08T12:42:36.455438337Z   /opt/zammad/lib/sessions/client.rb:69:in `each'
2020-01-08T12:42:36.455446510Z   /opt/zammad/lib/sessions/client.rb:69:in `block in fetch'
2020-01-08T12:42:36.455455135Z   /opt/zammad/lib/sessions/client.rb:25:in `loop'
2020-01-08T12:42:36.455462505Z   /opt/zammad/lib/sessions/client.rb:25:in `fetch'
2020-01-08T12:42:36.455468840Z   /opt/zammad/lib/sessions/client.rb:7:in `initialize'
2020-01-08T12:42:36.455475935Z   /opt/zammad/lib/sessions.rb:721:in `new'
2020-01-08T12:42:36.455481585Z   /opt/zammad/lib/sessions.rb:721:in `thread_client'
2020-01-08T12:42:36.455486570Z   /opt/zammad/lib/sessions.rb:743:in `rescue in thread_client'
2020-01-08T12:42:36.455491489Z   /opt/zammad/lib/sessions.rb:720:in `thread_client'
2020-01-08T12:42:36.455496376Z   /opt/zammad/lib/sessions.rb:676:in `block (3 levels) in jobs'
2020-01-08T12:42:36.455501341Z   /usr/local/bundle/gems/logging-2.2.2/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
2020-01-08T12:42:37.836329627Z I, [2020-01-08T12:42:37.836089 #1-47451862771140]  INFO -- : Running job thread for 'Execute jobs' (Job.run) status is: sleep
2020-01-08T12:42:37.836715667Z I, [2020-01-08T12:42:37.836573 #1-47451862771140]  INFO -- : Running job thread for 'Generate user based stats.' (Stats.generate) status is: sleep
2020-01-08T12:42:37.869415227Z E, [2020-01-08T12:42:37.869285 #1-70116210135100] ERROR -- : Can't reconnect to database #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.001 seconds); all pooled connections were in use>
2020-01-08T12:42:42.058653765Z E, [2020-01-08T12:42:42.058431 #1-70115245995340] ERROR -- : Can't reconnect to database #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.603 seconds); all pooled connections were in use>
2020-01-08T12:42:44.251590424Z E, [2020-01-08T12:42:44.251367 #1-70116210135100] ERROR -- : execute OnlineNotification.cleanup (try_count 1) exited with error #<ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.205 seconds); all pooled connections were in use> in: 5.205209887 seconds.
2020-01-08T12:42:46.541499737Z bundler: failed to load command: script/scheduler.rb (script/scheduler.rb)
2020-01-08T12:42:46.614416975Z I, [2020-01-08T12:42:46.614209 #1-47451862771140]  INFO -- : Scheduler stopped.
2020-01-08T12:43:03.511318469Z starting scheduler...
2020-01-08T12:43:13.532777884Z I, [2020-01-08T12:43:13.532248 #1-47106541479360]  INFO -- : Setting.set('models_searchable', ["Chat::Session", "User", "Organization", "Ticket", "KnowledgeBase::Answer::Translation"])
2020-01-08T12:43:15.500560033Z I, [2020-01-08T12:43:15.500277 #1-47106541479360]  INFO -- : Scheduler started.

And the last one with an “ThreadError: deadlock; recursive locking” error:

2020-01-08T14:13:07.489343465Z E, [2020-01-08T14:13:07.489147 #1-70216144869880] ERROR -- : Failed to run Channel.fetch after 10 tries #<ThreadError: deadlock; recursive locking>
2020-01-08T14:13:07.498388904Z bundler: failed to load command: script/scheduler.rb (script/scheduler.rb)
2020-01-08T14:13:07.508937819Z I, [2020-01-08T14:13:07.508771 #1-47233392231860]  INFO -- : Scheduler stopped.
2020-01-08T14:13:07.514948957Z I, [2020-01-08T14:13:07.514737 #1-47233469412880]  INFO -- : Performed SearchIndexJob (Job ID: bde658dc-1f8b-43a5-a5b2-fb0c536f6b5e) from DelayedJob(default) in 79916.03ms
2020-01-08T14:13:23.717447871Z starting scheduler...
2020-01-08T14:13:28.374593934Z I, [2020-01-08T14:13:28.374414 #1-46973571155380]  INFO -- : Setting.set('models_searchable', ["Chat::Session", "User", "Organization", "Ticket", "KnowledgeBase::Answer::Translation"])
2020-01-08T14:13:29.829025142Z I, [2020-01-08T14:13:29.828849 #1-46973571155380]  INFO -- : Scheduler started.

The “could not obtain a connection from the pool within 5.000 seconds” error sounded like a database issue to me; yesterday, select count(*) from pg_stat_activity reported ~150 connections of 200 max_connections, so I raised it to 400. But there is absolutely no “too many connections” error in the postgres log, so the connection limit probably isn’t the cause.

In the end I tried unsuccessfully to restart all docker containers with docker-compose stop, but that simply didn’t do anything. I then rebooted the whole machine, which has resolved the issue for now.

We upgraded the machine from Debian stretch to Debian buster (with Linux 4.19) on 2020-01-03, and before that didn’t have these issues, so this might be related.

Maybe this Ruby bug is related? https://bugs.ruby-lang.org/issues/15360

The ruby version in the docker containers seems to predate the fix (ruby_2_5 r67764):

root@b143f5d45b23:~# ruby --version                                                                                                                                                                               
ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux]

and the test script also causes the “ThreadError: deadlock; recursive locking” errors when I run it.

The bug is fixed in 2.5.6, but Zammad explicitly only supports 2.5.5.

Unprocessible mails can occour if the system is on heavy load.
This would explain why it’s getting crowded in your temp folder.

Can you check if your monitoring monitors high cpu usuage?
Maybe you’re hitting technical limitations (at the moment).

I can’t say anything to the ruby version sorry.

I don’t think it’s the server load. Load average and CPU usage are elevated, but still seem acceptable:

zammad ~ # sar -q -s 12:00
Linux 4.19.0-6-amd64 (zammad)  09.01.2020      _x86_64_        (4 CPU)

12:05:01      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
12:15:01            5       629      2,18      2,12      1,93         0
12:25:01            3       619      2,51      1,88      1,80         0
12:35:01            4       629      1,67      1,70      1,74         0
12:45:01            3       618      1,72      1,85      1,79         0
12:55:01            8       617      1,68      1,67      1,74         0
13:05:01            5       624      1,85      1,64      1,70         0
13:15:01            3       635      1,45      1,61      1,65         0
13:25:01            3       645      1,27      1,60      1,66         0
13:35:01            4       616      2,07      2,10      1,91         0
13:45:01            3       584      1,90      1,89      1,89         0
13:55:01            3       645      2,27      1,92      1,84         0
14:05:01            4       575      1,82      1,75      1,74         0
14:15:01            6       672      2,56      2,15      1,93         0
14:25:01            3       646      1,34      1,61      1,79         0
14:35:01            3       649      1,56      1,58      1,69         0
14:45:01            9       637      1,54      1,72      1,74         0
Durchschn.:         4       628      1,84      1,80      1,78         0
zammad ~ # sar -s 12:00  
Linux 4.19.0-6-amd64 (zammad)  09.01.2020      _x86_64_        (4 CPU)

12:05:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:15:01        all     34,75      0,00      4,67      0,12      0,95     59,51
12:25:01        all     29,87      0,00      3,74      0,10      0,75     65,53
12:35:01        all     30,16      0,00      3,93      0,13      0,55     65,22
12:45:01        all     30,75      0,00      4,30      0,11      0,84     64,00
12:55:01        all     32,50      0,00      4,09      0,10      0,66     62,65
13:05:01        all     29,20      0,00      4,39      0,11      0,61     65,69
13:15:01        all     28,25      0,00      4,46      0,07      0,82     66,40
13:25:01        all     26,81      0,00      5,00      0,09      0,53     67,56
13:35:01        all     36,61      0,00      5,32      0,12      1,27     56,68
13:45:01        all     35,30      0,00      3,71      0,11      0,59     60,30
13:55:01        all     32,59      0,00      3,48      0,10      0,57     63,26
14:05:01        all     32,55      0,00      3,32      0,09      0,62     63,42
14:15:01        all     36,14      0,00      4,60      0,09      0,63     58,54
14:25:01        all     31,42      0,00      3,69      0,18      0,36     64,36
14:35:01        all     29,98      0,00      2,83      0,11      0,45     66,63
14:45:01        all     30,89      0,00      3,94      0,12      0,70     64,34
Durchschn.:     all     31,74      0,00      4,09      0,11      0,68     63,38

I wondered if the problems might be related to long-running queries in postgres and so I enabled auto_explain, but this doesn’t seem to be the cause. So far only very few queries took longer than 1s:

zammad ~ # zammad logs postgresql -t 2>&1 | grep duration
2020-01-09T12:20:12.755325384Z LOG:  duration: 2000.395 ms  plan:
2020-01-09T12:30:42.479819658Z LOG:  duration: 1065.425 ms  plan:
2020-01-09T12:30:42.480521767Z LOG:  duration: 1113.985 ms  plan:
2020-01-09T12:49:05.217780673Z LOG:  duration: 43339.040 ms  plan:
2020-01-09T13:29:22.331129841Z LOG:  duration: 1003.423 ms  plan:

I’m still getting timeouts while parsing emails:

zammad ~ # zammad logs scheduler --tail 1000 | grep -A1 ERROR                                                         
"ERROR: Can't process email, you will find it for bug reporting under /opt/zammad/tmp/unprocessable_mail/f325216e0ce870e9d15d0f4275750547.eml, please create an issue at https://github.com/zammad/zammad/issues"
"ERROR: #<Timeout::Error: execution expired>"
E, [2020-01-09T13:45:13.619701 #1-69947032249360] ERROR -- : ERROR: Can't process email, you will find it for bug reporting under /opt/zammad/tmp/unprocessable_mail/f325216e0ce870e9d15d0f4275750547.eml, please create an issue at https://github.com/zammad/zammad/issues
E, [2020-01-09T13:45:13.619784 #1-69947032249360] ERROR -- : execution expired (Timeout::Error)
/usr/local/lib/ruby/2.5.0/monitor.rb:185:in `lock'
--
"ERROR: Can't process email, you will find it for bug reporting under /opt/zammad/tmp/unprocessable_mail/045a6ea1feb4c62bec28ed218c5ddd35.eml, please create an issue at https://github.com/zammad/zammad/issues"
"ERROR: #<Timeout::Error: execution expired>"
E, [2020-01-09T13:49:10.300241 #1-69947032249360] ERROR -- : ERROR: Can't process email, you will find it for bug reporting under /opt/zammad/tmp/unprocessable_mail/045a6ea1feb4c62bec28ed218c5ddd35.eml, please create an issue at https://github.com/zammad/zammad/issues
E, [2020-01-09T13:49:10.300303 #1-69947032249360] ERROR -- : execution expired (Timeout::Error)
/usr/local/lib/ruby/2.5.0/monitor.rb:185:in `lock'
--
"ERROR: Can't process email, you will find it for bug reporting under /opt/zammad/tmp/unprocessable_mail/33324088aa8ba7e8a2614e241e142ed9.eml, please create an issue at https://github.com/zammad/zammad/issues"
"ERROR: #<Timeout::Error: execution expired>"
E, [2020-01-09T13:53:36.615225 #1-69947032249360] ERROR -- : ERROR: Can't process email, you will find it for bug reporting under /opt/zammad/tmp/unprocessable_mail/33324088aa8ba7e8a2614e241e142ed9.eml, please create an issue at https://github.com/zammad/zammad/issues
E, [2020-01-09T13:53:36.615265 #1-69947032249360] ERROR -- : execution expired (Timeout::Error)
/usr/local/lib/ruby/2.5.0/monitor.rb:185:in `lock'

As far as I can tell from the code, this is an emergency timeout that kills the parser after 180 seconds.

I’m wondering if this might have something to do with memcached. The stack traces always show Dalli on top:

"ERROR: Can't process email, you will find it for bug reporting under /opt/zammad/tmp/unprocessable_mail/f325216e0ce870e9d15d0f4275750547.eml, please create an issue at https://github.com/zammad/zammad/issues"
"ERROR: #<Timeout::Error: execution expired>"
E, [2020-01-09T13:45:13.619701 #1-69947032249360] ERROR -- : ERROR: Can't process email, you will find it for bug reporting under /opt/zammad/tmp/unprocessable_mail/f325216e0ce870e9d15d0f4275750547.eml, please create an issue at https://github.com/zammad/zammad/issues
E, [2020-01-09T13:45:13.619784 #1-69947032249360] ERROR -- : execution expired (Timeout::Error)
/usr/local/lib/ruby/2.5.0/monitor.rb:185:in `lock'
/usr/local/lib/ruby/2.5.0/monitor.rb:185:in `mon_enter'
/usr/local/lib/ruby/2.5.0/monitor.rb:224:in `mon_synchronize'
/usr/local/bundle/gems/dalli-2.7.10/lib/dalli/options.rb:24:in `alive?'
/usr/local/bundle/gems/dalli-2.7.10/lib/dalli/ring.rb:43:in `server_for_key'
/usr/local/bundle/gems/dalli-2.7.10/lib/dalli/client.rb:367:in `perform'
/usr/local/bundle/gems/dalli-2.7.10/lib/dalli/client.rb:148:in `delete'
/usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:333:in `block in delete_entry'
/usr/local/bundle/gems/dalli-2.7.10/lib/dalli/client.rb:266:in `with'
/usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:87:in `with'
/usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:333:in `delete_entry'
/usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/cache/strategy/local_cache.rb:155:in `delete_entry'
/usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:167:in `block in delete'
/usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:389:in `block in instrument_with_log'
/usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:398:in `block in instrument'
/usr/local/bundle/gems/activesupport-5.2.4.1/lib/active_support/notifications.rb:170:in `instrument'
/usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:397:in `instrument'
/usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:389:in `instrument_with_log'
/usr/local/bundle/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:166:in `delete'
/opt/zammad/lib/cache.rb:12:in `delete'
/opt/zammad/app/models/application_model/has_cache.rb:33:in `block in cache_delete'
/opt/zammad/app/models/application_model/has_cache.rb:32:in `each'
/opt/zammad/app/models/application_model/has_cache.rb:32:in `cache_delete'

There’s always calls to the database further below. Maybe the processes hang while attempting to talk to memcached, while keeping a connection to the database open, and thereby indirectly cause the “could not obtain a connection from the pool within 5.000 seconds” errors I’ve reported above?

memcached on the other hands doesn’t report anything unusual:

zammad ~ # zammad logs memcached -t
2020-01-02T18:36:09.386034094Z starting memcached...
2020-01-02T20:20:47.231337721Z Signal handled: Terminated.
2020-01-02T21:15:15.082291448Z starting memcached...
2020-01-03T17:16:11.437946526Z Signal handled: Terminated.
2020-01-03T17:18:05.345607918Z starting memcached...
2020-01-03T17:20:51.060523751Z Signal handled: Terminated.
2020-01-03T17:23:44.100805077Z starting memcached...
2020-01-08T14:15:38.649369722Z Signal handled: Terminated.
2020-01-08T14:18:15.204971746Z starting memcached...

It also immedetialy replies when I query its stats:

zammad ~ # echo -e 'stats\nquit\n' | nc "$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' zammaddockercompose_zammad-memcached_1)" 11211 | head 
STAT pid 1
STAT uptime 85247
STAT time 1578578340
STAT version 1.5.20
STAT libevent 2.1.10-stable
STAT pointer_size 64
STAT rusage_user 393.940407
STAT rusage_system 1237.484591
STAT max_connections 1024
STAT curr_connections 9

Just a small follow up:
We’re investigating this issue internally. We currently, like you, suspect the memcached to be the issue here. I’m currently waiting for feedback.

Right now I’m not sure if you just can hack the needed configuration together to check if the issue is then solved on your end:

Right now, as far as Mantas and I can tell memcached is running single threaded which can cause above issues.

Sorry that this is quite thin on information right now.

Edit: I agree that your systems load looks perfectly fine, I have seen by far worse.

1 Like

For reasons that are beyond me, the situation seems to have improved significantly after these two changes on 2020-01-14:

  1. I renamed /root/zammad-docker-compose to /root/zammaddockercompose.

    This was necessary because docker-compose up wouldn’t work because it suddenly wanted to create a new network zammad-docker-compose_default that conflicted with the existing zammaddockercompose_default. Maybe due to a change in Docker after upgrading to Debian buster…?

  2. I ran docker-compose up and then pressed CTRL+C, thereby accidentally stopping all containers (sometimes I really hate docker…), and then I restarted (and possible recreated, not sure) all containers with docker-compose up -d.

Delayed jobs were really bad on the days before, for example on 2020-01-09:

zammad ~ # zgrep log-delayed-jobs logs/syslog-20200109.gz | grep -v CRON | grep -v '0 ()$'                                                                                                                                                                                      
Jan  9 03:25:01 zammad log-delayed-jobs: 44 (43 SearchIndexJob)                                                                                                                                                                                                                          
Jan  9 03:40:01 zammad log-delayed-jobs: 2 (1 SearchIndexJob, 1 TicketUserTicketCounterJob)                                                                                                                                                                                              
Jan  9 07:35:02 zammad log-delayed-jobs: 4 (1 SearchIndexJob, 1 TicketOnlineNotificationSeenJob, 1 TicketUserTicketCounterJob)                                                                                                                                                           
Jan  9 07:40:01 zammad log-delayed-jobs: 0 (1 UserDeviceLogJob)                                                                                                                                                                                                                          
Jan  9 08:10:02 zammad log-delayed-jobs: 10 (2 SearchIndexJob, 2 TicketArticleCommunicateEmailJob, 1 TicketOnlineNotificationSeenJob, 2 TicketUserTicketCounterJob)                                                                                                                      
Jan  9 08:15:01 zammad log-delayed-jobs: 5 (2 SearchIndexJob, 1 TicketUserTicketCounterJob)                                                                                                                                                                                              
Jan  9 08:30:01 zammad log-delayed-jobs: 8 (3 SearchIndexJob, 1 TicketOnlineNotificationSeenJob, 2 TicketUserTicketCounterJob)                                                                                                                                                           
Jan  9 08:35:01 zammad log-delayed-jobs: 7 (1 SearchIndexJob, 1 TicketUserTicketCounterJob)                                                                                                                                                                                              
Jan  9 08:45:02 zammad log-delayed-jobs: 24 (8 SearchIndexJob, 1 TicketArticleCommunicateEmailJob, 2 TicketOnlineNotificationSeenJob, 4 TicketUserTicketCounterJob)                                                                                                                      
Jan  9 08:50:01 zammad log-delayed-jobs: 11 (11 SearchIndexJob)                                                                                                                                                                                                                          
Jan  9 08:55:01 zammad log-delayed-jobs: 8 (4 SearchIndexJob, 2 TicketUserTicketCounterJob, 1 UserDeviceLogJob)                                                                                                                                                                          
Jan  9 09:05:01 zammad log-delayed-jobs: 31 (16 SearchIndexJob, 6 TicketUserTicketCounterJob, 2 UserDeviceLogJob)                                                                                                                                                                        
Jan  9 09:10:02 zammad log-delayed-jobs: 6 (2 SearchIndexJob, 1 TicketUserTicketCounterJob)                                                                                                                                                                                              
Jan  9 09:15:01 zammad log-delayed-jobs: 42 (15 SearchIndexJob, 1 TicketArticleCommunicateEmailJob, 3 TicketOnlineNotificationSeenJob, 7 TicketUserTicketCounterJob, 4 UserDeviceLogJob)                                                                                                 
Jan  9 09:20:02 zammad log-delayed-jobs: 141 (83 SearchIndexJob, 4 TicketArticleCommunicateEmailJob, 5 TicketOnlineNotificationSeenJob, 13 TicketUserTicketCounterJob, 13 UserDeviceLogJob)                                                                                              
Jan  9 09:25:01 zammad log-delayed-jobs: 163 (92 SearchIndexJob, 6 TicketArticleCommunicateEmailJob, 8 TicketOnlineNotificationSeenJob, 17 TicketUserTicketCounterJob, 12 UserDeviceLogJob)                                                                                              
Jan  9 09:30:01 zammad log-delayed-jobs: 178 (100 SearchIndexJob, 5 TicketArticleCommunicateEmailJob, 9 TicketOnlineNotificationSeenJob, 17 TicketUserTicketCounterJob, 18 UserDeviceLogJob)                                                                                             
Jan  9 09:35:01 zammad log-delayed-jobs: 246 (146 SearchIndexJob, 8 TicketArticleCommunicateEmailJob, 13 TicketOnlineNotificationSeenJob, 23 TicketUserTicketCounterJob, 20 UserDeviceLogJob)                                                                                            
Jan  9 09:40:01 zammad log-delayed-jobs: 1119 (417 SearchIndexJob, 10 TicketArticleCommunicateEmailJob, 310 TicketOnlineNotificationSeenJob, 26 TicketUserTicketCounterJob, 25 UserDeviceLogJob)                                                                                         
Jan  9 09:45:01 zammad log-delayed-jobs: 1608 (582 SearchIndexJob, 11 TicketArticleCommunicateEmailJob, 455 TicketOnlineNotificationSeenJob, 47 TicketUserTicketCounterJob, 36 UserDeviceLogJob)                                                                                         
Jan  9 09:50:02 zammad log-delayed-jobs: 1699 (613 SearchIndexJob, 9 TicketArticleCommunicateEmailJob, 469 TicketOnlineNotificationSeenJob, 63 TicketUserTicketCounterJob, 42 UserDeviceLogJob)                                                                                          
Jan  9 09:55:01 zammad log-delayed-jobs: 1740 (625 SearchIndexJob, 11 TicketArticleCommunicateEmailJob, 474 TicketOnlineNotificationSeenJob, 70 TicketUserTicketCounterJob, 45 UserDeviceLogJob)                                                                                         
Jan  9 10:00:02 zammad log-delayed-jobs: 1761 (620 SearchIndexJob, 10 TicketArticleCommunicateEmailJob, 473 TicketOnlineNotificationSeenJob, 80 TicketUserTicketCounterJob, 48 UserDeviceLogJob)                                                                                         
...
Jan  9 16:20:01 zammad log-delayed-jobs: 2072 (551 SearchIndexJob, 83 TicketArticleCommunicateEmailJob, 122 TicketOnlineNotificationSeenJob, 254 TicketUserTicketCounterJob, 565 UserDeviceLogJob)                                                                                       
Jan  9 16:25:02 zammad log-delayed-jobs: 2051 (541 SearchIndexJob, 83 TicketArticleCommunicateEmailJob, 122 TicketOnlineNotificationSeenJob, 251 TicketUserTicketCounterJob, 570 UserDeviceLogJob)                                                                                       
Jan  9 16:30:01 zammad log-delayed-jobs: 1957 (526 SearchIndexJob, 75 TicketArticleCommunicateEmailJob, 114 TicketOnlineNotificationSeenJob, 232 TicketUserTicketCounterJob, 546 UserDeviceLogJob)                                                                                       
Jan  9 16:35:01 zammad log-delayed-jobs: 1486 (394 SearchIndexJob, 55 TicketArticleCommunicateEmailJob, 88 TicketOnlineNotificationSeenJob, 169 TicketUserTicketCounterJob, 400 UserDeviceLogJob)                                                                                        
Jan  9 16:40:01 zammad log-delayed-jobs: 1060 (319 SearchIndexJob, 33 TicketArticleCommunicateEmailJob, 55 TicketOnlineNotificationSeenJob, 120 TicketUserTicketCounterJob, 248 UserDeviceLogJob)                                                                                        
Jan  9 16:45:02 zammad log-delayed-jobs: 637 (287 SearchIndexJob, 14 TicketArticleCommunicateEmailJob, 18 TicketOnlineNotificationSeenJob, 49 TicketUserTicketCounterJob, 91 UserDeviceLogJob)                                                                                           
Jan  9 16:50:01 zammad log-delayed-jobs: 343 (215 SearchIndexJob, 2 TicketArticleCommunicateEmailJob, 3 TicketOnlineNotificationSeenJob, 13 TicketUserTicketCounterJob, 14 UserDeviceLogJob)                                                                                             
Jan  9 17:35:01 zammad log-delayed-jobs: 22 (22 SearchIndexJob)
Jan  9 18:25:01 zammad log-delayed-jobs: 7 (2 SearchIndexJob, 1 TicketArticleCommunicateEmailJob, 1 TicketOnlineNotificationSeenJob, 1 TicketUserTicketCounterJob, 1 UserDeviceLogJob)                                                                                                   
Jan  9 18:40:01 zammad log-delayed-jobs: 0 (1 UserDeviceLogJob)

and shorty after my changes much better:

zammad ~ # grep log-delayed-jobs /var/log/syslog | grep -v CRON | grep -v '0 ()$'                                                                                                                                                                                               
Jan 17 07:25:02 zammad log-delayed-jobs: 1 (1 UserDeviceLogJob)
Jan 17 07:45:01 zammad log-delayed-jobs: 1086 (361 SearchIndexJob, 361 TicketOnlineNotificationSeenJob, 1 TicketUserTicketCounterJob)
Jan 17 07:50:02 zammad log-delayed-jobs: 1171 (408 SearchIndexJob, 333 TicketOnlineNotificationSeenJob, 6 TicketUserTicketCounterJob, 2 UserDeviceLogJob)                                                                                                                                
Jan 17 08:05:01 zammad log-delayed-jobs: 24 (9 SearchIndexJob, 4 TicketOnlineNotificationSeenJob, 4 TicketUserTicketCounterJob, 1 UserDeviceLogJob)                                                                                                                                      
Jan 17 08:15:01 zammad log-delayed-jobs: 3 (1 TicketUserTicketCounterJob)
Jan 17 08:30:01 zammad log-delayed-jobs: 4 (2 SearchIndexJob, 1 TicketUserTicketCounterJob)
Jan 17 09:00:01 zammad log-delayed-jobs: 4 (2 SearchIndexJob, 1 TicketUserTicketCounterJob)
Jan 17 09:05:01 zammad log-delayed-jobs: 1 (1 UserDeviceLogJob)
Jan 17 09:20:01 zammad log-delayed-jobs: 2 ()
Jan 17 09:25:01 zammad log-delayed-jobs: 1 (1 UserDeviceLogJob)
Jan 17 09:45:01 zammad log-delayed-jobs: 3 (2 SearchIndexJob)
Jan 17 10:05:01 zammad log-delayed-jobs: 3 (1 TicketUserTicketCounterJob, 1 UserDeviceLogJob)
Jan 17 10:30:02 zammad log-delayed-jobs: 4 (2 SearchIndexJob, 1 TicketUserTicketCounterJob)
Jan 17 11:20:02 zammad log-delayed-jobs: 3 (1 TicketUserTicketCounterJob)
Jan 17 11:25:01 zammad log-delayed-jobs: 61 (19 SearchIndexJob, 2 TicketOnlineNotificationSeenJob, 5 TicketUserTicketCounterJob, 9 UserDeviceLogJob)                                                                                                                                     
Jan 17 11:30:01 zammad log-delayed-jobs: 32 (11 SearchIndexJob, 1 TicketArticleCommunicateEmailJob, 1 TicketOnlineNotificationSeenJob, 5 TicketUserTicketCounterJob, 2 UserDeviceLogJob)                                                                                                 
Jan 17 11:35:01 zammad log-delayed-jobs: 3 (1 SearchIndexJob, 1 TicketUserTicketCounterJob)
Jan 17 11:45:02 zammad log-delayed-jobs: 43 (13 SearchIndexJob, 1 TicketArticleCommunicateEmailJob, 3 TicketOnlineNotificationSeenJob, 4 TicketUserTicketCounterJob, 2 UserDeviceLogJob)                                                                                                 
Jan 17 11:50:01 zammad log-delayed-jobs: 103 (31 SearchIndexJob, 2 TicketArticleCommunicateEmailJob, 3 TicketOnlineNotificationSeenJob, 6 TicketUserTicketCounterJob, 16 UserDeviceLogJob)                                                                                               
Jan 17 11:55:01 zammad log-delayed-jobs: 209 (112 SearchIndexJob, 2 TicketArticleCommunicateEmailJob, 3 TicketOnlineNotificationSeenJob, 5 TicketUserTicketCounterJob, 15 UserDeviceLogJob)                                                                                              
Jan 17 12:00:01 zammad log-delayed-jobs: 250 (124 SearchIndexJob, 2 TicketArticleCommunicateEmailJob, 5 TicketOnlineNotificationSeenJob, 8 TicketUserTicketCounterJob, 20 UserDeviceLogJob)                                                                                              
Jan 17 12:05:01 zammad log-delayed-jobs: 215 (97 SearchIndexJob, 3 TicketArticleCommunicateEmailJob, 5 TicketOnlineNotificationSeenJob, 17 TicketUserTicketCounterJob, 13 UserDeviceLogJob)                                                                                              
Jan 17 12:10:02 zammad log-delayed-jobs: 190 (56 SearchIndexJob, 3 TicketArticleCommunicateEmailJob, 5 TicketOnlineNotificationSeenJob, 20 TicketUserTicketCounterJob, 2 UpdateCtiLogsByCallerJob, 24 UserDeviceLogJob)                                                                  
Jan 17 12:15:01 zammad log-delayed-jobs: 291 (130 SearchIndexJob, 8 TicketArticleCommunicateEmailJob, 8 TicketOnlineNotificationSeenJob, 26 TicketUserTicketCounterJob, 2 UpdateCtiLogsByCallerJob, 38 UserDeviceLogJob)                                                                 
Jan 17 12:20:01 zammad log-delayed-jobs: 221 (106 SearchIndexJob, 7 TicketArticleCommunicateEmailJob, 4 TicketOnlineNotificationSeenJob, 15 TicketUserTicketCounterJob, 7 UpdateCtiLogsByCallerJob, 36 UserDeviceLogJob)                                                                 
Jan 17 12:25:01 zammad log-delayed-jobs: 10 (5 SearchIndexJob, 1 TicketArticleCommunicateEmailJob, 1 TicketOnlineNotificationSeenJob, 1 TicketUserTicketCounterJob, 1 UserDeviceLogJob)                                                                                                  
Jan 17 12:30:01 zammad log-delayed-jobs: 14 (6 SearchIndexJob, 1 TicketArticleCommunicateEmailJob, 2 TicketOnlineNotificationSeenJob, 2 TicketUserTicketCounterJob, 1 UserDeviceLogJob)                                                                                                  
Jan 17 12:40:01 zammad log-delayed-jobs: 5 (2 SearchIndexJob, 1 TicketUserTicketCounterJob, 1 UserDeviceLogJob)
Jan 17 12:45:02 zammad log-delayed-jobs: 17 (5 SearchIndexJob, 1 TicketOnlineNotificationSeenJob, 2 TicketUserTicketCounterJob, 4 UserDeviceLogJob)                                                                                                                                      
Jan 17 12:55:02 zammad log-delayed-jobs: 4 (1 TicketOnlineNotificationSeenJob, 1 TicketUserTicketCounterJob)
Jan 17 13:00:01 zammad log-delayed-jobs: 14 (3 SearchIndexJob, 1 TicketArticleCommunicateEmailJob, 1 TicketOnlineNotificationSeenJob, 1 TicketUserTicketCounterJob, 6 UserDeviceLogJob)                                                                                                  
Jan 17 13:15:01 zammad log-delayed-jobs: 6 (1 TicketOnlineNotificationSeenJob, 1 TicketUserTicketCounterJob, 2 UserDeviceLogJob)
Jan 17 13:25:01 zammad log-delayed-jobs: 5 (2 SearchIndexJob, 1 TicketUserTicketCounterJob)
Jan 17 13:30:02 zammad log-delayed-jobs: 21 (4 SearchIndexJob, 1 TicketUserTicketCounterJob, 4 UserDeviceLogJob)
Jan 17 13:35:01 zammad log-delayed-jobs: 136 (87 SearchIndexJob, 1 TicketArticleCommunicateEmailJob, 2 TicketOnlineNotificationSeenJob, 3 TicketUserTicketCounterJob, 11 UserDeviceLogJob)                                                                                               
Jan 17 13:40:02 zammad log-delayed-jobs: 117 (52 SearchIndexJob, 4 TicketArticleCommunicateEmailJob, 5 TicketOnlineNotificationSeenJob, 8 TicketUserTicketCounterJob, 12 UserDeviceLogJob)                                                                                               
Jan 17 13:50:01 zammad log-delayed-jobs: 15 (3 SearchIndexJob, 1 TicketUserTicketCounterJob, 3 UserDeviceLogJob)
Jan 17 13:55:02 zammad log-delayed-jobs: 76 (23 SearchIndexJob, 3 TicketArticleCommunicateEmailJob, 6 TicketOnlineNotificationSeenJob, 9 TicketUserTicketCounterJob, 11 UserDeviceLogJob)                                                                                                
Jan 17 14:00:01 zammad log-delayed-jobs: 114 (35 SearchIndexJob, 4 TicketArticleCommunicateEmailJob, 6 TicketOnlineNotificationSeenJob, 12 TicketUserTicketCounterJob, 18 UserDeviceLogJob)                                                                                              
Jan 17 14:05:01 zammad log-delayed-jobs: 206 (111 SearchIndexJob, 4 TicketArticleCommunicateEmailJob, 7 TicketOnlineNotificationSeenJob, 12 TicketUserTicketCounterJob, 19 UserDeviceLogJob)                                                                                             
Jan 17 14:10:01 zammad log-delayed-jobs: 36 (10 SearchIndexJob, 3 TicketArticleCommunicateEmailJob, 2 TicketOnlineNotificationSeenJob, 5 TicketUserTicketCounterJob, 4 UserDeviceLogJob)                                                                                                 
Jan 17 14:20:02 zammad log-delayed-jobs: 11 (4 SearchIndexJob, 2 TicketArticleCommunicateEmailJob, 1 TicketOnlineNotificationSeenJob, 2 TicketUserTicketCounterJob, 1 UserDeviceLogJob)                                                                                                  
Jan 17 14:40:01 zammad log-delayed-jobs: 3 (1 TicketUserTicketCounterJob)
Jan 17 15:15:01 zammad log-delayed-jobs: 10 (10 SearchIndexJob)
Jan 17 15:40:01 zammad log-delayed-jobs: 4 (1 TicketUserTicketCounterJob)

This topic was automatically closed 120 days after the last reply. New replies are no longer allowed.