Intermittent IMAP Ticket Loss & ConnectionPool Timeout in Split Architecture

Infos:

  • Used Zammad version: 7.0.0-1774512167.a40c879d.noble
  • Used Zammad installation type: package (via apt/packager.io)
  • Operating system: Ubuntu 24.04.3 LTS
  • Browser + version: Chrome/Firefox Latest

Expected behavior:

Zammad should successfully fetch all incoming unread emails from our Microsoft 365 (Office 365) mailbox via IMAP and create tickets without dropping connections or permanently skipping unread emails.

Actual behavior:

We are facing intermittent ticket loss. Incoming emails arrive successfully in the Office 365 Outlook inbox, but some are totally ignored by Zammad and no incoming ticket is created.

We discovered through our production.log that our background thread_client workers were frequently crashing with this underlying cache starvation trace:

E, ERROR -- : thread_client 1326880 exited with error #<ConnectionPool::TimeoutError: Waited 5 sec, 0/5 available>

/opt/zammad/vendor/bundle/ruby/3.2.0/gems/connection_pool-2.5.3/lib/connection_pool/timed_stack.rb

/opt/zammad/vendor/bundle/ruby/3.2.0/gems/activesupport-7.2.2.2/lib/active_support/cache/mem_cache_store.rb:188:in `block in read_serialized_entry'

Because the background worker waits more than 5 seconds for a connection from the exhausted Memcached pool (0 of 5 available), the IMAP polling cycle stalls completely. Microsoft 365 then severs the idle IMAP connection mid-operation, which produces this secondary error:

E, ERROR -- : Can't use Channel::Driver::Imap: #<Timeout::Error: execution expired>

app/models/channel/driver/imap.rb:155:in `setup_connection'
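
To make the starvation pattern concrete, here is a minimal Python sketch of a fixed-size pool with a checkout timeout. It is only a model of the behaviour described above, not Zammad's or the connection_pool gem's actual code; `FixedPool`, `PoolTimeoutError`, and `simulate` are hypothetical names invented for this illustration.

```python
import queue
import threading
import time


class PoolTimeoutError(Exception):
    """Raised when no connection becomes free within the timeout."""


class FixedPool:
    """Hypothetical stand-in for a fixed-size connection pool."""

    def __init__(self, size, timeout):
        self.size = size
        self.timeout = timeout
        self._free = queue.Queue()
        for n in range(size):
            self._free.put(f"conn-{n}")

    def checkout(self):
        # Block until a connection is free or the timeout elapses.
        try:
            return self._free.get(timeout=self.timeout)
        except queue.Empty:
            raise PoolTimeoutError(
                f"Waited {self.timeout} sec, "
                f"{self._free.qsize()}/{self.size} available"
            ) from None

    def checkin(self, conn):
        self._free.put(conn)


def simulate(pool_size, workers, hold_seconds, timeout):
    """Start `workers` threads that each hold one connection for
    `hold_seconds`; return the timeout messages that were raised."""
    pool = FixedPool(pool_size, timeout)
    errors = []
    lock = threading.Lock()

    def work():
        try:
            conn = pool.checkout()
        except PoolTimeoutError as exc:
            with lock:
                errors.append(str(exc))
            return
        try:
            time.sleep(hold_seconds)  # simulated slow cache round trip
        finally:
            pool.checkin(conn)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

With 8 workers and a pool of 5, the three workers that cannot get a connection fail once the timeout is shorter than the time the others hold theirs. Raising the pool size or shortening the hold time makes the errors disappear in this model.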

Additional Observation: We confirmed the ticket loss is directly tied to the timeout delay causing a race condition with the “Unread” flag. If an email is skipped during a timeout, and a human views it in Outlook natively (marking it Read), Zammad ignores it permanently. If we manually go into Outlook and mark that skipped email as “Unread” again, Zammad instantly fetches it on the next polling cycle and creates the ticket.
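
The observation above can be modelled in a few lines of Python. This sketch assumes the fetcher only selects messages that do not carry the "seen" flag, which matches what we observe but is not taken from Zammad's source; `Message`, `Mailbox`, `poll`, and `demo` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    subject: str
    seen: bool = False


@dataclass
class Mailbox:
    messages: list = field(default_factory=list)

    def search_unseen(self):
        # Assumption: the fetcher only asks for messages without the seen flag.
        return [m for m in self.messages if not m.seen]


def poll(mailbox, tickets):
    """One polling cycle: import every unseen message, then flag it as seen."""
    for msg in mailbox.search_unseen():
        tickets.append(msg.subject)
        msg.seen = True


def demo():
    inbox = Mailbox([Message("printer broken")])
    tickets = []
    # Polling is stalled; a human opens the mail in Outlook, which marks it read.
    inbox.messages[0].seen = True
    poll(inbox, tickets)
    after_first_poll = list(tickets)   # nothing imported
    # Someone marks the mail as unread again in Outlook.
    inbox.messages[0].seen = False
    poll(inbox, tickets)
    return after_first_poll, tickets
```

In this model, `demo()` returns an empty list for the first poll and `["printer broken"]` after the message is flipped back to unread, which mirrors what we see in production.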

Steps to reproduce the behavior:

Our architecture cleanly separates the App server from the Datastores:

  1. Zammad App Instance runs on Ubuntu 24.04.3.
  2. Postgres, Redis (7.4.7), and Memcached (1.6.40) run in Docker containers on a separate instance, which also runs Ubuntu 24.04.3.
  3. Zammad connects to Microsoft Office 365 via standard IMAP.
  4. During spikes, the background workers frequently bottleneck waiting for a Memcached connection from the default pool of 5, causing IMAP to time out.
  5. Additional note on our mitigation attempt: We tried to fix the root bottleneck by manually editing /opt/zammad/config/application.rb to set pool_size: 50. However, even after restarting the Zammad services with systemctl, the background workers appear to ignore the change entirely and keep crashing with the exact same 0/5 available trace. Is there a natively supported CLI command (zammad run config set...) or a specific environment variable required to pass a larger Memcached pool size to the sidekiq/background workers on package installs?

Just to understand you better, are you saying that

  • the fetching process marks mails as read, before they are added to the databases and thus tickets are not created and “lost” (rather below your radar) or
  • users marking mails as read is causing you to miss mails…?

Most important question apart from the one above: Why the heck are other people reading mail in the mailbox of your ticketing system? Nobody should put their nose in there. The documentation warns about this; it’s super error prone and a user error (technically…)

the fetching process marks mails as read, before they are added to the databases and thus tickets are not created and “lost” (rather below your radar)

Yes, we are seeing exactly this. Intermittently, the fetching process marks the emails as read in Outlook, but the ticket is never actually created in Zammad. We strongly suspect that the errors below (from /var/log/zammad/production.log) are severing the ticket creation process mid-stream:

“E, ERROR -- : thread_client 1326880 exited with error #<ConnectionPool::TimeoutError: Waited 5 sec, 0/5 available>”

“E, ERROR -- : Can't use Channel::Driver::Imap: #<Timeout::Error: execution expired>”
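
For anyone who wants to check how often these two errors occur, and which pool size the workers actually report, a small Python sketch like the following can scan the log. The regular expressions are written against the two messages quoted above only; `summarize` is a hypothetical helper, and the exact prefix of each log line (timestamp, PID) is deliberately not assumed.

```python
import re
from collections import Counter

# Matches e.g. "ConnectionPool::TimeoutError: Waited 5 sec, 0/5 available"
POOL_TIMEOUT = re.compile(
    r"ConnectionPool::TimeoutError: Waited (?P<waited>\d+) sec, "
    r"(?P<free>\d+)/(?P<size>\d+) available"
)
# Matches the secondary IMAP error regardless of the quoting in front of it.
IMAP_TIMEOUT = re.compile(r"Channel::Driver::Imap: #<Timeout::Error")


def summarize(lines):
    """Count both error types and collect the pool sizes the log reports."""
    counts = Counter()
    pool_sizes = set()
    for line in lines:
        match = POOL_TIMEOUT.search(line)
        if match:
            counts["pool_timeout"] += 1
            pool_sizes.add(int(match.group("size")))
        elif IMAP_TIMEOUT.search(line):
            counts["imap_timeout"] += 1
    return counts, pool_sizes


if __name__ == "__main__":
    # Path as mentioned in this thread; adjust if your install differs.
    with open("/var/log/zammad/production.log", errors="replace") as fh:
        counts, sizes = summarize(fh)
    print(dict(counts), sorted(sizes))
```

If the reported pool size is still 5 after a configuration change and restart, that suggests the change never reached the workers.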

users marking mails as read is causing you to miss mails…?

We know that nobody should be putting their nose in the ticketing mailbox natively via Outlook, and we agree the ticket loss itself in this scenario is technically a procedural user error violating the documentation!

However, the only reason our human users ever had the time/opportunity to accidentally open those emails and mark them read was because Zammad’s background workers were totally frozen for 30+ minutes throwing the ConnectionPool::TimeoutError cache starvation trace. If Zammad’s IMAP fetch hadn’t been crashing with timeouts, it would have ingested the emails instantly, leaving the humans zero time to mess it up.

So our core infrastructure question remains: How do we properly configure the Zammad background workers to accept a Memcache pool size larger than 5 on an Ubuntu apt package install, so the IMAP fetchers stop stalling out and crashing during high volume?