Scheduler seems to be broken after updating to 4.0

Infos:

  • Used Zammad version: 4.0
  • Used Zammad installation type: package
  • Operating system: Debian 10 x64
  • Browser + version: Firefox 89

Expected behavior:

  • Zammad fetches emails and sends notifications

Actual behavior:

  • Zammad stops checking the mailbox at some point during the day, and the corresponding notifications don’t come through to our preferred channel (Slack)

Steps to reproduce the behavior:

  • Upgrade to 4.0

Ever since we upgraded to the latest version (the upgrade itself went without any issues), the scheduler has been failing repeatedly throughout the day.

scheduler_err.log only contains “log writing failed. execution expired”. That’s all it has shown since the update, and googling for this error isn’t fruitful.

There are no errors in production.log, the postgres logs are alright, and there is nothing suspicious in the elasticsearch logs. Elastic works fine and the Zammad front-end lets me in, although it occasionally throws a 5xx error.

Things run smoothly after restarting the zammad service, but only for several hours.

Has anyone encountered this behavior? Any suggestions on how to at least fix the logging for scheduler so I can investigate further?

Thanks


Hi!
We encountered the exact same problems and behavior after upgrading to 4.1.
In the beginning everything works fine.
The production.log shows “scheduler running…” periodically.
All of a sudden the entry stops showing up, and emails are no longer fetched or processed.

Since we see no error message, we can’t reproduce the problem except by waiting for a couple of minutes or hours…

Following the suggestions in other posts, I collected some data:

root@intern:/var/log/zammad# zammad run rails c
Loading production environment (Rails 5.2.4.6)
irb(main):001:0> Delayed::Job.count
=> 0
irb(main):002:0> Delayed::Job.first
=> nil
irb(main):003:0> Sessions.list.count
=> 4
irb(main):004:0> User.count
=> 1211
irb(main):005:0> Overview.count
=> 15
irb(main):006:0> Group.count
=> 3
irb(main):007:0> User.joins(roles: :permissions).where(roles: { active: true }, permissions: { name: 'ticket.agent', active: true }).uniq.count
=> 12
irb(main):008:0> Delayed::Job.last
=> nil

FYI: I installed a cron job that restarts the zammad-worker hourly :(

We set up the same task but every 2 hours - currently it’s the only reliable “fix”.
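
For anyone who wants to replicate this workaround, a cron entry along these lines should do. This is only a sketch: the file name is made up, and it assumes a systemd unit called zammad-worker, as on package installations. Run it as root and adjust the interval to taste.

```shell
# Create a cron.d entry that restarts the Zammad background worker
# every 2 hours, on the hour.
# Assumptions: the file name is hypothetical, and the systemd unit is
# called zammad-worker (package installation).
cat > /etc/cron.d/zammad-worker-restart <<'EOF'
0 */2 * * * root /bin/systemctl restart zammad-worker
EOF
```

Keep in mind this only hides the symptom; the scheduler still stalls between restarts.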

I made another backup and updated to 4.1. The scheduler still fails within the first hour, and the logger is still broken by some timeout issue.

There are about a dozen topics on this forum and a handful of issues on GitHub that describe similar scheduler behavior. Some posts date back to version 2.x. None of them has a working solution that could be applied to 4.x.

We have a tiny helpdesk compared to what some of the users here have: 4 agents, 2-10 tickets per day. The server is pretty robust for such a setup: 10GB RAM, decent Xeon CPU, 4TB HDD. Elastic automatically starts with a 4GB heap size, so that should be enough.

I’d be grateful if someone could help me figure out how to solve at least the logging problem.

If possible, you may want to consider getting a faster storage backend.
In my experience, this error message only shows up in heavy load situations:
the scheduler permanently at 100% usage combined with I/O that is too slow.

Edit:
Having a look at the performance tuning section of the documentation could also help:
https://docs.zammad.org/en/latest/appendix/configure-env-vars.html

You should try to keep processes from sitting at 100% load permanently.
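
To get a rough idea of whether the machine really is saturated, something like the following could be used. It is a minimal sketch, assuming a Linux host with /proc/loadavg and nproc available; the function name load_per_cpu is made up for this example. On Linux the load average also counts tasks blocked on disk I/O, so slow storage pushes the number up as well.

```shell
#!/bin/sh
# Hypothetical helper: print the 1-minute load average divided by the
# number of CPU cores. Values that stay well above 1.00 suggest the
# machine is saturated (CPU-bound work or tasks waiting on disk I/O).
load_per_cpu() {
    load=$(cut -d' ' -f1 /proc/loadavg)   # first field = 1-minute load average
    cpus=$(nproc)                         # number of available CPU cores
    awk -v l="$load" -v c="$cpus" 'BEGIN { printf "%.2f\n", l / c }'
}

load_per_cpu
```

Running it a few times around the moment the scheduler stalls should show whether load is the actual trigger.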

Getting an SSD for zammad operation is an option we’re looking at now. It’s just strange that everything worked perfectly fine before the upgrade.

Just noticed that scheduler_out.log is being filled with this error:

sh: 1: /usr/sbin/sendmail: not found

However, sendmail does exist at that exact location and works fine. What could that be?

scheduler_err.log still only contains “log writing failed. execution expired”.

Sorry, no real clue.
Maybe double-check whether you can execute sendmail as the zammad user, because the above is usually a classic symptom of that:
sudo -u zammad /usr/sbin/sendmail
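
A slightly more explicit variant of that check could look like the sketch below. The helper name check_binary and the script name are made up for this example; the path is the one from the error message.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path exists and is executable
# for the user running this script.
check_binary() {
    path="$1"
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        return 1
    fi
    if [ ! -x "$path" ]; then
        echo "not executable: $path"
        return 2
    fi
    echo "ok: $path"
}

# Path taken from the error message in scheduler_out.log.
check_binary /usr/sbin/sendmail || true
```

Saved as e.g. check_sendmail.sh, it can be run as the zammad user with `sudo -u zammad sh check_sendmail.sh`. Note that a shell can also print "not found" for a file that does exist when the interpreter named in its first line (or its loader) is missing, so it may be worth checking what /usr/sbin/sendmail actually points to.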

Again: load situation.
Try to reduce your scheduler’s load if possible.

Have a look at the performance tuning section to see if you can make the scheduler’s life easier:
https://docs.zammad.org/en/latest/appendix/configure-env-vars.html#performance-tuning
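
On a package installation, such settings are applied with `zammad config:set`. The lines below are only a sketch: the variable name and the value are assumptions on my side, so please check the linked page for what your Zammad version actually supports before applying anything.

```shell
# Assumed variable name and example value; verify both against the
# performance tuning documentation for your Zammad version.
zammad config:set ZAMMAD_SESSION_JOBS_CONCURRENT=2

# Restart Zammad so the new environment is picked up.
systemctl restart zammad
```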
