Scheduler may not run after update 5.0.3 -> 5.2.1

  • Used Zammad version: 5.2.1
  • Used Zammad installation type: package
  • Operating system: Debian 10

Hi,

we have 4 Zammad systems running, each on a separate V-Server. All have similar hardware and similar usage. Similar configuration etc.

We have updated from version 5.0.3 to version 5.2.1 just last week and since then we have the following error, but only on 2 of the machines. The other 2 don’t have this error.

scheduler may not run (last execution of Stats.generate 20 minutes ago)

or

scheduler may not run (last execution of DataPrivacyTaskJob.perform_now 18 minutes ago)

Each time the error just disappears within 1-2 minutes. So there basically is no issue but since we monitor zammad’s health status, each time the health status≠true, our monitoring alarm rings.

I have checked logs:

  • production.log - no errors
  • scheduler_err.log - empty
  • scheduler_out.log - empty
  • searchindex-rebuild.log - empty
  • elasticsearch.log - empty

Which other logs can I check? Do you have any tips where to look for more details on why the scheduler does not work for around 20 min?

Many thanks in advance.

Gijs

This error just came for the first time:

scheduler may not run (last execution of RecentView.cleanup 1 day ago) - please contact your system administrator","issues

The error disappeared within a minute after appearing.
Could this be an issue with some time sync that is not running correctly?

This is from the production.log
It is not an error or a warning but it might help to find a reason for the scheduler not running or running “late”.

I, [2022-08-02T00:33:47.248853 #89012-1238800]  INFO -- : execute Stats.generate (try_count 0)...
I, [2022-08-02T00:33:47.533274 #89012-1238800]  INFO -- : ended Stats.generate took: 0.292728832 seconds.
I, [2022-08-02T00:33:50.790962 #89012-109280]  INFO -- : 2022-08-02T00:33:50+0200: [Worker(host:crm.gs-adventure.com pid:89012)] Job SearchIndexAssociationsJob [3d46142f-8d3d-4508-850c-dd635ca37925] from DelayedJob(default) with arguments: ["StatsStore", 7] (id=169979) (queue=default) RUNNING
I, [2022-08-02T00:33:51.113387 #89012-109280]  INFO -- : 2022-08-02T00:33:51+0200: [Worker(host:crm.gs-adventure.com pid:89012)] Job SearchIndexAssociationsJob [3d46142f-8d3d-4508-850c-dd635ca37925] from DelayedJob(default) with arguments: ["StatsStore", 7] (id=169979) (queue=default) COMPLETED after 0.3223
I, [2022-08-02T00:33:51.116133 #89012-109280]  INFO -- : 2022-08-02T00:33:51+0200: [Worker(host:crm.gs-adventure.com pid:89012)] Job SearchIndexAssociationsJob [a1517a5f-f355-48db-895a-f866108b0965] from DelayedJob(default) with arguments: ["StatsStore", 1] (id=169980) (queue=default) RUNNING
I, [2022-08-02T00:33:51.202231 #89012-109280]  INFO -- : 2022-08-02T00:33:51+0200: [Worker(host:crm.gs-adventure.com pid:89012)] Job SearchIndexAssociationsJob [a1517a5f-f355-48db-895a-f866108b0965] from DelayedJob(default) with arguments: ["StatsStore", 1] (id=169980) (queue=default) COMPLETED after 0.0860
I, [2022-08-02T00:33:51.204621 #89012-109280]  INFO -- : 2022-08-02T00:33:51+0200: [Worker(host:crm.gs-adventure.com pid:89012)] Job SearchIndexAssociationsJob [31af75a1-6a10-4a72-8caf-561f2a0a4ceb] from DelayedJob(default) with arguments: ["StatsStore", 5] (id=169981) (queue=default) RUNNING
I, [2022-08-02T00:33:51.526409 #89012-109280]  INFO -- : 2022-08-02T00:33:51+0200: [Worker(host:crm.gs-adventure.com pid:89012)] Job SearchIndexAssociationsJob [31af75a1-6a10-4a72-8caf-561f2a0a4ceb] from DelayedJob(default) with arguments: ["StatsStore", 5] (id=169981) (queue=default) COMPLETED after 0.3217
I, [2022-08-02T00:33:51.529036 #89012-109280]  INFO -- : 2022-08-02T00:33:51+0200: [Worker(host:crm.gs-adventure.com pid:89012)] Job SearchIndexAssociationsJob [1fbe5a89-80ee-4e76-86ad-7078886006f9] from DelayedJob(default) with arguments: ["StatsStore", 6] (id=169982) (queue=default) RUNNING
I, [2022-08-02T00:33:51.620575 #89012-109280]  INFO -- : 2022-08-02T00:33:51+0200: [Worker(host:crm.gs-adventure.com pid:89012)] Job SearchIndexAssociationsJob [1fbe5a89-80ee-4e76-86ad-7078886006f9] from DelayedJob(default) with arguments: ["StatsStore", 6] (id=169982) (queue=default) COMPLETED after 0.0914
I, [2022-08-02T00:33:51.623795 #89012-109280]  INFO -- : 2022-08-02T00:33:51+0200: [Worker(host:crm.gs-adventure.com pid:89012)] Job SearchIndexAssociationsJob [e11bba9c-d628-465a-a7b7-87eeaa4685ea] from DelayedJob(default) with arguments: ["StatsStore", 9] (id=169983) (queue=default) RUNNING
I, [2022-08-02T00:33:51.723300 #89012-109280]  INFO -- : 2022-08-02T00:33:51+0200: [Worker(host:crm.gs-adventure.com pid:89012)] Job SearchIndexAssociationsJob [e11bba9c-d628-465a-a7b7-87eeaa4685ea] from DelayedJob(default) with arguments: ["StatsStore", 9] (id=169983) (queue=default) COMPLETED after 0.0994
I, [2022-08-02T00:33:55.400977 #89012-718820]  INFO -- : execute Channel.fetch (try_count 0)...
I, [2022-08-02T00:33:55.402858 #89012-718820]  INFO -- : ended Channel.fetch took: 0.008868718 seconds.

The error now occurs on all 4 machines. Most of the time, the error only appears 1 or 2 times a day. The error disappears automatically within 60 seconds.

But why does this error eppear in the first place. Please let me know what else I can check?!

I have changed the info level to debug and hereis 1 log file (part of it).

The error startet at 11:08 h.

Logfile:

Some scheduler jobs are expected to run in a specific time frame. If that time frame is left, Zammad will point that out as when fetching the monitoring endpoint (or having a look within the UI). If the task runs in the mean time the issue may self heal - given that the task was run successful.

Usually background-services (was scheduler before) has plenty of head room to meet above mentioned expectation. However, in some load situations it may no longer meet these expectations. This can have various reasons: e.g. you have fairly high concurrent users in the system at that time or have a lot background tasks going on.

Theoretically your process list should show that. If background-services.rb is running sticky at 100% that’s a very good indicator for said situation. If that’s the case you’ll want to begin performance tuning as that’s the only way to overcome this issue technically.

https://docs.zammad.org/en/latest/appendix/configure-env-vars.html#performance-tuning

There’s no general recipe for this task I’m afraid. It’s rather fiddling. Note that Zammad 5.2 actually came with new tuning options additional to the existing ones. You can pull apart background tasks even more.

That being said I do expect that the system has received at least the mandatory database adjustments: Configure Database server — Zammad documentation

1 Like