As our Zammad installation has grown larger and larger, we have run into delayed jobs, which lead to delayed e-mails, as discussed in several threads already.
The solution to our issue was the following:
However, this did not improve the CPU load on our system.
We run 4 cores at 2 GHz each. During working hours, average CPU usage is about one full core, which is a lot for a system that, from the user side, is "idling".
Then, due to a setup error on our part, the second Session.Job scheduler did not start up. Zammad generally continued its work normally; only the GUI updates for each session were no longer "live" and appeared only on demand when refreshing the browser.
But what we saw then was an average CPU usage of 2% instead of 100% (one core on Linux)!
So essentially the whole session GUI update mechanism is hogging a lot of CPU resources at all times.
Generally this is not a problem, since the CPU is not doing anything else. But when multiple agents run searches, add customers, answer tickets, etc., the added workload can max out all 4 cores for 10-30 seconds, which makes the whole GUI feel sluggish.
During the time the session scheduler was accidentally off, we did not see any such issues, and Zammad always ran perfectly fluently no matter the workload.
Now to the question.
TL;DR: Is it somehow possible, or do you know of strategies, to limit the session job scheduler's CPU usage while "real work" is being done on the server?
I have no problem with the scheduler using the available resources while no other important jobs are queued, but it is not great that the system can slow down due to poorly managed resources.
Do you know good strategies or do you have tips?
I thought about increasing the "niceness" of the scheduler process to reduce its priority, but I am not sure whether that would help at all.
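For reference, this is roughly what I had in mind. It is only a sketch: the `pgrep` pattern "scheduler.rb" is an assumption about the scheduler's process name and would need to be adjusted for the actual Zammad install.

```shell
#!/bin/sh
# Sketch: lower the scheduling priority of the Zammad scheduler process.
# The pattern "scheduler.rb" is an assumption; check the real process
# name first, e.g. with: ps aux | grep -i zammad
for pid in $(pgrep -f "scheduler.rb"); do
    renice -n 10 -p "$pid"   # nice 10 = lower priority than the default 0
done
```

This only changes CPU scheduling priority, so it should help when other processes compete for CPU, but it would not reduce the scheduler's total CPU consumption when the machine is otherwise idle.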
Yes, @MrGeneration is right. This is a big topic for us as well, as we're hosting a lot of instances, including larger ones with many overviews. We have a strong interest in fixing this very soon. Unfortunately, we lack a proper solution so far that wouldn't require resources we currently can't provide.
I checked our overviews. We have many, but they are all relatively simple: 30 groups with 2 overviews each, one Open and one Closed overview per group, where we only sort by state and group.
In the Closed overviews we additionally had a check for "spam" tags. I have removed that now but could not find any improvement in CPU usage.
We also have one Closed All and All Tickets overview for certain roles. But disabling/enabling that also made no difference.
Thank you for your straightforward answer. It is good to know that we are not the only ones seeing this. We host on Debian Linux, so maybe there is a workaround in the Linux resource management that could be "abused" here. Do you have any ideas? I mentioned process "niceness", which could be used. Do you have any experience with that in regard to Zammad?
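One Linux-side option that might be worth trying is systemd resource control. This is only a sketch, not something we have tested with Zammad; the drop-in path and the service name `zammad.service` are assumptions and depend on how the install is packaged.

```
# /etc/systemd/system/zammad.service.d/cpu.conf  (path and service name are assumptions)
[Service]
Nice=10          ; lower scheduling priority for the service's processes
CPUWeight=50     ; half the default weight (100) when CPUs are contended
```

After creating the drop-in, `systemctl daemon-reload` and a service restart would be needed. Like niceness, `CPUWeight` only takes effect under contention, so background work can still use idle CPU but yields when agents create real load.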
Unfortunately not. The only thing we have found so far is that it helps a bit to avoid "not in" queries, so "Status - is not - closed" in overview conditions should be avoided. You can simply invert them.
We tried to eliminate "large overviews" which many agents have access to.
Specifically, the All Tickets and All Closed Tickets overviews were available to everyone.
In a short time we surpassed 5k tickets there, which made those overviews mostly pointless, but we forgot to remove them.
At the same time, these overviews seemed to put a high load on the host machine.
After disabling them, average CPU usage nearly halved, and system stability has improved so far.