Notifications delayed up to 30 minutes - Delayed_jobs fluctuates between 100 and 300

Infos:

  • Used Zammad version: 3.2.0-1575624531.d79c85cf.centos7.x86_64
  • Used Zammad installation source: yum
  • Operating system: CentOS
  • Browser + version: Firefox 68.4.2esr / Chrome 79.0.3945.130

Expected behavior:

  • Creating a new ticket or taking action on a ticket triggers a browser notification and/or mail

Actual behavior:

  • Notifications and/or mails are sent 20-30 minutes late

Steps to reproduce the behaviour:

  • Unfortunately I can not provide steps to reproduce the error

All I can say is that the number of Delayed_jobs builds up to ~100 to ~300 within 30-60 minutes during working hours (07:00 - 16:00). One of the effects is that notifications are delayed by up to 30 minutes.

Load average looks fine (1.23); the only thing that stands out is that script/scheduler.rb start -t uses 107% CPU.

System:

  • Virtualisation on ESXi

  • 4CPU(s), 3980 MHz

  • 16GB RAM

  • if you need more information, please ask

Tried Solutions

I already searched the community hub for solutions:

  • increased the RAM for Elasticsearch
  • increased max_connections and shared_buffers

Nothing really worked.

More information about the Zammad instance:

###@### ~ # zammad run rails r "p User.joins(roles: :permissions).where(roles: { active: true }, permissions: { name: 'ticket.agent', active: true }).uniq.count"
80
###@### ~ # zammad run rails r "p Sessions.list.count"
34
###@### ~ # zammad run rails r "p User.count"
3370
###@### ~ # zammad run rails r "p Overview.count"
35
###@### ~ # zammad run rails r "p Group.count"
29
###@### ~ # zammad run rails r 'p Delayed::Job.count'
310
###@### ~ # zammad run rails r 'p Delayed::Job.first'
#<Delayed::Backend::ActiveRecord::Job id: 1416605, priority: 0, attempts: 0, handler: "--- !ruby/object:Transaction::BackgroundJob\nitem:\n...", last_error: nil, run_at: "2020-03-16 14:13:53", locked_at: "2020-03-16 14:47:29", failed_at: nil, locked_by: "host:<zammad-url> pid:1060", queue: nil, created_at: "2020-03-16 14:13:53", updated_at: "2020-03-16 14:13:53">
###@### ~ # zammad run rails r 'p Delayed::Job.last'
#<Delayed::Backend::ActiveRecord::Job id: 1416914, priority: 0, attempts: 0, handler: "--- !ruby/object:Transaction::BackgroundJob\nitem:\n...", last_error: nil, run_at: "2020-03-16 14:49:54", locked_at: nil, failed_at: nil, locked_by: nil, queue: nil, created_at: "2020-03-16 14:49:54", updated_at: "2020-03-16 14:49:54">
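The oldest job above also shows how long it sat in the queue: locked_at is set when a worker picks a job up, so the gap between run_at and locked_at is the waiting time. A minimal sketch of that arithmetic in plain Ruby, using the two timestamps from the output above (the helper name queue_wait_seconds is made up for illustration, and both timestamps are assumed to be in the same time zone):

```ruby
require 'time'

# Hypothetical helper: seconds between the time a job became runnable (run_at)
# and the time a worker locked it (locked_at).
# Assumption: both timestamps share one time zone; UTC is used here only so
# that the subtraction is unambiguous.
def queue_wait_seconds(run_at, locked_at)
  (Time.parse("#{locked_at} UTC") - Time.parse("#{run_at} UTC")).to_i
end

# Timestamps copied from the Delayed::Job.first output above.
wait = queue_wait_seconds('2020-03-16 14:13:53', '2020-03-16 14:47:29')
puts format('%d min %d s', *wait.divmod(60)) # prints "33 min 36 s"
```

That wait of roughly half an hour matches the notification delay described above.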

Any ideas or solutions?

Okay, no one has opened or edited a ticket in the last 20 minutes. Most likely because it’s closing time and I’m the last one here.

The delayed_jobs have dropped to 0.
The CPU usage of the script/scheduler.rb start -t thread fluctuates between 40% and 80%.

So is it a performance problem of the scheduler, which can’t process the requests fast enough?
But where is the bottleneck, and which setting can I adjust?
Is it possible to make the scheduler multi-threaded?

It’s a combination of issues.
You have 34 concurrent user sessions in the case above, and 35 overviews.

To make the math easier, let’s say you have 30 concurrent users, each with 10 overviews they can see. This causes Zammad’s scheduler to recalculate 300 overviews (30 * 10) on each run, because we need to handle each of them separately.
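As a rough illustration of that estimate, here is a minimal Ruby sketch. The function name is made up for this example and is not part of Zammad; it only restates the multiplication above and says nothing about how the scheduler is actually implemented.

```ruby
# Hypothetical helper, not a Zammad API: estimates how many overview
# calculations one scheduler run has to perform, assuming one calculation
# per (session, visible overview) pair as described above.
def overview_recalculations(concurrent_sessions, overviews_per_user)
  concurrent_sessions * overviews_per_user
end

# Simplified numbers from the explanation above.
puts overview_recalculations(30, 10) # prints 300

# Upper bound for the reported system, assuming every one of the
# 34 sessions could see all 35 overviews (which may not be the case).
puts overview_recalculations(34, 35) # prints 1190
```

Halving the number of overviews each agent can see halves this figure, which is why trimming overviews is suggested first.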

First of all, especially because you’re running 3.2 and are thus missing important security updates:
Please update Zammad to a current 3.3 (don’t forget to rebuild your search index).

Also, ensure that you only provide overviews you really need.
An overview you most likely won’t need is one like “show me all closed tickets”.
Ensure your overviews do not contain more than 2100 objects (we limit the output anyway).

Best practice: Reduce the number of overviews and try not to show the same ticket information in several overviews.
Try to stay below 1000 tickets per overview if you can, and only assign overviews to the users who really need them. As Zammad’s overviews handle group permissions automatically, this might also help to reduce their number.

Other than that, performance topics are quite complicated and very often come down to configuration.
There can be various causes. You might also want to ensure that your database server allows at least 200 (better 500) concurrent connections.

First of all, thank you for the quick and detailed answer and the great work you do here.

Nevertheless, I have to admit that this is the first time that Zammad disappoints me here :frowning:

It’s exactly this possibility to make the views so individual and granular that made it so pleasant to work with.
We have about 20 ticket groups, which now all have to fit into two or three overviews (at least maybe filtered by status), making it very unattractive and confusing for some agents.

I will try to update to 3.3 as soon as possible and develop a new plan for the overviews with my superiors.

Are there any plans to separate the mail dispatch from the overviews and use two separate schedulers for them in the future?

As soon as I can verify that it performs better with fewer overviews, I will mark the last post as the solution.

I’m sorry, I can’t provide any qualified answer or forecast on development and specific changes to e.g. scheduler etc.

We’re working hard on improving Zammad’s performance. This will bring changes in various places; the scope, however, is a black box to me. We both have to be patient and see how this evolves.

Also, I don’t mean to say “reduce all overviews to one”; I meant it more like “try to reduce your overviews to a useful minimum if possible” (in case it’s really the overviews that are costing you time).

Many overviews I’ve seen on systems with performance issues are either “useless” or so big that they simply bring no real benefit. But I think I’ve covered that in my answer above already. It always depends on what your agents are allowed to see. If everyone can see all 20 groups, I totally understand that you need (and want) to divide that into several overviews - if you only have 1-3 groups per agent, simplified overviews might be good enough.

But I don’t know your use cases well enough, so you might have a very good reason for having many overviews!