Scheduler may not run (last execution of OnlineNotification.cleanup about X hours over)

Infos:

Important:
If you are a Zammad Support or hosted customer and experience a technical issue, please refer to: support@zammad.com using your zammad-hostname / or company contract.

  • Used Zammad version: 2.8
  • Used Zammad installation source: (source, package, …) package
  • Operating system: Ubuntu 18
  • Browser + version: Chrome

Expected behavior:

Actual behavior:

Steps to reproduce the behavior:

This problem is not easy to reproduce, but we have noticed that when a few hundred people are signed into Zammad and creating tickets, Zammad bogs down and the UI becomes slow to respond. Checking the "Monitoring" menu option, we see " * scheduler may not run (last execution of OnlineNotification.cleanup about 3 hours over) - please contact your system administrator".

We can use the CLI to initiate processing of these, but I would assume it would do this on its own, i.e. manage its workload without human intervention.

We monitor the hosting system and there are plenty of system resources (CPU, memory, etc.). The system is hosted on 3 AWS EC2 instances: one for Zammad, one for Postgres, and one for Elasticsearch. All are very large instances, i.e. multicore with lots of memory.

Are there tuning parameters, or could you explain how the scheduler works and why we see this when we have a good load of users creating tickets on the system?

Sorry but could you please be a bit more specific?

E.g.: number of cores, dedicated RAM per machine, and especially the type of storage backend (yes, this might be important).

Also, do you have any error messages X hours before the error appears?

Please also run

zammad run rails r 'p Delayed::Job.count'
zammad run rails r 'p Delayed::Job.first'
zammad run rails r 'p Delayed::Job.last'
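
The three commands above show only the size of the queue and its two ends. If a breakdown by job type is also wanted, the handler strings can be grouped by class name. The following is a minimal sketch, not an official Zammad tool: `handler_class`, `tally_handlers`, and `SAMPLE_HANDLERS` are hypothetical names, and the sample strings are illustrative only. It assumes handlers are YAML starting with a `!ruby/object:ClassName` tag, and that in a Rails console the real list could be obtained with `Delayed::Job.pluck(:handler)`.

```ruby
# Extract the Ruby class name from a Delayed::Job handler string.
# Assumption: the handler is YAML tagged like "--- !ruby/object:SomeClass".
def handler_class(handler)
  match = handler.match(/!ruby\/object:([\w:]+)/)
  match ? match[1] : "unknown"
end

# Count jobs per handler class, largest group first (ties sorted by name).
def tally_handlers(handlers)
  counts = Hash.new(0)
  handlers.each { |h| counts[handler_class(h)] += 1 }
  counts.sort_by { |name, count| [-count, name] }
end

# Illustrative input only. On a live system this list could instead come
# from Delayed::Job.pluck(:handler) (assumes the ActiveRecord backend).
SAMPLE_HANDLERS = [
  "--- !ruby/object:BackgroundJobSearchIndex\n",
  "--- !ruby/object:Observer::Ticket::UserTicketCount\n",
  "--- !ruby/object:BackgroundJobSearchIndex\n"
].freeze

tally_handlers(SAMPLE_HANDLERS).each do |name, count|
  puts "#{name}: #{count}"
end
```

If one class accounts for most of the queue, that points at which subsystem (for example search indexing) is falling behind.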

It would also be interesting to know how many concurrent agents and daily tickets you have (approx.).
Also… what is your Elasticsearch version?

Okay, here are the further details:

Application Server (Where Zammad is running):
AWS c5.4xlarge
250G gp2

Elasticsearch Full-Text Index Server:
AWS c5.xlarge
1TB io1 10K IOPs

Postgres Database Server:
AWS c5.2xlarge
1TB io1 30K IOPs

This link will give you the AWS instance size options: https://aws.amazon.com/ec2/instance-types/

We did not notice any log errors.

During our testing we had around 150-200 agents active when we saw the error. We also saw the error when creating users through the REST API.

zammad run rails r 'p Delayed::Job.count'
14179
zammad run rails r 'p Delayed::Job.first'
#<Delayed::Backend::ActiveRecord::Job id: 1454980, priority: 0, attempts: 0, handler: "--- !ruby/object:BackgroundJobSearchIndex\nobject: ...", last_error: nil, run_at: "2019-02-15 18:00:43", locked_at: "2019-02-15 18:01:39", failed_at: nil, locked_by: "host:murray.taskeasy.com pid:97541", queue: nil, created_at: "2019-02-15 18:00:43", updated_at: "2019-02-15 18:00:43">
zammad run rails r 'p Delayed::Job.last'
#<Delayed::Backend::ActiveRecord::Job id: 1469158, priority: 0, attempts: 0, handler: "--- !ruby/object:Observer::Ticket::UserTicketCount...", last_error: nil, run_at: "2019-02-19 00:42:15", locked_at: nil, failed_at: nil, locked_by: nil, queue: nil, created_at: "2019-02-19 00:42:15", updated_at: "2019-02-19 00:42:15">
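
As a quick sanity check on the output above, the ids and timestamps can be compared directly. This is only a sketch of the arithmetic: the constants are copied from the two job records, the helper names `id_span` and `hours_between` are made up for illustration, and it assumes both timestamps are in the same time zone.

```ruby
require "time"

# Values copied from the Delayed::Job output above.
FIRST_JOB = { id: 1_454_980, run_at: "2019-02-15 18:00:43" }.freeze
LAST_JOB  = { id: 1_469_158, created_at: "2019-02-19 00:42:15" }.freeze
JOB_COUNT = 14_179

# Number of ids between the first and last job, inclusive.
def id_span(first_id, last_id)
  last_id - first_id + 1
end

# Hours elapsed between two "YYYY-MM-DD HH:MM:SS" timestamps.
# Assumption: both are in the same zone, so they are parsed as UTC.
def hours_between(from, to)
  (Time.parse("#{to} UTC") - Time.parse("#{from} UTC")) / 3600.0
end

span = id_span(FIRST_JOB[:id], LAST_JOB[:id])
age  = hours_between(FIRST_JOB[:run_at], LAST_JOB[:created_at])

puts "ids spanned: #{span} (reported count: #{JOB_COUNT})"
puts "hours between oldest and newest job: #{age.round(1)}"
```

The id span equals the reported count of 14179, and the oldest job is about 78.7 hours older than the newest. That would be consistent with few or no jobs having completed since the oldest one was queued, although ids and timestamps alone cannot confirm this.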

Elasticsearch version:

{
  "name" : "eq-1bTU",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "6Jt36sZSRimVFCQp7JvhTg",
  "version" : {
    "number" : "5.6.14",
    "build_hash" : "f310fe9",
    "build_date" : "2018-12-05T21:20:16.416Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}

Hope this helps.

Thanks!!!
