Email notification delay of about 5 hours, CPU usage at 100%

Infos:

  • Used Zammad version: 2.3.x
  • Used Zammad installation source: (source, package, …) package
  • Operating system: CentOS
  • Browser + version:
  • Exchange 2007 LDAP and SMTP configured
  • Health shows no errors

Expected behavior:

Actual behavior:

The Zammad process runs at 100% CPU usage, even after a restart.

Steps to reproduce the behavior:

  • Our Zammad system runs with a CPU usage between 70% and 100%,
    even after restarting the system.
    I raised the VM to 8 GB of RAM and 2 CPUs, but the behavior is still the same.
    I think this is also the reason for our second problem: our notification mails are sent, but with a delay of between 1 and 5 hours.
    For example:
    A ticket is created via email at 11:13 am today.
    The ticket appears in Zammad at 11:14 am today.
    Normally a notification should be sent to the customer that the ticket was created with ticket no. xxxxx,
    and a notification should be sent to the agents that a new ticket was created.
    As you can see in the ticket history, no notification was sent.

The Zammad log and mail.log in /var/log are clean.
Are there additional logs I can check?
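
For reference, this is roughly what I tailed so far (just a sketch; the paths and unit names assume our CentOS package install and may differ elsewhere):

  # Follow the application/scheduler log of the package install
  # (the production.log location is an assumption on my part)
  tail -f /var/log/zammad/production.log

  # Follow the systemd journal of the Zammad unit
  # (sub-units such as zammad-worker may log separately)
  journalctl -u zammad -f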

Hi @antboll - do you have a large number of users/tickets/articles/roles/groups in your system?

Please contact me via support@zammad.com and refer to this issue. It would be great if we could have a TeamViewer session.

Hi Thorsten,
Zammad has been running since November 2017.
Tickets created: 1200
Users: 900 (due to a faulty LDAP config, since corrected)
Active users: 300, inactive: 600
One additional role: ITService

Are there any Ruby commands to get the needed information?
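
Would something like this via the Rails runner give you what you need? (Just a sketch; I'm assuming the model names User, Ticket, Ticket::Article, Role and Group from the Zammad data model.)

  # Print record counts for the objects asked about above
  zammad run rails r "puts User.count"
  zammad run rails r "puts Ticket.count"
  zammad run rails r "puts Ticket::Article.count"
  zammad run rails r "puts Role.count"
  zammad run rails r "puts Group.count"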

Just a short note for everyone facing the same issue until we can fix this properly:

This is caused by a failed update of Elasticsearch. There are probably a lot of Zammad scheduler tasks that try to update the Elasticsearch index but can't because of the incompatibility resulting from the failed update.
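
You can get a rough idea of whether you are affected by checking the size of the scheduler's job queue (a quick sketch; the numbers themselves will of course vary):

  # Total number of queued background jobs
  zammad run rails r "puts Delayed::Job.count"

  # Number of jobs that have already been attempted more than once (i.e. failed and retried)
  zammad run rails r "puts Delayed::Job.where('attempts > 1').count"

A queue that keeps growing and contains many jobs with failed attempts is a good indicator for this problem.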

So here is how you can resolve the issue (commands may vary depending on your OS and installation source):

  • Stop the whole Zammad application (e.g. systemctl stop zammad)
  • Delete all failed jobs by running zammad run rails r "Delayed::Job.where('attempts > 1').delete_all"
  • Stop Elasticsearch (e.g. systemctl stop elasticsearch)
  • Delete your old Elasticsearch indices (e.g. rm -rf /var/lib/elasticsearch/nodes/0/indices/*)
  • Install the Elasticsearch ingest-attachment plugin (e.g. sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-attachment)
  • Start Elasticsearch again (e.g. systemctl start elasticsearch)
  • Rebuild your elasticsearch index by running zammad run rake searchindex:rebuild (This may take a while depending on the amount of data in your Zammad)
  • Start the whole Zammad application again (e.g. systemctl start zammad)

After that your Zammad should be back on track and run smoothly as it should :tropical_drink:
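
For convenience, here is the same procedure as a single rough shell sketch, run as root (assuming a package install on a systemd-based system such as CentOS; the Elasticsearch node directory and plugin path may differ on your setup):

  # Stop Zammad so no new scheduler jobs are queued
  systemctl stop zammad

  # Drop the background jobs that have already failed and been retried
  zammad run rails r "Delayed::Job.where('attempts > 1').delete_all"

  # Stop Elasticsearch and remove the old, incompatible indices
  systemctl stop elasticsearch
  rm -rf /var/lib/elasticsearch/nodes/0/indices/*

  # Install the ingest-attachment plugin and start Elasticsearch again
  /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-attachment
  systemctl start elasticsearch

  # Rebuild the search index (this can take a while), then bring Zammad back up
  zammad run rake searchindex:rebuild
  systemctl start zammad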

We are working on resolving this.


Should work better with: https://github.com/zammad/zammad/commit/15a280b449ce7ec905385c81fe49a63356a4dc2a

If you have already updated, do:

  • curl -XDELETE '127.0.0.1:9200/zammad_production'
  • zammad run rake searchindex:rebuild
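
To verify that the index was actually recreated afterwards, you can ask Elasticsearch directly (a sketch; host, port and the zammad_production index name assume a default single-node setup):

  # List all indices with their document counts; zammad_production should reappear and grow
  curl -s '127.0.0.1:9200/_cat/indices?v'

  # Show the document count of the Zammad index only
  curl -s '127.0.0.1:9200/zammad_production/_count?pretty'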

Neither of these suggestions helped us, unfortunately. It appears the search index is never rebuilt correctly: after zammad run rake searchindex:rebuild I still have a delayed job, and when Zammad starts the message spam immediately begins again.

Are there any workarounds in the meantime? Is it safe to downgrade Zammad? We're running the latest development git package, which was probably a really bad idea. And what's the recommended version of Elasticsearch? We have 5.6.5; when I upgraded it recently I ran into problems. (That is probably also the cause of this issue.)

We have several thousand users from LDAP, which is critical for this installation, so if that's a deal-breaker, please tell me so we can switch to another helpdesk.

Correction: it now works again, it fixed itself. It seems the rebuild done by zammad run rake searchindex:rebuild is not the final step: Zammad still had to reindex all users after it was started. That is apparently done now, and the system is back to normal operation. (For now, at least.)

So for other people reading this: if you're in our situation, don't immediately restart the process; wait a while (depending on the amount of data you have).
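
If you want to see when that background reindexing has actually caught up, you can watch the job queue drain (just a sketch, using the same Rails runner as above):

  # Re-check every few minutes; once the count stays near zero the reindexing is done
  watch -n 300 'zammad run rails r "puts Delayed::Job.count"'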


This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.