Ruby regularly hits 100% CPU, UI becomes slow, no errors in production.log

  • Used Zammad version: 5.0.x (cannot yet upgrade, unfortunately)
  • Used Zammad installation type: docker
  • Operating system: ubuntu on aws
  • Browser + version: happens both in firefox and ms edge on multiple computers (i.e. all users experience the slowness)
  • number of users: anywhere between 3 and 10, never more

Expected behavior:

  • run smoothly, as it did for the last year and a half

Actual behavior:

  • over the last 3 to 4 months, the slowness has gradually increased until it is now nearly unworkable

Steps to reproduce the behavior:

  • have zammad running for about a year and a half (rebooted once in a while), have a ticket count of 6500, and an average of 5 users (agents) active

I have looked in the logs, and see nothing special.
there are also no delayed jobs, no unprocessable emails, mysql and elasticsearch processes seem fine, and the slowness is not there 100% of the time (but still, like 95% now)

when checking apache, it regularly has issues reaching the zammad process (which corresponds to the moments the ruby process is at 100% cpu and slowing everything down)

If anyone would be able to point me in the right direction where/how i can check what zammad is doing that’s causing the process to go nuts i am all ears, please.

the current stack is on docker:

  • zammad 5.0.x
  • apache
  • mysql/mariadb

i need to get this fixed so we can continue working
(and simultaneously work out an upgrade path to zammad 6.x)

Hi @Bjorn. With the upcoming release of Zammad 6.1 there is a good chance that the performance of Zammad itself will be considerably better. Without seeing your system, it’s nearly impossible to tell what is going on there (e.g. whether there are specific requests that are slow and cause the whole system to be slow).
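
One way to look for specific slow requests is to scan production.log for the Rails "Started ..." and "Completed ... in Nms" lines. The following is only a rough Python sketch: it assumes the usual Rails request log format and a "#pid-tid]" (or "#pid]") marker in the line prefix to pair both lines; `slow_requests` and its 1000 ms threshold are made-up names and values for illustration, not part of Zammad.

```python
import re

# Assumed line shapes (standard Rails request logging):
#   I, [<timestamp> #<pid>-<tid>]  INFO -- : Started GET "/some/path" for <ip> at <time>
#   I, [<timestamp> #<pid>-<tid>]  INFO -- : Completed 200 OK in 5321ms (Views: ... | ActiveRecord: ...)
WORKER = re.compile(r"#(\d+(?:-\d+)?)\]")          # process/thread marker in the prefix
STARTED = re.compile(r'Started ([A-Z]+) "([^"]+)"')
COMPLETED = re.compile(r"Completed (\d{3}) .*?in (\d+)ms")


def slow_requests(lines, threshold_ms=1000):
    """Return (duration_ms, method, path, status) tuples, slowest first."""
    pending = {}  # worker marker -> (method, path) of the request in flight
    hits = []
    for line in lines:
        marker = WORKER.search(line)
        worker = marker.group(1) if marker else None

        started = STARTED.search(line)
        if started:
            pending[worker] = (started.group(1), started.group(2))
            continue

        completed = COMPLETED.search(line)
        if completed:
            # Fall back to "?" when the matching "Started" line was not seen.
            method, path = pending.pop(worker, ("?", "?"))
            duration = int(completed.group(2))
            if duration >= threshold_ms:
                hits.append((duration, method, path, int(completed.group(1))))
    return sorted(hits, reverse=True)
```

Passing the opened production.log file as `lines` and printing the first 20 results should show whether a handful of endpoints account for most of the slow responses, or whether everything is slow across the board.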

hi @fliebe92 ,

if there’s anything i can do to check and inform you, please let me know.
from what i can see, there is nothing special going on on the machine,
and nothing else is really running on the same machine anymore (we are moving away from docker on an ec2 instance)

sometimes the scheduler process reaches 100%, but that’s only for a couple of seconds every so often (verified by stopping the scheduler and restarting it, and following the pid)

other processes are using under 5% cpu and under 2% mem - except for elasticsearch,
which is claiming about 54% memory with its java process.

Do you have a lot of overviews in the system?

In the end, Zammad 5.0 is very outdated (2021) and there have been a lot of improvements since then.

I have a total of 13 overviews, including my tickets, all open tickets, all tickets, and then 11 more overviews of specific groups.

after your comment, i removed the “all tickets” overview but no change in speed.

upgrading to the new 6.x release is in the works, but i’m not quite sure how to best approach it,
as we would move away from the docker instance, and onto a dedicated ec2 machine using the zammad packages.
and we of course need to have all the 6500+ tickets (and attachments, on the filesystem) moved there as well,
and from mysql to postgres, and without sending 6000+ emails when moving…

Smells like a custom docker variant you’re running there? If so, it’s generally very hard to understand the issues and scopes. We can hardly support that.

That being said, it sounds like a configuration issue, consider using the environment variables for tuning. With Zammad 5 most of them are not yet available. web-concurrency however should be.

I’ll just leave that here

It is somewhat custom, but i don’t recall exactly why we went for a custom docker image (though, if i remember correctly, we followed the “install from source” documentation completely for the image).

I have set the web_concurrency to 4 and it seems to have made a (very) small difference, but not enough to be the resolution, unfortunately.

I read somewhere in the forum (?) that several users using (and leaving open) the searchbar search might cause a (significant) slowdown. we have been having this for a few months, which roughly coincides with deciding to let an external contact center handle calls for us… so i think that might be what is causing the load?

I’m not aware of such a performance degrading situation to be honest.
A classic cause of performance degrading over time after starting (or restarting) an instance is the database server’s max_connections being too low for what Zammad needs.

By default PostgreSQL allows 100 connections; however, Zammad’s vanilla configuration (without performance tweaks) already requires up to 200 connections. This will cause issues with database pooling and introduce you to slow updates etc.
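
To make that arithmetic concrete, here is a small Python sketch. It assumes that every Zammad process may open up to one full connection pool; the process counts and the pool size of 50 below are illustrative values picked to reproduce the "up to 200" figure, not numbers read from an actual Zammad configuration, and `check_headroom` is a made-up helper name.

```python
def required_connections(processes, pool_size):
    """Worst case: every process fills its whole connection pool."""
    return sum(count * pool_size for count in processes.values())


def check_headroom(processes, pool_size, max_connections):
    """Return (needed, fits) for a given database connection limit."""
    needed = required_connections(processes, pool_size)
    return needed, needed <= max_connections


# Illustrative layout (assumption): 2 web workers, 1 websocket server, 1 scheduler.
layout = {"web": 2, "websocket": 1, "scheduler": 1}

needed, fits = check_headroom(layout, pool_size=50, max_connections=100)
print(f"needs up to {needed} connections, fits into limit: {fits}")
```

With these assumed values the worst case is 200 connections against a limit of 100, which is the kind of mismatch described above. Raising the number of web workers raises the requirement by one pool per extra process.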

That shouldn’t affect MySQL/MariaDB and thus it’s most likely a performance tuning issue. Possibly combined with terribly old code; there have been a lot of performance improvements over the last years.

Or you need performance tuning because you have too many concurrent agents / too much data to transfer concurrently.