Performance Issues with 50 - 60 concurrent users

Infos:

  • Used Zammad version: 3.3.0
  • Used Zammad installation source: source inside a self built docker image
  • Operating system: ubuntu 20.04 (but zammad is running in docker)
  • Browser + version: -

Expected behavior:

  • Agents can work simultaneously without problems.

Actual behavior:

  • When a certain number of agents are connected to Zammad, Zammad hangs when opening tickets, sending tickets, looking up tickets, etc.

Hi everyone,

we are currently running a modified version of the following repo:

We modified it because we are using AWS RDS Postgres as our database for Zammad, so we needed to make a few adjustments to the images in the repository mentioned above.

So our setup looks like this:

  • RDS Postgres (db.t3.medium -> 1 core 4GB RAM)
  • EC2 Instance (c5.4xlarge -> 16 cores 32 GB Ram)
  • AWS LoadBalancer which forwards requests on our domain to the nginx container on the EC2 instance

On the EC2 instance we run the zammad-railsserver, zammad-scheduler, nginx, elasticsearch, memcached and the zammad-websocket-server, each in a separate Docker container.
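For reference, the container layout described above might look roughly like the following docker-compose fragment. This is only an illustrative sketch: the service names, image tags and the RDS hostname are assumptions, not the actual definitions from our modified repository.

```yaml
# Hypothetical docker-compose fragment; names and tags are placeholders.
version: "3"
services:
  zammad-railsserver:
    image: zammad/zammad:3.3.0
    environment:
      # External AWS RDS Postgres instead of a local db container (hostname is made up)
      - POSTGRESQL_HOST=example.cluster-xyz.eu-central-1.rds.amazonaws.com
      - MEMCACHE_SERVERS=memcached:11211
  zammad-scheduler:
    image: zammad/zammad:3.3.0
  zammad-websocket:
    image: zammad/zammad:3.3.0
  nginx:
    image: nginx:stable
    ports:
      - "8080:8080"   # the AWS load balancer forwards traffic to this port
  elasticsearch:
    image: elasticsearch:5.6
  memcached:
    image: memcached:alpine
```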

Moving the EC2 instance up from 8 cores to 16 cores did not help with our problem: Zammad still hangs occasionally once a certain threshold of simultaneously connected agents is reached. The EC2 instance does not seem to be under much load. RAM usage is constantly rising though, so we occasionally have to restart the zammad-scheduler. The RDS instance is not heavily used either; CPU usage peaks at around 20%.

What I did notice on the EC2 instance is that the puma process is usually above 100% CPU usage when Zammad hangs.
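To narrow down which processes are actually busy during a hang, something like the following can help. It uses the standard Linux `ps`; the `docker stats` call assumes the containers carry the names listed above.

```shell
# Show the top CPU consumers on the host (procps `ps` on Linux).
ps -eo pcpu,pmem,comm --sort=-pcpu | head -n 10

# Per-container view; container names are an assumption based on our setup.
docker stats --no-stream
```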

We currently have 330K tickets in the system. We have six email accounts connected and over 50 filters. We also have over 50 groups and a couple of overviews. We are also using the ldap integration.

From what I understand, it's currently not possible to run several instances of the Zammad railsserver. Is this correct?

Do you have any suggestions for how we can debug this issue further?


Sorry, but I can't provide any help for self-created containers and the like.
However, as you still seem to be using a vanilla version of Zammad, the following page may help you:

https://docs.zammad.org/en/latest/appendix/configure-env-vars.html#performance-tuning

What you want is for puma not to be permanently stuck at 100%. It's perfectly normal for it to spike from time to time; that's just it shipping information to your users.
WEB_CONCURRENCY and ZAMMAD_SESSION_JOBS_CONCURRENT should be able to solve your issues, provided you have enough CPU power, which I currently expect you do.

The first option sets the number of web processes Zammad spawns. With 50 or more concurrent users, you'll typically need somewhere between 4 and 12 processes. The second environment variable lets you spawn more than one session worker, which is required if Zammad can no longer keep up with updating your overviews, for example.
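As a concrete illustration, both knobs are plain environment variables for the Zammad containers. The values below simply sit in the middle of the ranges discussed here; they are an example, not a recommendation for any specific system.

```shell
# Illustrative values only -- tune these against your own CPU count and load.
export WEB_CONCURRENCY=8                  # number of puma web processes
export ZAMMAD_SESSION_JOBS_CONCURRENT=2   # parallel session workers (overview updates)
```

In a docker-compose setup these would go into the `environment:` section of the rails, websocket and scheduler services instead of a shell profile.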

The above is just a very rough idea.
It's nearly impossible to provide a set of values that works well for every system, as every system behaves differently (and not just in terms of raw power).

And yes, it's correct that Zammad doesn't allow several railsservers to run. With the options above, that shouldn't be necessary for performance tuning.

Hello MrGeneration,

thank you for the reply.
We did in fact play around with the “WEB_CONCURRENCY” variable and are currently at 16. Personally, I didn't notice any difference between 8 and 16, but since we are expecting even more users on the ticket system, we're leaving it at that.

As for “ZAMMAD_SESSION_JOBS_CONCURRENT”: we did find out about it, but we are rather clueless as to what value we should assign to it.
Can you recommend something here?

By the way, another huge part of the “performance issues” in our case was this bug: https://github.com/zammad/zammad/issues/3087

All the best.

I generally can't, because if I do, someone else with a much smaller system will do the same thing and then wonder why everything is on fire.

You could start with 4; please note, however, that this may interfere with your 16 puma workers.
Technically you assigned 16 processes to 16 cores, which in theory may slow down other processes if they spike to 100% core usage.

It's a tuning thing. If your scheduler doesn't stick at 100%, you're fine. If it does (and does so permanently), you'll want to raise the session jobs concurrency value.