Lost Network Connection and CPU Usage

cabillman · August 14, 2019, 6:34pm

Infos:

Used Zammad version: zammad-3.0.0-15
Used Zammad installation source: docker image / kubernetes
Operating system: Various, depending on the Docker image
Browser + version: Firefox 68

Expected behavior:

CPU usage would reflect our current agent usage.

Actual behavior:

During off hours with no agents, CPU usage does not recover until the Kubernetes pods are killed.

Steps to reproduce the behavior:

Unknown

On several occasions we started seeing “Lost network connection” in the web ui. The errors lined up with high CPU on the scheduler and websocket pods. Looking at the container cpu usage showed some interesting patterns. The scheduler pod is orange and websockets green.

Over the weekend and evenings we have no agents using the system. You can see in that graph CPU picks up on Monday but never drops back down much below 60% even in the off hours. On Tuesday CPU usage hits almost 100% and never drops back down until I kill the pods at 7am Wednesday. After the pod restart the websocket pod never spikes above 50%.

I can increase cpu limits for the containers but it seemed odd that usage never drops back down even in off hours.

Any ideas how we can track down the root cause? I can open a github issue but I figured more specifics would be helpful.

Thanks!

MrGeneration · August 20, 2019, 9:50am

Please note that we can only support the Zammad application itself.
When you’re referring to “docker image”, I suspect you’re working with docker-compose.

(the single container image is for testing only and will lose all created data upon restarting)

Please also provide the following hardware specs:

used CPU type (including clock speed)
number of cores
RAM available for Zammad
storage type used by the host (SSD, HDD, SAS, …)

How many agents are working concurrently on your installation?

Also run the following commands within your Zammad container:

rails r 'p Delayed::Job.count'
rails r 'p Overview.count'
rails r 'p Ticket.count'
rails r 'p User.count'
rails r 'p Trigger.where(active:true).count'
rails r 'p Scheduler.where(active: true).count'
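
As a convenience, the counts above could also be gathered in a single run instead of starting Rails six times. The following is only a rough sketch: `collect_counts` is a hypothetical helper (not part of Zammad), and the model names are simply the ones from the commands above. Saved to a file, it could be passed to `rails r` inside the Zammad container.

```ruby
# Hypothetical helper (not part of Zammad): runs each check and records
# either its result or the class of the error it raised, so that one
# failing query does not hide the remaining numbers.
def collect_counts(checks)
  checks.each_with_object({}) do |(label, check), results|
    results[label] = begin
      check.call
    rescue StandardError => e
      "error: #{e.class}"
    end
  end
end

# Only runs inside a Rails environment (e.g. via `rails r this_file.rb`),
# where the Zammad models named in the commands above are loaded.
if defined?(Rails)
  checks = {
    'Delayed::Job'      => -> { Delayed::Job.count },
    'Overview'          => -> { Overview.count },
    'Ticket'            => -> { Ticket.count },
    'User'              => -> { User.count },
    'active Triggers'   => -> { Trigger.where(active: true).count },
    'active Schedulers' => -> { Scheduler.where(active: true).count }
  }
  collect_counts(checks).each { |label, value| puts "#{label}: #{value}" }
end
```

Wrapping each query in a lambda keeps the helper independent of Rails, and a query that raises is reported by error class rather than aborting the whole run.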

Possible reasons for high CPU usage might be broken background jobs that are still trying to run.
Overviews can also be a CPU hungry part on Zammad.
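
To check whether broken background jobs are the culprit, it may help to group the retried jobs by their error message. This is a minimal sketch, assuming the standard delayed_job columns `attempts` and `last_error` are present on `Delayed::Job`; `summarize_failing_jobs` is a hypothetical helper, not something Zammad ships.

```ruby
# Hypothetical helper: counts retried jobs per error message.
# Accepts any collection of objects responding to #attempts and
# #last_error (assumed here to match delayed_job's standard columns).
def summarize_failing_jobs(jobs)
  jobs.select { |job| job.attempts.to_i > 0 && !job.last_error.to_s.empty? }
      .group_by { |job| job.last_error.lines.first.to_s.strip } # first line = message
      .transform_values(&:count)
end

# Only runs where Delayed::Job is loaded (e.g. via `rails r this_file.rb`).
if defined?(Delayed::Job)
  summarize_failing_jobs(Delayed::Job.where('attempts > 0')).each do |error, count|
    puts "#{count}x #{error}"
  end
end
```

If one error message accounts for most of the retried jobs, that job type is a reasonable first place to look.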

If you receive loads of emails during the night, this might also be a reason why Zammad takes so much CPU time.

system · December 18, 2019, 9:50am

This topic was automatically closed 120 days after the last reply. New replies are no longer allowed.