System performance very limited without obvious reason

Adam · September 27, 2021, 8:27pm

Infos:

Used Zammad version: 4.1
Used Zammad installation type: (source, package, docker-compose, …) Package
Operating system: Ubuntu 20
Browser + version: All

Expected behavior:

Our system, although far from being heavily utilized, is extremely slow. When we started on a fresh install performance was fantastic, and a ticket would load immediately.
Now any ticket takes 3 to 10 seconds to load, and even logging in takes a while after the username and password are entered.
Even when connecting over SSH, the prompt takes 10 to 15 seconds to appear.
The system is obviously overloaded, but the question is why… we suspect Elasticsearch…
We are running a VM with 16 GB of RAM and 16 cores, on an iSCSI drive over a dedicated 10 Gb/s link. Our system has 5 to 10 simultaneous users (agents + customers).

See how bad it is

Actual behavior:

Very slow system.

Steps to reproduce the behavior:

We tried to install a fresh OS and restored a Zammad backup, and end-up in the same state.

dvanzuijlekom · September 28, 2021, 5:18pm

That seems like a beefy VM, but does it actually get all the CPU cycles from the host OS, or is it capped? It might also be slow due to CPU over-commitment on the hypervisor or combined with a very busy virtualization platform.

Another reason for feeling sluggish could be due to high I/O wait (slow storage), which isn’t visible on your screenshot. Maybe post the results of top or check the output of iotop for possible storage latency/bandwidth issues?
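
For readers who want a number rather than a screenshot, here is a minimal sketch of how the I/O wait share can be derived from two samples of the aggregate `cpu` line in `/proc/stat` on Linux. The function names are made up for illustration, and the sketch assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); tools such as top report the same figure as `wa`.

```python
def parse_cpu_line(line):
    """Return the integer counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent waiting on I/O between two samples."""
    # Only the first eight counters are summed: guest time is already
    # included in user time, so adding it would count it twice.
    first = parse_cpu_line(before)[:8]
    second = parse_cpu_line(after)[:8]
    deltas = [new - old for old, new in zip(first, second)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    # Index 4 is the iowait counter.
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as handle:
        return handle.readline()
```

To use it, call `read_cpu_line()` twice a few seconds apart while the system feels sluggish and pass both lines to `iowait_percent`. A persistently high value would point toward storage rather than CPU.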

Adam · September 28, 2021, 9:26pm

Thanks for taking the time to share your thoughts.
No, the VM isn’t capped. On the contrary, the physical machine has lots of unused CPU/RAM and is not limiting its VMs.

top/iotop look like this:

Do you also see that Elasticsearch and PostgreSQL are taking up a lot of memory?
I was also wondering whether the system buffers more than needed (relying too much on the buffer file on the HDD as opposed to using actual RAM).

Again thanks for your help,
A.

Adam · September 28, 2021, 9:32pm

Maybe this can help as well (Disk latency viewed from the hypervisor):

dvanzuijlekom · September 29, 2021, 8:01am

The top capture shows a system which is barely under load (but this is only a snapshot). The disk latency graph shows the problem (if I’m reading it correctly). The frequent ~20ms delays, with spikes to 40ms and up to 60ms would be very noticeable on the system and indicate storage performance issues. You should probably focus on that.

Adam · September 30, 2021, 8:50pm

Thanks, I am looking into this on our NAS.
I wonder why Zammad is so sensitive to latency (our other services are running fine on the same setup). Is it because Elasticsearch needs constant or frequent access to the HDD?

A.

dvanzuijlekom · September 30, 2021, 10:45pm

Your performance may be poor if you haven’t tuned Zammad out of the box. This can become more apparent after you’ve hit a certain number of tickets in your setup, or simultaneous agents working on those tickets. You really should look into performance tuning and start by gathering metrics on CPU usage, load, disk I/O, network traffic, etc. The basics. Monitoring the database might be a good start, as it is constantly being used on a busy system. If it is improperly tuned, most of that I/O might hit the disks instead of RAM, killing performance, unless you have a very high performance storage backend.
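
One rough way to check whether database reads are served from RAM is the buffer cache hit ratio, which can be computed from the `blks_hit` and `blks_read` columns of PostgreSQL’s `pg_stat_database` view. The sketch below only does the arithmetic; the function name and the sample numbers are hypothetical, not taken from this system.

```python
def cache_hit_ratio(blks_hit, blks_read):
    """Fraction of block requests answered from shared_buffers.

    The inputs are the blks_hit and blks_read counters of one row of
    pg_stat_database, for example from:
        SELECT datname, blks_hit, blks_read FROM pg_stat_database;
    """
    total = blks_hit + blks_read
    if total == 0:
        return None  # no block requests recorded yet
    return blks_hit / total


# Hypothetical counters, for illustration only.
ratio = cache_hit_ratio(blks_hit=990, blks_read=10)
print(f"cache hit ratio: {ratio:.2%}")
```

Note that `blks_read` counts requests that missed `shared_buffers`; some of those may still be answered from the operating system’s page cache, so a low ratio is a hint to look closer, not proof of slow disk access.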

Adam · October 1, 2021, 3:49pm

Thanks Dennis. We have ordered an NVMe PCI card to be used as a buffer on the NAS. We’ll see how this affects overall performance.

To be honest, my “naive” assumption was that tuning wasn’t critical because we only have 3 to 4 concurrent users. But I’ll follow your advice and look into this after the NAS has been upgraded.

A.

Adam · October 6, 2021, 9:51pm

Quick update, and hopefully some insight for people facing similar behavior:

The SSD cache did help a little, but I was still seeing high latency that translated into a slow GUI.
So I looked into the vSphere documentation and realized the SCSI controller I was using for the Zammad VM probably wasn’t the best choice. I changed it to ‘VMware Paravirtual SCSI’, and the latency dropped considerably (to less than 10 ms)…

That could have been the end of it, but unfortunately the overall system is still very slow.
Here is my config: 16 cores, 16 GB of RAM, 500 GB HDD

One of the most frustrating behaviors is logging in to the system and having to wait 10 s in front of a white page before the Zammad GUI shows up…

Any suggestion would be more than welcome.

A.

dvanzuijlekom · October 6, 2021, 11:24pm

What I’ve said before and what you seem to be ignoring: start by monitoring the basics. Start by graphing performance metrics over time, inside the OS as well as at the hypervisor level. I’m guessing your VM packs a bit of a punch, but the software just isn’t using all that power because it isn’t tuned appropriately. Adding RAM does nothing if you aren’t configuring your applications to actually use it. Giving the VM a lot of CPUs does nothing if the applications aren’t instructed to start more worker threads to distribute the load. Performance metrics will probably point to a possible problem area to focus on, allowing for some quick wins.

What is the output of this database query?
SELECT * FROM pg_settings WHERE source != 'default';
What is the output of this shell command (run while the system is feeling sluggish):
zammad run rails r 'p Delayed::Job.count'
What is the output of these shell commands?
zammad config:get WEB_CONCURRENCY
zammad config:get ZAMMAD_SESSION_JOBS_CONCURRENT

Adam · October 7, 2021, 7:32pm

I have the same impression: the power isn’t used by the VM.
Please see below the output of the database query:

name	setting	unit
application_name	pgAdmin 4 - CONN:4532221	NULL
bytea_output	hex	NULL
client_encoding	UNICODE	NULL
client_min_messages	notice	NULL
cluster_name	12/main	NULL
data_checksums	off	NULL
DateStyle	ISO, MDY	NULL
default_text_search_config	pg_catalog.english	NULL
dynamic_shared_memory_type	posix	NULL
lc_collate	en_US.UTF-8	NULL
lc_ctype	en_US.UTF-8	NULL
lc_messages	en_US.UTF-8	NULL
lc_monetary	en_US.UTF-8	NULL
lc_numeric	en_US.UTF-8	NULL
lc_time	en_US.UTF-8	NULL
listen_addresses	*	NULL
log_line_prefix	%m [%p] %q%u@%d	NULL
log_timezone	Etc/UTC	NULL
max_connections	2000	NULL
max_stack_depth	5120	kB
max_wal_size	1024	MB
min_wal_size	80	MB
port	5432	NULL
server_encoding	UTF8	NULL
shared_buffers	262144	8kB
ssl	on	NULL
ssl_cert_file	/etc/ssl/certs/ssl-cert-snakeoil.pem	NULL
ssl_key_file	/etc/ssl/private/ssl-cert-snakeoil.key	NULL
temp_buffers	32768	8kB
TimeZone	Etc/UTC	NULL
transaction_deferrable	off	NULL
transaction_isolation	read committed	NULL
transaction_read_only	off	NULL
wal_buffers	2048	8kB
wal_segment_size	16777216	B
work_mem	10240	kB

zammad run rails r 'p Delayed::Job.count' → returned 0
zammad config:get WEB_CONCURRENCY → returned nothing
zammad config:get ZAMMAD_SESSION_JOBS_CONCURRENT → returned nothing

Thanks for your support,
A.
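
The `pg_settings` values in the post above are given in raw units (8 kB pages or kB), which makes them hard to judge at a glance. The sketch below converts the memory-related rows into readable sizes. The helper names are made up for illustration; the numbers are copied from the posted output.

```python
# Size in bytes of each unit string that appears in the posted output.
UNIT_BYTES = {"B": 1, "kB": 1024, "8kB": 8 * 1024, "MB": 1024 * 1024}


def setting_bytes(value, unit):
    """Convert a pg_settings value and its unit column into bytes."""
    return int(value) * UNIT_BYTES[unit]


def human(size):
    """Format a byte count using binary prefixes."""
    for suffix, factor in (("GiB", 1024**3), ("MiB", 1024**2), ("KiB", 1024)):
        if size >= factor:
            return f"{size / factor:g} {suffix}"
    return f"{size} B"


# Memory-related rows copied from the pg_settings output above.
REPORTED = {
    "shared_buffers": ("262144", "8kB"),
    "temp_buffers": ("32768", "8kB"),
    "wal_buffers": ("2048", "8kB"),
    "work_mem": ("10240", "kB"),
    "max_wal_size": ("1024", "MB"),
}

for name, (value, unit) in REPORTED.items():
    print(f"{name}: {human(setting_bytes(value, unit))}")

# Theoretical ceiling if every allowed connection (max_connections = 2000)
# used one full work_mem allocation at the same moment.
worst_case = 2000 * setting_bytes("10240", "kB")
print(f"max_connections x work_mem: {human(worst_case)}")
```

Read this way, `shared_buffers` is 2 GiB on a VM with 16 GB of RAM, and `work_mem` is 10 MiB. The last figure is only a theoretical ceiling, since `work_mem` is allocated per sort or hash operation and not once per connection, but it shows that `max_connections = 2000` allows far more memory to be requested than the VM has. Whether these values are the cause of the slowness is not something the thread establishes.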

Adam · October 13, 2021, 1:51pm

Just in case someone has an idea: I see Elasticsearch crashing quite often as well…

● elasticsearch.service - Elasticsearch
Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Wed 2021-10-13 00:01:47 CEST; 15h ago
Docs: https://www.elastic.co
Process: 1501 ExecStart=/usr/share/elasticsearch/bin/systemd-entrypoint -p ${PID_DIR}/elasticsearch.pid --quiet (code=exited, status=143)
Main PID: 1501 (code=exited, status=143)
Oct 13 00:00:32 support systemd[1]: Starting Elasticsearch…
Oct 13 00:01:47 support systemd[1]: elasticsearch.service: start operation timed out. Terminating.
Oct 13 00:01:47 support systemd[1]: elasticsearch.service: Failed with result ‘timeout’.
Oct 13 00:01:47 support systemd[1]: Failed to start Elasticsearch.
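
The status output above reports a start timeout, not a crash of a running service, so it can help to work out how long systemd waited before giving up. The sketch below computes that from the two journal lines; the helper name is made up, and the year is an assumption taken from the post date because journal lines omit it.

```python
from datetime import datetime


def journal_time(line, year=2021):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a journal line.

    The year is not part of the line, so it is passed in (assumed 2021 here).
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


# Lines copied from the status output above.
start = "Oct 13 00:00:32 support systemd[1]: Starting Elasticsearch…"
stop = (
    "Oct 13 00:01:47 support systemd[1]: "
    "elasticsearch.service: start operation timed out. Terminating."
)

elapsed = (journal_time(stop) - journal_time(start)).total_seconds()
print(f"systemd waited {elapsed:.0f} s before terminating the start")
```

The result is 75 seconds, which suggests the unit’s start timeout (`TimeoutStartSec` in systemd terms) was reached before Elasticsearch finished starting. The unit file is not shown in the thread, so that is an inference; if it holds, the slow storage discussed earlier could plausibly be what makes startup exceed the limit.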

MrGeneration · October 28, 2021, 12:47am

You may want to have a look into the tuning section:
https://docs.zammad.org/en/latest/appendix/configure-env-vars.html

And while you’re at it, as your configuration looks off compared to our suggestion, also have a look at our database configuration guide:
https://docs.zammad.org/en/latest/appendix/configure-database-server.html

Have a look into your elasticsearch log file.
The reasons for elasticsearch crashing or not starting can be various.

Nobody can help you at that point. The Elasticsearch community may be the better sparring partner in that regard.

system · February 25, 2022, 12:47am

This topic was automatically closed 120 days after the last reply. New replies are no longer allowed.