Performance troubleshooting 5.2.1

Re-setting the WEB_CONCURRENCY and ZAMMAD_SESSION_JOBS_CONCURRENT to the values you’ve previously set would probably help you a lot already and might be the solution to most of the sluggishness. The values of 8 sounds reasonable to me based on your hardware specs and my own trial & error experiences (tune a setting, monitor performance, tune a setting, monitor performance… etc).

For comparison, here are some stats about a few or our setups. Other than the concurrency settings, you might also want to look into PostgreSQL performance tuning, this will help in mitigating deadlock situations as well as increasing base performance. We run our VM’s on speedy SSD’s, so disk I/O is not a huge problem, you will run into performance issues if all your data lives on spinning rust.

The following contains a few examples of our Zammad instance setups, it will probably not fit your specific situation. Do not blindly copy tuning parameters without knowing what effect they will have

For your deadlock problems, you could look into PostgreSQL’s synchronous_commit tunable, but first READ and UNDERSTAND what this does and why it could be dangerous for you.

All our Production instances are on Ubuntu 20.04 and PostgreSQL 12, installed via package, we have recently upgraded from 4.x to 5.0 and we are currently running 5.2.1.

All of these systems have been rebooted just a few hours ago due to kernel upgrades, so memory usage is very low due to caches not being warmed up etc.

System 1, this instance is pretty active, we recently added some more memory, but this is not (yet) reflected in the tuning parameters:

root@helpdesk:~# diff /etc/postgresql/12/main/postgresql.conf /etc/postgresql/12/main/postgresql.conf-20210802 
64,65c64
< # 20210802, DVZ: Raise connectionlimit
< max_connections = 200                 # (change requires restart)
---
> max_connections = 100                 # (change requires restart)
122,123c121
< # 20210802, DVZ: Increase to allow for more indices to be cached etc
< shared_buffers = 512MB                        # min 128kB
---
> shared_buffers = 128MB                        # min 128kB
132,133c130
< # 20211009, DVZ: Allow for more in-memory results
< work_mem = 8MB                                # min 64kB
---
> #work_mem = 4MB                               # min 64kB
201,202c198
< # 20210802, DVZ: Decrease transaction completion time
< synchronous_commit = off              # synchronization level;
---
> #synchronous_commit = on              # synchronization level;
495,496c491
< # 20210802, DVZ: Log slow queries
< log_min_duration_statement = 50               # -1 is disabled, 0 logs all statements
---
> #log_min_duration_statement = -1      # -1 is disabled, 0 logs all statements
538,540c533,534
< # 20210802, DVZ: Log waits, DDL changes and tempfiles
< log_lock_waits = on                   # log lock waits >= deadlock_timeout
< log_statement = 'ddl'                 # none, ddl, mod, all
---
> #log_lock_waits = off                 # log lock waits >= deadlock_timeout
> #log_statement = 'none'                       # none, ddl, mod, all
542c536
< log_temp_files = 0                    # log temporary files equal or larger
---
> #log_temp_files = -1                  # log temporary files equal or larger
root@helpdesk:~# grep -c ^processor /proc/cpuinfo 
4
root@helpdesk:~# free -m
              total        used        free      shared  buff/cache   available
Mem:          16009        4705        8944         164        2359       10813
Swap:          3903           0        3903
root@helpdesk:~# zammad config:get WEB_CONCURRENCY
5
root@helpdesk:~# zammad config:get ZAMMAD_SESSION_JOBS_CONCURRENT
8

System 2, this instance is barely used anymore. Due to the low volume of tickets, we used to analyze queries, hence the query logging and auto_explain module being active (which is usually a bad idea in PROD):

root@helpdesk:~# diff /etc/postgresql/12/main/postgresql.conf /etc/postgresql/12/main/postgresql.conf-20210730 
64,65c64
< # 20210730, DVZ: Raise connectionlimit
< max_connections = 200                 # (change requires restart)
---
> max_connections = 100                 # (change requires restart)
122,123c121
< # 20210802, DVZ: Increase to allow for more indices to be cached etc
< shared_buffers = 512MB                        # min 128kB
---
> shared_buffers = 128MB                        # min 128kB
132,133c130
< # 20211009, DVZ: Allow for more in-memory results
< work_mem = 8MB                                # min 64kB
---
> #work_mem = 4MB                               # min 64kB
201,202c198
< # 20210802, DVZ: Decrease transaction completion time
< synchronous_commit = off              # synchronization level;
---
> #synchronous_commit = on              # synchronization level;
495,496c491
< # 20210730, DVZ: Log slow queries
< log_min_duration_statement = 50               # -1 is disabled, 0 logs all statements
---
> #log_min_duration_statement = -1      # -1 is disabled, 0 logs all statements
538,540c533,534
< # 20210730, DVZ: Log waits, DDL changes and tempfiles
< log_lock_waits = on                   # log lock waits >= deadlock_timeout
< log_statement = 'ddl'                 # none, ddl, mod, all
---
> #log_lock_waits = off                 # log lock waits >= deadlock_timeout
> #log_statement = 'none'                       # none, ddl, mod, all
542c536
< log_temp_files = 0                    # log temporary files equal or larger
---
> #log_temp_files = -1                  # log temporary files equal or larger
684,685c678
< # 20210731, DVZ: Show queryplans of slow queries
< session_preload_libraries = 'auto_explain'
---
> #session_preload_libraries = ''
758,761d750
< 
< # 20210731, DVZ: Explain ALL the things!
< auto_explain.log_min_duration = '100ms'
< auto_explain.log_analyze = true
root@helpdesk:~# grep -c ^processor /proc/cpuinfo 
2
root@helpdesk:~# free -m
              total        used        free      shared  buff/cache   available
Mem:           7957        4642        2229         168        1085        2882
Swap:          3903           0        3903
root@helpdesk:~# zammad config:get WEB_CONCURRENCY
5
root@helpdesk:~# zammad config:get ZAMMAD_SESSION_JOBS_CONCURRENT
4

System 3, one of our instances which used to be very active and under constant high load. We can probably scale this system down quite a bit.

root@helpdesk:~# diff /etc/postgresql/12/main/postgresql.conf /etc/postgresql/12/main/postgresql.conf-20210730 
64,65c64
< # 20210802, DVZ: RDO#95573 Increase connectionlimit
< max_connections = 300                 # (change requires restart)
---
> max_connections = 100                 # (change requires restart)
122,123c121
< # 20210802, DVZ: RDO#95573 Increase to allow for more indices to be cached etc
< shared_buffers = 4096MB                       # min 128kB
---
> shared_buffers = 128MB                        # min 128kB
132,136c130,132
< # 20211009, DVZ: Allow for more in-memory results
< work_mem = 8MB                                # min 64kB
< # 20210824, DVZ: Advice from Nibble IT
< maintenance_work_mem = 265MB          # min 1MB
< autovacuum_work_mem = 64MB            # min 1MB, or -1 to use maintenance_work_mem
---
> #work_mem = 4MB                               # min 64kB
> #maintenance_work_mem = 64MB          # min 1MB
> #autovacuum_work_mem = -1             # min 1MB, or -1 to use maintenance_work_mem
202,203c198
< # 20210802, DVZ: RDO#95573 Decrease transaction completion time
< synchronous_commit = off              # synchronization level;
---
> #synchronous_commit = on              # synchronization level;
392,393c387
< # 20210805, DVZ: RDO#95573 Give PostgreSQL an estimate on available memory for caching
< effective_cache_size = 8GB
---
> #effective_cache_size = 4GB
497,498c491
< # 20210730, DVZ: RDO#95573 Log (very) slow queries
< log_min_duration_statement = 200      # -1 is disabled, 0 logs all statements
---
> #log_min_duration_statement = -1      # -1 is disabled, 0 logs all statements
513,515c506,508
< log_checkpoints = on
< log_connections = on
< log_disconnections = on
---
> #log_checkpoints = off
> #log_connections = off
> #log_disconnections = off
540,542c533,534
< # 20210730, DVZ: RDO#95573 Log waits, DDL changes and tempfiles
< log_lock_waits = on                   # log lock waits >= deadlock_timeout
< log_statement = 'ddl'                 # none, ddl, mod, all
---
> #log_lock_waits = off                 # log lock waits >= deadlock_timeout
> #log_statement = 'none'                       # none, ddl, mod, all
544c536
< log_temp_files = 0                    # log temporary files equal or larger
---
> #log_temp_files = -1                  # log temporary files equal or larger
586c578
< log_autovacuum_min_duration = 0               # -1 disables, 0 logs all actions and
---
> #log_autovacuum_min_duration = -1     # -1 disables, 0 logs all actions and
root@helpdesk:~# grep -c ^process /proc/cpuinfo 
16
root@helpdesk:~# free -m
              total        used        free      shared  buff/cache   available
Mem:          42142        8662       31642         467        1836       32510
Swap:          7999           0        7999
root@helpdesk:~# zammad config:get WEB_CONCURRENCY
10
root@helpdesk:~# zammad config:get ZAMMAD_SESSION_JOBS_CONCURRENT
13
2 Likes