Infos:
- Used Zammad version: 3.5.0
- Used Zammad installation source: zammad-helm
- Operating system: (Docker on Container-Optimized OS on k8s 1.16 in GKE)
- Browser + version: (any)
Expected behavior:
- Run a Zammad installation for 20 users with reasonable compute resources.
Actual behavior:
Zammad keeps consuming more and more resources, no matter how many we give it. https://docs.zammad.org/en/latest/appendix/configure-env-vars.html recommends a Ruby command to count the number of active sessions, and for us it’s nearly 600:
zammad@zammad-0:~$ bin/rails r "p Sessions.list.uniq.count"
I, [2020-11-06T14:07:37.334189 #207-47006180533700] INFO -- : Setting.set('models_searchable', ["Chat::Session", "KnowledgeBase::Answer::Translation", "Organization", "Ticket", "User"])
573
It is absolutely impossible that 570 people are using our Zammad, let alone have it currently opened in their browsers.
We can also see hundreds of “invalid client_id receive!” errors in the logs because of hundreds of bogus (?) /api/v1/message_receive HTTP requests.
10.52.3.2 - - [06/Nov/2020:14:03:30 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:31 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:31 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:31 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:31 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:31 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:31 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:31 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:32 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:32 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:32 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:32 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:32 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:32 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:33 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
10.52.3.2 - - [06/Nov/2020:14:03:33 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
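To put a number on these rejected requests, a rough sketch like the following could tally them per second from the access log. The `LINE_RE` pattern and the `count_rejected` helper are my own illustration, not part of Zammad, and they assume the log format shown in the excerpt above.

```ruby
# Illustrative helper (not part of Zammad): counts access-log lines for
# POST /api/v1/message_receive that were answered with HTTP 422,
# grouped by the timestamp in the log line (one-second resolution).
# Assumes the combined-log-style format shown in the excerpt above.
LINE_RE = %r{\A(?<ip>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) [^"]*" (?<status>\d{3}) }

def count_rejected(lines, path: "/api/v1/message_receive", status: "422")
  counts = Hash.new(0)
  lines.each do |line|
    m = LINE_RE.match(line)
    # Skip lines that do not parse or that are not the request we care about.
    next unless m && m[:method] == "POST" && m[:path] == path && m[:status] == status
    counts[m[:time]] += 1
  end
  counts
end

# Two lines copied from the excerpt above, used as sample input.
sample = [
  '10.52.3.2 - - [06/Nov/2020:14:03:30 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"',
  '10.52.3.2 - - [06/Nov/2020:14:03:31 +0000] "POST /api/v1/message_receive HTTP/1.1" 422 92 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"'
]

count_rejected(sample).each { |time, n| puts "#{time} #{n}" }
```

Fed the full log instead of the sample (for example via `File.foreach`), this would show whether the requests arrive at a steady rate or in bursts.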
All these sessions also consume hundreds of Postgres database connections, no matter what we do. If we limit the Rails DB pool, the Zammad scheduler keeps crashing because it needs more connections than the pool allows (the pool is currently set to 380 connections). We can’t use PgBouncer because its isolation level breaks Zammad and keeps it from starting at all.
Please give me hints on where to look for the root cause and what to do to run a small-scale Zammad installation without spending thousands of dollars per month on idle database connections. As everyone else is apparently running Zammad with no issues, we must be doing something terribly wrong.