Bad performance and search results are limited

Infos:

  • Used Zammad version: 2.6
  • Used Zammad installation source: source
  • Operating system: openSUSE Leap 42.3
  • Browser + version: Firefox 62.0.3, Chrome 69.0
  • Elasticsearch version: 5.6.12

Our Zammad system still performs poorly. Sometimes the web server returns 500 errors when you open a ticket or reload the page. After approx. 5-10 minutes everything calms down again. The problem usually occurs when 2-5 agents work on the system at the same time.

What does the BackgroundJobSearchIndex job do?
According to the log, it runs every second.

The search also returns very few results. For example, if I search for a ticket number (#2018100904461), nothing is found and the result field stays empty.

What hardware specifications does your Zammad system have?
It sounds like your Zammad is not able to keep up with the index jobs. I bet you have a ton of delayed jobs in your instance.

Zammad runs on a VM on an ESXi system with 12 CPUs and 8 GB RAM. How can I display the delayed jobs?

What you’re searching for is:

zammad run rails r 'p Delayed::Job.count'

the result is 0 …

Hello,

This morning there are 866 delayed jobs.

That’s really odd.

BackgroundJobSearchIndex is the indexing task from Zammad to Elasticsearch.
Is your Elasticsearch instance reachable by Zammad and working?

It sounds like your indexing is having some sort of trouble.
Please provide the production log. The 500 errors should show up there as well.
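
To tell a slow queue from a failing one, it can help to look at more than the total count. A minimal sketch, assuming the same `zammad run rails r` wrapper as above and the `last_error` column that delayed_job keeps on each job; the function name `delayed_job_summary` is made up for illustration:

```shell
# Hypothetical helper (the name is illustrative, not part of Zammad).
# Prints the total number of queued jobs and how many of them have
# already failed at least once (delayed_job stores the message in last_error).
delayed_job_summary() {
    zammad run rails r 'puts "queued: #{Delayed::Job.count}"; puts "with errors: #{Delayed::Job.where.not(last_error: nil).count}"'
}
```

Calling `delayed_job_summary` a few minutes apart shows whether the queue is draining. A high "with errors" number would point at jobs that keep failing rather than at a worker that is merely too slow.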

Elasticsearch is reachable. We have recently received 500 error messages in the browser saying that the default gateway is not reachable. Unfortunately the log files are too large (190-250 MB) to upload.

Maybe it is nginx related, look at the error.log. In my log I saw lines like
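
If the full log is too big to upload, an excerpt around the failing requests is usually enough. A rough sketch, assuming a Rails-style production log in which failed requests are logged as `Completed 500 ...`; the function name `extract_500s` and the path in the usage line are assumptions:

```shell
# Hypothetical helper: print every "Completed 500" line from the given log
# with 5 lines of context before and 25 lines after it, then keep only the
# last 300 lines of that excerpt so the result stays small.
extract_500s() {
    log_file="$1"
    grep -B 5 -A 25 'Completed 500' "$log_file" | tail -n 300
}
```

For example, `extract_500s /opt/zammad/log/production.log > zammad-500s.log` would write a much smaller file that can be attached to a post. Check the excerpt for e-mail addresses or ticket contents before sharing it.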

2018/10/19 01:21:36 [error] 1845#1845: *161 client intended to send too large body: 4283351 bytes, client: 10.128.0.4, server: 

Which led me to client_max_body_size and

2018/10/19 00:29:10 [error] 1388#1388: *17 upstream sent too big header while reading response header from upstream

Which led me to proxy_buffer_size and proxy_buffering off.

Just to clarify, I’m running Elasticsearch on a different instance from Zammad. The errors above caused 500 and 502 errors while rebuilding the index.
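
A quick way to see whether either of these two messages shows up in your own nginx error log is to count them. A small sketch; the helper name and the log path in the usage line are assumptions, and the search strings are taken from the log lines quoted above:

```shell
# Hypothetical helper: count how often the two nginx errors quoted above
# occur in the given error log.
count_nginx_proxy_errors() {
    log_file="$1"
    printf 'too large body: %s\n' "$(grep -c 'client intended to send too large body' "$log_file")"
    printf 'too big header: %s\n' "$(grep -c 'upstream sent too big header' "$log_file")"
}
```

Usage would be something like `count_nginx_proxy_errors /var/log/nginx/error.log`. A non-zero first count points towards client_max_body_size, a non-zero second count towards proxy_buffer_size.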

Hello,

We also had some performance issues in the past, so we tried to increase WEB_CONCURRENCY as suggested in some other performance-related threads. However, this led to many internal 500 errors on our system.
So we set WEB_CONCURRENCY back to 1, which removed all the 500 errors we were getting. A WEB_CONCURRENCY value higher than 1 might therefore be the reason for the errors you see.

The actual performance fix for us was to move attachment storage from the database to the filesystem.
You can do that under Admin->Settings->System->Storage and then running the command shown there in the Rails console.
After changing only this setting, the time for a database search dropped from ~10 s to < 1 s.

Maybe this helps you as well, if you have not done that already.

Br,
Nino

There could be a few different things going on with the 500 errors after increasing WEB_CONCURRENCY.
One possibility is the PostgreSQL max_connections setting. Check your PostgreSQL logs for errors about having no spare connections.

Also turn on the slow query log and see whether any queries take more than 2 seconds. You could also be getting deadlocks or other timeouts.
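
Both checks can be run from the shell. A minimal sketch, assuming PostgreSQL runs locally and the `postgres` OS user can connect via `psql`; the function names are made up, and the 2-second threshold simply mirrors the suggestion above:

```shell
# Hypothetical helpers (names are illustrative).

# Print the configured connection limit, then the number of connections
# currently open, so the two numbers can be compared.
pg_connection_usage() {
    sudo -u postgres psql -Atc 'SHOW max_connections;'
    sudo -u postgres psql -Atc 'SELECT count(*) FROM pg_stat_activity;'
}

# Log every statement that runs longer than 2 seconds, then reload the
# configuration. ALTER SYSTEM needs PostgreSQL 9.4 or newer.
pg_enable_slow_query_log() {
    sudo -u postgres psql -c "ALTER SYSTEM SET log_min_duration_statement = '2s';"
    sudo -u postgres psql -c 'SELECT pg_reload_conf();'
}
```

If the second number from `pg_connection_usage` is close to the first while agents are working, the connection limit is a likely suspect.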

Using the browser dev tools, we found out exactly which errors we were getting.

E.g. Firefox cannot connect to the server at wss://zammad.***.de/ws and wss://zammad.***.de:6042.

Please check if your websocket server is running and check the config for the nginx host.
Zammad uses an AJAX fallback if no websocket server can be found, which is much slower.

The websocket is running. These entries can be found in the nginx error log:

2018/10/25 11:58:52 [error] 13011#13011: *1122 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.200.86, server: zammad.**.de, request: "GET /ws HTTP/1.1", upstream: "http://127.0.0.1:6042/ws", host: "zammad.***.de"

“Connection refused” while connecting to upstream tells a different story, if I’m correct.
Please check the listening ports by running this as root:

netstat -tulpen

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
tcp        0      0 127.0.0.1:9300          0.0.0.0:*               LISTEN      488        23782      1338/java
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      0          870        1463/sshd
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN      0          22386      1332/cupsd
tcp        0      0 127.0.0.1:3000          0.0.0.0:*               LISTEN      486        19321      1379/127.0.0.1:3000
tcp        0      0 0.0.0.0:5432            0.0.0.0:*               LISTEN      26         20303      1581/postgres
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      0          14314      1662/master
tcp        0      0 127.0.0.1:6010          0.0.0.0:*               LISTEN      0          19366      1856/0
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      0          9952       1481/nginx -g daemo
tcp        0      0 127.0.0.1:9200          0.0.0.0:*               LISTEN      488        23825      1338/java
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      0          9951       1481/nginx -g daemo
tcp        0      0 :::22                   :::*                    LISTEN      0          872        1463/sshd
tcp        0      0 :::5432                 :::*                    LISTEN      26         20304      1581/postgres

Hi @dak91

Then nginx and I are correct: the websocket server is not running or not starting.
The default port is 6042 and I can’t find it in your list.

So there is your problem.
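
Instead of scanning the list by eye, the same check can be scripted. A small sketch that reads `netstat -tulpen` output on stdin and tests for a listening TCP port; the function name `port_listening` is made up, and 6042 is the port nginx is trying to reach in the error log above:

```shell
# Hypothetical helper: exit status 0 if the netstat output on stdin contains
# a TCP socket in state LISTEN on the given port, 1 otherwise.
port_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # The local address is column 4; the port is the part after the last ":".
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Usage, as root: `netstat -tulpen | port_listening 6042 && echo "websocket port is open" || echo "nothing listening on 6042"`.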

#
# this is the nginx config for zammad
#

upstream zammad-railsserver {
    server localhost:3000;
}

upstream zammad-websocket {
    server localhost:6042;
}

server {
    listen 80;

    # replace 'localhost' with your fqdn if you want to use zammad from remote
    server_name zammad.ang.de www.zammad.ang.de t.ang.de www.t.ang.de;

    # redirect https
    return 301 https://$server_name$request_uri;
}


server {
    listen 443 ssl http2;
    server_name zammad.ang.de www.zammad.ang.de t.ang.de www.t.ang.de;
    ssl_certificate /etc/certs/ssl_ang/multi_ang_de_2018.crt;
    ssl_certificate_key /etc/certs/ssl_ang/multi_ang_de_2018.key;
    add_header Strict-Transport-Security "max-age=63072000; includeSubdomains; preload";
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;

    root /opt/zammad/public;

    access_log /var/log/nginx/zammad.access.log noip;
    #error_log  /var/log/nginx/zammad.error.log;

    client_max_body_size 50M;

    location ~ ^/(assets/|robots.txt|humans.txt|favicon.ico) {
        expires max;
    }

    location /ws {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header CLIENT_IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 86400;
        proxy_pass http://zammad-websocket;
    }

    location / {
        proxy_set_header Host $http_host;
        proxy_set_header CLIENT_IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300;
        proxy_pass http://zammad-railsserver;

        gzip on;
        gzip_types text/plain text/xml text/css image/svg+xml application/javascript application/x-javascript application/json application/xml;
        gzip_proxied any;
    }
}

Your config looks OK, but your websocket server is not running at all. This is not about the nginx config.

Check the output of:

sudo systemctl status zammad-websocket

Yesterday we installed the update to 2.7. Since then, mails are no longer retrieved and the activity stream is not updated. Yesterday I noticed that port 6042 was listening again, but today, after the update, the port is unavailable again although the websocket service is reported as running.

zammad-websocket.service - LSB: websocket component of zammad
   Loaded: loaded (/etc/init.d/zammad-websocket; bad; vendor preset: disabled)
   Active: active (exited) since Thu 2018-10-25 16:44:20 CEST; 18h ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 0 (limit: 512)

Oct 25 16:44:20 zammad systemd[1]: Starting LSB: websocket component of zammad...
Oct 25 16:44:20 zammad systemd[1]: Started LSB: websocket component of zammad.
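
Note that `active (exited)` together with `Tasks: 0` only says that the init script ran and finished; systemd sees no process left in the service, so this status alone does not show that the websocket server is up. A small sketch for cross-checking, assuming the server process has `websocket-server` in its command line (an assumption about how the init script starts it); the function name is made up:

```shell
# Hypothetical helper: report whether a websocket server process exists,
# independent of what systemd reports for the unit.
websocket_process_running() {
    if pgrep -f 'websocket-server' >/dev/null; then
        echo "websocket-server process found"
    else
        echo "no websocket-server process found"
        return 1
    fi
}
```

If this reports no process while systemd says `active`, restarting the unit and then checking port 6042 again would be the next thing to try.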