Infos:
- Zammad Version 3.6
- Installed from apt repository
- Ubuntu Server on two systems (one is 18.04.5, the other 20.04.1)
- Firefox 84 (seen on all browsers)
- LDAP & 365 login integration.
- Emails are collected from 365; this was IMAP until recently, with the same behaviour.
- no other integration enabled. No incoming API use.
Expected behaviour:
- Server works (!)
Actual behaviour:
- After some time the web service stops responding. Usually the page loads but with no content (i.e. a plain white page); I suspect browser caching is what gets me that far. A reload then normally gives an Nginx error (bad gateway). Occasionally, if the page is already open, I get a ‘could not connect’ message instead, but I think this is the same thing: a reload at that point tends to give the Nginx gateway error.
- Sometimes emails also stop being collected (not always, maybe half the time). Imported emails are removed from the mailbox, so when I see the incoming mailbox growing I know they are not being collected.
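To pin down when the web service actually stops (rather than when someone first notices), a small probe along these lines could log the symptom over time. This is only a sketch: the URL is a placeholder, the function names are made up for illustration, and the mapping of status codes to symptoms is an assumption based on the behaviour described above (curl reports `000` when it cannot connect at all).

```shell
#!/bin/sh
# Hypothetical probe; ZAMMAD_URL is a placeholder, not the real server address.
URL="${ZAMMAD_URL:-http://localhost/}"

# Map an HTTP status code (as printed by curl's %{http_code}) to a symptom.
classify_status() {
  case "$1" in
    000)         echo "no-connection" ;;   # curl could not connect at all
    502|503|504) echo "gateway-error" ;;   # Nginx is up, the backend is not answering
    2??|3??)     echo "ok" ;;
    *)           echo "other" ;;
  esac
}

# Fetch the page once and print a timestamped line with the code and symptom.
probe() {
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$1")
  echo "$(date '+%Y-%m-%d %H:%M:%S') $code $(classify_status "$code")"
}
```

Calling `probe "$URL"` once a minute from cron and appending the output to a file would give a rough timestamp for the start of each outage, which can then be compared with the log files.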
Steps to reproduce the behaviour:
- On my own system (mostly just me using it), it’s always broken first thing in the morning. The user system (15 users, 3 or 4 active at most) just stops at irregular intervals during the day.
- It is possible that it’s triggered by closing the browser session. People come and go more often on the user system, whereas I’m logged on all day and only close the browser in the evening. I think I noticed it stop once when I restarted the browser during the day, but I was fixing something else at the time so wasn’t paying attention.
- htop shows no real difference when the system has died: 2.0 to 2.8 GB of memory used (out of 8 GB), and CPU use is always low.
Workaround:
- Restarting the zammad service from the shell clears the problem, but adding a regular restart to the crontab doesn’t seem to help. I’ve also added a restart of elasticsearch, to no effect.
- I found that restarting postgresql also clears the problem, but it sometimes leaves odd effects in the browser. Restarting both postgresql and zammad from the crontab seems to have helped the user system somewhat, but has not removed the problem entirely. It didn’t help my own system.
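For reference, the crontab-driven restart described above might look roughly like the following. The script path, the schedule in the comment, and the `DRY_RUN` switch are placeholders for illustration, and the stop/restart/start ordering is an assumption (the idea being that Zammad is not running while PostgreSQL restarts), not something that has been shown to fix the problem.

```shell
#!/bin/sh
# Hypothetical /usr/local/bin/zammad-restart.sh (path is a placeholder).
# Example crontab line, schedule chosen arbitrarily:
#   0 5 * * * /usr/local/bin/zammad-restart.sh --run

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop Zammad, restart the database, then bring Zammad back up.
restart_stack() {
  run systemctl stop zammad
  run systemctl restart postgresql
  run systemctl start zammad
}

# Only act when explicitly asked to, so the file is safe to source.
if [ "${1:-}" = "--run" ]; then
  restart_stack
fi
```

Running it with `DRY_RUN=1` first shows the commands without touching the services.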
I’m not sure which logs I should be looking at, but I can’t find anything out of the ordinary in those I’ve looked through.
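As a starting point for the log search, a helper like this pulls the most recent error-looking lines out of a log file. It is a sketch only: the function name is made up, the `error|fatal` pattern is a guess at what is worth looking for, and the paths in the comments are the usual locations for a package install, which may differ on a given system.

```shell
#!/bin/sh
# show_errors FILE [N]: print the last N lines (default 50) of FILE that
# contain "error" or "fatal", case-insensitively.
show_errors() {
  file=$1
  count=${2:-50}
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  grep -iE 'error|fatal' "$file" | tail -n "$count"
}

# Assumed locations, adjust as needed:
#   show_errors /var/log/zammad/production.log
#   show_errors /var/log/nginx/zammad.error.log
```

Comparing the timestamps of the lines it prints with the time the web service stopped responding should narrow down which log is worth reading in full.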