Daily timeout (error 504)

Infos:

  • Used Zammad version: 6.4.0-1732538192.197abebd.jammy
  • Used Zammad installation type: Ubuntu package
  • Operating system: Ubuntu 22.04.5 LTS
  • Browser + version: MS Edge 131.0.2903.70

Actual behavior:

  • Once a day we cannot access Zammad and get error 504 in the browser instead.
  • Zammad and Nginx services are running.
  • No errors in /var/log/zammad/production.log.
  • Multiple errors in /var/log/nginx/zammad.error.log like:
    [error] 1068#1068: *16004 upstream timed out (110: Unknown error) while reading response header from upstream, client: ..., server: ...., request: "GET /api/v1/ticket_attachment/14633/68968/499923?view=inline HTTP/2.0", upstream: "http://127.0.0.1:3000/api/v1/ticket_attachment/14633/68968/499923?view=inline", host: ..., referrer: ....
  • After a server reboot everything works fine.

We are aware of this issue but didn’t find the culprit yet.
A systemctl stop zammad && systemctl start zammad should be more than enough.

See:


Thanks MrGeneration. I do appreciate the quick and professional answers I get here from the community.

I installed a cronjob checking the URL and running systemctl stop zammad && systemctl start zammad if the reply code isn’t 200.

Let’s see if that will help for the time being.
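For anyone who wants to try the same workaround, here is a minimal sketch of such a check. It is not the exact script from the post above: the function names, the 30 second curl timeout and the idea of passing the URL as the first argument are all assumptions made for illustration.

```shell
#!/bin/sh
# Watchdog sketch: restart Zammad when the web UI stops answering with HTTP 200.
# All function names here are hypothetical, chosen for this example only.

# Succeeds (exit status 0) when the given status code means "restart needed".
needs_restart() {
    [ "$1" != "200" ]
}

# Fetches the URL and prints only the HTTP status code.
# curl prints 000 when it gets no HTTP response at all (e.g. on a timeout).
http_status() {
    curl --silent --output /dev/null --max-time 30 --write-out '%{http_code}' "$1"
}

# Restarts Zammad with the commands suggested earlier in this thread.
check_and_restart() {
    status=$(http_status "$1")
    if needs_restart "$status"; then
        echo "$(date) got HTTP status $status from $1, restarting zammad"
        systemctl stop zammad && systemctl start zammad
    fi
}

# Only act when a URL is passed as the first argument.
if [ "$#" -ge 1 ]; then
    check_and_restart "$1"
fi
```

Saved under a path of your choice (for example /usr/local/bin/zammad-watchdog.sh, a made-up name) and made executable, it could be run from root's crontab with an entry such as `*/5 * * * * /usr/local/bin/zammad-watchdog.sh https://zammad.example.com/`. Note that a login page behind a redirect may answer with a code other than 200, so check what your own URL normally returns first.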


We’ve also occasionally had the same problem since upgrading to Zammad 6.4.
I can’t see any problems in the logs either, except that nginx logs a timeout from the zammad upstream.

If I can help you find a solution, please let me know how.


We recently ran into the same thing. Our “quiet” time is around 3:00 am US Central, so we added a cron job that automatically restarts Zammad every day at that time to prevent the mid-day meltdowns. It has been holding its ground for about a week.

Hope a solution is found quickly.

We have the same problem since upgrading to version 6.4 in the middle of November.

When the system gifts us with Error 502 or 504 we have around 40-50 active agents/customer-sessions at that time.

How many agents/sessions do you guys have normally in parallel?

PS: Hm, I guessed that fewer sessions might make a difference, but just now it stopped working with 27 sessions.

We have around 50. Most of them are inactive though. Users are logging in from different IP addresses (home office, LAN, WLAN).

Can I see somewhere how many active sessions I have?

you can check the sessions with:

zammad run rails r “p Sessions.list.uniq.count”

I tried zammad run rails r “p Sessions.list.uniq.count”
but get the error: undefined local variable or method `“p’ for main:Object

That’s because the quotes are messed up (typographic quotes instead of straight ones).

zammad run rails r "p Sessions.list.uniq.count"

Code blocks avoid that usually.

We just upgraded to 6.4 last night. I’ve had to restart the Zammad service 4 or 5 times today. It just did it again and I went digging but can’t figure it out.


I checked, and we only have 2 active sessions when it occurred.

Same here. With only 5 to 6 sessions we still get timeouts / crashes.

It has already been 2 months since we updated to 6.4.

Still no one can fix it…

We are working on finding the issue. It’s no easy fix, otherwise we would have fixed it already.

Just to keep you updated on this matter,
we still have that problem, but the number of restarts needed went down from several times a day to about twice within the last two weeks. The last time it happened there were 12 sessions.

I am not sure whether that is due to lower system load over the holidays or to changes in the version.

The last time it happened was on January 2nd, and I updated to the latest version then.

Zammad Version 6.4.1-1735804650.fec6d7e0.noble

Till now (fingers crossed) no crashes

Since the problem continues to occur intermittently, I’ve set up a cron job and a bash script that checks the website’s status every 2 minutes. If it detects that the website is down, it will execute the Zammad restart command.

I hope this helps anyone who is waiting for the core team to resolve the issue.

@tlycj: Just out of curiosity, how do you check the website’s status?

Under Settings → Monitoring → Health Check.

Check the returned JSON.
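A rough sketch of how the returned JSON could be checked from a script. It assumes the Health Check URL shown on that settings page (which includes a token) answers with JSON containing a "healthy" field set to true or false; verify that against your own instance before relying on it. The function names and the 30 second timeout are made up for this example.

```shell
#!/bin/sh
# Sketch: decide from the Health Check JSON whether Zammad reports itself healthy.
# Assumption: the response body contains "healthy": true when all is well.

# Succeeds when the JSON text passed as $1 contains "healthy": true.
is_healthy() {
    printf '%s' "$1" | grep -Eq '"healthy"[[:space:]]*:[[:space:]]*true'
}

# Fetches the health check URL given as $1 and restarts Zammad if unhealthy.
check_health_and_restart() {
    # An empty or non-JSON body (timeout, 504 error page) also counts as unhealthy.
    body=$(curl --silent --max-time 30 "$1")
    if ! is_healthy "$body"; then
        echo "$(date) health check failed, restarting zammad"
        systemctl stop zammad && systemctl start zammad
    fi
}

# Only act when the health check URL is passed as the first argument.
if [ "$#" -ge 1 ]; then
    check_health_and_restart "$1"
fi
```

Pass the full Health Check URL copied from the settings page as the only argument. Keep in mind that the health check can report problems that have nothing to do with this timeout, so restarting on every unhealthy result may be more aggressive than you want.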

Thanks tlyc. That’s a feature I wasn’t aware of. It looks quite promising.

Good news.
Please upgrade to the latest Zammad version; the bug was fixed yesterday.
It has been marked as fixed and closed in the GitHub repository.

Let’s keep our fingers crossed, thanks for the fix.

Please let us know if it is not.