Daily timeout (error 504)

chris2 · December 6, 2024, 12:46pm

Infos:

Used Zammad version: 6.4.0-1732538192.197abebd.jammy
Used Zammad installation type: Ubuntu package
Operating system: Ubuntu 22.04.5 LTS
Browser + version: MS Edge 131.0.2903.70

Actual behavior:

Once a day we cannot access Zammad and get error 504 in the browser instead.
Zammad and Nginx services are running.
No errors in /var/log/zammad/production.log.
Multiple errors in /var/log/nginx/zammad.error.log like:
[error] 1068#1068: *16004 upstream timed out (110: Unknown error) while reading response header from upstream, client: ..., server: ...., request: "GET /api/v1/ticket_attachment/14633/68968/499923?view=inline HTTP/2.0", upstream: "http://127.0.0.1:3000/api/v1/ticket_attachment/14633/68968/499923?view=inline", host: ..., referrer: ....
After a server reboot everything works fine.

MrGeneration · December 6, 2024, 12:56pm

We are aware of this issue but haven’t found the culprit yet.
A systemctl stop zammad && systemctl start zammad should be more than enough.

See:

github.com/zammad/zammad

Occasionally puma hangs up after updating to Zammad 6.4

opened 03:52PM - 05 Dec 24 UTC

Suhopoljac

bug needs verification prioritised by payment background worker

### Used Zammad Version
6.4.1

### Environment
- Installation method: any
- Operating system (if you're unsure: `cat /etc/os-release` ): any
- Database + version: any
- Elasticsearch version: any
- Browser + version: MS Edge, v. 131.0.2903.70 (Official Build) (64-bit)

### Actual behaviour
Sometime after the update to Zammad 6.4 the puma worker hangs up.

### Expected behaviour
After updating to Zammad 6.4, I would expect, that the puma does not hang up.

### Steps to reproduce the behaviour
Unknown. Happens occasionally after updating to 6.4. The current workaround is to, instead of typing the systemctl restart zammad, to stop and then start via two separate commands:
`systemctl stop zammad`
`systemctl start zammad`

### Support Ticket
Ticket#10167295, Ticket#10165686

### I'm sure this is a bug and no feature request or a general question.
yes

chris2 · December 9, 2024, 5:22pm

Thanks MrGeneration. I do appreciate the quick and professional answers I get here from the community.

I installed a cronjob checking the URL and running systemctl stop zammad && systemctl start zammad if the reply code isn’t 200.
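
A minimal sketch of such a watchdog, assuming the instance answers at a placeholder URL and that the script runs as root so it is allowed to call `systemctl`. `ZAMMAD_URL`, the function names and the `--run` flag are hypothetical and only there for illustration; the stop/start command is the one from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical watchdog: restart Zammad when the web UI does not answer with HTTP 200.
# Assumption: replace the placeholder with the real address of your instance.
ZAMMAD_URL="${ZAMMAD_URL:-https://zammad.example.com/}"

# Succeeds (exit status 0) when the given HTTP status code is anything but 200.
needs_restart() {
  [ "$1" != "200" ]
}

check_and_restart() {
  local code
  # curl prints 000 when it gets no HTTP response at all (timeout, refused connection).
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 30 "$ZAMMAD_URL")
  if needs_restart "$code"; then
    # Separate stop and start, as recommended above, instead of "restart".
    systemctl stop zammad && systemctl start zammad
  fi
}

# Only act when called with --run, so the file can be sourced safely for testing.
if [ "${1:-}" = "--run" ]; then
  check_and_restart
fi
```

It could be scheduled from root's crontab, for example with `*/5 * * * * /usr/local/bin/zammad-watchdog.sh --run` (path and interval are assumptions).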

Let’s see if that will help for the time being.

datenschuft · December 10, 2024, 2:40pm

We’ve also occasionally had the same problem since upgrading to Zammad 6.4.
I can’t see any problems in the logs either, except that nginx logs a timeout from the zammad upstream.

If I can help you find a solution, please let me know how.

support.desk · December 12, 2024, 2:03pm

We recently ran into the same thing. Our “quiet” time is around 3:00 am US Central. We added a cron job that automatically restarts Zammad at that time every day to prevent the mid-day meltdowns. It has been holding its ground for about a week.

Hope a solution is found quickly.

LSt-TSR · December 17, 2024, 9:09am

We have had the same problem since upgrading to version 6.4 in the middle of November.

When the system gifts us with error 502 or 504, we have around 40-50 active agent/customer sessions at that time.

How many agents/sessions do you guys have normally in parallel?

PS: Hm, I guessed that fewer sessions might make a difference, but just now it stopped working with 27 sessions.

chris2 · December 17, 2024, 10:26am

We have around 50. Most of them are inactive though. Users are logging in from different IP addresses (home office, LAN, WLAN).

Can I see somewhere how many active sessions I have?

LSt-TSR · December 17, 2024, 11:59am

You can check the sessions with:

zammad run rails r “p Sessions.list.uniq.count”

chris2 · December 17, 2024, 1:07pm

I tried zammad run rails r “p Sessions.list.uniq.count”
but get the error: undefined local variable or method `“p’ for main:Object

MrGeneration · December 17, 2024, 1:11pm

That’s because the quotes are messed up.

zammad run rails r "p Sessions.list.uniq.count"

Code blocks avoid that usually.

AndrewBucklin · January 3, 2025, 11:58pm

We just upgraded to 6.4 last night. I’ve had to restart the Zammad service 4 or 5 times today. It just did it again and I went digging but can’t figure it out.

MrGeneration:
> That’s because the quotes are messed up.
> zammad run rails r "p Sessions.list.uniq.count"
> Code blocks avoid that usually.

I checked, and we only have 2 active sessions when it occurred.

tlyc · January 6, 2025, 12:38pm

Same here. Even with only 5-6 sessions it still times out / crashes.

It has already been 2 months since we updated to 6.4.

Still no one can fix it…

MrGeneration · January 6, 2025, 12:39pm

We are working on finding the issue. It’s no easy fix, otherwise we would have fixed it already.

LSt-TSR · January 7, 2025, 9:27am

Just to keep you updated on this matter,
we still have that problem, but the number of restarts needed went down from several times a day to about two within the last two weeks. The last time it happened there were 12 sessions.

I am not sure if that is due to lower system load over the holidays or to changes in the version.

It last happened on January 2nd, and I updated to the latest version then.

Zammad Version 6.4.1-1735804650.fec6d7e0.noble

Till now (fingers crossed) no crashes

tlyc · January 10, 2025, 2:21pm

Since the problem continues to occur intermittently, I’ve set up a cron job and a bash script that checks the website’s status every 2 minutes. If it detects that the website is down, it will execute the Zammad restart command.

I hope this helps anyone who is waiting for the core team to resolve the issue.

chris2 · January 10, 2025, 2:50pm

@tlyc: Just out of curiosity, how do you check the website’s status?

tlyc · January 13, 2025, 2:11pm

Under Settings → Monitoring → Health Check.

Check the returned JSON.
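
A rough sketch of how such a check could look in a bash script. It assumes the health check URL shown on the monitoring page has the form `/api/v1/monitoring/health_check?token=...` and that the returned JSON carries a `healthy` field that is `true` when everything is fine; please verify both against your own instance. `HEALTH_URL`, the function names and the `--run` flag are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical watchdog based on the Zammad health check.
# Assumption: copy the full URL, including its token, from the monitoring page.
HEALTH_URL="${HEALTH_URL:-https://zammad.example.com/api/v1/monitoring/health_check?token=CHANGE_ME}"

# Succeeds only if the JSON body contains "healthy": true (assumed response shape).
is_healthy() {
  printf '%s' "$1" | grep -Eq '"healthy"[[:space:]]*:[[:space:]]*true'
}

check_health_and_restart() {
  local body
  # An empty body (timeout, connection error) counts as unhealthy.
  body=$(curl -s --max-time 30 "$HEALTH_URL")
  if ! is_healthy "$body"; then
    systemctl stop zammad && systemctl start zammad
  fi
}

# Only act when called with --run, so the file can be sourced safely for testing.
if [ "${1:-}" = "--run" ]; then
  check_health_and_restart
fi
```

Run from root's crontab every 2 minutes, this could look like `*/2 * * * * /usr/local/bin/zammad-health.sh --run` (the path is an assumption). Note that the health check may also report problems unrelated to the hang described here, in which case a restart will not help.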

chris2 · January 13, 2025, 3:40pm

Thanks tlyc. That’s a feature I wasn’t aware of. It looks quite promising.

LSt-TSR · January 16, 2025, 1:29pm

Good news.
Please upgrade to the latest Zammad version, the bug was fixed yesterday.
The bug was marked as fixed and closed in the GitHub repository.

Let’s keep our fingers crossed, thanks for the fix.

MrGeneration · January 16, 2025, 1:37pm

Please let us know if it is not.