Used Zammad installation source: zammad-docker-compose
Expected behavior:
Hmm, not sure.
Actual behavior:
I’m using FreeIPA on the same server. Quite frequently, maybe once or twice a day, a user gets disabled, and I see this in the LDAP integration screen:
out failed tcp failed -> chesty 26 minutes ago
Clicking “Start New” to run another sync re-enables the user.
I’ve done some basic network testing, and it’s not a network problem (the network is a Docker bridge on the same host).
The server has 96 GB of RAM and 32 cores. They aren’t the fastest cores (Intel Xeon E5-2660 0 @ 2.20GHz), but the server is lightly loaded: a load average of 3 and 40 GB of free RAM, with no swap.
Steps to reproduce the behavior:
No idea. It might be related to the speed of the LDAP server, or it might be an actual network problem, but I can’t find one.
Hi @chesty, let’s see what we’ve got here. Please do the following:
1.) Create a file called debug_issue.rb in your Zammad directory (usually /opt/zammad)
2.) Run the file from your Zammad directory via zammad run rails r debug_issue.rb or rails r debug_issue.rb as zammad user, depending on your installation source (package/source)
3.) Post the output here. Make sure all sensitive data is anonymized! If you prefer, you can send it by email to support@zammad.com. Please reference this thread and me.
4.) Delete the debug_issue.rb file
require 'mixin/rails_logger'

module Mixin
  module RailsLogger
    def self.logger
      @logger ||= Logger.new(log_to).tap do |logger|
        logger.level = :debug
      end
    end

    def self.log_to
      # STDOUT
      'debug_issue.log'
    end
  end
end

ImportJob.create(name: 'Import::Ldap').start
That’s really annoying: I ran it 10 times over the weekend and didn’t see any failures. I thought I’d wait until Monday, when there’s load on the server, and run it again. I ran it 5 times before I got busy and still didn’t see any failures.
I’ll keep trying to reproduce it over the next week. I guess if you’re happy with the current behaviour on TCP errors, it’s all good. At least I now know how to turn on debugging.
@anon29869905, I’m still testing, but it seems that things run through the scheduler, like the LDAP sync, fail occasionally, whereas when I run them from the Rails console they never fail.
Today I turned LDAP sync back on. It started getting TCP errors around 1 pm; I clicked “Start New” to re-enable the account and got 2 more TCP errors. 10 to 20 minutes later I started running debug_issue.rb, and it was super quick and didn’t error once. I’ve run it 5 times.
I’m looking at my docker-compose override file. Under zammad-railsserver I have:
  environment:
    - WEB_CONCURRENCY=8
    - MAX_THREADS=32
    - MIN_THREADS=8
Do I need those under zammad-scheduler: and zammad-websocket: too?
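For context, here is a minimal sketch of how a Puma-style web server config typically turns these variables into settings. It assumes the railsserver container runs Puma and that WEB_CONCURRENCY maps to the number of worker processes while MIN_THREADS/MAX_THREADS bound each worker's thread pool. The helper name `puma_settings` and its fallback values are hypothetical and not taken from Zammad's actual config.

```ruby
# Hypothetical helper: turns environment variables into Puma-style settings.
# Assumption: WEB_CONCURRENCY = number of worker processes,
# MIN_THREADS / MAX_THREADS = thread pool bounds per worker.
# The fallback values below are illustrative, not Zammad's defaults.
def puma_settings(env = ENV)
  workers     = Integer(env.fetch('WEB_CONCURRENCY', '0'))
  min_threads = Integer(env.fetch('MIN_THREADS', '5'))
  max_threads = Integer(env.fetch('MAX_THREADS', '30'))
  raise ArgumentError, 'MIN_THREADS must not exceed MAX_THREADS' if min_threads > max_threads

  { workers: workers, min_threads: min_threads, max_threads: max_threads }
end

# The values from the override file above.
settings = puma_settings('WEB_CONCURRENCY' => '8', 'MAX_THREADS' => '32', 'MIN_THREADS' => '8')
puts settings.inspect
```

Under this reading, only a process that loads the web server config would use these values; whether the scheduler and websocket containers do depends on how the image starts them, so it is worth checking their entrypoints before copying the variables over.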
Seems like I didn’t get a notification for this. Does this issue still occur? If so, are there any ERROR messages related to it in your log/production.log?
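One way to check is to filter the log for ERROR lines that mention LDAP. This is a minimal sketch, assuming each log entry uses a line format like Ruby Logger's default, where the severity label (e.g. ERROR) appears on the entry's first line. The helper name `error_lines` and the keyword filter are hypothetical; multi-line entries such as stack traces will only show their first line.

```ruby
# Hypothetical helper: collect ERROR lines from a log file that mention a keyword.
# Assumption: the severity label "ERROR" appears on each error entry's first line.
# Matching on the keyword is case-insensitive ("ldap" also matches "LDAP").
def error_lines(path, keyword = 'ldap')
  File.foreach(path).select do |line|
    line.include?('ERROR') && line.downcase.include?(keyword.downcase)
  end
end

# Usage, run from the Zammad directory:
# puts error_lines('log/production.log')
```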
Thanks @anon29869905, no, it’s not happening anymore. I’m pretty sure it was caused by 600,000 jobs in the delayed_job queue and Postgres deadlock timeouts.
The large delayed_job queue and Postgres deadlocks also caused unprocessable emails: I’d get an unprocessable email, process it manually, and it would go through fine with no errors.
I fixed the large delayed_job queue. It was caused by a script that ran every 5 minutes and searched for email message-ids, maybe 50 of them per run. Those 50 API requests created maybe 500 delayed jobs or more. I don’t really understand why: an API request might take 5 seconds, and I’m not sure why all those delayed jobs get created.
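The figures in the previous paragraph can be turned into a rough back-of-the-envelope estimate of how fast such a backlog builds up. This sketch uses the post's own numbers (about 500 jobs per 5-minute run) and treats the workers' processing rate as a hypothetical parameter, since the real throughput isn't known. If workers drain fewer jobs than the script enqueues, the queue keeps growing.

```ruby
# Rough estimate of delayed_job backlog growth.
# Enqueue figures come from the post (~500 jobs every 5 minutes);
# the processing rate is a hypothetical input, not a measured value.
MINUTES_PER_DAY = 24 * 60

# Integer division: assumes the run interval divides evenly into a day.
def jobs_enqueued_per_day(jobs_per_run:, run_every_minutes:)
  (MINUTES_PER_DAY / run_every_minutes) * jobs_per_run
end

# Days until the backlog reaches `target`, or nil if the workers keep up.
def days_to_backlog(target:, enqueued_per_day:, processed_per_day:)
  growth = enqueued_per_day - processed_per_day
  return nil if growth <= 0

  target.fdiv(growth)
end

per_day = jobs_enqueued_per_day(jobs_per_run: 500, run_every_minutes: 5)
puts "enqueued per day: #{per_day}"
puts "days to 600,000 with no processing: #{days_to_backlog(target: 600_000, enqueued_per_day: per_day, processed_per_day: 0)}"
```

With those figures the script alone enqueues 288 runs × 500 = 144,000 jobs a day, so a backlog of 600,000 could build up in a little over four days if none were processed; any real processing rate only slows that down.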
I also raised the Postgres deadlock_timeout to 15 seconds from the default of 1 second.