LDAP sync TCP errors

chesty · May 28, 2018, 4:19am

Used Zammad version: 2.4
Used Zammad installation source: zammad-docker-compose

Expected behavior:

hmmm, not sure.

Actual behavior:

I’m using FreeIPA on the same server. Quite frequently, maybe once or twice a day, a user gets disabled and I see this in the LDAP integration screen:

out failed tcp failed -> chesty 26 minutes ago

Clicking “Start New” to run another sync re-enables the user.

I’ve done some basic network testing, and it’s not a network problem (the network is a Docker bridge on the same host).

The server has 96 GB of RAM and 32 cores. They aren’t the fastest cores (E5-2660 0 @ 2.20GHz), but the server is lightly loaded: load average of 3 and 40 GB of free RAM, with no swap.

Steps to reproduce the behavior:

No idea. It might be related to the speed of the LDAP server, or it might be an actual network problem, but I can’t find one.
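For anyone wanting to repeat the basic network check, here is a rough sketch in Ruby of a repeated TCP connection probe, using only the standard socket library. The helper name probe_tcp, the host ipa.example.com and port 389 are placeholders for illustration, not anything from Zammad or from this setup.

```ruby
require 'socket'

# Try to open a TCP connection `attempts` times and record every failure.
# Returns an array of [attempt_number, error_class_name] pairs; an empty
# array means every attempt connected.
# probe_tcp is a hypothetical helper name, not part of Zammad.
def probe_tcp(host, port, attempts: 20, timeout: 5, pause: 0.5)
  failures = []
  attempts.times do |i|
    begin
      # With a block, Socket.tcp closes the socket when the block returns.
      Socket.tcp(host, port, connect_timeout: timeout) { |_sock| }
    rescue SystemCallError, SocketError, IOError => e
      failures << [i + 1, e.class.name]
    end
    # Wait between attempts, but not after the last one.
    sleep(pause) if pause.positive? && i < attempts - 1
  end
  failures
end

# Example call (placeholder host and the standard LDAP port):
#   probe_tcp('ipa.example.com', 389)
```

Running it from inside the scheduler container rather than from the host would exercise the same Docker bridge the sync uses, though an empty result only shows that connecting works at that moment.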

anon29869905 · June 1, 2018, 2:24pm

Hi @chesty, let’s see what we’ve got here. Please do the following:

1.) Create a file called debug_issue.rb in your Zammad directory (usually /opt/zammad)
2.) Run the file from your Zammad directory via zammad run rails r debug_issue.rb or, as the zammad user, rails r debug_issue.rb, depending on your installation source (package or source, respectively)
3.) Post the output here. Make sure all sensitive data is anonymized! If you prefer, you can send it by email to support@zammad.com. Please mention this thread and me
4.) Delete the debug_issue.rb file

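# Load Zammad's logger mixin first so it can be reopened below.
require 'mixin/rails_logger'

# Reopen the mixin so that debug-level log output goes to a file.
module Mixin
  module RailsLogger
    def self.logger
      @logger ||= Logger.new(log_to).tap do |logger|
        logger.level = :debug
      end
    end

    # Log destination: a file in the current directory.
    # Return STDOUT instead to print to the console.
    def self.log_to
      # STDOUT
      'debug_issue.log'
    end
  end
end

# Create an LDAP import job and start it right away.
ImportJob.create(name: 'Import::Ldap').start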

chesty · June 4, 2018, 11:23am

Thanks @anon29869905,

That’s really annoying: I ran it 10 times on the weekend and didn’t see any failures. I thought I’d wait until Monday, when there’s load on the server, and run it again. I ran it 5 times before I got busy and didn’t see any failures.

I’ll keep trying over the next week to reproduce. I guess if you’re happy with the current behaviour of TCP errors it’s all good. At least I now know how to turn debugging on.

cheers.

chesty · June 8, 2018, 5:28am

@anon29869905, I’m still testing, but it seems that things that run through the scheduler, like LDAP sync, are likely to fail occasionally, whereas if I run them from the Rails console they never fail.

Today I turned LDAP sync back on. It started getting TCP errors around 1pm; I clicked “Start New” to re-enable the account and got 2 more TCP errors. 10-20 minutes later I started running debug_issue.rb, and it was super quick and didn’t error once. I’ve run it 5 times.

I’m looking at my docker-compose-override. In zammad-railsserver I have:

environment:
  - WEB_CONCURRENCY=8
  - MAX_THREADS=32
  - MIN_THREADS=8

Do I need those in zammad-scheduler and zammad-websocket too?

anon29869905 · July 3, 2018, 4:34pm

Seems like my notification wasn’t sent for this. However, does this issue still occur? If so, are there any ERROR messages related to this in your log/production.log?

chesty · July 3, 2018, 4:51pm

Thanks @anon29869905. No, it’s not happening anymore. I’m pretty sure it was caused by 600000 jobs in the delayed_job queue and Postgres deadlock timeouts.

chesty · July 3, 2018, 4:58pm

The large delayed_job queue and Postgres deadlocks also caused unprocessable emails. I’d get an unprocessable email, process it manually, and it would process fine with no errors.

I fixed the large delayed_job queue, which was caused by a script that ran every 5 minutes and searched for email message-ids, maybe 50 of them each run. Those 50 API requests create maybe 500 delayed_jobs or more? I don’t really understand why; the API request might take 5 seconds, so I’m not sure why all the delayed_jobs get created.
I also bumped up the Postgres deadlock timeout a touch, to 15 seconds from the default of 1 second.
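
In case it helps someone else spot the same thing earlier, here is a small sketch of a queue-size check in Ruby. The helper name queue_backlog and the 10_000 threshold are made up for illustration. It assumes the queue is passed in as an object that responds to count, which in a Rails console should be the delayed_job model Delayed::Job.

```ruby
# Summarise a job queue: its size and whether it exceeds a threshold.
# `job_model` is anything that responds to `count`; in a Rails console
# that is assumed to be Delayed::Job (the delayed_job ActiveRecord model).
# queue_backlog and the 10_000 default are hypothetical, not Zammad settings.
def queue_backlog(job_model, threshold: 10_000)
  size = job_model.count
  { size: size, backlogged: size > threshold }
end

# Example from a Rails console (assumption: Delayed::Job is loaded):
#   queue_backlog(Delayed::Job)
```

A queue that keeps growing between two checks a few minutes apart is probably a better warning sign than any fixed number.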

anon29869905 · July 3, 2018, 5:04pm

Thanks for clearing things up!
