Microsoft Azure App OAuth2 Login Callback Phase takes very long

  • Used Zammad version: 5.3.1-1676465796.746f9768.focal
  • Used Zammad installation type: package
  • Operating system: Ubuntu 20.04
  • Browser + version: any browser

Expected behavior:

  • After the redirect to the callback URL, the user should be forwarded to the dashboard within a reasonable time

Actual behavior:

  • The Azure token endpoint redirects to the redirect URL of the ticket system instance. The logs show that a callback phase has started. It then takes around 15-20 seconds until the instance responds with HTTP 302 and redirects to the dashboard. A second before the response, the log file says “Reading image from tempfile '#<Tempfile:/tmp/avatar20230220-2977-30l81j.jpeg>'”. When the redirect comes, the log file says “Completed 302 Found in 101ms (ActiveRecord: 35.5ms | Allocations: 23343)”.

Steps to reproduce the behavior:

  • It occurred suddenly after the system had been running for more than three years.

What exactly takes 20 seconds to load?
Is it until you see the initial Zammad application page, or until all the tabs your user has open have finished loading?

Hi,
I’ve got the exact same problem on v5.2.3 on an Ubuntu Server 22.04.2 LTS. Will try to update to the latest version of Zammad to check whether that helps.
He’s probably measuring the time until the initial page render of the Zammad application page; at least that’s what I am doing. In my case, this issue has been there since the installation at the end of August 2022.

Here are my logs:

I, [2023-03-07T16:27:44.822343 #658-4771060]  INFO -- : Started POST "/auth/microsoft_office365" for xx.xx.xx.xx at 2023-03-07 16:27:44 +0000
I, [2023-03-07T16:27:44.834665 #658-4771060]  INFO -- : (microsoft_office365) Request phase initiated.
I, [2023-03-07T16:27:45.162019 #658-4769360]  INFO -- : Started GET "/auth/microsoft_office365/callback?code=[FILTERED]&state=xxxx&session_state=xxxxx" for xx.xx.xx.xx at 2023-03-07 16:27:45 +0000
I, [2023-03-07T16:27:45.168507 #658-4769360]  INFO -- : (microsoft_office365) Callback phase initiated.
I, [2023-03-07T16:27:49.421367 #1004260-109380]  INFO -- : Running job thread for 'Process ticket escalations.' (Ticket.process_escalation) status is: sleep
I, [2023-03-07T16:27:59.421942 #1004260-109380]  INFO -- : Running job thread for 'Generate 'Session' data.' (Sessions.jobs) status is: sleep
I, [2023-03-07T16:28:04.153223 #658-4769360]  INFO -- : Processing by SessionsController#create_omniauth as HTML
I, [2023-03-07T16:28:04.153417 #658-4769360]  INFO -- :   Parameters: {"code"=>"[FILTERED]", "state"=>"xxxxx", "session_state"=>"xxxxx", "provider"=>"microsoft_office365"}
I, [2023-03-07T16:28:04.246018 #658-4769360]  INFO -- : Reading image from tempfile '#<Tempfile:/tmp/avatar20230307-658-gnza09.jpeg>'
I, [2023-03-07T16:28:04.327384 #658-4769360]  INFO -- : Redirected to https://xxxxx.xx/
I, [2023-03-07T16:28:04.330047 #658-4769360]  INFO -- : Completed 302 Found in 176ms (ActiveRecord: 64.3ms | Allocations: 20220)
I, [2023-03-07T16:28:04.363774 #658-4769240]  INFO -- : Started GET "/" for xx.xx.xx.xx at 2023-03-07 16:28:04 +0000
I, [2023-03-07T16:28:04.373676 #658-4769240]  INFO -- : Processing by InitController#index as HTML
I, [2023-03-07T16:28:04.384602 #658-4769240]  INFO -- :   Rendered init/index.html.erb within layouts/application (Duration: 0.3ms | Allocations: 67)
I, [2023-03-07T16:28:04.385900 #658-4769240]  INFO -- :   Rendered layout layouts/application.html.erb (Duration: 1.6ms | Allocations: 510)
I, [2023-03-07T16:28:04.387690 #658-4769240]  INFO -- : Completed 200 OK in 14ms (Views: 2.4ms | ActiveRecord: 4.0ms | Allocations: 2949)
I, [2023-03-07T16:28:04.553063 #658-4769540]  INFO -- : Started POST "/api/v1/signshow" for xx.xx.xx.xx at 2023-03-07 16:28:04 +0000
I, [2023-03-07T16:28:04.568528 #658-4769540]  INFO -- : Processing by SessionsController#show as JSON
I, [2023-03-07T16:28:04.568659 #658-4769540]  INFO -- :   Parameters: {"fingerprint"=>"276055661"}
I, [2023-03-07T16:28:05.230605 #658-4769540]  INFO -- : Completed 200 OK in 662ms (Views: 59.4ms | ActiveRecord: 83.0ms | Allocations: 179555)
I, [2023-03-07T16:28:05.280114 #658-4769160]  INFO -- : Started GET "/api/v1/translations/lang/de-de?_=1678206484507" for xx.xx.xx.xx at 2023-03-07 16:28:05 +0000
I, [2023-03-07T16:28:05.295355 #658-4769160]  INFO -- : Processing by TranslationsController#lang as JSON
I, [2023-03-07T16:28:05.295769 #658-4769160]  INFO -- :   Parameters: {"_"=>"1678206484507", "locale"=>"de-de"}
I, [2023-03-07T16:28:05.605571 #658-4769160]  INFO -- : Completed 200 OK in 309ms (Views: 230.2ms | ActiveRecord: 55.2ms | Allocations: 50195)
I, [2023-03-07T16:28:05.720487 #658-4769320]  INFO -- : Started GET "/api/v1/users/image/65378b0c17fee52b25a1333b27eee1a2" for xx.xx.xx.xx at 2023-03-07 16:28:05 +0000
I, [2023-03-07T16:28:05.732638 #658-4769320]  INFO -- : Processing by UsersController#image as HTML
I, [2023-03-07T16:28:05.732776 #658-4769320]  INFO -- :   Parameters: {"hash"=>"65378b0c17fee52b25a1333b27eee1a2"}
I, [2023-03-07T16:28:05.740931 #658-4771180]  INFO -- : Started GET "/api/v1/activity_stream?full=true&limit=25&_=1678206484508" for xx.xx.xx.xx at 2023-03-07 16:28:05 +0000
I, [2023-03-07T16:28:05.745022 #658-4584820]  INFO -- : Started GET "/api/v1/first_steps?_=1678206484509" for xx.xx.xx.xx at 2023-03-07 16:28:05 +0000
I, [2023-03-07T16:28:05.754862 #658-4584820]  INFO -- : Processing by FirstStepsController#index as JSON
I, [2023-03-07T16:28:05.755196 #658-4584820]  INFO -- :   Parameters: {"_"=>"1678206484509"}
I, [2023-03-07T16:28:05.764913 #658-4771180]  INFO -- : Processing by ActivityStreamController#show as JSON
I, [2023-03-07T16:28:05.765024 #658-4771180]  INFO -- :   Parameters: {"full"=>"true", "limit"=>"25", "_"=>"1678206484508"}
I, [2023-03-07T16:28:05.794828 #658-4769320]  INFO -- :   Rendered text template (Duration: 0.0ms | Allocations: 1)
I, [2023-03-07T16:28:05.797553 #658-4769320]  INFO -- : Sent data avatar (3.1ms)
I, [2023-03-07T16:28:05.799069 #658-4769320]  INFO -- : Completed 200 OK in 66ms (Views: 2.9ms | ActiveRecord: 39.5ms | Allocations: 23320)
I, [2023-03-07T16:28:05.807772 #658-4584820]  INFO -- : Completed 200 OK in 52ms (Views: 0.4ms | ActiveRecord: 33.0ms | Allocations: 18362)
I, [2023-03-07T16:28:06.002526 #658-4771180]  INFO -- : Completed 200 OK in 237ms (Views: 17.8ms | ActiveRecord: 71.9ms | Allocations: 69642)
I, [2023-03-07T16:28:06.089825 #658-4771300]  INFO -- : Started GET "/api/v1/users/image/313e166492d1125f1a47fa4cd0382020" for xx.xx.xx.xx at 2023-03-07 16:28:06 +0000
I, [2023-03-07T16:28:06.105453 #658-4771300]  INFO -- : Processing by UsersController#image as HTML
I, [2023-03-07T16:28:06.105615 #658-4771300]  INFO -- :   Parameters: {"hash"=>"313e166492d1125f1a47fa4cd0382020"}
I, [2023-03-07T16:28:06.139026 #658-4771300]  INFO -- :   Rendered text template (Duration: 0.0ms | Allocations: 1)
I, [2023-03-07T16:28:06.139355 #658-4771300]  INFO -- : Sent data avatar (1.7ms)
I, [2023-03-07T16:28:06.141895 #658-4771300]  INFO -- : Completed 200 OK in 36ms (Views: 0.7ms | ActiveRecord: 12.1ms | Allocations: 4167)
I, [2023-03-07T16:28:06.387251 #658-4771260]  INFO -- : Started GET "/api/v1/cti/log?_=1678206484510" for xx.xx.xx.xx at 2023-03-07 16:28:06 +0000
I, [2023-03-07T16:28:06.401662 #658-4771260]  INFO -- : Processing by CtiController#index as JSON
I, [2023-03-07T16:28:06.401806 #658-4771260]  INFO -- :   Parameters: {"_"=>"1678206484510"}
I, [2023-03-07T16:28:06.436253 #658-4771260]  INFO -- : Completed 200 OK in 34ms (Views: 0.4ms | ActiveRecord: 12.5ms | Allocations: 5887)

It is the duration between the OAuth2 callback request and the moment the dashboard becomes visible in the browser. It seems to be the same problem jok has.

If it’s the application responding too slowly to the initial connect request, then you’d want to have a look at Zammad’s WEB_CONCURRENCY, which might not be high enough.

Please see:
Configuration via Environment Variables — Zammad documentation (performance tuning section)
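
For a package installation, the variable can usually be set via the zammad CLI wrapper, followed by a service restart; a minimal sketch, with the value chosen only as an example:

zammad config:set WEB_CONCURRENCY=4
systemctl restart zammad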

Setting WEB_CONCURRENCY to 120 didn’t solve the problem - either way, this is an instance that only two users (my colleague and me) access at all, running on an appropriately sized machine, so that kind of tuning shouldn’t be required - right?

While logging in, the server is really bored - no load at all. The system also works snappily otherwise; only the login takes this long. Is there anything else I could try?

Now on version 6.0.0-1688021663.f2f6c365.focal.

For testing purposes, I scaled the host up to 4 cores and 8 GB of RAM - while nearly all of the RAM is used up by Elasticsearch, the CPU doesn’t care at all. The login still takes just as long - are there any timeouts while logging in via MS?

I solved this issue in the meantime - a simple IPv6 configuration problem caused the system to get stuck at the callback for exactly this amount of time before the fallback to IPv4 was presumably chosen after a fixed timeout.
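
For anyone who wants to confirm the same symptom, a rough sketch of a check from the Zammad host is to compare an IPv6 request with an IPv4 request to the Microsoft login endpoint (endpoint and timeout are just examples):

# force IPv6 - if this hangs until the timeout, that delay is your login delay
curl -6 -s -o /dev/null -w "IPv6: %{http_code} after %{time_total}s\n" --max-time 20 https://login.microsoftonline.com/
# force IPv4 - this should come back almost immediately
curl -4 -s -o /dev/null -w "IPv4: %{http_code} after %{time_total}s\n" --max-time 20 https://login.microsoftonline.com/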

Thank you!!
Disabling IPv6 solved this issue. Yes, solved, not worked around, because nobody likes IPv6 anyway :slight_smile:

May I ask how you disabled IPv6 on your servers? We are currently facing the same slow sign-in process when using SSO, and I disabled IPv6 via NGINX with the following line:

resolver 8.8.8.8 8.8.4.4 ipv6=off;

But it does not speed things up, unfortunately :frowning:

Minor misunderstanding there - I didn’t turn off access to the webserver via IPv6, I turned off IPv6 on the box altogether via the kernel argument net.ipv6.conf.all.disable_ipv6=1. Not really a great solution, but it works for me, as IPv6 isn’t used in our infrastructure anyway.
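
For reference, a rough sketch of how this can be done on Ubuntu, either temporarily at runtime or persistently via a sysctl drop-in (the file name is just an example):

# temporary - reverts on reboot
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1

# persistent - loaded at boot from /etc/sysctl.d/
echo "net.ipv6.conf.all.disable_ipv6 = 1" | sudo tee /etc/sysctl.d/99-disable-ipv6.conf
sudo sysctl -p /etc/sysctl.d/99-disable-ipv6.conf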

Thank you so much, I disabled IPv6 on our VPS, and now not only is SSO fast, the webhooks also finally work. Any hints on why things fail with IPv6 and Zammad? I checked the firewall settings; they seem to be fine.

Great! No idea - my configuration was fine too (at least it’s the vanilla configuration of any Hetzner cloud VPS - I didn’t change anything to mess it up). I think this is a bug, and I can’t believe we’re the only ones experiencing this, because it should be very common - am I missing something here?

We are using a Hetzner VPS as well. I’ll see what I can debug; maybe it’s an nginx/firewall combination problem.

Nope.

I assume you’re proxying to localhost in your nginx / apache.
On most systems that’s fine, but localhost also resolves to both 127.0.0.1 and ::1.
However, Rails by default listens on 127.0.0.1 only, so this will cause timeouts.

Change localhost to 127.0.0.1 and you should be fine.
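
In nginx that would mean changing the upstream definition from the first variant below to the second (a sketch using the default Zammad rails server port):

# problematic: "localhost" may be tried as ::1 first and run into a timeout
upstream zammad-railsserver {
    server localhost:3000;
}

# preferred: point explicitly at the IPv4 loopback
upstream zammad-railsserver {
    server 127.0.0.1:3000;
}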

That’s our config; we already use 127.0.0.1 :frowning:

upstream zammad-railsserver {
    server 127.0.0.1:3000;
}

upstream zammad-websocket {
    server 127.0.0.1:6042;
}

server {
  listen 80;
  listen [::]:80;

  server_name ;

  # security - prevent information disclosure about server version
  server_tokens off;

  access_log /var/log/nginx/zammad.access.log;
  error_log /var/log/nginx/zammad.error.log;

  location /.well-known/ {
    root /var/www/html;
  }

  return 301 https://$server_name$request_uri;

}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;

    # replace 'localhost' with your fqdn if you want to use zammad from remote
    server_name ;

    # security - prevent information disclosure about server version
    server_tokens off;

    ssl_certificate ;
    ssl_certificate_key ;
	
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_ciphers 'EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH';
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 180m;
    ssl_stapling on;
    ssl_stapling_verify on;

    add_header Strict-Transport-Security "max-age=31536000" always;

    resolver 8.8.8.8 8.8.4.4 ipv6=off;

    root /opt/zammad/public;

    access_log /var/log/nginx/zammad.access.log;
    error_log  /var/log/nginx/zammad.error.log;

    client_max_body_size 50M;

    location ~ ^/(assets/|robots.txt|humans.txt|favicon.ico) {
        expires max;
    }

    location /ws {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header CLIENT_IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 86400;
        proxy_pass http://zammad-websocket;
    }

    location /cable {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header CLIENT_IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 86400;
        proxy_pass http://zammad-railsserver;
    }

    location / {
        proxy_set_header Host $http_host;
        proxy_set_header CLIENT_IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-User "";
        proxy_read_timeout 300;
        proxy_pass http://zammad-railsserver;

        gzip on;
        gzip_types text/plain text/xml text/css image/svg+xml application/javascript application/x-javascript application/json application/xml;
        gzip_proxied any;
    }
}

Are you certain this is not a client issue?
SaaS and my personal instances do not face this kind of issue.

Usually Zammad redirects you to Microsoft and Microsoft redirects you back to Zammad.
This is also why Microsoft doesn’t need direct access via HTTPS to your instance.

The URL calling is 100% done by your browser.
So the only issue could be either your instance being too busy for whatever reason (a performance / tuning issue) or your proxy being unable to communicate with the Rails server.

So right now it looks more like option 1 or a client issue. Especially restarting your VPS after disabling IPv6 might fool you into thinking you fixed the issue while it’s really a configuration issue.

Cycle through the database section here: Configure Database server — Zammad documentation, because your issue sounds like a perfect example of this scope.
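
Independent of that, a quick way to rule out the proxy-to-Rails path is to query the Rails server directly on the box; a sketch assuming the default port from the posted config:

# should answer within a few hundred milliseconds if the Rails server is reachable
curl -s -o /dev/null -w "%{http_code} after %{time_total}s\n" http://127.0.0.1:3000/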

Actually, there was no restart after changing the kernel parameters. I used this guide, How to Disable IPv6 on Ubuntu Linux, and only disabled IPv6 temporarily. I’m going to tinker with the settings and reset the firewall anyway, so maybe there’s a way I can replicate this behavior.

I’ve got the exact problem Imarshall has as well. Looking at the timestamps in the log I provided earlier (Mar 7), you can see that the callback from Microsoft reaches the Zammad server almost instantly (:45). Then there’s a delay until :04 before anything meaningful happens again.

Another observation: Hetzner seems to have done something weird with their network configuration. If I remove all of the netplan configuration, the box would normally do DHCPv4 and get its IPv4 address (which is the only IP I assigned via the control panel anyway). But it doesn’t - if I remove the netplan config, it works as long as I do netplan try, but as soon as I reboot the machine I have to rescue it because the network is dead. The same happens if I disable IPv6 via the kernel command line.
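
For comparison, this is roughly what a plain DHCPv4-only netplan config would look like on such a box (interface name and file name are assumptions):

# /etc/netplan/01-netcfg.yaml
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true
      dhcp6: false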

If I disable IPv6 temporarily via sysctl -w, logging into Zammad via SSO reproducibly works almost instantly while keeping the system online.

On this system, which, as I said, is in its default configuration, the interface is named “eth0” and the netplan config is filled with all kinds of stuff, like setting an IPv6 address manually - on newer machines I create via Hetzner, the interface is named ens3. The Zammad machine is about a year old. Yes, I’ve lived with the login delay for a long time :slight_smile:

I’m pretty certain that moving the system onto a new Hetzner appliance with the new, non-weird type of network setup (an empty netplan config - it “just works”) will make it work again.
