[SOLVED] Deadlock problem

  • Used Zammad version: 2.4.x
  • Used Zammad installation source: zammad-docker-compose

I’m having a nightmare of a time at the moment: a user can’t sign in. They get an error message saying a deadlock was detected while trying to update the users table with the last login date.

There are 10 other agents. Here’s what I think is happening (I’m not sure, though): looking at Postgres, those agents are locking and unlocking the users table constantly, so updates queue up waiting for the lock. The user trying to log in can’t get the lock in time and gets the error message.
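To see who is blocking whom while this is happening, a query along these lines against pg_stat_activity should do it (a rough sketch, nothing Zammad-specific; pg_blocking_pids needs PostgreSQL 9.6 or later):

-- list blocked sessions and the sessions holding the locks they wait on
SELECT blocked.pid   AS blocked_pid,
       blocked.query AS blocked_query,
       blocking.pid  AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';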

Something’s not right here. It’s server-class hardware with SSDs; not the fastest CPUs, but I/O should be really fast (6 Gbps SATA). Help.

Adding some logs:

2018-06-12 00:52:18 UTC [5754]: [153-1] user=postgres,db=zammad_production,app=puma: cluster worker 7: 1 [zammad],client=172.24.0.6 LOG:  process 5754 acquired ExclusiveLock on tuple (274,1) of relation 16417 of database 16384 after 13172.721 ms
2018-06-12 00:52:18 UTC [5754]: [154-1] user=postgres,db=zammad_production,app=puma: cluster worker 7: 1 [zammad],client=172.24.0.6 STATEMENT:  UPDATE "users" SET "last_login" = $1, "updated_at" = $2 WHERE "users"."id" = $3
2018-06-12 00:52:18 UTC [64]: [364-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 LOG:  duration: 94379.571 ms  execute <unnamed>: UPDATE "users" SET "last_login" = $1, "updated_by_id" = $2, "updated_at" = $3 WHERE "users"."id" = $4
2018-06-12 00:52:24 UTC [6261]: [1-1] user=[unknown],db=[unknown],app=[unknown],client=172.24.0.6 LOG:  connection received: host=172.24.0.6 port=43394
2018-06-12 00:52:24 UTC [6261]: [2-1] user=postgres,db=zammad_production,app=[unknown],client=172.24.0.6 LOG:  connection authorized: user=postgres database=zammad_production
2018-06-12 00:52:25 UTC [64]: [366-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 LOG:  process 64 detected deadlock while waiting for ShareLock on transaction 13060920 after 1000.157 ms
2018-06-12 00:52:25 UTC [64]: [367-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 DETAIL:  Process holding the lock: 79. Wait queue: .
2018-06-12 00:52:25 UTC [64]: [368-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 CONTEXT:  while updating tuple (298,15) in relation "users"
2018-06-12 00:52:25 UTC [64]: [369-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 STATEMENT:  UPDATE "users" SET "updated_at" = '2018-06-12 00:52:24.177000' WHERE "users"."id" = $1
2018-06-12 00:52:25 UTC [64]: [370-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 ERROR:  deadlock detected
2018-06-12 00:52:25 UTC [64]: [371-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 DETAIL:  Process 64 waits for ShareLock on transaction 13060920; blocked by process 79.
        Process 79 waits for ShareLock on transaction 13059412; blocked by process 64.
        Process 64: UPDATE "users" SET "updated_at" = '2018-06-12 00:52:24.177000' WHERE "users"."id" = $1
        Process 79: UPDATE "organizations" SET "updated_at" = '2018-06-12 00:52:22.039000' WHERE "organizations"."id" = $1
2018-06-12 00:52:25 UTC [64]: [372-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 HINT:  See server log for query details.
2018-06-12 00:52:25 UTC [64]: [373-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 CONTEXT:  while updating tuple (298,15) in relation "users"
2018-06-12 00:52:25 UTC [64]: [374-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 STATEMENT:  UPDATE "users" SET "updated_at" = '2018-06-12 00:52:24.177000' WHERE "users"."id" = $1
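Reading the DETAIL lines, this looks like the classic two-table deadlock: one transaction updates a users row and then an organizations row, while another transaction touches the same two rows in the opposite order. A minimal illustration of that pattern (hypothetical ids, two separate psql sessions, not the exact queries Zammad runs):

-- session A
BEGIN;
UPDATE users SET updated_at = now() WHERE id = 42;

-- session B
BEGIN;
UPDATE organizations SET updated_at = now() WHERE id = 7;

-- session A: blocks, waiting for session B's row lock on organizations
UPDATE organizations SET updated_at = now() WHERE id = 7;

-- session B: Postgres detects the cycle and aborts one of the transactions
UPDATE users SET updated_at = now() WHERE id = 42;
-- ERROR:  deadlock detected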

New theory: because my CPUs are slow and transactions are taking longer than a second, the default deadlock_timeout of 1000 ms is too small. I’ve bumped it up to 20000 ms. We’ll see how it goes tomorrow.
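For the record, deadlock_timeout only controls how long Postgres waits on a lock before running the (fairly expensive) deadlock check; it won’t make a genuine lock cycle disappear, it just postpones detection. The change can be made without a restart, roughly like this (a sketch, assumes superuser access):

-- raise the deadlock check delay; takes effect on reload, no restart needed
ALTER SYSTEM SET deadlock_timeout = '20s';
SELECT pg_reload_conf();
SHOW deadlock_timeout;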

I adjusted the deadlock timeout, but the real problem turned out to be a cron job running every 5 minutes that was submitting more jobs than the scheduler could keep up with. I had 600,000 jobs queued up after 24 hours. Oops.
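For anyone else who hits this: the backlog is visible straight from the database. Zammad’s background scheduler uses Delayed::Job, so counting the delayed_jobs table shows how far behind it is (a quick sketch, assuming the stock Zammad schema):

-- how many background jobs are queued, and how old the oldest one is
SELECT count(*)        AS queued_jobs,
       min(created_at) AS oldest_job
FROM delayed_jobs;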

Oh wow, that’s quite a lot!
Thanks for the feedback and updates! :slight_smile:
