[SOLVED] Deadlock problem

  • Used Zammad version: 2.4.x
  • Used Zammad installation source: zammad-docker-compose

I’m having a nightmare of a time at the moment: a user can’t sign in. They get an error message saying a deadlock was detected while trying to update the users table with the last login date.

There are 10 other agents. Here’s what I think is happening (I’m not sure, though): looking at Postgres, those agents are locking and unlocking the users table constantly, so updates queue up waiting for the lock. The user trying to log in can’t get the lock in time and gets the error message.
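To see who is blocking whom while this is happening, a query along these lines against pg_stat_activity should do it (a rough sketch, nothing Zammad-specific; pg_blocking_pids needs PostgreSQL 9.6 or later):

-- list blocked sessions and the sessions holding the locks they wait on
SELECT blocked.pid   AS blocked_pid,
       blocked.query AS blocked_query,
       blocking.pid  AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';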

Something’s not right here. It’s server-class hardware with SSDs; not the fastest CPUs, but I/O should be really fast (6 Gbps SATA). Help.

Adding some logs:

2018-06-12 00:52:18 UTC [5754]: [153-1] user=postgres,db=zammad_production,app=puma: cluster worker 7: 1 [zammad],client=172.24.0.6 LOG:  process 5754 acquired ExclusiveLock on tuple (274,1) of relation 16417 of database 16384 after 13172.721 ms
2018-06-12 00:52:18 UTC [5754]: [154-1] user=postgres,db=zammad_production,app=puma: cluster worker 7: 1 [zammad],client=172.24.0.6 STATEMENT:  UPDATE "users" SET "last_login" = $1, "updated_at" = $2 WHERE "users"."id" = $3
2018-06-12 00:52:18 UTC [64]: [364-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 LOG:  duration: 94379.571 ms  execute <unnamed>: UPDATE "users" SET "last_login" = $1, "updated_by_id" = $2, "updated_at" = $3 WHERE "users"."id" = $4
2018-06-12 00:52:24 UTC [6261]: [1-1] user=[unknown],db=[unknown],app=[unknown],client=172.24.0.6 LOG:  connection received: host=172.24.0.6 port=43394
2018-06-12 00:52:24 UTC [6261]: [2-1] user=postgres,db=zammad_production,app=[unknown],client=172.24.0.6 LOG:  connection authorized: user=postgres database=zammad_production
2018-06-12 00:52:25 UTC [64]: [366-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 LOG:  process 64 detected deadlock while waiting for ShareLock on transaction 13060920 after 1000.157 ms
2018-06-12 00:52:25 UTC [64]: [367-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 DETAIL:  Process holding the lock: 79. Wait queue: .
2018-06-12 00:52:25 UTC [64]: [368-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 CONTEXT:  while updating tuple (298,15) in relation "users"
2018-06-12 00:52:25 UTC [64]: [369-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 STATEMENT:  UPDATE "users" SET "updated_at" = '2018-06-12 00:52:24.177000' WHERE "users"."id" = $1
2018-06-12 00:52:25 UTC [64]: [370-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 ERROR:  deadlock detected
2018-06-12 00:52:25 UTC [64]: [371-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 DETAIL:  Process 64 waits for ShareLock on transaction 13060920; blocked by process 79.
        Process 79 waits for ShareLock on transaction 13059412; blocked by process 64.
        Process 64: UPDATE "users" SET "updated_at" = '2018-06-12 00:52:24.177000' WHERE "users"."id" = $1
        Process 79: UPDATE "organizations" SET "updated_at" = '2018-06-12 00:52:22.039000' WHERE "organizations"."id" = $1
2018-06-12 00:52:25 UTC [64]: [372-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 HINT:  See server log for query details.
2018-06-12 00:52:25 UTC [64]: [373-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 CONTEXT:  while updating tuple (298,15) in relation "users"
2018-06-12 00:52:25 UTC [64]: [374-1] user=postgres,db=zammad_production,app=puma: cluster worker 5: 1 [zammad],client=172.24.0.6 STATEMENT:  UPDATE "users" SET "updated_at" = '2018-06-12 00:52:24.177000' WHERE "users"."id" = $1
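Reading the DETAIL lines, this looks like the classic two-table deadlock: one transaction updates a users row and then an organizations row, while another transaction touches the same two rows in the opposite order. A minimal illustration of that pattern (hypothetical ids, two separate psql sessions, not the exact queries Zammad runs):

-- session A
BEGIN;
UPDATE users SET updated_at = now() WHERE id = 42;

-- session B
BEGIN;
UPDATE organizations SET updated_at = now() WHERE id = 7;

-- session A: blocks, waiting for session B's row lock on organizations
UPDATE organizations SET updated_at = now() WHERE id = 7;

-- session B: Postgres detects the cycle and aborts one of the transactions
UPDATE users SET updated_at = now() WHERE id = 42;
-- ERROR:  deadlock detected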

New theory: because my CPUs are slow and transactions are taking longer than a second, the default deadlock_timeout of 1000 ms is too small. I’ve bumped it up to 20000 ms. We’ll see how it goes tomorrow.
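For the record, deadlock_timeout only controls how long Postgres waits on a lock before running the (fairly expensive) deadlock check; it won’t make a genuine lock cycle disappear, it just postpones detection. The change can be made without a restart, roughly like this (a sketch, assumes superuser access):

-- raise the deadlock check delay; takes effect on reload, no restart needed
ALTER SYSTEM SET deadlock_timeout = '20s';
SELECT pg_reload_conf();
SHOW deadlock_timeout;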

I adjusted the deadlock timeout, but the real problem turned out to be a cron job running every 5 minutes that was submitting more jobs than the scheduler could keep up with. I had 600,000 jobs queued up after 24 hours. Oops.
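For anyone else who hits this: the backlog is visible straight from the database. Zammad’s background scheduler uses Delayed::Job, so counting the delayed_jobs table shows how far behind it is (a quick sketch, assuming the stock Zammad schema):

-- how many background jobs are queued, and how old the oldest one is
SELECT count(*)        AS queued_jobs,
       min(created_at) AS oldest_job
FROM delayed_jobs;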

Oh wow, that’s quite a lot!
Thanks for the feedback and updates! :slight_smile:
