Can´t Backup Zammad

When I execute the Zammad backup script, it completes the backup of the files, but it doesn’t start the backup of the database; it simply freezes. I’ve updated Zammad several times, yet the issue remains. It used to work properly before, but this problem began around 3 months ago. Since then, I’ve been unable to make backups. When I perform a pgdump, the database backup operates correctly, but the Zammad backup script doesn’t. My Zammad’s storage mechanism is file-based. I’ve tried running the script with Zammad in maintenance mode, without activating the maintenance mode, and even by stopping it using ‘systemctl stop zammad’. I also attempted the backup with a full system dump and without, but neither approach works. Below are the specifications and configurations:

Server:
VPS with 10 Cores, 64 GB of RAM, and 400GB NVMe.

Zammad:
WEB_CONCURRENCY=10
RAILS_MAX_THREADS=20
ZAMMAD_PROCESS_SCHEDULED_JOBS_WORKERS=2
ZAMMAD_PROCESS_DELAYED_JOBS_WORKERS=2

ElasticSearch:
-Xms24g
-Xmx24g

Postgresql:
shared_buffers = 16GB
work_mem = 32MB
max_parallel_workers_per_gather = 10
effective_cache_size = 32GB
max_connections = 4000"

Can you post the output of the ps auxfww command when you’re running the backup script and you think it is stalled? Do you see something happening in your PostgreSQL logs perhaps? What does top or htop show?

Hello, I will send you the PostgreSQL log from the last time this happened. I can’t run a new backup right now because we are using it. As soon as our workday ends, I’ll run it again. Here’s the log that appears when it freezes:

2023-09-09 21:45:10.933 CEST [576416] zammad@zammad LOG: could not send data to client: Broken pipe
2023-09-09 21:45:10.933 CEST [576416] zammad@zammad STATEMENT: SELECT c.tableoid, c.oid, c.relname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS relacl, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) as rrelacl, NULL AS initrelacl, NULL as initrrelacl, c.relkind, c.relnamespace, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = c.relowner) AS rolname, c.relchecks, c.relhastriggers, c.relhasindex, c.relhasrules, ‘f’::bool AS relhasoids, c.relrowsecurity, c.relforcerowsecurity, c.relfrozenxid, c.relminmxid, tc.oid AS toid, tc.relfrozenxid AS tfrozenxid, tc.relminmxid AS tminmxid, c.relpersistence, c.relispopulated, c.relreplident, c.relpages, am.amname, c.reloftype, d.refobjid AS owning_tab, d.refobjsubid AS owning_col, (SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, array_remove(array_remove(c.reloptions,‘check_option=local’),‘check_option=cascaded’) AS reloptions, CASE WHEN ‘check_option=local’ = ANY (c.reloptions) THEN ‘LOCAL’::text WHEN ‘check_option=cascaded’ = ANY (c.reloptions) THEN ‘CASCADED’::text ELSE NULL END AS checkoption, tc.reloptions AS toast_reloptions, c.relkind = ‘S’ AND EXISTS (SELECT 1 FROM pg_depend WHERE classid = ‘pg_class’::regclass AND objid = c.oid AND objsubid = 0 AND refclassid = ‘pg_class’::regclass AND deptype = ‘i’) AS is_identity_sequence, EXISTS (SELECT 1 FROM pg_attribute at LEFT JOIN pg_init_privs pip ON (c.oid = pip.objoid AND pip.classoid = ‘pg_class’::regclass AND pip.objsubid = at.attnum)WHERE at.attrelid = c.oid AND ((SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault(‘c’,c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(‘c’,c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) IS NOT NULL OR (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(‘c’,c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault(‘c’,c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) IS NOT NULL OR NULL IS NOT NULL OR NULL IS NOT NULL))AS changed_acl, c.relispartition AS ispartition FROM pg_class c LEFT JOIN pg_depend d ON (c.relkind = ‘S’ AND d.classid = c.tableoid AND d.objid = c.oid AND d.objsubid = 0 AND d.refclassid = c.tableoid AND d.deptype IN (‘a’, ‘i’)) LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid AND c.relkind <> ‘p’) LEFT JOIN pg_am am ON (c.relam = am.oid) LEFT JOIN pg_init_privs pip ON (c.oid = pip.objoid AND pip.classoid = ‘pg_class’::regclass AND pip.objsubid = 0) WHERE c.relkind in (‘r’, ‘S’, ‘v’, ‘c’, ‘m’, ‘f’, ‘p’) ORDER BY c.oid
2023-09-09 21:45:10.937 CEST [576416] zammad@zammad FATAL: connection to client lost
2023-09-09 21:45:10.937 CEST [576416] zammad@zammad STATEMENT: SELECT c.tableoid, c.oid, c.relname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS relacl, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) as rrelacl, NULL AS initrelacl, NULL as initrrelacl, c.relkind, c.relnamespace, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = c.relowner) AS rolname, c.relchecks, c.relhastriggers, c.relhasindex, c.relhasrules, ‘f’::bool AS relhasoids, c.relrowsecurity, c.relforcerowsecurity, c.relfrozenxid, c.relminmxid, tc.oid AS toid, tc.relfrozenxid AS tfrozenxid, tc.relminmxid AS tminmxid, c.relpersistence, c.relispopulated, c.relreplident, c.relpages, am.amname, c.reloftype, d.refobjid AS owning_tab, d.refobjsubid AS owning_col, (SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, array_remove(array_remove(c.reloptions,‘check_option=local’),‘check_option=cascaded’) AS reloptions, CASE WHEN ‘check_option=local’ = ANY (c.reloptions) THEN ‘LOCAL’::text WHEN ‘check_option=cascaded’ = ANY (c.reloptions) THEN ‘CASCADED’::text ELSE NULL END AS checkoption, tc.reloptions AS toast_reloptions, c.relkind = ‘S’ AND EXISTS (SELECT 1 FROM pg_depend WHERE classid = ‘pg_class’::regclass AND objid = c.oid AND objsubid = 0 AND refclassid = ‘pg_class’::regclass AND deptype = ‘i’) AS is_identity_sequence, EXISTS (SELECT 1 FROM pg_attribute at LEFT JOIN pg_init_privs pip ON (c.oid = pip.objoid AND pip.classoid = ‘pg_class’::regclass AND pip.objsubid = at.attnum)WHERE at.attrelid = c.oid AND ((SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault(‘c’,c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(‘c’,c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) IS NOT NULL OR (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(‘c’,c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault(‘c’,c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) IS NOT NULL OR NULL IS NOT NULL OR NULL IS NOT NULL))AS changed_acl, c.relispartition AS ispartition FROM pg_class c LEFT JOIN pg_depend d ON (c.relkind = ‘S’ AND d.classid = c.tableoid AND d.objid = c.oid AND d.objsubid = 0 AND d.refclassid = c.tableoid AND d.deptype IN (‘a’, ‘i’)) LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid AND c.relkind <> ‘p’) LEFT JOIN pg_am am ON (c.relam = am.oid) LEFT JOIN pg_init_privs pip ON (c.oid = pip.objoid AND pip.classoid = ‘pg_class’::regclass AND pip.objsubid = 0) WHERE c.relkind in (‘r’, ‘S’, ‘v’, ‘c’, ‘m’, ‘f’, ‘p’) ORDER BY c.oid

2023-09-09 21:45:10.933 CEST [576416] zammad@zammad LOG: could not send data to client: Broken pipe
2023-09-09 21:45:10.937 CEST [576416] zammad@zammad FATAL: connection to client lost

This doesn’t look too good. Could it be the backup process or (parts of) Zammad is killed because it’s using too much resources (e.g. the by the OOM Killer or other security controls)? Could you check your system logs?

Hello, I apologize for the delay. Over the past few days, I haven’t had a chance to perform the tests. I just completed a few, and here are the results: When I ran the backup script, it created a zip of the files, and I attached a screenshot of the ‘top’ command, which shows that the zip process was using most of the CPU. However, after the file backup was completed, the script did not start the database backup. Upon checking the PostgreSQL log, it reads the following:

2023-09-20 01:15:26.255 CEST [2172] zammad@zammad LOG: could not receive data from client: Connection reset by peer
2023-09-20 01:15:26.256 CEST [2172] zammad@zammad LOG: unexpected EOF on client connection with an open transaction

zammad backup process
zammad db stuck
zammad zip
stgreSQL log, it reads the following:

Another part of the log: 2023-09-20 02:13:03.526 CEST [1912] zammad@zammad LOG: could not receive data from client: Connection reset by peer
2023-09-20 02:13:03.527 CEST [1912] zammad@zammad LOG: unexpected EOF on client connection with an open transaction
2023-09-20 02:13:53.967 CEST [872] zammad@zammad ERROR: could not serialize access due to read/write dependencies among transactions
2023-09-20 02:13:53.967 CEST [872] zammad@zammad DETAIL: Reason code: Canceled on identification as a pivot, during write.
2023-09-20 02:13:53.967 CEST [872] zammad@zammad HINT: The transaction might succeed if retried.
2023-09-20 02:13:53.967 CEST [872] zammad@zammad STATEMENT: INSERT INTO “active_job_locks” (“lock_key”, “active_job_id”, “created_at”, “updated_at”) VALUES ($1, $2, $3, $4) RETURNING “id”
2023-09-20 02:13:53.971 CEST [872] zammad@zammad ERROR: current transaction is aborted, commands ignored until end of transaction block
2023-09-20 02:13:53.971 CEST [872] zammad@zammad STATEMENT: DEALLOCATE a1
2023-09-20 02:13:53.972 CEST [872] zammad@zammad ERROR: current transaction is aborted, commands ignored until end of transaction block
2023-09-20 02:13:53.972 CEST [872] zammad@zammad STATEMENT: DEALLOCATE a2
2023-09-20 02:13:53.972 CEST [872] zammad@zammad ERROR: current transaction is aborted, commands ignored until end of transaction block
2023-09-20 02:13:53.972 CEST [872] zammad@zammad STATEMENT: DEALLOCATE a3
2023-09-20 02:13:53.972 CEST [872] zammad@zammad ERROR: current transaction is aborted, commands ignored until end of transaction block
2023-09-20 02:13:53.972 CEST [872] zammad@zammad STATEMENT: DEALLOCATE a4
2023-09-20 02:13:53.972 CEST [872] zammad@zammad ERROR: current transaction is aborted, commands ignored until end of transaction block
2023-09-20 02:13:53.972 CEST [872] zammad@zammad STATEMENT: DEALLOCATE a5
2023-09-20 02:13:53.972 CEST [872] zammad@zammad ERROR: current transaction is aborted, commands ignored until end of transaction block
2023-09-20 02:13:53.972 CEST [872] zammad@zammad STATEMENT: DEALLOCATE a6
2023-09-20 02:13:53.972 CEST [872] zammad@zammad ERROR: current transaction is aborted, commands ignored until end of transaction block
2023-09-20 02:13:53.972 CEST [872] zammad@zammad STATEMENT: DEALLOCATE a7

Try to find a maintenance window and stop Zammad, then restart postgresql as safety measure and run the backup process again. After that start Zammad.

Maybe that helps.

Hello, I stopped Zammad when I did the last backup. These logs are from when Zammad was stopped. I also stopped Elasticsearch.

The backup process does not write to the database.

So where does

come from?
I’m very certain that there was at least one Zammad process still running. This kind of error message is a classic one for outdated psql connections being used.

Alright, today I will try a different approach. I’ll use systemctl disable zammad and restart my VPS to ensure that Zammad doesn’t even start up, and then attempt the backup. Once I’m done, I’ll post the results. Thank you very much.

1 Like

Hello, I tried doing as I mentioned. I used systemctl disable zammad and systemctl disable elasticsearch , restarted my VPS, and checked to see if either was running. Both were stopped. I executed the backup script and the behavior was the same. It completes the zip of the data (I can tell through FTP that it reaches a certain size and the file doesn’t grow any further), and it doesn’t start the database backup. This time I waited for 15 minutes and it simply didn’t start, just like the other times, so I stopped the backup process. However, this time there was nothing in the PostgreSQL log, as shown below:

2023-09-20 23:30:53.472 CEST [656] LOG: received fast shutdown request
2023-09-20 23:30:53.473 CEST [656] LOG: aborting any active transactions
2023-09-20 23:30:53.526 CEST [656] LOG: background worker “logical replication launcher” (PID 820) exited with exit code 1
2023-09-20 23:30:53.538 CEST [815] LOG: shutting down
2023-09-20 23:30:53.906 CEST [656] LOG: database system is shut down
2023-09-20 23:31:43.033 CEST [553] LOG: starting PostgreSQL 12.16 (Ubuntu 12.16-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
2023-09-20 23:31:43.033 CEST [553] LOG: listening on IPv4 address “127.0.0.1”, port 5432
2023-09-20 23:31:43.033 CEST [553] LOG: could not bind IPv6 address “::1”: Cannot assign requested address
2023-09-20 23:31:43.033 CEST [553] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2023-09-20 23:31:43.034 CEST [553] LOG: listening on Unix socket “/var/run/postgresql/.s.PGSQL.5432”
2023-09-20 23:31:43.112 CEST [557] LOG: database system was shut down at 2023-09-20 23:30:53 CEST
2023-09-20 23:31:43.127 CEST [553] LOG: database system is ready to accept connections

I tried once more under the same conditions, and this time the following error appeared in the log:

2023-09-21 01:53:56.956 CEST [3359] zammad@zammad LOG: could not send data to client: Broken pipe
2023-09-21 01:53:56.956 CEST [3359] zammad@zammad STATEMENT: SELECT c.tableoid, c.oid, c.relname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS relacl, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) as rrelacl, NULL AS initrelacl, NULL as initrrelacl, c.relkind, c.relnamespace, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = c.relowner) AS rolname, c.relchecks, c.relhastriggers, c.relhasindex, c.relhasrules, ‘f’::bool AS relhasoids, c.relrowsecurity, c.relforcerowsecurity, c.relfrozenxid, c.relminmxid, tc.oid AS toid, tc.relfrozenxid AS tfrozenxid, tc.relminmxid AS tminmxid, c.relpersistence, c.relispopulated, c.relreplident, c.relpages, am.amname, c.reloftype, d.refobjid AS owning_tab, d.refobjsubid AS owning_col, (SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, array_remove(array_remove(c.reloptions,‘check_option=local’),‘check_option=cascaded’) AS reloptions, CASE WHEN ‘check_option=local’ = ANY (c.reloptions) THEN ‘LOCAL’::text WHEN ‘check_option=cascaded’ = ANY (c.reloptions) THEN ‘CASCADED’::text ELSE NULL END AS checkoption, tc.reloptions AS toast_reloptions, c.relkind = ‘S’ AND EXISTS (SELECT 1 FROM pg_depend WHERE classid = ‘pg_class’::regclass AND objid = c.oid AND objsubid = 0 AND refclassid = ‘pg_class’::regclass AND deptype = ‘i’) AS is_identity_sequence, EXISTS (SELECT 1 FROM pg_attribute at LEFT JOIN pg_init_privs pip ON (c.oid = pip.objoid AND pip.classoid = ‘pg_class’::regclass AND pip.objsubid = at.attnum)WHERE at.attrelid = c.oid AND ((SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault(‘c’,c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(‘c’,c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) IS NOT NULL OR (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(‘c’,c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault(‘c’,c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) IS NOT NULL OR NULL IS NOT NULL OR NULL IS NOT NULL))AS changed_acl, c.relispartition AS ispartition FROM pg_class c LEFT JOIN pg_depend d ON (c.relkind = ‘S’ AND d.classid = c.tableoid AND d.objid = c.oid AND d.objsubid = 0 AND d.refclassid = c.tableoid AND d.deptype IN (‘a’, ‘i’)) LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid AND c.relkind <> ‘p’) LEFT JOIN pg_am am ON (c.relam = am.oid) LEFT JOIN pg_init_privs pip ON (c.oid = pip.objoid AND pip.classoid = ‘pg_class’::regclass AND pip.objsubid = 0) WHERE c.relkind in (‘r’, ‘S’, ‘v’, ‘c’, ‘m’, ‘f’, ‘p’) ORDER BY c.oid
2023-09-21 01:53:56.959 CEST [3359] zammad@zammad FATAL: connection to client lost
2023-09-21 01:53:56.959 CEST [3359] zammad@zammad STATEMENT: SELECT c.tableoid, c.oid, c.relname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS relacl, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefault(CASE WHEN c.relkind = ‘S’ THEN ‘s’ ELSE ‘r’ END::“char”,c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) as rrelacl, NULL AS initrelacl, NULL as initrrelacl, c.relkind, c.relnamespace, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = c.relowner) AS rolname, c.relchecks, c.relhastriggers, c.relhasindex, c.relhasrules, ‘f’::bool AS relhasoids, c.relrowsecurity, c.relforcerowsecurity, c.relfrozenxid, c.relminmxid, tc.oid AS toid, tc.relfrozenxid AS tfrozenxid, tc.relminmxid AS tminmxid, c.relpersistence, c.relispopulated, c.relreplident, c.relpages, am.amname, c.reloftype, d.refobjid AS owning_tab, d.refobjsubid AS owning_col, (SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, array_remove(array_remove(c.reloptions,‘check_option=local’),‘check_option=cascaded’) AS reloptions, CASE WHEN ‘check_option=local’ = ANY (c.reloptions) THEN ‘LOCAL’::text WHEN ‘check_option=cascaded’ = ANY (c.reloptions) THEN ‘CASCADED’::text ELSE NULL END AS checkoption, tc.reloptions AS toast_reloptions, c.relkind = ‘S’ AND EXISTS (SELECT 1 FROM pg_depend WHERE classid = ‘pg_class’::regclass AND objid = c.oid AND objsubid = 0 AND refclassid = ‘pg_class’::regclass AND deptype = ‘i’) AS is_identity_sequence, EXISTS (SELECT 1 FROM pg_attribute at LEFT JOIN pg_init_privs pip ON (c.oid = pip.objoid AND pip.classoid = ‘pg_class’::regclass AND pip.objsubid = at.attnum)WHERE at.attrelid = c.oid AND ((SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault(‘c’,c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(‘c’,c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) IS NOT NULL OR (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(‘c’,c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault(‘c’,c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) IS NOT NULL OR NULL IS NOT NULL OR NULL IS NOT NULL))AS changed_acl, c.relispartition AS ispartition FROM pg_class c LEFT JOIN pg_depend d ON (c.relkind = ‘S’ AND d.classid = c.tableoid AND d.objid = c.oid AND d.objsubid = 0 AND d.refclassid = c.tableoid AND d.deptype IN (‘a’, ‘i’)) LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid AND c.relkind <> ‘p’) LEFT JOIN pg_am am ON (c.relam = am.oid) LEFT JOIN pg_init_privs pip ON (c.oid = pip.objoid AND pip.classoid = ‘pg_class’::regclass AND pip.objsubid = 0) WHERE c.relkind in (‘r’, ‘S’, ‘v’, ‘c’, ‘m’, ‘f’, ‘p’) ORDER BY c.oid

As I said before, the following lines:

2023-09-21 01:53:56.956 CEST [3359] zammad@zammad LOG: could not send data to client: Broken pipe
2023-09-21 01:53:56.959 CEST [3359] zammad@zammad FATAL: connection to client lost

seem to indicate something killed the active query clientside. That was either you, being impatient and killing the backup process, or something else killing the (sub)process. And as I said before: check your system logs to see if something is wrong. Check dmesg output, /var/log/syslog and /var/log/messages etc.

When you think the backup script has stopped, check ps auxfww and note what it says it’s doing regarding the backup. And maybe don’t kill the backup and let it run longer to see if it is actually stalled, or just takes a long while to complete.

As a side note, you appear to be having IPv6 issues on this VM (is it disabled?):

2023-09-20 23:31:43.033 CEST [553] LOG: could not bind IPv6 address “::1”: Cannot assign requested address

Hello, I apologize for forgetting to run the ps auxfww . I will do it again today, but let me try to explain better how I concluded that it’s freezing. My Zammad’s storage mechanism is file-based, and the zip file is currently 46GB in size. Every time I did the backup, I also connected via FTP and observed the /var/tmp/zammad_backup directory to watch the files being created, and it has always been instantaneous. As soon as the zip is created, the database file creation starts immediately (yes, I always refresh the folder to see the file being created). However, for the past few months, I haven’t been able to complete this process, even after waiting a long time. As I mentioned, yesterday I waited 15 minutes and the file didn’t start to be generated. Observing the SSH shell, the backup script also doesn’t start to generate the database file. There were other times when I left it indefinitely, the SSH even closes and loses the connection (I left it for over an hour) and nothing happened. I have two more Zammad installations, one on another test server and one locally for testing. On the other two, it works normally, only on this one did I start having this problem about 3 months ago, and since then I haven’t been able to do backups. I’ve tried with Zammad running, with Zammad stopped, doing the full system dump and not doing the full system dump. Yesterday, I even disabled the startup of Zammad and Elasticsearch, restarted the VPS, and started the process, and still nothing.


zammad 2 hours later

Hello, I conducted the test with ps auxfww . Attached is a screenshot from the moment it was frozen. I waited for 2 hours to see if the database backup would start, but nothing happened. Eventually, the SSH connection closed on its own, as shown in the attached screenshot as well. Bellow is what appeared in the syslog at that time:

Sep 21 18:22:12 vmi1179260 systemd[1]: dmesg.service: Succeeded.
Sep 21 18:22:37 vmi1179260 systemd-timesyncd[429]: Initial synchronization to time server 185.125.190.56:123 (ntp.ubuntu.com).
Sep 21 18:24:06 vmi1179260 systemd-networkd-wait-online[361]: Event loop failed: Connection timed out
Sep 21 18:24:06 vmi1179260 systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Sep 21 18:24:06 vmi1179260 systemd[1]: systemd-networkd-wait-online.service: Failed with result ‘exit-code’.
Sep 21 18:24:06 vmi1179260 systemd[1]: Failed to start Wait for Network to be Configured.
Sep 21 18:24:06 vmi1179260 systemd[1]: Reached target Network is Online.
Sep 21 18:24:06 vmi1179260 systemd[1]: Starting LSB: powerful, efficient, and scalable Mail Transport Agent…
Sep 21 18:24:06 vmi1179260 systemd[1]: Condition check resulted in Ubuntu Pro Background Auto Attach being skipped.
Sep 21 18:24:06 vmi1179260 systemd[1]: Starting Webmin server daemon…
Sep 21 18:24:06 vmi1179260 sendmail[760]: * Starting Mail Transport Agent (MTA) sendmail
Sep 21 18:24:06 vmi1179260 systemd[1]: Created slice User Slice of UID 114.
Sep 21 18:24:06 vmi1179260 systemd[1]: Starting User Runtime Directory /run/user/114…
Sep 21 18:24:06 vmi1179260 systemd[1]: Finished User Runtime Directory /run/user/114.
Sep 21 18:24:06 vmi1179260 systemd[1]: Starting User Manager for UID 114…
Sep 21 18:24:06 vmi1179260 systemd[819]: Reached target Paths.
Sep 21 18:24:06 vmi1179260 systemd[819]: Reached target Timers.
Sep 21 18:24:06 vmi1179260 systemd[819]: Starting D-Bus User Message Bus Socket.
Sep 21 18:24:06 vmi1179260 systemd[819]: Listening on GnuPG network certificate management daemon.
Sep 21 18:24:06 vmi1179260 systemd[819]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
Sep 21 18:24:06 vmi1179260 systemd[819]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Sep 21 18:24:06 vmi1179260 systemd[819]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Sep 21 18:24:06 vmi1179260 systemd[819]: Listening on GnuPG cryptographic agent and passphrase cache.
Sep 21 18:24:06 vmi1179260 systemd[819]: Listening on debconf communication socket.
Sep 21 18:24:06 vmi1179260 systemd[819]: Listening on D-Bus User Message Bus Socket.
Sep 21 18:24:06 vmi1179260 systemd[819]: Reached target Sockets.
Sep 21 18:24:06 vmi1179260 systemd[819]: Reached target Basic System.
Sep 21 18:24:06 vmi1179260 systemd[819]: Reached target Main User Target.
Sep 21 18:24:06 vmi1179260 systemd[819]: Startup finished in 68ms.
Sep 21 18:24:06 vmi1179260 systemd[1]: Started User Manager for UID 114.
Sep 21 18:24:06 vmi1179260 systemd[1]: Started Session c1 of user smmsp.
Sep 21 18:24:06 vmi1179260 systemd[1]: session-c1.scope: Succeeded.
Sep 21 18:24:07 vmi1179260 sm-mta[831]: starting daemon (8.15.2): SMTP+queueing@00:10:00
Sep 21 18:24:09 vmi1179260 sendmail[760]: …done.

I really think the backup script is either quietly being killed off, possibly due to security controls kicking in and terminating your connection or logging you off from the system, possibly due to perceived inactivity, or something like that. It might be a firewall or router outside of the system forcefully terminating the connection due to suspected inactivity. The client_loop: send disconnect: Broken pipe seems to hint at some sort of network issue, and it also would explain why you’re not seeing any further status messages from the backup script. In any case, I don’t think it’s a problem with the backup script itself.

You might want to try to configure your SSH client to send keepalives to the server, or run the backup script in the background while you make sure to do something that generates a bit of network traffic to keep the session alive in the foreground. Or check your Cloud provider for security controls which terminate idle network connections, if that is possible. Running the backup script non-interactive, scheduled via Cron should completely circumvent this issue.

1 Like

Hello, I ran the test using a cronjob and it worked. Now I’ll check what might be killing my connection. Thank you very much.

This topic was automatically closed 360 days after the last reply. New replies are no longer allowed.