Zammad 5.4 upgrade: error during translation update

Infos:

  • Used Zammad version: 5.4.0
  • Used Zammad installation type: (source, package, docker-compose, …): package
  • Operating system: CentOS Stream 8
  • Browser + version: Opera

Expected behavior:

  • Upgrade from 5.3.1 without any error

Actual behavior:

  • Upgrade done according to Updating Zammad — Zammad documentation (CentOS variant)
  • During the postinstall script, I got the following error (I shortened the traceback, hopefully keeping the useful part):

Running transaction
Preparing : 1/1
Running scriptlet: zammad-5.4.0-1678779346.2fb8d15c.centos8.x86_64 1/2
Upgrading : zammad-5.4.0-1678779346.2fb8d15c.centos8.x86_64 1/2
Running scriptlet: zammad-5.4.0-1678779346.2fb8d15c.centos8.x86_64 1/2

(Re)creating init scripts

Nothing to do.
Nothing to do.
Nothing to do.

Enabling Zammad on boot

Stopping Zammad

Clear cache…

database.yml found. Updating db…

== 20221116134211 Issue2185AndOrConditions: migrating =========================
== 20221116134211 Issue2185AndOrConditions: migrated (0.0257s) ================

== 20221206124535 Issue4412SettingSamlLogoutUrl: migrating ====================
== 20221206124535 Issue4412SettingSamlLogoutUrl: migrated (0.0056s) ===========

== 20221206141429 Issue4412SettingSamlIdpCertSwitchTextarea: migrating ========
== 20221206141429 Issue4412SettingSamlIdpCertSwitchTextarea: migrated (0.0057s)

== 20221208131506 Issue3971ExchangeOauth2: migrating ==========================
== 20221208131506 Issue3971ExchangeOauth2: migrated (0.0463s) =================

== 20221209095600 Issue4410AccountLinkingNotification: migrating ==============
== 20221209095600 Issue4410AccountLinkingNotification: migrated (0.0046s) =====

== 20221213145618 Issue4411SettingSamlCustomUidAttribute: migrating ===========
== 20221213145618 Issue4411SettingSamlCustomUidAttribute: migrated (0.0062s) ==

== 20221220095750 Issue4434GitHubRename: migrating ============================
== 20221220095750 Issue4434GitHubRename: migrated (0.0062s) ===================

== 20221221082844 AndOrConditionsSettingDefault: migrating ====================
== 20221221082844 AndOrConditionsSettingDefault: migrated (0.0064s) ===========

== 20230120163410 TaskbarsAddAppsSupport: migrating ===========================
-- change_table(:taskbars)
   -> 0.2754s
== 20230120163410 TaskbarsAddAppsSupport: migrated (0.2765s) ==================

== 20230125114758 Issue4437AddCleanup: migrating ==============================
== 20230125114758 Issue4437AddCleanup: migrated (0.0049s) =====================

== 20230129220648 TaskbarUpdatePreferenceTasks: migrating =====================
== 20230129220648 TaskbarUpdatePreferenceTasks: migrated (5.5302s) ============

== 20230130173733 AddTaskbarCleanupJob: migrating =============================
== 20230130173733 AddTaskbarCleanupJob: migrated (0.0053s) ====================

== 20230207152500 TicketArticleNoteHint: migrating ============================
== 20230207152500 TicketArticleNoteHint: migrated (0.0063s) ===================

== 20230224160814 Issue4505InconsistentScreenViewActiveAttribute: migrating ===
== 20230224160814 Issue4505InconsistentScreenViewActiveAttribute: migrated (0.0062s)

Updating translations…

/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.2/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `exec_params': PG::InternalError: ERROR: posting list tuple with 9 items cannot be split at offset 24 (ActiveRecord::StatementInvalid)
	from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.2/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `block (2 levels) in exec_no_cache'
	from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activesupport-6.1.7.2/lib/active_support/dependencies/interlock.rb:48:in `block in permit_concurrent_loads'
	...
	from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.2/lib/active_record/connection_adapters/abstract/database_statements.rb:318:in `transaction'
	from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.2/lib/active_record/transactions.rb:350:in `with_transaction_returning_status'
	...
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.2/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `exec_params': ERROR: posting list tuple with 9 items cannot be split at offset 24 (PG::InternalError)
	from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.2/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `block (2 levels) in exec_no_cache'
	from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activesupport-6.1.7.2/lib/active_support/dependencies/interlock.rb:48:in `block in permit_concurrent_loads'
	...
	from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.2/lib/active_record/connection_adapters/abstract/database_statements.rb:318:in `transaction'
	from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.2/lib/active_record/transactions.rb:350:in `with_transaction_returning_status'
	from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.2/lib/active_record/transactions.rb:302:in `save!'

Configuring Elasticsearch…

-> Installing ingest-attachment
[ingest-attachment] is no longer a plugin but instead a module packaged with this distribution of Elasticsearch
-> Please restart Elasticsearch to activate any plugins installed

Steps to reproduce the behavior:

  • N/A

How serious is this error? Any suggestions for cleaning it up? The Zammad instance seems to work properly, but I can imagine it causing problems with translations. This instance was first installed with 5.1, then upgraded to 5.2 and 5.3.1 successfully, without any error.

It seems to be hinting at a PostgreSQL error. What does your database log say?

@dvanzuijlekom you are right, there is a PostgreSQL error matching the upgrade time (~4 pm); I had forgotten to check it. I deleted my previous reply, as it was not the right extract from the logs, sorry.

2023-03-15 16:10:06.458 CET [3094] LOG: database system is ready to accept connections
2023-03-15 16:11:37.630 CET [3701] ERROR: column taskbars.app does not exist at character 73
2023-03-15 16:11:37.630 CET [3701] STATEMENT: SELECT "taskbars".* FROM "taskbars" WHERE "taskbars"."user_id" = $1 AND "taskbars"."app" = $2 ORDER BY "taskbars"."id" ASC
2023-03-15 16:11:49.935 CET [3701] ERROR: column taskbars.app does not exist at character 73
2023-03-15 16:11:49.935 CET [3701] STATEMENT: SELECT "taskbars".* FROM "taskbars" WHERE "taskbars"."user_id" = $1 AND "taskbars"."app" = $2 ORDER BY "taskbars"."id" ASC
2023-03-15 16:12:37.175 CET [3701] ERROR: column taskbars.app does not exist at character 73
2023-03-15 16:12:37.175 CET [3701] STATEMENT: SELECT "taskbars".* FROM "taskbars" WHERE "taskbars"."user_id" = $1 AND "taskbars"."app" = $2 ORDER BY "taskbars"."id" ASC
2023-03-15 16:12:59.140 CET [3701] ERROR: column taskbars.app does not exist at character 73
2023-03-15 16:12:59.140 CET [3701] STATEMENT: SELECT "taskbars".* FROM "taskbars" WHERE "taskbars"."user_id" = $1 AND "taskbars"."app" = $2 ORDER BY "taskbars"."id" ASC
2023-03-15 16:13:04.055 CET [3701] ERROR: column taskbars.app does not exist at character 73
2023-03-15 16:13:04.055 CET [3701] STATEMENT: SELECT "taskbars".* FROM "taskbars" WHERE "taskbars"."user_id" = $1 AND "taskbars"."app" = $2 ORDER BY "taskbars"."id" ASC
2023-03-15 16:14:30.052 CET [3701] ERROR: column taskbars.app does not exist at character 73
2023-03-15 16:14:30.052 CET [3701] STATEMENT: SELECT "taskbars".* FROM "taskbars" WHERE "taskbars"."user_id" = $1 AND "taskbars"."app" = $2 ORDER BY "taskbars"."id" ASC
2023-03-15 16:15:02.359 CET [3701] ERROR: column taskbars.app does not exist at character 73
2023-03-15 16:15:02.359 CET [3701] STATEMENT: SELECT "taskbars".* FROM "taskbars" WHERE "taskbars"."user_id" = $1 AND "taskbars"."app" = $2 ORDER BY "taskbars"."id" ASC
2023-03-15 16:18:48.044 CET [5361] ERROR: table tid from new index tuple (1554,17) overlaps with invalid duplicate tuple at offset 10 of block 198 in index "index_translations_on_source"
2023-03-15 16:18:48.044 CET [5361] STATEMENT: UPDATE "translations" SET "target" = $1, "target_initial" = $2, "updated_at" = $3 WHERE "translations"."id" = $4
2023-03-15 16:22:16.455 CET [7214] ERROR: posting list tuple with 9 items cannot be split at offset 24
2023-03-15 16:22:16.455 CET [7214] STATEMENT: UPDATE "translations" SET "target" = $1, "target_initial" = $2, "updated_at" = $3 WHERE "translations"."id" = $4

As PostgreSQL seems to be complaining about missing columns, my first guess would be that the database migration didn’t quite succeed in updating the database schema. There are a couple of similar topics about this already; you might want to check this one, for instance: Upgrading from 5.2.1 to zammad 5.4

I read Upgrading from 5.2.1 to zammad 5.4 and ran the db migration again, as I think it is harmless to do so. It produced no messages, either good or bad. I am not sure where else to look: there is nothing in the PostgreSQL logs, and nothing that looks related in the Zammad log file.
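For reference, re-running the migrations on a package install goes through Zammad's command wrapper (this is the documented way on package installations; the status check is just a convenience and assumes the standard Rails rake task is available):

```shell
# Re-run any pending Rails migrations as the zammad user.
# Safe to repeat: migrations that already ran are skipped.
zammad run rake db:migrate

# List migration status; every line should read "up" after a
# successful upgrade, anything "down" indicates a failed migration.
zammad run rake db:migrate:status
```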

BTW, I don’t know what the exact impact of this problem is. As it is related to a translation table, from what I understand, it may be that I don’t use the affected data… At least the English and French versions look OK.

This morning I upgraded from zammad-5.4.0-1678779346.2fb8d15c.centos8.x86_64 to zammad-5.4.0-1679311416.157ee553.centos8.x86_64 and again got an error during the database upgrade, again during the translation update if I’m right:

Updating translations…

/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `exec_params': PG::IndexCorrupted: ERROR: table tid from new index tuple (1951,31) overlaps with invalid duplicate tuple at offset 15 of block 198 in index "index_translations_on_source" (ActiveRecord::StatementInvalid)
	from /opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `block (2 levels) in exec_no_cache'

In the PostgreSQL logs, I found:

2023-03-23 08:59:40.705 CET [1257963] ERROR: table tid from new index tuple (1951,31) overlaps with invalid duplicate tuple at offset 15 of block 198 in index "index_translations_on_source"
2023-03-23 08:59:40.705 CET [1257963] STATEMENT: UPDATE "translations" SET "target" = $1, "target_initial" = $2, "updated_at" = $3 WHERE "translations"."id" = $4

Does this sound like a known issue?

The IndexCorrupted error seems to hint at an internal PostgreSQL database issue with the index. This shouldn’t happen during normal operations. Did anything strange happen to the machine/VM? For example: crashes, abnormal shutdowns, hardware issues (disk/memory), etc?

Be sure to look at the database startup messages in your PostgreSQL logs, and see if there are warnings or notifications about improper shutdowns. For example, it should normally look like this (during a restart):

2023-03-23 12:51:42.891 CET [606] LOG:  received fast shutdown request
2023-03-23 12:51:42.943 CET [606] LOG:  aborting any active transactions
2023-03-23 12:51:42.945 CET [606] LOG:  background worker "logical replication launcher" (PID 653) exited with exit code 1
2023-03-23 12:51:42.945 CET [648] LOG:  shutting down
2023-03-23 12:51:43.026 CET [606] LOG:  database system is shut down
2023-03-23 12:51:43.168 CET [237185] LOG:  starting PostgreSQL 13.9 (Debian 13.9-0+deb11u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-03-23 12:51:43.169 CET [237185] LOG:  listening on IPv6 address "::1", port 5432
2023-03-23 12:51:43.169 CET [237185] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2023-03-23 12:51:43.178 CET [237185] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-03-23 12:51:43.201 CET [237186] LOG:  database system was shut down at 2023-03-23 12:51:43 CET
2023-03-23 12:51:43.240 CET [237185] LOG:  database system is ready to accept connections

If something is wrong (e.g. after an earlier crash of the system), you might see messages like these:

2023-03-23 12:56:04.735 CET [237589] LOG:  starting PostgreSQL 13.9 (Debian 13.9-0+deb11u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-03-23 12:56:04.736 CET [237589] LOG:  listening on IPv6 address "::1", port 5432
2023-03-23 12:56:04.736 CET [237589] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2023-03-23 12:56:04.744 CET [237589] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-03-23 12:56:04.763 CET [237590] LOG:  database system was interrupted; last known up at 2023-03-23 12:51:43 CET
2023-03-23 12:56:04.841 CET [237590] LOG:  database system was not properly shut down; automatic recovery in progress
2023-03-23 12:56:04.853 CET [237590] LOG:  redo starts at 0/FDBF4778
2023-03-23 12:56:04.854 CET [237590] LOG:  invalid record length at 0/FDBF8EA8: wanted 24, got 0
2023-03-23 12:56:04.854 CET [237590] LOG:  redo done at 0/FDBF8E70
2023-03-23 12:56:04.971 CET [237589] LOG:  database system is ready to accept connections

As for the problem with the index: you might want to try to reindex it, to try to repair it. Be sure to read the documentation on REINDEX first (and check the page for your specific version of PostgreSQL). You might want to do this outside of business hours, especially if you REINDEX the entire database due to possible larger corruption issues. In the latter case, you might also want to stop Zammad, to ensure no other connections are made to the database during maintenance.

To repair the single index PostgreSQL is complaining about, issue the following command while logged on to the Zammad PostgreSQL database with a superuser account:

First repeat the following: "I won't copy/paste commands blindly from the internet before fully understanding what they do"
Yes, really... Show me the command, I know what I'm doing
REINDEX INDEX index_translations_on_source;

To REINDEX the entire database, you might want to issue the following command. This can take a long time to complete on a large Zammad instance, will probably block certain operations, and may cause strange behaviour while agents are using the system. You’ll probably want to stop Zammad first.

Yes yes, show me the command already
REINDEX DATABASE zammad;
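Putting the steps above together, a maintenance-window sketch for a package install might look like this (service and database names assume the standard package setup; adapt them to your environment before running anything):

```shell
# Stop Zammad so nothing writes to the database during the reindex.
sudo systemctl stop zammad

# Rebuild only the index PostgreSQL complained about first; fall back
# to the whole database if more indexes turn out to be damaged.
sudo -u postgres psql zammad -c 'REINDEX INDEX index_translations_on_source;'
# sudo -u postgres psql zammad -c 'REINDEX DATABASE zammad;'

# Bring Zammad back up once the reindex has completed without errors.
sudo systemctl start zammad
```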

Thanks for the hints. I’ll try (after double-checking that I understand what I’m doing :grinning:). As the Zammad/PostgreSQL instance runs in a VM, we generally don’t do a proper shutdown when restarting the instance, and we have never seen this cause a problem with other PostgreSQL-based services, as PostgreSQL redoes the missing transactions…

My main concern is that I don’t know how to check that the problem has been cleared. Running the database upgrade command again does not raise any exception; the error only appears during upgrades…

I would like to urge you to stop doing that. Please do a clean shutdown/restart of the VM instead of hard resets. Hard resets can cause all sorts of subtle and not-so-subtle file corruption issues, which may take a long time to detect. It is basically playing Russian roulette with your filesystem and applications. You might be lucky and never encounter any (major) issues; you might also be extremely unlucky at some point.

I’m afraid I can’t give advice on that, maybe one of the kind Zammad folks can give some pointers on how to do integrity checking of your Zammad instance.
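On the PostgreSQL side (not Zammad-specific), one option for verifying an index without rebuilding it is the amcheck contrib extension. A minimal sketch, assuming PostgreSQL 10 or later with the contrib packages installed and superuser access:

```shell
# Verify the suspect B-tree index with the amcheck extension.
# bt_index_check only reads the index and raises an error if it
# finds structural corruption; it does not modify anything.
sudo -u postgres psql zammad <<'SQL'
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_check('index_translations_on_source');
SQL
```

If the check raises an error, the index is corrupt and needs a REINDEX; if it returns silently, the index structure is sound. PostgreSQL 14+ also ships a `pg_amcheck` command-line tool that can sweep all indexes in a database.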

Noted for the clean shutdown, even if I am not convinced it is the cause of the problem. At some point during my first attempt to upgrade to 5.4, I had to revert to 5.3.1 and upgrade again, and I wonder whether that is what caused the problem, even though the downgrade reported no error (in the first upgrade, Zammad was upgraded without access to the database, so it didn’t upgrade it).

I did the reindex on the index causing the error (I don’t want to do a full reindex if it isn’t needed, and it is the only error I have seen so far). I’d be interested to know whether there is a way to re-execute what the postinstall does for the translations, as it doesn’t seem to be what zammad run rake db:migrate does (that command produces no error).
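For what it's worth, the translation update the postinstall script runs is indeed separate from db:migrate. If I'm not mistaken, it can be triggered by hand through Zammad's Translation model via a Rails runner; this invocation is an assumption based on that model's API, so please verify it against your Zammad version before running it:

```shell
# Re-run the translation sync that the package postinstall performs.
# Translation.sync fetches the latest translation catalog and updates
# the translations table; run as the zammad user on a package install.
zammad run rails r 'Translation.sync'
```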

A side note: Zammad doesn’t support downgrading.
While it’s technically possible with enough Ruby knowledge and the correct commands, it may cause data loss or, even worse, inconsistent data. You should avoid those kinds of steps whenever possible.

This topic was automatically closed 360 days after the last reply. New replies are no longer allowed.