Zammad Search Index Rebuild Error: "ArgumentError: invalid byte sequence in UTF-8"

Infos:

  • Used Zammad version: 6.2.0-1705322325.8ff65214.bookworm
  • Used Zammad installation type: Package
  • Used Elasticsearch version: 7.17.16
  • Operating system: Debian 12
  • Browser + version: N/A

Expected behavior:

  • Search index rebuild is successful for Elasticsearch

Actual behavior:

  • Search index rebuild errors out at certain tickets with:
    ArgumentError: invalid byte sequence in UTF-8

Here’s a full example of the error, including the ticket it references:

Dropping indexes... done.
Deleting pipeline... done.
Creating indexes... done.
Creating pipeline... done.
Reloading data... 
  - Chat::Session...
    done in 0 seconds.
  - Cti::Log...   
    done in 0 seconds.
  - Group...
    done in 0 seconds.
  - KnowledgeBase::Answer::Translation...
    done in 0 seconds.
  - KnowledgeBase::Category::Translation...
    done in 0 seconds.
  - KnowledgeBase::Translation...
    done in 0 seconds.
  - Organization...
    done in 0 seconds.
  - StatsStore... 
    done in 0 seconds.
  - Ticket::Priority...
    done in 0 seconds.
  - Ticket::State...
    done in 0 seconds.
  - Ticket...
rake aborted!
Unable to send Ticket.find(240102).search_index_update_backend backend: #<ArgumentError: invalid byte sequence in UTF-8>
/opt/zammad/app/models/concerns/has_search_index_backend.rb:200:in `rescue in block in search_index_reload'
/opt/zammad/app/models/concerns/has_search_index_backend.rb:194:in `block in search_index_reload'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/parallel-1.23.0/lib/parallel.rb:627:in `call_with_index'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/parallel-1.23.0/lib/parallel.rb:394:in `block in work_direct'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/parallel-1.23.0/lib/parallel.rb:637:in `with_instrumentation'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/parallel-1.23.0/lib/parallel.rb:393:in `work_direct'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/parallel-1.23.0/lib/parallel.rb:285:in `map'
/opt/zammad/app/models/concerns/has_search_index_backend.rb:191:in `search_index_reload'
/opt/zammad/lib/tasks/zammad/search_index_es.rake:42:in `block (5 levels) in <main>'
/opt/zammad/lib/tasks/zammad/search_index_es.rake:41:in `block (4 levels) in <main>'
/opt/zammad/lib/tasks/zammad/search_index_es.rake:39:in `each'
/opt/zammad/lib/tasks/zammad/search_index_es.rake:39:in `block (3 levels) in <main>'
/opt/zammad/lib/tasks/zammad/search_index_es.rake:60:in `block (3 levels) in <main>'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/exe/rake:27:in `<top (required)>'
/opt/zammad/bin/bundle:121:in `load'
/opt/zammad/bin/bundle:121:in `<main>'

Caused by:
ArgumentError: invalid byte sequence in UTF-8
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.8/lib/active_support/core_ext/object/blank.rb:127:in `match?'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.8/lib/active_support/core_ext/object/blank.rb:127:in `blank?'
/opt/zammad/app/models/ticket/search_index.rb:86:in `search_index_attribute_lookup_file_oversized?'
/opt/zammad/app/models/ticket/search_index.rb:52:in `block (2 levels) in search_index_attribute_lookup'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.8/lib/active_record/relation/delegation.rb:88:in `each'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.8/lib/active_record/relation/delegation.rb:88:in `each'
/opt/zammad/app/models/ticket/search_index.rb:48:in `block in search_index_attribute_lookup'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.8/lib/active_record/relation/batches.rb:71:in `each'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.8/lib/active_record/relation/batches.rb:71:in `block in find_each'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.8/lib/active_record/relation/batches.rb:138:in `block in find_in_batches'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.8/lib/active_record/relation/batches.rb:245:in `block in in_batches'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.8/lib/active_record/relation/batches.rb:229:in `loop'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.8/lib/active_record/relation/batches.rb:229:in `in_batches'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.8/lib/active_record/relation/batches.rb:137:in `find_in_batches'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.8/lib/active_record/relation/batches.rb:70:in `find_each'
/opt/zammad/app/models/ticket/search_index.rb:22:in `each'
/opt/zammad/app/models/ticket/search_index.rb:22:in `search_index_attribute_lookup'
/opt/zammad/app/models/concerns/has_search_index_backend.rb:137:in `search_index_update_backend'
/opt/zammad/app/models/concerns/has_search_index_backend.rb:195:in `block in search_index_reload'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/parallel-1.23.0/lib/parallel.rb:627:in `call_with_index'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/parallel-1.23.0/lib/parallel.rb:394:in `block in work_direct'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/parallel-1.23.0/lib/parallel.rb:637:in `with_instrumentation'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/parallel-1.23.0/lib/parallel.rb:393:in `work_direct'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/parallel-1.23.0/lib/parallel.rb:285:in `map'
/opt/zammad/app/models/concerns/has_search_index_backend.rb:191:in `search_index_reload'
/opt/zammad/lib/tasks/zammad/search_index_es.rake:42:in `block (5 levels) in <main>'
/opt/zammad/lib/tasks/zammad/search_index_es.rake:41:in `block (4 levels) in <main>'
/opt/zammad/lib/tasks/zammad/search_index_es.rake:39:in `each'
/opt/zammad/lib/tasks/zammad/search_index_es.rake:39:in `block (3 levels) in <main>'
/opt/zammad/lib/tasks/zammad/search_index_es.rake:60:in `block (3 levels) in <main>'
/opt/zammad/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/exe/rake:27:in `<top (required)>'
/opt/zammad/bin/bundle:121:in `load'
/opt/zammad/bin/bundle:121:in `<main>'
Tasks: TOP => zammad:searchindex:rebuild
(See full trace by running task with --trace)

Inspecting the ticket doesn’t reveal any unusual characters. I even deleted one affected ticket that was non-essential, but the error keeps occurring on other tickets, so deleting them all isn’t practical.
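
Invalid bytes are often invisible when a ticket is viewed normally, so here is a minimal Ruby sketch for checking a string programmatically. The "Caused by" trace passes through search_index_attribute_lookup_file_oversized?, so the offending value may belong to an attachment rather than the visible ticket text, but that is only a guess from the trace. Both helper names are hypothetical and not part of Zammad; the sketch relies only on Ruby's built-in String#valid_encoding? and String#scrub, and the sample string is a stand-in, not real ticket data.

```ruby
# Hypothetical helpers for illustration; neither is part of Zammad.

# Return the byte offsets at which `str` is not valid UTF-8.
def invalid_utf8_offsets(str)
  text = str.dup.force_encoding(Encoding::UTF_8)
  return [] if text.valid_encoding?

  offsets = []
  position = 0
  # each_char yields undecodable bytes as separate, invalid "characters".
  text.each_char do |char|
    offsets << position unless char.valid_encoding?
    position += char.bytesize
  end
  offsets
end

# Return a copy with every invalid byte sequence replaced by `replacement`.
def scrubbed_copy(str, replacement = "?")
  str.dup.force_encoding(Encoding::UTF_8).scrub(replacement)
end

sample = "report \xFF.pdf" # stand-in for a damaged value, not real ticket data
puts invalid_utf8_offsets(sample).inspect
puts scrubbed_copy(sample)
```

Which fields to feed into such a check (article bodies, attachment filenames, and so on) is left open here, since the thread never establishes where the bad bytes actually live.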

Steps to reproduce the behavior:

  • Try to rebuild the search index using this command:
    zammad run rake zammad:searchindex:rebuild

All was working fine in previous versions of Zammad, but now I can’t rebuild the search index for Elasticsearch because of this error. Any help appreciated.

Nudging this issue - any help appreciated.

Further investigations:

☑ Checked the Postgres database; it is encoded in UTF8.
☑ Upgraded Elasticsearch 7.17.16 to 7.17.17; still an issue.
☑ Upgraded to Elasticsearch 8.12; still an issue.

I can’t rebuild my search index at all. I’ve tried everything I can think of, but no luck. Any ideas?

Maybe report it as a bug on GitHub?

Thanks, I’ve been trying a ton of ideas without compromising the security of the database. Finally, I had success by running:

zammad run rake zammad:searchindex:rebuild[8]

To my understanding, this makes the rebuild utilise all CPU cores on my machine. It took a long time but finally made it to the end! I have no idea why this works and I’m still confused about the whole UTF8 issue, but hopefully this helps someone in the future.
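
For anyone wanting to match the bracketed number to their own machine, here is a small Ruby sketch that builds the same command from the core count Ruby reports. It assumes, as in the post above, that the bracketed argument is a worker/core count; rebuild_command is a hypothetical helper name and the sketch only prints the command rather than running it.

```ruby
require "etc"

# Hypothetical helper: build the rebuild command for a given worker count.
# Assumes the bracketed task argument is a core/worker count, as the post suggests.
def rebuild_command(workers = Etc.nprocessors)
  unless workers.is_a?(Integer) && workers.positive?
    raise ArgumentError, "workers must be a positive integer"
  end

  "zammad run rake zammad:searchindex:rebuild[#{workers}]"
end

# Print the command for this machine's reported core count.
puts rebuild_command
```

Whether a higher worker count avoids the UTF-8 error or merely happened to coincide with a successful run is not something this thread settles.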
