Elasticsearch forgets results from attachments over time

Infos:

  • Used Zammad version: 5.1.0
  • Used Zammad installation type: Package (new, no upgrades)
  • Used Elasticsearch version: 7.17.2
  • Operating system: Ubuntu 20.04.4
  • Browser + version: Any

Expected behavior:

  • Getting Elasticsearch results from attachments anytime

Actual behavior:

  • Elasticsearch ‘forgets’ results from attachments over time

Steps to reproduce the behavior:

  • attach a pdf
  • search for a string in this file and find it
  • wait a few hours
  • search for the same string again, the result disappeared
  • reindex elasticsearch, the result will be back for a few hours

Summary

Good day everyone,

Elasticsearch forgets results from attachments over time.
After a reindex the problem is gone for a few hours.

I’m not sure if this is a configuration issue somewhere. If so I was unable to find it. 8)

I tried to find the cause of the behaviour, but I can’t find any errors in the logs.
In the process I noticed these two issues: Elasticsearch Index reseted during ticket delete · Issue #2725 · zammad/zammad · GitHub and Removing a record drops whole index on Elasticsearch 6 and later · Issue #2742 · zammad/zammad · GitHub

This command brings the deleted KB records back; afterwards I can find my test strings from attached PDFs in Zammad again:
zammad run rails r 'p KnowledgeBase::Answer::Translation.search_index_reload'

However, if I compare the output of curl -X GET 'http://localhost:9200/_cat/indices?v' before and after, I can’t find much of a difference.
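A before/after comparison like this is easier to read when the two outputs are saved and diffed per index. The following is only a sketch: the function name compare_snapshots and the file names are made up for illustration, and it assumes the snapshots were taken with the h=index,docs.count,store.size and bytes=b parameters of the _cat/indices API. Note that if only the attachment content goes missing from otherwise intact documents, the document count may stay the same and only the size may move.

```shell
# Sketch: list indices whose document count or store size changed between
# two saved snapshots of the _cat/indices output.
#
# Assumed way to take a snapshot (file names are placeholders):
#   curl -s 'http://localhost:9200/_cat/indices?h=index,docs.count,store.size&bytes=b' \
#     | LC_ALL=C sort > before.txt
compare_snapshots() {
  # join pairs the lines of both files by index name (first column);
  # awk then prints only the indices where a value differs.
  LC_ALL=C join "$1" "$2" | awk '
    $2 != $4 || $3 != $5 {
      printf "%s: docs %s -> %s, bytes %s -> %s\n", $1, $2, $4, $3, $5
    }'
}
```

Usage would be compare_snapshots before.txt after.txt, with one snapshot taken right after a reindex and one taken after the results have disappeared.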

As only attachments are affected I tried to reinstall the elasticsearch-plugin ingest-attachment.
This worked fine too, but didn’t change the behavior.

Does anyone have an idea why some elasticsearch records get deleted over time?
Or am I on the wrong track and missing something else here?

Best,
Tom

Meanwhile I’ve tested this on another installation (package on Ubuntu, too).

The result is the same: Elasticsearch forgets attachments after a few hours, and a reindex always helps, but only for a few hours.

Here is what I mean, a search string which should match a pdf attachment but does not:

After rebuilding the index with zammad run rake searchindex:rebuild the result is back:

It’s odd, because no-one else seems to have seen this problem? :upside_down_face:

I couldn’t find the source of the problem on any Zammad instance; maybe time will help.
Until then I’ll use cron to work around this.

# elasticsearch attachments index workaround
7 */3  * * *   root    zammad run rake searchindex:rebuild > /dev/null

That cronjob is the shittiest “solution” I’ve seen in a while. No offense.
Also it affects the complete searchindex not just your knowledgebase which seems to be affected in your case.

I’ve searched for a fairly old attachment name in our support instance (ticket context) and couldn’t find it either. I then checked a not so old (2 weeks) attachment, same context, same issue.

Interestingly, on my 5.1 and 5.2 test installations with default setup the attachments are not indexed at all. What caught my eye here, index-wise, was this in the doc of a relevant ticket entry:

	"_ignored": [
		"article.attachment.content.keyword"
	],

So I’d guess it’s a bug. Please create a bug report on GitHub - zammad/zammad: Zammad is a web based open source helpdesk/customer support system and mention this thread.
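For anyone who wants to check their own instance: Elasticsearch records in _ignored the names of fields it skipped at index time, either because the value was malformed or because it exceeded the ignore_above limit of a keyword field. The field can be searched with a term query, so counting affected documents is possible. What follows is a rough sketch, not something taken from Zammad itself: the function name ignored_field_query is made up, and the index name zammad_production_ticket in the usage comment is an assumption about a default package installation. A hit only shows that the .keyword subfield was skipped; it does not prove on its own that the full-text attachment content is missing.

```shell
# Sketch: build a query body that matches documents where the given
# field name is listed in the _ignored metadata field.
ignored_field_query() {
  printf '{"query":{"term":{"_ignored":"%s"}}}\n' "$1"
}

# Assumed usage (index name is a guess, adjust to your installation):
#   ignored_field_query article.attachment.content.keyword \
#     | curl -s -H 'Content-Type: application/json' \
#         -X POST 'http://localhost:9200/zammad_production_ticket/_count' \
#         --data-binary @-
```

Running the count once right after a reindex and again after the search results have vanished would show whether the number of affected documents changes over time.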

It’s not a solution. It’s a workaround so agents can search in Zammad.

I’ve created an issue: Elasticsearch forgets results from attachments over time · Issue #4134 · zammad/zammad · GitHub
