AI OCR feature in Zammad 7 vs. Tesseract

Dear Zammad Team,

we are evaluating Zammad 7 in our test environment and found most of the AI features very useful.
Thank you very much for your effort to properly implement features where it makes sense.

One thing we noticed was the OCR feature using AI. While I think that AI has its pros, the amount of processing required (energy and resource consumption) is significant.

For our environment, using AI for simple OCR tasks (screenshots of logs or even error messages) seems unnecessary, even overengineered.
A simpler approach such as Tesseract could help obtain text data for AI summarization while keeping a low resource footprint.

Have you considered using Tesseract in the development process?

I do not want to see you guys removing the AI OCR feature but maybe consider offering an alternative.
Another approach could be to use Tesseract as the default and switch to AI if no text has been detected.
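
The fallback idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not how Zammad implements OCR: `ocr_with_fallback`, `min_chars`, and the two engine callables are hypothetical names. In practice the primary engine could wrap Tesseract (for example via `pytesseract.image_to_string`) and the fallback could wrap a vision LLM call.

```python
from typing import Callable, Tuple

# An OCR engine here is any callable that takes raw image bytes and returns text.
# Hypothetical interface: the primary could wrap Tesseract, the fallback a vision LLM.
OcrEngine = Callable[[bytes], str]


def ocr_with_fallback(
    image: bytes,
    primary: OcrEngine,
    fallback: OcrEngine,
    min_chars: int = 1,
) -> Tuple[str, str]:
    """Run the cheap engine first; use the expensive one only if too little text came back.

    Returns (text, engine_used), where engine_used is "primary" or "fallback".
    """
    text = primary(image).strip()
    if len(text) >= min_chars:
        return text, "primary"
    # The primary engine detected no (or too little) text, so pay for the fallback.
    return fallback(image).strip(), "fallback"


if __name__ == "__main__":
    # Stand-in engines for demonstration; no OCR library is needed to run this.
    cheap = lambda img: "" if img == b"photo" else "ERROR 500: upstream timeout"
    costly = lambda img: "text recovered by the vision model"

    print(ocr_with_fallback(b"screenshot", cheap, costly))
    print(ocr_with_fallback(b"photo", cheap, costly))
```

Raising `min_chars` would make the fallback trigger on near-empty results as well, at the cost of more LLM calls.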

Keen to know your thoughts about that.

Best regards,
Bernhard

For now we used a simple approach to add this support, because almost every newer LLM supports it directly these days, without the need to switch around.

It’s a similar situation to translation; I think there are multiple possibilities here as well.

In the end, the current OCR feature implementation is only the start. There are more ideas, so maybe we will also add more options for selecting the OCR mode (LLM vs. other OCR engines), but our direction would definitely be to have one way.

Thanks for your feedback!
I’m happy to hear that there are many ideas. Keen to see them in reality one day.

Have you successfully tested the OCR feature with any providers or vision LLM models? Which models did you try, and can you share your thoughts on the results?

On Zammad v7.1 I connected a local Gemma4 31B and Qwen3.6 35B, both of which are vision-capable. Unfortunately, the Zammad documentation is very sparse (Provider — Zammad Admin Documentation).
With OCR toggled on and all Ticket Summary services enabled, I tested summaries of a few emails with photos attached or inline (e.g. JPGs of objects and people), and after AI summary generation I do not see any mention of the photos. Next I need to try raster images of text documents.

@dominikklein
Does enabling “Recognize image text (OCR)” currently send all file attachments to the LLM? Or does Zammad restrict by file name extension (e.g. only jpg and png) and limit the number of file attachments? Including some additional information in the docs would help users set expectations and configurations properly.

Unless I am confused, I was surprised that Zammad appears to support only a single Provider configuration, and also prevents enabling more than one Model from a single Provider.

A simple alternative that Zammad may support in the future (i.e. the feature request "Support multiple AI LLM providers and/or multiple AI LLM model names") would be to allow configuring more than one provider and/or more than one model, and to specify which model is used for which type of inference. A simple way to keep the resource footprint low would be to use a focused model per task: a small purpose-built OCR vision model that excels at recognizing image text, a medium model for trivial rote tasks (categorization, tagging, etc.), and a larger model for more complex tasks such as writing assistance.
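
As a rough illustration of the per-task idea, a routing table could look like the sketch below. Everything in it is hypothetical: the task names, provider names, and model names are placeholders, and Zammad does not expose such a configuration as far as this thread states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTarget:
    provider: str  # hypothetical provider label, e.g. a local inference server
    model: str     # hypothetical model name


# Fallback used for any task without an explicit route.
DEFAULT_TARGET = ModelTarget(provider="local", model="general-medium")

# Hypothetical task-to-model routes: small model for OCR, larger model for writing.
ROUTES = {
    "ocr": ModelTarget(provider="local", model="ocr-small"),
    "categorization": ModelTarget(provider="local", model="general-medium"),
    "writing_assistance": ModelTarget(provider="local", model="general-large"),
}


def pick_model(task: str) -> ModelTarget:
    """Return the configured target for a task, or the default if none is set."""
    return ROUTES.get(task, DEFAULT_TARGET)


if __name__ == "__main__":
    for task in ("ocr", "writing_assistance", "summary"):
        print(task, "->", pick_model(task))
```

The point of the table is that the expensive model is only loaded or called for the tasks that need it.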

Another alternative you could evaluate is an LLM router proxy: you specify a single Provider in Zammad, and that provider uses a smart router LLM that analyzes your multimodal text and image context and routes it to the best model and engine for the actual processing. This would let you direct and control your resource usage and maybe keep a lower resource footprint.

Keep in mind that we have limited resources. There will be improvements in the future, including multi-provider configurations, deciding per feature, and so on.

For some features like OCR, you can already use a different LLM, at least within the same provider.

In the end there are a lot of possibilities, but we need to see what the best way is for us.

Yes, of course.

v7.1 is really excellent!

Thank you!

@chrisl

We’re currently using our own Ollama instance in the datacenter with lfm2:24b for text/generic/agentic tasks and richardyoung/olmocr2:7b-q8 for OCR.

Both models fit nicely in our AI server's memory. They are configured to be persistent and operate at acceptable speeds. OCR seems to work, but I have not yet checked whether the returned text is of good quality. I wonder if this is visible in the AI debug logs.
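
One way to inspect OCR quality independently of Zammad would be to send a test image straight to Ollama's `/api/generate` endpoint and read the returned text. The sketch below assumes a default local Ollama address and an illustrative prompt; neither reflects what Zammad actually sends. `build_ocr_request` and `run_ocr` are hypothetical helper names.

```python
import base64
import json
import urllib.request

# Assumption: Ollama listening on its default local port; adjust for your server.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model name taken from the post above; the prompt below is only an illustration.
OCR_MODEL = "richardyoung/olmocr2:7b-q8"


def build_ocr_request(image_bytes: bytes, model: str = OCR_MODEL) -> dict:
    """Build a non-streaming /api/generate payload with one base64-encoded image."""
    return {
        "model": model,
        "prompt": "Transcribe all text visible in this image.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def run_ocr(image_bytes: bytes, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the generated text."""
    body = json.dumps(build_ocr_request(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # Print the payload shape only; call run_ocr(...) with real image bytes
    # when an Ollama server is reachable.
    payload = build_ocr_request(b"fake image bytes")
    print(json.dumps({**payload, "images": ["<base64 omitted>"]}, indent=2))
```

Comparing the returned text against a screenshot with known content gives a quick quality check without digging through logs.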

Nice!

I’ve never tried LiquidAI models. I need to give lfm2:24b a spin and see how it goes. I want to run Zammad on some more ancient hardware, but for a proof of concept I did quick tests on a recent machine with Gemma4 12B and 31B (dense models). In a few tests with llamacpp on two RTX 30x0s and medium quants it was too slow (default thinking was enabled; I forgot to try disabling it, which would offer a speed-up). The Zammad UI showed long spinners before any output was displayed, and I believe human agents would be frustrated and find it pointless to wait for a summary when reading the actual ticket would be faster :)
Next I am trying Qwen3.6-35B-A3B-MTP-GGUF with thinking disabled. That’s a sparse MoE model like your lfm2:24b, and the interwebs appear to hype its capabilities for its relatively small size. I started with the tiny iq2 quant variant to see whether it outputs anything strange or silly because it’s so small; no problems thus far, and it’s really quick.
Both Gemma4 and Qwen3.6 support text and images with the same model.

Curious: is there any reason you decided to run lfm2:24b, and did you compare it against other similarly sized models running with Zammad, ranking them for effectiveness?

Btw, is there any particular reason you are running Ollama instead of llamacpp, since Ollama rides on top of the llamacpp engine? How were you able to set up Ollama with two different models and expose them both to Zammad as a single provider FQDN or ip:port? … Update: wow, I just learned that Ollama supports running multiple models spawned by a single service: ollama/docs/faq.mdx at main · ollama/ollama · GitHub