This might be an edge case due to some weird form of email forwarding or allowlisting being done on a remote MX, because I know Zammad will usually log an error in a ticket when an e-mail fails to be delivered to a recipient, so the behaviour below is puzzling me.
Infos:
Used Zammad version: 5.3.1-1671611341.37743b14.focal
When Zammad sends notifications to all agents about new tickets, or ticket updates and encounters a specific SMTP error on one of the email-addresses, it logs an error message about the address and skips to the next one in the list of agents to try to notify.
Actual behavior:
It seems Zammad stops sending pending notifications entirely if a single agent e-mailaddress causes a ācallout verification failedā event on a remote MX. It seems to crash the channel with a fatal error. It does show up in the monitoring, but with little extra information.
The ticket history shows two notifications were sent, the last one in the list appears to be the one which crashed the driver (marked with an arrow). No error is displayed in either the ticket history or the ticket itself.
You need to have a couple of agents and at least one agent with an email-address which produces an SMTP error on a remote MX, for example, this list of agents:
SMTP log on relayhost (redacted). The relayhost receives an error from a remote MX, which is probably some sort of forwarding service for this specific agentās emailaddress (ZZZ). The XXX is our own SMTP relayhost, YYY is the remote MX:
Feb 17 12:10:29 relay.XXXXXXXX.XXX exim[20764]: 2023-02-17 12:10:29 H=(XXXXXXXXXX) [XXX.XXX.XXX.XXX] X=TLS1.2:ECDHE-RSA-AES256-GCM-SHA384:256 CV=no F=<helpdesk@example.com> A=login:XXXXXXXXX rejected RCPT <ZZZZZZZZ>: mail.YYYYYYY.YYY [YYY.YYY.YYY.YYY] : SMTP error from remote mail server after RCPT TO:<ZZZZZZZZ>: 550 5.1.0 <helpdesk@example.com>: Sender address rejected: User unknown in virtual mailbox table
Channels are self healing. As your notification channel is affected, the issue will persist as long as Zammad tries to send mail to the problem agent. As soon as you solve that (whatever it is) and the next email notification is being send, the channel automatically switches back to āOKā / green and will disappear from your monitoring.
The 550 is not a Zammad code and comes from your mail server and out of Zammadās control. The mail transactions (apart from that one affected) should still send normally as expected.
I understand that. What I donāt understand is that a single error on one of the agent email addresses causes a fatal error on the notifications sender, which in turn means no further pending notifications about a ticket-update are sent to the rest of the agents. The entire process is aborted. I would have expected the notifications sender to just skip to the next agent email address to notify, instead of aborting/crashing entirely.
No, they do not. That is what I would have expected too, whenever the affected agent is part of the group of agents to be notified of a ticket update, the notifications sender aborts all further pending notifications when it gets to the affected agent and receives an error from the mailserver.
I think Zammad treats the mailserver error as a generic fatal server error, while it is in fact just a fatal error on a single recipient.
That is impossible.
Zammad sends each agent mail one by one (as it does with every mail).
Are your other agents really not receiving any mail while having it configured�
If nobody receives mail, it would mean that every single mail Zammad tries to send via that channel is being rejected by either your relay or the receiving server.
Your production log might be gold in that scenario.
Code wise Iām very sure that your described behavior is not a thing, but as youāre saying youāre actually seeing it happen, the log would be the proof most likely.
That being said - your instance would be the first ever Iāve seen to behave like that.
Another side note⦠possibly have a look into your users preferences (User.find_by(email: āā).preferences) and check if thereās mail_delivery_failed: true set. If thatās the case, the affected user shouldnāt receive mail at all.
Trust me, Iām quite sure of this. Iāve even reproduced it for you to see for yourself (see further down below).
No. Itās not that nobody receives mail, itās that everybody receives email up until it reaches the agent with issues. The error causes the notifications sender to crash and it seems to abort the rest of the queue for pending notifications.
The log shows the same error as Iāve posted in the start of the topic.
I understand that, itās probably less likely that people will both have issues with their email account and have notifications enabled via email. I have tried to explain it in the following reproduction.
Reproduction
Reference starting point
A Zammad instance with multiple agents, some of whom have email notifications configured.
In the meantime during the start of this topic and this update, we have updated Zammad to version 5.3.1-1678258494.e38c655f.focal, and this test is done on that version.
When a new ticket is created by a customer sending an email to the helpdesk, the following happens regarding notifications:
The bart user is a fictional account, created for this test. It has deliberately been created with a letter early in the alphabet, as notifications seem to be sent in alphabetical order. The dennis account is my own Zammad useraccount.
Simulating the error conditions
Adjust the email address of one of the agents (the bart account in this example) to an address which doesnāt exist on an external mailserver. This user must have email notifications configured in order to trigger the error. Let the customer update the ticket by sending an email. The following happens regarding the notifications:
Anonymized Zammad production.log during the simulated error condition, from the start of the IMAP fetch of the email up to the error during the notifications sender:
I, [2023-03-16T12:54:08.928311 #1016278-112800] INFO -- : fetching imap (imap.example.com/username@example.com port=993,ssl=true,starttls=false,folder=INBOX,keep_on_server=false,auth_type=LOGIN)
I, [2023-03-16T12:54:09.111174 #1016278-112800] INFO -- : - message 1/1
I, [2023-03-16T12:54:09.200593 #1016278-112800] INFO -- : Process email with msgid '<12345678-1234-1234-1234-1234567890@gmail.com>'
I, [2023-03-16T12:54:09.219278 #1016278-112800] INFO -- : matching content 'subject' contains TESTTICKET
I, [2023-03-16T12:54:09.228619 #1016278-112800] INFO -- : set_attributes_by_x_headers header x-zammad-article-preferences found. Assign preferences={"send-auto-response"=>true, "is-auto-response"=>false}
I, [2023-03-16T12:54:09.368808 #1016278-112800] INFO -- : ended Channel.fetch took: 0.680957435 seconds.
I, [2023-03-16T12:54:10.134020 #523-15832760] INFO -- : Started GET "/api/v1/tickets/1330?full=true&_=1678967257026" for 2aaa:aaaa:aaaa:a:aaaa:aaaa:aaa:aaaa at 2023-03-16 12:54:10 +0100
I, [2023-03-16T12:54:10.138714 #523-15832760] INFO -- : Processing by TicketsController#show as JSON
I, [2023-03-16T12:54:10.138807 #523-15832760] INFO -- : Parameters: {"full"=>"true", "_"=>"1678967257026", "id"=>"1330"}
I, [2023-03-16T12:54:10.165504 #523-15832760] INFO -- : Completed 200 OK in 27ms (Views: 1.4ms | ActiveRecord: 5.2ms | Allocations: 14836)
I, [2023-03-16T12:54:10.509511 #523-15727360] INFO -- : Started GET "/api/v1/tickets/1330?all=true&auto_assign=true&_=1678967257027" for 2aaa:aaaa:aaaa:a:aaaa:aaaa:aaa:aaaa at 2023-03-16 12:54:10 +0100
I, [2023-03-16T12:54:10.513840 #523-15727360] INFO -- : Processing by TicketsController#show as JSON
I, [2023-03-16T12:54:10.513885 #523-15727360] INFO -- : Parameters: {"all"=>"true", "auto_assign"=>"true", "_"=>"1678967257027", "id"=>"1330"}
I, [2023-03-16T12:54:10.736109 #523-15727360] INFO -- : Completed 200 OK in 222ms (Views: 10.7ms | ActiveRecord: 14.1ms | Allocations: 169093)
I, [2023-03-16T12:54:11.078613 #523-15726760] INFO -- : Started POST "/api/v1/integration/smime" for 2aaa:aaaa:aaaa:a:aaaa:aaaa:aaa:aaaa at 2023-03-16 12:54:11 +0100
I, [2023-03-16T12:54:11.084817 #523-15726760] INFO -- : Processing by Integration::SMIMEController#search as JSON
I, [2023-03-16T12:54:11.084892 #523-15726760] INFO -- : Parameters: {"ticket"=>{"group_id"=>"1", "owner_id"=>"", "state_id"=>"1", "priority_id"=>"2", "xxxxxxxxx"=>"", "xxxxxxxxxx"=>"", "xxxxxxxxx"=>"", "xxxxxxxxx"=>"", "xxxxxxxxxxxxx"=>[]}, "article"=>{"body"=>"[FILTERED]", "type"=>"note", "internal"=>true, "form_id"=>"967544056", "shared_draft_id"=>"", "subtype"=>"", "in_reply_to"=>"", "to"=>"", "cc"=>"", "subject"=>""}}
I, [2023-03-16T12:54:11.097857 #523-15726760] INFO -- : Completed 200 OK in 13ms (Views: 0.2ms | ActiveRecord: 2.9ms | Allocations: 6424)
I, [2023-03-16T12:54:11.421352 #1016278-112020] INFO -- : ProcessScheduledJobs running...
I, [2023-03-16T12:54:11.423485 #1016278-112020] INFO -- : Running job thread for 'Process ticket escalations.' (Ticket.process_escalation) status is: sleep
I, [2023-03-16T12:54:11.423704 #1016278-112020] INFO -- : Running job thread for 'Generate 'Session' data.' (Sessions.jobs) status is: sleep
I, [2023-03-16T12:54:11.423730 #1016278-112020] INFO -- : Running job thread for 'Check channels.' (Channel.fetch) status is: sleep
I, [2023-03-16T12:54:11.423748 #1016278-112020] INFO -- : Running job thread for 'Check 'Channel' streams.' (Channel.stream) status is: sleep
I, [2023-03-16T12:54:11.424014 #1016278-112020] INFO -- : Running job thread for 'Execute planned jobs.' (Job.run) status is: sleep
I, [2023-03-16T12:54:12.277405 #1016278-111980] INFO -- : 2023-03-16T12:54:12+0100: [Worker(host:helpdesk.example.com pid:1016278)] Job TransactionJob [fd8b3789-0879-4b59-9661-3d0651eae6ba] from DelayedJob(default) with arguments: [{"object"=>"Ticket", "object_id"=>1330, "user_id"=>180, "created_at"=>{"_aj_serialized"=>"ActiveJob::Serializers::TimeWithZoneSerializer", "value"=>"2023-03-16T11:54:09.248224974Z"}, "type"=>"update", "article_id"=>4183, "_aj_symbol_keys"=>["object", "object_id", "user_id", "created_at", "type", "article_id"]}, {"interface_handle"=>"scheduler", "disable"=>[], "_aj_symbol_keys"=>["interface_handle", "disable"]}] (id=3320203) (queue=default) RUNNING
I, [2023-03-16T12:54:12.891533 #523-15601600] INFO -- : Started PUT "/api/v1/taskbar/4856" for 2aaa:aaaa:aaaa:a:aaaa:aaaa:aaa:aaaa at 2023-03-16 12:54:12 +0100
I, [2023-03-16T12:54:12.896534 #523-15601600] INFO -- : Processing by TaskbarController#update as JSON
I, [2023-03-16T12:54:12.896617 #523-15601600] INFO -- : Parameters: {"key"=>"Ticket-1330", "client_id"=>"123", "callback"=>"TicketZoom", "state"=>{"ticket"=>{}, "article"=>{"form_id"=>"967544056"}}, "params"=>{"ticket_id"=>"1330", "article_id"=>"4182", "shown"=>true}, "prio"=>7, "notify"=>true, "active"=>true, "attachments"=>[], "updated_at"=>"2023-03-16T11:51:53.513Z", "id"=>"4856"}
I, [2023-03-16T12:54:12.918715 #523-15601600] INFO -- : Completed 200 OK in 22ms (Views: 0.3ms | ActiveRecord: 3.1ms | Allocations: 11602)
E, [2023-03-16T12:54:13.939089 #1016278-111980] ERROR -- : Can't use Channel::Driver::Smtp: #<Net::SMTPFatalError: 550-Callout verification failed:
>
E, [2023-03-16T12:54:13.939312 #1016278-111980] ERROR -- : 550-Callout verification failed:
(Net::SMTPFatalError)
app/models/channel/driver/smtp.rb:93:in `send'
app/models/channel.rb:257:in `deliver'
lib/notification_factory/mailer.rb:170:in `send'
lib/notification_factory/mailer.rb:223:in `notification'
app/models/transaction/notification.rb:216:in `block in perform'
app/models/transaction/notification.rb:105:in `each'
app/models/transaction/notification.rb:105:in `perform'
lib/transaction_dispatcher.rb:61:in `execute_single_backend'
app/jobs/transaction_job.rb:25:in `block in perform'
app/jobs/transaction_job.rb:21:in `perform'
lib/background_services/service/process_delayed_jobs.rb:25:in `block (2 levels) in launch'
lib/background_services/service/process_delayed_jobs.rb:23:in `block in launch'
lib/background_services/service/process_delayed_jobs.rb:20:in `loop'
lib/background_services/service/process_delayed_jobs.rb:20:in `launch'
lib/background_services/service.rb:27:in `block in run'
lib/application_handle_info.rb:19:in `use'
lib/background_services/service.rb:33:in `block in run_in_service_context'
lib/background_services/service.rb:32:in `run_in_service_context'
lib/background_services/service.rb:26:in `run'
lib/background_services.rb:67:in `block in start_as_thread'
E, [2023-03-16T12:54:13.944715 #1016278-111980] ERROR -- : Can't use Channel::Driver::Smtp: #<Net::SMTPFatalError: 550-Callout verification failed:
> (RuntimeError)
app/models/channel.rb:270:in `rescue in deliver'
app/models/channel.rb:245:in `deliver'
lib/notification_factory/mailer.rb:170:in `send'
lib/notification_factory/mailer.rb:223:in `notification'
app/models/transaction/notification.rb:216:in `block in perform'
app/models/transaction/notification.rb:105:in `each'
app/models/transaction/notification.rb:105:in `perform'
lib/transaction_dispatcher.rb:61:in `execute_single_backend'
app/jobs/transaction_job.rb:25:in `block in perform'
app/jobs/transaction_job.rb:21:in `perform'
lib/background_services/service/process_delayed_jobs.rb:25:in `block (2 levels) in launch'
lib/background_services/service/process_delayed_jobs.rb:23:in `block in launch'
lib/background_services/service/process_delayed_jobs.rb:20:in `loop'
lib/background_services/service/process_delayed_jobs.rb:20:in `launch'
lib/background_services/service.rb:27:in `block in run'
lib/application_handle_info.rb:19:in `use'
lib/background_services/service.rb:33:in `block in run_in_service_context'
lib/background_services/service.rb:32:in `run_in_service_context'
lib/background_services/service.rb:26:in `run'
lib/background_services.rb:67:in `block in start_as_thread'
I, [2023-03-16T12:54:13.946244 #1016278-111980] INFO -- : 2023-03-16T12:54:13+0100: [Worker(host:helpdesk.example.com pid:1016278)] Job TransactionJob [fd8b3789-0879-4b59-9661-3d0651eae6ba] from DelayedJob(default) with arguments: [{"object"=>"Ticket", "object_id"=>1330, "user_id"=>180, "created_at"=>{"_aj_serialized"=>"ActiveJob::Serializers::TimeWithZoneSerializer", "value"=>"2023-03-16T11:54:09.248224974Z"}, "type"=>"update", "article_id"=>4183, "_aj_symbol_keys"=>["object", "object_id", "user_id", "created_at", "type", "article_id"]}, {"interface_handle"=>"scheduler", "disable"=>[], "_aj_symbol_keys"=>["interface_handle", "disable"]}] (id=3320203) (queue=default) COMPLETED after 1.6687
I, [2023-03-16T12:54:13.947954 #1016278-111980] INFO -- : 2023-03-16T12:54:13+0100: [Worker(host:helpdesk.example.com pid:1016278)] Job TicketUserTicketCounterJob [5bb25695-68da-4cc8-af7c-49fa3555cd17] from DelayedJob(default) with arguments: [180, 180] (id=3320204) (queue=default) RUNNING
Oh, that appears to be the exact issue indeed. I searched the forum for this problem before posting it, after not finding relevant results, but I didnāt check the open issues on GitHub, sorry about that