[solved] Need help: Zammad web interface not working, or very slow

Used Zammad version: 2.9 (package)

  • Operating system: Ubuntu 18.04, CentOS 7
  • Browser + version: Chrome
  • Customers: ~500
  • Agents: ~100 active

Good afternoon.
Previously, the system ran on CentOS 7 and worked stably with 100 agents. On Friday, 170 agents were added via import, but those staff had not yet logged in. Immediately after this, Zammad began to hang.
I deployed a new Ubuntu 18.04 VM and restored from backup, but it didn't help. At night the system is stable; in the morning, when agents start working actively, everything dies completely.
12 cores
16gb ram

You are running 100 concurrent agents on 12 cores with 16 GB of RAM, alongside Elasticsearch and PostgreSQL, and claim it was running stably?

I'm having a hard time believing that, to be honest.

However, what exactly changed during the import? Did you start using organizations during the import?
How are your Delayed Jobs looking?

zammad run rails r 'p Delayed::Job.count'
zammad run rails r 'p Delayed::Job.first'
zammad run rails r 'p Delayed::Job.last'
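To see whether the queue is draining or growing over time, the count check above can be run in a loop (a hypothetical sketch; the 60-second interval is an arbitrary choice, and it assumes the `zammad` wrapper is on the PATH):

```shell
# Poll the delayed job queue size once a minute.
# A steadily rising number suggests the scheduler/worker is stuck;
# a falling number means the backlog is being worked off.
while true; do
  zammad run rails r 'p Delayed::Job.count'
  sleep 60
done
```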

What else changed during the import, and how exactly did you import what?

root@zammad:~# zammad run rails r 'p Delayed::Job.count'
284
root@zammad:~# zammad run rails r 'p Delayed::Job.first'
#<Delayed::Backend::ActiveRecord::Job id: 3626504, priority: 0, attempts: 0, handler: "--- !ruby/object:ActiveJob::QueueAdapters::Delayed…", last_error: nil, run_at: "2019-03-06 08:37:28", locked_at: "2019-03-06 09:10:37", failed_at: nil, locked_by: "host:zammad pid:1070", queue: "default", created_at: "2019-03-06 08:37:28", updated_at: "2019-03-06 08:37:28">
root@zammad:~# zammad run rails r 'p Delayed::Job.last'
#<Delayed::Backend::ActiveRecord::Job id: 3626790, priority: 0, attempts: 0, handler: "--- !ruby/object:ActiveJob::QueueAdapters::Delayed…", last_error: nil, run_at: "2019-03-06 09:10:49", locked_at: nil, failed_at: nil, locked_by: nil, queue: "default", created_at: "2019-03-06 09:10:49", updated_at: "2019-03-06 09:10:49">
root@zammad:~#

During the import everything went well, but an hour later it stopped working!
RAM: 24 GB
(htop screenshot)

Can I ask you to join us?

Import example:

id,login,firstname,lastname,email,web,phone,fax,mobile,department,street,zip,city,country,address,vip,verified,active,note,last_login,out_of_office,out_of_office_start_at,out_of_office_end_at,roles,organization
,ARKHANGELSK1@A???R.COM,Анна,Коломийцева,ARKHANGELSK1@A???R.COM,false,false,true,false,Сотрудник Архангельск,A???R
,ARKHANGELSK2@A???R.COM,Анастасия,Казакова,ARKHANGELSK2@A???R.COM,false,false,true,false,Сотрудник Архангельск,A???R
,ARKHANGELSK3@A???R.COM,Наталья,Дружинина,ARKHANGELSK3@A???R.COM,false,false,true,false,Сотрудник Архангельск,A???R

SOLVED

mount -t tmpfs -o size=8g tmpfs /opt/zammad/tmp
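A `mount` command like the one above does not survive a reboot. If the tmpfs really is the fix, an `/etc/fstab` entry along these lines would make it persistent (the size and mount point simply mirror the command above; adjust as needed):

```shell
# /etc/fstab — keep Zammad's tmp directory on tmpfs across reboots
tmpfs  /opt/zammad/tmp  tmpfs  size=8g  0  0
```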

root@zammad:~# zammad run rails r 'p Delayed::Job.count'
997
root@zammad:~# zammad run rails r 'p Delayed::Job.first'
#<Delayed::Backend::ActiveRecord::Job id: 3627555, priority: 0, attempts: 0, handler: "--- !ruby/object:ActiveJob::QueueAdapters::Delayed…", last_error: nil, run_at: "2019-03-06 09:56:54", locked_at: "2019-03-06 10:20:54", failed_at: nil, locked_by: "host:zammad pid:29841", queue: "default", created_at: "2019-03-06 09:56:54", updated_at: "2019-03-06 09:56:54">
root@zammad:~# zammad run rails r 'p Delayed::Job.last'
#<Delayed::Backend::ActiveRecord::Job id: 3628566, priority: 0, attempts: 0, handler: "--- !ruby/object:ActiveJob::QueueAdapters::Delayed…", last_error: nil, run_at: "2019-03-06 10:21:07", locked_at: nil, failed_at: nil, locked_by: nil, queue: "default", created_at: "2019-03-06 10:21:07", updated_at: "2019-03-06 10:21:07">

Did you upgrade Zammad from e.g. 2.8 or earlier to 2.9 as well?
If your Delayed::Job.count keeps rising, please check whether your Elasticsearch is up and running (your production log might give hints as well).

If you upgraded, you might want to remove the first delayed job.
The following might appear in one of your scheduler_err.log files:

Delayed::DeserializationError: Job failed to load: undefined class/module Observer::Ticket::UserTicketCounter::BackgroundJob. Handler: "--- !ruby/object:Observer::Ticket::UserTicketCounter::BackgroundJob\ncustomer_id: 154\nupdated_by_id: 113\n"
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/delayed_job-4.1.3/lib/delayed/backend/base.rb:74:in `rescue in payload_object'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/delayed_job-4.1.3/lib/delayed/backend/base.rb:71:in `payload_object'
/opt/zammad/app/models/scheduler.rb:167:in `cleanup_delayed'
/opt/zammad/app/models/scheduler.rb:143:in `block (3 levels) in cleanup_delayed_jobs'
/opt/zammad/lib/mixin/start_finish_logger.rb:7:in `log_start_finish'
/opt/zammad/app/models/scheduler.rb:142:in `block (2 levels) in cleanup_delayed_jobs'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/activerecord-5.1.6.1/lib/active_record/relation/delegation.rb:39:in `each'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/activerecord-5.1.6.1/lib/active_record/relation/delegation.rb:39:in `each'
/opt/zammad/app/models/scheduler.rb:141:in `block in cleanup_delayed_jobs'
/opt/zammad/lib/mixin/start_finish_logger.rb:7:in `log_start_finish'
/opt/zammad/app/models/scheduler.rb:139:in `cleanup_delayed_jobs'
/opt/zammad/app/models/scheduler.rb:126:in `cleanup'
/opt/zammad/app/models/scheduler.rb:23:in `threads'
script/scheduler.rb:66:in `block in <top (required)>'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons/application.rb:270:in `block in start_proc'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons/application.rb:279:in `start_proc'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons/application.rb:300:in `start'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons/controller.rb:56:in `run'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons.rb:197:in `block in run_proc'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons/cmdline.rb:92:in `catch_exceptions'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons.rb:196:in `run_proc'
script/scheduler.rb:54:in `<top (required)>'
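If a stale, undeserializable job like the one in the trace above is what is blocking the queue, removing it could look like this hypothetical two-step (inspect first, then delete, since `destroy` is irreversible):

```shell
# Confirm that the first job really is the leftover from the old class …
zammad run rails r 'p Delayed::Job.first'
# … and only then remove it (destructive — double-check the id and handler)
zammad run rails r 'Delayed::Job.first.destroy'
```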

Please note that I cannot provide support via SSH, phone, mail or private messages, as this would be unfair to the community. However, we can offer commercial services to you if you want.

I have already forwarded your proposal; I am waiting for approval from management.

After the mount, Delayed::Job.count keeps rising; it is now at 5000.
scheduler_err.log is empty.
https://yadi.sk/d/tIRc0pMuhIfN-g is the production log. Please take a look.

Solved

zammad run rake jobs:work

Do I need to add this to crontab? If I stop it, Delayed::Job.count keeps rising.

Now:

  • channel is active but not fetched for about 1 hour (16 identical warnings)
  • channel is active but not fetched for about 2 hours (6 identical warnings)

Thanks for the logfile, that definitely helps.
First of all: please restart your Zammad service.

You’re experiencing several problems - the biggest one might be your database server refusing connections:

I, [2019-03-06T11:52:17.803120 #109335-47095824027420]  INFO -- : Completed 422 Unprocessable Entity in 9210ms (Views: 0.3ms | ActiveRecord: 8713.1ms)
E, [2019-03-06T11:52:17.804911 #109335-69873216173500] ERROR -- : PG::UnableToSend: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

Regarding this, you might want to raise max_connections inside the PostgreSQL config; something like 250 or higher should be good enough. You might also want to restart your PostgreSQL service before your Zammad service, as it might also be that your operating system is killing processes because of insufficient memory.
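The max_connections change could look like this in postgresql.conf (250 is just the lower bound suggested above; each connection costs memory, so make sure the host can afford it):

```shell
# postgresql.conf — allow more concurrent client connections
# (raise only as far as available RAM permits)
max_connections = 250
```

Remember that PostgreSQL must be restarted for this setting to take effect.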

The following error actually seems to prove me right:
E, [2019-03-06T12:50:32.683663 #10289-69916700177060] ERROR -- : thread_client 47182414241020 exited with error #<PG::ConnectionBad: FATAL: remaining connection slots are reserved for non-replication superuser connections

Your second problem was Elasticsearch. I've seen at the end of your log that you seem to have fixed that one, so it should be fine.

E, [2019-03-06T11:53:23.019631 #1070-47204663792540] ERROR -- : Error performing SearchIndexJob (Job ID: 632215e3-1b05-4831-8059-d9f5cdba2e45) from DelayedJob(default) in 24.04ms: RuntimeError (Unable to process POST request to elasticsearch URL 'http://localhost:9200/zammad.localdomain_zammad_production/User/243?pipeline=zammad-attachment'. Elasticsearch is not reachable, probably because it's not running or even installed.

After the Elasticsearch errors, your database issue appeared again (you really should check that). The database server is also the reason the scheduler can't run: Zammad needs a working database connection, so fetching your mail accounts did not happen:

E, [2019-03-06T12:31:50.033255 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 1) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:31:50.033540 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:31:52.034510 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 2) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:31:52.034782 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:31:55.035760 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 3) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:31:55.036114 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:31:59.037101 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 4) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:31:59.037386 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:32:04.038205 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 5) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:32:04.038451 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:32:10.039367 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 6) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:32:10.039652 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:32:17.040612 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 7) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:32:17.040907 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:32:25.041713 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 8) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:32:25.041948 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:32:34.042734 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 9) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>

If you fix the above errors, Zammad should recover and the Delayed::Job.count should decrease over time. You should be good to go then. :slight_smile:

Good day. At the moment everything has returned to normal. There are about 1000 concurrent connections, the system works consistently and correctly, and Delayed::Job.count is no more than 1. Thank you for your support. I have a number of further questions: should I ask them within this topic, or is it better to create new ones?


For overview reasons it would be a good idea to use different threads for different questions/issues. :slight_smile:

Glad you could fix it!

This topic was automatically closed 120 days after the last reply. New replies are no longer allowed.