[solved] Need help: Zammad web interface not working, or very slow

Used Zammad version: 2.9 (package)

  • Operating system: Ubuntu 18.04, CentOS 7
  • Browser + version: Chrome
  • Customers: ~500
  • Agents: ~100 active

Good afternoon.
Previously, the system ran on CentOS 7 and worked stably with 100 agents. On Friday, 170 agents were added via import, but those staff had not yet logged in. Immediately after this, Zammad began to hang.
I deployed a new Ubuntu 18.04 VM and restored from backup, but it didn't help. At night the system is stable; in the morning, when agents start working actively, everything dies completely.
12 cores
16gb ram

You are running 100 concurrent agents on 12 cores with 16 GB of RAM, alongside Elasticsearch and PostgreSQL, and claim it was running stably?

I'm having a hard time believing that, to be honest.

However, what exactly changed during the import? Did you start using organizations during the import?
How are your Delayed Jobs looking?

zammad run rails r 'p Delayed::Job.count'
zammad run rails r 'p Delayed::Job.first'
zammad run rails r 'p Delayed::Job.last'
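To see whether the queue is draining or growing over time, the count check above can be run in a loop (a hypothetical sketch; the 60-second interval is an arbitrary choice, and it assumes the `zammad` wrapper is on the PATH):

```shell
# Poll the delayed job queue size once a minute.
# A steadily rising number suggests the scheduler/worker is stuck;
# a falling number means the backlog is being worked off.
while true; do
  zammad run rails r 'p Delayed::Job.count'
  sleep 60
done
```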

What else changed during the import, and how exactly did you import what?

root@zammad:~# zammad run rails r 'p Delayed::Job.count'
284
root@zammad:~# zammad run rails r 'p Delayed::Job.first'
#<Delayed::Backend::ActiveRecord::Job id: 3626504, priority: 0, attempts: 0, handler: "--- !ruby/object:ActiveJob::QueueAdapters::Delayed…", last_error: nil, run_at: "2019-03-06 08:37:28", locked_at: "2019-03-06 09:10:37", failed_at: nil, locked_by: "host:zammad pid:1070", queue: "default", created_at: "2019-03-06 08:37:28", updated_at: "2019-03-06 08:37:28">
root@zammad:~# zammad run rails r 'p Delayed::Job.last'
#<Delayed::Backend::ActiveRecord::Job id: 3626790, priority: 0, attempts: 0, handler: "--- !ruby/object:ActiveJob::QueueAdapters::Delayed…", last_error: nil, run_at: "2019-03-06 09:10:49", locked_at: nil, failed_at: nil, locked_by: nil, queue: "default", created_at: "2019-03-06 09:10:49", updated_at: "2019-03-06 09:10:49">
root@zammad:~#

During the import everything went well, but an hour later it stopped working!
RAM: 24 GB
(htop screenshot)

Can I ask you to join us?

Import example:

id,login,firstname,lastname,email,web,phone,fax,mobile,department,street,zip,city,country,address,vip,verified,active,note,last_login,out_of_office,out_of_office_start_at,out_of_office_end_at,roles,organization
,ARKHANGELSK1@A???R.COM,Анна,Коломийцева,ARKHANGELSK1@A???R.COM,false,false,true,false,Сотрудник Архангельск,A???R
,ARKHANGELSK2@A???R.COM,Анастасия,Казакова,ARKHANGELSK2@A???R.COM,false,false,true,false,Сотрудник Архангельск,A???R
,ARKHANGELSK3@A???R.COM,Наталья,Дружинина,ARKHANGELSK3@A???R.COM,false,false,true,false,Сотрудник Архангельск,A???R

SOLVED

mount -t tmpfs -o size=8g tmpfs /opt/zammad/tmp
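A `mount` command like the one above does not survive a reboot. If the tmpfs really is the fix, an `/etc/fstab` entry along these lines would make it persistent (the size and mount point simply mirror the command above; adjust as needed):

```shell
# /etc/fstab — keep Zammad's tmp directory on tmpfs across reboots
tmpfs  /opt/zammad/tmp  tmpfs  size=8g  0  0
```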

root@zammad:~# zammad run rails r 'p Delayed::Job.count'
997
root@zammad:~# zammad run rails r 'p Delayed::Job.first'
#<Delayed::Backend::ActiveRecord::Job id: 3627555, priority: 0, attempts: 0, handler: "--- !ruby/object:ActiveJob::QueueAdapters::Delayed…", last_error: nil, run_at: "2019-03-06 09:56:54", locked_at: "2019-03-06 10:20:54", failed_at: nil, locked_by: "host:zammad pid:29841", queue: "default", created_at: "2019-03-06 09:56:54", updated_at: "2019-03-06 09:56:54">
root@zammad:~# zammad run rails r 'p Delayed::Job.last'
#<Delayed::Backend::ActiveRecord::Job id: 3628566, priority: 0, attempts: 0, handler: "--- !ruby/object:ActiveJob::QueueAdapters::Delayed…", last_error: nil, run_at: "2019-03-06 10:21:07", locked_at: nil, failed_at: nil, locked_by: nil, queue: "default", created_at: "2019-03-06 10:21:07", updated_at: "2019-03-06 10:21:07">

Did you upgrade Zammad from e.g. 2.8 or earlier to 2.9 as well?
If your Delayed::Job.count keeps rising, please check whether your Elasticsearch is up and running (your production log might give hints as well).

If you upgraded, you might want to remove the first delayed job.
The following might appear in one of your scheduler_err.log files:

Delayed::DeserializationError: Job failed to load: undefined class/module Observer::Ticket::UserTicketCounter::BackgroundJob. Handler: "--- !ruby/object:Observer::Ticket::UserTicketCounter::BackgroundJob\ncustomer_id: 154\nupdated_by_id: 113\n"
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/delayed_job-4.1.3/lib/delayed/backend/base.rb:74:in `rescue in payload_object'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/delayed_job-4.1.3/lib/delayed/backend/base.rb:71:in `payload_object'
/opt/zammad/app/models/scheduler.rb:167:in `cleanup_delayed'
/opt/zammad/app/models/scheduler.rb:143:in `block (3 levels) in cleanup_delayed_jobs'
/opt/zammad/lib/mixin/start_finish_logger.rb:7:in `log_start_finish'
/opt/zammad/app/models/scheduler.rb:142:in `block (2 levels) in cleanup_delayed_jobs'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/activerecord-5.1.6.1/lib/active_record/relation/delegation.rb:39:in `each'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/activerecord-5.1.6.1/lib/active_record/relation/delegation.rb:39:in `each'
/opt/zammad/app/models/scheduler.rb:141:in `block in cleanup_delayed_jobs'
/opt/zammad/lib/mixin/start_finish_logger.rb:7:in `log_start_finish'
/opt/zammad/app/models/scheduler.rb:139:in `cleanup_delayed_jobs'
/opt/zammad/app/models/scheduler.rb:126:in `cleanup'
/opt/zammad/app/models/scheduler.rb:23:in `threads'
script/scheduler.rb:66:in `block in <top (required)>'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons/application.rb:270:in `block in start_proc'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons/application.rb:279:in `start_proc'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons/application.rb:300:in `start'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons/controller.rb:56:in `run'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons.rb:197:in `block in run_proc'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons/cmdline.rb:92:in `catch_exceptions'
/opt/zammad/vendor/bundle/ruby/2.4.0/gems/daemons-1.2.5/lib/daemons.rb:196:in `run_proc'
script/scheduler.rb:54:in `<top (required)>'
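If a stale, undeserializable job like the one in the trace above is what is blocking the queue, removing it could look like this hypothetical two-step (inspect first, then delete, since `destroy` is irreversible):

```shell
# Confirm that the first job really is the leftover from the old class …
zammad run rails r 'p Delayed::Job.first'
# … and only then remove it (destructive — double-check the id and handler)
zammad run rails r 'Delayed::Job.first.destroy'
```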

Please note that I cannot provide support via SSH, phone, mail or private messages, as this would be unfair to the community. However, we can offer commercial services to you if you want.

I have already forwarded your proposal; I am waiting for approval from management.

After the mount, Delayed::Job.count keeps rising; it is now at 5000.
scheduler_err.log is empty.
https://yadi.sk/d/tIRc0pMuhIfN-g is the production log. Please take a look.

Solved

zammad run rake jobs:work

Do I need to add this to crontab? If I stop it, Delayed::Job.count keeps rising.

Now:

  • channel is active but not fetched for about 1 hour (16 identical warnings)
  • channel is active but not fetched for about 2 hours (6 identical warnings)

Thanks for the logfile, that definitely helps.
First of all: please restart your Zammad service.

You’re experiencing several problems - the biggest one might be your database server refusing connections:

I, [2019-03-06T11:52:17.803120 #109335-47095824027420]  INFO -- : Completed 422 Unprocessable Entity in 9210ms (Views: 0.3ms | ActiveRecord: 8713.1ms)
E, [2019-03-06T11:52:17.804911 #109335-69873216173500] ERROR -- : PG::UnableToSend: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

Regarding this, you might want to raise max_connections inside the PostgreSQL config; something like 250 or higher should be good enough. You might also want to restart your PostgreSQL service before your Zammad service, as it might also be that your operating system is killing processes because of insufficient memory.
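The max_connections change could look like this in postgresql.conf (250 is just the lower bound suggested above; each connection costs memory, so make sure the host can afford it):

```shell
# postgresql.conf — allow more concurrent client connections
# (raise only as far as available RAM permits)
max_connections = 250
```

Remember that PostgreSQL must be restarted for this setting to take effect.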

The following error actually seems to prove me right:
E, [2019-03-06T12:50:32.683663 #10289-69916700177060] ERROR -- : thread_client 47182414241020 exited with error #<PG::ConnectionBad: FATAL: remaining connection slots are reserved for non-replication superuser connections

Your second problem was Elasticsearch. I've seen at the end of your log that you seem to have fixed that one, so it should be fine.

E, [2019-03-06T11:53:23.019631 #1070-47204663792540] ERROR -- : Error performing SearchIndexJob (Job ID: 632215e3-1b05-4831-8059-d9f5cdba2e45) from DelayedJob(default) in 24.04ms: RuntimeError (Unable to process POST request to elasticsearch URL 'http://localhost:9200/zammad.localdomain_zammad_production/User/243?pipeline=zammad-attachment'. Elasticsearch is not reachable, probably because it's not running or even installed.

After the Elasticsearch errors, your database issue appeared again (you really should check that). The database server is also the reason the scheduler can't run: Zammad needs a working database connection, so fetching your mail accounts did not happen:

E, [2019-03-06T12:31:50.033255 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 1) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:31:50.033540 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:31:52.034510 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 2) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:31:52.034782 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:31:55.035760 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 3) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:31:55.036114 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:31:59.037101 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 4) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:31:59.037386 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:32:04.038205 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 5) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:32:04.038451 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:32:10.039367 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 6) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:32:10.039652 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:32:17.040612 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 7) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:32:17.040907 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:32:25.041713 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 8) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>
E, [2019-03-06T12:32:25.041948 #1070-69857077089060] ERROR -- : Can't reconnect to database #<PG::Error: invalid encoding name: utf8>
E, [2019-03-06T12:32:34.042734 #1070-69857077089060] ERROR -- : execute Channel.fetch (try_count 9) exited with error #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: BEGIN>

If you fix the above errors, Zammad should recover and the Delayed::Job.count should decrease over time. You should be good to go then. :slight_smile:

Good day. At the moment everything has returned to normal. There are about 1000 concurrent connections, the system works consistently and correctly, and Delayed::Job.count is no more than 1. Thank you for your support. I have a number of further questions: should I ask them within this topic, or is it better to create new ones?


For overview reasons it would be a good idea to use different threads for different questions/issues. :slight_smile:

Glad you could fix it!

This topic was automatically closed 120 days after the last reply. New replies are no longer allowed.