If you’re hitting the PostgreSQL max_connections
limit, you should see this in your PostgreSQL logs, e.g. something like:
FATAL: remaining connection slots are reserved for non-replication superuser connections
So inspect your database logs for errors like this. Blindly raising max_connections
to an arbitrary high number is a bit of a dumb idea. Here is a pretty good write-up on the subject.
I would start looking for I/O bottlenecks, e.g. disk or database. The data you’ve supplied doesn’t mention this and your platform’s resources seem like quite a bit of overkill, the problem isn’t likely to be in the number of CPU cores or the memory, when looking at the data, although you might need to convince the middleware (Apache/NginX, PostgreSQL, ElasticSearch) to actually utilize it, by tuning it a bit more to the massive specs. Another issue could be deadlocks in the database, due to the high number of concurrent users all wanting to write to the database tables. Configure PostgreSQL to actually start logging information about locking issues (log_lock_waits = on
) and Inspect your database logs.
Regarding PostgreSQL tuning: we have set maintenance_work_mem
and autovacuum_work_mem
to higher values. We have turned off synchronous_commit
and you should consider configuring effective_cache_size
. Here’s a bit of a write-up on that subject.