since I’ve been hammering out the messages I may as well continue.
I still haven’t done any profiling, but after a break and a sleep, I thought since updates to updated_at in the sessions and users table are seen repeatedly in the slow log, I could optimise postgresql for writes.
my postgresql config now looks like
max_connections = 300
shared_buffers = 15GB
effective_cache_size = 45GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 3276kB
max_worker_processes = 32
max_parallel_workers_per_gather = 16
min_wal_size = 1GB
max_wal_size = 2GB
wal_buffers = 16MB
# the above block from https://pgtune.leopard.in.ua/#/
# using https://stackoverflow.com/questions/5131266/increase-postgresql-write-speed-at-the-cost-of-likely-data-loss and https://www.postgresql.org/docs/9.6/static/runtime-config-wal.html as references,
synchronous_commit = off
# setting synchronous_commit to off sounded scary, but it's not too bad,
# the database isn't at rick of getting corrupted, but it is at risk of loosing
# the last wal_writer_delay of commits. I can live with that.
wal_writer_delay = 1000 # ms
commit_delay = 5000 # us
# if there's more than 5 open transactions it delays updates by 5ms
# to try an group them into a single wall flush
deadlock_timeout = 15000
# default is 1000 ms which is OK when everything is working normally.
# when it's way overloaded it causes ldap sync issues and unprocessable mails, etc.
autovacuum_analyze_scale_factor = 0.002
autovacuum_vacuum_scale_factor = 0.001
# vacuum more often due to the constant updates of updated_at of session and users table.
logging_collector = on
log_destination = 'stderr'
log_directory = log
log_rotation_age = 1d
log_min_duration_statement = 50
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_autovacuum_min_duration = 0
log_error_verbosity = default
lc_messages='C'
# handy logging, this works well with pgbadger, I think that's where I copied it from.
the result? hmm, just eyeball it, much less slow logs between 7 past and 00, but it looks like there are hourly jobs starting from xx:00 and running until xx:07 there are still heaps of slow updates, and they might even be slower? but that doesn’t make any sense. I’m going to lower the commit delay to 1000 us and see how it goes.