A few days ago I found myself in a situation where Uptime
Robot was emailing me every hour of the night to let
me know that Xevol is down. As anyone else would I jumped on the issue the very
moment I was up. The issue was in the fact that API could not resolve the
endpoints. Since it’s deployed with pm2
I was able quickly pull up the logs
with:
1
ssh -t [email protected] "pm2 logs"
the error message I saw was saying "remaining connection slots are reserved
for non-replication superuser connections"
. It was the first time I stumbled
upon an issue of this kind.
After some Google-fu I found the similar issue on SO (of course) under subject of Is there a timeout for idle PostgreSQL connections?. Turns out that AWS RDS, the managed hosting from Bezos’ eggs basket was failing to terminate idle connections with my API’s Knex.js configuration.
I decided not to purse the issue in the code of the Knex dependencies like
generic-pool
but instead resolve the issue the time-efficient way. So I used
the psql
where I entered:
1
2
3
SELECT state, COUNT(*)
FROM pg_stat_activity
GROUP BY state;
which let me know how many idle connections were present. In my case there were 180 idle connections that were maxing-out the instance limits.
After going through Client Connection Defaults list I found the
idle_in_transaction_session_timeout
field which was exactly what I needed in
my situation. So I went and changed the value in AWS RDS’ Parameter Groups
section…
…to no avail. The issue’d persisted. That was the moment I stopped doing things the logical way and opted for the simple cron job on one of the available VPSes with:
1
crontab -e
to update the cron job list with the added line:
1
*/5 * * * * psql $DB_URL < $APP_PATH/psql_idle_connections.sql
which runs every 5 minutes the psql_idle_connections.sql
which contains:
1
2
3
4
SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'DATABASE_NAME'
AND pid <> pg_backend_pid();
and voila, no issues. No idle connections are kept open.
Since not enough traffic goes through the instance I’ve also downgraded its
size to micro
size which is eligible for free tier.