A few days ago I found myself in a situation where Uptime Robot was emailing me every hour of the night to let me know that Xevol is down. As anyone else would I jumped on the issue the very moment I was up. The issue was in the fact that API could not resolve the endpoints. Since it’s deployed with pm2 I was able quickly pull up the logs with:

1
ssh -t [email protected] "pm2 logs"

the error message I saw was saying "remaining connection slots are reserved for non-replication superuser connections". It was the first time I stumbled upon an issue of this kind.

After some Google-fu I found the similar issue on SO (of course) under subject of Is there a timeout for idle PostgreSQL connections?. Turns out that AWS RDS, the managed hosting from Bezos’ eggs basket was failing to terminate idle connections with my API’s Knex.js configuration.

I decided not to purse the issue in the code of the Knex dependencies like generic-pool but instead resolve the issue the time-efficient way. So I used the psql where I entered:

1
2
3
SELECT state, COUNT(*)
FROM pg_stat_activity
GROUP BY state;

which let me know how many idle connections were present. In my case there were 180 idle connections that were maxing-out the instance limits.

After going through Client Connection Defaults list I found the idle_in_transaction_session_timeout field which was exactly what I needed in my situation. So I went and changed the value in AWS RDS’ Parameter Groups section…

to no avail. The issue’d persisted. That was the moment I stopped doing things the logical way and opted for the simple cron job on one of the available VPSes with:

1
crontab -e

to update the cron job list with the added line:

1
*/5 *  * * *  psql $DB_URL < $APP_PATH/psql_idle_connections.sql

which runs every 5 minutes the psql_idle_connections.sql which contains:

1
2
3
4
SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'DATABASE_NAME'
AND pid <> pg_backend_pid();

and voila, no issues. No idle connections are kept open.


Since not enough traffic goes through the instance I’ve also downgraded its size to micro size which is eligible for free tier.

« Async non-blocking Stripe.js in React
On being fired »