Tempus fugit
In the checkpoint post we saw how a misbehaving session can crash PostgreSQL.
Before version 9.6 the only option for avoiding this sort of thing was to set up a script, usually run via cron, which killed the idle in transaction sessions after a specific timeout. In this post we’ll see in detail the problems related with this approach and how the new features introduced in PostgreSQL 9.6 can help the DBA tackle potentially dangerous situations.
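The cron based workaround usually boiled down to a query like the following. This is just a hedged sketch, assuming the pg_stat_activity column names introduced in 9.2 (state, state_change) and an arbitrary five minute threshold:

```sql
-- run periodically (e.g. via cron): terminate every backend left
-- idle in transaction for more than five minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - interval '5 minutes';
```

From 9.6 the parameter idle_in_transaction_session_timeout makes this external machinery unnecessary, as the server can drop such sessions by itself.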
The magic of the MVCC
DISCLAIMER: What follows is just a generic explanation of how PostgreSQL manages concurrency. I'm skipping the commit statuses and the command sequence within a single transaction. If you want to dig deeper, take a look at the PostgreSQL source code, where the complete implementation is explained extremely well.
Multi version concurrency control (MVCC) is used in PostgreSQL to manage concurrency efficiently with a minimal need for locks.
At the logical level this is completely transparent to the user. The rows are visible to the session’s transaction according to the transaction isolation level rules.
At the physical level each tuple has a header where, along with other fields, there are two fields of type XID (transaction ID) used to track the insert and the delete operations. PostgreSQL compares the stored XID values with the session’s XID in order to determine whether the row is visible or not.
The fields are named after the minimum and the maximum transaction id attached to the tuple.
The field t_xmin stores the transaction id of the insert operation; t_xmax stores the transaction id of the delete operation.
A tuple always has t_xmin set. If the tuple is live then t_xmax is zero; otherwise t_xmax is the XID of the delete operation.
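These two fields can be inspected from SQL, since every table exposes xmin and xmax as hidden system columns. A minimal sketch with a throwaway table (t_vis is just an illustrative name):

```sql
CREATE TABLE t_vis (id integer);
INSERT INTO t_vis VALUES (1);

-- xmin is the inserting transaction's XID; xmax is 0 for a live tuple
SELECT xmin, xmax, id FROM t_vis;
```

If another session deletes the row inside an open, uncommitted transaction, running the same SELECT in a session that can still see the tuple shows xmax set to the deleting transaction’s XID.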
Within a transaction PostgreSQL always presents a consistent snapshot to the backend.
How the snapshots are managed and eventually discarded depends on the transaction isolation level. It’s important to understand that rows still visible in a specific snapshot cannot be recycled by VACUUM.
Stale transactions also affect the WAL files, as we’ve seen in the checkpoint post, but only after a write operation.
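The snapshot horizon held by each backend can be checked from pg_stat_activity; the following is a sketch assuming the backend_xid and backend_xmin columns available from PostgreSQL 9.4:

```sql
-- backend_xmin is the oldest snapshot each backend still holds;
-- dead rows newer than the minimum of these values cannot be
-- recycled by VACUUM
SELECT pid, state, backend_xid, backend_xmin
FROM pg_stat_activity;
```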
Idle (in transaction) will kill you
An idle session does little harm. It keeps a backend attached to the shared memory segment and holds otherwise useful resources for a lazy client, but at least there is no risk of crashing the cluster.
A completely different story applies to the sessions left idle within a transaction. If the transaction has just been started with a BEGIN; PostgreSQL doesn’t assign a transaction id yet, and therefore the WAL files are correctly recycled during the checkpoint. VACUUM is also still possible, as long as the transaction’s snapshot doesn’t need to see the rows.
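We can verify that a plain BEGIN; allocates no transaction id with txid_current_if_assigned(), a function available from 9.6 which returns NULL while no XID has been assigned; a quick sketch:

```sql
BEGIN;
SELECT txid_current_if_assigned();  -- NULL: BEGIN alone assigns no XID
SELECT count(*) FROM pg_class;      -- read-only statements don't assign one
SELECT txid_current_if_assigned();  -- still NULL
SELECT txid_current();              -- forces the assignment of an XID
ROLLBACK;
```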
An example will better explain the concept. In order to work with real data we can create a table as a copy of pg_attribute. We’ll also disable autovacuum for this specific table.
db_test=# CREATE TABLE test AS SELECT * FROM pg_attribute ;
SELECT 2504
db_test=# ALTER TABLE test SET (autovacuum_enabled =false, toast.autovacuum_enabled=false);
ALTER TABLE

With another session we start a transaction using the command BEGIN; then we run a SELECT COUNT(*) on the table test.
We don’t commit the transaction leaving the session idle in transaction.
BEGIN
db_test=# SELECT COUNT(*) FROM test;
count
-------
2504
(1 row)

If in the first session we run a DELETE FROM test;, the rows will disappear from the idle in transaction session as well.
That’s because the transaction isolation level is read committed and the idle in transaction session wasn’t assigned an XID: each new statement gets a fresh snapshot.
db_test=# SELECT COUNT(*) FROM test;
count
-------
0
(1 row)

Under those conditions we can also run a VACUUM and get all the dead rows cleaned.
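As a sketch of that claim, VACUUM VERBOSE on the test table should report the deleted row versions as removable, since no snapshot can still see them:

```sql
-- the idle in transaction session holds no assigned XID and, under
-- read committed, no snapshot pinning the deleted rows: VACUUM can
-- reclaim them immediately
VACUUM VERBOSE test;
```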
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;