Sunday, November 26, 2017

pg_hexedit: Rich hex editor annotations for Postgres relfiles

I've written an experimental tool for presenting PostgreSQL relation files in a hex editor with annotations/tags and tooltips that show the structure of the data and its content, including bit field values. This tool is called pg_hexedit, and is available from:

https://github.com/petergeoghegan/pg_hexedit

pg_hexedit is built on top of the open source, cross-platform GUI hex editor wxHexEditor. Since it's an experimental tool that is primarily made available for educational purposes, you are well advised to not use it on any data directory that isn't entirely disposable. It may cause data corruption. Opening a Postgres relation file in a hex editor while the server is running is a fundamentally unsafe thing to do if you care about your data. Use of the tool should be limited to throwaway installations on users' personal machines.

Thursday, October 26, 2017

amcheck “table-matches-index” enhancement now available, detects "freeze-the-dead" corruption

I’m pleased to announce that v1.2 of amcheck, a tool for detecting that PostgreSQL relations are logically consistent (that they do not appear to be corrupt) is now generally available. This version adds a big enhancement - the optional ability to check if every tuple that should have an entry in the index does in fact have such an entry. Specifically, we check for a table entry with matching data, as well as a matching heap TID. This happens at the end of the existing tests, as an optional extra step.

This enhancement is significant because it seems much more likely to catch corruption in the wild. In general, inconsistencies between a table and its indexes are more likely to occur than inconsistencies between blocks within an index for many reasons. There is simply a much larger window for an inconsistency to arise when something is amiss with database storage that breaks the assumptions PostgreSQL makes during crash recovery, for example.

The enhancement is also significant because it played a role in identifyingPostgreSQL data corruption bug that will be fixed in the next point release, scheduled for November 9th, 2017. This bug affects all supported PostgreSQL versions. It was informally dubbed the “freeze-the-dead” bug.

November 6 2017 update: The fix was reverted due to additional concerns that came to light. The community is working on a new, more comprehensive fix for the next point release.

Packages for v1.2 are available from the community Debian/Ubuntu apt repository, as well as packages from the community Redhat/CentOS/SLES yum repository. Full details on installing these packages are available from the README:

https://github.com/petergeoghegan/amcheck/

Sunday, October 15, 2017

amcheck for Postgres 9.4+ now available from PGDG apt and yum repositories

amcheck, a tool for index corruption detection, now has packages available from the community Debian/Ubuntu apt repository, as well as packages from the community Redhat/CentOS/SLES yum repository.

This means that installations built on those community resources can easily install amcheck, even on PostgreSQL versions before PostgreSQL 10, the release that contrib/amcheck actually first appears in.

Full details on installing these packages are available from the README: https://github.com/petergeoghegan/amcheck/

It's also possible to install the packages on PostgreSQL 10, because the extension these packages install is actually named "amcheck_next" (not "amcheck"). Currently, it isn't really useful to install "amcheck_next" on PostgreSQL 10, because its functionality is identical to contrib/amcheck. That's expected to change soon, though. I will add a new enhancement to amcheck_next in the coming weeks, allowing verification functions to perform "heap matches index" verification on top of what is already possible.

Many thanks to Christoph Berg and Devrim Gündüz for their help with the packaging.

Tuesday, July 18, 2017

PostgreSQL Index bloat under a microscope

I've posted a snippet query to the PostgreSQL Wiki that "summarizes the keyspace" of a target B-Tree index. This means that it displays which range of indexed values belong on each page, starting from the root. It requires pageinspect. The query recursively performs a breadth-first search. Along the way, it also displays information about the space utilization of each page, and the number of distinct key values that actually exist on the page, allowing you to get a sense of how densely filled each page is relative to what might be expected.

The query is available from:

https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index