Tatsuo Ishii: log_client_messages in Pgpool-II 4.0

planet postgresql - 2018-12-04 (Tue) 16:59:00
Pgpool-II 4.0 adds a new logging feature called "log_client_messages". This allows messages coming from the frontend to be logged. Up to 3.7, the only way to log frontend messages was to enable the debug log, which produced a tremendous amount of output.
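Enabling it is a single parameter in pgpool.conf (a sketch based on the 4.0 documentation):

```
# pgpool.conf
log_client_messages = on
```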

For example, with log_client_messages enabled, "pgbench -S -M prepared -t 2" produces the frontend logs below:

2018-12-04 16:43:45: pid 6522: LOG:  Parse message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  statement: "P0_1", query: "SELECT abalance FROM pgbench_accounts WHERE aid = $1;"
2018-12-04 16:43:45: pid 6522: LOG:  Sync message from frontend.
2018-12-04 16:43:45: pid 6522: LOG:  Bind message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: "", statement: "P0_1"
2018-12-04 16:43:45: pid 6522: LOG:  Describe message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: ""
2018-12-04 16:43:45: pid 6522: LOG:  Execute message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: ""
2018-12-04 16:43:45: pid 6522: LOG:  Sync message from frontend.

As you can see, pgbench sends the query "SELECT abalance FROM pgbench_accounts WHERE aid = $1;" using the prepared statement "P0_1", then a Bind message to bind the parameter value to "$1".
It then sends a Describe message to obtain metadata, and finally an Execute message to run the query.

Below is the second execution of the query (remember that we added the "-t 2" option to execute 2 transactions).

2018-12-04 16:43:45: pid 6522: LOG:  Bind message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: "", statement: "P0_1"
2018-12-04 16:43:45: pid 6522: LOG:  Describe message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: ""
2018-12-04 16:43:45: pid 6522: LOG:  Execute message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: ""
2018-12-04 16:43:45: pid 6522: LOG:  Sync message from frontend.
2018-12-04 16:43:45: pid 6522: LOG:  Terminate message from frontend.

This time no parse message is sent because pgbench reuses the named statement "P0_1", which eliminates the p[...]
Category: postgresql

Bruce Momjian: Views vs. Materialized Views

planet postgresql - 2018-12-04 (Tue) 00:45:01

Views and materialized views are closely related. Views effectively run the view query on every access, while materialized views store the query output in a table and reuse the results on every materialized view reference, until the materialized view is refreshed. This cache effect becomes even more significant when the underlying query or tables are slow, such as analytics queries and foreign data wrapper tables. You can think of materialized views as cached views.
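The contrast can be sketched in SQL (the table and view names here are made up for illustration):

```sql
-- A plain view re-runs its query on every access.
CREATE VIEW recent_orders AS
    SELECT * FROM orders WHERE created_at > now() - interval '7 days';

-- A materialized view stores the query output in a table.
CREATE MATERIALIZED VIEW recent_orders_mat AS
    SELECT * FROM orders WHERE created_at > now() - interval '7 days';

-- Reads are served from the stored result until you refresh it.
REFRESH MATERIALIZED VIEW recent_orders_mat;
```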

Category: postgresql

Bruce Momjian: Extensibility

planet postgresql - 2018-12-01 (Sat) 06:30:01

Extensibility was built into Postgres from its creation. In the early years, extensibility was often overlooked and made Postgres server programming harder. However, in the last 15 years, extensibility allowed Postgres to adapt to modern workloads at an amazing pace. The non-relational data storage options mentioned in this presentation would not have been possible without Postgres's extensibility.

Category: postgresql

Nickolay Ihalainen: PostgreSQL Streaming Physical Replication With Slots

planet postgresql - 2018-12-01 (Sat) 02:05:32

PostgreSQL streaming physical replication with slots simplifies setup and maintenance procedures. Usually, you should estimate disk usage for the Write-Ahead Log (WAL), set an appropriate limit on the number of segments, and set up a WAL archive procedure. In this article, you will see how to use replication with slots and understand which problems it can solve.


PostgreSQL physical replication is based on WAL. The Write-Ahead Log contains all database changes, saved in 16MB segment files. Normally Postgres only keeps segments between checkpoints, so with default settings just 1GB of WAL segment files is available.

Replication requires all WAL files created after the backup and up until the current time. Previously, it was necessary to keep a huge archive directory (usually mounted by NFS to all slave servers). The slots feature introduced in 9.4 allows Postgres to track the latest segment downloaded by a slave server. Now, PostgreSQL can keep all segments on disk, even without archiving, if a slave is seriously behind its master due to downtime or networking issues. The drawback: disk space can be consumed without bound in the case of a configuration error. Before continuing, if you need a better understanding of physical replication and streaming replication, I recommend you read “Streaming Replication with PostgreSQL“.
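Creating and inspecting a physical slot is straightforward (a sketch; the slot name is made up):

```sql
-- On the master: create a physical replication slot.
SELECT pg_create_physical_replication_slot('slave1_slot');

-- Check slot state and how far the consumer has read.
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
```

The standby then references it with primary_slot_name = 'slave1_slot' in its recovery settings.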

Create a sandbox with two PostgreSQL servers

To setup replication, you need at least two PostgreSQL servers. I’m using pgcli (pgc) to setup both servers on the same host. It’s easy to install on Linux, Windows, and OS X, and provides the ability to download and run any version of PostgreSQL on your staging server or even on your laptop.

python -c "$(curl -fsSL"
mv bigsql master
cp -r master slave
$ cd master
master$ ./pgc install pg10
master$ ./pgc start pg10
$ cd ../slave
slave$ ./pgc install pg10
slave$ ./pgc start pg10

First of all you should allow the replication user to connect:

master$ echo "host replication replic[...]
Category: postgresql

Liaqat Andrabi: Webinar : Introduction to OmniDB [Follow Up]

planet postgresql - 2018-11-30 (Fri) 22:59:58

A database management tool should simplify what is complex and drive performance. OmniDB is one such tool, with which you can connect to several different databases – including PostgreSQL, Oracle, MySQL and others.

2ndQuadrant recently hosted a webinar on this very topic: Introduction to OmniDB. The webinar was presented by OmniDB co-founders and PostgreSQL consultants at 2ndQuadrant, Rafael Castro & William Ivanski.

The recording of the webinar is now available here.

Questions that Rafael and William couldn’t respond to during the live webinar have been answered below.

Q1: There are other open source GUI tools around to manage PostgreSQL. Why are you investing efforts on a new tool?

A1: When OmniDB was created we wanted a web tool, and not all available tools offered this architecture. Also, as advanced SQL developers, we wanted fewer forms and more SQL templates. Finally, we also wanted the freedom to develop features that don’t exist in other tools, or existed but were unmaintained or out of date, such as the customizable monitoring dashboard, the console tab, and the debugger, which now supports PG 11 procedures.

Q2: Currently it is not possible to import data from a file into a database. Do you plan to implement such a feature?

A2: Yes, we will implement this soon. There will be an interface for the user to upload and configure data to be imported, and also in the Console Tab there will be a new \copy command.

Q3: Is it possible to view the query plan?

A3: Yes, it is possible to view the query plan using the magnifying glass icons in the Query Tab. The first one will do an EXPLAIN, and the second an EXPLAIN ANALYZE. The output can be seen as a list or as a tree.

Q4: Is it possible to pass parameters in the EXPLAIN command?

A4: You can always manually execute EXPLAIN with any parameters that you need. However, the graphical component to view the plan only allows EXPLAIN or EXPLAIN ANALYZE. We will investigate the possibility to make the EXPLAIN command customizable for the graphical component.

For any questions, co

Category: postgresql

Marco Slot: Why the RDBMS is the future of distributed databases, ft. Postgres and Citus

planet postgresql - 2018-11-30 (Fri) 17:15:00

Around 10 years ago I joined Amazon Web Services and that’s where I first saw the importance of trade-offs in distributed systems. In university I had already learned about the trade-offs between consistency and availability (the CAP theorem), but in practice the spectrum goes a lot deeper than that. Any design decision may involve trade-offs between latency, concurrency, scalability, durability, maintainability, functionality, operational simplicity, and other aspects of the system—and those trade-offs have meaningful impact on the features and user experience of the application, and even on the effectiveness of the business itself.

Perhaps the most challenging problem in distributed systems, in which the need for trade-offs is most apparent, is building a distributed database. When applications began to require databases that could scale across many servers, database developers began to make extreme trade-offs. In order to achieve scalability over many nodes, distributed key-value stores (NoSQL) put aside the rich feature set offered by the traditional relational database management systems (RDBMS), including SQL, joins, foreign keys, and ACID guarantees. Since everyone wants scalability, it would only be a matter of time before the RDBMS would disappear, right? Actually, relational databases have continued to dominate the database landscape. And here’s why:

The most important aspect to consider when making trade-offs in a distributed system (or any system) is development cost.

The trade-offs made by your database software will have significant impact on the development cost of your application. Handling data in an advanced application that needs to be usable, reliable, and performant is a problem that is inherently complex to solve. The number of man hours required to successfully address every little subproblem can be enormous. Fortunately, a database can take care of many of these subproblems, but database developers face the cost problem as well. It actually takes many decades to build the functionality, gu

Category: postgresql

Michael Paquier: Postgres 12 highlight - DOS prevention

planet postgresql - 2018-11-30 (Fri) 13:50:22

A couple of months ago a thread began on the PostgreSQL community mailing lists about a set of problems where it is possible to lock PostgreSQL out of new connections just by running a set of queries as any user; having an open connection to the cluster is enough to mount a denial of service.

For example, in one session do the following by scanning pg_stat_activity in a transaction with any user:

BEGIN;
SELECT count(*) FROM pg_stat_activity;

This has the particularity of taking an access share lock on the system catalog pg_authid, which is a critical catalog used for authentication. Then, with a second session and the same user, run for example VACUUM FULL on pg_authid, like this:

VACUUM FULL pg_authid;

This user is not an owner of the relation, so VACUUM will fail. However, at this stage the second session will be stuck until the first session commits, as it will attempt to take a lock on the relation, and VACUUM FULL takes an exclusive lock, which prevents anything from reading or writing it. Hence, in this particular case, as pg_authid is used for authentication, no new connections can be made to the instance until the transaction of the first session has committed.
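From a third, already-open session you can watch the lock queue with the standard pg_locks view (a sketch):

```sql
-- Show who holds a lock on pg_authid and who is queued behind it.
SELECT locktype, mode, granted, pid
FROM pg_locks
WHERE relation = 'pg_authid'::regclass
ORDER BY granted DESC;
```

The stuck VACUUM FULL shows up as a row with granted = false.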

As the thread continued, more commands have been mentioned as having the same kind of issues:

  • As mentioned above, VACUUM FULL is one pattern. Queuing for a lock on a relation on which the operation will fail anyway should not happen. VACUUM FULL takes an exclusive lock on the relation.
  • TRUNCATE, for reasons similar to VACUUM FULL.
  • REINDEX on a database or a schema.

The first two cases have been fixed for PostgreSQL 12, with commit a556549 for VACUUM and commit f841ceb for TRUNCATE. Note that similar work was done a couple of years ago, for example for CLUSTER in commit cbe24a6. In all those cases, the root of the problem is to make sure that the user has the right to take a lock on a relation before attempting to lock it, so this has basically required a bit of refactoring so that the code involved makes use of RangeVa

Category: postgresql

Pavel Stehule: plpgsql_check can detect bad default volatility flag

planet postgresql - 2018-11-29 (Thu) 01:20:00
A common performance problem with plpgsql functions used from more complex queries is the default VOLATILE flag, which makes more aggressive optimization of the function call impossible. plpgsql_check can now detect this issue:

CREATE OR REPLACE FUNCTION public.flag_test1(integer)
RETURNS integer
LANGUAGE plpgsql
AS $function$
begin
  return $1 + 10;
end;
$function$;

CREATE OR REPLACE FUNCTION public.flag_test2(integer)
RETURNS integer
LANGUAGE plpgsql
AS $function$
begin
  return (select * from fufu where a = $1 limit 1);
end;
$function$;

postgres=# select * from plpgsql_check_function('flag_test1(int)', performance_warnings => true);
│ plpgsql_check_function │
│ performance:00000:routine is marked as STABLE, should be IMMUTABLE │
(1 row)

postgres=# select * from plpgsql_check_function('flag_test2(int)', performance_warnings => true);
│ plpgsql_check_function │
│ performance:00000:routine is marked as VOLATILE, should be STABLE │
(1 row)
Category: postgresql

Bruce Momjian: Data Storage Options

planet postgresql - 2018-11-29 (Thu) 00:30:01

Since I have spent three decades working with relational databases, you might think I believe all storage requires relational storage. However, I have used enough non-relational data stores to know that each type of storage has its own benefits and costs.

It is often complicated to know which data store to use for your data. Let's look at the different storage levels, from simplest to most complex:

  1. Flat files: Flat files are exactly what they sound like — an unstructured stream of bytes stored in a file. There is no structure defined in the file itself — structure must be implemented in the application that reads the file. This is obviously the simplest way to store data, and works well for small data volumes when only a single user and a few well-coordinated applications need access. File locking is required to serialize multi-user access. Changes typically require a complete file rewrite.
  2. Word processing documents: This is similar to flat files, but defines structure in the data file, e.g., highlighting, sections. The same flat file limitations apply.
  3. Spreadsheet: This is similar to word processing documents, but adds more capabilities, including computations and the definition of relationships between data elements. Data is more atomized in this format than in the previous one.
  4. NoSQL stores: This removes many of the limitations from previous data stores. Multi-user access is supported, including locking, and modification of single data elements does not require rewriting all data.
  5. Relational databases: This is the most complex data storage option. Rigid structure is enforced internally, though unstructured options exist. Data access occurs using a declarative language that is dynamically optimized based on that structure. Multi-user and multi-application access is efficient.

You might think that since relational databases have the most features, everything should use them. However, with features come complexity and rigidity. Therefore, all levels are valid for some use cases:

  • Flat files are ideal for read-onl
Category: postgresql

Alexey Lesovsky: Global shortcuts and PostgreSQL queries.

planet postgresql - 2018-11-28 (Wed) 19:42:00
Using your favorite hotkeys on queries in Linux. One of my colleagues often talks about using hotkeys for his favourite SQL queries and commands in iTerm2 (e.g. for checking current activity or viewing lists of the largest tables).
Usually, I listen to this with only half an ear, because iTerm2 is available only for macOS and I am a committed Linux user. Once this topic came up again, I thought this function could perhaps be realised not only through iTerm2 but through an alternative tool or a setting in the desktop environment.
Being a long-time KDE user, I opened “System settings” and began to check all settings related to keyboard, input, hotkeys and so on. What I found is a number of settings that allow emulating text input from the keyboard. Using this feature I configured bindings for my favourite queries. So now, to execute a needed query I don’t need to search for it in my query collection, copy it and paste it… I just press a hotkey and get the query in the active window, whether that is a psql console, a work chat, a text editor or something else.

Here is how these bindings are configured:
Open “System settings” application and go to “Shortcuts”. There is “Custom Shortcuts” menu. Here, optionally, we should create a dedicated group for our shortcuts. I named my group “PostgreSQL hot queries”. When creating a shortcut, select “Global shortcut” and next “Send Keyboard Input”.

Now, we need to set up a new shortcut and give it a name and description. And here is the most interesting part. From various Linux users I have sometimes heard that KDE must have been written by aliens, and that statement had never been completely clear to me, since I never had serious issues with KDE. Now, after configuring the shortcuts, I tend to agree with it more and more.

Ok, here we go: next we should type the text of the query which should appear when the hotkey is pressed. So, instead of plain query text you have to input alien sequences of symbols.

Check out the screenshot with example of query that should show current activit[...]
Category: postgresql

Viorel Tabara: What's New in PostgreSQL 11

planet postgresql - 2018-11-28 (Wed) 19:34:22

PostgreSQL 11 was released on October 18th, 2018, on schedule, marking the 23rd anniversary of the increasingly popular open source database.

While a complete list of changes is available in the usual Release Notes, it is worth checking out the revamped Feature Matrix page, which, just like the official documentation, has received a makeover since its first version, making it easier to spot changes before diving into the details.

For example, on the Release Notes page, “Channel binding for SCRAM authentication” is buried under Source Code, while the matrix has it under the Security section. For the curious, here’s a screenshot of the interface:

PostgreSQL Feature Matrix

Additionally, the Bucardo Postgres Release Notes page linked above is handy in its own way, making it easy to search for a keyword across all versions.

What’s New? With literally hundreds of changes, I will go through the differences listed in the Feature Matrix.

Covering Indexes for B-trees (INCLUDE)

CREATE INDEX received the INCLUDE clause, which allows indexes to include non-key columns. Its use case for frequent identical queries is well described in Tom Lane’s commit from November 22nd, which updates the development documentation (meaning that the current PostgreSQL 11 documentation doesn’t have it yet); for the full text refer to section 11.9, Index-Only Scans and Covering Indexes, in the development version.
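A minimal sketch of the clause (table and column names are made up):

```sql
CREATE TABLE users (id bigint PRIMARY KEY, name text, email text);

-- name is the key column; email rides along as index payload only.
CREATE INDEX users_name_incl_email ON users (name) INCLUDE (email);

-- This can now be answered by an index-only scan,
-- without visiting the table heap:
SELECT email FROM users WHERE name = 'alice';
```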

Parallelized CREATE INDEX for B-tree Indexes

As alluded in the name, this feature is only implemented for the B-tree indexes, and from Robert Haas’ commit log we learn that the implementation may be refined in the future. As noted from the CREATE INDEX documentation, while both parallel and concurrent index creation methods take advantage of multiple CPUs, in the case of CONCURRENT only the first table scan will be performed in parallel.

Related to this new feature are the configuration parameters maintenance_work_mem and max_parallel_maintenance_workers.

Lastly, the number of parallel workers can be set per tabl

Category: postgresql

Minutes of the 155th Board Meeting (2018-11) news - 2018-11-28 (Wed) 19:22:52
Minutes of the 155th Board Meeting (2018-11) anzai 2018/11/28 (Wed) - 19:22
Category: postgresql

Hans-Juergen Schoenig: Transactions in PostgreSQL: READ COMMITTED vs. REPEATABLE READ

planet postgresql - 2018-11-28 (Wed) 18:00:11

The ability to run transactions is the core of every modern relational database system. The idea behind a transaction is to allow users to control the way data is written to PostgreSQL. However, a transaction is not only about writing – it is also important to understand the implications on reading data for whatever purpose (OLTP, data warehousing, etc.).

Understanding transaction isolation levels

One important aspect of transactions in PostgreSQL and therefore in all other modern relational databases is the ability to control when a row is visible to a user and when it is not. The ANSI SQL standard proposes 4 transaction isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ and SERIALIZABLE) to allow users to explicitly control the behavior of the database engine. Unfortunately the existence of transaction isolation levels is still not as widely known as it should be, and therefore I decided to blog about this important topic to give more PostgreSQL users the chance to apply this very important, yet under-appreciated feature.

The two most commonly used transaction isolation levels are READ COMMITTED and REPEATABLE READ. In PostgreSQL READ COMMITTED is the default isolation level and should be used for normal OLTP operations. In contrast to other systems such as DB2 or Informix, PostgreSQL does not provide support for READ UNCOMMITTED, which I personally consider to be a thing of the past anyway.


In READ COMMITTED mode, every SQL statement will see changes which have already been committed (e.g. new rows added to the database) by some other transactions. In other words: If you run the same SELECT statement multiple times within the same transaction, you might see different results. This is something you have to take into account when writing an application.
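This can be demonstrated with two concurrent sessions (a sketch; the accounts table and the counts are made up for illustration):

```sql
-- Session 1 (default READ COMMITTED)
BEGIN;
SELECT count(*) FROM accounts;   -- suppose this returns 10

-- Session 2, meanwhile:
--   INSERT INTO accounts VALUES (...); COMMIT;

-- Back in session 1: the newly committed row is visible.
SELECT count(*) FROM accounts;   -- now returns 11
COMMIT;

-- Under REPEATABLE READ, the second SELECT would still return 10:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
```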

However, within a statement the data you see is constant – it does not change. A SELECT statement (or any other statement) will not see changes committed WHILE the statement is running. Within an SQL statement, data and t

Category: postgresql

Pavel Stehule: Orafce - simple thing that can help

planet postgresql - 2018-11-28 (Wed) 17:23:00
I merged a small patch to the master branch of Orafce. It shows the breadth of PostgreSQL's possibilities and can decrease the work necessary for migration from Oracle to Postgres.

One small/big difference between Oracle and other databases is the meaning of the empty string. There are lots of situations where Oracle uses an empty string as NULL, and NULL as an empty string. I don't know of any other database that does this.

Orafce has native types (not domain types) varchar2 and nvarchar2, so it is possible to define custom operators on them. I implemented the || concat operator as NULL-safe for these types. So now it is possible to write:
postgres=# select null || 'xxx'::varchar2 || null;
│ ?column? │
│ xxx │
(1 row)

When you port an application from Oracle to Postgres, it is good to disallow empty strings in Postgres. One possible solution is the generic C trigger function replace_empty_string(). This trigger function checks every text-type field in stored rows and replaces empty strings with NULLs. Sure, you should fix any check like colname = '' or colname <> '' in your application, and use only colname IS [NOT] NULL. Then the code will be the same on Oracle and PostgreSQL, and you can use automatic translation by ora2pg.
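Attaching the trigger looks roughly like this (a sketch; the table name is made up, and replace_empty_string() is the Orafce function named in the post):

```sql
CREATE TRIGGER t_replace_empty_string
    BEFORE INSERT OR UPDATE ON customers
    FOR EACH ROW
    EXECUTE PROCEDURE replace_empty_string();
```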
Category: postgresql

Pavel Stehule: PostgreSQL 12 - psql - csv output

planet postgresql - 2018-11-28 (Wed) 14:51:00
After some years and a long discussion, the psql console has a great new feature: CSV output (implemented by Daniel Vérité).

Usage is very simple, just use --csv option.

[pavel@nemesis postgresql.master]$ psql --csv -c "select * from pg_namespace limit 10" postgres
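Fields containing the delimiter are quoted per RFC 4180; a quick illustration (assuming a reachable local server):

```shell
$ psql --csv -c "SELECT 1 AS id, 'a,b' AS tags"
id,tags
1,"a,b"
```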
Category: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Add CSV table output mode in psql.

planet postgresql - 2018-11-28 (Wed) 11:49:05
On 26th of November 2018, Tom Lane committed patch: Add CSV table output mode in psql.   "\pset format csv", or --csv, selects comma-separated values table format. This is compliant with RFC 4180, except that we aren't too picky about whether the record separator is LF or CRLF; also, the user may choose a field … Continue reading "Waiting for PostgreSQL 12 – Add CSV table output mode in psql."
Category: postgresql

Stefan Fercot: Combining pgBackRest and Streaming Replication

planet postgresql - 2018-11-28 (Wed) 09:00:00

pgBackRest is a well-known powerful backup and restore tool. It offers a lot of possibilities.

While pg_basebackup is commonly used to setup the initial database copy for the Streaming Replication, it could be interesting to reuse a previous database backup (eg. taken with pgBackRest) to perform this initial copy.

Furthermore, the --delta option provided by pgBackRest can help us to re-synchronize an old secondary server without having to rebuild it from scratch.
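Re-synchronizing an old standby with --delta is then a single command (a sketch; the stanza name demo is made up):

```shell
# On the stopped secondary: rewrite only the files that differ
# from the chosen backup, instead of copying everything again.
$ sudo -u postgres pgbackrest --stanza=demo --delta restore
```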

To reduce the load on the primary server during a backup, pgBackRest even allows taking backups from a standby server.

We’ll see in this blog post how to do that.

For the purpose of this post, we’ll use 2 nodes called primary and secondary. Both are running on CentOS 7.

We’ll cover some pgBackRest tips but won’t go deeper in the PostgreSQL configuration, nor in the Streaming Replication best practices.


On both primary and secondary server, install PostgreSQL and pgBackRest packages directly from the PGDG yum repositories:

$ sudo yum install -y \ rhel-7-x86_64/pgdg-centos11-11-2.noarch.rpm
$ sudo yum install -y postgresql11-server postgresql11-contrib pgbackrest

Check that pgBackRest is correctly installed:

$ pgbackrest
pgBackRest 2.07 - General help

Usage:
    pgbackrest [options] [command]

Commands:
    archive-get     Get a WAL segment from the archive.
    archive-push    Push a WAL segment to the archive.
    backup          Backup a database cluster.
    check           Check the configuration.
    expire          Expire backups that exceed retention.
    help            Get help.
    info            Retrieve information about backups.
    restore         Restore a database cluster.
    stanza-create   Create the required stanza data.
    stanza-delete   Delete a stanza.
    stanza-upgrade  Upgrade a stanza.
    start           Allow pgBackRest processes to run.
    stop            Stop pgBackRest processes from running.
    version         Get version.

Use 'pgbackrest[...]
Category: postgresql

Craig Kerstiens: How Postgres is more than a relational database: Extensions

planet postgresql - 2018-11-28 (Wed) 05:51:00

Postgres has been a great database for decades now, and has really come into its own in the last ten years. Databases more broadly have gotten their own share of attention as well. First we had NoSQL, which started with document databases and key/value stores; then there was NewSQL, which expanded things to distributed and graph databases; and all of these models, from documents to distributed to relational, were not mutually exclusive. Postgres itself went from simply a relational database (which already had geospatial capabilities) to a multi-modal database by adding support for JSONB.

But to me the most exciting part about Postgres isn’t how it continues to advance itself, but rather how it has shifted from simply a relational database to more of a data platform. The largest driver of this shift is Postgres extensions. Postgres extensions, in simplified terms, are lower-level APIs within Postgres that allow you to change or extend its functionality. These extension hooks allow Postgres to be adapted to new use cases without requiring upstream changes to the core database. This is a win in two ways:

  1. The Postgres core can continue to move at a very safe and stable pace, ensuring a solid foundation and not risking your data.
  2. Extensions themselves can move quickly to explore new areas without the same review process or release cycle allowing them to be agile in how they evolve.

Okay, plug-ins and frameworks aren’t new when it comes to software, so what is so great about extensions for Postgres? Well, they may not be new to software, but they’re not new to Postgres either. Postgres has had extensions for as long as I can remember. In Postgres 9.1 we saw new syntax to make it easy to CREATE EXTENSION, and since that time the ecosystem around them has grown. We have a full directory of extensions at PGXN. Older forks, which were based on older versions, are actively working on catching up to a modern release, presumably to become pure extensions. By being a pure exten

Category: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Integrate recovery.conf into postgresql.conf

planet postgresql - 2018-11-28 (Wed) 05:29:53
On 25th of November 2018, Peter Eisentraut committed patch: Integrate recovery.conf into postgresql.conf   recovery.conf settings are now set in postgresql.conf (or other GUC sources). Currently, all the affected settings are PGC_POSTMASTER; this could be refined in the future case by case.   Recovery is now initiated by a file recovery.signal. Standby mode is initiated … Continue reading "Waiting for PostgreSQL 12 – Integrate recovery.conf into postgresql.conf"
Category: postgresql

Tomas Vondra: Sequential UUID Generators

planet postgresql - 2018-11-28 (Wed) 01:10:15

UUIDs are a popular identifier data type – they are unpredictable, and/or globally unique (or at least very unlikely to collide) and quite easy to generate. Traditional primary keys based on sequences won’t give you any of that, which makes them unsuitable for public identifiers, and UUIDs solve that pretty naturally.

But there are disadvantages too – they may make the access patterns much more random compared to traditional sequential identifiers, cause WAL write amplification etc. So let’s look at an extension generating “sequential” UUIDs, and how it can reduce the negative consequences of using UUIDs.

Let’s assume we’re inserting rows into a table with a UUID primary key (so there’s a unique index), and the UUIDs are generated as random values. In the table the rows may simply be appended at the end, which is very cheap. But what about the index? For indexes ordering matters, so the database has little choice about where to insert the new item – it has to go into a particular place in the index. As the UUID values are random, that location will be random too, with uniform distribution over all index pages.

This is unfortunate, as it works against adaptive cache management algorithms – there is no set of “frequently” accessed pages that we could keep in memory. If the index is larger than memory, the cache hit ratio (both for page cache and shared buffers) is doomed to be poor. And for small indexes, you probably don’t care that much.

Furthermore, this random write access pattern inflates the amount of generated WAL, due to having to perform full-page writes every time a page is modified for the first time after a checkpoint. (There is a feedback loop, as FPIs increase the amount of WAL, triggering checkpoints more often – which then results in more FPIs generated, …)
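A hedged sketch of what a “sequential” UUID setup can look like (an assumption: this uses the sequential-uuids extension and its uuid_time_nextval() function, since the extension's name is truncated out of this excerpt; the table name is made up):

```sql
CREATE EXTENSION sequential_uuids;

-- UUIDs whose prefix advances with time, so index inserts cluster
-- on recent pages instead of landing uniformly at random.
CREATE TABLE events (
    id   uuid PRIMARY KEY
         DEFAULT uuid_time_nextval(interval_length := 60,
                                   interval_count := 65536),
    body text
);
```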

Of course, UUIDs influence read patterns too. Applications typically access a fairly limited subset of recent data. For example, an e-commerce site mostly cares about orders from the last couple of days, rarely accessing data beyond this

Category: postgresql