planet postgresql

Subscribe to the Planet PostgreSQL feed
Planet PostgreSQL
Updated: 1 hour 52 minutes ago

Sebastian Insausti: How to Create a Single Endpoint for your PostgreSQL Replication setup using HAProxy

2018-10-31 (Wed) 23:41:48

Managing traffic to the database can get harder and harder as it increases in volume and the database is distributed across multiple servers. PostgreSQL clients usually talk to a single endpoint. When a primary node fails, the database clients will keep retrying the same IP. In case you have failed over to a secondary node, the application needs to be updated with the new endpoint. This is where you would want to put a load balancer between the applications and the database instances. It can direct applications to available/healthy database nodes and fail over when required. Another benefit is increased read performance by using replicas effectively: it is possible to create a read-only port that balances reads across replicas. In this blog, we will cover HAProxy. We’ll see what it is, how it works and how to deploy it for PostgreSQL.

What is HAProxy?

HAProxy is an open source proxy that can be used to implement high availability, load balancing and proxying for TCP and HTTP based applications.

As a load balancer, HAProxy distributes traffic from one origin to one or more destinations and can define specific rules and/or protocols for this task. If any of the destinations stops responding, it is marked as offline, and the traffic is sent to the rest of the available destinations.

How to install and configure HAProxy manually

To install HAProxy on Linux you can use the following commands:

On Ubuntu/Debian OS:

$ apt-get install haproxy -y

On CentOS/RedHat OS:

$ yum install haproxy -y

And then we need to edit the following configuration file to manage our HAProxy configuration:

$ /etc/haproxy/haproxy.cfg

Configuring our HAProxy is not complicated, but we need to know what we are doing. We have several parameters to configure, depending on how we want HAProxy to work. For more information, we can follow the documentation about the HAProxy configuration.

Let's look at a basic configuration example. Suppose that you have the following database topology:

Database Topology Example

We want to create a HA
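By way of illustration, a write/read split for such a topology might be sketched in haproxy.cfg roughly like this (hypothetical node addresses and ports; a real setup also needs a health check that tells the primary apart from the replicas, e.g. an external agent check):

```
# /etc/haproxy/haproxy.cfg (fragment) -- illustrative only
listen postgres_write
    bind *:3307
    mode tcp
    option tcp-check
    server node1 10.0.0.11:5432 check
    server node2 10.0.0.12:5432 check backup

listen postgres_read
    bind *:3308
    mode tcp
    balance roundrobin
    option tcp-check
    server node1 10.0.0.11:5432 check
    server node2 10.0.0.12:5432 check
```

Applications would then point writes at port 3307 and reads at port 3308, and HAProxy takes care of routing around failed nodes.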

Category: postgresql

Rafia Sabih: Using parallel sequential scan in PostgreSQL

2018-10-31 (Wed) 22:29:00
Parallel sequential scan is the first parallel access method in PostgreSQL and was introduced in version 9.6. Robert Haas, the committer of this feature and my colleague at EnterpriseDB, wrote an awesome blog post on it, and there is another great post by Amit Kapila, another PostgreSQL committer and colleague of mine. Both of these posts explain this access method, its design, usage, and related parameters.
Still, I could not help but notice that there are curiosities around the usage of this access method. Every now and then I see a complaint that a parallel sequential scan is not getting selected, or that it is degrading the performance of a query. So, I decided to write this blog post to cover more practical scenarios, and specifically to focus on its less-talked-about aspect -- where a parallel sequential scan would (and should) not improve performance.
Before diving into the details of parallel SeqScan, let's first understand the basic infrastructure and terminology around it in PostgreSQL. The processes that run in parallel and scan the tuples of a relation are called parallel workers, or workers for short. There is one special worker, the leader, which coordinates the scan and collects its output from each of the workers. The leader may or may not participate in scanning the relation itself, depending on how busy it is dividing and combining work. End users can also control the leader's involvement in the relation scan via the boolean GUC parameter parallel_leader_participation.
Now, let's understand the concept of parallel scan in PostgreSQL by a simple example.
  • Let there be a table T (a int, b int) containing 100 tuples
  • Let's say we have two workers and one leader,
  • Cost of scanning one tuple is 10
  • Cost of communicating a tuple from worker to leader is 20
  • Cost of dividing the tuples among workers is 30
  • For simplicity, let's assume that the leader gives 50 tuples to each of the workers
Now, let's analyse whether a parallel scan will be faster than a non-parallel scan:
Cost of SeqScan = 10*100 = 1000
Cost of Parallel SeqSc[...]
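Completing the arithmetic with the toy numbers above (this is just the example's illustrative model, not PostgreSQL's actual cost machinery):

```python
# Toy cost model using the numbers from the example above
scan_cost = 10      # cost to scan one tuple
comm_cost = 20      # cost to send one tuple from a worker to the leader
divide_cost = 30    # cost to divide the tuples among workers
tuples, workers = 100, 2

cost_seqscan = scan_cost * tuples  # 10 * 100 = 1000

# Workers scan their 50-tuple shares concurrently, so scanning costs
# 10 * 50, but every tuple still has to be shipped to the leader, and
# the division work is paid up front.
cost_parallel = (divide_cost
                 + scan_cost * (tuples // workers)
                 + comm_cost * tuples)  # 30 + 500 + 2000 = 2530

print(cost_seqscan, cost_parallel)  # 1000 2530
```

With these (deliberately exaggerated) communication costs, the parallel plan comes out more expensive, which is exactly the kind of scenario where a parallel sequential scan should not be chosen.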
Category: postgresql

Liaqat Andrabi: Webinar: PostgreSQL is NOT your traditional SQL database [Follow Up]

2018-10-31 (Wed) 16:00:58

PostgreSQL is referred to as “The world’s most advanced open source database” – but what does PostgreSQL have that other open source relational databases don’t?  

2ndQuadrant recently hosted a webinar on this very topic: PostgreSQL is NOT your traditional SQL database, presented by Gülçin Yıldırım Jelínek, Cloud Services Manager at 2ndQuadrant.

The recording of the webinar is now available here.

Questions that Gülçin couldn’t respond to during the live webinar have been answered below.

Q1: What exactly is the role of postgresql for a marketplace like ebay or rakuten?

A1: This question is not very clear. If the question is about whether Postgres can be used in an e-commerce website, the answer is yes.


Q2: I’m in the process of switching from MS SQL Server to Postgres and I have an issue:
simple search in text columns containing diacritic characters.

Ex: table Person

When I search:

SELECT * FROM pers WHERE Name LIKE 'ste%';

I want to retrieve all records from above.
In SQL Server there’s a simple way to accomplish this – I use COLLATE Latin1_General_100_CI_AI when I define the column, and that’s it.
Do you have recommendations to accomplish the same task in Postgres?

A2: Accent-insensitive (“unaccenting”) collations are not supported in PostgreSQL. With the unaccent extension installed (CREATE EXTENSION unaccent;), you can query like this to get the same result:

SELECT * FROM pers WHERE unaccent(Name) ILIKE 'ste%';

For any questions, comments, or feedback, please visit our website or send an email to

Category: postgresql

Quinn Weaver: Remember your history

2018-10-31 (Wed) 14:56:00
PostgreSQL keeps track of which WAL files go with which timelines in small history files. Each time a new timeline is created (for example, when a standby is promoted), a history file is born. The file is written once and never updated. It's a simple system, and it works well and silently.

In fact, sometimes it works a little too silently.

At PostgreSQL Experts we've run into the problem where a client's history files disappear because they are stored in S3, and there's a lifecycle configuration in place that says to move everything over a certain age to Glacier. That's a good policy for WAL files!

Unfortunately, it's not a good policy for history files: without the latest history file you can't restore the latest backup, and without past history files, you are unable to do PITR to certain points in time.

The solution we used was to move the whole WAL archive to S3 Standard-Infrequent Access storage, dissolving the problem with lifecycle configurations while controlling costs. But you could also fix this by editing the lifecycle configuration.

The important thing is this: hold on to all history files. They're tiny text files, and when you need them, you really need them. This is also a good reason to test restores, not just of the latest backup, but of database states at arbitrary points in time.

*    *    *
Addendum: another very common problem we see is WAL archives that become corrupted because a client accidentally pointed two primaries at the same WAL archive (for instance, they might have copied a postgresql.conf file by hand, or via a DevOps tool like Puppet). In this case, the whole archive is corrupted, and you're best off starting with a fresh S3 bucket or an empty directory and doing a new base backup immediately.

One of the many nice features of pgBackRest is that it will notice this and prevent you from doing it. Fewer footguns → better backups.
Category: postgresql

Hubert 'depesz' Lubaczewski: Foreign Key to partitioned table – part 2

2018-10-31 (Wed) 10:45:14
Previously I wrote about how to create foreign key pointing to partitioned table. Final solution in there required four separate functions and four triggers for each key between two tables. Let's see how fast it is, and if it's possible to make it simpler. First bit – performance test (for making it simpler you will … Continue reading "Foreign Key to partitioned table – part 2"
Category: postgresql

Luca Ferrari: pgenv gets patching support

2018-10-31 (Wed) 09:00:00

pgenv now supports a customizable patching feature that allows the user to define which patches to apply when an instance is built.

pgenv gets patching support

pgenv, the useful tool for managing several PostgreSQL installations, gets support for customizable patching.

What is it all about? Well, it happens that you could need to patch the PostgreSQL source tree before you build, maybe because something on your operating system is different from the majority of the systems PostgreSQL is built against. Never mind, you need to patch it!

pgenv did support a very simple patching mechanism hardcoded within the program itself, but during the last few days I worked on a different and more customizable approach. The idea is simple: the program will apply every patch file listed in an index for the particular version. So, if you want to build the shiny new 11.0 and need to patch it, create an index text file listing all the patches, and the pgenv build process will apply them before compiling.

Of course, what if you need to apply the same patches over and over to different versions? You would end up with several indexes, one for each version you need to patch. Uhm… not so smart! To avoid this, I designed the patch index selection in a way that allows you to group patches by operating system and PostgreSQL version family.

Allow me to explain more in detail with an example. Suppose you are on a Linux machine and need to patch version 11.0: the program will search for a file that matches any of the following:

$PGENV_ROOT/patch/index/patch.11.0.Linux
$PGENV_ROOT/patch/index/patch.11.0
$PGENV_ROOT/patch/index/patch.11.Linux
$PGENV_ROOT/patch/index/patch.11

This search works by selecting the first file that matches the operating system and PostgreSQL version, or a combination of the two...
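A rough sketch of that selection logic in Python (illustrative only; pgenv itself is a shell script, and the root directory, version and OS name are parameters here):

```python
import os

def find_patch_index(pgenv_root, version, os_name):
    """Return the first existing patch index file, following the search
    order described above (most specific to least specific)."""
    major = version.split(".")[0]          # e.g. "11" for "11.0"
    candidates = [
        f"patch.{version}.{os_name}",      # patch.11.0.Linux
        f"patch.{version}",                # patch.11.0
        f"patch.{major}.{os_name}",        # patch.11.Linux
        f"patch.{major}",                  # patch.11
    ]
    for name in candidates:
        path = os.path.join(pgenv_root, "patch", "index", name)
        if os.path.exists(path):
            return path
    return None
```

Whichever index file wins, pgenv applies every patch listed in it before compiling.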

Category: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Add pg_partition_tree to display information about partitions

2018-10-31 (Wed) 05:33:03
On 30th of October 2018, Michael Paquier committed patch: Add pg_partition_tree to display information about partitions     This new function is useful to display a full tree of partitions with a partitioned table given in output, and avoids the need of any complex WITH RECURSIVE query when looking at partition trees which are deep … Continue reading "Waiting for PostgreSQL 12 – Add pg_partition_tree to display information about partitions"
Category: postgresql

Nickolay Ihalainen: PostgreSQL locking, part 3: lightweight locks

2018-10-30 (Tue) 22:31:40

PostgreSQL lightweight locks, or LWLocks, control access to shared memory. PostgreSQL uses a multi-process architecture and must allow only consistent reads and writes to shared memory structures. LWLocks have two levels of locking: shared and exclusive. It’s also possible to release all acquired LWLocks at once to simplify clean-up. Other databases often call primitives similar to LWLocks “latches”. Because LWLocks are an implementation detail, application developers shouldn’t pay much attention to this kind of locking.

This is the third and final part of a series on PostgreSQL locking, related to latches protecting internal database structures. Here are the previous parts: Row-level locks and table-level locks.


Starting with PostgreSQL 9.6, LWLock activity can be investigated via the pg_stat_activity system view. This can be useful under high CPU utilization. There are system settings to help with contention on specific lightweight locks.

Before PostgreSQL 9.5, the LWLocks implementation used spin-locks and was a bottleneck. This was fixed in 9.5 with an atomic state variable.

Potential heavy contention places
  • WALInsertLock: protects the WAL buffers. You can increase the number of WAL buffers to get a slight improvement. Incidentally, synchronous_commit=off increases pressure on this lock even more, but that’s not a bad thing. full_page_writes=off reduces contention, but it’s generally not recommended.
  • WALWriteLock: acquired by PostgreSQL processes while WAL records are flushed to disk or during a WAL segment switch. synchronous_commit=off removes the wait for the disk flush; full_page_writes=off reduces the amount of data to flush.
  • LockMgrLock: appears among the top waits during read-only workloads. It latches relations regardless of their size. It’s not a single lock, but at least 16 partitions. Thus it’s important to use multiple tables during benchmarks and to avoid the single-table anti-pattern in production.
  • ProcArrayLock: Protects the ProcArray structure. Before PostgreSQL 9.0, every transaction acquired this lock exclusively befo
Category: postgresql

Craig Kerstiens: The biggest mistake Postgres ever made

2018-10-30 (Tue) 22:30:34

Postgres has had a long and great run. It’s over 20 years old and has a track record of being safe and reliable (which is the top thing I care about in a database). In recent years it’s become more cool with things like JSONB, JIT support, and a powerful extension ecosystem. But Postgres has made some mistakes along the way, the most notable being its name.

Postgres gets its name from Ingres. Ingres was one of the first databases and was led by Michael Stonebraker, who won a Turing Award for Postgres and other works. Ingres began in the early 70s at UC Berkeley, which is still to this day known as a top university when it comes to databases. Out of Ingres came a number of databases you’ll still know today, such as SQL Server and Sybase. It also, as you may have guessed by now, spawned Postgres, which means Post-Ingres.

In the early days of Postgres there was no SQL. No, not NoSQL; there was no SQL. Postgres had its own query language. It wasn’t until 1995 that Postgres received SQL support, and with the addition of SQL support it updated its name to PostgreSQL.

You see, with Postgres becoming PostgreSQL we began a journey of Postgres being mispronounced for the foreseeable future, and that is still the case today. Is it really that big of an issue? Well, it’s big enough that the PostgreSQL website has a FAQ entry on “How to pronounce PostgreSQL”. As it stands today there are two generally accepted names:

  • post-GRES-que-ell
  • Postgres

With one of the above there is far less confusion. And in fact I’m not the only one to share this opinion. Tom Lane has been a major contributor to every Postgres release for more than a decade. He’s one of the top 10 contributors to open source in general, having worked on the JPEG/PNG/TIFF image formats before coming over to database land. Tom has this classic email in the PostgreSQL mailing list:


>> Can i get data in postgre from non-postgre db?
> The name is PostgreSQL or Postgres, not postgre. It might help to explain that the pronunciation is "post-g[...]
Category: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Add pg_promote function

2018-10-30 (Tue) 22:08:49
On 25th of October 2018, Michael Paquier committed patch: Add pg_promote function     This function is able to promote a standby with this new SQL-callable function. Execution access can be granted to non-superusers so that failover tools can observe the principle of least privilege.   Catalog version is bumped.   Author: Laurenz Albe   … Continue reading "Waiting for PostgreSQL 12 – Add pg_promote function"
Category: postgresql

Kaarel Moppel: Peeking at query performance of the upcoming version 11 of PostgreSQL

2018-10-30 (Tue) 18:00:25

Last week the first release candidate of the upcoming Postgres version 11 was released, stating that everything is going more or less as planned – great news! The 11 release promises a lot of very cool stuff as usual, including better partitioning, more parallelization, and JIT capabilities (which are not activated by default though). Besides those headline features there are of course hundreds or rather thousands (`git diff –shortstat REL_10_5..REL_11_RC1` says “3192 files changed, 324206 insertions, 178499 deletions”) of smaller changes, some of which are sure to boost performance on typical queries. The release notes, as usual, didn’t state any numerical performance indicators, so people have to find out on their own – and that’s what this blog post is about. I’m basically running 4 quite simple analytical/aggregate queries and laying out the numbers for you to evaluate – so jump to the summary table at the end of the post, or get into the details by reading on.

Test Queries

3 out of 4 test queries I carried over from last year’s similar test of 9.6 vs 10 (check it out here if you’re into archeology) and added one query to test index traversal performance. The queries are all tied to the schema generated by our good old friend pgbench, with one minor adjustment – creating a copy of the pgbench_accounts table to be able to simulate a join of two 1-million-row tables. This time I also didn’t disable the parallel features, as they’re “on” by default for both versions and thus probably won’t be touched in most setups.

/* Sum up 50mio rows */
SELECT sum(abalance) FROM pgbench_accounts CROSS JOIN generate_series(1, 5);
/* Unique columns joining */
SELECT count(*) FROM pgbench_accounts JOIN pgbench_accounts_copy USING (aid);
/* Using Grouping Sets analytical feature */
SELECT count(*) FROM (SELECT aid, bid, count(*) FROM pgbench_accounts GROUP BY CUBE (aid, bid)) a;
/* B-tree index traversing */
SELECT COUNT(DISTINCT aid) FROM pgbench_accounts WHERE aid % 2 = 0;

For pgbench “sc

Category: postgresql

Quinn Weaver: It's just an expression

2018-10-30 (Tue) 15:23:00
PostgreSQL has lots of great features for text search.

In particular, there are a bunch of ways to do case-insensitive searches for text:
  • There's standard SQL ILIKE… but that can be expensive — especially if you put %'s at both the start and end — and it's overkill if you want an exact string match.
  • The same goes for case-insensitive regexp matching: overkill for simple case-insensitive matches. It does work with indexes, though!
  • Then there's the citext extension, which is pretty much the perfect answer. It lets you use indexes and still get case-insensitive matching, which is really cool. It Just Works.
OK, but what if you didn't have the foresight to use citext? And what if you don't want to go through the pain of changing the data type of that column? In a case like this, an expression index can be really handy.

Without an index, a case-insensitive match like this can be quite expensive indeed:
sandbox# select addy from email_addresses where lower(addy) = '';

                                               QUERY PLAN
 Seq Scan on email_addresses  (cost=0.00..1.04 rows=1 width=32) (actual time=0.031..0.032 rows=1 loops=1)
   Filter: (lower(addy) = ''::text)
   Rows Removed by Filter: 3
 Planning time: 0.087 ms
 Execution time: 0.051 ms
(5 rows)

(That's a sequential scan of the entire table, lowercasing the addy column of each row before comparing it to the desired address.)

And a regular index on email_addresses(addy) won't help, because the lower() operation forces a sequential scan.
But an expression index will do the trick:

sandbox# create index email_addresses__lower__addy on email_addresses (lower(addy));
sandbox# explain analyze select addy from email_addresses where lower(addy) = '';
                                                                  QUERY PLAN
Category: postgresql

damien clochard: Introducing PostgreSQL Anonymizer

2018-10-29 (Mon) 19:17:36

I’ve just released today an extension called PostgreSQL Anonymizer that will mask or replace personally identifiable information (PII) or commercially sensitive data from a PostgreSQL database.

The project is open source and available here:

I strongly believe in a declarative approach to anonymization: the location of sensitive information inside the database, and the rules to hide this information, should be declared directly using the Data Definition Language (DDL). In the age of the GDPR, developers should specify an anonymization strategy inside the table definitions, just as they specify data types, foreign keys and constraints.

This project is a prototype designed to show the power of implementing data masking directly inside PostgreSQL. Currently it is based on the COMMENT statement (probably the most underused PostgreSQL syntax) and an event trigger. In the near future, I’d like to propose a new syntax for dynamic data masking (MS SQL Server already has it).

The extension can be used to put dynamic masks on certain users or to permanently modify sensitive data. Various masking techniques are available: randomization, partial scrambling, custom rules, etc.

Here’s a basic example :

Imagine a people table

=# SELECT * FROM people;
  id  |      name      |   phone
------+----------------+------------
 T800 | Schwarzenegger | 0609110911
  1. Activate the masking engine
  2. Declare a masked user
  3. Declare the masking rules

=# COMMENT ON COLUMN IS 'MASKED WITH FUNCTION anon.random_last_name()';
=# COMMENT ON COLUMN IS 'MASKED WITH FUNCTION anon.partial(phone,2,$$******$$,2)';

  4. Connect with the masked user

=# \! psql test -U skynet -c 'SELECT * FROM people;'
  id  |   name   |   phone
------+----------+------------
 T800 | Nunziata | 06******11
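The masked phone number above can be reproduced with a toy re-implementation of the partial-scrambling idea (a hypothetical Python analogue of anon.partial, assuming it keeps a prefix and a suffix and pads the middle):

```python
def partial(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Toy analogue of anon.partial(col, prefix, padding, suffix):
    keep the first `prefix` and last `suffix` characters, mask the middle."""
    return value[:prefix] + padding + value[-suffix:]

print(partial("0609110911", 2, "******", 2))  # 06******11
```

The dollar-quoted $$******$$ in the SQL above is just PostgreSQL's way of passing the literal padding string.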

Of course this project is a work in progress. I need yo

Category: postgresql

brian davis: A Poor Man's Column Oriented Database in PostgreSQL

2018-10-29 (Mon) 10:16:00

Let's get this out of the way. If you need a real column oriented database, use one. Don't even think about using the insanity below in production.

Having said that, I've been wondering for a while if it would be possible to demonstrate the properties of a column oriented db by simulating one using only native functionality found in good old, row oriented, PostgreSQL.

Two properties of a column oriented database that make it beneficial in OLAP environments are

  1. For queries that involve only a subset of a table's columns, it only needs to read the data for those columns off disk, and no others, saving on IO
  2. Storing each column separately means it can compress that data better since it's all of the same type, further reducing IO

The reduction in IO saves time for queries aggregating data over a large fraction of the table, improving performance.

PostgreSQL, of course, is row oriented. It stores and reads data a whole row at a time, and this is great when you want to access a majority of a table's columns for a small percentage of the table. i.e. OLTP type queries.

The challenge will be, can we structure things so PostgreSQL behaves more like a column oriented db?


The strategy will be to use multiple tables behind the scenes to store the data for each column individually, and throw a view on top to tie them all together and make it look like a single table. If PostgreSQL's query planner is smart enough, it should be able to see that queries involving only a subset of columns only need to access those particular underlying tables, save on IO, and beat out a traditional PostgreSQL table of the same shape. We can even allow for modifying this pseudo-table by creating INSTEAD OF triggers on the view to take INSERT, UPDATE, DELETE statements, chop them up, and perform the necessary operations on the underlying tables.

Here's how the underlying tables will look. We'll have a single 'primary table' that contains only the primary key...

Table "public.primary_table" Column | Type |[...]
Category: postgresql