Planet PostgreSQL

Subscribe to the Planet PostgreSQL feed
Updated: 25 minutes 31 seconds ago

Mark Wong: PostgreSQL Award

2019-07-24 (Wed) 00:00:58
PostgreSQL was presented with the 2019 O’Reilly Open Source Award for Lifetime Achievement in Portland, Oregon, Thursday July 18th during OSCON. I had the honor of accepting that award alongside Bruce Momjian and Christophe Pettus. Starting last year in 2018, these awards were presented to projects and the first Lifetime Achievement award was given to […]
Category: postgresql

Avinash Kumar: Automatic Index Recommendations in PostgreSQL using pg_qualstats and hypopg

2019-07-23 (Tue) 01:13:59

In my previous blog post, we saw how the extension hypopg can be helpful in creating hypothetical indexes in PostgreSQL. If you have read it already, you know that it is very useful for verifying whether an index can improve the performance of a SQL statement without having to actually create it. With that in mind, is there also a way to automatically suggest which indexes could improve the performance of some of your SQL statements? The answer is… YES!

In this blog post, we will see how we can get a direct recommendation to improve a specific query, as seen in the following snippet.

                       query                        |                 recmnded_index                 | percent_improvd
----------------------------------------------------+------------------------------------------------+-----------------
 select * from foo.bar where id2 = $1 and id4 = $2  | CREATE INDEX ON foo.bar USING btree (id2, id4) |           99.96
 select * from foo.bar where id3 = $1               | CREATE INDEX ON foo.bar USING btree (id3)      |           99.93
(2 rows)

Without any further delay, let’s discuss the extension pg_qualstats, which enables us to achieve this requirement for PostgreSQL versions 9.4 or later. Following that, we will take a look at a logic which could automatically suggest what indexes would be helpful for query optimizations – without much manual work.

All of the logic discussed in this blog post is reproducible, so please feel free to follow along using these commands and try tuning your custom queries in test environments.

pg_qualstats

pg_qualstats is an extension developed by the POWA Team that stores the predicates (quals) found in WHERE clauses and JOIN conditions. This helps us analyze query execution and opens up the possibility of automatic query optimization.
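A minimal sketch of getting the extension in place, assuming a standard install from packages (the view queried at the end is provided by pg_qualstats, but check your version's documentation for its exact columns):

# in postgresql.conf – the library must be preloaded, which requires a restart
shared_preload_libraries = 'pg_qualstats'

-- then, in the database you want to analyze
CREATE EXTENSION pg_qualstats;

-- collected predicates (quals) can be inspected directly
SELECT * FROM pg_qualstats LIMIT 10;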

When we query pg_stat_statements-like extensions, we only see the prepared SQL or the SQL without any bind variable values. As you cannot perform an EXPLAIN (to see the execution plan of a query) without the qu[...]

Category: postgresql

Jonathan Katz: Certificate Authentication Recipe for PostgreSQL Docker Containers

2019-07-23 (Tue) 00:32:32

One of the reasons that PostgreSQL supports many authentication methods is to help ensure that it can work with multiple external identity management providers. While a lot of people are familiar with having PostgreSQL request a password for logging in, there are other methods available as well.

One method that can be used in larger enterprise environments is using certificates to authenticate between a PostgreSQL client (or user) and the server. Using certificates for authentication can have a lot of advantages, as they can provide a way to verify a user without requiring a password! In other words, you can avoid some of the challenges of having to secure shared passwords that are distributed across your infrastructure. 

This article walks through getting your own certificate-based authentication system set up in a PostgreSQL container with OpenSSL, going through the steps of setting up such a configuration and then exploring the extra variables you need to set in order to authenticate. If you're interested in setting up a production-level system, my colleague Stephen Frost gave an excellent talk at PGCon 2019 on how to set up an enterprise certificate-based authentication system (while also spending significant time on authenticating to PostgreSQL via the GSS-API and Kerberos).

tl;dr: What's the recipe?
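The full recipe is in the post itself; as a rough, hedged sketch of the moving parts (file names and host below are placeholders, not the article's values), the server needs its certificate files plus a cert rule in pg_hba.conf, and the client presents a certificate whose common name matches the database user:

# postgresql.conf (server)
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'
ssl_ca_file = 'ca.crt'

# pg_hba.conf: authenticate via client certificate instead of a password
hostssl all all 0.0.0.0/0 cert

# client: connect with a certificate signed by the same CA
psql "host=db.example.com dbname=postgres user=postgres sslmode=verify-full sslcert=postgres.crt sslkey=postgres.key sslrootcert=ca.crt"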
Category: postgresql

Michael Paquier: Postgres 12 highlight - Recovery changes

2019-07-22 (Mon) 17:16:11

PostgreSQL 12 has changed the way recovery configuration works, and has introduced a couple of incompatibilities, as mentioned in the main commit which made the switch:

commit: 2dedf4d9a899b36d1a8ed29be5efbd1b31a8fe85
author: Peter Eisentraut
date: Sun, 25 Nov 2018 16:31:16 +0100

Integrate recovery.conf into postgresql.conf

recovery.conf settings are now set in postgresql.conf (or other GUC
sources). Currently, all the affected settings are PGC_POSTMASTER;
this could be refined in the future case by case.

Recovery is now initiated by a file recovery.signal. Standby mode is
initiated by a file standby.signal. The standby_mode setting is gone.
If a recovery.conf file is found, an error is issued.

The trigger_file setting has been renamed to promote_trigger_file as
part of the move.

The documentation chapter "Recovery Configuration" has been integrated
into "Server Configuration".

pg_basebackup -R now appends settings to postgresql.auto.conf and
creates a standby.signal file.

Author: Fujii Masao
Author: Simon Riggs
Author: Abhijit Menon-Sen
Author: Sergei Kornilov
Discussion: https://www.postgresql.org/message-id/flat/607741529606767@web3g.yandex.ru/

From the point of view of code and maintenance, this has the huge advantage of removing the duplication caused by the parsing of recovery.conf, which followed its own set of rules that were rather close to what is used for the generic GUC parameters. It also brings all the benefits of GUCs, as it becomes possible:

  • To reload parameters.
  • To monitor the values with SHOW.
  • To apply changes with ALTER SYSTEM.

However, this introduces a set of compatibility changes to be aware of in order to adapt to the new rules.
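For illustration, a hedged sketch of what a 12-style standby setup looks like once adapted (connection string and paths are placeholders): the old recovery.conf content becomes ordinary GUCs, and standby mode is requested through an empty signal file.

$ touch "$PGDATA/standby.signal"
$ echo "primary_conninfo = 'host=primary.example.com user=replication'" >> "$PGDATA/postgresql.auto.conf"

Once the standby is running, SHOW primary_conninfo (or pg_settings) reports the value like any other parameter, and pg_basebackup -R produces the same layout automatically.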

First note that upgrading a standby instance to 12 and newer versions will cause an immediate failure, as the startup process would complain about the presence of recovery.conf:

FATAL: using recovery command [...]
Category: postgresql

Hans-Juergen Schoenig: PostgreSQL: How are functions and stored procedures replicated?

2019-07-22 (Mon) 16:46:22

When looking at our PostgreSQL support activities, we notice that people often ask about functions, stored procedures and replication. Are functions replicated? How are functions replicated? What happens to shared libraries and do we have to install an extension everywhere? This topic seems to be really interesting to many people and so I decided to write a small post about this issue to hopefully shed some light on it.

How PostgreSQL “stores” functions and procedures

To understand how PostgreSQL “replicates” functions, it is first important to understand how functions and procedures are actually stored. To demonstrate this, I have created a small function that just returns a number and quits:

test=# CREATE FUNCTION demo(int) RETURNS int AS $$ SELECT 1; $$ LANGUAGE 'sql';
CREATE FUNCTION

PostgreSQL stores the code of a function in a system table. In case the function is written in an “interpreted” language, the code is just stored there in plain text. The following listing shows how this works:

test=# SELECT proname, prolang, prokind, prosrc, probin FROM pg_proc WHERE proname = 'demo';
 proname | prolang | prokind |   prosrc    | probin
---------+---------+---------+-------------+--------
 demo    |      14 | f       |  SELECT 1;  |
(1 row)

What we see here is that “demo” has been written in language number 14 (which happens to be SQL), and it is a “function” and not a procedure. Then there is the code of the function. Note that “probin” is empty – I will get back to that a bit later. Now: what does that imply? If you use streaming replication (perhaps together with Patroni or some other tools), the system catalog will be replicated just like any other table. The function will therefore be sent to the remote side just like any other change. So the function will be fully replicated and there is nothing else to do.
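To resolve that language number to a readable name, one can join against pg_language (a small illustrative query, not taken from the post):

test=# SELECT p.proname, l.lanname, p.prokind
         FROM pg_proc p
         JOIN pg_language l ON l.oid = p.prolang
        WHERE p.proname = 'demo';
 proname | lanname | prokind
---------+---------+---------
 demo    | sql     | f
(1 row)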

Dealing with compiled stored procedure languages

What happens if you want to use compiled functions?

[...]
Category: postgresql

Luca Ferrari: yum upgrade postgresql11 panic!

2019-07-22 (Mon) 09:00:00

I have to say, I don’t use CentOS very much and I’m not a good user of systemd, which is the reason why I got five minutes of pure fear!

yum upgrade postgresql11 panic!

How hard could it be to upgrade PostgreSQL within minor versions?
Usually it is very simple, and indeed it is, but not when you don’t know your tools!
And in this case that’s my fault.
However, I’m writing this short note to help other people avoid the same problem I had.

The current setup

The machine is a CentOS 7 running PostgreSQL 11.1 installed by packages provided by the PostgreSQL Global Development Group.

Preparing to upgrade

Of course, I took a full backup before proceeding, just in case. The cluster I’m talking about is a low-traffic cluster with roughly ~12 GB of data, which means that backup and restore are not zero-downtime operations (and no, I’m not in the position of having a WAL-based backup, but that’s another story).
Having a backup helps keep the amount of panic at a fair level.

Performing the upgrade

I do like yum(8) and its transactional approach. Doing the upgrade was a matter of:

% sudo yum upgrade postgresql11

and all dependencies are, of course, calculated and applied. Then I confirmed, waited a couple of minutes for the upgrade to apply, and started holding my breath:

psql: could not connect to server: Connection refused
        Is the server running on host "xxx" (192.168.222.123) and accepting
        TCP/IP connections on port 5432?

Inspecting and solving the problem

Apparently PostgreSQL has not been restarted after the upgrade,...
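The excerpt ends here; with the PGDG packages on CentOS 7 the usual next step is to check and restart the service via systemd (service name assumed from the PGDG packaging, adjust to your install):

% sudo systemctl status postgresql-11
% sudo systemctl start postgresql-11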

Category: postgresql

Stefan Fercot: pgBackRest S3 configuration

2019-07-19 (Fri) 09:00:00

pgBackRest is a well-known powerful backup and restore tool.

While the documentation describes all the parameters, it’s not always that simple to imagine what you can really do with it.

In this post, I will introduce some of the parameters needed to configure the access to an Amazon S3 bucket.
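To give a hedged preview of where this is heading, the S3-related options end up in pgbackrest.conf along these lines (values mirror the demo below; option names should be double-checked against the pgBackRest documentation for your version):

[global]
repo1-type=s3
repo1-path=/repo
repo1-s3-bucket=pgbackrest
repo1-s3-endpoint=s3.eu-west-3.amazonaws.com
repo1-s3-region=eu-west-3
repo1-s3-key=accessKey
repo1-s3-key-secret=superSECRETkey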

MinIO

For the purpose of this demo setup, we’ll install a MinIO server, which is an Amazon S3 Compatible Object Storage.

To do so, I followed the guide of centosblog.com.

Installation

On a fresh CentOS 7 server:

$ sudo useradd -s /sbin/nologin -d /opt/minio minio
$ sudo mkdir -p /opt/minio/bin
$ sudo mkdir -p /opt/minio/data
$ sudo yum install -y wget
$ sudo wget https://dl.min.io/server/minio/release/linux-amd64/minio -O /opt/minio/bin/minio
$ sudo chmod +x /opt/minio/bin/minio
$ cat<<EOF | sudo tee "/opt/minio/minio.conf"
MINIO_VOLUMES=/opt/minio/data
MINIO_DOMAIN=minio.local
MINIO_OPTS="--certs-dir /opt/minio/certs --address :443 --compat"
MINIO_ACCESS_KEY="accessKey"
MINIO_SECRET_KEY="superSECRETkey"
EOF
$ sudo chown -R minio:minio /opt/minio

MinIO is installed in /opt/minio with a specific system user. The domain name we’ll use will be minio.local and the bucket name will be pgbackrest.

The data will be stored in /opt/minio/data.

We then have to setup the hosts file accordingly:

$ cat<<EOF | sudo tee "/etc/hosts"
127.0.0.1 pgbackrest.minio.local minio.local s3.eu-west-3.amazonaws.com
EOF

Https

Since we’ll need to run MinIO in https mode to be able to work with pgBackRest, let’s create some self-signed certificates:

$ mkdir ~/certs
$ cd ~/certs
$ openssl genrsa -out ca.key 2048
$ openssl req -new -x509 -extensions v3_ca -key ca.key -out ca.crt -days 99999 -subj "/C=BE/ST=Country/L=City/O=Organization/CN=some-really-cool-name"
$ openssl genrsa -out server.key 2048
$ openssl req -new -key server.key -out server.csr -subj "/C=BE/ST=Country/L=City/O=Organization/CN=some-really-cool-name"
$ openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreate[...]
Category: postgresql

Jeff McCormick: Using the PostgreSQL Operator with Rook Ceph Storage

2019-07-19 (Fri) 06:43:00

The Crunchy PostgreSQL Operator supports various forms of storage for provisioning PostgreSQL clusters in a Kubernetes environment. One such provider is Rook, which provides an abstraction layer around multiple storage systems available in Kubernetes, making it even more convenient to choose between multiple storage engines. One storage engine that Rook supports is Ceph, which provides several types of distributed storage platforms, including block-level storage, which is very helpful for expanding cloud-based workloads.

This post explores the use of the Rook storage engine with the PostgreSQL Operator, specifically demonstrating how the PostgreSQL Operator can be utilized to create a PostgreSQL cluster that is backed by Rook Ceph block storage.

For this example the rook-ceph-block storage class will be created and utilized in conjunction with the PostgreSQL Operator to dynamically provision Ceph block storage for use by a PostgreSQL cluster and its supporting services. This will effectively demonstrate how Rook can be utilized to deploy a Ceph cluster in your Kubernetes environment, therefore allowing you to leverage the power of Ceph storage, e.g. highly-available and scalable block storage, in your PostgreSQL clusters.

Many thanks to Andrew L'Ecuyer for helping with the methodology and testing that this post presents. For more information about PostgreSQL Operator storage configuration, please see the  documentation.

Category: postgresql

Konstantin Evteev: Standby in production: scaling application in the second largest classified site in the world.

2019-07-18 (Thu) 23:29:42
Standby in production: scaling application in the second largest classified site in the world

Hi. My name is Konstantin Evteev, I’m a DBA Unit Leader at Avito. Avito is the biggest Russian classified site, and the second largest classified site in the world (after Craigslist in the USA). Items offered for sale on Avito can be brand new or used. The website also publishes job vacancies and CVs.

Via its web and mobile apps, the platform serves more than 35 million users monthly. They add approximately a million new ads a day and close over 100,000 transactions per day. The back office has accumulated more than a billion ads. According to Yandex, in some Russian cities (for example, in Moscow), Avito is considered a high-load project in terms of page views. Some figures can give a better idea of the project’s scale:

  • 600+ servers;
  • 4.5 Gbit/sec TX, 2 Gbit/sec RX without static;
  • about a million queries per minute to the backend;
  • 270TB of images;
  • >20 TB in Postgres on 100 nodes:
  • 7–8K TPS on most nodes;
  • the largest — 20k TPS, 5 TB.

At the same time, these volumes of data need not only to be accumulated and stored but also processed, filtered, classified and made searchable. Therefore, expertise in data processing is critical for our business processes.

The picture below shows the dynamics of pageview growth.

Our decision to store ads in PostgreSQL helps us to meet the following scaling challenges: the growth of data volume and the growth of the number of requests to it, the scaling and distribution of the load, the delivery of data to the DWH and the search subsystems, inter-base and inter-network data synchronization, etc. PostgreSQL is the core component of our architecture. Its rich set of features, legendary durability, built-in replication, archiving and backup tools all find use in our infrastructure. And the professional community helps us to use all these features effectively.

In this report, I would like to share Avito’s experience in different cases of s

[...]
Category: postgresql

Sebastian Insausti: Scaling Postgresql for Large Amounts of Data

2019-07-18 (Thu) 18:28:01

Nowadays, it’s common to see a large amount of data in a company’s database, but depending on the size, it could be hard to manage, and the performance could suffer during high traffic if we don’t configure or implement it in the correct way. In general, if we have a huge database and we want a low response time, we’ll want to scale it. PostgreSQL is no exception to this. There are many approaches available to scale PostgreSQL, but first, let’s learn what scaling is.

Scalability is the property of a system/database to handle a growing amount of demands by adding resources.

The reasons for this amount of demands could be temporal, for example, if we’re launching a discount on a sale, or permanent, for an increase of customers or employees. In any case, we should be able to add or remove resources to manage these changes on the demands or increase in traffic.

In this blog, we’ll look at how we can scale our PostgreSQL database and when we need to do it.

Horizontal Scaling vs Vertical Scaling

There are two main ways to scale our database...

  • Horizontal Scaling (scale-out): It’s performed by adding more database nodes creating or increasing a database cluster.
  • Vertical Scaling (scale-up): It’s performed by adding more hardware resources (CPU, Memory, Disk) to an existing database node.

For Horizontal Scaling, we can add more database nodes as slave nodes. It can help us to improve the read performance by balancing the traffic between the nodes. In this case, we’ll need to add a load balancer to distribute traffic to the correct node depending on the policy and the node state.

Related resources:
  • ClusterControl for PostgreSQL
  • Scaling Connections in PostgreSQL using Connection Pooling
  • How to Deploy PostgreSQL for High Availability

To avoid a single point of failure adding only one lo

[...]
Category: postgresql

Luca Ferrari: Checking PostgreSQL Version in Scripts

2019-07-18 (Thu) 09:00:00
Category: postgresql

Craig Kerstiens: Postgres tips for the average and power user

2019-07-17 (Wed) 23:05:00

Personally I’m a big fan of email, just like blogging. To me a good email thread can be like a good novel where you’re following along, always curious about what comes next. And no, I don’t mean the ones where there is an email to all-employees@company.com and someone replies all, only to receive reply-all’s telling people not to reply-all. I mean ones like the thread started last week internally among the Azure Postgres team.

The first email was titled: Random Citus development and psql tips, and from there it piled on with more and more tips and power user suggestions for Postgres. Some of these tips are relevant if you’re working directly on the Citus codebase, others are relevant to anyone that works with Postgres, and some are useful for debugging Postgres internals. While the thread is still ongoing, here are just a few of the great tips:

In psql, tag your queries and use Ctrl+R

Psql supports Ctrl+R to search previous queries you ran. For demos and when testing complex scenarios, I like adding a little comment to queries that then becomes the tag by which I can later find the query:

# SELECT count(*) FROM test; -- full count
┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

Time: 127.124 ms

(reverse-i-search)`f': SELECT count(*) FROM test; -- full count

In most cases, 2-3 letters is going to be enough to find the query.

Better psql output

I find \x lacking, but pspg is great. It is available from PGDG via sudo yum install -y pspg or the equivalent on your system. I have the following .psqlrc which sets up pspg with a very minimalistic configuration:

$ cat > ~/.psqlrc
\timing on
\pset linestyle unicode
\pset border 2
\setenv PAGER 'pspg --no-mouse -bX --no-commandbar --no-topbar'
\set HISTSIZE 100000

Get a stack trace for an error

In psql:

# SELECT pg_backend_pid();
┌────────────────┐
│ pg_backend_pid │
├────────────────┤
│         156796 │
└────────────────┘
(1 row)

In another shell:

$ gdb -p 156796
(gdb) b errfinish
Breakpoint 1 at 0x83475b: file elog.c, line 251.
(gdb) c
Continuing.

Ba

[...]
Category: postgresql

Kaarel Moppel: The mysterious “backend_flush_after” configuration setting

2019-07-17 (Wed) 16:30:06

The above-mentioned PostgreSQL server configuration parameter was introduced some time ago already, in version 9.6, but it has been flying under the radar, so to say, and had not caught my attention previously. That changed when I was recently pasted (not being on Twitter) a tweet from one of the Postgres core developers, Andres Freund, that basically said – if your workload is bigger than Shared Buffers, you should enable the “backend_flush_after” parameter for improved throughput and also jitter. Hmm, who wouldn’t like an extra boost in performance for free? FOMO kicked in… but before adding this parameter to my “standard setup toolbox” I hurried to test things out – one’s own eye is king! So here is a small test and my conclusion on the effects of enabling “backend_flush_after” (which is not enabled by default!).

What does this parameter actually do?

Trying to interpret the documentation (link here) in my own wording – “backend_flush_after” is basically designed to enable sending “hints” to the OS that, if a user has written more than X bytes (configurable from 0 to max. 2MB), it would be very nice if the kernel could already do some flushing of recently changed data files in the background, so that when the “checkpointer” comes or the kernel’s “dirty” limit is reached, there would be less bulk “fsyncing” to do – meaning less IO contention (spikes) for our user sessions, and thus smoother response times.
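Since it is an ordinary GUC, trying it out yourself only takes a couple of statements (the value below is just an illustration, not a recommendation):

-- ask the OS to start background writeback after roughly 512kB of backend writes
ALTER SYSTEM SET backend_flush_after = '512kB';
SELECT pg_reload_conf();
SHOW backend_flush_after;   -- verify the new setting is active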

Be warned though – unlike most Postgres settings this one actually is not guaranteed to function, and currently only can work on Linux systems, having sync_file_range() functionality available – which again depends on kernel version and used file system. So in short this explains why the parameter has not gotten too much attention. Similar story actually also with the “sister” parameters – “bgwriter_flush_after”, “checkpoint_flush_after”, “wal_writer_flush_after”…with the difference that they are already enabled by default!

NB! Also note that this parameter, being controlled and initiated by Postgres, might be the only way to influence

[...]
Category: postgresql

Luca Ferrari: Suggesting Single-Column Primary Keys (almost) Automatically

2019-07-17 (Wed) 09:00:00
Category: postgresql

Jobin Augustine: BRIN Index for PostgreSQL: Don’t Forget the Benefits

2019-07-17 (Wed) 01:30:22

The BRIN index was introduced in PostgreSQL 9.5, but many users postponed using it in their design and development just because it was “new”. But now we understand that it has stood the test of time! It is time to reconsider BRIN if you have not done so yet. I often see users who forget there is a provision to select the type of index by specifying the USING clause when creating an index.

The BRIN index is a revolutionary idea in indexing, first proposed by PostgreSQL contributor Alvaro Herrera. BRIN stands for “Block Range INdex”. A block range is a group of pages adjacent to each other, and summary information about all those pages is stored in the index. For example, datatypes like integers and dates, where the sort order is linear, can be stored as min and max values for the range. Other database systems, including Oracle, announced similar features later. A BRIN index often gives gains similar to partitioning a table.

A BRIN index scan returns all the tuples in all the pages in the particular range, so the index is lossy and extra work is needed to filter out records further. While one might say that is not good, there are a few advantages.

  1. Since only summary information about a range of pages is stored, BRIN indexes are usually very small compared to B-Tree indexes. So if we want to squeeze the working set of data to shared_buffer, this is a great help.
  2. Lossiness of BRIN can be controlled by specifying pages per range (discussed in a later section).
  3. Offloads the summarization work to vacuum or autovacuum. So the overhead of index maintenance on transaction / DML operation is minimal.
Putting BRIN into a test

Let’s take a simple example to examine the benefits of BRIN index by creating a simple table.

postgres=# CREATE TABLE testtab (id int NOT NULL PRIMARY KEY, date TIMESTAMP NOT NULL, level INTEGER, msg TEXT);
CREATE TABLE

Now let’s insert some data into this table.

postgres=# INSERT INTO testtab (id, date, level, msg)
           SELECT g, CURRENT_TIMESTAMP + ( g || 'minute' ) :: interval, random[...]
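The excerpt is cut off above; to show where it is heading, a BRIN index on such a table would be created roughly like this (the pages_per_range value is an arbitrary illustration, not necessarily the post's choice):

postgres=# CREATE INDEX testtab_date_brin_idx ON testtab USING brin (date) WITH (pages_per_range = 64);
CREATE INDEX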
Category: postgresql

Kirk Roybal: Partitioning enhancements in PostgreSQL 12

2019-07-16 (Tue) 01:18:50
Declarative partitioning got some attention in the PostgreSQL 12 release, with some very handy features. There has been some pretty dramatic improvement in partition selection (especially when selecting from a few partitions out of a large set), referential integrity improvements, and introspection. In this article, we’re going to tackle the referential integrity improvement first. This […]
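For context, the referential integrity improvement referred to here is that a partitioned table can now be the target of a foreign key in PostgreSQL 12; a minimal sketch with invented table names:

CREATE TABLE measurements (
    city_id int NOT NULL,
    logdate date NOT NULL,
    PRIMARY KEY (city_id, logdate)
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_2019 PARTITION OF measurements
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');

-- new in PostgreSQL 12: a foreign key may reference the partitioned table itself
CREATE TABLE readings (
    city_id int NOT NULL,
    logdate date NOT NULL,
    value   numeric,
    FOREIGN KEY (city_id, logdate) REFERENCES measurements (city_id, logdate)
);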
Category: postgresql

Jonathan Katz: How to Upgrade Your PostgreSQL Passwords to SCRAM

2019-07-13 (Sat) 03:27:00

In a lot of PostgreSQL environments, it’s common practice to protect user accounts with a password. Starting with PostgreSQL 10, the way PostgreSQL manages password-based authentication got a major upgrade with the introduction of SCRAM authentication, a well-defined standard that is a significant improvement over the current system in PostgreSQL. What’s better is that almost all PostgreSQL drivers now support this new method of password authentication, which should help drive further adoption of this method.

While it may be easy to take advantage of SCRAM authentication in new PostgreSQL deployments, there are a few steps involved in upgrading your existing systems to utilize this method. This article will briefly explain how SCRAM works in PostgreSQL (to try to encourage you to upgrade!) and then walk you through the steps of how to upgrade your existing PostgreSQL clusters to use SCRAM authentication.
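As a condensed, hedged sketch of the kind of steps involved (the pg_hba.conf line is deliberately simplified and should match your own access rules):

-- 1. store newly set passwords as SCRAM verifiers
ALTER SYSTEM SET password_encryption = 'scram-sha-256';
SELECT pg_reload_conf();

-- 2. have every user re-enter their password so a SCRAM verifier replaces the md5 hash
\password

-- 3. then switch the relevant pg_hba.conf entries from "md5" to "scram-sha-256", e.g.
--    host  all  all  0.0.0.0/0  scram-sha-256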

A Very Brief Overview of SCRAM
Category: postgresql

Kaarel Moppel: Implementing Autonomous Transactions in Postgres

2019-07-12 (Fri) 16:30:13

Having recently witnessed quite an ingenious hack to implement some good old “println” style development debugging from stored procedures into a file, I was prompted to post to the Interwebs two other ways such a goal can be implemented more transparently – also with the help of some other good old legacy technology in one case. By the way, the main reason they went for the hack was that, being relatively new to databases, they didn’t know how to name the thing they were implementing, so Google wasn’t able to help – once again proof that naming things is one of the hardest problems of computing.

What’s an “autonomous transaction”?

But to start with, let’s explain what an “autonomous transaction” is, as most people working with databases probably haven’t needed or heard of them; they’re not part of any standard either, and thus not widely supported by various database engines. The phrase itself comes from the Oracle world, I believe, and it basically denotes “fire and forget” (sub)transactions that are not connected to the main transaction. Sadly, Postgres does not have direct built-in support for that – you’re always in a real transaction and need to improvise a bit if the need arises.

And what would be the common use case? Mostly some logging / auditing / progress tracking into tables, in such a way that the information about the attempt persists even when the main transaction is rolled back due to an error, for example. Remember – in a standard transaction everything is thrown away in case no special measures (savepoints or exception handling sub-blocks in stored procedures) are taken.

The hacky way

So what did the initial implementation that wowed me a bit look like?

CREATE FUNCTION public.log(appid text, msg text)
 RETURNS void
 LANGUAGE plpgsql
 SECURITY DEFINER
AS $function$
BEGIN
  execute $$ copy (select $$ || quote_literal(msg) || $$) to program 'tee -a /tmp/$$ ||appid|| $$.log' $$;
END;
$function$;

Not bad – it works and is relatively short and

[...]
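The rest of the post walks through the more transparent options; one classic approach (named here as an assumption, not necessarily the post's exact solution) is to push the write through a separate dblink connection, so it commits independently of the caller's transaction:

CREATE EXTENSION IF NOT EXISTS dblink;

-- app_log is a hypothetical logging table; the INSERT runs in its own
-- connection and transaction, so the row survives a rollback of the caller
SELECT dblink_exec('dbname=' || current_database(),
                   'INSERT INTO app_log(msg) VALUES (''progress: step 1 done'')');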
Category: postgresql

Magnus Hagander: PGConf.EU 2019 - Registration and training open

2019-07-12 (Fri) 00:19:19

PostgreSQL Conference Europe 2019 in Milan, Italy, on October 15-18 is now open for registrations.

The Tuesday training sessions have also been finalized, and are now available for registration. Six sessions in a mix of full and half day sessions are available. Training sessions are available at an extra cost of €150 per half day. Attendees of training sessions will also receive a €90 discount on the regular conference fee. Seats to these sessions are limited to smaller groups, so make sure you register early!

Our call for papers is still open! If you have already submitted a talk, or are planning to submit one, we suggest you wait to register until you have received confirmation of whether your talk was accepted or not. If your talk is accepted, attendance is of course free!

See you in Milan!

Category: postgresql

Vasilis Ventirozos: Comparing pg 9.4 with pg 12, CTE edition

2019-07-11 (Thu) 22:14:00
Postgres 12 is around the corner, and as always it is packed with new features. One of them is the option to let the user control the behaviour of CTE materialization. You can find the commit from Tom Lane here, which explains everything about this change, but the TLDR is that until now CTEs were always fully materialized, so restrictions from the query that uses the CTE would not apply to it. That is the right way of doing things when you are using CTEs to INSERT/UPDATE/DELETE or when they are recursive, but when a CTE is side-effect-free and non-recursive it's safe to push the restrictions from the outer query down into it.
So, from Postgres 12 on, when it's safe and the CTE is referenced only once, Postgres will inline the CTE into the outer query, removing the optimization fence. Users can override this behaviour with the MATERIALIZED / NOT MATERIALIZED keywords.

Here's an example :

drop table if exists test_cte;
create table test_cte as
select generate_series(1,1000000) as id,
floor(random() * 10 + 1)::int as random;
-- OLD WAY
EXPLAIN analyze
WITH a_cte AS MATERIALIZED
(
SELECT random,count(*) FROM test_cte group by random
)
SELECT * FROM a_cte WHERE random = 5;
-- NEW
EXPLAIN analyze
WITH a_cte AS NOT MATERIALIZED
(
SELECT random,count(*) FROM test_cte group by random
)
SELECT * FROM a_cte WHERE random = 5;

It's not really important to show all the differences in explain plans but rather to see how many rows it had to process in order to create the CTE by applying the condition directly:

-- Materialized (old):
-> Parallel Seq Scan on test_cte (cost=0.00..8591.88 rows=416688 width=4) (actual time=0.031..19.077 rows=333333 loops=3)

-- Not Materialized (New):
-> Parallel Seq Scan on test_cte (cost=0.00..9633.59 rows=2083 width=4) (actual time=0.021..24.469 rows=33222 loops=3)
Filter: (random = 5)
Rows Removed by Filter: 300112

A production system running a 2TB on 9.4 with a dev that has a lot of free diskspace that we plan upgrading soon  ma[...]
Category: postgresql
