Paul Ramsey: Notes for FDW in PostgreSQL 12

planet postgresql - 2019-03-26 (Tue) 22:00:00

TL;DR: There are some changes in PostgreSQL 12 that FDW authors might be surprised by! Super technical, not suitable for ordinary humans.

OK, so I decided to update my two favourite extension projects (pgsql-http and pgsql-ogr-fdw) yesterday to support PostgreSQL 12 (the version currently under development, likely to be released in the fall).

Fixing up pgsql-http was pretty easy, involving just one internal function signature change.

Fixing up pgsql-ogr-fdw involved some time in the debugger wondering what had changed.

Your Slot is Empty

When processing an FDW insert/update/delete, your code is expected to take a TupleTableSlot as input and use the data in that slot to apply the insert/update/delete operation to your backend data store, whatever that may be (OGR in my case). The data lived in the tts_values array, and the null flags in tts_isnull.

In PostgreSQL 12, the slot arrives at your ExecInsert/ExecUpdate/ExecDelete callback function empty! The tts_values array is populated with Datum values of 0, yet the tts_isnull array is full of true values. There’s no data to pass back to the FDW source.

What gives?!?

Andres Freund has been slowly laying the groundwork for pluggable storage in PostgreSQL, and one of the things that work has affected is TupleTableSlot. Now when you get a slot, it might not have been fully populated yet, and that is what is happening in the FDW code.

The short-term fix is just to force the slot to populate by calling slot_getallattrs, and then go on with your usual work. That’s what I did. A more future-proof way would be to use slot_getattr and only retrieve the attributes you need (assuming you don’t just need them all).
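In code, the fix is a one-liner at the top of the callback. The sketch below is illustrative rather than pgsql-ogr-fdw's exact code, and it assumes the PostgreSQL server headers (the function and variable names are mine):

```c
/*
 * Sketch of a PG12-safe foreign-modify callback.
 * Builds only inside a PostgreSQL extension (needs postgres.h etc.);
 * names are illustrative, not pgsql-ogr-fdw's actual code.
 */
static TupleTableSlot *
myExecForeignInsert(EState *estate, ResultRelInfo *rinfo,
                    TupleTableSlot *slot, TupleTableSlot *planSlot)
{
    int natts = slot->tts_tupleDescriptor->natts;

    /* PG12: the slot can arrive un-deformed; force population of
     * tts_values / tts_isnull before reading them. */
    slot_getallattrs(slot);

    for (int i = 0; i < natts; i++)
    {
        if (!slot->tts_isnull[i])
        {
            Datum value = slot->tts_values[i];
            /* ... convert the Datum and write it to the backend store ... */
            (void) value;
        }
    }
    return slot;
}
```

If you only need a few attributes, calling slot_getattr(slot, attnum, &isnull) for just those columns avoids deforming the whole tuple.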

Your VarLena might have a Short Header

Varlena types are the variable size types, like text, bytea, and varchar. Varlena types store their length and some extra information in a header. The header is potentially either 4 bytes or 1 byte long. Practically it is almost always a 4 byte header. If you call the standard VARSIZE and VARDATA macros on a varlena,

Category: postgresql

Hans-Juergen Schoenig: Speeding up GROUP BY in PostgreSQL

planet postgresql - 2019-03-26 (Tue) 18:00:52

In SQL, the GROUP BY clause groups records into summary rows and turns large amounts of data into a smaller set. GROUP BY returns one record for each group. While most people know how to use GROUP BY, not many actually know how to squeeze the last couple of percentage points out of the query. There is a small optimization which can help you to speed things up by a couple of percent quite reliably. If you want to speed up GROUP BY clauses, this post is for you.

Creating a test data set in PostgreSQL

To prepare ourselves for the aggregation we first have to generate some data:

test=# CREATE TABLE t_agg (x int, y int, z numeric);
CREATE TABLE
test=# INSERT INTO t_agg SELECT id % 2, id % 10000, random()
       FROM generate_series(1, 10000000) AS id;
INSERT 0 10000000

The interesting part is that the first column only has 2 distinct values, while the second column will contain 10,000 different values. That is going to be of great importance for our optimization efforts.
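Once optimizer statistics exist (the VACUUM ANALYZE run below creates them), you can verify those distinct-value counts yourself. This check is my own addition, not from the post:

```
test=# SELECT attname, n_distinct
       FROM pg_stats
       WHERE tablename = 't_agg';
```

Here x should show an n_distinct of roughly 2 and y roughly 10000; z will show a negative value, meaning the planner estimates the column as nearly unique.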

Let us VACUUM the table to set hint bits and to build optimizer statistics. To make those execution plans more readable I also decided to turn off parallel queries:

test=# VACUUM ANALYZE;
VACUUM
test=# SET max_parallel_workers_per_gather TO 0;
SET

Running a simple aggregation

Now that the data is in place, the first tests can be started:

test=# explain analyze SELECT x, y, avg(z) FROM t_agg GROUP BY 1, 2;
                                  QUERY PLAN
--------------------------------------------------------------------------------------------
 HashAggregate  (cost=238697.01..238946.71 rows=19976 width=40)
                (actual time=3334.320..3339.929 rows=10000 loops=1)
   Group Key: x, y
   ->  Seq Scan on t_agg  (cost=0.00..163696.15 rows=10000115 width=19)
                          (actual time=0.058..636.763 rows=10000000 loops=1)
 Planning Time: 0.399 ms
 Execution Time: 3340.483 ms
(5 rows)

PostgreSQL will read the entire table sequentially and perform a hash aggregate. As you c


Stefan Fercot: pgBackRest archiving tricks

planet postgresql - 2019-03-26 (Tue) 09:00:00

pgBackRest is a well-known powerful backup and restore tool.

While the documentation describes all the parameters, it’s not always that simple to imagine what you can really do with it.

In this post, I will introduce asynchronous archiving and a way to avoid PostgreSQL going down in case of archiving problems.

For performance reasons, pgBackRest's “info” command doesn't check that all the needed WAL segments are still present; check_pgbackrest is clearly built for that. The two tricks mentioned above can produce gaps in the archived WAL segments. The new 1.5 release of check_pgbackrest provides ways to handle that, and we'll also see how.
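As a preview, asynchronous archiving is enabled with a handful of options in pgbackrest.conf. This is a minimal sketch of mine, not taken from the post; the stanza name and paths are illustrative:

```
[global]
archive-async=y
spool-path=/var/spool/pgbackrest
# worker processes used by asynchronous archive-push / archive-get
process-max=2

[my_stanza]
pg1-path=/var/lib/pgsql/11/data
```

With archive-async=y, WAL segments are acknowledged to PostgreSQL quickly and pushed to the repository in the background via the spool directory.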


First of all, install PostgreSQL and pgBackRest packages directly from the PGDG yum repositories:

$ sudo yum install -y \
      rhel-7-x86_64/pgdg-centos11-11-2.noarch.rpm
$ sudo yum install -y postgresql11-server postgresql11-contrib
$ sudo yum install -y pgbackrest

Check that pgBackRest is correctly installed:

$ pgbackrest
pgBackRest 2.11 - General help

Usage:
    pgbackrest [options] [command]

Commands:
    archive-get     Get a WAL segment from the archive.
    archive-push    Push a WAL segment to the archive.
    backup          Backup a database cluster.
    check           Check the configuration.
    expire          Expire backups that exceed retention.
    help            Get help.
    info            Retrieve information about backups.
    restore         Restore a database cluster.
    stanza-create   Create the required stanza data.
    stanza-delete   Delete a stanza.
    stanza-upgrade  Upgrade a stanza.
    start           Allow pgBackRest processes to run.
    stop            Stop pgBackRest processes from running.
    version         Get version.

Use 'pgbackrest help [command]' for more information.

Create a basic PostgreSQL cluster:

$ sudo /usr/pgsql-11/bin/postgresql-11-setup initdb

Configure pgBackRest to backup the local cluster

By default, the configuration


Regina Obe: PGConf US 2019 Data Loading Slides up

planet postgresql - 2019-03-26 (Tue) 07:00:00

I gave a talk at PGConf US 2019 on some of the many ways you can load data into PostgreSQL using open source tools. This is similar to the talk I gave last year but with the addition of the pgloader commandline tool and the http PostgreSQL extension.

HTML slides PDF slides

Even though the talk was not much about PostGIS, just tricks for loading data, I managed to get a mouthful of PostGIS in there.


Beena Emerson: PostgreSQL : Test Coverage

planet postgresql - 2019-03-25 (Mon) 17:21:00
Install lcov

Install the dependencies:

yum install perl-devel
yum install perl-Digest-MD5
yum install perl-GD

Download and install lcov:

rpm -U lcov-1.13-1.el7.noarch.rpm

Run the tests

Configure and make, using the --enable-coverage configure flag:

./configure --enable-coverage
make -j 4

Run make check:

cd src/
make check -i

Check coverage

HTML output:

make coverage-html
A .gcov output file is created for each file in the test, along with a folder named 'coverage' containing an index.html file to display the coverage information. The HTML page shows a summary of the coverage for each folder, and recursively for each file and then for each line.

Text output:

make coverage

A .gcov and .gcov.out file is created for each file in the test.

Reset:

make coverage-clean

This resets the execution count.

Output files

<file>.gcov.out

This lists out the details of each function in the file. An example output for a function is shown below:

Function 'heap_sync'
Lines executed:100.00% of 10
Branches executed:100.00% of 4
Taken at least once:75.00% of 4
Calls executed:100.00% of 6
<file>.gcov

This displays the original file in its entirety, along with the original line numbers and the count of the number of times each line was executed. Lines which were not executed are marked with hashes ('#####'), and '-' indicates that the line is not executable.

        -: 9258:    /* main heap */
       50: 9259:    FlushRelationBuffers(rel);
call    0 returned 100%
        .
        . <more code>
        .
    #####: 9283:    Page        page = (Page) pagedata;
        -: 9284:    OffsetNumber off;
        -: 9285:
    #####: 9286:    mask_page_lsn_and_checksum(page);
call    0 never executed
index.html

The home page: this lists out all the subdirectories along with their coverage data.

Per directory info: On clicking a particular directory, we get the coverage info of each file in the selected directory.
Select a file: This gives the per-line hit count of the selected file. The one highlighted in blu[...]

Luca Ferrari: Mailing List is Back!

planet postgresql - 2019-03-25 (Mon) 09:00:00

The historical mailing list of the Italian group has been successfully migrated!

Thanks to the great work of the people behind the Italian group, the first (and for many years the only) Italian-language mailing list has been migrated to a new platform and is now online again!

On this mailing list you can find a few very talented people willing to help with your PostgreSQL-related problem or curiosity, to discuss the current status and the future of the development, and anything else you would expect from a very technical mailing list. Of course, the language is Italian!

The link to the new mailing list management panel is


Tatsuo Ishii: Shared Relation Cache

planet postgresql - 2019-03-24 (Sun) 17:58:00
System catalogs?

Pgpool-II needs to access PostgreSQL's system catalogs whenever it recognizes tables in a user's query. For example, Pgpool-II has to know whether the table in question is a temporary table or not. If it is a temporary table, then a query using it must be routed to the primary PostgreSQL server, rather than one of the standby servers, because PostgreSQL does not allow creating temporary tables on standbys. Another use case is converting a table name to an OID (Object Identifier). OIDs are unique keys for objects managed in PostgreSQL's system catalogs.

The same thing can be said of functions. Details of functions, for instance whether they are "immutable" or not, are important information, since they affect the decision on whether the result of a query using the function should be cached when the query cache feature is enabled.
Local query cache for system catalogs

Sometimes Pgpool-II needs to issue as many as 10 queries to the system catalogs when it sees a table or function for the first time. Fortunately, Pgpool-II does not waste the query results. They are stored in a local cache (we call it the "relation cache" or "relcache"), and the next time it sees the object in the same or different queries, it extracts the info from the local cache. So far so good.

The problem is that the local cache is stored in the private memory of each Pgpool-II child process. A different child process is assigned to each new session from a Pgpool-II client. So even if a single table is used in queries, Pgpool-II continues to access the system catalogs until the table's info gets filled in all the local caches.
Shared relation cache

How to mitigate the problem? One solution would be to share the relation cache info among Pgpool-II processes. This way, once one of the processes accesses the system catalogs and obtains the info, the other processes do not need to access the system catalogs any more. The cache shared by the processes is called the "shared relation cache".
How to implement it?

But how to implement i[...]

Peter Geoghegan: Visualizing Postgres page images within GDB

planet postgresql - 2019-03-23 (Sat) 09:07:00
It's straightforward to set up GDB to quickly invoke pg_hexedit on a page image, without going through the filesystem. The page image can even come from a local temp buffer.
A user-defined GDB command can be created which shows an arbitrary page image in pg_hexedit from an interactive GDB session.

This is a good way to understand what's really going on when debugging access method code. It also works well with core dumps. I found this valuable during a recent project to improve the Postgres B-Tree code.

An example of how to make this work is available from a newly added section of the pg_hexedit README file:

Viorel Tabara: Benchmarking Managed PostgreSQL Cloud Solutions - Part Two: Amazon RDS

planet postgresql - 2019-03-22 (Fri) 19:47:19

This is the second part of the multi-series Benchmarking Managed PostgreSQL Cloud Solutions. In Part 1 I presented an overview of the available tools, I discussed the reason for using the AWS Benchmark Procedure for Aurora, as well as PostgreSQL versions to be used, and I reviewed Amazon Aurora PostgreSQL 10.6.

In this part, pgbench and sysbench will be run against Amazon RDS for PostgreSQL 11.1. At the time of this writing the latest PostgreSQL version is 11.2, released about a month ago.


It’s worth pausing for a second to quickly review the PostgreSQL versions currently available in the cloud:

Amazon is again a winner, with its RDS offering, by providing the most recent version of PostgreSQL. As announced in the RDS forum AWS made PostgreSQL 11.1 available on March 13th, which is four months after the community release.

Setting Up the Environment

A few notes about the constraints related to setting up the environment and running the benchmark, points that were discussed in more detail during Part 1 of this series:

  • No changes to the cloud provider default GUC settings.
  • The connections are limited to a maximum of 1,000 as the AWS patch for pgbench did not apply cleanly. On a related note, I had to download the AWS timing patch from this pgsql-hackers submission since it was no longer available at the link mentioned in the guide.
  • The Enhanced Networking must be enabled for the client instance.
  • The database does not include a replica.
  • The database storage is not encrypted.
  • Both the client and the target instances are in the same availability zone.

First, set up the client and the database instances:

  • The client is an on demand r4.8xlarge EC2 instance:
    • vCPU: 32 (16 Cores x 2 Threads/Core)
    • RAM: 244 GiB
    • Storage: EB

Pavel Stehule: How to split string to array by individual characters?

planet postgresql - 2019-03-22 (Fri) 15:57:00
Postgres has so many features that it is sometimes good to be reminded of some of them.

The function string_to_array is well known. This function takes two or three parameters. If the second parameter (the delimiter) is null, the input string is split into an array of individual characters.

postgres=# select string_to_array('Pavel Stěhule', null);
┌───────────────────────────────┐
│        string_to_array        │
├───────────────────────────────┤
│ {P,a,v,e,l," ",S,t,ě,h,u,l,e} │
└───────────────────────────────┘
(1 row)
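For comparison (an example of mine, not from the post), a non-null delimiter splits on that delimiter instead:

```
postgres=# select string_to_array('Pavel Stěhule', ' ');
 string_to_array
─────────────────
 {Pavel,Stěhule}
(1 row)
```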

Peter Bengtsson: Best way to count distinct indexed things in PostgreSQL

planet postgresql - 2019-03-22 (Fri) 02:13:14
`SELECT COUNT(*) FROM (SELECT DISTINCT my_not_unique_indexed_column FROM my_table) t`
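Spelled out, with the column and table names being placeholders, the comparison is between the naive aggregate and the subquery form above:

```
-- naive form: a single aggregate over the whole table
SELECT COUNT(DISTINCT my_not_unique_indexed_column) FROM my_table;

-- subquery form: deduplicate first, then count, which lets
-- PostgreSQL lean on the index for the DISTINCT step
SELECT COUNT(*) FROM (
    SELECT DISTINCT my_not_unique_indexed_column FROM my_table
) t;
```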

Yogesh Sharma: PostgreSQL Zero to Hero: Getting Started with RPMs -Part 1

planet postgresql - 2019-03-20 (Wed) 23:33:38

One of the most important things to using PostgreSQL successfully in your development and production environments is simply getting started! One of the most popular ways to install PostgreSQL is by using RPM packages. The PostgreSQL RPM packages work across many Linux distributions, including RedHat Enterprise Linux (RHEL), CentOS, Fedora, Scientific Linux, and more, and the PostgreSQL community provides installers for these distributions.

This guide will help you get started with installing and configuring PostgreSQL for a CentOS / RHEL 7 based system, which will also work for Fedora 29. We will be installing PostgreSQL 11, which is the latest major release of PostgreSQL as of this writing.

Installation

Installing the yum / dnf repository setup rpm

Craig Kerstiens: How to evaluate your database

planet postgresql - 2019-03-20 (Wed) 22:47:00

Choosing a database isn’t something you do every day. You generally choose it once for a project, then don’t look back. If you experience years of success with your application, you may one day have to migrate to a new database, but that occurs years down the line. In choosing a database there are a few key things to consider. Here is your checklist, and spoiler alert, Postgres checks out strongly in each of these categories.

Does your database solve your problem?

There are a lot of new databases that rise up every year, each of them looking to solve hard problems within the data space. But you should start by checking whether they solve a problem that you personally have. Most applications at the end of the day have some relational data model, and more and more are also working with some level of unstructured data. Relational databases of course solve the relational piece, but they increasingly support the unstructured piece as well. Postgres in particular

Do you need strong guarantees for your data?

ACID is still at the core of how durable and safe your data is, and knowing how a database stacks up here is a good evaluation criterion. Then there is also the CAP theorem, which you see applied especially to distributed or clustered databases. Each of the previous links is worth a read to get a better understanding of the theory around databases. If you’re interested in how various databases perform under CAP, check out the Jepsen series of tests. But for the average person like myself it can be boiled down a bit more: do you need full guarantees around your transactions, or do you optimize for some performance?

While it doesn’t fully speak to all the possible options you can have with databases, Postgres comes with some pretty good flexibility out of the box. It allows both synchronous (guaranteed it makes it) and asynchronous (queued up, occurring soon after) replication to standbys. Those standbys could be read replicas for reporting or for high availability. What’s nice about Postgres is it can actually al


Christophe Pettus: “Look It Up: Practical PostgreSQL Indexing” at Nordic PGDay 2019

planet postgresql - 2019-03-20 (Wed) 22:34:56

The slides from my presentation at PGDay Nordic 2019 are now available.


Hubert 'depesz' Lubaczewski: Migrating simple table to partitioned. How?

planet postgresql - 2019-03-20 (Wed) 07:57:58
Recently someone asked, on irc, how to make table partitioned. The thing is that it was supposed to be done with new partitioning, and not the old way. The problem is that while we can create table that will be seen as partitioned – we can't alter table to become partitioned. So. Is it possible? … Continue reading "Migrating simple table to partitioned. How?"
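One commonly used approach, sketched here with hypothetical table and column names (the post itself explores the details and caveats), is to create a new partitioned table and attach the existing table as one of its partitions:

```
-- new, empty partitioned table with the same columns as the old one
CREATE TABLE users_parted (id int, created date)
    PARTITION BY RANGE (created);

-- the existing table becomes one partition of it
ALTER TABLE users RENAME TO users_legacy;
ALTER TABLE users_parted
    ATTACH PARTITION users_legacy
    FOR VALUES FROM (MINVALUE) TO ('2019-01-01');
ALTER TABLE users_parted RENAME TO users;
```

New rows from 2019 onward would then go into freshly created partitions, while the legacy data stays in place.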

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Partial implementation of SQL/JSON path language

planet postgresql - 2019-03-20 (Wed) 06:30:44
On 16th of March 2019, Alexander Korotkov committed patch: Partial implementation of SQL/JSON path language   SQL 2016 standards among other things contains set of SQL/JSON features for JSON processing inside of relational database. The core of SQL/JSON is JSON path language, allowing access parts of JSON documents and make computations over them. This commit … Continue reading "Waiting for PostgreSQL 12 – Partial implementation of SQL/JSON path language"
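As a quick taste of the committed feature (an example of mine, using PostgreSQL 12 syntax): jsonb_path_query applies a JSON path expression to a jsonb value and returns the matching items as rows.

```
SELECT jsonb_path_query('{"a": [1, 2, 3, 4]}', '$.a[*] ? (@ > 2)');
-- returns two rows: 3 and 4
```

The `? (@ > 2)` part is a path filter: `@` stands for the array element currently being examined.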

Hans-Juergen Schoenig: Foreign data wrapper for PostgreSQL: Performance Tuning

planet postgresql - 2019-03-19 (Tue) 20:42:05

Foreign data wrappers have been around for quite a while and are one of the most widely used features in PostgreSQL. People simply like foreign data wrappers and we can expect that the community will add even more features as we speak. As far as the postgres_fdw is concerned there are some hidden tuning options, which are not widely known by users. So let us see how we can speed up the PostgreSQL foreign data wrapper.

Foreign data wrappers: Creating a “database link”

To show how things can be improved we first have to create some sample data in “adb”, which can then be integrated into some other database:

adb=# CREATE TABLE t_local (id int);
CREATE TABLE
adb=# INSERT INTO t_local SELECT * FROM generate_series(1, 100000);
INSERT 0 100000

In this case I have simply loaded 100,000 rows into a very simple table. Let us now create the foreign data wrapper (or “database link”, as Oracle people would call it). The first thing to do is to enable the postgres_fdw extension in “bdb”.


In the next step we have to create the “SERVER”, which points to the database containing our sample table. CREATE SERVER works like this:

bdb=# CREATE SERVER some_server
          FOREIGN DATA WRAPPER postgres_fdw
          OPTIONS (host 'localhost', dbname 'adb');
CREATE SERVER

Once the foreign server is created the users we need can be mapped:

bdb=# CREATE USER MAPPING FOR current_user
          SERVER some_server
          OPTIONS (user 'hs');
CREATE USER MAPPING

In this example the user mapping is really easy. We simply want the current user to connect to the remote database as “hs” (which happens to be my superuser).

Finally we can link the tables. The easiest way to do that is to use “IMPORT FOREIGN SCHEMA”, which simply fetches the remote data structure and turns everything into a foreign table.
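For example, importing just the sample table (using the server and table names from above; the target schema choice is mine):

```
bdb=# IMPORT FOREIGN SCHEMA public
          LIMIT TO (t_local)
          FROM SERVER some_server
          INTO public;
IMPORT FOREIGN SCHEMA
```

Without the LIMIT TO clause, every table and view in the remote schema is turned into a foreign table.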

bdb=# \h IMPORT
Command:     IMPORT FOREIGN SCHEMA
Description: import table definitions from a foreign server
Syntax:
IMPORT FOREIGN SCHEM[...]

Tim Colles: PostgreSQL Roles are Cluster-Wide

planet postgresql - 2019-03-19 (Tue) 01:20:58

A role in PostgreSQL is common to all databases in the cluster. This seems to be the result of a design decision made when the former user and group handling was unified under role. Follow these links for reference:

Roles, or rather those roles that are not just representing a specific user, ought instead to be an intrinsic part of the database model. Roles are defined by the kind of access they provide (read, write, etc.) and by what relations (table, view, function, etc.) they provide that access to. Access control is ideally managed within a database using roles rather than separately within each individual application that uses that database. So it makes sense that the access control rules (the roles and their associated permissions) would be defined alongside the definitions of the relations for which they control access; any changes are then self-contained. The access control model should be represented as part and parcel of the rest of the database model. Individual users (which are also represented as roles in PostgreSQL) are assigned one or more of the roles defined within each particular database model (based on the local enterprise definition of what needs they have of any particular database).
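The cluster-wide scope is easy to observe (a quick illustration of mine): roles live in the shared catalog pg_authid, so the same role list comes back no matter which database you are connected to.

```
-- run this from any database in the cluster; the result set is identical
SELECT rolname, rolcanlogin FROM pg_roles ORDER BY rolname;
```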

There is no sense in representing this kind of role at the cluster level, as the definition of the role is associated specifically with the database where it actually controls access. In PostgreSQL, encapsulating the full functionality of a database requires using not only the system catalog tables specific to that database but also the roles relevant to that database, which are held in the cluster-wide system catalog tables. With the exception of some special cases, li


Avinash Kumar: PostgreSQL Upgrade Using pg_dumpall

planet postgresql - 2019-03-18 (Mon) 23:59:50

There are several approaches to assess when you need to upgrade PostgreSQL. In this blog post, we look at the option of upgrading a postgres database using pg_dumpall. Since this tool can also be used to back up PostgreSQL clusters, it is a valid option for upgrading a cluster too. We consider the advantages and disadvantages of this approach, and show you the steps needed to achieve the upgrade.

This is the first of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series where we’ll be exploring different paths to accomplish postgres upgrade or migration. The series will culminate with a practical webinar to be aired April 17th (you can register here).

We begin this journey by providing you the most straightforward way to carry on with a PostgreSQL upgrade or migration: by rebuilding the entire database from a logical backup.

Defining the scope

Let’s define what we mean by upgrading or migrating PostgreSQL using pg_dumpall.

If you need to perform a PostgreSQL upgrade within the same database server, we’d call that an in-place upgrade or just an upgrade. Whereas a procedure that involves migrating your PostgreSQL server from one server to another server, combined with an upgrade from an older version (let’s say 9.3) to a newer PostgreSQL version (say PG 11.2), can be considered a migration.

There are two ways to achieve this requirement using logical backups:

  1. Using pg_dumpall
  2. Using pg_dumpall + pg_dump + pg_restore

We’ll be discussing the first option (pg_dumpall) here, and will leave the discussion of the second option for our next post.


pg_dumpall can be used to obtain a text-format dump of the whole database cluster, and which includes all databases in the cluster. This is the only method that can be used to backup globals such as users and roles in PostgreSQL.
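The basic flow is short. This is a sketch; host names and file paths are illustrative:

```
# dump the whole cluster, globals included, ideally using the
# newer version's pg_dumpall binary
$ pg_dumpall -h old-server -U postgres > cluster_dump.sql

# restore into the freshly initdb'ed new-version cluster
$ psql -h new-server -U postgres -f cluster_dump.sql postgres
```

Because the dump is plain SQL, the restore replays every CREATE and INSERT statement, which is also why this method gets slow on large clusters.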

There are, of course, advantages and disadvantages in employing this approach to upgrading PostgreSQL by rebuilding the database cluster using pg_dumpall.

Advantages of using pg_dumpall for upgrading a Pos[...]

Venkata Nagothi: Configuring PostgreSQL for Business Continuity

planet postgresql - 2019-03-18 (Mon) 23:27:39
Business Continuity for Databases

Business continuity for databases means databases must remain continuously operational even during disasters. It is imperative to ensure that production databases are available to the applications all the time, even during disasters; otherwise downtime could end up being an expensive deal. DBAs and architects need to ensure that database environments can sustain disasters and are disaster-recovery SLA compliant. To ensure disasters do not affect database availability, databases must be configured for business continuity.

Configuring databases for business continuity involves a lot of architecting, planning, designing and testing. A lot of factors, like data centers and their geographic territories, including infrastructure design, come into consideration when designing and implementing an effective disaster recovery strategy for databases. In short, “Business Continuity = Avoid outages during disasters”.

To ensure production databases survive a disaster, a Disaster Recovery (DR) site must be configured. Production and DR sites must be part of two geographically distant data centers. This means a standby database must be configured at the DR site for every production database, so that the data changes occurring on the production database are immediately synced across to the standby database via transaction logs. This can be achieved with PostgreSQL's “streaming replication” capability.
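In PostgreSQL 11 terms, the minimal streaming-replication configuration looks roughly like this (a sketch of mine; host names and the replication user are illustrative, and note that recovery.conf was folded into postgresql.conf in PG 12):

```
# primary: postgresql.conf
wal_level = replica
max_wal_senders = 10

# standby: recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=primary-dc1 port=5432 user=replicator'
```

The standby continuously replays WAL received over the primary_conninfo connection, keeping it a short distance behind the primary.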

What Needs to Happen if Disaster Strikes Production (or Primary) Database?

When the production (primary) database crashes or becomes unresponsive, the standby database must be promoted to primary, the applications must be pointed to the newly promoted standby (new primary) database, and all of it must happen automatically within the designated outage SLA. This process is termed failover.

Configuring PostgreSQL for High Availability

As said above, to ensure that the PostgreSQL is disaster recovery compliant, it must be first configured with Streaming Replication (master + standby d
