Tatsuo Ishii: Shared Relation Cache

planet postgresql - 2019-03-24 (Sun) 17:58:00
System catalogs?
Pgpool-II needs to access PostgreSQL's system catalogs whenever it recognizes tables in a user's query. For example, Pgpool-II has to know whether the table in question is a temporary table or not. If it is a temporary table, then a query using it must be routed to the primary PostgreSQL server rather than to one of the standby servers, because PostgreSQL does not allow creating temporary tables on standbys. Another use case is converting a table name to an OID (Object Identifier). OIDs are unique keys for objects managed in PostgreSQL's system catalogs.
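For illustration, a lookup of this kind might look as follows (a sketch, not necessarily the exact query Pgpool-II issues):

SELECT c.oid, c.relpersistence = 't' AS is_temporary
  FROM pg_catalog.pg_class c
 WHERE c.relname = 'mytable'
   AND pg_catalog.pg_table_is_visible(c.oid);

This resolves the table name to its OID and checks the relpersistence flag that marks temporary tables.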

The same can be said of functions. Details about functions, for instance whether they are "immutable" or not, are important information, since they affect the decision on whether the result of a query using the function can be cached when the query cache feature is enabled.
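Again as a sketch, the volatility of a function can be read from the system catalogs like this:

SELECT p.proname, p.provolatile  -- 'i' = immutable, 's' = stable, 'v' = volatile
  FROM pg_catalog.pg_proc p
 WHERE p.proname = 'random';

A result can only be served from a query cache safely when the functions involved are immutable.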
Local query cache for system catalogs
Sometimes Pgpool-II needs to issue as many as 10 queries to the system catalogs when it sees a table or function for the first time. Fortunately, Pgpool-II does not waste the query results. They are stored in a local cache (we call it the "Relation Cache" or "Relcache"), and the next time it sees the same object, in the same or a different query, it extracts the info from the local cache. So far so good.

The problem is that the local cache is stored in the private memory of each Pgpool-II child process. For each new session from a Pgpool-II client, a different child process is assigned to the session. So even if a single table is used across queries, Pgpool-II keeps accessing the system catalogs until the table's info gets filled in all the local caches.
Shared relation cache
How to mitigate the problem? One solution would be to share the relation cache info among Pgpool-II processes. This way, once one of the processes accesses the system catalogs and obtains the info, the other processes do not need to access the system catalogs any more. The cache shared by the processes is called the "shared relation cache".
How to implement it?
But how to implement i[...]
Category: postgresql

Peter Geoghegan: Visualizing Postgres page images within GDB

planet postgresql - 2019-03-23 (Sat) 09:07:00
It's straightforward to set up GDB to quickly invoke pg_hexedit on a page image, without going through the filesystem. The page image can even come from a local temp buffer.
A user-defined GDB command can be created which shows an arbitrary page image in pg_hexedit from an interactive GDB session.

This is a good way to understand what's really going on when debugging access method code. It also works well with core dumps. I found this valuable during a recent project to improve the Postgres B-Tree code.

An example of how to make this work is available from a newly added section of the pg_hexedit README file:
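A minimal sketch of such a helper (hypothetical; the authoritative recipe is in the pg_hexedit README, and BLCKSZ is assumed to be the default 8192):

define dump_page
  dump binary memory /tmp/page.dump $arg0 ((char *) $arg0 + 8192)
end

From an interactive GDB session, dump_page can then be invoked on any page pointer, and pg_hexedit run against /tmp/page.dump.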

Viorel Tabara: Benchmarking Managed PostgreSQL Cloud Solutions - Part Two: Amazon RDS

planet postgresql - 2019-03-22 (Fri) 19:47:19

This is the second part of the multi-part series Benchmarking Managed PostgreSQL Cloud Solutions. In Part 1 I presented an overview of the available tools, discussed the reasons for using the AWS Benchmark Procedure for Aurora as well as the PostgreSQL versions to be used, and reviewed Amazon Aurora PostgreSQL 10.6.

In this part, pgbench and sysbench will be run against Amazon RDS for PostgreSQL 11.1. At the time of this writing the latest PostgreSQL version is 11.2, released about a month ago.


It’s worth pausing for a second to quickly review the PostgreSQL versions currently available in the cloud:

Amazon is again a winner, with its RDS offering, by providing the most recent version of PostgreSQL. As announced in the RDS forum AWS made PostgreSQL 11.1 available on March 13th, which is four months after the community release.

Setting Up the Environment

A few notes about the constraints related to setting up the environment and running the benchmark, points that were discussed in more detail during Part 1 of this series:

  • No changes to the cloud provider default GUC settings.
  • The connections are limited to a maximum of 1,000 as the AWS patch for pgbench did not apply cleanly. On a related note, I had to download the AWS timing patch from this pgsql-hackers submission since it was no longer available at the link mentioned in the guide.
  • The Enhanced Networking must be enabled for the client instance.
  • The database does not include a replica.
  • The database storage is not encrypted.
  • Both the client and the target instances are in the same availability zone.

First, set up the client and the database instances:

  • The client is an on demand r4.8xlarge EC2 instance:
    • vCPU: 32 (16 Cores x 2 Threads/Core)
    • RAM: 244 GiB
    • Storage: EB

Pavel Stehule: How to split string to array by individual characters?

planet postgresql - 2019-03-22 (Fri) 15:57:00
Postgres has so many features that it is sometimes good to be reminded of some of them.

The function string_to_array is well known. It takes two or three parameters. If the second parameter (the delimiter) is null, the input string is split into an array of individual characters.

postgres=# select string_to_array('Pavel Stěhule',null);
┌───────────────────────────────┐
│        string_to_array        │
├───────────────────────────────┤
│ {P,a,v,e,l," ",S,t,ě,h,u,l,e} │
└───────────────────────────────┘
(1 row)
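With a non-null delimiter the behavior is the familiar split, and the optional third parameter maps matching fields to NULL:

postgres=# SELECT string_to_array('Pavel Stěhule', ' ');
 string_to_array
-----------------
 {Pavel,Stěhule}
(1 row)

postgres=# SELECT string_to_array('a,b,c', ',', 'b');
 string_to_array
-----------------
 {a,NULL,c}
(1 row)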

Peter Bengtsson: Best way to count distinct indexed things in PostgreSQL

planet postgresql - 2019-03-22 (Fri) 02:13:14
`SELECT COUNT(*) FROM (SELECT DISTINCT my_not_unique_indexed_column FROM my_table) t`
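The point of the subquery form is that it typically lets the planner satisfy the inner DISTINCT with an index-only scan over the indexed column. The more obvious formulation is often slower on large tables:

SELECT COUNT(DISTINCT my_not_unique_indexed_column) FROM my_table;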

Yogesh Sharma: PostgreSQL Zero to Hero: Getting Started with RPMs -Part 1

planet postgresql - 2019-03-20 (Wed) 23:33:38

One of the most important steps to using PostgreSQL successfully in your development and production environments is simply getting started! One of the most popular ways to install PostgreSQL is by using RPM packages. The PostgreSQL RPM packages work across many Linux distributions, including RedHat Enterprise Linux (RHEL), CentOS, Fedora, Scientific Linux, and more, and the PostgreSQL community provides installers for these distributions.

This guide will help you get started with installing and configuring PostgreSQL for a CentOS / RHEL 7 based system, which will also work for Fedora 29. We will be installing PostgreSQL 11, which is the latest major release of PostgreSQL as of this writing.

Installation
Installing the yum / dnf repository setup RPM
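In outline, and assuming the PGDG repository RPM from yum.postgresql.org has already been installed for your distribution, the server installation boils down to something like:

sudo yum install postgresql11-server
sudo /usr/pgsql-11/bin/postgresql-11-setup initdb
sudo systemctl enable postgresql-11
sudo systemctl start postgresql-11

(Package and path names follow the PGDG packaging conventions for PostgreSQL 11 on CentOS / RHEL 7.)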

Craig Kerstiens: How to evaluate your database

planet postgresql - 2019-03-20 (Wed) 22:47:00

Choosing a database isn’t something you do every day. You generally choose it once for a project, then don’t look back. If you experience years of success with your application, you may one day have to migrate to a new database, but that occurs years down the line. In choosing a database there are a few key things to consider. Here is your checklist, and spoiler alert: Postgres checks out strongly in each of these categories.

Does your database solve your problem?

There are a lot of new databases that rise up every year, each looking to solve hard problems within the data space. But you should start by looking at whether they solve a problem that you personally have. Most applications, at the end of the day, have some relational data model, and more and more are also working with some level of unstructured data. Relational databases of course solve the relational piece, but they increasingly support the unstructured piece as well. Postgres in particular

Do you need strong guarantees for your data?
ACID is still at the core of how durable and safe your data is, and knowing how a database stacks up here is a good evaluation criterion. Then there is also the CAP theorem, which you see applied especially to distributed or clustered databases. Each of the previous links is worth a read to get a better understanding of the theory around databases. If you’re interested in how various databases perform under CAP, check out the Jepsen series of tests. But for the average person like myself it can be boiled down a bit more: do you need full guarantees around your transactions, or can you trade some of that away for performance?

While it doesn’t fully speak to all the possible options you can have with databases, Postgres comes with some pretty good flexibility out of the box. It allows both synchronous (guaranteed to have made it) and asynchronous (queued up, occurring soon after) replication to standbys. Those standbys could be read replicas for reporting or for high availability. What’s nice about Postgres is it can actually al
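As a sketch of that flexibility (the standby name is made up), switching a named standby to synchronous replication is just a configuration change on the primary:

ALTER SYSTEM SET synchronous_standby_names = 'standby1';
SELECT pg_reload_conf();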


Christophe Pettus: “Look It Up: Practical PostgreSQL Indexing” at Nordic PGDay 2019

planet postgresql - 2019-03-20 (Wed) 22:34:56

The slides from my presentation at PGDay Nordic 2019 are now available.


Hubert 'depesz' Lubaczewski: Migrating simple table to partitioned. How?

planet postgresql - 2019-03-20 (Wed) 07:57:58
Recently someone asked, on IRC, how to make a table partitioned. The thing is that it was supposed to be done with the new partitioning, and not the old way. The problem is that while we can create a table that will be seen as partitioned, we can't alter a table to become partitioned. So. Is it possible? … Continue reading "Migrating simple table to partitioned. How?"
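One common approach (a sketch with made-up names, not necessarily the one the post settles on) is to create a new partitioned table next to the old one and move the data over:

CREATE TABLE events_new (id int, created date) PARTITION BY RANGE (created);
CREATE TABLE events_2019 PARTITION OF events_new
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
INSERT INTO events_new SELECT * FROM events;
-- then swap the table names in a single transaction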

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Partial implementation of SQL/JSON path language

planet postgresql - 2019-03-20 (Wed) 06:30:44
On 16th of March 2019, Alexander Korotkov committed patch: Partial implementation of SQL/JSON path language   SQL 2016 standards among other things contains set of SQL/JSON features for JSON processing inside of relational database. The core of SQL/JSON is JSON path language, allowing access parts of JSON documents and make computations over them. This commit … Continue reading "Waiting for PostgreSQL 12 – Partial implementation of SQL/JSON path language"
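Once this is available in PostgreSQL 12, a path query might look like this (sketch):

SELECT jsonb_path_query('{"a": [1, 2, 3]}', '$.a[*] ? (@ > 1)');

This filters the array elements of "a", returning one row per matching value (here 2 and 3).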

Hans-Juergen Schoenig: Foreign data wrapper for PostgreSQL: Performance Tuning

planet postgresql - 2019-03-19 (Tue) 20:42:05

Foreign data wrappers have been around for quite a while and are one of the most widely used features in PostgreSQL. People simply like foreign data wrappers, and we can expect that the community will add even more features as we speak. As far as postgres_fdw is concerned, there are some hidden tuning options which are not widely known by users. So let us see how we can speed up the PostgreSQL foreign data wrapper.

Foreign data wrappers: Creating a “database link”

To show how things can be improved we first have to create some sample data in “adb”, which can then be integrated into some other database:

adb=# CREATE TABLE t_local (id int);
CREATE TABLE
adb=# INSERT INTO t_local SELECT * FROM generate_series(1, 100000);
INSERT 0 100000

In this case I have simply loaded 100,000 rows into a very simple table. Let us now create the foreign data wrapper (or “database link”, as Oracle people would call it). The first thing to do is to enable the postgres_fdw extension in “bdb”.

bdb=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION
In the next step we have to create the “SERVER”, which points to the database containing our sample table. CREATE SERVER works like this:

bdb=# CREATE SERVER some_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', dbname 'adb');
CREATE SERVER

Once the foreign server is created the users we need can be mapped:

bdb=# CREATE USER MAPPING FOR current_user SERVER some_server OPTIONS (user 'hs');
CREATE USER MAPPING

In this example the user mapping is really easy. We simply want the current user to connect to the remote database as “hs” (which happens to be my superuser).

Finally we can link the tables. The easiest way to do that is to use “IMPORT FOREIGN SCHEMA”, which simply fetches the remote data structure and turns everything into a foreign table.

bdb=# \h IMPORT
Command:     IMPORT FOREIGN SCHEMA
Description: import table definitions from a foreign server
Syntax:
IMPORT FOREIGN SCHEM[...]
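As for the hidden tuning options alluded to earlier, likely candidates are postgres_fdw's fetch_size and use_remote_estimate server options, for example:

bdb=# ALTER SERVER some_server OPTIONS (ADD fetch_size '10000');
ALTER SERVER
bdb=# ALTER SERVER some_server OPTIONS (ADD use_remote_estimate 'true');
ALTER SERVER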

Tim Colles: PostgreSQL Roles are Cluster-Wide

planet postgresql - 2019-03-19 (Tue) 01:20:58

A role in PostgreSQL is common to all databases in the cluster. This seems to be the result of a design decision made when the former user and group handling was unified under role. Follow these links for reference:

Roles, or rather those roles that are not just representing a specific user, ought instead to be an intrinsic part of the database model. Roles are defined by the kind of access they provide (read, write, etc.) and by what relations (table, view, function, etc.) they provide that access to.

Access control is ideally managed within a database using roles, rather than separately within each individual application that uses that database. So it makes sense that the access control rules (the roles and their associated permissions) would be defined alongside the definitions of the relations whose access they control; any changes are then self-contained. The access control model should be part and parcel of the rest of the database model. Individual users (which are also represented as roles in PostgreSQL) are assigned one or more of the roles defined within each particular database model (based on the local enterprise definition of what needs they have of any particular database).
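As a sketch of that model (the names are made up), a per-database access role granted to individual user roles might look like:

CREATE ROLE app_read NOLOGIN;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_read;
CREATE ROLE alice LOGIN;
GRANT app_read TO alice;

Note that the CREATE ROLE statements themselves take effect cluster-wide, which is exactly the point at issue.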

There is no sense in representing this kind of role at the cluster level, as the definition of the role is associated specifically with the database where it actually controls access. In PostgreSQL, encapsulating the full functionality of a database requires using not only the system catalog tables specific to that database but also the roles relevant to that database held in the cluster-wide system catalog tables. With the exception of some special cases, li


Avinash Kumar: PostgreSQL Upgrade Using pg_dumpall

planet postgresql - 2019-03-18 (Mon) 23:59:50

There are several approaches to assess when you need to upgrade PostgreSQL. In this blog post, we look at upgrading a Postgres database using pg_dumpall. As this tool can also be used to back up PostgreSQL clusters, it is a valid option for upgrading a cluster too. We consider the advantages and disadvantages of this approach, and show you the steps needed to achieve the upgrade.

This is the first of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series where we’ll be exploring different paths to accomplish postgres upgrade or migration. The series will culminate with a practical webinar to be aired April 17th (you can register here).

We begin this journey by providing you the most straightforward way to carry on with a PostgreSQL upgrade or migration: by rebuilding the entire database from a logical backup.

Defining the scope

Let’s define what we mean by upgrading or migrating PostgreSQL using pg_dumpall.

If you need to perform a PostgreSQL upgrade within the same database server, we’d call that an in-place upgrade, or just an upgrade. Whereas a procedure that involves migrating your PostgreSQL server from one server to another, combined with an upgrade from an older version (let’s say 9.3) to a newer version (say PG 11.2), can be considered a migration.

There are two ways to achieve this using logical backups:

  1. Using pg_dumpall
  2. Using pg_dumpall + pg_dump + pg_restore

We’ll be discussing the first option (pg_dumpall) here, and will leave the discussion of the second option for our next post.


pg_dumpall can be used to obtain a text-format dump of the whole database cluster, which includes all databases in the cluster. This is the only method that can be used to back up globals such as users and roles in PostgreSQL.
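In its simplest form (the ports are hypothetical, and the dump should be taken with the newer version's pg_dumpall binary), the procedure is:

$ pg_dumpall -p 5432 > cluster_dump.sql
$ psql -p 5433 -d postgres -f cluster_dump.sql

The first command dumps the entire old cluster, globals included; the second replays it into the freshly initialized new cluster.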

There are, of course, advantages and disadvantages in employing this approach to upgrading PostgreSQL by rebuilding the database cluster using pg_dumpall.

Advantages of using pg_dumpall for upgrading a Pos[...]

Venkata Nagothi: Configuring PostgreSQL for Business Continuity

planet postgresql - 2019-03-18 (Mon) 23:27:39
Business Continuity for Databases

Business continuity for databases means databases must remain operational even during disasters. It is imperative to ensure that production databases are available to the applications all the time, even during disasters; otherwise it could end up being an expensive deal. DBAs and architects need to ensure that database environments can sustain disasters and are disaster-recovery SLA compliant. To ensure disasters do not affect database availability, databases must be configured for business continuity.

Configuring databases for business continuity involves a lot of architecting, planning, designing and testing. A lot of factors, like data centers and their geographic territories, including infrastructure design, come into consideration when designing and implementing an effective disaster recovery strategy for databases. In other words, “Business Continuity = Avoid outages during disasters”.

To ensure production databases survive a disaster, a Disaster Recovery (DR) site must be configured. Production and DR sites must be part of two geographically distant data centers. This means a standby database must be configured at the DR site for every production database, so that the data changes occurring on the production database are immediately synced across to the standby database via transaction logs. This can be achieved with PostgreSQL's “Streaming Replication” capability.
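Once streaming replication is in place, its health can be verified from the primary (a sketch):

SELECT client_addr, state, sync_state FROM pg_stat_replication;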

What Needs to Happen if Disaster Strikes Production (or Primary) Database?

When the production (primary) database crashes or becomes unresponsive, the standby database must be promoted to primary, and the applications must be pointed to the newly promoted standby (new primary) database; all of this must happen automatically within the designated outage SLA. This process is termed failover.

Configuring PostgreSQL for High Availability

As said above, to ensure that the PostgreSQL is disaster recovery compliant, it must be first configured with Streaming Replication (master + standby d


elein mustain: Having Group By Clauses elein’s GeneralBits

planet postgresql - 2019-03-18 (Mon) 11:33:59

Some people go to great lengths to avoid GROUP BY and HAVING clauses in their queries. The error messages are fussy but they are usually right. GROUP BY and HAVING key words are essential for good SQL reporting.

The primary reason for GROUP BY is to reduce the number of rows, usually by aggregation. It produces only one row for each matching grouping from the input. This allows you to make sophisticated calculations via ordinary SQL.

Fruit Example:

We have some fruit:

  item   | source  | amt | fresh_until
---------+---------+-----+-------------
 bananas | Chile   |  50 | 2019-05-01
 bananas | Bolivia |  25 | 2019-04-15
 bananas | Chile   | 150 | 2019-07-10
 apples  | USA-WA  |  75 | 2019-07-01
 apples  | USA-CA  |  75 | 2019-08-15
 apples  | Canada  |  80 | 2019-08-01
 grapes  | USA-CA  | 120 | 2019-07-15
(7 rows)
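First, aggregate the fruit per item; this base query produces one output row per kind of fruit and is what the HAVING example below builds on:

select item, count(source) as srcs, sum(amt) as tot_amt,
       min(fresh_until) as min_fresh_until
from fruit
group by item;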

This next case allows us to look forward. Mid-year, what fruits will be available? We do this with the same query as above; however, after the query runs we check the values of min(fresh_until) with a HAVING clause. HAVING is how you qualify an aggregate.

select item, count(source) as srcs, sum(amt) as tot_amt,
       min(fresh_until) as min_fresh_until
from fruit
group by item
having min(fresh_until) > '30-jun-2019';

  item  | srcs | tot_amt | min_fresh_until
--------+------+---------+-----------------
 grapes |    1 |     120 | 2019-07-15
 apples |    3 |     230 | 2019-07-01
(2 rows)

All of the apples and grapes will be available mid-year.

A target list may contain non-aggregates and aggregates. The non-aggregate columns in the target list should be in the GROUP BY clause; the error message says so. The order of the columns in the GROUP BY clause matters: it determines how the aggregates are grouped, and the order is often hierarchical. What that means for your columns depends on your focus. It could be fruit, or sources, and/or the fresh_until date.

Playing Cards Examples

Let’s look at another set of examples that illustrate extracting information on playing cards. You can learn about cards on Wikipedia St


Venkata Nagothi: How to Replicate PostgreSQL Data to Remote Sites

planet postgresql - 2019-03-15 (Fri) 21:08:35

In a busy database environment with larger size databases, the need for real-time data replication is a common occurrence. Applications often need the production data to be replicated in real-time to remote sites for analytics and other critical business operations needs.

DBAs also need to ensure that the data is replicated continuously to the remote sites to meet various requirements. These requirements, though, may not always call for replicating the whole database; there can also be a need to replicate only a subset of the data (like a table or set of tables, or data from multiple tables combined by an SQL query, for analytics, reporting, etc.).

In this blog, we will focus on how to replicate tables to remote databases in real-time.

What is Table-Level Replication?

Table-level replication is the mechanism of replicating the data of a specific table or set of tables from one database (the source) to another database (the target) hosted remotely in a distributed environment. Table-level replication ensures table data is distributed continuously and remains consistent across the replicated (target) sites.
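In recent PostgreSQL versions (10 and later), built-in logical replication implements exactly this kind of table-level replication (a sketch; the connection string and names are made up):

-- On the source (publisher) database:
CREATE PUBLICATION sales_pub FOR TABLE orders, customers;
-- On the target (subscriber) database:
CREATE SUBSCRIPTION sales_sub
    CONNECTION 'host=source_host dbname=prod user=repuser'
    PUBLICATION sales_pub;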

Why Use Table-Level Replication?

Table-level replication is an essential need in larger, complex, highly distributed environments. In my experience, there was always a need to replicate a set of tables from a production database to a data warehouse for reporting purposes. The data has to be replicated continuously to ensure reports get the latest data. In critical environments, staleness of the data cannot be tolerated, so the data changes happening on production must be replicated immediately to the target site. This can be a real challenge for DBAs, who have to forecast various factors to ensure efficient and smooth table replication.

Let us look at some requirements that table-level replication solves:

  • The reports can run on a database in an environment other than production, like data warehousing
  • A distributed database environment with distributed applications extracting data from multiple sites. In case of distributed web or

Baron Schwartz: New Book: DevOps for the Database

planet postgresql - 2019-03-15 (Fri) 04:17:36

I’ve written a new 65-page ebook, titled DevOps for the Database. Drawn from what I’ve observed over the last 15 or so years in my own experience as well as that of many others, it tries to answer why and how some organizations deliver value to customers fast, run data-intensive applications with high quality, and have great engineering cultures—and why some don’t.


Craig Kerstiens: Fun with SQL: Text and system functions

planet postgresql - 2019-03-14 (Thu) 01:26:00

SQL by itself is great and powerful, and Postgres supports a broad array of more modern SQL including things like window functions and common table expressions. But rarely do I write a query where I don’t want to tweak or format the data I’m getting back out of the database. Thankfully Postgres has a rich array of functions to help with converting or formatting data. These built-in functions save me from having to do the logic elsewhere or write my own functions, in other words I have to do less work because Postgres has already done it for me which I’m always happy about.

We’ve covered a set of functions earlier, today we’re going to look at some different categories of functions to dive deeper.

Manipulating strings

When building a nice clean report I’m often doing some cleaning up of the data. This could be as simple as combining my first_name and last_name columns. In that case a simple concatenation is quite useful:

SELECT first_name || ' ' || last_name as name FROM users;

But as you can expect, I can do much more. If I’m performing some basic search for a keyword or delimiter, I can search for the position of that string. Then I can feed that into a CASE statement:

SELECT CASE WHEN position(',' in my_string) > 0 then True ELSE False END FROM my_table;

Of course if you’re working with some field that is delimited you may want to take the above to first find the correct field. From there you likely would want to split that delimited field up. For that we can use split_part and grab the appropriate section of our delimited field:

SELECT CASE WHEN position(',' in my_string) > 0 then split_part(my_string, ',', 1) ELSE null END FROM my_table;

Finally, if you’re dealing with messy input or machine data, there are a number of functions that can be handy for cleaning up that data:

  • substr - allows you to extract a substring
  • rtrim - removes the longest string containing only the specified characters from the right side
  • reverse - for the next time you’re asked to reverse a string in an interview :)
  • regexp_replace - for

Michael Banck: Integrating Patroni into Debian

planet postgresql - 2019-03-13 (Wed) 21:16:14
Patroni is a PostgreSQL high availability solution with a focus on containers and Kubernetes. Until recently, the available Debian packages had to be configured manually and did not integrate well with the rest of the distribution. For the upcoming Debian 10 "Buster" release, the Patroni packages...

Hans-Juergen Schoenig: unaccent: Getting rid of umlauts, accents and special characters

planet postgresql - 2019-03-13 (Wed) 18:20:07

Database people dealing with natural languages are all painfully aware of the fact that encodings, special characters, accents and the like are usually hard to deal with. This is especially true if you want to implement search in a user-friendly way.

Consider the following example in PostgreSQL: my name contains a couple of super inconvenient special characters, which will cause issues for people around the globe. The correct spelling of my surname is “Schönig”, which is pretty hard to type on most keyboards I have seen around the world. And: who cares about my special characters anyway? People might just want to type “Schonig” into some search form and find information about me (ideally related to PostgreSQL and database work).

Here is the problem:

test=# SELECT 'Schönig' = 'Schonig';
 ?column?
----------
 f
(1 row)

test=# SELECT 'Schönig' = 'Schoenig';
 ?column?
----------
 f
(1 row)

The “=” operator compares the two strings and comes to the conclusion that they are not identical; therefore the correct answer is “false”. While that is true from a technical point of view, it can be a real issue, and end users might be unsatisfied with the result. Problems like that can make daily life pretty hard. A solution to the problem is therefore needed.

unaccent: PostgreSQL provides a useful extension

If you want to improve your user experience you can turn to the “unaccent” extension, which is shipped as part of the PostgreSQL contrib package. Installing “unaccent” is really easy:
test=# CREATE EXTENSION unaccent;
CREATE EXTENSION

In the next step you can call the “unaccent” function to clean a string and turn it into something more useful. This is what happens when we use the “unaccent” function on my name and the name of my PostgreSQL support company:

test=# SELECT unaccent('Hans-Jürgen Schönig, Gröhrmühlgasse 26, Wiener Neustadt');
                        unaccent
---------------------------------------------------------
 Hans-Jurgen Schonig, Grohrmuhlgasse 26, Wiener[...]
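Applied to the search problem from the beginning of the post, the comparison then behaves the way users expect:

test=# SELECT unaccent('Schönig') = unaccent('Schonig');
 ?column?
----------
 t
(1 row)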