Laurenz Albe: count(*) made fast

planet postgresql - 2019-04-03(水) 17:00:28
© Laurenz Albe 2019


It is a frequent complaint that count(*) is so slow on PostgreSQL.

In this article I want to explore the options you have get your result as fast as possible.

Why is count(*) so slow?

Most people have no trouble understanding that the following is slow:

SELECT count(*) FROM /* complicated query */;

After all, it is a complicated query, and PostgreSQL has to calculate the result before it knows how many rows it will contain.

But many people are appalled if the following is slow:

SELECT count(*) FROM large_table;

Yet if you think again, the above still holds true: PostgreSQL has to calculate the result set before it can count it. Since there is no “magical row count” stored in a table (like it is in MySQL’s MyISAM), the only way to count the rows is to go through them.

So count(*) will normally perform a sequential scan of the table, which can be quite expensive.

Is the “*” in count(*) the problem?

The “*” in SELECT * FROM ... is expanded to all columns. Consequently, many people think that using count(*) is inefficient and should be written count(id) or count(1) instead. But the “*” in count(*) is quite different, it just means “row” and is not expanded at all.

Writing count(1) is the same as count(*), but count(id) is something different: It will only count the rows where id IS NOT NULL, since most aggregates ignore NULL values.

So there is nothing to be gained by avoiding the “*”.

Using an index only scan

It is tempting to scan a small index rather then the whole table to count the number of rows.
However, this is not so simple in PostgreSQL because of its multi-version concurrency control strategy. Each row version (“tuple”) contains the information to which database snapshot it is visible. But this information is not (redundantly) stored in the indexes. So it usually isn’t enough to count the entries in an index, because PostgreSQL has to visit the table entry (“heap tuple”) to make sure an index entry is visible.

To mitigate this problem, PostgreSQL has introduced the visibility ma

カテゴリー: postgresql

Magnus Hagander: When a vulnerability is not a vulnerability

planet postgresql - 2019-04-03(水) 04:39:50

Recently, references to a "new PostgreSQL vulnerability" has been circling on social media (and maybe elsewhere). It's even got it's own CVE entry. The origin appears to be a blogpost from Trustwave.

So is this actually a vulnerability? (Hint: it's not) Let's see:

カテゴリー: postgresql

Doug Hunley: Enhancing Your PostgreSQL 10 Security with the CIS Benchmark

planet postgresql - 2019-04-02(火) 23:59:37

 Crunchy Data has recently announced an update to the CIS PostgreSQL Benchmark by the Center for Internet Security, a nonprofit organization that provides publications around standards and best practices for securing technologies systems. This newly published CIS PostgreSQL 10 Benchmark joins the existing CIS Benchmarks for PostgreSQL 9.5 and 9.6 while continuing to build upon Crunchy Data's efforts with the PostgreSQL Security Technical Implementation Guide (PostgreSQL STIG).

What is a CIS Benchmark?

As mentioned in an earlier blog post, a CIS Benchmark is a set of guidelines and best practices for securely configuring a target system.  The benchmark contains a series of recommendations that help test the security of the system: some of the recommendations are "scored" (where a top score of 100 is the best), while others are are provided to establish best practices for security.

カテゴリー: postgresql

Dave Conlin: Postgres indexes for absolute beginners

planet postgresql - 2019-04-02(火) 22:45:00

Indexes are really important for Postgres performance, but they’re often misunderstood and misapplied. This post aims to give you a good grounding in indexes to avoid a lot of beginner mistakes.

Step one: understand what you want to achieve

Because indexes are such a powerful tool, a new index is often viewed as “the answer” to whatever performance problems people are experiencing. Wading straight in and creating an index for every sequential scan in sight is the simplest thing to do, but indexes have costs as well as benefits.

Not only do indexes take up memory, they raise the cost of writing to the table in question. Any speed-up an index may provide for reads isn’t free — it’s offset by more work to keep the index up to date when the data in the table change. So an unused index isn’t just useless — it’s actively harmful to your database’s performance.

First, take the time to understand which bits of your query are running slowly (use the query plan), make a hypothesis as to why they’re slow, and then validate that hypothesis by attempting to speed them up.

In order to understand when the answer might be an index, it’s important to understand the difference between sequential scans and index scans in Postgres.

Sequential scans

Sequential scans are the simplest, most obvious way of reading data from a table. Postgres jumps to the first block of memory (“page”) that holds rows for the table in question and reads in all the data, row by row, page by page, and passes it on.

Sequential scans can get a bit of a bad rap. One of the things we often hear from people when we ask them about their current performance analysis is “the first thing I do is look for sequential scans”.

It’s true that using an index on a table can make a big difference to query performance, but it’s also true that if your query just needs to get all of the data from a table in an unordered mass of rows, then things aren’t going to get any more efficient than just reading those rows in directly from consecutive pages.

Index scans

An index is jus

カテゴリー: postgresql

Alexander Sosna: HowTo: Central and semantic logging for PostgreSQL

planet postgresql - 2019-04-02(火) 18:00:00
HowTo: Central and semantic logging for PostgreSQL Today it is no longer necessary to argue why central logging makes sense or is even necessary. Most medium-sized companies now have a central logging system or are just introducing it. Once the infrastructure has been created, it must be used sensibly and efficiently! Especially as an... 02-04 Alexander Sosna
カテゴリー: postgresql

Tatsuo Ishii: Statement level load balancing

planet postgresql - 2019-04-02(火) 15:53:00
In the previous article I wrote about one of the new features of upcoming Pgpool-II 4.1.
This time I would like to introduce "statement level load balancing" feature of 4.1.

Pgpool-II can distribute read queries among PostgreSQL backend nodes. This allows to design a scale out cluster using PostgreSQL. The particular database node used for distributing read query is determined at the session level: when a client connects to Pgpool-II. This is so called "session level load balancing". For example, if a client connects to Pgpool-II and the load balance node is node 1 (we assume that this is a streaming replication standby node), then any read query will be distributed to the primary (master) node and the load balance node (in this case node1, the standby node). The distribution ratio is determined by "backend weight" parameter in the Pgpool-II configuration file (usually named "pgpool.conf"), typically "backend_weight0" or "backend_weight1", corresponding to node 0 and node 1 respectively.

This is good as long as clients connects to Pgpool-II, issue some queries, and disconnect, since next time a client connects to Pgpool-II, different load balance node may be chosen according to the backend weight parameters.

However, if your client already has a connection pooling feature, this way (session level load balancing) might be a problem, since the selection of load balance node is performed only once when the connection pooling from client to Pgpool-II is created.

The statement level load balancing feature is created to solve the problem. Unlike the session level load balancing, the load balancing node is determined when a new query is issued. The new parameter for this is "statement_level_load_balance". If this is set to on, the feature is enabled (the parameter can be changed by reloading the pgpool.conf).

At first "select_cnt" is 0, which means no SELECTs were issued.

test=# show pool_nodes;

node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | last_st[...]
カテゴリー: postgresql

Koichi Suzuki: Postgres-XL and global MVCC

planet postgresql - 2019-04-02(火) 11:58:21
Back to the PG

I’m very excited to become a 2ndQuadrant member.  I was involved in PostgreSQL activities in NTT group (Japanese leading ICT company, see here and here), including log shipping replication and PostgreSQL scale out solution as PostgresXC and PostgresXL.

At NTT I had several chances to work very closely with 2ndQuadrant.  After three years involvement in deep learning and accelerator usage in various applications I’m now back to PostgreSQL world.  And I’m still very interested in PostgreSQL scale out solutions and applying PostgreSQL to large scale analytic workload as well as stream analytics.

Here, I’d like to begin with a discussion of scaled out parallel distributed database with full transaction capabilities  including atomic visibility.

ACID property, distributed database and atomic visibility

Traditionally, providing full ACID property is the responsibility of each database server.   In scale out solutions, database consists of multiple servers and two phase commit protocol (2PC) is used to provide ACID properties in updating multiple servers.

Although 2PC provides write consistency among multiple servers,  we should note that it does not provide atomic visibility which ensures global transaction update can be visible to all the other transaction in a same time.    In two phase commit protocol, each server receives “COMMIT” in different time clock and this can make partial transaction update visible to others.

How Postgres-XL works, global transaction management

Postgres-XL provides atomic visibility of global transaction and this is essentially what GTM (global transaction manager) is doing.   GTM helps to share the snapshot among all the transactions so that any such partial COMMIT is not visible until all the COMMITs are successful and GTM updates the snapshot.   Note that GTM-proxy is used to reduce interaction between each server and GTM by copying the same snapshot to different local transactions.

This is very similar to what standalone PostgreSQL is doing as MVCC.   It is somewhat centr

カテゴリー: postgresql

Sebastian Insausti: How to Deploy Highly Available PostgreSQL with Single Endpoint for WordPress

planet postgresql - 2019-04-01(月) 23:00:27

WordPress is an open source software you can use to create your website, blog, or application. There are many designs and features/plugins to add to your WordPress installation. WordPress is a free software, however, there are many commercial plugins to improve it depending on your requirements.

WordPress makes it easy for you to manage your content and it’s really flexible. Create drafts, schedule publication, and look at your post revisions. Make your content public or private, and secure posts and pages with a password.

Related resources  ClusterControl for PostgreSQL  Scaling Wordpress and MySQL on Multiple Servers for Performance  Wordpress Application Clustering using Kubernetes with HAProxy and Keepalived

To run WordPress you should have at least PHP version 5.2.4+, MySQL version 5.0+ (or MariaDB), and Apache or Nginx. Some of these versions have reached EOL and you may expose your site to security vulnerabilities, so you should install the latest version available according to your environment.

As we could see, currently, WordPress only supports the MySQL and MariaDB database engines. WPPG is a plugin based on PG4WP plugin, that gives you the possibility to install and use WordPress with a PostgreSQL database as a backend. It works by replacing calls to MySQL specific functions with generic calls that map them to other database functions and rewriting SQL queries on the fly when needed.

For this blog, we’ll install 1 Application Server with WordPress 5.1.1 and HAProxy, 1.5.18 in the same server, and 2 PostgreSQL 11 database nodes (Master-Standby). All the operating system will be CentOS 7. For the databases and load balancer deploy we’ll use the ClusterControl system.

This is a basic environment. You can improve it by adding more high availability features as you can see here. So, let’s start.

Database Deployment

First, we need to install our PostgreSQL database. For this, we’ll assume you have ClusterControl installed.

To perform a deployment from ClusterControl, simply select the option

カテゴリー: postgresql

Regina Obe: SQL Server on Linux

planet postgresql - 2019-04-01(月) 22:31:00

Today is April 1st. Having no thoughts on Fools jokes for today, I dug up one of our old April fools, and it was pretty scary how the joke is just about true now. Yes SQL Server now really does run on Linux and is on it's 2017th edition, but still a poor competition to PostgreSQL.

A goody from our old joke archives

CatchMe - Microsoft SQL Server for Unix and Linux
カテゴリー: postgresql

Devrim GÜNDÜZ: End of the naming game: The PostgreSQL project changes its name

planet postgresql - 2019-04-01(月) 18:36:00
I started using PostgreSQL around September 1998. The first problem I had was pronouncing it, and even using right capital letters at the right place.

Was is PostGreSQL? PostgresSQL? PoStGreSQL? PostgreySQL?

Recently Craig also mentioned about the same problem.

Starting today, the PostgreSQL Global Development Group (abbreviated as PGDG) announced that the project will be written as PostgresQL. This will solve the problems (hopefully), and will also help use to drop the "QL" in 2024. Starting v12, all packages will also "provide" postgresXY as the package name, for a smooth change in 2024. Meanwhile, as of today, the project will accept "Postgre" as an alias for those who did not want to learn about the name of the software they are using. I heard rumours that they also say "Orac" or "SQ Serv" or "MyS", so they will now be free to drop SQL in our name, too.

Thanks to everyone who made this change. This was a real blocker for the community, and it will also help newbies in the PostgreSQL Facebook group -- they will now be free to use "Postgre" from now on.
カテゴリー: postgresql

Raghavendra Rao: Install PL/Java 1.5.2 in PostgreSQL 11

planet postgresql - 2019-04-01(月) 08:09:14
PostgreSQL 11 includes several procedural languages with the base distribution: PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python. In addition, there are a number of procedural languages that are developed and maintained outside the core PostgreSQL Distribution like PL/Java (Java), PL/Lua (Lua), PL/R (R), PL/sh (Unix Shell), and PL/v8 (JavaScript). In this post, we are going to see...
カテゴリー: postgresql

Andrew Dunstan: Where and when you need a root.crt file

planet postgresql - 2019-03-30(土) 04:09:50

This is something people seem to get confused about quite often. A root.crt file is used to validate a TLS (a.k.a. SSL) certificate presented by the other end of a connection. It is usually the public certificate of the Certificate Authority (CA) that signed the presented certificate, and is used to validate that signature. If a non-root CA was used to sign the other end’s TLS certificate, the root.crt file must contain at least the root of the CA chain, and enough other elements of the chain that together with the certificate can connect the root to the signing CA.

In the simple and most common case where client certificates are not being used, only the client needs a root.crt file, to validate the server’s TLS certificate, if using 'verify-ca' or 'verify-full' ssl mode. The server doesn’t need and can’t use a root.crt file when client certificates are not being used.

On the other hand, if you are using client certificates, the server will also need a root.crt file to validate the client certificates. There is no requirement that same root.crt be used for both sides. It would be perfectly possible for the server’s certificate to be signed by one CA and the client certificates by another.

If more than one CA is used in a certain context, i.e. if the client connects to servers with certificates signed by more than one CA, or of the server accepts connections from clients with certificates signed by more than one CA, then the certificates of all the CAs can be placed in the root.crt file, one after the other. The connection will succeed as long as one of the certificates (or certificate chains) in the file is that of the relevant signing authority.

カテゴリー: postgresql

Craig Kerstiens: A health checkup playbook for your Postgres database

planet postgresql - 2019-03-30(土) 01:59:00

I talk with a lot of folks that set their database up, start working with it, and then are surprised by issues that suddenly crop up out of nowhere. The reality is, so many don’t want to have to be a DBA, instead you would rather build features and just have the database work. But your is that a database is a living breathing thing. As the data itself changes what is the right way to query and behave changes. Making sure your database is healthy and performing at it’s maximum level doesn’t require a giant overhaul constantly. In fact you can probably view it similar to how you approach personal health. Regular check-ups allow you to make small but important adjustments without having to make dramatic life altering changes to keep you on the right path.

After years of running and managing literally millions of Postgres databases, here’s my breakdown of what your regular Postgres health check should look like. Consider running this on a monthly basis to be able to make small tweaks and adjustments and avoid the drastic changes.

Cache rules everything around me

For many applications not all the data is accessed all the time. Instead certain datasets are accessed one and then for some period of time, then the data you’re accessing changes. Postgres in fact is quite good at keeping frequently accessed data in memory.

Your cache hit ratio tells you how often your data is served from in memory vs. having to go to disk. Serving from memory vs. going to disk will be orders of magnitude faster, thus the more you can keep in memory the better. Of course you could provision an instance with as much memory as you have data, but you don’t necessarily have to. Instead watching your cache hit ratio and ensuring it is at 99% is a good metric for proper performance.

You can monitor your cache hit ratio with:

SELECT sum(heap_blks_read) as heap_read, sum(heap_blks_hit) as heap_hit, sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio FROM pg_statio_user_tables; Be careful of dead tuples

Under the cov

カテゴリー: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – REINDEX CONCURRENTLY

planet postgresql - 2019-03-30(土) 00:00:02
On 29th of March 2019, Peter Eisentraut committed patch: REINDEX CONCURRENTLY   This adds the CONCURRENTLY option to the REINDEX command. A REINDEX CONCURRENTLY on a specific index creates a new index (like CREATE INDEX CONCURRENTLY), then renames the old index away and the new index in place and adjusts the dependencies, and then drops … Continue reading "Waiting for PostgreSQL 12 – REINDEX CONCURRENTLY"
カテゴリー: postgresql

Ibrar Ahmed: PostgreSQL: Access ClickHouse, One of the Fastest Column DBMSs, With clickhousedb_fdw

planet postgresql - 2019-03-29(金) 23:01:12

Database management systems are meant to house data but, occasionally, they may need to talk with another DBMS. For example, to access an external server which may be hosting a different DBMS. With heterogeneous environments becoming more and more common, a bridge between the servers is established. We call this bridge a “Foreign Data Wrapper” (FDW). PostgreSQL completed its support of SQL/MED (SQL Management of External Data) with release 9.3 in 2013. A foreign data wrapper is a shared library that is loaded by a PostgreSQL server. It enables the creation of foreign tables in PostgreSQL that act as proxies for another data source.

When you query a foreign table, Postgres passes the request to the associated foreign data wrapper. The FDW creates the connection and retrieves or updates the data in the external data store. Since PostgreSQL planner is involved in all of this process as well, it may perform certain operations like aggregate or joins on the data when retrieved from the data source. I cover some of these later in this post.

ClickHouse Database

ClickHouse is an open source column based database management system which claims to be 100–1,000x faster than traditional approaches, capable of processing of more than a billion rows in less than a second.


clickhousedb_fdw is an open source project – GPLv2 licensed – from Percona. Here’s the link for GitHub project repository:

It is an FDW for ClickHouse that allows you to SELECT from, and INSERT INTO, a ClickHouse database from within a PostgreSQL v11 server.

The FDW supports advanced features like aggregate pushdown and joins pushdown. These significantly improve performance by utilizing the remote server’s resources for these resource intensive operations.

If you would like to follow this post and try the FDW between Postgres and ClickHouse, you can download and set up the ontime dataset for ClickHouse.  After following the instructions, the test that you have the desired data. The ClickHouse

カテゴリー: postgresql

Andreas 'ads' Scherbaum: PostgreSQL Europe Community User Group Recognition Guidelines

planet postgresql - 2019-03-29(金) 21:00:00

Over the past months, a great number of PostgreSQL User Groups and Meetups showed up all over Europe. It’s good to see that interest in PostgreSQL is growing!

Some of the user groups approached the PostgreSQL Europe board, and asked for support. Mostly for swag, but also for sending speakers, or other kind of support. We are happy to help!

In order to handle all of these requests, the PostgreSQL Europe board created a set of guidelines for user group meetings. The current version can be found on the PostgreSQL Europe website, under “Community”, and then “Community User Group Recognition Guidelines”. User groups which approach the PostgreSQL Europe board for support are expected to comply by these guidelines. Every user group is self-certified under these guidelines. If you have reason to believe that a self-certified status for a user group is not correct, please contact the PostgreSQL Europe board under “Contact”.

カテゴリー: postgresql

Daniel Vérité: Text search: a custom dictionary to avoid long words

planet postgresql - 2019-03-28(木) 21:51:02

The full text search is based on transforming the initial text into a tsvector. For example:

test=> select to_tsvector('english', 'This text is being processed.'); to_tsvector ---------------------- 'process':5 'text':2

This result is a sorted list of lexems, with their relative positions in the initial text, obtained by this process:

Raw text => Parser => Dictionaries (configurable) => tsvector

When there is enough data, we tend to index these vectors with a GIN or GIST index to speed up text search queries.

In SQL we can inspect the intermediate results of this process with the ts_debug function:

test=> select * from ts_debug('english', 'This text is being processed.'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-----------+----------------+--------------+----------- asciiword | Word, all ASCII | This | {english_stem} | english_stem | {} blank | Space symbols | | {} | | asciiword | Word, all ASCII | text | {english_stem} | english_stem | {text} blank | Space symbols | | {} | | asciiword | Word, all ASCII | is | {english_stem} | english_stem | {} blank | Space symbols | | {} | | asciiword | Word, all ASCII | being | {english_stem} | english_stem | {} blank | Space symbols | | {} | | asciiword | Word, all ASCII | processed | {english_stem} | english_stem | {process} blank | Space symbols | . | {} | |

The parser breaks down the text into tokens (token column), each token being associated with a type (alias and description columns). Then depending on their types, these tokens are submitted as input to dictionaries mapped to these types, which may produce one lexem, or several, or zero to eliminate the term from the output vector.

In the above example, spaces and punctuation are eliminated be

カテゴリー: postgresql

第 159 回理事会議事録 (2019-3) news - 2019-03-28(木) 19:06:44
第 159 回理事会議事録 (2019-3) anzai 2019/03/28 (木) - 19:06
カテゴリー: postgresql

Avinash Kumar: PostgreSQL Upgrade Using pg_dump/pg_restore

planet postgresql - 2019-03-28(木) 03:09:33

In this blog post, we will explore  pg_dump /

pg_restore , one of the most commonly used options for performing a PostgreSQL upgrade. It is important to understand the scenarios under which pg_dump and pg_restore utilities will be helpful.

This post is the second of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series where we’ll be exploring different methods available to upgrade your PostgreSQL databases.

About pg_dump pg_dump is a utility to perform a backup of single database. You cannot backup multiple databases unless you do so using separate commands in parallel. If your upgrade plan needs global objects to be copied over, pg_dump need to be supplemented by pg_dumpall  . To know more about pg_dumpall  , you may refer to our previous blog post. pg_dump formats pg_dump can produce dumps in multiple formats – plain text and custom format – each with own advantages. When you use pg_dump with custom format (-Fc) , you must use pg_restore to restore the dump.

If the dump is taken using a plain-text format, pg_dump generates a script file of multiple SQL commands. It can be restored using psql.

A custom format dump, however, is compressed and is not human-readable.

A dump taken in plain text format may be slightly larger in size when compared to a custom format dump.

At times, you may wish to perform schema changes in your target PostgreSQL database before restore, for example, table partitioning. Or you may wish to restore only a selected list of objects from a dump file.

In such cases, you cannot restore a selected list of tables from a plain format dump of a database. If you take the database dump in custom format,  you can use pg_restore, which will help you choose a specific set of tables for restoration.

Steps involved in upgrade

The most important point to remember is that both dump and restore should be performed using the latest binaries. For example, if we need to migrate from version 9.3 to version 11, we should be using the pg_dump binary of PostgreSQL 11 to connect

カテゴリー: postgresql

Paul Ramsey: GeoJSON Features from PostGIS

planet postgresql - 2019-03-27(水) 22:00:00

Every once in a while, someone comes to me and says:

Sure, it’s handy to use ST_AsGeoJSON to convert a geometry into a JSON equivalent, but all the web clients out there like to receive full GeoJSON Features and I end up writing boilerplate to convert database rows into GeoJSON. Also, the only solution I can find on the web is scary and complex. Why don’t you have a row_to_geojson function?

And the answer (still) is that working with rows is fiddly and I don’t really feel like it.

However! It turns out that, with the tools for JSON manipulation already in PostgreSQL and a little scripting it’s possible to make a passable function to do the work.

Start with a simple table.

DROP TABLE IF EXISTS mytable; CREATE TABLE mytable ( pk SERIAL PRIMARY KEY, name TEXT, size DOUBLE PRECISION, geom GEOMETRY ); INSERT INTO mytable (name, size, geom) VALUES ('Peter', 1.0, 'POINT(2 34)'), ('Paul', 2.0, 'POINT(5 67)');

You can convert any row into a JSON structure using the to_jsonb() function.

SELECT to_jsonb(mytable.*) FROM mytable; {"pk": 1, "geom": "010100000000000000000000400000000000004140", "name": "Peter", "size": 1} {"pk": 2, "geom": "010100000000000000000014400000000000C05040", "name": "Paul", "size": 2}

That’s actually all the information we need to create a GeoJSON feature, it just needs to be re-arranged. So let’s make a little utility function to re-arrange it.

CREATE OR REPLACE FUNCTION rowjsonb_to_geojson( rowjsonb JSONB, geom_column TEXT DEFAULT 'geom') RETURNS TEXT AS $$ DECLARE json_props jsonb; json_geom jsonb; json_type jsonb; BEGIN IF NOT rowjsonb ? geom_column THEN RAISE EXCEPTION 'geometry column ''%'' is missing', geom_column; END IF; json_geom := ST_AsGeoJSON((rowjsonb ->> geom_column)::geometry)::jsonb; json_geom := jsonb_build_object('geometry', json_geom); json_props := jsonb_build_object('properties', rowjsonb - geom_column); json_type := jsonb_build_object('type', 'Feature'); return (json_type || json_geom || json_props)::text; END; $$ LANGUAGE 'plpgsql[...]
カテゴリー: postgresql