planet postgresql

Subscribe to planet postgresql のフィード
Planet PostgreSQL
更新: 2時間 18分 前

Abdul Yadi: pgsocket: Extension for Simple TCP/IP Socket Client

2018-09-11(火) 11:05:32

pgsocket is an extension for PostgreSQL server to send bytes to remote TCP/IP socket server. For the first version only single function provided for one way data send in bytearray.

This extension is compiled in Linux against PostgreSQL version 10.

Download source code from Build in Linux as usual:
$ make clean
$ make
$ make install

On successful compilation, install this extension in PostgreSQL environment
$ create extension pgsocket

Let us send bytes to –for example– host with IP address nnn.nnn.nnn.nnn, port 9090, send time out 30 seconds, messages “Hello”
$ select pgsocketsend(‘nnn.nnn.nnn.nnn’, 9090, 30, (E’\\x’ || encode(‘Hello’, ‘hex’))::bytea);

Or using address host name instead of IP address
$ select pgsocketsend(‘thesocketserver’, 9090, 30, (E’\\x’ || encode(‘Hello’, ‘hex’))::bytea);

カテゴリー: postgresql

Bruce Momjian: Monitoring Complexity

2018-09-11(火) 03:15:01

I have always had trouble understanding the many monitoring options available in Postgres. I was finally able to collect all popular monitoring tools into a single chart (slide 96). It shows the various levels of monitoring: OS, process, query, parser, planner, executor. It also separates instant-in-time reporting and across-time analysis options.

カテゴリー: postgresql

Paul Ramsey: Parallel PostGIS and PgSQL 11

2018-09-11(火) 01:00:00

A little under a year ago, with the release of PostgreSQL 10, I evaluated the parallel query infrastructure and how well PostGIS worked with it.

The results were less than stellar for my example data, which was small-but-not-too-small: under default settings of PostgreSQL and PostGIS, parallel behaviour did not occur.

However, unlike in previous years, as of PostgreSQL 10, it was possible to get parallel plans by making changes to PostGIS settings only. This was a big improvement from PostgreSQL 9.6, which substantial changes to the PostgreSQL default settings were needed to force parallel plans.

PostgreSQL 11 promises more improvements to parallel query:

  • Parallelized hash joins
  • Parallelized CREATE INDEX for B-tree indexes
  • Parallelized CREATE TABLE .. AS, CREATE MATERIALIZED VIEW, and certain queries using UNION

With the exception of CREATE TABLE ... AS none of these are going to affect spatial parallel query. However, there have also been some none-headline changes that have improved parallel planning and thus spatial queries.


PostgreSQL 11 has slightly improved parallel spatial query:

  • Costly spatial functions on the query target list (aka, the SELECT ... line) will now trigger a parallel plan.
  • Under default PostGIS costings, parallel plans do not kick in as soon as they should.
  • Parallel aggregates parallelize readily under default settings.
  • Parallel spatial joins require higher costings on functions than they probably should, but will kick in if the costings are high enough.

In order to run these tests yourself, you will need:

  • PostgreSQL 11
  • PostGIS 2.5

You’ll also need a multi-core computer to see actual performance changes. I used a 4-core desktop for my tests, so I could expect 4x improvements at best.

The setup instructions show where to download the Canadian polling division data used for the testing:

  • pd a table of ~70K polygons
  • pts a table of ~70K points
  • pts_10 a table of ~700K points
  • pts_100 a table of ~7M points

We will work with the default configuration parameters and just mess

カテゴリー: postgresql

Regina Obe: PGOpen 2018 Data Loading Presentation Slides

2018-09-09(日) 13:16:00

At PGOpen 2018 in San Francisco, we gave a talk on 10 ways to load data into Posgres. This is one of the rare talks where we didn't talk much about PostGIS. However we did showcase tools ogr_fdw, ogr2ogr, shp2pgsql, which are commonly used for loading spatial data, but equally as good for loading non-spatial data. Below are the slide links.

Continue reading "PGOpen 2018 Data Loading Presentation Slides"
カテゴリー: postgresql

REGINA OBE: pgAdmin4 now offers PostGIS geometry viewer

2018-09-09(日) 06:13:00

pgAdmin4 version 3.3 released this week comes with a PostGIS geometry viewer. You will be able to see the graphical output of your query directly in pgAdmin, provided you output a geometry or geography column. If your column is of SRID 4326 (WGS 84 lon/lat), pgAdmin will automatically display against an OpenStreetMap background.

We have Xuri Gong to thank for working on this as a PostGIS/pgAdmin Google Summer of Code (GSOC) project. We'd like to thank Victoria Rautenbach and Frikan Erwee for mentoring.

Continue reading "pgAdmin4 now offers PostGIS geometry viewer"
カテゴリー: postgresql

Avinash Kumar: Setting up Streaming Replication in PostgreSQL

2018-09-08(土) 03:00:58

Configuring replication between two databases is considered to be a best strategy towards achieving high availability during disasters and provides fault tolerance against unexpected failures. PostgreSQL satisfies this requirement through streaming replication. We shall talk about another option called logical replication and logical decoding in our future blog post.

Streaming replication works on log shipping. Every transaction in postgres is written to a transaction log called WAL (write-ahead log) to achieve durability. A slave uses these WAL segments to continuously replicate changes from its master.

There exists three mandatory processes –

wal sender  , wal receiver  and startup  process, these play a major role in achieving streaming replication in postgres.


wal sender  process runs on a master, whereas the wal receiver  and startup  processes runs on its slave. When you start the replication, a wal receiver  process sends the LSN (Log Sequence Number) up until when the WAL data has been replayed on a slave, to the master. And then the wal sender  process on master sends the WAL data until the latest LSN starting from the LSN sent by the wal receiver , to the slave. Wal receiver  writes the WAL data sent by wal sender  to WAL segments. It is the startup  process on slave that replays the data written to WAL segment. And then the streaming replication begins.

Note: Log Sequence Number, or LSN, is a pointer to a location in the WAL.

Steps to setup streaming replication between a master and one slave Step 1:

Create the user in master using whichever slave should connect for streaming the WALs. This user must have REPLICATION ROLE.


The following parameters on the master are considered as mandatory when setting up streaming replication.

  • archive_mode : Must be set to ON to enable archiving of WALs.
  • wal_level : Must be at least set to hot_standby  until version 9.5 or replica  in the later versions.
  • max_wal_senders : Must be set t
カテゴリー: postgresql

Bruce Momjian: Signing Rows

2018-09-07(金) 22:30:01

With the RSA keys created in my previous blog entry, we can now properly sign rows to provide integrity and non-repudiation, which we did not have before. To show this, let's create a modified version of the previous schema by renaming the last column to signature:

CREATE TABLE secure_demo2 ( id SERIAL, car_type TEXT, license TEXT, activity TEXT, event_timestamp TIMESTAMP WITH TIME ZONE, username NAME, signature BYTEA);

Continue Reading »

カテゴリー: postgresql

Peter Eisentraut: Upgrading to PostgreSQL 11 with Logical Replication

2018-09-06(木) 23:09:26

It’s time.

About a year ago, we published PostgreSQL 10 with support for native logical replication. One of the uses of logical replication is to allow low- or no-downtime upgrading between PostgreSQL major versions. Until now, PostgreSQL 10 was the only PostgreSQL release with native logical replication, so there weren’t many opportunities for upgrading in this way. (Logical replication can also be used for moving data between instances on different operating systems or CPU architectures or with different low-level configuration settings such as block size or locale — sidegrading if you will.) Now that PostgreSQL 11 is near, there will be more reasons to make use of this functionality.

Let’s first compare the three main ways to upgrade a PostgreSQL installation:

  • pg_dump and restore
  • pg_upgrade
  • logical replication

We can compare these methods in terms of robustness, speed, required downtime, and restrictions (and more, but we have to stop somewhere for this article).

pg_dump and restore is arguably the most robust method, since it’s the most tested and has been in use for decades. It also has very few restrictions in terms of what it can handle. It is possible to construct databases that cannot be dumped and restored, mostly involving particular object dependency relationships, but those are rare and usually involve discouraged practices.

The problem with the dump and restore method is of course that it effectively requires downtime for the whole time the dump and restore operations run. While the source database is still readable and writable while the process runs, any updates to the source database after the start of the dump will be lost.

pg_upgrade improves on the pg_dump process by moving over the data files directly without having to dump them out into a logical textual form. Note that pg_upgrade still uses pg_dump internally to copy the schema, but not the data. When pg_upgrade was new, its robustness was questioned, and it did upgrade some databases incorrectly. But pg_upgrade is now quite mature and we

カテゴリー: postgresql

Quinn Weaver: Locks talk this Friday, at PostgresOpen! (2018-09-07)

2018-09-06(木) 09:05:00
Attending PostgresOpen?

Come join me Friday for a gentle introduction to locks in PostgreSQL. My example-driven talk covers basic lock theory, tools for lock debugging, and common pitfalls and solutions. I hope to see you there!

Time and place info is on the PostgresOpen SV website.

カテゴリー: postgresql

Bruce Momjian: Client Row Access Control

2018-09-06(木) 02:30:01

Usually the database administrator controls who can access database data. However, it is possible for clients to completely control who can access data they add to the database, with the help of openssl.

First, let's create RSA keys for three users from the command line. We first create an RSA public/private key pair for each user in their home subdirectory and then make a copy of their RSA public key in the shared directory /u/postgres/keys:

# # must be run as the root user # cd /u/postgres/keys # for USER in user1 user2 user3 > do mkdir ~"$USER"/.pgkey > chown -R "$USER" ~"$USER"/.pgkey > chmod 0700 ~"$USER"/.pgkey > openssl genpkey -algorithm RSA -out ~"$USER"/.pgkey/rsa.key > chmod 0600 ~"$USER"/.pgkey/* > openssl pkey -in ~"$USER"/.pgkey/rsa.key -pubout -out "$USER".pub > done

Continue Reading »

カテゴリー: postgresql

Dave Page: Why do we install as root?

2018-09-05(水) 20:04:00
A couple of common questions I hear from customers (usually long-time users of a particular database from Redwood) via our guys in the field is “why do we install our software as root?” And “why do we run services as postgres?”. The simple, TLDR; answer is “for security”. For a detailed explanation, read on…
A basic principle when securing a software installation is “install with maximum privilege requirements and run with minimal”. In practice this equates to having software being installed and binaries/executables etc. owned by the root user, whilst the services themselves are actually run under a minimally privileged (and ideally dedicated) service user account, typically postgres in a PostgreSQL installation. Data files, and any other files that need to be modified by the software in normal operation are also owned by the service user account.
Let’s look at the running software first. Postgres (which will in fact refuse to run as root), is a server process which is often running on a network port that is accessible from other nodes on the network. Of course, we should limit access as much as possible to only those nodes that need access using both a firewall (even simple iptables rules will work), and Postgres’ pg_hba.conf access control file, but even with those measures in place, it’s possible that a determined attacker (let’s call him Zero Cool) can still gain access to the port the database server is running on.
Once our arch-nemesis Zero Cool has access to the database server port, he needs a way to escalate his attack. This may involve exploiting an unknown security issue in Postgres itself (as with any software, we hope there are none but we’re kidding ourselves if we think it’s totally secure), or it may be that he’s used other techniques such as social engineering to learn a users credentials.
If Zero gains “regular” access to Postgres, then he will be subject to any security measures (access control lists, RLS policies etc) that limit the scope of what the user account he’s used can ac[...]
カテゴリー: postgresql

Christophe Pettus: CHAR: What is it good for?

2018-09-05(水) 04:00:33

In addition to the familiar text types VARCHAR and TEXT, PostgreSQL has a type CHAR. It’s little used… and that’s for a reason. It has some very unusual behaviors, which can be quite a surprise if you are not expecting them.

First, CHAR is a fixed-width type. When character data is stored in it, it’s padded out with spaces if it is not full length:

xof=# create table chars (c char(20)); CREATE TABLE xof=# insert into chars values('x'); INSERT 0 1 xof=# select * from chars; c ---------------------- x (1 row)

OK, that’s reasonable, right? But what is going on here?

xof=# select length(c) from chars; length -------- 1 (1 row) xof=# select substring(c from 8 for 1) = ' '::char(1) from chars; ?column? ---------- t (1 row) xof=# select substring(c from 8 for 1) = ' '::varchar(1) from chars; ?column? ---------- f (1 row) xof=# select length(substring(c from 8 for 1)) from chars; length -------- 0 (1 row) xof=# select c || 'y' from chars; ?column? ---------- xy (1 row)

CHAR, when actually used, first trims off all trailing spaces, then applies the operation. It is trying to simulate a variable-length type, for historic reasons. This can be quite surprising, since a supposedly fixed-length type suddenly starts behaving as if it were variable. Unless you are terribly nostalgic for punched cards, CHAR is generally not what you want.

Is there ever a time to use CHAR? Not really. If you have a single-character enumeration that can never be either ” or ‘ ‘ (a single space), it might be more logical to store it as CHAR(1) rather than VARCHAR, but any space savings will be minimal and highly dependent on the alignment of the surrounding items.

And for n > 1, just use VARCHAR… or TEXT. (Remember that in PostgreSQL, VARCHAR and TEXT are stored the same way.)

カテゴリー: postgresql

Craig Kerstiens: 12 Factor: Dev/prod parity for your database

2018-09-05(水) 03:29:00

The twelve-factor app changed the way we build SaaS applications. Explicit dependency management, separating config from code, scaling out your app concurrently—these design principles took us from giant J2EE apps to apps that scale predictably on the web. One of these 12 factors has long stood out as a challenge when it comes to databases: dev/prod parity. Sure, you can run the exact same version of your database, and have a sandbox copy, but testing and staging with production data… that’s a different story.

Dev/Prod parity is easy until it’s not

Running the same version of your database in development as in production should be a no brainer. Just do it. If you use Postgres 10 in production, make sure to use the same version of Postgres in dev. For Postgres at least, you’re usually pretty safe on point releases so between 10.1 and say 10.3 you may not have to worry too much, but at least keep the same major version of the database the same between dev and prod.

Now that we have the easy part out of the way, the complexity starts to come with the data. The version of your database is one piece, but how your data interacts is equally as key. If your perfectly pristine data set for development doesn’t attempt to create hotspots in your data, violate constraints, or in general push the boundaries of what’s allowed then there is a good chance you run into issues when you deploy to production.

Remember that time that you added an index to speed things up… Everything tested fine on your local dev box. Even in staging against the 10 GB, the sample DB worked fine. You deployed, and the index ran as part of your deploy script, and 30 minutes in you were madly figuring out how to cancel things as so many people showed up at your desk asking why the system is down?

Let’s start with database migrations, and indexes

We’re going to come back to dev/prod parity in a minute and how you can safely test operations against a production dataset. But first, let’s tackle two practices that you should put in place immediately to sav

カテゴリー: postgresql

Joshua Drake: PostgresConf Silicon Valley: Schedule now available

2018-09-05(水) 02:23:00
PostgresConf Silicon Valley is being held October 15th-16th at the Hilton San Jose and the schedule is now available.

The two day event received over 80 submissions! A lot of the usual and legendary speakers are present but we were pleasantly surprised to find that so many new (to the conference) speakers submitted. It shows that the mission of People, Postgres, Data is growing at an amplified rate. The presentations are all of the "best-in-class quality" our attendees have come to expect from PostgresConf events.

Whether your needs are Big Data, Google Cloud, AWS RDS, GPDR Compliance, or you just have a burning desire to learn more about Kubernetes, PostgresConf Silicon Valley has you covered!
We also have two fantastic training opportunities which are the first of their kind: Join us on October 15th-16th and remember all roads lead to PostgresConf 2019 (more news on that soon)!

カテゴリー: postgresql

Paul Ramsey: Moving on to CrunchyData

2018-09-05(水) 00:16:11

Today is my first day with my new employer Crunchy Data. Haven’t heard of them? I’m not surprised: outside of the world of PostgreSQL, they are not particularly well known, yet.

I’m leaving behind a pretty awesome gig at CARTO, and some fabulous co-workers. Why do such a thing?

While CARTO is turning in constant growth and finding good traction with some core spatial intelligence use cases, the path to success is leading them into solving problems of increasing specificity. Logistics optimization, siting, market evaluation.

Moving to Crunchy Data means transitioning from being the database guy (boring!) in a geospatial intelligence company, to being the geospatial guy (super cool!) in a database company. Without changing anything about myself, I get to be the most interesting guy in the room! What could be better than that?

Crunchy Data has quietly assembled an exceptionally deep team of PostgreSQL community members: Tom Lane, Stephen Frost, Ed Conway, Peter Geoghegan, Dave Cramer, David Steele, and Jonathan Katz are all names that will be familiar to followers of the PostgreSQL mailing lists.

They’ve also quietly assembled expertise in key areas of interest to large enterprises: security deployment details (STIGs, RLS, Common Criteria); Kubernetes and PaaS deployments; and now (ta da!) geospatial.

Why does this matter? Because the database world is at a technological inflection point.

Core enterprise systems change very infrequently, and only under pressure from multiple sources. The last major inflection point was around the early 2000s, when the fleet of enterprise proprietary UNIX systems came under pressure from multiple sources:

  • The RISC architecture began to fall noticeably behind x86 and particular x86-64.
  • Pricing on RISC systems began to diverge sharply from x86 systems.
  • A compatible UNIX operating system (Linux) was available on the alternative architecture.
  • A credible support company (Red Hat) was available and speaking the language of the enterprise.

The timeline of the Linux tidal wave was (extrem

カテゴリー: postgresql

Marriya Malik: PostgreSQL installer by 2ndQuadrant – now supports OmniDB!

2018-09-04(火) 21:30:33

PostgreSQL installer or PGInstaller – is a user-friendly, graphical installation and configuration tool for PostgreSQL. With just a few clicks you can install PostgreSQL – version 9.5, 9.6, 10 and 11(beta) – on Windows, Linux and macOS.

The latest PGInstaller release includes support for OmniDB – an interactive and user-friendly database management tool to manage multiple databases in a unified workspace.

Using utilities bundled with PostgreSQL, the only means to connect to the database is via psql. Psql works via the command line, which can be fairly tricky for new users especially if they are migrating from another database and are not used to the interface. PGInstaller makes the connection process easier with OmniDB.

PGInstaller comes bundled with OmniDB, as an optional package that can be selected during the installation process, shown in the screenshot below (this screen will show in the third step of the PGInstaller installation process). To view the complete process, please go through this step-by-step installation guide.

Once the installation process is complete, launch OmniDB. This step varies on  different platforms and can be viewed in the “README” file that opens up right after the installation. I will be demonstrating the process on Windows.

  • Go to the location:
C:\Program Files\2ndQuadrant\PostgreSQL\OmniDB
  • Double click on “omnidb-app.exe” to start OmniDB

You should see the screen below

OmniDB allows users to manage multiple databases (PostgreSQL, Oracle, MySQL, MariaDB, etc) in a unified workspace with a user-friendly and fast-performing interface. In order to start managing your PostgreSQL database, you need to first establish a connection with it. Without any connection, your interface should look like the following:

To establish connection, go to “Connections” in the top navigation bar and click on “New connection”

Enter your Server ID on which your database is hosted, Port, Database and Username and click on “Save data”

Before you start playing around with your database, you’d need to

カテゴリー: postgresql

Kaarel Moppel: Next feature release for the pgwatch2 monitoring tool

2018-09-04(火) 16:01:10

With summer fading away it’s time to get busy again – last couple of weeks I’ve taken time to work on our Open Source PostgreSQL monitoring tool pgwatch2 and partly on request from a couple of bigger organizations using the tool on a larger scale, added another batch of useful management / scaling features and some more minor enhancements from the Github queue as well. By the way, this is already the 4th “Feature Pack” in one and a half years, so after having implemented the below features we’re considering the software now “ripe”, with no important features missing. Also we’re glad that quite some people have given their feedback recently, helping to improve the software even further and thus hopefully helping to provide more value to the PostgreSQL community. But read on for a more detailed overview on the most important features from this v1.4.0 update.

Getting friendly with Ansible & Co

Similar to the last update we have tried to make pgwatch2 easier to deploy on a larger scale. This time nothing new on the containerization front though, but we’ve added the possibility to make repeatable, configuration based deployments possible! Meaning – one can add a config file(s) with connect strings, metric selections / intervals and the metric definitions themselves to some version control / configuration management / application deployment system and deploy the metrics collector easily to each required DB node, pushing metrics directly to InfluxDB or Graphite. This works better also for firewalled environments.


The previously supported centrally managed metrics gatherer / configuration database apporach works as before, but for the case when the amount of servers gets too large (hundreds and above) to be handeled by one central gatherer without lag, one can now add a logical grouping label to the monitored hosts and then deploy separate gatherers for subset(s) of hosts based on that label. There are also other performance changes like batching of metric storage requests and connection pooling, helping to i

カテゴリー: postgresql

Pavel Trukhanov: PostgreSQL: why and how WAL bloats

2018-09-04(火) 00:12:57
Today’s post is about real life of PG’s Write-Ahead log. WAL. An almost short introduction

Any changes to a Postgresql database first of all are saved in Write-Ahead log, so they will never get lost. Only after that actual changes are made to the data in memory pages (in so called buffer cache) and these pages are marked dirty — meaning they need to be synced to disk later.

For that there’s a Checkpoint process, ran periodically, that dumps all the ‘dirty’ pages to disk. It also saves the position in WAL (called REDO point), up to which all changes are synchronized.

So in case of a Postgres DB crashes, it will restore its state by sequentially replaying the WAL records from REDO point. So all the WAL records before this point are useless for recovery, but still might be needed for replication purposes or for Point In Time Recovery.

From this description a Super-Engineer might’ve figured out all the ways it will go wrong in real life :-) But in reality usually one will do this in a reactive way: one need to stumble upon a problem first.

WAL bloats #1

Our monitoring agent for every instance of Postgres will find WAL files and collect their number and total size.

Here’s a case of some strange x6 growth of WAL size and segment count:

What could that be?

WAL is considered unneeded and to be removed after a Checkpoint is made. This is why we check it first. Postgres has a special system view called pg_stat_bgwriter that has info on checkpoints:

  • checkpoints_timed — is a counter of checkpoints triggered due that the time elapsed from the previous checkpoint is more than pg setting checkpoint_timeout. These are so called scheduled checkpoints.
  • checkpoints_req — is a counter of checkpoints ran due to uncheckpointed WAL size grew to more than max_wal_size setting — requested checkpoints.

So let’s see:

We see that after 21 Aug checkpoints ceased to run. Though we would love to know the exact reason it’s so, we can’t ¯\_(ツ)_/¯

But as one might remember, Postgres is known to be prone to unexpected behavior due to long la

カテゴリー: postgresql

Pavel Stehule: New features for pspg

2018-09-02(日) 17:20:00
I wrote some (I useful) features for pspg

  • possibility to show/hide row numbers
  • possibility to hide/show row cursor

Load from GitHub 1.6.0 release.
カテゴリー: postgresql

Hans-Juergen Schoenig: PostgreSQL: Parallel CREATE INDEX for better performance

2018-09-02(日) 17:00:00

PostgreSQL will shortly be released and it is therefore time to take a look at one of the most important new features provided by PostgreSQL 11: The ability to create indexes in parallel. For many years various commercial database vendors have already offered this feature and we are glad that PostgreSQL has become part of this elite club, which offers multi-core index creation, which will dramatically improve the usability of large database deployments in the future.

Creating large tables in PostgreSQL

Since version 11 PostgreSQL supports classical “stored procedures”. The beauty is that a procedure can run more than one transaction, which is ideal if you want to generate huge amounts of random data. When you call generate_series to generate 1 million rows, PostgreSQL has to keep this data in memory and therefore generating hundreds of millions of random rows using more than 1 transactions can be really useful to reduce the memory footprint. Here is how it works:

CREATE TABLE t_demo (data numeric); CREATE OR REPLACE PROCEDURE insert_data(buckets integer) LANGUAGE plpgsql AS $$ DECLARE i int; BEGIN i := 0; WHILE i < buckets LOOP INSERT INTO t_demo SELECT random() FROM generate_series(1, 1000000); i := i + 1; RAISE NOTICE 'inserted % buckets', i; COMMIT; END LOOP; RETURN; END; $$; CALL insert_data(500);

This tiny bit of code loads 500 million random numeric values, which should be enough to demonstrate, how CREATE INDEX can be improved in PostgreSQL 11. In our example 500 million rows translate to roughly 21 GB of data:

test=# \d+ List of relations Schema | Name | Type | Owner | Size | Description --------+--------+-------+-------+-------+------------- public | t_demo | table | hs | 21 GB | (1 row)

The reason why I went for numeric is that numeric causes the most overhead of all number data types. Creating a numeric index is a lot more costly than indexing, say, int4 or int8. The goal is to see, how much CP

カテゴリー: postgresql