planet postgresql

Subscribe to the Planet PostgreSQL feed
Planet PostgreSQL
Updated: 2 hours 23 minutes ago

Achilleas Mantzios: Multitenancy Options for PostgreSQL

2018-08-28 (Tue) 18:58:00

Multi-tenancy in a software system is the separation of data according to a set of criteria in order to satisfy a set of objectives. The magnitude/extent, the nature and the final implementation of this separation depend on those criteria and objectives. Multi-tenancy is basically a case of data partitioning, but we’ll try to avoid that term for the obvious reason: in PostgreSQL the term is reserved and has a very specific meaning, since declarative table partitioning was introduced in PostgreSQL 10.

The criteria might be:

  1. according to the id of an important master table, which symbolizes the tenant id which might represent:
    1. a company/organization within a larger holding group
    2. a department within a company/organization
    3. a regional office/branch of the same company/organization
  2. according to a user’s location/IP
  3. according to a user’s position inside the company/organization

The objectives might be:

  A. separation of physical or virtual resources
  B. separation of system resources
  C. security
  D. accuracy and convenience of management/users at the various levels of the company/organization

Note that by fulfilling an objective we also fulfill all the objectives beneath it, i.e. by fulfilling A we also fulfill B, C and D; by fulfilling B we also fulfill C and D; and so forth.

If we want to fulfill objective A we may choose to deploy each tenant as a separate database cluster within its own physical/virtual server. This gives maximum separation of resources and security but gives poor results when we need to see the whole data as one, i.e. the consolidated view of the whole system.

If we want to achieve only objective B, we might deploy each tenant as a separate PostgreSQL instance on the same server. This would give us control over how much space is assigned to each instance, and also some control (depending on the OS) over CPU/memory utilization. This case is not essentially different from A. In the modern cloud computing era, the gap between A and B tends to get smaller and smaller, so that A will be most probably the p

Category: postgresql

Federico Campoli: A mechanical elephant

2018-08-28 (Tue) 09:00:00

In the previous post we modified the apt role to control the setup in a declarative way. Then we added an ssh role for configuring the three Devuan servers. The role is used to configure the servers’ postgres process owner for passwordless SSH connections.

In this tutorial we’ll complete the setup for the postgres user and then we’ll configure the database clusters with a new role.


Bruce Momjian: Certificate Revocation Lists

2018-08-28 (Tue) 08:30:01

If you are setting up Postgres server or client TLS/SSL certificates, be sure to also configure support for a certificate revocation list (CRL). This list, distributed by the certificate authority, lists certificates that should no longer be trusted.

While the CRL will initially likely be empty, a time will come when a private key used by a certificate or device is exposed in an unauthorized manner, or an employee who had access to private keys leaves your organization. When that happens, you will need the ability to invalidate certificates — having that ability pre-configured will help, especially during a crisis.
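A minimal sketch of that server-side pre-configuration, using the standard postgresql.conf SSL parameters (the file names are hypothetical, and the CRL file is assumed to live in the data directory):

```
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file  = 'server.key'
ssl_ca_file   = 'root.crt'
ssl_crl_file  = 'root.crl'   # the revocation list distributed by your CA
```

On the client side, libpq connections can similarly point the sslcrl parameter at a local copy of the CRL.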


Daniel Vérité: Beware of your next glibc upgrade

2018-08-28 (Tue) 01:15:12

GNU libc 2.28, released on August 1, 2018, has among its new features a major update of its Unicode locale data with new collation information.

From the announcement:

The localization data for ISO 14651 is updated to match the 2016 Edition 4 release of the standard, this matches data provided by Unicode 9.0.0. This update introduces significant improvements to the collation of Unicode characters. […] With the update many locales have been updated to take advantage of the new collation information. The new collation information has increased the size of the compiled locale archive or binary locales.

For Postgres databases using language and region-sensitive collations, which tend to be the default nowadays, it means that certain strings might sort differently after this upgrade. A critical consequence is that indexes that depend on such collations must be rebuilt immediately after the upgrade. Servers in WAL-based/streaming replication setups should also be upgraded together since a standby must run the same libc/locales as its primary.
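A sketch of the rebuild step (the database and index names are hypothetical; note that REINDEX takes locks on the indexes it rebuilds, so plan a maintenance window):

```sql
-- rebuild every index in the database; run while connected to mydb
REINDEX DATABASE mydb;

-- or, more selectively, rebuild a single collation-dependent index
REINDEX INDEX my_text_column_idx;
```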

The risk otherwise is index corruption, as mentioned for instance in these two threads from pgsql-general: “Issues with german locale on CentOS 5,6,7” and “The dangers of streaming across versions of glibc: A cautionary tale”.

So while this issue is not new, what’s special about glibc 2.28 is the scale of the locale update, which is unprecedented in recent times. Previously, and since the year 2000 according to bug #14095, the locale data in glibc had been modified on a case-by-case basis. This time there is a big merge to close the gap with the standard.

To get a feel for the extent of these changes, I’ve installed ArchLinux which already has glibc-2.28, along with PostgreSQL 10.5, and compared some query results against the same Postgres on Debian 9 (“stretch”) with glibc-2.24.

I expected changes, but not ones so broad. Simple tests on plain ASCII strings immediately reveal obvious differences. For instance, with the en_US.UTF-8 locale:

Debian stretch (glibc 2.24)

=# select ver[...]
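A simple probe of this kind can be run on both machines and diffed (the sample strings below are my own, not the article’s, and the resulting order depends on the installed glibc/locale data, so no expected output is shown):

```sql
SELECT s
FROM (VALUES ('a-b'), ('a+b'), ('ab'), ('"a"'), ('A-B')) AS t(s)
ORDER BY s COLLATE "en_US.UTF-8";
```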

Pavel Trukhanov: Real world SSD wearout

2018-08-28 (Tue) 00:16:45

A year ago we added SMART metrics collection to our monitoring agent, which collects disk drive attributes on our clients’ servers.

So here are a couple of interesting cases from the real world.

Because we needed it to work without installing any additional software such as smartmontools, we implemented collection of only the basic, non-vendor-specific attributes rather than all of them, so that we could provide a consistent experience. That way we also skipped the burdensome task of maintaining a knowledge base of vendor-specific details, and I like that a lot :)

This time we’ll discuss only SMART attribute named “media wearout indicator”. Normalized, it shows a percentage of “write resource” left in the device. Under the hood the device keeps track of the number of cycles the NAND media has undergone, and the percentage is calculated against the maximum number of cycles for that device. The normalized value declines linearly from 100 to 1 as the average erase cycle count increases from 0.

Are there any actually dead SSDs?

Though SSDs are pretty common nowadays, just a couple of years ago you could hear a lot of scare talk about SSD wearout. So we wanted to see whether any of it was true, and searched for the maximum wearout across all the devices of all of our clients.

It was just 1%

Reading the docs reveals that it simply won’t go below 1%. So it is worn out.

We notified this client. It turned out to be a dedicated server at Hetzner. Their support replaced the device:

Do SSDs die fast?

As we introduced SMART monitoring for some of our clients quite some time ago, we have accumulated history, and now we can see it on a timeline.

Unfortunately, the server with the highest wearout rate across our clients’ servers was added to monitoring only two months ago:

This chart indicates that during these two months alone, it burned through 8% of its “write resource”.

So at this rate, 100% of this SSD’s lifetime under that load will be spent in 100/(8/2) = 25 months, roughly 2 years.

Is that a lot or a little? I don’t know. But let’s check what kind of load it’s serving.

As you can see


Vladimir Svedov: Understanding System Columns in PostgreSQL

2018-08-27 (Mon) 18:58:00

So you sit with your hands over the keyboard and think “what fun can I have to make my lifetime even curiouser?..” Well - create a table of course!

vao=# create table nocol();
CREATE TABLE
vao=# select * from nocol;
--
(0 rows)

What fun is a table with no data?.. Absolutely none! But I can easily fix that:

vao=# insert into nocol default values;
INSERT 0 1

It looks weird and quite stupid to have a table with no columns and one row. Not to mention that it is not clear what “default values” were inserted… Well - reading a few lines of the docs reveals that “All columns will be filled with their default values.” Yet I have no columns! Well - I surely have some:

vao=# select attname, attnum, atttypid::regtype, attisdropped::text
      from pg_attribute where attrelid = 'nocol'::regclass;
 attname  | attnum | atttypid | attisdropped
----------+--------+----------+--------------
 tableoid |     -7 | oid      | false
 cmax     |     -6 | cid      | false
 xmax     |     -5 | xid      | false
 cmin     |     -4 | cid      | false
 xmin     |     -3 | xid      | false
 ctid     |     -1 | tid      | false
(6 rows)

So these six are definitely not ALTER TABLE DROP COLUMN zombies, because attisdropped is false. I also see that the type names of those columns end in “id”. Reading the bottom section of Object Identifier Types will give you the idea. Another funny observation: -2 is missing! I wonder where I could have lost it - I just created the table, after all! Hm, what object identifier is missing from my table? By definition, I mean. I have tuple, command and xact ids. Well, unless it’s some “global over the whole db” identifier, like oid?.. Checking is easy - I will create a table WITH OIDS:

vao=# create table nocol_withoid() with oids;
CREATE TABLE
vao=# select attname, attnum, atttypid::regtype, attisdropped::text
      from pg_attribute where attrelid = 'nocol_withoid'::regclass;
 attname  | attnum | atttypid | attisdropped
----------+--------+----------+--------------
 tableoid |     -7 | oid      | false
 cmax     |     -6 | c[...]

Pavel Stehule: New release of pspg pager

2018-08-27 (Mon) 17:00:00
I redesigned some of the mouse handling - the native ncurses implementation is simple, but slow by design.

The default layout of pspg is based on the old Norton Commander layout. It is good for beginners, because almost all controls are visible. But when you have worked with pspg for a longer time, you will probably prefer more visible content over the auxiliary lines. These lines (bars) can now be disabled - you can run pspg with the option --no-bars. pspg is available from GitHub.

Jobin Augustine: PostgreSQL Accessing MySQL as a Data Source Using mysql_fdw

2018-08-25 (Sat) 00:38:55

There are many organizations where front/web-facing applications use MySQL and back-end processing uses PostgreSQL®. Any system integration between these applications generally involves the replication, or duplication, of data from system to system. We recently blogged about pg_chameleon, which can be used to replicate data from MySQL® to PostgreSQL. mysql_fdw can play a key role in eliminating the problem of replicating/duplicating data. In order to avoid maintaining the same data physically in both Postgres and MySQL, we can use mysql_fdw. This allows PostgreSQL to access MySQL tables and to use them as if they were local tables in PostgreSQL. mysql_fdw can also be used with Percona Server for MySQL, our drop-in replacement for MySQL.

This post showcases how easy it is to set that up and get the two working together. We will address a few points that we skipped while discussing FDWs in general in our previous post.

Preparing MySQL for fdw connectivity

On the MySQL server side, we need to set up a user to allow for access to MySQL from the PostgreSQL server side. We recommend Percona Server for MySQL if you are setting it up for the first time.

mysql> create user 'fdw_user'@'%' identified by 'Secret!123';

This user needs to have privileges on the tables which are to be presented as foreign tables in PostgreSQL.

mysql> grant select,insert,update,delete on EMP to fdw_user@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> grant select,insert,update,delete on DEPT to fdw_user@'%';
Query OK, 0 rows affected (0.00 sec)

Installing mysql_fdw on PostgreSQL server

Under the hood, MySQL FDW (mysql_fdw) lets the PostgreSQL server act as a client for MySQL Server, which means it can fetch data from the MySQL database as a client. Obviously, mysql_fdw uses the MySQL client libraries. Nowadays, many Linux distributions package the MariaDB® libraries instead, which works well enough for mysql_fdw to function. If we install mysql_fdw from the PGDG repo, then mariadb-devel.x86_64 packages will be installed alongsid
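On the PostgreSQL side, the setup sketched so far might continue roughly like this (the host, database name and column list are hypothetical; the statements follow the standard mysql_fdw syntax):

```sql
CREATE EXTENSION mysql_fdw;

CREATE SERVER mysql_server
  FOREIGN DATA WRAPPER mysql_fdw
  OPTIONS (host '127.0.0.1', port '3306');

CREATE USER MAPPING FOR postgres
  SERVER mysql_server
  OPTIONS (username 'fdw_user', password 'Secret!123');

-- column list abbreviated; it must match the MySQL table definition
CREATE FOREIGN TABLE emp (
  empno int,
  ename text
) SERVER mysql_server
  OPTIONS (dbname 'testdb', table_name 'EMP');
```

After that, `SELECT * FROM emp;` in PostgreSQL transparently fetches rows from the MySQL table.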


Devrim GÜNDÜZ: My picks for Postgres Open SV 2018 !

2018-08-24 (Fri) 21:07:00
Postgres Open SV 2018 is less than two weeks away now, and I’ll be speaking there as well.

Many people, including me, are flying overseas to attend this great conference, so please hurry up and register before the tickets run out!

Life is hard when there are a lot of talks in parallel, but I still picked some interesting talks that I want to attend: Continue reading "My picks for Postgres Open SV 2018 !"

Dimitri Fontaine: Geolocation with PostgreSQL

2018-08-24 (Fri) 19:11:33

We loaded Open Street Map points of interest in the article The Most Popular Pub Names — which compares PostgreSQL with MongoDB for simple geographical queries, and is part of our PostgreSQL Extensions article series. In today’s article, we look at how to geolocate an IP address and locate the nearest pub, all within a single SQL query!

For that, we are going to use the awesome ip4r extension from RhodiumToad.
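To give a flavor of what ip4r enables (the lookup table below is a hypothetical stand-in for the geolocation data the article actually loads; `>>=` is ip4r’s “range contains” operator):

```sql
CREATE EXTENSION ip4r;

-- hypothetical mapping of IP ranges to coordinates
CREATE TABLE geoip (network ip4r, location point);

SELECT location
  FROM geoip
 WHERE network >>= '92.243.26.4';
```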


Hans-Juergen Schoenig: PostgreSQL: Improving sort performance

2018-08-23 (Thu) 22:00:52

Sorting is a very important aspect of PostgreSQL performance tuning. However, tuning sorts is often misunderstood or simply overlooked. So I decided to write a PostgreSQL blog post showing how sorts can be tuned.

Creating sample data

To show how sorting works, I created a couple of million rows first:

test=# CREATE TABLE t_test (x numeric);
CREATE TABLE
test=# INSERT INTO t_test SELECT random() FROM generate_series(1, 5000000);
INSERT 0 5000000
test=# ANALYZE;
ANALYZE

The code creates a table and loads 5 million random values. As you will notice, the data can be loaded within seconds.

Sorting data in PostgreSQL

Let us try to sort the data. For the sake of simplicity I am using the simplest statements possible. What you can see is that PostgreSQL has to sort on disk, because the data we want to sort does not fit into memory. In this case a bit more than 100 MB of data is moved to disk:

test=# explain analyze SELECT * FROM t_test ORDER BY x;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=804270.42..816770.42 rows=5000000 width=11) (actual time=4503.484..6475.222 rows=5000000 loops=1)
   Sort Key: x
   Sort Method: external merge  Disk: 102896kB
   ->  Seq Scan on t_test  (cost=0.00..77028.00 rows=5000000 width=11) (actual time=0.035..282.427 rows=5000000 loops=1)
 Planning time: 0.139 ms
 Execution time: 6606.637 ms
(6 rows)

Why does PostgreSQL not simply sort stuff in memory? The reason is the work_mem parameter, which is by default set to 4 MB:

test=# SHOW work_mem;
 work_mem
----------
 4MB
(1 row)

work_mem tells the server that up to 4 MB can be used per operation (per sort, grouping operation, etc.). If you sort too much data, PostgreSQL has to move the excessive amount of data to disk, which is of course slow.

Fortunately changing work_mem is simple and can even be done at the session level.
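A session-level change might look like this (1 GB is an arbitrary value for illustration; pick a value that matches your workload and available RAM, remembering that each concurrent sort may use up to that much memory):

```sql
SET work_mem = '1GB';  -- affects only the current session
SHOW work_mem;         -- now reports 1GB
```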

Speeding up sorts in PostgreSQL – using more work_mem



Joshua Otwell: PostgreSQL Streaming Replication vs Logical Replication

2018-08-23 (Thu) 18:58:00

I consider myself a bit of an explorer. In certain things, that is. I typically order the same food from a familiar restaurant, because my fear of disappointment outweighs my urge to try something new.

And of course, a hungry human tends to just eat, right?

Yet when it comes to technology, in particular SQL (PostgreSQL), I tend to stumble full speed (my definition of exploring) into oftentimes unfamiliar areas, in the hope of learning. What better way to learn than experience?

So what on earth does this rambling have to do with Logical and Streaming replication in PostgreSQL?

I am a complete novice in these areas with zero knowledge. Yes, I have about as much understanding in this area of Postgres as I do in astrophysics.

Did I mention that I had zero knowledge?

Therefore, in this blog post I will try to digest the differences between these types of replication. Without real-world hands-on experience, I can’t promise you the ‘be all, end all’ manuscript on replication.

Likely, others less-versed in this particular area (such as myself) would benefit from this blog post.

Experienced users and developers along for the ride, I hope to see you in the comments below.

A Couple of Base Definitions

In the broad sense of the term, what does Replication mean?

Replication as defined on Wiktionary has this to say:

"Process by which an object, person, place or idea may be copied mimicked or reproduced."

Yet, the 5th listed item there is more applicable to this blog post and I feel we should look at it as well:

“(computing) The process of frequent electronic copying of data from a database in one computer or server to a database in another, so that all users share the same level of information. Used to improve fault tolerance of the system.”

Now there is something we can get into. The mention of copying data from one server or database to another? We are in familiar territory now...

So, adding in what we know from that definition, what are the definitions of Streaming Replication and Logical Replication?

Let's see what the


Federico Campoli: Happy birthday pg_chameleon

2018-08-23 (Thu) 09:00:00
Today marks two years since I started working on pg_chameleon. Back in 2016 this commit changed the project’s license from GPL v2 to the 2-clause BSD, along with the project’s scope, which became a MySQL to PostgreSQL replica system. Since then I have learned a lot of lessons, made several mistakes and worked out solutions which resulted in a decent tool for bridging two different universes. Writing a replica system is a very complex task.

Magnus Hagander: Updates about upcoming conferences

2018-08-22 (Wed) 21:39:29

Summer vacation time is over. Well, for some of us at least; clearly some are still lucky enough to be off, which is showing itself a bit (see below). But as both conference organisation and participation throughout the rest of the year are starting to become clear, I figure it’s time to share some updates about several of them.

Postgres Open SV

First of all - if you haven't already, don't forget to register for Postgres Open SV in San Francisco in two weeks time! Registration for the main US West Coast/California PostgreSQL community conference will close soon, so don't miss your chance. I'm looking forward to meeting many old and new community members there.

PostgreSQL Conference Europe

Next up after Postgres Open will be PostgreSQL Conference Europe, the main European PostgreSQL community conference of 2018. The planning for this year’s conference is at full speed, but unfortunately we are slightly behind. In particular, we were supposed to notify all speakers today whether they were accepted, and unfortunately our program committee is a bit behind schedule on this one. We had over 200 submissions this year, which makes their work even bigger than usual. But speakers will receive their notifications over the coming couple of days.

Hopefully once all speakers have been able to confirm their attendance, we will also have a schedule out soon. Until then, you can always look at the list of accepted talks so far. This list is dynamically updated as speakers get approved and confirm their talks.

We have already sold approximately half of the tickets that we have available this year, so if you want to be sure to get your spot, we strongly recommend that you register as soon as you can! And if you want to attend the training sessions, you should hurry even more as some are almost sold out!


PGConf.ASIA

Work on the program committee of PGConf.ASIA has also been going on over the late summer, and is mostly done! The schedule is not quite ready yet, but is expected out shortly. You can look forward to a very interesting lineup of spe
