Craig Kerstiens: Postgres data types you should consider using

planet postgresql - 2018-08-30(木) 03:07:00

Postgres is a rich and powerful database. And the existence of PostgreSQL extension APIs have enabled Postgres to expand its capabilities beyond the boundaries of what you would expect in a traditional relational database. Examples of popular Postgres extensions today include HyperLogLog, which gives you approximate distincts with a small footprint—to rich geospatial support via PostGIS—to Citus which helps you scale out your Postgres database across multiple nodes to improve performance for multi-tenant SaaS applications and real-time analytics dashboards—to the built-in full text search capabilities in PostgreSQL. With all the bells and whistles you can layer into Postgres, sometimes the most basic built-ins get overlooked.

PostgreSQL has nearly 100 different data types, and these data types can come with their own tuned indexing or their own specialized functions. You probably already use the basics such as integers and text, and today we’re going to take a survey of less-used but incredibly powerful PostgreSQL data types.

JSONB tops the list of Postgres data types

Postgres first received JSON support in Postgres 9.2. But the initial JSON support in 9.2 was about JSON validation, hence was less ideal for many use cases that needed JSON as well as fast query performance.

A couple of years later we got the the successor to the JSON datatype: JSONB. JSONB is a binary version of JSON stored on disk. JSONB compresses, so you lose whitespace, but it comes with some powerful index types to allow you to work much more flexibly with your JSON data.

JSONB is great for unstructured data, and with Postgres you can easily join JSONB data to your other relational models. We use JSONB ourselves heavily at Citus for things such as feature flags, event observation data, and recording logs. You can index JSONB data with GIN indexes which allow you to query on keys and speed up your lookups, since the GIN index makes the full document automatically available for you.

Range types are a calendar app’s best friend

Let’s face it,

カテゴリー: postgresql

Bruce Momjian: Foreign Data Wrappers and Passwords

planet postgresql - 2018-08-30(木) 00:45:01

Foreign data wrappers (FDW) allow data to be read and written to foreign data sources, like NoSQL stores or other Postgres servers. Unfortunately the authentication supported by FDWs is typically limited to passwords defined using CREATE USER MAPPING. For example, postgres_fdw only supports password-based authentication, e.g. SCRAM. Though only the database administrator can see the password, this can still be a security issue.

Ideally, at least some of the Postgres FDWs should support more sophisticated authentication methods, particularly SSL certificates. Another option would be to allow user authentication to be sent through FDWs, so the user has the same permissions on the FDW source and target. There is no technical reason FDW authentication is limited to passwords. This problem has been discussed, and it looks like someone has a plan for solving it, so hopefully it will be improved soon.

カテゴリー: postgresql

Ibrar Ahmed: Tune Linux Kernel Parameters For PostgreSQL Optimization

planet postgresql - 2018-08-29(水) 21:33:46

For optimum performance, a PostgreSQL database depends on the operating system parameters being defined correctly. Poorly configured OS kernel parameters can cause degradation in database server performance. Therefore, it is imperative that these parameters are configured according to the database server and its workload. In this post, we will discuss some important kernel parameters that can affect database server performance and how these should be tuned.


SHMMAX is a kernel parameter used to define the maximum size of a single shared memory segment a Linux process can allocate. Until version 9.2, PostgreSQL uses System V (SysV) that requires SHMMAX setting. After 9.2, PostgreSQL switched to POSIX shared memory. So now it requires fewer bytes of System V shared memory.

Prior to version 9.3 SHMMAX was the most important kernel parameter. The value of SHMMAX is in bytes.

Similarly SHMALL is another kernel parameter used to define system wide total amount of shared memory pages. To view the current values for SHMMAX, SHMALL or SHMMIN, use the ipcs command.

$ ipcs -lm ------ Shared Memory Limits -------- max number of segments = 4096 max seg size (kbytes) = 1073741824 max total shared memory (kbytes) = 17179869184 min seg size (bytes) = 1 $ ipcs -M IPC status from as of Thu Aug 16 22:20:35 PKT 2018 shminfo: shmmax: 16777216 (max shared memory segment size) shmmin: 1 (min shared memory segment size) shmmni: 32 (max number of shared memory identifiers) shmseg: 8 (max shared memory segments per process) shmall: 1024 (max amount of shared memory in pages)

PostgreSQL uses System V IPC to allocate the shared memory. This parameter is one of the most important kernel parameters. Whenever you get following error messages, it means that you have an older version PostgreSQL and you have a very low SHMMAX value. Users are expected to adjust and increase the value according to the shared memory they are going to use.

Possible misconfiguration e[...]
カテゴリー: postgresql

Brian Fehrle: A Guide to Partitioning Data In PostgreSQL

planet postgresql - 2018-08-29(水) 18:58:00
What is Data Partitioning?

For databases with extremely large tables, partitioning is a wonderful and crafty trick for database designers to improve database performance and make maintenance much easier. The maximum table size allowed in a PostgreSQL database is 32TB, however unless it’s running on a not-yet-invented computer from the future, performance issues may arise on a table with only a hundredth of that space.

Partitioning splits a table into multiple tables, and generally is done in a way that applications accessing the table don’t notice any difference, other than being faster to access the data that it needs. By splitting the table into multiple tables, the idea is to allow the execution of the queries to have to scan much smaller tables and indexes to find the data needed. Regardless of how efficient an index strategy is, scanning an index for a table that’s 50GB will always be much faster than an index that’s for a table at 500GB. This applies to table scans as well, because sometimes table scans are just unavoidable.

When introducing a partitioned table to the query planner, there are a few things to know and understand about the query planner itself. Before any query is actually executed, the query planner will take the query and plan out the most efficient way it will access the data. By having the data split up across different tables, the planner can decide what tables to access, and what tables to completely ignore, based on what each table contains.

This is done by adding constraints to the split up tables that define what data is allowed in each table, and with a good design, we can have the query planner scan a small subset of data rather than the whole thing.

Should A Table Be Partitioned?

Partitioning can drastically improve performance on a table when done right, but if done wrong or when not needed, it can make performance worse, even unusable.

How big is the table?

There is no real hardline rule for how big a table must be before partitioning is an option, but based on database access

カテゴリー: postgresql

Christophe Pettus: Don’t LOCK tables. Just don’t.

planet postgresql - 2018-08-29(水) 09:00:10

It’s not uncommon that an application needs to serialize access to one or more resources. The temptation is very high to use the LOCK TABLE SQL statement to do this.

Resist the temptation.

There are many issues with using LOCK:

  • It blocks autovacuum, which can cause bloat or even transaction ID wraparound in extreme cases.
  • An ACCESS EXCLUSIVE lock (the default mode) is passed down to secondaries, which can block queries there, or even cause deadlock-type situations.
  • It’s easy to cause deadlocks with bad LOCKing order.

If the goal is to serialize access, consider using advisory locks instead. They have all of the benefits of a LOCK on a table, while not actually blocking access to autovacuum, or access on secondaries.

(Yes, some database tools may need to take explicit locks for a variety of reasons; that’s a different matter, of course.)

カテゴリー: postgresql

PHP UK Conference 2019 CFP is Open

php.net - 2018-08-29(水) 04:02:06
カテゴリー: php

Hans-Juergen Schoenig: What hot_standby_feedback in PostgreSQL really does

planet postgresql - 2018-08-28(火) 19:23:24

Many of you out there using PostgreSQL streaming replication might wonder what this hot_standby_feedback parameter in postgresql.conf really does. Support customers keep asking this question so it might be useful to share this knowledge with a broader audience of PostgreSQL users out there.

What VACUUM does in PostgreSQL

VACUUM is an essential command in PostgreSQL its goal is to clean out dead rows, which are not needed by anyone anymore. The idea is to reuse space inside a table later as new data comes in. The important thing is: The purpose of VACUUM is to reuse space inside a table – this does not necessarily imply that a relation will shrink. Also: Keep in mind that VACUUM can only clean out dead rows, if they are not need anymore by some other transaction running on your PostgreSQL server.

Consider the following image:

How hot_standby_feedback and VACUUM work together in PostgreSQL

As you can see we have two connections here. The first connection on the left side is running a lengthy SELECT statement. Now keep in mind: An SQL statement will basically “freeze” its view of the data. Within an SQL statement the world does not “change” – the query will always see the same set of data regardless of changes made concurrently. That is really really important to understand.

Let us take a look at the second transaction. It will delete some data and commit. The question that naturally arises is: When can PostgreSQL really delete this row from disk? DELETE itself cannot really clean the row from disk because there might still be a ROLLBACK instead of a COMMIT. In other words a rows must not be deleted on DELETE. PostgreSQL can only mark it as dead for the current transaction. As you can see other transactions might still be able to see those deleted rows.
However, even COMMIT does not have the right to really clean out the row. Remember: The transaction on the left side can still see the dead row because the SELECT statement does not change its snapshot while it is running. COMMIT is therefore too early to clean out

カテゴリー: postgresql

Achilleas Mantzios: Multitenancy Options for PostgreSQL

planet postgresql - 2018-08-28(火) 18:58:00

Multi-tenancy in a software system is called the separation of data according to a set of criteria in order to satisfy a set of objectives. The magnitude/extend, the nature and the final implementation of this separation is dependent on those criteria and objectives. Multi-tenancy is basically a case of data partitioning but we’ll try to avoid this term for the obvious reasons (the term in PostgreSQL has a very specific meaning and is reserved, as declarative table partitioning was introduced in postgresql 10).

The criteria might be:

  1. according to the id of an important master table, which symbolizes the tenant id which might represent:
    1. a company/organization within a larger holding group
    2. a department within a company/organization
    3. a regional office/branch of the same company/organization
  2. according to a user’s location/IP
  3. according to a user’s position inside the company/organization

The objectives might be:

  1. separation of physical or virtual resources
  2. separation of system resources
  3. security
  4. accuracy and convenience of management/users at the various levels of the company/organization

Note by fulfilling an objective we also fulfill all the objectives beneath, i.e. by fulfilling A we also fulfill B, C and D, by fulfilling B we also fulfill C and D, and so forth.

If we want to fulfill objective A we may choose to deploy each tenant as a separate database cluster within its own physical/virtual server. This gives maximum separation of resources and security but gives poor results when we need to see the whole data as one, i.e. the consolidated view of the whole system.

If we want to only achieve objective B we might deploy each tenant as a separate postgresql instance in the same server. This would give us control over how much space would be assigned to each instance, and also some control (depending on the OS) on CPU/mem utilization. This case is not essentially different than A. In the modern cloud computing era, the gap between A and B tends to get smaller and smaller, so that A will be most probably the p

カテゴリー: postgresql

Federico Campoli: A mechanical elephant

planet postgresql - 2018-08-28(火) 09:00:00

In the previous post we modified the apt role for controlling the setup in a declarative way. Then we added a ssh role for configuring the three devuan servers. The role is used to configure the server’s postgres process owner for ssh passwordless connection.

In this tutorial we’ll complete the setup for the postgres user and then we’ll configure the database clusters with a new role.

カテゴリー: postgresql

Bruce Momjian: Certificate Revocation Lists

planet postgresql - 2018-08-28(火) 08:30:01

If you are setting up Postgres server or client TLS/SSL certificates, be sure to also configure support for a certificate revocation list (CRL). This list, distributed by the certificate authority, lists certificates that should no longer be trusted.

While the CRL will initially likely be empty, a time will come when a private key used by a certificate or device is exposed in an unauthorized manner, or an employee who had access to private keys leaves your organization. When that happens, you will need the ability to invalidate certificates — having that ability pre-configured will help, especially during a crisis.

カテゴリー: postgresql

Daniel Vérité: Beware of your next glibc upgrade

planet postgresql - 2018-08-28(火) 01:15:12

GNU libc 2.28, released on August 1, 2018, has among its new features a major update of its Unicode locale data with new collation information.

From the announcement:

The localization data for ISO 14651 is updated to match the 2016 Edition 4 release of the standard, this matches data provided by Unicode 9.0.0. This update introduces significant improvements to the collation of Unicode characters. […] With the update many locales have been updated to take advantage of the new collation information. The new collation information has increased the size of the compiled locale archive or binary locales.

For Postgres databases using language and region-sensitive collations, which tend to be the default nowadays, it means that certain strings might sort differently after this upgrade. A critical consequence is that indexes that depend on such collations must be rebuilt immediately after the upgrade. Servers in WAL-based/streaming replication setups should also be upgraded together since a standby must run the same libc/locales as its primary.

The risk otherwise is index corruption issues, as mentioned for instance in these two threads from pgsql-general: “Issues with german locale on CentOS 5,6,7”, and “The dangers of streaming across versions of glibc: A cautionary tale”

So while this issue is not new, what’s special about glibc-2.28 is the scale of the update in locales, which is unprecedented in recent times. Previously and since year 2000, according to bug#14095, the locale data in glibc were modified on a case-by-case basis. This time, there’s a big merge to close the gap with the standard.

To get a feel for the extent of these changes, I’ve installed ArchLinux which already has glibc-2.28, along with PostgreSQL 10.5, and compared some query results against the same Postgres on Debian 9 (“stretch”) with glibc-2.24.

I expected changes, but not so broad. Simple tests on plain ASCII strings reveal obvious differences immediately. For instance, with the en_US.UTF-8 locale:

Debian stretch (glibc 2.24)

=# select ver[...]
カテゴリー: postgresql

Pavel Trukhanov: Real world SSD wearout

planet postgresql - 2018-08-28(火) 00:16:45

A year ago we’ve added SMART metrics collection to our monitoring agent that collects disk drive attributes on clients servers.

So here a couple of interesting cases from the real world.

Because we needed it to work without installing any additional software, like smartmontools, we implemented collection not of all the attributes, but only basic and not vendor-specific ones — to be able to provide consistent experience. And also that way we skipped burdensome task of maintaining a knowledge base of specific stuff — and I like that a lot :)

This time we’ll discuss only SMART attribute named “media wearout indicator”. Normalized, it shows a percentage of “write resource” left in the device. Under the hood the device keeps track of the number of cycles the NAND media has undergone, and the percentage is calculated against the maximum number of cycles for that device. The normalized value declines linearly from 100 to 1 as the average erase cycle count increases from 0.

Are there any actually dead SSDs?

Though SSDs are pretty common nowadays, just couple of years earlier you could hear a lot of fear talk about SSD wearout. So we wanted to see if some of it were true. So we searched for the maximum wearout across all the devices of all of our clients.

It was just 1%

Reading the docs says it just won’t go below 1%. So it is weared out.

We notified this client. Turns out it was a dedicated server in Hetzner. Their support replaced the device:

Do SSDs die fast?

As we introduced SMART monitoring for some of the clients already some time ago, we have accumulated history. And now we can see it on a timeline.

A server with highest wearout rate we have across our clients servers unfortunately was added to okmeter.io monitoring only two month ago:

This chart indicates that during these two month only, it burned through 8% of “write resource”.

So 100% of this SSD lifetime under that load will end in 100/(8/2) = 2 years.

Is that a lot or too little? I don’t know. But let’s check what kind of load it’s serving?

As you can see

カテゴリー: postgresql

Vladimir Svedov: Understanding System Columns in PostgreSQL

planet postgresql - 2018-08-27(月) 18:58:00

So you sit with your hands over a keyboard and think “what fun I can have to make my lifetime even curiouser?..” Well - create a table of course!

vao=# create table nocol(); CREATE TABLE vao=# select * from nocol; -- (0 rows)

What fun is it about a table with no data?.. Absolutely none! But I can easily fix it:

vao=# insert into nocol default values; INSERT 0 1

It looks weird and quite stupid to have a table with no columns and one row. Not to mention it is not clear what “default values” there were inserted… Well - reading few lines from docs reveals that “All columns will be filled with their default values.” Yet I have no columns! Well - I surely have some:

vao=# select attname, attnum, atttypid::regtype, attisdropped::text from pg_attribute where attrelid = 'nocol'::regclass; attname | attnum | atttypid | attisdropped ----------+--------+----------+-------------- tableoid | -7 | oid | false cmax | -6 | cid | false xmax | -5 | xid | false cmin | -4 | cid | false xmin | -3 | xid | false ctid | -1 | tid | false (6 rows)

So these six are definitely not the ALTER TABLE DROP COLUMN zombies because attisdropped is false. Also I see that the type name of those columns end up with “id”. Reading the bottom section of Object Identifier Types will give the idea. Another funny observation is - the -2 is missing! I wonder where I could have lost it - I just created a table after all! Hm, what object identifier is missing in my table? By definition I mean. I have tuple, command and xact ids. Well unless some “global over whole db identifier”, like oid?.. Checking is easy - I will create table with OIDS:

vao=# create table nocol_withoid() with oids; CREATE TABLE vao=# select attname, attnum, atttypid::regtype, attisdropped::text from pg_attribute where attrelid = 'nocol_withoid'::regclass; attname | attnum | atttypid | attisdropped ----------+--------+----------+-------------- tableoid | -7 | oid | false cmax | -6 | c[...]
カテゴリー: postgresql

Pavel Stehule: New release of pspg pager

planet postgresql - 2018-08-27(月) 17:00:00
I redesigned some work with mouse - ncurses native implementation is simple, but slow by design.

A default layout of pspg is based on old Norton Commander layout. It is good for beginners, because almost all controls are visible. Probably, when you work with pspg longer time, then you prefer more visible content against auxiliary lines. The lines (bars) can be disabled now - you can run pspg with option --no-bars. The pspg is available from github https://github.com/okbob/pspg
カテゴリー: postgresql

Jobin Augustine: PostgreSQL Accessing MySQL as a Data Source Using mysqsl_fdw

planet postgresql - 2018-08-25(土) 00:38:55

There are many organizations where front/web-facing applications use MySQL and back end processing uses PostgreSQL®. Any system integration between these applications generally involves the replication—or duplication—of data from system to system. We recently blogged about pg_chameleon which can be used replicate data from MySQL® to PostgreSQL. mysql_fdw can play a key role in eliminating the problem of replicating/duplicating data. In order to eliminate maintaining the same data physically in both postgres and MySQL, we can use mysql_fdw. This allows PostgreSQL to access MySQL tables and to use them as if they are local tables in PostgreSQL. mysql_fdw can be used, too, with Percona Server for MySQL, our drop-in replacement for MySQL.

This post is to showcase how easy it is to set that up and get them working together. We will address a few points that we skipped while discussing about FDWs in general in our previous post

Preparing MySQL for fdw connectivity

On the MySQL server side, we need to set up a user to allow for access to MySQL from the PostgreSQL server side. We recommend Percona Server for MySQL if you are setting it up for the first time.

mysql> create user 'fdw_user'@'%' identified by 'Secret!123';

This user needs to have privileges on the tables which are to be presented as foreign tables in PostgreSQL.

mysql> grant select,insert,update,delete on EMP to fdw_user@'%'; Query OK, 0 rows affected (0.00 sec) mysql> grant select,insert,update,delete on DEPT to fdw_user@'%'; Query OK, 0 rows affected (0.00 sec) Installing mysql_fdw on PostgreSQL server

Under the hood, MySQL FDW (mysql_fdw) facilitates the use of PostgreSQL server as a client for MySQL Server, which means it can then fetch data from the MySQL database as a client. Obviously, mysql_fdw uses MySQL client libraries. Nowadays, many Linux distributions are packaged with MariaDB® libraries. This works well enough for mysql_fdw to function. If we install mysql_fdw from the PGDG repo, then mariadb-devel.x86_64 packages will be installed alongsid

カテゴリー: postgresql

Devrim GÜNDÜZ: My picks for Postgres Open SV 2018 !

planet postgresql - 2018-08-24(金) 21:07:00
Postgres Open SV 2018 is less than two weeks away now, and I'll speak there as well.

Many people, including me, are flying overseas to attend this great conference, so please hurry up and register before the tickets run out!

Life is hard when there are a lot of talks in parallel, but still I picked up some interesting talks that I want to attend: Continue reading "My picks for Postgres Open SV 2018 !"
カテゴリー: postgresql

Dimitri Fontaine: Geolocation with PostgreSQL

planet postgresql - 2018-08-24(金) 19:11:33

We have loaded Open Street Map points of interests in the article The Most Popular Pub Names — which compares PostgreSQL with MongoDB for simple geographical queries, and is part of our PostgreSQL Extensions article series. In today’s article, look at how to geolocalize an IP address and locate the nearest pub, all within a single SQL query!

For that, we are going to use the awesome ip4r extension from RhodiumToad.

カテゴリー: postgresql

Hans-Juergen Schoenig: PostgreSQL: Improving sort performance

planet postgresql - 2018-08-23(木) 22:00:52

Sorting is a very important aspect of PostgreSQL performance tuning. However, tuning sorts is often misunderstood or simply overlooked by many people. So I decided to come up with a PostgreSQL blog showing, how sorts can be tuned in PostgreSQL.

Creating sample data

To show how sorting works, I created a couple of million rows first:

test=# CREATE TABLE t_test (x numeric); CREATE TABLE test=# INSERT INTO t_test SELECT random() FROM generate_series(1, 5000000); INSERT 0 5000000 test=# ANALYZE ; ANALYZE

What the code does is to create a table and load 5 million random values. As you will notice, data can be loaded within seconds.

Sorting data in PostgreSQL

Let us try to sort the data. For the sake of simplicity I am using the most simplistic statements possible. What you can see is that PostgreSQL has to sort on disk because the data we want to sort does not fit into memory. In this case a bit more than 100 MB of data is moved to disk:

test=# explain analyze SELECT * FROM t_test ORDER BY x; QUERY PLAN -------------------------------------------------------------------------- Sort (cost=804270.42..816770.42 rows=5000000 width=11) (actual time=4503.484..6475.222 rows=5000000 loops=1) Sort Key: x Sort Method: external merge Disk: 102896kB -> Seq Scan on t_test (cost=0.00..77028.00 rows=5000000 width=11) (actual time=0.035..282.427 rows=5000000 loops=1) Planning time: 0.139 ms Execution time: 6606.637 ms (6 rows)

Why does PostgreSQL not simply sort stuff in memory? The reason is the work_mem parameter, which is by default set to 4 MB:

test=# SHOW work_mem; work_mem ---------- 4MB (1 row)

work_mem tells the server that up to 4 MB can be used per operation (per sort, grouping operation, etc.). If you sort too much data, PostgreSQL has to move the excessive amount of data to disk, which is of course slow.

Fortunately changing work_mem is simple and can even be done at the session level.

Speeding up sorts in PostgreSQL – using more work_mem


カテゴリー: postgresql

第 152 回理事会議事録 (2018-08)

www.postgresql.jp news - 2018-08-23(木) 19:14:50
第 152 回理事会議事録 (2018-08) anzai 2018/08/23 (木) - 19:14
カテゴリー: postgresql

Joshua Otwell: PostgreSQL Streaming Replication vs Logical Replication

planet postgresql - 2018-08-23(木) 18:58:00

I consider myself a bit of an explorer. In certain things that is. I typically will always order the same food from a familiar restaurant for fear of disappointment outweighs my apprehension to try something new.

And of course, a hungry human tends to just eat right?

Yet, when it comes to technology, in particular SQL (PostgreSQL), I tend to stumble full speed (my definition of exploring) into oftentimes, unfamiliar areas, with the hope to learn. What better way to learn than experience?

So what on earth does this rambling have to do with Logical and Streaming replication in PostgreSQL?

I am a complete novice in these areas with zero knowledge. Yes, I have about as much understanding in this area of Postgres as I do in astrophysics.

Did I mention that I had zero knowledge?

Therefore, in this blog post, I will try to digest the differences in these types of replication. Without real-world hands on experience, I can't promise you the 'Be all end all' manuscript for replication.

Likely, others less-versed in this particular area (such as myself) would benefit from this blog post.

Experienced users and developers along for the ride, I hope to see you in the comments below.

A Couple of Base Definitions

In the broad sense of the term, what does Replication mean?

Replication as defined on Wiktionary has this to say:

"Process by which an object, person, place or idea may be copied mimicked or reproduced."

Yet, the 5th listed item there is more applicable to this blog post and I feel we should look at it as well:

"(computing) The process of frequent electronic data copying a one database in one computer or server to a database in another so that all users share the same level of information. Used to improve fault tolerance of the system."

Now there is something we can get into. The mention of copying data from one server or database to another? We are in familiar territory now...

So, adding in what we know from that definition, what are the definitions of Streaming Replication and Logical Replication?

Let's see what the

カテゴリー: postgresql