Kaarel Moppel: Ideas for scaling PostgreSQL to multi-terabyte and beyond

planet postgresql - 2018-10-16(火) 17:00:27

After a recent chat with a DBA, making his first steps towards migrating a big database away from Oracle to Postgres, I thought there must be just too few articles on PostgreSQL features and “hacks”, aiding in achieving scalability to squeeze the last out of the hardware to safely accomodate some decent multi-terabyte size databases. Especially as there are quite a few options out there, I was very surprised that there was so much fear that Postgres is somehow very limited in the scaling matters. Well maybe it was indeed once so (I started with Postgres in 2011), but in year 2018 things are pretty solid actually – so please do read on for some ideas how to juggle terabytes with ease.

Standard Postgres facilities

If you don’t like to sweat too much and do some pioneering then the safest way to scale of course would be to stick with proven out-of-the-box features of Postgres – so first I’d recommend to take a look at the following keywords with some short explanations and maybe it’s all that you need.

  • Light-weight / special purpose indexes

For a complex OLTP system, supporting hundreds of freaky queries, it is very common that the indexes actually take much more disk space than the table files holding the data. To improve on that (especially for indexes that are used infrequently) one can reduce the index sizes drastically with appropriate use of partial, BRIN, GIN or even a bit experimental BLOOM indexes. In total there are 7 different index types supported…but mostly people only know about and use the default B-tree – a big mistake in a multi-TB setting!

Partial indexes allow indexing only a subset of the data – for example in a sales system we might not be interested in fast access to orders in status “FINISHED” (some nightly reports deal with that usually and they can take their time), so why should we index such rows?

GIN, the most know non-default index type perhaps, has been actually around for ages (full-text search) and in short is perfect for indexing columns where there are lot of repeating values – th

カテゴリー: postgresql

Tim Colles: Hold the exact(ish) “CREATE VIEW” definition in the PostgreSQL System Catalog

planet postgresql - 2018-10-16(火) 00:15:36

It would be rather nice if the exact SQL definition (well almost, see below) supplied with the “create view” statement could be held in the PostgreSQL System Catalog, rather than the re-written optimised version that is currently held. This has come up before, see:


Back then this idea got stomped on for good reasons but I am not suggesting quite the same.

We would prefer PostgreSQL to store view definitions in the system catalog *AFTER* column expansion (unlike in the discussion above) but *BEFORE* internal parser rewrites. So, currently for example:


CREATE VIEW foo(x, y) AS

SELECT view_definition FROM information_schema.views
WHERE table_name = ‘foo’;

SELECT bar.x, bar.y
FROM bar
WHERE ((bar.y = ANY (ARRAY[‘a’::text, ‘b’::text])) AND (NOT (bar.x IS DISTINCT FROM 0)));
(1 row)

The column expansion and corresponding rewrite is mandatory at the time of creation (see below), but the other changes in the definition are not. They make comparing the current definition of a view as it is in a database against an externally maintained definition difficult, which is what we in particular want to be able to do.

The SQL standard (at least the ancient version I have access to) for “information_schema” seems rather open to interpretation on this issue. It says that the view definition column:

“contains a representation of the view descriptors”

What do we take “representation” to be – presumably a completely mangled internally optimised “blob” would be compliant (but otherwise useless). The spirit of the statement might be more reasonably taken to be “a human legible representation”. In that case how much rewrite should actually be allowed – to me the rewritten definition above is already getting less legible than the original, but may

カテゴリー: postgresql

Bruce Momjian: Three-Year Cycle

planet postgresql - 2018-10-15(月) 23:45:02

In the early years of Postgres's open source development, we focused on features that could be accomplished in a few weekends. Within a few years, we had completed many of those, and were challenged trying to accomplish big project with a mostly volunteer workforce. Soon, however, large companies started to sponsor developers' time and we launched into big feature development again.

Currently, the calendar is our only real challenge. We have major releases every year, but many features take multiple years to fully implement. We have seen this multi-year process with:

  • Windows port
  • Point-in-time recovery
  • JSON
  • Streaming replication

and are in the process of completing even more:

  • Parallelism
  • Partitioning
  • JIT
  • Sharding

Continue Reading »

カテゴリー: postgresql

Pavel Trukhanov: PgBouncer monitoring improvements in recent versions

planet postgresql - 2018-10-15(月) 22:48:47
PgBouncer monitoring improvements in recent versions

As I wrote in my previous article “USE, RED and real world PgBouncer monitoring” there are some nice commands in PgBouncer’s admin interface that allow to collect stats how things going and spot problems, if you know where to look.

This post is about new stats added in these commands in new PgBouncer versions.

So as you know, SHOW STATS shows cumulative stats for each proxied DB:

Since PgBouncer version 1.8 there’s a couple of new columns in its output.

First one — total_xact_time — total number of microseconds spent by pgbouncer when connected to PostgreSQL in a transaction, either idle in transaction or executing queries.

This will allow us to chart db pool utilization in terms of time spent in transactions and compare it to the query time utilization:

We see two totally different situations — while database is used for serving queries only 5 to 25 % of the time, PgBouncer connections around 8:00 am spend up to 70% of time in transactions!

But this total_xact_time is useful in one more very important way.

There’s a known anti-pattern in Postgres usage, is when your application opens up a transaction, makes a query and then starts doing something else, for example some CPU-heavy calculation on that result or query to some other resource/service/database, while transaction keeps hangging. Later this app will probably return back to this transaction and might, for example, update something and commit it. The bad thing in that case is that there’s a corresponding Postgres backend process, that sits there doing nothing, while transaction is idling. And Postgres backends are somewhat expensive.

Your app should avoid such behavior.

This idle in transaction state might be monitored in the Postgres itself — there’s state column in pg_stat_activity system view. But pg_stat_activity provides us only with a snapshot of current states, that leads to possible false negative errors in reporting occurrences of such cases. Using PgBouncer's stats we can calculate a percenta

カテゴリー: postgresql

pgCMH - Columbus, OH: Joining the herd

planet postgresql - 2018-10-15(月) 13:00:00

The Oct meeting will be held at 18:00 EST on Tues, the 23rd. Once again, we will be holding the meeting in the community space at CoverMyMeds. Please RSVP on MeetUp so we have an idea on the amount of food needed.


Our very own Douglas will be presenting this month. He’s going to tell us all about how you can join the PostgreSQL community and contribute the growth and success of PostgreSQL.


CoverMyMeds has graciously agreed to validate your parking if you use their garage so please park there:

You can safely ignore any sign saying to not park in the garage as long as it’s after 17:30 when you arrive.

Park in any space that is not marked ‘24 hour reserved’.

Once parked, take the elevator/stairs to the 3rd floor to reach the Miranova lobby. Once in the lobby, the elevator bank is in the back (West side) of the building. Take a left and walk down the hall until you see the elevator bank on your right. Grab an elevator up to the 11th floor. (If the elevator won’t let you pick the 11th floor, contact Doug or CJ (info below)). Once you exit the elevator, look to your left and right; one side will have visible cubicles, the other won’t. Head to the side without cubicles. You’re now in the community space:

The kitchen is to your right (grab yourself a drink) and the meeting will be held to your left. Walk down the room towards the stage.

If you have any issues or questions with parking or the elevators, feel free to text/call Doug at +1.614.316.5079 or CJ at +1.740.407.7043

カテゴリー: postgresql

International PHP Conference 2019 - Spring Edition

php.net - 2018-10-13(土) 05:20:54
カテゴリー: php

Longhorn PHP 2019 CFP is open!

php.net - 2018-10-13(土) 03:08:06
カテゴリー: php

Bruce Momjian: Installing PL/v8

planet postgresql - 2018-10-13(土) 00:00:02

PL/v8 is the JavaScript server-side language for Postgres. It has been available for several years, but a change in the way Google packages the v8 languages has made it burdensome for packagers to build PL/v8 packages.

Therefore, few package managers still distribute PL/v8. This is disappointing since it undercuts some of our NoSQL story. We previously supported JSON storage and a JSON-specific server-side language. This second option is effectively no longer available, and those users who are using PL/v8 will need to find alternatives. This highlights the risk of software significantly relying on other software that it does not control and cannot maintain itself.

カテゴリー: postgresql

Ibrar Ahmed: Track PostgreSQL Row Changes Using Public/Private Key Signing

planet postgresql - 2018-10-12(金) 22:43:31

Authorisations and encryption/decryption within a database system establish the basic guidelines in protecting your database by guarding against malicious structural or data changes.

What are authorisations?

Authorisations are the access privileges that mainly control what a user can and cannot do on the database server for one or more databases. So consider this to be like granting a key to unlock specific doors. Think of this as more like your five star hotel smart card. It allows you access all facilities that are meant for you, but doesn’t let you open every door. Whereas, privileged staff have master keys which let them open any door.

Similarly, in the database world, granting permissions secures the system by allowing specific actions by specific users or user groups, yet it allows database administrator to perform whatever action(s) on the database he/she wishes. PostgreSQL provides user management where you can can create users, and grant and revoke their privileges.


Encryption, decryption can protect your data, obfuscate schema structure and help hide code from prying eyes. Encryption/decryption hides the valuable information and ensures that there are no mischievous changes in the code or data that may be considered harmful. In almost all cases, data encryption and decryption happens on the database server. This is more like hiding your stuff somewhere in your room so that nobody can see it, but also making your stuff difficult to access.

PostgreSQL also provides encryption using pgcrypto (PostgreSQL extension). There are some cases where you don’t want to hide the data, but don’t want people to update it either. You can revoke the privileges to modify the data.

Data modifications

But what if an admin user modifies the data? How you can identify that data is changed? If somebody changes the data and you don’t know about, then it is more dangerous than you losing your data, as you are relying on data which may no longer be valid.

Logs in database systems allow us to track back changes and “po

カテゴリー: postgresql

Shaun M. Thomas: PG Phriday: Studying Stored Procedures in Postgres 11

planet postgresql - 2018-10-12(金) 16:00:47
Studying Stored Procs in Postgres 11

With Postgres 11 looming on the near horizon, it’s only appropriate to check out a recent beta and kick the tires a few times. Whether it’s improvements in parallelism, partitions, stored procedures, JIT functionality, or any number of elements in the release page, there’s a lot to investigate.

It just so happens that I ran across a fortuitous event on Twitter when deciding on an appropriate topic. Behold!

Wasting XIDs has never been this fun

Wait! No! That’s not what stored procedures are for!

I felt so good like anything was possible

When confronted with such a blatant corruption of such a cool new feature, it’s only natural to question the wisdom of doing so. It is, after all, not a great idea to programatically consume transaction IDs. I said as much and moved on with life, certain the worst was over.

Then this happened.

Magnus helpfully recommends proceeding

Now, Magnus is infamous for two things: his technical acumen, and giddy malevolence. His advocacy of a stored procedure named “waste_xid” only proved nobody anywhere should ever run this anywhere, lest they immolate whatever system hosted the database instance.

But hey, VMs are cheap; let’s break things. How else can we learn the limits of our new toys, but by virtually atomizing them?

I hit cruise control and rubbed my eyes

Before we saddle our poor innocent Postgres 11 installation with an inadvisable stored procedure designed specifically to underhandedly reap its transaction lifespan, we should probably make the routine as evil as possible.

One thing stands out immediately: calling EXECUTE is unnecessary overhead. According to the information function documentation, txid_current will assign a new transaction ID if there isn’t one already. Since the stored procedure is constantly committing, that’s extremely handy. And since this is Pl/pgSQL, we can use direct assignment instead.

Our new procedure looks something like this:

カテゴリー: postgresql

Pavel Stehule: New functionality of plpgsql_extension

planet postgresql - 2018-10-11(木) 23:40:00
I am working on new release of plpgsql_check extension - https://github.com/okbob/plpgsql_check.

Interesting new function is possibility to return list of used relation and functions. With these information is easy to generate dependency graph:

postgres=# \sf fx
LANGUAGE plpgsql
AS $function$
perform upper(((plus(a) + 200) * 100)::text) from xx;
postgres=# select * from plpgsql_show_dependency_tb('fx()');
│ type │ oid │ schema │ name │ params │
│ FUNCTION │ 18310 │ public │ plus │ (integer) │
│ RELATION │ 24576 │ public │ xx │ │
(2 rows)
カテゴリー: postgresql

PHP 7.1.23 Released

php.net - 2018-10-11(木) 23:11:47
カテゴリー: php

PHP 7.2.11 Released

php.net - 2018-10-11(木) 23:11:19
カテゴリー: php

PHP 7.3.0RC3 Released

php.net - 2018-10-11(木) 20:47:50
カテゴリー: php

Bruce Momjian: Multi-Host Libpq

planet postgresql - 2018-10-11(木) 03:45:01

Libpq is used by many client interface languages to communicate with the Postgres server. One new feature in Postgres 10 is the ability to specify multiple servers for connection attempts. Specifically, it allows the connection string to contain multiple sets of host, hostaddr, and port values. These are tried until one connects.

NoSQL solutions have used this method of multi-host connectivity for a while, so it is good Postgres can now do it too. It doesn't have all the features of a separate connection pooler, but it doesn't have the administrative or performance overhead of a separate connection pooler either, so it certainly fits a need for some environments.

カテゴリー: postgresql

Jonathan Katz: Won’t You Be My Neighbor? Quickly Finding Who is Nearby

planet postgresql - 2018-10-10(水) 23:01:54

Many applications these days want us to know how close we are to things:

  • What are the three closest coffee shops to my current location?
  • Which is the nearest airport to the office?
  • What are the two closest subway stops to the restaurant?

and countless more examples.

Another way of asking these questions is to say “who are my nearest neighbors to me?” This maps to a classic algorithmic problem: efficiently finding the K-nearest neighbors (or K-NN), where K is a constant. For example, the first question would be a 3-NN problem as we are trying to find the 3 closest coffee shops.

(If you are interested in learning more about K-NN problems in general, I highly recommend looking at how you can solve this using n-dimensional Voronoi diagrams, a wonderful data structure developed in the field of computational geometry.)

How can we use PostgreSQL to help us quickly find our closest neighbors? Let’s explore.

カテゴリー: postgresql

Laurenz Albe: Are your foreign keys indexed?

planet postgresql - 2018-10-10(水) 17:01:23
© Laurenz Albe 2018


Foreign key constraints are an important tool to keep your database consistent while also documenting relationships between tables.

A fact that is often ignored is that foreign keys need proper indexing to perform well.

This article will explain that and show you how to search for missing indexes.

Index at the target of a foreign key

In the following, I will call the table on which the foreign key constraint is defined the source table and the referenced table the target table.

The referenced columns in the target table must have a primary key or unique constraint. Such constraints are implemented with unique indexes in PostgreSQL. Consequently, the target side of a foreign key is automatically indexed.

This is required so that there is always a well-defined row to which the foreign key points. The index also comes handy if you want to find the row in the target table that matches a row in the source table.

Index at the source of a foreign key

In contrast to the above, PostgreSQL requires no index at the source of a foreign key.

However, such an index is quite useful for finding all source rows that reference a target row. The typical cases where you need that are:

1. You perform a join between the two tables where you explicitly search for the source rows referencing one or a few target rows. If there is an index on the columns at the source, PostgreSQL can use an efficient nested loop join.

This is well known and pretty obvious.

2. You delete rows or update key columns in the target table.

Then PostgreSQL has to check if the foreign key constraint is still satisfied. It does so by searching if there are rows in the source table that would become orphaned by the data modification. Without an index, this requires a sequential scan of the source table.

An example

Let’s build a source and a target table:

-- to make the plans look simpler SET max_parallel_workers_per_gather = 0; -- to speed up CREATE INDEX SET maintenance_work_mem = '512MB'; CREATE TABLE target ( t_id integer NOT NULL, [...]
カテゴリー: postgresql

Andrew Dunstan: Managing Freezing in PostgreSQL

planet postgresql - 2018-10-10(水) 16:26:18

Postgres contains a moving event horizon, which is in effect about 2 billion transactions ahead of or behind the current transaction id. Transactions more than 2 billion ahead of or more than 2 billion behind the current transaction id are considered to be in the future, and will thus be invisible to current transactions.

Postgres avoids this catastrophic data loss by specially marking old rows so that no matter where they are in relation to the current transaction id they will be visible.

Freezing is this process of marking old live tuples (i.e. database rows) so that they don’t get run over by the moving event horizon that would otherwise make them appear to be in the future. This is in contrast to vacuuming, which is the freeing up of space consumed by old dead tuples that are no longer visible to any transaction.

Both processes are managed by vacuum.

There are a number of settings that govern how freezing is done.

First, vacuum_freeze_min_age governs whether or not a tuple will be frozen while vacuum is already looking at a page to see if it has dead tuples that can be cleaned up. Tuples older than vacuum_freeze_min_age will be frozen in this case. Setting this low means that there will be less work to do later on, but at the possible cost of extra effort both in CPU and IO or WAL activity. Generally you probably want this set to at least a few hours worth of transactions. Let’s say you’re expecting to do up to 2000 transactions per second as a sustained rate. 2000 TPS is 7.2m transactions per hour. Thus a fairly aggressive setting for this case might be say 20m. The default setting is 50m. Similarly for vacuum_multixact_freeze_min_age. Note that the transaction_id and multixid counters are independent – you need to keep track of both of them.

Second, there are vacuum_freeze_table_age and vacuum_multixact_freeze_table_age. These settings govern when autovacuum will not just look at pages that might have dead rows, but any page that might have unfrozen rows. The defaults for these settings are 150m. If you hav

カテゴリー: postgresql

Fernando Laudares Camargos: PostgreSQL Monitoring: Set Up an Enterprise-Grade Server (and Sign Up for Webinar Weds 10/10…)

planet postgresql - 2018-10-10(水) 01:23:44

This is the last post in our series on building an enterprise-grade PostgreSQL set up using open source tools, and we’ll be covering monitoring.

The previous posts in this series discussed aspects such as security, backup strategy, high availability, connection pooling and load balancing, extensions, and detailed logging in PostgreSQL. Tomorrow, Wednesday, October 10 at 10AM EST, we will be reviewing these topics together, and showcasing then in practice in a webinar format: we hope you can join us!

Register Now


Monitoring databases

The importance of monitoring the activity and health of production systems is unquestionable. When it comes to the database, with its high number of customizable settings, the ability to track its various metrics (status counters and gauges) allows for the maintenance of a historical record of its performance over time. This can be used for capacity planningtroubleshooting and validation.

When it comes to capacity planning, a monitoring solution is a helpful tool to help you assess how the current setup is faring. At the same time, it can help predict future needs based on trends, such as the increase of active connections, queries, and CPU usage. For example, an increase in CPU usage might be due to a genuine increase in workload, but it could also be a sign of unoptimized queries growing in popularity. In which case, comparing CPU with disk access might provide a more complete view of what is going on.

Being able to easily correlate data like this helps you to catch minor issues and to plan accordingly, sometimes allowing you to avoid an easier but more costly solution of scaling up to mitigate problems like this. But having the right monitoring solution is really invaluable when it comes to investigative work and root cause analysis. Trying to understand a problem that has already taken place is a rather complicated, and often unenviable, task unless you established a continuous, watchful eye on the set up for the whole time.

Finally, a monitoring solution can help you valida

カテゴリー: postgresql

Bruce Momjian: Trigger Me Writable

planet postgresql - 2018-10-09(火) 23:30:01

Postgres support for hot standby servers allows read-only queries to be run on standby servers, but how are read-only sessions handled when promoting a standby server to primary? After a standby is promoted to primary, new connections are read/write, but existing connections also change to read/write:

SELECT pg_is_in_recovery(); pg_is_in_recovery ------------------- t SHOW transaction_read_only; transaction_read_only ----------------------- on \! touch /u/pg/data2/primary.trigger -- wait five seconds for the trigger file to be detected CREATE TABLE test (x INTEGER); SELECT pg_is_in_recovery(); pg_is_in_recovery ------------------- f SHOW transaction_read_only; transaction_read_only ----------------------- off

Continue Reading »

カテゴリー: postgresql