Michael Paquier: Postgres 12 highlight - New PGXS options for isolation and TAP tests

planet postgresql - 2018-12-07(金) 19:30:44

If you maintain PostgreSQL extensions that rely on PGXS, the build infrastructure for PostgreSQL extensions, the following commit added to Postgres 12 will likely be of interest, because it adds new options to control more types of regression tests:

commit: d3c09b9b1307e022883801000ae36bcb5eef71e8
author: Michael Paquier <michael@paquier.xyz>
date: Mon, 3 Dec 2018 09:27:35 +0900
committer: Michael Paquier <michael@paquier.xyz>
date: Mon, 3 Dec 2018 09:27:35 +0900

Add PGXS options to control TAP and isolation tests, take two

The following options are added for extensions:
- TAP_TESTS, to allow an extension to run TAP tests which are the
  ones present in t/*.pl.  A subset of tests can always be run with
  the existing PROVE_TESTS for developers.
- ISOLATION, to define a list of isolation tests.
- ISOLATION_OPTS, to pass custom options to isolation_tester.

A couple of custom Makefile rules have been accumulated across the tree
to cover the lack of facility in PGXS for a couple of releases when
using those test suites, which are all now replaced with the new flags,
without reducing the test coverage.  Note that tests of contrib/bloom/
are not enabled yet, as those are proving unstable in the buildfarm.

Author: Michael Paquier
Reviewed-by: Adam Berlin, Álvaro Herrera, Tom Lane, Nikolay Shaplov, Arthur Zakirov
Discussion: https://postgr.es/m/20180906014849.GG2726@paquier.xyz

These are rather similar to the existing REGRESS and REGRESS_OPTS, which respectively list a set of regression tests and pass additional options down to pg_regress (like a custom configuration file). When using REGRESS, input files need to be listed in sql/ and expected output files placed in expected/, with items listed without the “.sql” suffix.

The new options ISOLATION and ISOLATION_OPTS added in PostgreSQL 12 are similar to REGRESS and REGRESS_OPTS, except that they are used to define a set of tests stressing the behavior of concurrent sessions, for example locking checks across commands.
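As a sketch, an extension Makefile using the new PGXS options might look like this (the extension name and test names below are hypothetical, and ISOLATION_OPTS is shown with one assumed pg_regress-style flag):

```make
# Hypothetical extension "my_ext" exercising the PG12 PGXS test options
EXTENSION = my_ext
DATA = my_ext--1.0.sql

# pg_regress tests: sql/basic.sql compared against expected/basic.out
REGRESS = basic

# isolation tests: specs/concurrency.spec, output in expected/ (new in PG12)
ISOLATION = concurrency
ISOLATION_OPTS = --load-extension=my_ext

# TAP tests located in t/*.pl (new in PG12)
TAP_TESTS = 1

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```

With this in place, `make installcheck` runs all three test suites through the standard PGXS machinery instead of custom Makefile rules.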

カテゴリー: postgresql

Pavel Stehule: New release of Orafce extension

planet postgresql - 2018-12-07(金) 16:19:00
I released a new version of Orafce. Today it is a massive package emulating often-used parts of Oracle's API, and the emulation is as complete as is technically possible.

Orafce now has very nice documentation, written by Horikawa Tomohiro (big thanks for his work).

There is not much news for people who already use Orafce 3.6:
  • better emulation of the || operator for the varchar2 and nvarchar2 types
  • a few bugfixes
  • only PostgreSQL 9.4 and newer are supported
  • PostgreSQL 11 and the current master branch (the future PostgreSQL 12) are supported too

カテゴリー: postgresql

Quinn Weaver: BDR talk by Mark Wong of 2nd Quadrant

planet postgresql - 2018-12-07(金) 05:37:00
In the Bay Area? This Wednesday, 2018-12-12, Mark Wong from 2nd Quadrant will be talking about BDR (Bi-Directional Replication, a form of multi-master) for PostgreSQL. This is a great chance to get inside, real-time info on BDR.

Multi-master is one of those features where when you need it, you really, really need it, and if you're in that category, this talk is for you. It's also of interest to anyone trying to figure out the best solution for scaling and redundancy beyond one machine and one data center.

To attend, you must RSVP at Meetup with your full name (for building security's guest list).
カテゴリー: postgresql

PHP 7.1.25 Released

php.net - 2018-12-07(金) 01:10:25
カテゴリー: php

PHP 7.2.13 Released

php.net - 2018-12-07(金) 01:09:43
カテゴリー: php

PHP 5.6.39 Released

php.net - 2018-12-06(木) 23:14:16
カテゴリー: php

PHP 7.3.0 Released

php.net - 2018-12-06(木) 21:57:53
カテゴリー: php

PHP 7.0.33 Released

php.net - 2018-12-06(木) 21:00:00
カテゴリー: php

Kaarel Moppel: PostgreSQL affiliate projects for horizontal multi-terabyte scaling

planet postgresql - 2018-12-06(木) 18:00:20

Some weeks ago I wrote about some common concepts and performance hacks showing how one can (relatively) easily scale to a terabyte cluster or more. Based on my experience visiting customers from various industries, 80% of them are not even reaching that threshold… but just to be clear, I wanted to write another post showing that a couple of terabytes are of course not the “end station” for Postgres, given one is ready to roll up one's sleeves and get one's “hands dirty”, so to say. So let's look at some additional Postgres-like projects for cases where you still want to make use of your Postgres know-how and SQL skills over big amounts of data.

But be warned, the road gets bumpy now – we usually need to change the applications as well as the surrounding bits, and we're doing sharding, meaning data no longer lives on a single node, so SQL aggregates over all data can get quirky. Also, we're mostly extending the rock-solid core PostgreSQL with 3rd-party extensions or using forks with constraining characteristics, so you might have to re-define and re-import the data, learn some new query constructs, and forget some standard PostgreSQL ones… so generally be prepared to pull out a bit of hair, if you've got any left :) But OK, here are some projects that you should know of.

Postgres extensions/derivatives for multi-terabyte scale-out
  • Sharding via PL/Proxy stored procedures

This kind of “old school” solution was created and battle-tested at Skype (a huge user of Postgres, by the way!) by scaling an important cluster to 32 nodes, so it obviously works pretty well. The main upside is that all data and data access is sharded for you automatically once you pick a stored procedure parameter as the shard key, and you can use all of the standard Postgres features… with the downside that all data access needs to go through PL/pgSQL stored procedures, which most developers, I guess, are not so versed in. In short, PL/Proxy is just some glue to get the stored procedure call to reach the correct shard.
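As an illustration (the cluster name, function, and table are made up), a PL/Proxy routing function is typically just a thin stub that forwards the call to the shard selected by hashing the key parameter:

```sql
-- Hypothetical PL/Proxy stub: routes the call to one shard of a
-- cluster named 'userdb', chosen by hashing the username parameter.
-- A function with the same signature must exist on each shard.
CREATE FUNCTION get_user_email(i_username text)
RETURNS SETOF text AS $$
    CLUSTER 'userdb';
    RUN ON hashtext(i_username);
$$ LANGUAGE plproxy;
```

The application then calls `SELECT get_user_email('bob')` against the proxy node as if it were a normal function, and never needs to know which shard holds the row.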

カテゴリー: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Add log_statement_sample_rate parameter

planet postgresql - 2018-12-06(木) 04:34:22
On 29th of November 2018, Alvaro Herrera committed patch:

Add log_statement_sample_rate parameter

This allows to set a lower log_min_duration_statement value without incurring excessive log traffic (which reduces performance). This can be useful to analyze workloads with lots of short queries.

Author: Adrien Nayrat
Discussion: https://postgr.es/m/-ee1e-db9f-fa97-@anayrat.info

One of the problems I did encounter … Continue reading "Waiting for PostgreSQL 12 – Add log_statement_sample_rate parameter"
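As a sketch, the two settings combine in postgresql.conf like this (the values are illustrative): only a random ~10% of the statements exceeding the duration threshold get logged, keeping log volume manageable:

```
log_min_duration_statement = 100   # log statements running longer than 100 ms
log_statement_sample_rate = 0.1    # ...but log only ~10% of those statements
```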
カテゴリー: postgresql

Tomas Vondra: Databases vs. encryption

planet postgresql - 2018-12-06(木) 00:30:34

Let’s assume you have some sensitive data that you need to protect by encryption. It might be credit card numbers (the usual example), social security numbers, or pretty much anything you consider sensitive. It does not matter whether the encryption is mandated by a standard like PCI DSS or you just decided to encrypt the sensitive stuff; either way you need to do the encryption right and actually protect the information. Unfortunately, full-disk encryption and pgcrypto are not a good fit for multiple reasons, and application-level encryption reduces the database to “dumb” storage. Let’s look at an alternative approach – offloading the encryption to a separate trusted component, implemented as a custom data type.

Note: A couple of weeks ago at pgconf.eu 2018, I presented a lightning talk introducing a PoC of an alternative approach to encrypting data in a database. I got repeatedly asked about various details since then, so let me just explain it in this blog post.

FDE and pgcrypto

In the PostgreSQL world, people will typically recommend two solutions to this problem – full-disk encryption and pgcrypto. Unfortunately, neither of them really works for this use case :-(

Full-disk encryption (FDE) is great. It’s transparent to the database (and application), so no implementation changes are needed. The overhead is very low, particularly when your CPU supports AES-NI etc. The problem is that it only really protects against someone stealing the disk. It does not protect against OS-level attacks (a rogue sysadmin, someone gaining remote access to the box or backups, …). Nor does it protect against database-level attacks (think SQL injection). And most importantly, it’s trivial to leak the plaintext data into the server log, various monitoring systems, etc. Not great.

pgcrypto addresses some of these issues as the encryption happens in the database. But it means the database has to know the keys, and those are likely part of the SQL queries, so the issue of leaking data into server logs and monitoring systems is still
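For example, with pgcrypto the key is typically a string literal inside the statement itself (the table, column, and key below are made up), so anything that records statement text can capture it:

```sql
-- The symmetric key 'my-secret-key' is part of the statement text, so
-- log_min_duration_statement, pg_stat_statements, etc. may record it.
INSERT INTO cards (cc_encrypted)
VALUES (pgp_sym_encrypt('4111111111111111', 'my-secret-key'));

-- Decryption likewise requires the key in the query text.
SELECT pgp_sym_decrypt(cc_encrypted, 'my-secret-key') FROM cards;
```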

カテゴリー: postgresql

Bruce Momjian: The Meaning of WAL

planet postgresql - 2018-12-06(木) 00:30:01

The write-ahead log (WAL) is very important for Postgres reliability. However, how it works is often unclear.

The "write-ahead" part of the write-ahead log means that all database changes must be written to pg_wal files before commit. However, shared buffers dirtied by a transaction can be written (and fsync'ed) before or after the transaction commits.

Huh? Postgres allows dirty buffers to be written to storage before the transaction commits? Yes. When dirty buffers are written to storage, each modified row is marked with the currently-executing transaction id that modified it. Any session viewing those rows knows to ignore those changes until the transaction commits. If it did not, a long transaction could dirty all the available shared buffers and prevent future database changes.
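As a quick illustration (the table name is hypothetical), the transaction id stamped on each row version can be inspected through the hidden xmin system column:

```sql
-- xmin is the id of the transaction that created this row version;
-- other sessions compare it against their snapshot to decide visibility.
SELECT xmin, xmax, * FROM accounts LIMIT 3;
```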

Continue Reading »

カテゴリー: postgresql

Alexey Lesovsky: Why avoid long transactions?

planet postgresql - 2018-12-05(水) 19:15:00
The majority of the PostgreSQL community clearly understands why long and idle transactions are “bad”. But when you talk about it to newcomers, it’s always a good idea to back up your explanation with some real tests.

While preparing slides for my presentation about vacuum, I made a simple test case with a long transaction using pgbench. Here are the results.

pgbench -c8 -P 60 -T 3600 -U postgres pgbench
starting vacuum...end.
progress: 60.0 s, 9506.3 tps, lat 0.841 ms stddev 0.390
progress: 120.0 s, 5262.1 tps, lat 1.520 ms stddev 0.517
progress: 180.0 s, 3801.8 tps, lat 2.104 ms stddev 0.757
progress: 240.0 s, 2960.0 tps, lat 2.703 ms stddev 0.830
progress: 300.0 s, 2575.8 tps, lat 3.106 ms stddev 0.891

in the end

progress: 3300.0 s, 759.5 tps, lat 10.533 ms stddev 2.554
progress: 3360.0 s, 751.8 tps, lat 10.642 ms stddev 2.604
progress: 3420.0 s, 743.6 tps, lat 10.759 ms stddev 2.655
progress: 3480.0 s, 739.1 tps, lat 10.824 ms stddev 2.662
progress: 3540.0 s, 742.5 tps, lat 10.774 ms stddev 2.579
progress: 3600.0 s, 868.2 tps, lat 9.215 ms stddev 2.569

This is a standard TPC-B pgbench test, running on a small database which completely resides in shared buffers (it removes disk IO influences).

As you can see, the performance measured in transactions per second dropped sharply during the first few minutes of the test and continued to degrade.

Look at the statistics from the vacuum logs:

tuples: 0 removed, 692428 remain, 691693 are dead but not yet removable, oldest xmin: 62109160
tuples: 0 removed, 984009 remain, 983855 are dead but not yet removable, oldest xmin: 62109160
tuples: 0 removed, 1176821 remain, 1176821 are dead but not yet removable, oldest xmin: 62109160
tuples: 0 removed, 1494122 remain, 1494122 are dead but not yet removable, oldest xmin: 62109160
tuples: 0 removed, 2022284 remain, 2022284 are dead but not yet removable, oldest xmin: 62109160
tuples: 0 removed, 2756298 remain, 2756153 are dead but not yet removable, oldest xmin: 62109160
tuples: 0 removed, 3500913 remain, 3500693[...]
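Dead-but-not-yet-removable tuples with a constant oldest xmin, as above, usually point at a single old open transaction. One way to spot the culprit is to order sessions by transaction age in pg_stat_activity (a sketch; run it as a superuser or a monitoring role to see other users' sessions):

```sql
-- Sessions ordered by how long their current transaction has been open;
-- 'idle in transaction' entries at the top are the usual suspects.
SELECT pid, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_age DESC;
```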
カテゴリー: postgresql

Tatsuo Ishii: log_client_messages in Pgpool-II 4.0

planet postgresql - 2018-12-04(火) 16:59:00
Pgpool-II 4.0 adds a new logging feature called "log_client_messages". This allows messages coming from the frontend to be logged. Up to 3.7, the only way to log frontend messages was to enable the debug log, which produced a tremendous amount of output.
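Enabling the feature is a single setting in pgpool.conf (a sketch; Pgpool-II needs a reload afterwards for it to take effect):

```
log_client_messages = on
```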

For example, with log_client_messages enabled, "pgbench -S -M prepared -t 2" produces the frontend logs below:

2018-12-04 16:43:45: pid 6522: LOG:  Parse message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  statement: "P0_1", query: "SELECT abalance FROM pgbench_accounts WHERE aid = $1;"
2018-12-04 16:43:45: pid 6522: LOG:  Sync message from frontend.
2018-12-04 16:43:45: pid 6522: LOG:  Bind message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: "", statement: "P0_1"
2018-12-04 16:43:45: pid 6522: LOG:  Describe message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: ""
2018-12-04 16:43:45: pid 6522: LOG:  Execute message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: ""
2018-12-04 16:43:45: pid 6522: LOG:  Sync message from frontend.

As you can see, pgbench sends the query "SELECT abalance FROM pgbench_accounts WHERE aid = $1;" using the prepared statement "P0_1", then a Bind message to bind a parameter value to "$1".
It then sends a Describe message to obtain metadata, and finally an Execute message to run the query.

Below is the second execution of the query (remember that we added the "-t 2" parameter to execute two transactions).

2018-12-04 16:43:45: pid 6522: LOG:  Bind message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: "", statement: "P0_1"
2018-12-04 16:43:45: pid 6522: LOG:  Describe message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: ""
2018-12-04 16:43:45: pid 6522: LOG:  Execute message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: ""
2018-12-04 16:43:45: pid 6522: LOG:  Sync message from frontend.
2018-12-04 16:43:45: pid 6522: LOG:  Terminate message from frontend.

This time no parse message is sent because pgbench reuses the named statement "P0_1", which eliminates the p[...]
カテゴリー: postgresql

Bruce Momjian: Views vs. Materialized Views

planet postgresql - 2018-12-04(火) 00:45:01

Views and materialized views are closely related. Views effectively run the view query on every access, while materialized views store the query output in a table and reuse the results on every materialized view reference, until the materialized view is refreshed. This cache effect becomes even more significant when the underlying query or tables are slow, such as analytics queries and foreign data wrapper tables. You can think of materialized views as cached views.
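The difference can be sketched with a pair of hypothetical definitions over an orders table:

```sql
-- A view re-runs its query on every access:
CREATE VIEW order_totals_v AS
    SELECT customer_id, sum(amount) AS total
    FROM orders GROUP BY customer_id;

-- A materialized view stores the result and serves it until refreshed:
CREATE MATERIALIZED VIEW order_totals_mv AS
    SELECT customer_id, sum(amount) AS total
    FROM orders GROUP BY customer_id;

REFRESH MATERIALIZED VIEW order_totals_mv;
```

Until REFRESH MATERIALIZED VIEW runs again, order_totals_mv keeps serving the stored rows even if orders changes underneath it.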

カテゴリー: postgresql

Bruce Momjian: Extensibility

planet postgresql - 2018-12-01(土) 06:30:01

Extensibility was built into Postgres from its creation. In the early years, extensibility was often overlooked and made Postgres server programming harder. However, in the last 15 years, extensibility allowed Postgres to adapt to modern workloads at an amazing pace. The non-relational data storage options mentioned in this presentation would not have been possible without Postgres's extensibility.

カテゴリー: postgresql

Nickolay Ihalainen: PostgreSQL Streaming Physical Replication With Slots

planet postgresql - 2018-12-01(土) 02:05:32

PostgreSQL streaming physical replication with slots simplifies setup and maintenance procedures. Usually, you have to estimate disk usage for the Write-Ahead Log (WAL), limit the number of retained segments appropriately, and set up a WAL archive procedure. In this article, you will see how to use replication with slots and understand what problems it can solve.


PostgreSQL physical replication is based on WAL. The Write-Ahead Log contains all database changes, saved in 16MB segment files. Normally Postgres tries to keep only the segments between checkpoints, so with default settings just about 1GB of WAL segment files is available.

Replication requires all WAL files created after backup and up until the current time. Previously, it was necessary to keep a huge archive directory (usually mounted by NFS to all slave servers). The slots feature introduced in 9.4 allows Postgres to track the latest segment downloaded by a slave server. Now, PostgreSQL can keep all segments on disk, even without archiving, if a slave is seriously behind its master due to downtime or networking issues. The drawback: the disk space could be consumed infinitely in the case of configuration error. Before continuing, if you need a better understanding of physical replication and streaming replication, I recommend you read “Streaming Replication with PostgreSQL“.
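As a minimal sketch (the slot name is arbitrary), a physical slot is created on the master and then monitored through the pg_replication_slots view:

```sql
-- On the master: create a named physical slot for the slave to use.
SELECT pg_create_physical_replication_slot('slave1_slot');

-- Monitor slots: restart_lsn stops advancing (and WAL accumulates)
-- while the slave that owns the slot is down.
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
```

On a PostgreSQL 10 standby, setting primary_slot_name = 'slave1_slot' in recovery.conf makes the walreceiver consume WAL through the slot.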

Create a sandbox with two PostgreSQL servers

To setup replication, you need at least two PostgreSQL servers. I’m using pgcli (pgc) to setup both servers on the same host. It’s easy to install on Linux, Windows, and OS X, and provides the ability to download and run any version of PostgreSQL on your staging server or even on your laptop.

python -c "$(curl -fsSL https://s3.amazonaws.com/pgcentral/install.py)"
mv bigsql master
cp -r master slave
$ cd master
master$ ./pgc install pg10
master$ ./pgc start pg10
$ cd ../slave
slave$ ./pgc install pg10
slave$ ./pgc start pg10

First of all you should allow the replication user to connect:

master$ echo "host replication replic[...]
カテゴリー: postgresql

Liaqat Andrabi: Webinar : Introduction to OmniDB [Follow Up]

planet postgresql - 2018-11-30(金) 22:59:58

A database management tool that simplifies what is complex and drives performance. OmniDB is one such tool with which you can connect to several different databases – including PostgreSQL, Oracle, MySQL and others.

2ndQuadrant recently hosted a webinar on this very topic: Introduction to OmniDB. The webinar was presented by OmniDB co-founders and PostgreSQL consultants at 2ndQuadrant, Rafael Castro & William Ivanski.

The recording of the webinar is now available here.

Questions that Rafael and William couldn’t respond to during the live webinar have been answered below.

Q1: There are other open source GUI tools around to manage PostgreSQL. Why are you investing efforts on a new tool?

A1: When OmniDB was created we wanted a web tool, and not all available tools offered this architecture. Also, as advanced SQL developers, we wanted fewer forms and more SQL templates. Finally, we also wanted the freedom to develop features that don’t exist in other tools, or existed but were unmaintained or not up-to-date, such as customizable monitoring dashboard, console tab and the debugger which now supports PG 11 procedures.

Q2: Currently it is not possible to import data from a file into a database. Do you plan to implement such feature?

A2: Yes, we will implement this soon. There will be an interface for the user to upload and configure data to be imported, and also in the Console Tab there will be a new \copy command.

Q3: Is it possible to view the query plan ?

A3: Yes, it is possible to view the query plan using the magnifying glass icons in the Query Tab. The first one will do an EXPLAIN, and the second an EXPLAIN ANALYZE. The output can be seen as a list or as a tree.

Q4: Is it possible to pass parameters in the EXPLAIN command ?

A4: You can always manually execute EXPLAIN with any parameters that you need. However, the graphical component to view the plan only allows EXPLAIN or EXPLAIN ANALYZE. We will investigate the possibility to make the EXPLAIN command customizable for the graphical component.

For any questions, co

カテゴリー: postgresql

Marco Slot: Why the RDBMS is the future of distributed databases, ft. Postgres and Citus

planet postgresql - 2018-11-30(金) 17:15:00

Around 10 years ago I joined Amazon Web Services and that’s where I first saw the importance of trade-offs in distributed systems. In university I had already learned about the trade-offs between consistency and availability (the CAP theorem), but in practice the spectrum goes a lot deeper than that. Any design decision may involve trade-offs between latency, concurrency, scalability, durability, maintainability, functionality, operational simplicity, and other aspects of the system—and those trade-offs have meaningful impact on the features and user experience of the application, and even on the effectiveness of the business itself.

Perhaps the most challenging problem in distributed systems, in which the need for trade-offs is most apparent, is building a distributed database. When applications began to require databases that could scale across many servers, database developers began to make extreme trade-offs. In order to achieve scalability over many nodes, distributed key-value stores (NoSQL) put aside the rich feature set offered by the traditional relational database management systems (RDBMS), including SQL, joins, foreign keys, and ACID guarantees. Since everyone wants scalability, it would only be a matter of time before the RDBMS would disappear, right? Actually, relational databases have continued to dominate the database landscape. And here’s why:

The most important aspect to consider when making trade-offs in a distributed system (or any system) is development cost.

The trade-offs made by your database software will have significant impact on the development cost of your application. Handling data in an advanced application that needs to be usable, reliable, and performant is a problem that is inherently complex to solve. The number of man hours required to successfully address every little subproblem can be enormous. Fortunately, a database can take care of many of these subproblems, but database developers face the cost problem as well. It actually takes many decades to build the functionality, gu

カテゴリー: postgresql

Michael Paquier: Postgres 12 highlight - DOS prevention

planet postgresql - 2018-11-30(金) 13:50:22

A couple of months ago a thread began on the PostgreSQL community mailing lists about a set of problems where it is possible to lock down PostgreSQL from new connections just by running a set of queries as any user; an open connection to the cluster is enough to mount a denial of service.

For example, in one session do the following by scanning pg_stat_activity in a transaction with any user:

BEGIN; SELECT count(*) FROM pg_stat_activity;

This takes an access share lock on the system catalog pg_authid, a critical catalog used for authentication. Then, with a second session and the same user, run for example VACUUM FULL on pg_authid, like this:

VACUUM FULL pg_authid;

This user is not the owner of the relation, so the VACUUM will fail. However, at this stage the second session is stuck until the first session commits: it attempts to take a lock on the relation, and VACUUM FULL needs an exclusive lock, which prevents anything from reading or writing the catalog. Hence, in this particular case, as pg_authid is used for authentication, no new connections can be made to the instance until the first session's transaction has committed.
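From a third session, the resulting lock queue can be observed with pg_blocking_pids() (available since PostgreSQL 9.6):

```sql
-- List backends that are waiting on a lock, and the pids blocking them;
-- the stuck VACUUM FULL shows up blocked by the idle-in-transaction pid.
SELECT pid, pg_blocking_pids(pid) AS blocked_by, wait_event_type, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```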

As the thread continued, more commands have been mentioned as having the same kind of issues:

  • As mentioned above, VACUUM FULL is one pattern. Queuing for a lock on a relation on which the operation will eventually fail should not happen. This takes an exclusive lock on the relation.
  • TRUNCATE, for reasons similar to VACUUM FULL.
  • REINDEX on a database or a schema.

The first two cases have been fixed for PostgreSQL 12, with commit a556549 for VACUUM and commit f841ceb for TRUNCATE. Note that similar work was done a couple of years ago, for example for CLUSTER in commit cbe24a6. In all those cases, the root of the problem is to make sure that the user has the right to take a lock on a relation before attempting to lock it, so this has basically required a bit of refactoring so that the code involved makes use of RangeVa

カテゴリー: postgresql