postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Add log_statement_sample_rate parameter

planet postgresql - 2018-12-06 (Thu) 04:34:22
On 29th of November 2018, Alvaro Herrera committed patch:

Add log_statement_sample_rate parameter

This allows to set a lower log_min_duration_statement value without incurring excessive log traffic (which reduces performance). This can be useful to analyze workloads with lots of short queries.

Author: Adrien Nayrat
Discussion: https://postgr.es/m/-ee1e-db9f-fa97-@anayrat.info

One of the problems I did encounter … Continue reading "Waiting for PostgreSQL 12 – Add log_statement_sample_rate parameter"
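Not part of the quoted commit message, but a sketch of how the new setting might be combined with log_min_duration_statement in postgresql.conf (the values below are purely illustrative):

# postgresql.conf (PostgreSQL 12+)
log_min_duration_statement = 10    # log statements running longer than 10 ms...
log_statement_sample_rate = 0.01   # ...but only a ~1% sample of them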

Tomas Vondra: Databases vs. encryption

planet postgresql - 2018-12-06 (Thu) 00:30:34

Let’s assume you have some sensitive data that you need to protect by encryption. It might be credit card numbers (the usual example), social security numbers, or pretty much anything else you consider sensitive. It does not matter whether the encryption is mandated by a standard like PCI DSS or whether you just decided to encrypt the sensitive stuff; in both cases you need to do the encryption right and actually protect the information. Unfortunately, full-disk encryption and pgcrypto are not a good fit for multiple reasons, and application-level encryption reduces the database to “dumb” storage. Let’s look at an alternative approach – offloading the encryption to a separate trusted component, implemented as a custom data type.

Note: A couple of weeks ago at pgconf.eu 2018, I presented a lightning talk introducing a PoC of an alternative approach to encrypting data in a database. I got repeatedly asked about various details since then, so let me just explain it in this blog post.

FDE and pgcrypto

In the PostgreSQL world, people will typically recommend two solutions to this problem – full-disk encryption and pgcrypto. Unfortunately, neither of them really works for this use case :-(

Full-disk encryption (FDE) is great. It’s transparent to the database (and the application), so no implementation changes are needed. The overhead is very low, particularly when your CPU supports AES-NI etc. The problem is that it only really protects against someone stealing the disk. It does not protect against OS-level attacks (a rogue sysadmin, someone gaining remote access to the box or backups, …). Nor does it protect against database-level attacks (think SQL injection). And most importantly, it’s trivial to leak the plaintext data into the server log, various monitoring systems, etc. Not great.

pgcrypto addresses some of these issues because the encryption happens in the database. But it means the database has to know the keys, and those are likely part of SQL queries, so the issue of leaking data into server logs and monitoring systems is still

[...]
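The excerpt is cut off here, but the key-leakage problem is easy to demonstrate: with pgcrypto the key typically appears verbatim in the SQL text, so anything that records statements (server log, monitoring, pg_stat_statements) records the key as well. A minimal sketch (the table name and key are made up):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE cards (id serial PRIMARY KEY, number_enc bytea);

-- The symmetric key 'not-so-secret' is part of the statement text,
-- so it ends up wherever this query gets logged.
INSERT INTO cards (number_enc)
VALUES (pgp_sym_encrypt('4111-1111-1111-1111', 'not-so-secret'));

SELECT pgp_sym_decrypt(number_enc, 'not-so-secret') AS number FROM cards;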

Bruce Momjian: The Meaning of WAL

planet postgresql - 2018-12-06 (Thu) 00:30:01

The write-ahead log (WAL) is very important for Postgres reliability. However, how it works is often unclear.

The "write-ahead" part of the write-ahead log means that all database changes must be written to pg_wal files before commit. However, shared buffers dirtied by a transaction can be written (and fsync'ed) before or after the transaction commits.

Huh? Postgres allows dirty buffers to be written to storage before the transaction commits? Yes. When dirty buffers are written to storage, each modified row is marked with the currently-executing transaction id that modified it. Any session viewing those rows knows to ignore those changes until the transaction commits. If it did not, a long transaction could dirty all the available shared buffers and prevent future database changes.
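A quick way to see the per-row transaction marking described above is the hidden xmin column (the table name is made up; run the two sessions side by side):

-- Session 1: modify a row but do not commit yet
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
SELECT xmin, balance FROM accounts WHERE id = 1;  -- xmin = this transaction's id

-- Session 2: still sees the old row version, because session 1's
-- transaction id is not yet marked as committed
SELECT xmin, balance FROM accounts WHERE id = 1;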

Continue Reading »


Alexey Lesovsky: Why avoid long transactions?

planet postgresql - 2018-12-05 (Wed) 19:15:00
The majority of the PostgreSQL community clearly understands why long and idle transactions are “bad”. But when you explain it to newcomers, it is always a good idea to back up your explanation with some real tests.

While preparing slides for my presentation about vacuum, I made a simple test case with a long transaction, using pgbench. Here are the results.

pgbench -c8 -P 60 -T 3600 -U postgres pgbench
starting vacuum...end.
progress: 60.0 s, 9506.3 tps, lat 0.841 ms stddev 0.390
progress: 120.0 s, 5262.1 tps, lat 1.520 ms stddev 0.517
progress: 180.0 s, 3801.8 tps, lat 2.104 ms stddev 0.757
progress: 240.0 s, 2960.0 tps, lat 2.703 ms stddev 0.830
progress: 300.0 s, 2575.8 tps, lat 3.106 ms stddev 0.891

and by the end of the run:

progress: 3300.0 s, 759.5 tps, lat 10.533 ms stddev 2.554
progress: 3360.0 s, 751.8 tps, lat 10.642 ms stddev 2.604
progress: 3420.0 s, 743.6 tps, lat 10.759 ms stddev 2.655
progress: 3480.0 s, 739.1 tps, lat 10.824 ms stddev 2.662
progress: 3540.0 s, 742.5 tps, lat 10.774 ms stddev 2.579
progress: 3600.0 s, 868.2 tps, lat 9.215 ms stddev 2.569

This is a standard TPC-B pgbench test running on a small database which fits entirely in shared buffers (this removes the influence of disk I/O).

As you can see, the performance measured in transactions per second dropped within the first few minutes of the test and continued to degrade from there.
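The post does not show how the long transaction itself was opened; a minimal way to reproduce the effect is to hold a snapshot open in a separate session for the duration of the pgbench run (illustrative only):

-- In a separate psql session, started alongside pgbench:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM pgbench_accounts;  -- acquires a snapshot
-- ...now leave the session idle; the old snapshot prevents vacuum
-- from removing dead row versions produced by the pgbench workload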

Look at the statistics from the vacuum logs:

tuples: 0 removed, 692428 remain, 691693 are dead but not yet removable, oldest xmin: 62109160
tuples: 0 removed, 984009 remain, 983855 are dead but not yet removable, oldest xmin: 62109160
tuples: 0 removed, 1176821 remain, 1176821 are dead but not yet removable, oldest xmin: 62109160
tuples: 0 removed, 1494122 remain, 1494122 are dead but not yet removable, oldest xmin: 62109160
tuples: 0 removed, 2022284 remain, 2022284 are dead but not yet removable, oldest xmin: 62109160
tuples: 0 removed, 2756298 remain, 2756153 are dead but not yet removable, oldest xmin: 62109160
tuples: 0 removed, 3500913 remain, 3500693[...]
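Not from the post, but a convenient way to catch such an offending session in the act is to query pg_stat_activity for old transactions:

SELECT pid, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_age DESC;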

Tatsuo Ishii: log_client_messages in Pgpool-II 4.0

planet postgresql - 2018-12-04 (Tue) 16:59:00
Pgpool-II 4.0 adds a new logging feature called "log_client_messages". It allows messages coming from the frontend to be logged. Up to 3.7, the only way to log frontend messages was to enable the debug log, which produced a tremendous amount of output.
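Turning the feature on is a single directive in pgpool.conf, followed by a restart or reload of Pgpool-II:

# pgpool.conf
log_client_messages = on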

For example, with log_client_messages enabled, "pgbench -S -M prepared -t 2" produces the frontend logs below:

2018-12-04 16:43:45: pid 6522: LOG:  Parse message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  statement: "P0_1", query: "SELECT abalance FROM pgbench_accounts WHERE aid = $1;"
2018-12-04 16:43:45: pid 6522: LOG:  Sync message from frontend.
2018-12-04 16:43:45: pid 6522: LOG:  Bind message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: "", statement: "P0_1"
2018-12-04 16:43:45: pid 6522: LOG:  Describe message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: ""
2018-12-04 16:43:45: pid 6522: LOG:  Execute message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: ""
2018-12-04 16:43:45: pid 6522: LOG:  Sync message from frontend.

As you can see, pgbench sends the query "SELECT abalance FROM pgbench_accounts WHERE aid = $1;" using the prepared statement "P0_1", then a Bind message to supply the value bound to "$1".
It then sends a Describe message to obtain result metadata, and finally an Execute message to run the query.

Below is the second execution of the query (remember that we added the "-t 2" parameter to execute two transactions).

2018-12-04 16:43:45: pid 6522: LOG:  Bind message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: "", statement: "P0_1"
2018-12-04 16:43:45: pid 6522: LOG:  Describe message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: ""
2018-12-04 16:43:45: pid 6522: LOG:  Execute message from frontend.
2018-12-04 16:43:45: pid 6522: DETAIL:  portal: ""
2018-12-04 16:43:45: pid 6522: LOG:  Sync message from frontend.
2018-12-04 16:43:45: pid 6522: LOG:  Terminate message from frontend.

This time no Parse message is sent, because pgbench reuses the named statement "P0_1", which eliminates the p[...]

Bruce Momjian: Views vs. Materialized Views

planet postgresql - 2018-12-04 (Tue) 00:45:01

Views and materialized views are closely related. Views effectively run the view query on every access, while materialized views store the query output in a table and reuse the results on every materialized view reference, until the materialized view is refreshed. This cache effect becomes even more significant when the underlying query or tables are slow, such as analytics queries and foreign data wrapper tables. You can think of materialized views as cached views.
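A minimal sketch of the distinction (the table and query are made up):

-- A plain view: the underlying query runs on every access
CREATE VIEW order_totals_v AS
SELECT customer_id, sum(amount) AS total
FROM orders GROUP BY customer_id;

-- A materialized view: the output is stored and reused on every access...
CREATE MATERIALIZED VIEW order_totals_mv AS
SELECT customer_id, sum(amount) AS total
FROM orders GROUP BY customer_id;

-- ...until it is explicitly refreshed
REFRESH MATERIALIZED VIEW order_totals_mv;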
