postgresql

Luca Ferrari: pgenv gets patching support

planet postgresql - 2018-10-31(水) 09:00:00

pgenv does now support a customizable patching feature that allows the user to define which patches to apply when an instance is built.

pgenv gets patching support

pgenv, the useful tool for managing several PostgreSQL installations, gets support for customizable patching.

What is all about? Well, it happens that you could need to patch PostgreSQL source tree before you build, and it could be because something on your operating system is different than the majority of the systems PostgreSQL is built against. Nevermind, you need to patch it!

pgenv did support a very simple patching mechanism hardcoded within the program itself, but during the last days I worked on a different and more customizable approach. The idea is simple: the program will apply every patch file listed in an index for the particular version. So, if you want to build the outshining 11.0 and need to patch it, build an index text file and list there all the patches, and the pgenv build process will apply them before compiling.

Of course, what if you need to apply the same patches over and over to different versions? You will end up with several indexes, one for each version you need to patch. Uhm…not so smart! To avoid this, I designed the patching index selection in a way that allows you to group patches for operating system and brand.

Allow me to explain more in detail with an example. Suppose you are on a Linux machine and need to patch version 11.0: the program will search for a file that matches any of the following:

$PGENV_ROOT/patch/index/patch.11.0.Linux $PGENV_ROOT/patch/index/patch.11.0 $PGENV_ROOT/patch/index/patch.11.Linux $PGENV_ROOT/patch/index/patch.11

This desperate searching for works selecting the first file that matches the operating system and PostgreSQL version or a combination of the two...

カテゴリー: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Add pg_partition_tree to display information about partitions

planet postgresql - 2018-10-31(水) 05:33:03
On 30th of October 2018, Michael Paquier committed patch: Add pg_partition_tree to display information about partitions     This new function is useful to display a full tree of partitions with a partitioned table given in output, and avoids the need of any complex WITH RECURSIVE query when looking at partition trees which are deep … Continue reading "Waiting for PostgreSQL 12 – Add pg_partition_tree to display information about partitions"
カテゴリー: postgresql

Nickolay Ihalainen: PostgreSQL locking, part 3: lightweight locks

planet postgresql - 2018-10-30(火) 22:31:40

PostgreSQL lightweight locks, or LWLocks, control memory access. PostgreSQL uses multi-process architecture and should allow only consistent reads and writes to shared memory structures. LWLocks have two levels of locking: shared and exclusive. It’s also possible to release all acquired LWLocks to simplify clean up. Other databases often call primitives similar to LWLocks “latches”. Because LWLocks is an implementation detail, application developers shouldn’t pay much attention to this kind of locking.

This is the third and final part of a series on PostgreSQL locking, related to latches protecting internal database structures. Here are the previous parts: Row-level locks and table-level locks.

Instrumentation

Starting from PostgreSQL 9.6, LWLocks activity can be investigated with the pg_stat_activity system view. It could be useful under high CPU utilization. There are system settings to help with contention on specific lightweight locks.

Before PostgreSQL 9.5, the LWLocks implementation used spin-locks.  It was a bottleneck. This was fixed in 9.5 with atomic state variable.

Potential heavy contention places
  • WALInsertLock: protects WAL buffers. You can increase the number of wal buffers to get a slight improvement. Incidentally, synchronous_commit=off increases pressure on the lock even more, but it’s not a bad thing. full_page_writes=off reduces contention, but it’s generally not recommended.
  • WALWriteLock: accrued by PostgreSQL processes while WAL records are flushed to disk or during a WAL segments switch. synchronous_commit=off removes the wait for disk flush, full_page_writes=off reduces the amount of data to flush.
  • LockMgrLock: appears in top waits during a read-only workload. It latches relations regardless of its size. It’s not a single lock, but at least 16 partitions. Thus it’s important to use multiple tables during benchmarks and avoid single table anti-pattern in production.
  • ProcArrayLock: Protects the ProcArray structure. Before PostgreSQL 9.0, every transaction acquired this lock exclusively befo
[...]
カテゴリー: postgresql

Craig Kerstiens: The biggest mistake Postgres ever made

planet postgresql - 2018-10-30(火) 22:30:34

Postgres has experienced a long and great run. It’s over 20 years old and has a track record of being safe and reliable (which is the top thing I care about in a database). In recent years it’s become more cool with things like JSONB, JIT support, and a powerful extension ecosystem. But, Postgres has made some mistakes along the way, the most notable being it’s name.

Postgres gets its name from Ingress. Ingress was one of the first databases and was lead by Michael Stonebreaker who won a Turing award for Postgres and other works. Ingress began in the early 70s at UC Berkeley, which is still to this day known as a top university when it comes to databases. Out of Ingress came a number of databases you’ll still know today such as SQL Server and Sybase. It also as you may have guessed by now spawned Postgres which means Post-Ingress.

In the early days of Postgres there was no SQL. No not NoSQL, there was not SQL. Postgres had it’s own query language. It wasn’t until 1995 that Postgres received SQL support, and with its addition of SQL support it updated it’s name to PostgreSQL.

You see, with Postgres becoming PostgreSQL we began a journey of Postgres being mispronounced for it’s forseeable future and it is still currently the case. Is it really that big of an issue? Well it’s big enough that the PostgreSQL website has a FAQ including “How to pronounce PostgreSQL”. As it stands today there are two generally accepted names:

  • post-GRES-que-ell
  • Postgres

With one of the above there is far less confusion. And in fact I’m not the only one to share this opinion. Tom Lane is a major contributor to every Postgres release for more than the last decade. He’s one of the top 10 contributors to open source in general having worked on the JPEG/PNG/TIFF image formats before coming over to database land. Tom has this classic email in the PostgreSQL mailing list:

```

[>> Can i get data in postgre from non-postgre db? > The name is PostgreSQL or Postgres, not postgre. It might help to explain that the pronunciation is "post-g[...]
カテゴリー: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Add pg_promote function

planet postgresql - 2018-10-30(火) 22:08:49
On 25th of October 2018, Michael Paquier committed patch: Add pg_promote function     This function is able to promote a standby with this new SQL-callable function. Execution access can be granted to non-superusers so that failover tools can observe the principle of least privilege.   Catalog version is bumped.   Author: Laurenz Albe   … Continue reading "Waiting for PostgreSQL 12 – Add pg_promote function"
カテゴリー: postgresql

Kaarel Moppel: Peeking at query performance of the upcoming version 11 of PostgreSQL

planet postgresql - 2018-10-30(火) 18:00:25

Last weeks the first release candidate of the upcoming Postgres version 11 was released, stating that everything is going more or less as planned – great news! The 11 release is promising a lot of very cool stuff as usual, including – better partitioning, more parallization and JIT capabilities (which are not activated by default though). Besides those headline features there of course hundreds or rather thousands (`git diff –shortstat REL_10_5..REL_11_RC1` says “3192 files changed, 324206 insertions, 178499 deletions”) of smaller changes with some of them for sure boosting performance also on some typical queries. Release notes didn’t state any numerical performance indicators though as usually, so people have to find it out on their own – so that’s what this blog post is about. I’m basically running 4 quite simple analytical/aggregate type of queries and laying out the numbers for you to evaluate – so jump to the summarizing table end of the post or get into the details by reading on.

Test Queries

3 out of 4 test queries I actually just carried over from the last year’s similar test for 9.6 vs 10 (check it out here if you’re into archeology) and added one query to test index traversing performance. Queries are all tied to the schema generated by our good old friend pgbench, with one minor adjustment – creating a copy of the pgbench_accounts table be able to simulate a join on two 1 million rows tables. This time I also didn’t disable the parallel features as they’re “on” by default for both versions and thus won’t be probably touched for most setups.

/* Sum up 50mio rows */ SELECT sum(abalance) FROM pgbench_accounts CROSS JOIN generate_series(1, 5) /* Unique columns joining */ SELECT count(*) FROM pgbench_accounts JOIN pgbench_accounts_copy using (aid) /* Using Grouping Sets analytical feature */ SELECT count(*) FROM (SELECT aid, bid, count(*) FROM pgbench_accounts GROUP BY CUBE (aid, bid)) a; /* B-tree index traversing */ SELECT COUNT(DISTINCT aid) FROM pgbench_accounts where aid % 2 = 0;

For pbench “sc

[...]
カテゴリー: postgresql

Quinn Weaver: It's just an expression

planet postgresql - 2018-10-30(火) 15:23:00
PostgreSQL has lots of great features for text search.

In particular, there are a bunch of ways to do case-insensitive searches for text:
  • There's standard SQL ILIKE… but than can be expensive — especially if you put %'s at both start and end — and it's overkill if you want an exact string match.
  • The same goes for case-insensitive regexp matching: overkill for simple case-insensitive matches. It does work with indexes, though!
  • Then there's the citext extension, which is pretty much the perfect answer. It lets you use indexes and still get case-insensitive matching, which is really cool. It Just Works.
OK, but what if you didn't have the foresight to use citext? And what if you don't want to go through the pain of changing the data type of that column? In a case like this, an expression index can be really handy.

Without an index, a case-insensitive match like this can be quite expensive indeed:
sandbox# select addy from email_addresses where lower(addy) = 'quinn@example.com';

                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on email_addresses  (cost=0.00..1.04 rows=1 width=32) (actual time=0.031..0.032 rows=1 loops=1)
   Filter: (lower(addy) = 'quinn@example.com'::text)
   Rows Removed by Filter: 3
 Planning time: 0.087 ms
 Execution time: 0.051 ms
(5 rows)

(That's a sequential scan of the entire table, lowercasing the addy column of each row before comparing it to the desired address.)

And a regular index on email_addresses(addy) won't help, because the lower() operation forces a sequential scan.
But an expression index will do the trick:

sandbox# create index email_addresses__lower__addy on email_addresses (lower(addy));
CREATE INDEX
sandbox# explain analyze select addy from email_addresses where lower(addy) = 'quinn@example.com';
                                                                  QUERY PLAN
------------------------------------------------------------------------[...]
カテゴリー: postgresql

damien clochard: Introducing PostgreSQL Anonymizer

planet postgresql - 2018-10-29(月) 19:17:36

I’ve just released today an extension called PostgreSQL Anonymizer that will mask or replace personally identifiable information (PII) or commercially sensitive data from a PostgreSQL database.

The project is open source and available here :

https://gitlab.com/daamien/postgresql_anonymizer

I strongly believe in a declarative approach of anonymization : the location of sensitive information inside database and the rules to hide this information should be declared directly using the Data Definition Language (DDL). In the age of GDPR, developpers should specify an anonymization strategy inside the table definitions, just like they specify data types, foreign keys and constraints.

This project is a prototype designed to show the power of implementing data masking directly inside PostgreSQL. Currently it is based on the COMMENT statement (probably the most unused PostgreSQL syntax) and an event trigger. In the near future, I’d like to propose a new syntax for dynamic data masking (MS SQL Server already has it)

The extension can be used to put dynamic masks on certain users or permanently modify sensitive data. Various masking techniques are available : randomization, partial scrambling, custom rules, etc.

Here’s a basic example :

Imagine a people table

=# SELECT * FROM people; id | name | phone ------+----------------+------------ T800 | Schwarzenegger | 0609110911
  1. Activate the masking engine
=# CREATE EXTENSION IF NOT EXISTS anon CASCADE; =# SELECT anon.mask_init();
  1. Declare a masked user
=# CREATE ROLE skynet; =# COMMENT ON ROLE skynet IS 'MASKED';
  1. Declare the masking rules
=# COMMENT ON COLUMN people.name IS 'MASKED WITH FUNCTION anon.random_last_name()'; =# COMMENT ON COLUMN people.phone IS 'MASKED WITH FUNCTION anon.partial(phone,2,$$******$$,2)';
  1. Connect with the masked user
=# \! psql test -U skynet -c 'SELECT * FROM people;' id | name | phone ------+----------+------------ T800 | Nunziata | 06******11

Of course this project is a work in progress. I need yo

[...]
カテゴリー: postgresql

brian davis: A Poor Man's Column Oriented Database in PostgreSQL

planet postgresql - 2018-10-29(月) 10:16:00

Let's get this out of the way. If you need a real column oriented database, use one. Don't even think about using the insanity below in production.

Having said that, I've been wondering for a while if it would be possible to demonstrate the properties of a column oriented db by simulating one using only native functionality found in good old, row oriented, PostgreSQL.

Two properties of a column oriented database that make it beneficial in OLAP environments are

  1. For queries that involve only a subset of a table's columns, it only needs to read the data for those columns off disk, and no others, saving on IO
  2. Storing each column separately means it can compress that data better since it's all of the same type, further reducing IO

The reduction in IO saves time for queries aggregating data over a large fraction of the table, improving performance.

PostgreSQL, of course, is row oriented. It stores and reads data a whole row at a time, and this is great when you want to access a majority of a table's columns for a small percentage of the table. i.e. OLTP type queries.

The challenge will be, can we structure things so PostgreSQL behaves more like a column oriented db?

Strategy

The strategy will be to use multiple tables behind the scenes to store the data for each column individually, and throw a view on top to tie them all together and make it look like a single table. If PostgreSQL's query planner is smart enough, it should be able to see that queries involving only a subset of columns only need to access those particular underlying tables, save on IO, and beat out a traditional PostgreSQL table of the same shape. We can even allow for modifying this pseudo-table by creating INSTEAD OF triggers on the view to take INSERT, UPDATE, DELETE statements, chop them up, and perform the necessary operations on the underlying tables.

Here's how the underlying tables will look. We'll have a single 'primary table' that contains only the primary key...

Table "public.primary_table" Column | Type |[...]
カテゴリー: postgresql

ページ