postgresql

Laurenz Albe: Killed index tuples

planet postgresql - 2018-11-05(月) 18:00:16
© Laurenz Albe 2018

 

Since I only recently learned about the concept of “killed index tuples”, I thought there might be some others who are not yet familiar with this interesting PostgreSQL concept.

This may give you an explanation the next time you encounter wildly varying execution times for the same execution plan of the same PostgreSQL query.

Before we look more closely at the index, let’s review the life cycle of a table row version (“heap tuple”).

Life, death and visibility in the table heap

It is widely known that the visibility of heap tuples is determined by the system columns xmin and xmax (though there is more to xmax than meets the eye). A heap tuple is “dead” if its xmax is less than the xmin of all active transactions.

Now xmin and xmax are only valid if the respective transactions have been marked committed in the “commit log”. Consequently, any transaction that needs to know if it can see a tuple has to consult the commit log. To save future readers that extra work, the first one that consults the commit log will save the information in the tuple’s “hint bits”.

Dead tuples are eventually reclaimed by VACUUM.

This is all fairly well known, but how is the situation with index entries?

Life, death and visibility in the index

To avoid redundancy and to keep index tuples small, the visibility information is not stored in the index.
The status of an index tuple is determined by the heap tuple it points to, and both are removed by VACUUM at the same time.

As a consequence, an index scan has to inspect the heap tuple to determine if it can “see” an entry. This is the case even if all the columns needed are in the index tuple itself. Even worse, this “heap access” will result in random I/O, which is not very efficient on spinning disks.

This makes index scans in PostgreSQL more expensive than in other database management systems that use a different architecture. To mitigate that, several features have been introduced over the years:

  • PostgreSQL 8.1 introduced the “bitmp index scan”. This scan method fi
[...]
カテゴリー: postgresql

Hubert 'depesz' Lubaczewski: Foreign Key to partitioned table – part 3

planet postgresql - 2018-11-05(月) 03:33:28
Previously I tested performance of pl/PgSQL coded foreign keys to partitioned table. Now, let's see if I can make creation of them a bit easier. Using the same schema as before, I see that adding actual fkeys is pretty complicated. I need to create two separate functions, and four triggers, remembering what goes where. This … Continue reading "Foreign Key to partitioned table – part 3"
カテゴリー: postgresql

Andreas Scherbaum: Using Makefiles to build PostgreSQL

planet postgresql - 2018-11-04(日) 06:00:00

Andreas 'ads' Scherbaum

For a long time I was using a Makefile to quickly build, start, stop and then wipe a predefined PostgreSQL version. That comes handy if you just want to test something on an older version, without actually installing the software. Everything happens in a single directory, even a different port is assigned.

When I needed that setup recently, I ran into unrelated build errors:

relpath.c:21:10: fatal error: catalog/pg_tablespace_d.h: No such file or directory #include "catalog/pg_tablespace_d.h" ^~~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated.

Can't be - pg_tablespace_d.h is included in the tarball I'm using.

 

 

Continue reading "Using Makefiles to build PostgreSQL"
カテゴリー: postgresql

Abdul Yadi: pgAdmin3 Adjustment for PostgreSQL 11.0

planet postgresql - 2018-11-03(土) 15:01:57

What is my favourite PostgreSQL GUI-admin tool? pgAdmin3. I love its light weight user interface and simple navigation. Thanks to BigSQL Development Team for surviving the tool from freeze.

With PostgreSQL release 11.0, here is my patch file corresponding catalog table changes: pgadmin3-patch-text-file

First, clone pgAdmin3 project: clone git clone https://bitbucket.org/openscg/pgadmin3-lts.git

Then, apply the patch: patch -p0 -i [patch-text-file]

Oldies but goldies.

カテゴリー: postgresql

Rafia Sabih: My experience at PGConf Europe 2018

planet postgresql - 2018-11-02(金) 13:58:00
It was my first time at PGConf Europe this year, like many other firsts it was special, hence the blog.
Let's start with some of the basics, PostgreSQL conferences are held in a somewhat regional basis. There are many of them like,  PGConf India, PGConf USA, PGConf Europe, PGConf Asia, and then there are other one day events called PgDays. Coming back to PGConf Europe 2018,  it was organised from 23-26 October in Lisbon Marriott, Lisbon.
My talk 'Parallel Query in PG: how not to (mis)use it?' was scheduled on the first slot of last day. So, I had enough time to analyse and study the audience and prepare accordingly. But, first things first...
The conference started with a one day training session on 22 Oct, one has to buy different tickets for training and conference. You get a free registration for the conference only if you're the speaker. I wasn't part of the training session, hence will not be discussing anything about it. This was my day to rest and try the Portugal cuisine.
The next day was the start of the conference. It was opened by Magnus Hagander covering the logistics and introducing us to the conference halls, etc., must say it was one entertaining start. The next was the keynote by Paul Ramsey. The keynote was my first comprehensive introduction to PostGIS. Further, there was a nice snack buffet arranged in the lobby, and this was my time to know more people, the most exciting part of any conference. I happened to catch Tom Lane!
Henceforth, I was forced to take some difficult decisions like which talk to attend, since there were three parallel sessions going on. There was such a variety of areas covered in the conference and most of them have amazing presentations, that it made me greedy and hate the idea of parallel sessions.
To keep the discussion short, I enjoyed being exposed to some of the new areas and uses of postgres like, challenges of using postgres on cloud, multi-column indexes, pluggable storage, benchmarking,  efficient query planning in latest PG, new and old features of pos[...]
カテゴリー: postgresql

Bruce Momjian: Users vs. Developers

planet postgresql - 2018-11-01(木) 20:45:01

Some open source projects have a distinction between the developers of the open source software and its users. Since Postgres was originally developed in a university, and none of the university developers continued when Internet-based development started in 1996, all our active developers see themselves as stewards of code developed before we arrived. This causes a flatter organizational structure and helps to forge closer user/developer ties.

カテゴリー: postgresql

PostgreSQL10.5のドキュメント日本語版を公開 (2018-11-01)

www.postgresql.jp news - 2018-11-01(木) 10:29:09
PostgreSQL10.5のドキュメント日本語版を公開 (2018-11-01) kuwamura 2018/11/01 (木) - 10:29
カテゴリー: postgresql

Daniel Pocock: RHL'19 St-Cergue, Switzerland, 25-27 January 2019

planet postgresql - 2018-11-01(木) 06:06:34

(translated from original French version)

The Rencontres Hivernales du Libre (RHL) (Winter Meeting of Freedom) takes place 25-27 January 2019 at St-Cergue.

Swisslinux.org invites the free software community to come and share workshops, great meals and good times.

This year, we celebrate the 5th edition with the theme «Exploit».

Please think creatively and submit proposals exploring this theme: lectures, workshops, performances and other activities are all welcome.

RHL'19 is situated directly at the base of some family-friendly ski pistes suitable for beginners and more adventurous skiers. It is also a great location for alpine walking trails.

Why, who?

RHL'19 brings together the forces of freedom in the Leman basin, Romandy, neighbouring France and further afield (there is an excellent train connection from Geneva airport). Hackers and activists come together to share a relaxing weekend and discover new things with free technology and software.

If you have a project to present (in 5 minutes, an hour or another format) or activities to share with other geeks, please send an email to rhl-team@lists.swisslinux.org or submit it through the form.

If you have any specific venue requirements please contact the team.

You can find detailed information on the event web site.

Please ask if you need help finding accommodation or any other advice planning your trip to the region.

カテゴリー: postgresql

Craig Kerstiens: Materialized views vs. Rollup tables in Postgres

planet postgresql - 2018-11-01(木) 04:12:00

Materialized views were a long awaited feature within Postgres for a number of years. They finally arrived in Postgres 9.3, though at the time were limited. In Postgres 9.3 when you refreshed materialized views it would hold a lock on the table while they were being refreshed. If your workload was extremely busines hours based this could work, but if you were powering something to end-users this was a deal breaker. In Postgres 9.4 we saw Postgres achieve the ability to refresh materialized views concurrently. With this we now have fully baked materialized view support, but even still we’ve seen they may not always be the right approach.

What is a view?

For those of you that aren’t database experts we’re going to backup a little bit. To know what a materialized view is we’re first going to look at a standard view. A view is a defined query that you can query against as if it were a table. Views are especially helpful when you have complex data models that often combine for some standard report/building block. We’ll look at an example in just a moment as we get to a materialized views.

Views are great for simplifying copy/paste of complex SQL. The downside is that each time a view is executed the results are recomputed. For large datasets this can cause scanning of a lot of data, invalidate your cache, and in general just be slow. Enter materialized views

Materializing your views

Let’s start with an example schema that could contain a lot of raw data. In this case a very basic web analytics tool that records pageview, the time it occurred, and the session id of the user.

CREATE TABLE pageviews ( id bigserial, page text, occurred_at timestamptz, session_id bigint );

There are a number of different views that could be very common based on this raw data. And if we have a real-time dashboard we’re powering it can quickly become unfeasible to query this raw data as a query would take too long. Instead we can do some rollups with materialized views:

CREATE MATERIALIZED VIEW rollups AS SELECT date_trunc('day') a[...]
カテゴリー: postgresql

Sebastian Insausti: How to Create a Single Endpoint for your PostgreSQL Replication setup using HAProxy

planet postgresql - 2018-10-31(水) 23:41:48

Managing traffic to the database can get harder and harder as it increases in amount and the database is actually distributed across multiple servers. PostgreSQL clients usually talk to a single endpoint. When a primary node fails, the database clients will keep retrying the same IP. In case you have failed over to a secondary node, the application needs to be updated with the new endpoint. This is where you would want to put a load balancer between the applications and the database instances. It can direct applications to available/healthy database nodes and failover when required. Another benefit would be to increase read performance by using replicas effectively. It is possible to create a read-only port that balances reads across replicas. In this blog, we will cover HAProxy. We’ll see what is, how it works and how to deploy it for PostgreSQL.

What is HAProxy?

HAProxy is an open source proxy that can be used to implement high availability, load balancing and proxying for TCP and HTTP based applications.

As a load balancer, HAProxy distributes traffic from one origin to one or more destinations and can define specific rules and/or protocols for this task. If any of the destinations stops responding, it is marked as offline, and the traffic is sent to the rest of the available destinations.

How to install and configure HAProxy manually

To install HAProxy on Linux you can use the following commands:

On Ubuntu/Debian OS:

$ apt-get install haproxy -y

On CentOS/RedHat OS:

$ yum install haproxy -y

And then we need to edit the following configuration file to manage our HAProxy configuration:

$ /etc/haproxy/haproxy.cfg

Configuring our HAProxy is not complicated, but we need to know what we are doing. We have several parameters to configure, depending on how we want HAProxy to work. For more information, we can follow the documentation about the HAProxy configuration.

Let's look at a basic configuration example. Suppose that you have the following database topology:

Database Topology Example

We want to create a HA

[...]
カテゴリー: postgresql

Rafia Sabih: Using parallel sequential scan in PostgreSQL

planet postgresql - 2018-10-31(水) 22:29:00
Parallel sequential scan is the first parallel access method in PostgreSQL and is introduced in version 9.6.  The committer of this feature and my colleague at EnterpriseDB Robert Haas wrote an awesome blog on it, there is another great blog by another PostgreSQL committer and my colleague Amit Kapila. Both of these blogs explain this access method, its design, usage, and related parameters. 
Still, I could not help but notice that there are curiosities around the usage of this access method. Every now and then I could see a complaint saying parallel sequential scan is not getting selected or it is degrading the performance of a query.  So, I decided to write this blog to cater more practical scenarios and specifically focus on its less talked about aspect  -- where parallel sequential scan would (should) not improve the performance.
Before diving into the details of parallel SeqScan, let's first understand the basic infrastructure and terminology related to it in PostgreSQL. The processes that run in parallel and scan the tuples of a relation are called parallel workers or workers in short. There is one special worker namely leader which co-ordinates and collects the output of the scan from each  of the worker. This worker may or may not participate in scanning the relation depending on it's load in dividing and combining processes. End users can also control the involvement of leader in relation scan by GUC parameter parallel_leader_participation, it is a boolean parameter. 
Now, let's understand the concept of parallel scan in PostgreSQL by a simple example.
  • Let there be a table T (a int, b int) containing 100 tuples
  • Let's say we have two workers and one leader,
  • Cost of scanning one tuple is 10
  • Cost of communicating a tuple from worker to leader is 20
  • Cost of dividing the tuples among workers is 30
  • For simplicity, let's assume that leader gives 50 tuples to each of the worker
Now, let's analyse if parallel scan will be faster than non parallel scan,
Cost of SeqScan = 10*100 = 1000
Cost of Parallel SeqSc[...]
カテゴリー: postgresql

Liaqat Andrabi: Webinar: PostgreSQL is NOT your traditional SQL database [Follow Up]

planet postgresql - 2018-10-31(水) 16:00:58

PostgreSQL is referred to as “The world’s most advanced open source database” – but what does PostgreSQL have that other open source relational databases don’t?  

2ndQuadrant recently hosted a webinar on this very topic: PostgreSQL is NOT your traditional SQL database, presented by Gülçin Yıldırım Jelínek, Cloud Services Manager at 2ndQuadrant.

The recording of the webinar is now available here.

Questions that Gülçin couldn’t respond to during the live webinar have been answered below.

Q1: What exactly is the role of postgresql for a marketplace like ebay or rakuten?

A1: This question is not very clear. If the question is about whether Postgres can be used in an e-commerce website, the answer is yes.

 

Q2: I’m in process of switching from MS SQL Server to Postgres and I have an issue:
Simple search in text columns with diacritics chart.

Ex: table Person
Name
——-
Ștefan
ștefan
Stefan

When I search:

SELECT * FROM pers WHERE Name LIKE 'ste%';

I want to retrieve all records from above.
In SQL Server there’s a simple way to accomplish this – I use: COLLATE Latin1_General_100_CI_AI when I define column, and that it’s.
Do you have recommendations to accomplish the same task in Postgres?

A2: The unaccenting collations are not supported in PostgreSQL. You can query like this to get the same result:

SELECT * FROM pers WHERE unaccent(Name) ILIKE 'ste%';


For any questions, comments, or feedback, please visit our website or send an email to webinar@2ndquadrant.com.

カテゴリー: postgresql

Quinn Weaver: Remember your history

planet postgresql - 2018-10-31(水) 14:56:00
PostgreSQL keeps track of which WAL files go with which timelines in small history files. Each time you make a base backup, a history file is born. The file is written once and never updated. It's a simple system, and it works well and silently.

In fact, sometimes it works a little too silently.

At PostgreSQL Experts we've run into the problem where a client's history files disappear because they are stored in S3, and there's a lifecycle configuration in place that says to move everything over a certain age to Glacier. That's a good policy for WAL files!

Unfortunately, it's not a good policy for history files: without the latest history file you can't restore the latest backup, and without past history files, you are unable to do PITR to certain points in time.

The solution we used was to move the whole WAL archive to S3 Standard-Infrequent Access storage, dissolving the problem with lifecycle configurations while controlling costs. But you could also fix this by editing the lifecycle configuration.

The important thing is this: hold on to all history files. They're tiny text files, and when you need them, you really need them. This is also a good reason to test restores, not just of the latest backup, but of database states at arbitrary points in time.

*    *    *
Addendum: another very common problem we see is WAL archives that become corrupted because a client accidentally pointed a two primaries at the same WAL archive (for instance, they might have copied a postgresql.conf file by hand, or via a DevOps tool like Puppet). In this case, the whole archive is corrupted, and you're best off starting with a fresh S3 bucket or an empty directory and doing a new base backup immediately.

One of the many nice features of pgBackRest is that it will notice this and prevent you from doing it. Fewer footguns → better backups.
カテゴリー: postgresql

Hubert 'depesz' Lubaczewski: Foreign Key to partitioned table – part 2

planet postgresql - 2018-10-31(水) 10:45:14
Previously I wrote about how to create foreign key pointing to partitioned table. Final solution in there required four separate functions and four triggers for each key between two tables. Let's see how fast it is, and if it's possible to make it simpler. First bit – performance test (for making it simpler you will … Continue reading "Foreign Key to partitioned table – part 2"
カテゴリー: postgresql

Luca Ferrari: pgenv gets patching support

planet postgresql - 2018-10-31(水) 09:00:00

pgenv does now support a customizable patching feature that allows the user to define which patches to apply when an instance is built.

pgenv gets patching support

pgenv, the useful tool for managing several PostgreSQL installations, gets support for customizable patching.

What is all about? Well, it happens that you could need to patch PostgreSQL source tree before you build, and it could be because something on your operating system is different than the majority of the systems PostgreSQL is built against. Nevermind, you need to patch it!

pgenv did support a very simple patching mechanism hardcoded within the program itself, but during the last days I worked on a different and more customizable approach. The idea is simple: the program will apply every patch file listed in an index for the particular version. So, if you want to build the outshining 11.0 and need to patch it, build an index text file and list there all the patches, and the pgenv build process will apply them before compiling.

Of course, what if you need to apply the same patches over and over to different versions? You will end up with several indexes, one for each version you need to patch. Uhm…not so smart! To avoid this, I designed the patching index selection in a way that allows you to group patches for operating system and brand.

Allow me to explain more in detail with an example. Suppose you are on a Linux machine and need to patch version 11.0: the program will search for a file that matches any of the following:

$PGENV_ROOT/patch/index/patch.11.0.Linux $PGENV_ROOT/patch/index/patch.11.0 $PGENV_ROOT/patch/index/patch.11.Linux $PGENV_ROOT/patch/index/patch.11

This desperate searching for works selecting the first file that matches the operating system and PostgreSQL version or a combination of the two...

カテゴリー: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Add pg_partition_tree to display information about partitions

planet postgresql - 2018-10-31(水) 05:33:03
On 30th of October 2018, Michael Paquier committed patch: Add pg_partition_tree to display information about partitions     This new function is useful to display a full tree of partitions with a partitioned table given in output, and avoids the need of any complex WITH RECURSIVE query when looking at partition trees which are deep … Continue reading "Waiting for PostgreSQL 12 – Add pg_partition_tree to display information about partitions"
カテゴリー: postgresql

Nickolay Ihalainen: PostgreSQL locking, part 3: lightweight locks

planet postgresql - 2018-10-30(火) 22:31:40

PostgreSQL lightweight locks, or LWLocks, control memory access. PostgreSQL uses multi-process architecture and should allow only consistent reads and writes to shared memory structures. LWLocks have two levels of locking: shared and exclusive. It’s also possible to release all acquired LWLocks to simplify clean up. Other databases often call primitives similar to LWLocks “latches”. Because LWLocks is an implementation detail, application developers shouldn’t pay much attention to this kind of locking.

This is the third and final part of a series on PostgreSQL locking, related to latches protecting internal database structures. Here are the previous parts: Row-level locks and table-level locks.

Instrumentation

Starting from PostgreSQL 9.6, LWLocks activity can be investigated with the pg_stat_activity system view. It could be useful under high CPU utilization. There are system settings to help with contention on specific lightweight locks.

Before PostgreSQL 9.5, the LWLocks implementation used spin-locks.  It was a bottleneck. This was fixed in 9.5 with atomic state variable.

Potential heavy contention places
  • WALInsertLock: protects WAL buffers. You can increase the number of wal buffers to get a slight improvement. Incidentally, synchronous_commit=off increases pressure on the lock even more, but it’s not a bad thing. full_page_writes=off reduces contention, but it’s generally not recommended.
  • WALWriteLock: accrued by PostgreSQL processes while WAL records are flushed to disk or during a WAL segments switch. synchronous_commit=off removes the wait for disk flush, full_page_writes=off reduces the amount of data to flush.
  • LockMgrLock: appears in top waits during a read-only workload. It latches relations regardless of its size. It’s not a single lock, but at least 16 partitions. Thus it’s important to use multiple tables during benchmarks and avoid single table anti-pattern in production.
  • ProcArrayLock: Protects the ProcArray structure. Before PostgreSQL 9.0, every transaction acquired this lock exclusively befo
[...]
カテゴリー: postgresql

Craig Kerstiens: The biggest mistake Postgres ever made

planet postgresql - 2018-10-30(火) 22:30:34

Postgres has experienced a long and great run. It’s over 20 years old and has a track record of being safe and reliable (which is the top thing I care about in a database). In recent years it’s become more cool with things like JSONB, JIT support, and a powerful extension ecosystem. But, Postgres has made some mistakes along the way, the most notable being it’s name.

Postgres gets its name from Ingress. Ingress was one of the first databases and was lead by Michael Stonebreaker who won a Turing award for Postgres and other works. Ingress began in the early 70s at UC Berkeley, which is still to this day known as a top university when it comes to databases. Out of Ingress came a number of databases you’ll still know today such as SQL Server and Sybase. It also as you may have guessed by now spawned Postgres which means Post-Ingress.

In the early days of Postgres there was no SQL. No not NoSQL, there was not SQL. Postgres had it’s own query language. It wasn’t until 1995 that Postgres received SQL support, and with its addition of SQL support it updated it’s name to PostgreSQL.

You see, with Postgres becoming PostgreSQL we began a journey of Postgres being mispronounced for it’s forseeable future and it is still currently the case. Is it really that big of an issue? Well it’s big enough that the PostgreSQL website has a FAQ including “How to pronounce PostgreSQL”. As it stands today there are two generally accepted names:

  • post-GRES-que-ell
  • Postgres

With one of the above there is far less confusion. And in fact I’m not the only one to share this opinion. Tom Lane is a major contributor to every Postgres release for more than the last decade. He’s one of the top 10 contributors to open source in general having worked on the JPEG/PNG/TIFF image formats before coming over to database land. Tom has this classic email in the PostgreSQL mailing list:

```

[>> Can i get data in postgre from non-postgre db? > The name is PostgreSQL or Postgres, not postgre. It might help to explain that the pronunciation is "post-g[...]
カテゴリー: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Add pg_promote function

planet postgresql - 2018-10-30(火) 22:08:49
On 25th of October 2018, Michael Paquier committed patch: Add pg_promote function     This function is able to promote a standby with this new SQL-callable function. Execution access can be granted to non-superusers so that failover tools can observe the principle of least privilege.   Catalog version is bumped.   Author: Laurenz Albe   … Continue reading "Waiting for PostgreSQL 12 – Add pg_promote function"
カテゴリー: postgresql

Kaarel Moppel: Peeking at query performance of the upcoming version 11 of PostgreSQL

planet postgresql - 2018-10-30(火) 18:00:25

Last weeks the first release candidate of the upcoming Postgres version 11 was released, stating that everything is going more or less as planned – great news! The 11 release is promising a lot of very cool stuff as usual, including – better partitioning, more parallization and JIT capabilities (which are not activated by default though). Besides those headline features there of course hundreds or rather thousands (`git diff –shortstat REL_10_5..REL_11_RC1` says “3192 files changed, 324206 insertions, 178499 deletions”) of smaller changes with some of them for sure boosting performance also on some typical queries. Release notes didn’t state any numerical performance indicators though as usually, so people have to find it out on their own – so that’s what this blog post is about. I’m basically running 4 quite simple analytical/aggregate type of queries and laying out the numbers for you to evaluate – so jump to the summarizing table end of the post or get into the details by reading on.

Test Queries

3 out of 4 test queries I actually just carried over from the last year’s similar test for 9.6 vs 10 (check it out here if you’re into archeology) and added one query to test index traversing performance. Queries are all tied to the schema generated by our good old friend pgbench, with one minor adjustment – creating a copy of the pgbench_accounts table be able to simulate a join on two 1 million rows tables. This time I also didn’t disable the parallel features as they’re “on” by default for both versions and thus won’t be probably touched for most setups.

/* Sum up 50mio rows */ SELECT sum(abalance) FROM pgbench_accounts CROSS JOIN generate_series(1, 5) /* Unique columns joining */ SELECT count(*) FROM pgbench_accounts JOIN pgbench_accounts_copy using (aid) /* Using Grouping Sets analytical feature */ SELECT count(*) FROM (SELECT aid, bid, count(*) FROM pgbench_accounts GROUP BY CUBE (aid, bid)) a; /* B-tree index traversing */ SELECT COUNT(DISTINCT aid) FROM pgbench_accounts where aid % 2 = 0;

For pbench “sc

[...]
カテゴリー: postgresql

ページ