Bruce Momjian: Updated Sharding Presentation

planet postgresql - 2019-06-01(Sat) 01:00:00

I presented my sharding talk today at PGCon in Ottawa. The slides have been updated to more clearly show what has been accomplished toward the goal of built-in sharding, and what remains to be done. The talk was well attended. I also attended a breakfast meeting this morning about sharding.

Category: postgresql

Álvaro Hernández: What PostgreSQL Ibiza is and why you should attend

planet postgresql - 2019-05-31(Fri) 16:49:11

PostgreSQL for Thinkers

Please check PostgreSQL Ibiza’s conference website. If you haven’t done so, please also take two minutes to watch our conference video. Now, if you think that PostgreSQL Ibiza is just another PostgreSQL conference, only on the beach (which is not that bad anyway!), then keep reading, because it is much more than that. PostgreSQL Ibiza is a new, disruptive PostgreSQL conference. We are all in a rush. I have been to many dozens of PostgreSQL conferences in the last decade.
Category: postgresql

Luca Ferrari: Normalize to save space

planet postgresql - 2019-05-31(Fri) 09:00:00

It is no surprise at all: a normalized database requires less space on disk than a non-normalized one.

Normalize to save space

Sometimes you get a database that Just Works (tm) but whose data is not normalized. I’m not a big fan of data normalization; I mean, it surely matters, but I don’t tend to “over-normalize” data ahead of design. However, one of my databases was growing more and more because of a table with a few repeated pieces of extra information.
Of course a normalized database gives you back some disk space at the cost of joins during query execution, but having a decent server and a small join table is enough to sleep at night!
Let’s see what we are talking about:

mydb=# select pg_size_pretty( pg_database_size( 'mydb' ) );
 pg_size_pretty
----------------
 13 GB
(1 row)

Ok, 13 GB is not something scary; let’s say it is a fair database to work on (please note the size is reported after a full VACUUM). In this database, I have a table root that handles a lot of data from hardware sensors; such a table is of course partitioned on a time scale. One thing the table was storing was information about the sensor name, a text string repeated over and over in the child tables too. While this was not a problem in the beginning, it wasted more and more space over time.

Shame on me!

Let’s go normalize the table!
Normalizing a table is quite straightforward, and I’m not interested in sharing details here. Let’s say this was quite easy because my...
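The normalization sketched above can look something like the following. This is a minimal, hypothetical sketch: the table and column names (sensor, sensor_name, sensor_pk) are illustrative, not the author's actual schema.

```sql
-- Move the repeated sensor name out of the big table into a lookup table.
CREATE TABLE sensor (
    pk   SERIAL PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);

-- Populate the lookup table from the existing data.
INSERT INTO sensor ( name )
SELECT DISTINCT sensor_name FROM root;

-- Replace the repeated text column with a small integer foreign key.
ALTER TABLE root ADD COLUMN sensor_pk INT REFERENCES sensor ( pk );
UPDATE root r SET sensor_pk = s.pk FROM sensor s WHERE s.name = r.sensor_name;
ALTER TABLE root DROP COLUMN sensor_name;
```

The repeated text string is stored once, and each row carries only a small integer key, which is where the disk-space savings come from.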

Category: postgresql

Dimitri Fontaine: Introducing pg_auto_failover: A high availability and automated failover Postgres extension

planet postgresql - 2019-05-30(Thu) 23:02:00

As part of the Citus team (Citus scales out Postgres horizontally, but that’s not all we work on), I’ve been working on pg_auto_failover for quite some time now, and I’m excited that we have now released pg_auto_failover as open source, to give you automated failover and high availability!

When designing pg_auto_failover, our goal was this: to provide an easy-to-set-up Business Continuity solution for Postgres that implements fault tolerance of any one node in the system. The documentation chapter about the pg_auto_failover architecture includes the following:

It is important to understand that pg_auto_failover is optimized for Business Continuity. In the event of losing a single node, pg_auto_failover is capable of continuing the PostgreSQL service, and prevents any data loss when doing so, thanks to PostgreSQL Synchronous Replication.

Introduction to pg_auto_failover

The pg_auto_failover solution for Postgres is meant to provide an easy-to-set-up and reliable automated failover solution. This solution includes software-driven decision making for when to implement failover in production.

The most important part of any automated failover system is the decision-making policy, and we have a whole documentation chapter online about pg_auto_failover fault-tolerance mechanisms.

When using pg_auto_failover, multiple active agents are deployed to keep track of your production Postgres setup properties:

  • the monitor, a Postgres database itself equipped with the pg_auto_failover extension, registers and checks the health of the active Postgres nodes.

  • each Postgres node that is registered in the pg_auto_failover monitor must also run a local agent, the pg_autoctl run service.

  • each Postgres service that is managed has two Postgres nodes set up together in the same group. A single monitor setup may manage as many Postgres groups as needed.

With such a deployment, the monitor connects to every registered node on a regular schedule (20s by default) and registers success or failure in its pgautofailover.node table.
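Since the health data lives in an ordinary Postgres table on the monitor, it can be inspected with plain SQL. A minimal sketch (the output columns depend on the pg_auto_failover version, so none are assumed here):

```sql
-- Connect to the monitor database and inspect the registered nodes
-- and their last reported health.
SELECT * FROM pgautofailover.node;
```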

In addition [...]

Category: postgresql

Dave Page: Avoiding Gmail's confidential mode

planet postgresql - 2019-05-30(Thu) 06:11:00
So this is one of the very few (maybe the first?) blog entries I've written that isn't directly related to PostgreSQL; however, it does affect how I (and others) may work on the project.

Last night I received email from Google about my personal G Suite account which I use for all my day-to-day email, which for the most part is related to work on pgAdmin and PostgreSQL. Google were proudly announcing the rollout of their new Gmail Confidential Mode update. If you've not come across this yet, then essentially what it does is allow users to send emails that can be deleted or expired after a certain amount of time, optionally require SMS verification to open them, and prevent printing (but not screen-shots of course), forwarding or downloading etc.

When using the Gmail web interface, this all works fairly seamlessly. I can see why some people would want it if that's all they use, however, like many people, I also use other clients, for example, via IMAP. In that case, instead of the original email Gmail sends a placeholder email to replace the actual message which contains a link to allow you to login to Google and view the message online (assuming the SMS verification passes and the message hasn't been deleted or expired of course). That's going to be quite inconvenient to me, besides which, I really don't want anyone to be able to control access to emails they've sent me, after I've received them.

There's another problem affecting PostgreSQL's mailing lists, however. How long will it be until someone sends such a message to one of the PostgreSQL lists, where it will do nothing but clutter up the archives and annoy other users (who won't be able to read the message anyway, as they won't be able to login to Google as the list address)?

Fixing the PostgreSQL mail servers

After some discussion with some of the PostgreSQL sysadmin team, we discovered that Gmail adds a header to messages that have confidential mode enabled (X-Gm-Locker: <token>). This is easy for us to [...]
Category: postgresql

elein mustain: Swoop de Dupe

planet postgresql - 2019-05-30(Thu) 01:31:36
The problem: duplicate rows

Once upon a time, there was a database person who knows about referential integrity and its importance for creating database tables. The interesting part is that referential integrity, including primary keys, enables keeping with the standard of not having duplicate rows. However, they just wanted a private little table on their […]
Category: postgresql

Luca Ferrari: PostgreSQL is almost the best (according to the Stack Overflow Survey)

planet postgresql - 2019-05-29(Wed) 09:00:00

Stack Overflow 2019 Survey results are available, and PostgreSQL is almost leading in the database field.

PostgreSQL is almost the best (according to the Stack Overflow Survey)

According to the 2019 survey by Stack Overflow, available here, PostgreSQL is the second top database, slightly ahead of Microsoft SQL Server and clearly ahead of Oracle. And this is true both for community and professional users who took the survey.

PostgreSQL has kept its high position year after year, which means the database keeps growing as a professional choice. In particular, in professional users’ opinion PostgreSQL is more widely used, while MySQL and MS SQL lose some points.

Category: postgresql

Dave Cramer: Deploying Active-Active PostgreSQL on Kubernetes

planet postgresql - 2019-05-29(Wed) 05:06:15

Kubernetes is a very popular container orchestration framework. I’ll show you how to get Symmetric-DS working on a single Kubernetes instance.

I had previously explored how to build an active-active PostgreSQL cluster using Symmetric-DS.  The steps are essentially the same on Kubernetes:

  1. Start 2 PostgreSQL pods
  2. Create a user and a database on each pod/instance
  3. Start the primary symmetric-ds pod
  4. Add the symmetric-ds triggers and routes
  5. Open registration
  6. Start the secondary symmetric-ds pod

However, there are some interesting nuances I discovered while building out this setup, which I will discuss below. Before diving into the details, I'd also like to credit my colleagues Patrick McLaughlin and Simon Nielly as co-authors for helping me to put together this guide.

Anyway, let's explore how to deploy an active-active PostgreSQL cluster on Kubernetes.

Category: postgresql

Umair Shahid: Postgres is the coolest database – Reason #4: It is extendable

planet postgresql - 2019-05-28(Tue) 17:50:10
PostgreSQL is packed with features. What may not be part of the core is available as extensions. What are extensions – you ask? PostgreSQL exposes APIs that are designed to easily allow external programs to load into the database and function just like core features. So if you find that you need a feature in […]
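The "load into the database and function just like core features" point can be shown with one of the standard contrib extensions. A minimal sketch (pg_stat_statements additionally needs to be listed in shared_preload_libraries before the server starts):

```sql
-- Install an extension into the current database with a single command.
CREATE EXTENSION pg_stat_statements;

-- Once installed, it behaves like a built-in feature: here, a view of the
-- most frequently executed statements.
SELECT query, calls FROM pg_stat_statements ORDER BY calls DESC LIMIT 5;
```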
Category: postgresql

Paolo Melchiorre: Upgrading PostgreSQL from version 10 to 11 on Ubuntu 19.04 (Disco Dingo)

planet postgresql - 2019-05-28(Tue) 07:00:00

A how-to guide for upgrading PostgreSQL from version 10 to 11 on Ubuntu, after the distribution's upgrade from version 18.10 to 19.04.

Category: postgresql

Paul Ramsey: Parallel PostGIS and PgSQL 12

planet postgresql - 2019-05-28(Tue) 01:00:00

For the last couple years I have been testing out the ever-improving support for parallel query processing in PostgreSQL, particularly in conjunction with the PostGIS spatial extension. Spatial queries tend to be CPU-bound, so applying parallel processing is frequently a big win for us.

Initially, the results were pretty bad.

  • With PostgreSQL 10, it was possible to force some parallel queries by jimmying with global cost parameters, but nothing would execute in parallel out of the box.
  • With PostgreSQL 11, we got support for parallel aggregates, and those tended to parallelize in PostGIS right out of the box. However, parallel scans still required some manual alterations to PostGIS function costs, and parallel joins were basically impossible to force no matter what knobs you turned.

With PostgreSQL 12 and PostGIS 3, all that has changed. All standard query types now readily parallelize using our default costings. That means parallel execution of:

  • Parallel sequence scans,
  • Parallel aggregates, and
  • Parallel joins!!

PostgreSQL 12 and PostGIS 3 have finally cracked the parallel spatial query execution problem, and all major queries execute in parallel without extraordinary interventions.
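A quick way to verify this on your own setup is to look at the query plan for a spatial aggregate and check for Parallel Seq Scan and Partial Aggregate nodes. A minimal sketch (the table name polygons and its geom column are illustrative):

```sql
-- On PostgreSQL 12 + PostGIS 3, a plain spatial aggregate should produce a
-- parallel plan out of the box; look for "Gather", "Partial Aggregate" and
-- "Parallel Seq Scan" nodes in the output.
EXPLAIN (COSTS OFF)
SELECT Sum(ST_Area(geom)) FROM polygons;
```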

What Changed

With PostgreSQL 11, most parallelization worked, but only at much higher function costs than we could apply to PostGIS functions. With higher PostGIS function costs, other parts of PostGIS stopped working, so we were stuck in a Catch-22: improve costing and break common queries, or leave things working with non-parallel behaviour.

For PostgreSQL 12, the core team (in particular Tom Lane) provided us with a sophisticated new way to add spatial index functionality to our key functions. With that improvement in place, we were able to globally increase our function costs without breaking existing queries. That in turn has signalled the parallel query planning algorithms in PostgreSQL to parallelize spatial queries more aggressively.


In order to run these tests yourself, you will need:

  • PostgreSQL 12
  • PostGIS 3
Category: postgresql

Chris Travers: Table Inheritance: What's it Good For?

planet postgresql - 2019-05-26(Sun) 17:03:00
Table inheritance is one of the most misunderstood -- and powerful -- features of PostgreSQL.  With it, certain kinds of hard problems become easy.  While many folks who have been bitten by table inheritance tend to avoid the feature, this blog post is intended to provide a framework for reasoning about when table inheritance is actually the right tool for the job.

Table inheritance is, to be sure, a power tool, and thus something to use only when it brings an overall reduction in complexity to the design. Moreover, the current documentation doesn't provide much guidance regarding what the tool actually helps with and where the performance costs are, and because inheritance sits orthogonal to relational design, working this out individually is very difficult.

This blog post covers uses of table inheritance which simplify overall database design and are not addressed by declarative partitioning, because they are used in areas other than table partitioning.

Table Inheritance Explained

PostgreSQL provides the ability for tables to exist in an inheritance directed acyclic graph.  Columns provided by parent tables are merged in name and type into the child table.  Altering a parent table to add a column thus cascades the operation to all child tables, though if any child table has a column with the same name and a different type, the operation will fail.

Inheritance, Tables, and Types

Every table in PostgreSQL has a corresponding composite type, and any table can be implicitly cast to any parent table.  This is transitive.  Combined with tuple processing functions, this gives you a number of very powerful ways of working with data at various different levels of scale.

Indexes and foreign keys are not inherited.  Check constraints are inherited unless set to NO INHERIT.

Inheritance and Querying

When a table is queried, by default all child tables are also queried and their results appended to the result.  Because of constraint exclusion processing, this takes out an ACCESS SHARE lock on all child tables at pl[...]
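The mechanics described above can be sketched in a few statements. The vehicle/car names are illustrative, not from the post:

```sql
-- Child tables merge in the parent's columns by name and type.
CREATE TABLE vehicle ( id SERIAL PRIMARY KEY, make TEXT, model TEXT );
CREATE TABLE car ( doors INT ) INHERITS ( vehicle );

INSERT INTO car ( make, model, doors ) VALUES ( 'Honda', 'Civic', 4 );

-- Querying the parent also returns rows from the child tables,
-- and the rows come back typed as the parent's composite type:
SELECT make, model FROM vehicle;

-- ONLY restricts the scan to the parent table itself:
SELECT make, model FROM ONLY vehicle;
```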
Category: postgresql

Regina Obe: PostGIS 3.0.0alpha1

planet postgresql - 2019-05-26(Sun) 09:00:00

The PostGIS development team is pleased to release PostGIS 3.0.0alpha1.

This release requires PostgreSQL 9.5-12beta1 and GEOS >= 3.6

Best served with PostgreSQL 12beta1.

Category: postgresql

Fernando Laudares Camargos: An Overview of Sharding in PostgreSQL and How it Relates to MongoDB’s

planet postgresql - 2019-05-24(Fri) 17:44:38

A couple of weeks ago I presented at Percona University São Paulo about the new features in PostgreSQL that allow the deployment of simple shards. I’ve tried to summarize the main points in this post, as well as providing an introductory overview of sharding itself. Please note I haven’t included any third-party extensions that provide sharding for PostgreSQL in my discussion below.

Partitioning in PostgreSQL

In a nutshell, until not long ago there wasn’t a dedicated, native feature in PostgreSQL for table partitioning. Not that this prevented people from doing it anyway: the PostgreSQL community is very creative. There’s a table inheritance feature in PostgreSQL that allows the creation of child tables with the same structure as a parent table. That, combined with the employment of proper constraints in each child table along with the right set of triggers in the parent table, has provided practical “table partitioning” in PostgreSQL for years (and still works). Here’s an example:

Using table inheritance

CREATE TABLE temperature (
    id BIGSERIAL PRIMARY KEY NOT NULL,
    city_id INT NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    temp DECIMAL(5,2) NOT NULL
);

Figure 1a. Main (or parent) table

CREATE TABLE temperature_201901 (CHECK (timestamp >= DATE '2019-01-01' AND timestamp <= DATE '2019-01-31')) INHERITS (temperature);
CREATE TABLE temperature_201902 (CHECK (timestamp >= DATE '2019-02-01' AND timestamp <= DATE '2019-02-28')) INHERITS (temperature);
CREATE TABLE temperature_201903 (CHECK (timestamp >= DATE '2019-03-01' AND timestamp <= DATE '2019-03-31')) INHERITS (temperature);

Figure 1b. Child tables inherit the structure of the parent table and are limited by constraints

CREATE OR REPLACE FUNCTION temperature_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.timestamp >= DATE '2019-01-01' AND NEW.timestamp <= DATE '2019-01-31' ) THEN
        INSERT INTO temperature_201901 VALUES (NEW.*);
    ELSIF ( NEW.timestamp >= DATE '2019-02-01' AND NEW.timestamp <= DATE '2019-02-28' ) THEN
        INSERT INTO temperature_201902 [...]
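The excerpt cuts off before the trigger is attached to the parent table. A typical way to finish the setup described above (a sketch, not the post's exact code; the trigger function conventionally ends with RETURN NULL so the row is not also stored in the parent):

```sql
-- Route INSERTs on the parent table through the routing function.
CREATE TRIGGER insert_temperature_trigger
    BEFORE INSERT ON temperature
    FOR EACH ROW EXECUTE PROCEDURE temperature_insert_trigger();

-- Rows inserted into the parent then land in the matching child table:
INSERT INTO temperature ( city_id, timestamp, temp )
VALUES ( 1, '2019-02-14 10:00:00', 21.50 );
```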
Category: postgresql

Lukas Fittl: Managing multiple databases in Rails 6

planet postgresql - 2019-05-24(Fri) 00:30:00

If you’ve worked with Ruby on Rails, you likely have some understanding of how your database works with Rails. Traditionally that has always meant specifying a single database per environment in your config/database.yml, possibly together with an environment setting like DATABASE_URL. Based on that configuration, all reads and writes access that database.

With Rails 6 this is about to change, thanks to the work of Eileen M. Uchitelle together with contributors from GitHub, Basecamp and Shopify. In the upcoming Rails 6 (currently in RC1), you will be able to easily change which database server you are connecting to, to support a variety of scenarios such as using read replicas and splitting your database into dedicated components.

The most interesting part, which we wanted to detail in this post, relates to configuring automatic queries against a read replica, or follower database.

First, let’s see how we can configure an additional database in Rails 6. In your config/database.yml, you can now specify multiple connections like this:

production:
  primary:
    <<: *defaults
    url: <%= ENV['DATABASE_URL'] %>
  follower:
    <<: *defaults
    url: <%= ENV['FOLLOWER_DATABASE_URL'] %>

Next, we configure our model to use this new database connection. There are two default connection types, called “reading” and “writing”, for which we’d want to specify the database it should connect to:

class ApplicationRecord < ActiveRecord::Base
  connects_to database: { writing: :primary, reading: :follower }
end

Now, this doesn’t actually change anything yet. By default all queries would still go to the writing database, even if they are read queries.

First, if we want to verify this is working manually, we can enforce the use of the read database like this:

ActiveRecord::Base.connected_to(role: :reading) do
  puts MyObject.count
end

But that would be a lot of work to modify our application code. Instead, we can tell Rails to automatically utilize the reading role for GET and HEAD requests (which are not sup

Category: postgresql

Minutes of the 161st Board Meeting (2019-5) news - 2019-05-23(Thu) 19:14:22
Posted by anzai, 2019/05/23 (Thu) 19:14
Category: postgresql

2019-02-02: 36th PostgreSQL Study Session (勉強会) news - 2019-05-23(Thu) 15:20:04
Posted by sakata.tetsuo, 2019/05/23 (Thu) 15:20
Category: postgresql

Pavel Stehule: new bugfix release of plpgsql_check

planet postgresql - 2019-05-21(Tue) 14:23:00
Last week I fixed some critical bugs in the profiler integrated into plpgsql_check.

The integrated profiler is now tested on a real, bigger project.
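For readers who haven't used the extension: plpgsql_check's main entry point analyzes a PL/pgSQL function without executing it. A minimal usage sketch (f1() is a hypothetical function name):

```sql
-- Report errors and warnings found in the body of f1()
-- without running it.
SELECT * FROM plpgsql_check_function('f1()');
```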
Category: postgresql

Dave Conlin: Multi-column indexes

planet postgresql - 2019-05-20(Mon) 21:08:11

This post covers standard Postgres B-tree indexes. To make it easier to read, I’ve just referred to “indexes” throughout, but it’s worth bearing in mind that the other Postgres index types behave very differently.

At pgMustard, we’ve been looking into multi-column indexes recently: particularly what problems they can cause and what advice we can offer to fix those problems. So I thought that now would be a good time to talk about their disadvantages, how to tell if they’re letting you down, and the alternatives.

In my introduction to indexing, I talked about multi-column indexes and how useful they can be if you have an expensive sort that you want to prepare in advance. They can also be used to speed up retrieving rows that match a filter or constraint, but unfortunately it’s not as simple as “I need to query on columns a, b and c, so I’ll make a multi-column index on a, b and c”.

When multi-column indexes fall flat

Suppose I have a table full of information on fiction books and I want to query it by author. You can imagine my index as a filing cabinet full of index cards, organised in alphabetical order by author, with dividers that make it easy to jump to any point in the catalogue. So if I want to find all the John le Carré books, I look him up under “C” and find a list of excellent spy novels. Great.

Now suppose I want to add another column to my index — publication date. If I add it as a second column after the author, then this is similar to sorting all the cards in my filing cabinet, first by author, and then by publication date within each author.

This works fine if I want to know, for example, all the Ursula K. le Guin books published after 1970. Turn to “Guin, Ursula K. le”, and then within that section, take all the cards that are after the first of January 1970. Sweet.

But what if I want to know which books in my library were published before 1700? I have to go through every single author and look up which of their books are pre-1700. The index won’t save us much time; in fact
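The filing-cabinet analogy translates directly into SQL. A sketch with an illustrative schema (books, author, published are my names, not the post's):

```sql
CREATE TABLE books ( author TEXT, published DATE, title TEXT );
CREATE INDEX books_author_published_idx ON books ( author, published );

-- Can use the index: the leading column is constrained, so the scan
-- jumps straight to one author's section.
SELECT title FROM books
WHERE author = 'Ursula K. le Guin' AND published > '1970-01-01';

-- Cannot use the index efficiently: the leading column is unconstrained,
-- so every author's section would have to be visited.
SELECT title FROM books
WHERE published < '1700-01-01';
```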

Category: postgresql