Planet PostgreSQL

Luca Ferrari: Normalize to save space

2019-05-31 (Fri) 09:00:00

It is no surprise at all: a normalized database requires less space on disk than a non-normalized one.

Normalize to save space

Sometimes you get a database that Just Works (tm) but whose data is not normalized. I’m not a big fan of data normalization; I mean, it surely matters, but I don’t tend to “over-normalize” data ahead of design. However, one of my databases was growing more and more because of a table with a few repeated extra pieces of information.
Of course a normalized database gives you back some disk space at the cost of joins during query execution, but having a decent server and a small join table is enough to sleep at night!
Let’s see what we are talking about:

mydb=# select pg_size_pretty( pg_database_size( 'mydb' ) );
 pg_size_pretty
----------------
 13 GB
(1 row)

Ok, 13 GB is not something scary; let’s say it is a fair database to work on (please note the size is reported after a full VACUUM). In this database, I have a table root that handles a lot of data from hardware sensors; the table is of course partitioned on a time-based scale. One thing the table was storing was information about the sensor name, a text string repeated over and over in the child tables too. While this was not a problem in the beginning, it was wasting space over time.

Shame on me!

Let’s go normalize the table!
Normalizing a table is quite straightforward, and I’m not interested in sharing details here. Let’s say this was quite easy because my...
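
For the record, the general shape of such a normalization is simple; here is a minimal sketch, assuming the repeated string lives in a hypothetical sensor_name column of root:

-- extract the repeated value into a lookup table
CREATE TABLE sensor( pk serial PRIMARY KEY, name text NOT NULL UNIQUE );
INSERT INTO sensor( name ) SELECT DISTINCT sensor_name FROM root;

-- point root at the lookup table and drop the repeated text
ALTER TABLE root ADD COLUMN sensor_pk int REFERENCES sensor( pk );
UPDATE root r SET sensor_pk = s.pk FROM sensor s WHERE s.name = r.sensor_name;
ALTER TABLE root DROP COLUMN sensor_name;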

Category: postgresql

Dimitri Fontaine: Introducing pg_auto_failover: A high availability and automated failover Postgres extension

2019-05-30 (Thu) 23:02:00

As part of the Citus team (Citus scales out Postgres horizontally, but that’s not all we work on), I’ve been working on pg_auto_failover for quite some time now and I’m excited that we have now introduced pg_auto_failover as Open Source, to give you automated failover and high availability!

When designing pg_auto_failover, our goal was this: to provide an easy-to-set-up Business Continuity solution for Postgres that implements fault tolerance of any one node in the system. The documentation chapter about the pg_auto_failover architecture includes the following:

It is important to understand that pg_auto_failover is optimized for Business Continuity. In the event of losing a single node, pg_auto_failover is capable of continuing the PostgreSQL service, and prevents any data loss when doing so, thanks to PostgreSQL Synchronous Replication.

Introduction to pg_auto_failover

The pg_auto_failover solution for Postgres is meant to provide an easy-to-set-up and reliable automated failover solution. This solution includes software-driven decision making for when to implement failover in production.

The most important part of any automated failover system is the decision-making policy, and we have a whole documentation chapter online about pg_auto_failover fault tolerance mechanisms.

When using pg_auto_failover, multiple active agents are deployed to keep track of your production Postgres setup properties:

  • the monitor, a Postgres database itself equipped with the pg_auto_failover extension, registers and checks the health of the active Postgres nodes.

  • each Postgres node that is registered in the pg_auto_failover monitor must also run a local agent, the pg_autoctl run service.

  • each Postgres service that is managed has two Postgres nodes set up together in the same group. A single monitor setup may manage as many Postgres groups as needed.

With such a deployment, the monitor connects to every registered node on a regular schedule (20s by default) and registers success or failure in its pgautofailover.node table.
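
On the monitor, that state can be inspected directly; a minimal sketch (the table name comes from the paragraph above, and its exact column layout is not reproduced here):

-- connect to the monitor database and list registered nodes
SELECT * FROM pgautofailover.node;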

In addition [...]
Category: postgresql

Dave Page: Avoiding Gmail's confidential mode

2019-05-30 (Thu) 06:11:00
So this is one of the very few (maybe the first?) blog entries I've written that aren't directly related to PostgreSQL; however, it does affect how I (and others) may work on the project.

Last night I received an email from Google about my personal G Suite account, which I use for all my day-to-day email, most of which is related to work on pgAdmin and PostgreSQL. Google were proudly announcing the rollout of their new Gmail Confidential Mode update. If you've not come across this yet, essentially what it does is allow users to send emails that can be deleted or expired after a certain amount of time, optionally require SMS verification to open them, and prevent printing (but not screen-shots, of course), forwarding, downloading, etc.

When using the Gmail web interface, this all works fairly seamlessly, and I can see why some people would want it if that's all they use. However, like many people, I also use other clients, for example via IMAP. In that case, instead of the original email, Gmail sends a placeholder message containing a link that lets you log in to Google and view the content online (assuming the SMS verification passes and the message hasn't been deleted or expired, of course). That's going to be quite inconvenient for me; besides which, I really don't want anyone to be able to control access to emails they've sent me after I've received them.

There's another problem affecting PostgreSQL's mailing lists however. How long will it be until someone sends such a message to one of the PostgreSQL lists, where it will do nothing but clutter up the archives and annoy other users (who won't be able to read the message anyway as they won't be able to login to Google as pgsql-hackers@postgresql.org or whatever the list address was)?

Fixing the PostgreSQL mail servers

After some discussion with some of the PostgreSQL sysadmin team, we discovered that Gmail adds a header to messages that have confidential mode enabled (X-Gm-Locker: <token>). This is easy for us to [...]
Category: postgresql

elein mustain: Swoop de Dupe

2019-05-30 (Thu) 01:31:36
The problem: duplicate rows

Once upon a time, there was a database person who knows about referential integrity and its importance for creating database tables. The interesting part is that referential integrity, including primary keys, enables keeping to the standard of not having duplicate rows. However, they just wanted a private little table on their […]
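
The excerpt stops there, but the classic Postgres approach to sweeping duplicates out of a table with no primary key is worth sketching (the table and columns here are hypothetical, not from the post):

-- keep one physical row per duplicate group, using the system column ctid
DELETE FROM mytable a
USING mytable b
WHERE a.ctid < b.ctid
  AND a.col1 = b.col1
  AND a.col2 = b.col2;
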
Category: postgresql

Luca Ferrari: PostgreSQL is almost the best (according to Stack Overflow Survey)

2019-05-29 (Wed) 09:00:00

Stack Overflow 2019 Survey results are available, and PostgreSQL is almost leading in the database field.

PostgreSQL is almost the best (according to Stack Overflow Survey)

According to the 2019 survey made by Stack Overflow and available here, PostgreSQL is the second top database, slightly ahead of Microsoft SQL Server and clearly ahead of Oracle. And this is true both for community and professional users that took the survey.



PostgreSQL is keeping its high position year after year, which means the database is growing as a professional choice. In particular, among professional users PostgreSQL is used more, while MySQL and MS SQL lose some points.

Category: postgresql

Dave Cramer: Deploying Active-Active PostgreSQL on Kubernetes

2019-05-29 (Wed) 05:06:15

Kubernetes is a very popular container orchestration framework. I’ll show you how to get Symmetric-DS working on a single Kubernetes instance.

I had previously explored how to build an active-active PostgreSQL cluster using Symmetric-DS.  The steps are essentially the same on Kubernetes:

  1. Start 2 PostgreSQL pods
  2. Create a user and a database on each pod/instance (see the SQL sketch after this list)
  3. Start the primary symmetric-ds pod
  4. Add the symmetric-ds triggers and routes
  5. Open registration
  6. Start the secondary symmetric-ds pod
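
Step 2, for instance, is plain SQL run once per instance; a minimal sketch (the user, password, and database names are hypothetical):

CREATE USER symmetricds WITH PASSWORD 'changeme';
CREATE DATABASE appdb OWNER symmetricds;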

However, there are some interesting nuances I discovered while building out this setup, which I will discuss below. Before diving into the details, I'd also like to credit my colleagues Patrick McLaughlin and Simon Nielly as co-authors for helping me to put together this guide.

Anyway, let's explore how to deploy an active-active PostgreSQL cluster on Kubernetes.

Category: postgresql

Umair Shahid: Postgres is the coolest database – Reason #4: It is extendable

2019-05-28 (Tue) 17:50:10
PostgreSQL is packed with features. What may not be part of the core is available as extensions. What are extensions, you ask? PostgreSQL exposes APIs that are designed to easily allow external programs to load into the database and function just like core features. So if you find that you need a feature in […]
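
From the user's side, loading an extension is a single statement; for example (assuming the extension's package is installed on the server):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
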
Category: postgresql

Paolo Melchiorre: Upgrading PostgreSQL from version 10 to 11 on Ubuntu 19.04 (Disco Dingo)

2019-05-28 (Tue) 07:00:00

Howto guide for upgrading PostgreSQL from version 10 to 11 on Ubuntu, after upgrading Ubuntu itself from version 18.10 to 19.04.

Category: postgresql

Paul Ramsey: Parallel PostGIS and PgSQL 12

2019-05-28 (Tue) 01:00:00

For the last couple of years I have been testing out the ever-improving support for parallel query processing in PostgreSQL, particularly in conjunction with the PostGIS spatial extension. Spatial queries tend to be CPU-bound, so applying parallel processing is frequently a big win for us.

Initially, the results were pretty bad.

  • With PostgreSQL 10, it was possible to force some parallel queries by jimmying with global cost parameters, but nothing would execute in parallel out of the box.
  • With PostgreSQL 11, we got support for parallel aggregates, and those tended to parallelize in PostGIS right out of the box. However, parallel scans still required some manual alterations to PostGIS function costs, and parallel joins were basically impossible to force no matter what knobs you turned.

With PostgreSQL 12 and PostGIS 3, all that has changed. All standard query types now readily parallelize using our default costings. That means parallel execution of:

  • Parallel sequence scans,
  • Parallel aggregates, and
  • Parallel joins!!
TL;DR:

PostgreSQL 12 and PostGIS 3 have finally cracked the parallel spatial query execution problem, and all major queries execute in parallel without extraordinary interventions.

What Changed

With PostgreSQL 11, most parallelization worked, but only at much higher function costs than we could apply to PostGIS functions. With higher PostGIS function costs, other parts of PostGIS stopped working, so we were stuck in a Catch-22: improve costing and break common queries, or leave things working with non-parallel behaviour.

For PostgreSQL 12, the core team (in particular Tom Lane) provided us with a sophisticated new way to add spatial index functionality to our key functions. With that improvement in place, we were able to globally increase our function costs without breaking existing queries. That in turn has signalled the parallel query planning algorithms in PostgreSQL to parallelize spatial queries more aggressively.
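
A quick way to check whether this kicks in on your own setup is to look for Gather nodes in the plan of a spatial join; a sketch (the table and column names are hypothetical):

EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(*)
FROM polys a
JOIN polys b ON ST_Intersects( a.geom, b.geom );
-- a parallel plan shows "Gather" and "Workers Launched" entries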

Setup

In order to run these tests yourself, you will need:

  • PostgreSQL 12
  • PostGIS 3
[...]
Category: postgresql

Chris Travers: Table Inheritance: What's it Good For?

2019-05-26 (Sun) 17:03:00
Table inheritance is one of the most misunderstood -- and powerful -- features of PostgreSQL.  With it, certain kinds of hard problems become easy.  While many folks who have been bitten by table inheritance tend to avoid the feature, this blog post is intended to provide a framework for reasoning about when table inheritance is actually the right tool for the job.

Table inheritance is, to be sure, a power tool and thus something to use only when it brings an overall reduction in complexity to the design.  Moreover, the current documentation doesn't provide a lot of guidance regarding what the tool actually helps with and where the performance costs lie, and because inheritance sits orthogonal to relational design, working this out individually is very difficult.

This blog post covers uses of table inheritance which simplify overall database design and are not addressed by declarative partitioning, because they are used in areas other than table partitioning.

Table Inheritance Explained

PostgreSQL provides the ability for tables to exist in an inheritance directed acyclic graph.  Columns provided by parent tables are merged in name and type into the child table.  Altering a parent table to add a column thus cascades the operation to all child tables, though if any child table has a column with the same name and a different type, the operation will fail.
Inheritance, Tables, and Types

Every table in PostgreSQL has a corresponding composite type, and any table can be implicitly cast to any parent table.  This is transitive.  Combined with tuple processing functions, this gives you a number of very powerful ways of working with data at various different levels of scale.
Indexes and foreign keys are not inherited.  Check constraints are inherited unless set to NO INHERIT.
Inheritance and Querying

When a table is queried, by default all child tables are also queried and their results appended to the result.  Because of constraint exclusion processing, this takes out an ACCESS SHARE lock on all child tables at pl[...]
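
A minimal illustration of these mechanics, with hypothetical tables:

CREATE TABLE vehicle( id serial PRIMARY KEY, make text NOT NULL );
CREATE TABLE car( doors int NOT NULL ) INHERITS (vehicle);

INSERT INTO car( make, doors ) VALUES ('Fiat', 5);

-- querying the parent includes child rows by default;
-- ONLY restricts the scan to the parent itself
SELECT * FROM vehicle;
SELECT * FROM ONLY vehicle;
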
Category: postgresql

Regina Obe: PostGIS 3.0.0alpha1

2019-05-26 (Sun) 09:00:00

The PostGIS development team is pleased to release PostGIS 3.0.0alpha1.

This release requires PostgreSQL 9.5-12beta1 and GEOS >= 3.6

Best served with PostgreSQL 12beta1.

Category: postgresql

Fernando Laudares Camargos: An Overview of Sharding in PostgreSQL and How it Relates to MongoDB’s

2019-05-24 (Fri) 17:44:38

A couple of weeks ago I presented at Percona University São Paulo about the new features in PostgreSQL that allow the deployment of simple shards. I’ve tried to summarize the main points in this post, as well as providing an introductory overview of sharding itself. Please note I haven’t included any third-party extensions that provide sharding for PostgreSQL in my discussion below.

Partitioning in PostgreSQL

In a nutshell, until not long ago there wasn’t a dedicated, native feature in PostgreSQL for table partitioning. Not that that prevented people from doing it anyway: the PostgreSQL community is very creative. There’s a table inheritance feature in PostgreSQL that allows the creation of child tables with the same structure as a parent table. That, combined with the employment of proper constraints in each child table along with the right set of triggers in the parent table, has provided practical “table partitioning” in PostgreSQL for years (and still works). Here’s an example:

Using table inheritance

CREATE TABLE temperature (
    id BIGSERIAL PRIMARY KEY NOT NULL,
    city_id INT NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    temp DECIMAL(5,2) NOT NULL
);

Figure 1a. Main (or parent) table

CREATE TABLE temperature_201901 (
    CHECK (timestamp >= DATE '2019-01-01' AND timestamp <= DATE '2019-01-31')
) INHERITS (temperature);

CREATE TABLE temperature_201902 (
    CHECK (timestamp >= DATE '2019-02-01' AND timestamp <= DATE '2019-02-28')
) INHERITS (temperature);

CREATE TABLE temperature_201903 (
    CHECK (timestamp >= DATE '2019-03-01' AND timestamp <= DATE '2019-03-31')
) INHERITS (temperature);

Figure 1b. Child tables inherit the structure of the parent table and are limited by constraints

CREATE OR REPLACE FUNCTION temperature_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.timestamp >= DATE '2019-01-01' AND NEW.timestamp <= DATE '2019-01-31' ) THEN
        INSERT INTO temperature_201901 VALUES (NEW.*);
    ELSIF ( NEW.timestamp >= DATE '2019-02-01' AND NEW.timestamp <= DATE '2019-02-28' ) THEN
        INSERT INTO temperature_201902 [...]
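
The excerpt is cut off here, but for contrast with the inheritance approach above, the declarative partitioning syntax available since PostgreSQL 10 expresses the same layout without triggers (a sketch using the same hypothetical table):

CREATE TABLE temperature (
    id BIGSERIAL NOT NULL,
    city_id INT NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    temp DECIMAL(5,2) NOT NULL
) PARTITION BY RANGE (timestamp);

CREATE TABLE temperature_201901 PARTITION OF temperature
    FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');
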
Category: postgresql

Lukas Fittl: Managing multiple databases in Rails 6

2019-05-24 (Fri) 00:30:00

If you’ve worked with Ruby on Rails you likely have some understanding of how your database works with Rails. Traditionally, that has always meant specifying a single database per environment in your config/database.yml, possibly together with an environment setting like DATABASE_URL. Based on that configuration, all reads and writes access the same database.

With Rails 6 this is about to change, thanks to the work of Eileen M. Uchitelle together with contributors from GitHub, Basecamp and Shopify. In the upcoming Rails 6 (currently in RC1), you will be able to easily change which database server you are connecting to, to support a variety of scenarios such as using read replicas and splitting your database into dedicated components.

The most interesting part, which we wanted to detail in this post, is related to configuring automatic queries against a read replica, or follower database.

First, let’s see how we can configure an additional database in Rails 6. In your config/database.yml, you can now specify multiple connections like this:

production:
  primary:
    <<: *defaults
    url: <%= ENV['DATABASE_URL'] %>
  follower:
    <<: *defaults
    url: <%= ENV['FOLLOWER_DATABASE_URL'] %>

Next, we configure our model to use this new database connection. There are two default connection types, called “reading” and “writing”, for which we’d want to specify the database it should connect to:

class ApplicationRecord < ActiveRecord::Base
  connects_to database: { writing: :primary, reading: :follower }
end

Now, this doesn’t actually change anything yet. By default all queries would still go to the writing database, even if they are read queries.

First, if we want to verify this is working manually, we can enforce the use of the read database like this:

ActiveRecord::Base.connected_to(role: :reading) do
  puts MyObject.count
end

But that would be a lot of work to modify our application code. Instead, we can tell Rails to automatically utilize the reading role for GET and HEAD requests (which are not sup

[...]
Category: postgresql

Pavel Stehule: new bugfix release of plpgsql_check

2019-05-21 (Tue) 14:23:00
Last week I fixed some critical bugs in the profiler integrated into plpgsql_check.

The integrated profiler is now tested on a real, bigger project.
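
For anyone who has not tried it, a hedged sketch of exercising the integrated profiler (assuming a current plpgsql_check build; myfunc is a hypothetical function):

LOAD 'plpgsql_check';
SET plpgsql_check.profiler TO on;
SELECT myfunc();   -- run the function so the profiler collects data
SELECT * FROM plpgsql_profiler_function_tb('myfunc()');
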
Category: postgresql

Dave Conlin: Multi-column indexes

2019-05-20 (Mon) 21:08:11

This post covers standard Postgres B-tree indexes. To make it easier to read, I’ve just referred to “indexes” throughout, but it’s worth bearing in mind that the other Postgres index types behave very differently.

At pgMustard, we’ve been looking into multi-column indexes recently: particularly what problems they can cause and what advice we can offer to fix those problems. So I thought that now would be a good time to talk about their disadvantages, how to tell if they’re letting you down, and the alternatives.

In my introduction to indexing, I talked about multi-column indexes and how useful they can be if you have an expensive sort that you want to prepare in advance. They can also be used to speed up retrieving rows that match a filter or constraint, but unfortunately it’s not as simple as “I need to query on columns a, b and c, so I’ll make a multi-column index on a, b and c”.

When multi-column indexes fall flat

Suppose I have a table full of information on fiction books and I want to query it by author. Then you can imagine my index is a filing cabinet full of index cards, organised in alphabetical order by author, with dividers that make it easy to jump to any point in the catalogue. So if I want to find all the John le Carré books, I look him up under “C” and find a list of excellent spy novels. Great.

Now suppose I want to add another column to my index — publication date. If I add it as a second column after the author, then this is similar to sorting all the cards in my filing cabinet, first by author, and then by publication date within each author.

This works fine if I want to know, for example, all the Ursula K. le Guin books published after 1970. Turn to “Guin, Ursula K. le”, and then within that section, take all the cards that are after the first of January 1970. Sweet.

But what if I want to know which books in my library were published before 1700? I have to go through every single author and look up which of their books are pre-1700. The index won’t save us much time, in fact

[...]
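
To make the filing-cabinet analogy concrete in SQL (a sketch; the table and columns are hypothetical):

CREATE INDEX books_author_pub_idx ON books (author, publication_date);

-- served well: the leading column is constrained
SELECT * FROM books
WHERE author = 'Ursula K. le Guin'
  AND publication_date >= '1970-01-01';

-- served poorly: the leading column is unconstrained,
-- so the whole index (or table) must be scanned
SELECT * FROM books
WHERE publication_date < '1700-01-01';
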
Category: postgresql

pgCMH - Columbus, OH: Users, Roles, and Permissions

2019-05-20 (Mon) 13:00:00

The May meeting will be held at 18:00 EST on Tues, the 28th. Once again, we will be holding the meeting in the community space at CoverMyMeds. Please RSVP on MeetUp so we have an idea on the amount of food needed.

What

Seal Software’s very own Arlette will be presenting this month. She’s going to tell us all about PostgreSQL users, roles, and groups. The talk will discuss best practices around their use and GRANTing permissions, as well as some lessons learned. Arlette will even show us a trick using roles to cut down on having to manage your pg_hba.conf file.

Where

CoverMyMeds has graciously agreed to validate your parking if you use their garage so please park there:

You can safely ignore any sign saying not to park in the garage, as long as it’s after 17:30 when you arrive.

Park in any space that is not marked ‘24 hour reserved’.

Once parked, take the elevator/stairs to the 3rd floor to reach the Miranova lobby. Once in the lobby, the elevator bank is in the back (West side) of the building. Take a left and walk down the hall until you see the elevator bank on your right. Grab an elevator up to the 11th floor. (If the elevator won’t let you pick the 11th floor, contact Doug or CJ (info below)). Once you exit the elevator, look to your left and right; one side will have visible cubicles, the other won’t. Head to the side without cubicles. You’re now in the community space:

The kitchen is to your right (grab yourself a drink) and the meeting will be held to your left. Walk down the room towards the stage.

If you have any issues or questions with parking or the elevators, feel free to text/call Doug at +1.614.316.5079 or CJ at +1.740.407.7043

Category: postgresql

Stefan Fercot: Back from PGDay Belgium

2019-05-20 (Mon) 09:00:00

The first PGDay organized in Belgium by the PgBE PostgreSQL Users Group Belgium brought together around 40 PostgreSQL fans on 17 May 2019 in Leuven. It was a really nice day for me and I’d like to share it with you.

This day was all about PostgreSQL and a big opportunity to meet with other people interested in PostgreSQL in Belgium. The event was suitable for everybody, from first-time users, students, to experts and from clerks to decision-makers.

With no fewer than 10 talks and 2 parallel tracks in the afternoon, the list of speakers was impressive, with a lot of international speakers too.

After we had learned some good advice from Hans-Jürgen Schönig during his Fixing common performance problems talk, Ilya Kosmodemiansky told us how to screw things up with some PostgreSQL Worst practices.

In the afternoon, I chose to stay in the main room, so I could watch Thijs Lemmens showing a demo on how to perform Downtimeless PG upgrades using logical replication. Based on docker and docker-compose, he created 2 PostgreSQL clusters and installed pgAdmin 4 and HAProxy. If you wish to try the demo by yourself, Thijs released everything you need on his GitHub account.

Tomas Vondra next explained to us what CREATE STATISTICS is and when to use it. He also gave us a quick overview of what will be new in PostgreSQL 12 on this topic. It was really interesting to see examples based on Belgian cities.

Then… the pressure was on my shoulders to talk about Streaming Replication, the basics. The idea was to summarize for beginners what WALs are and how Streaming Replication works, and to give some advice and best practices.

Boriss Mejías afterwards re-explained, in his own way, Replication, where things just work but you don’t know how. Even if we had some overlap, the two talks were in fact pretty much complementary.

Unfortunately, I missed:

  • Declarative Table Partitioning - What do I need it for? - Boriss Mejías
  • Dynamically switch between datasources in code - Marco Huygen
  • PostgreSQL Buffers - Vik Fearing
  • Data Vault 2.0 & Pivo
[...]
Category: postgresql

Pavel Stehule: How not to emulate schema (global) variables

2019-05-19 (Sun) 14:00:00
Postgres does not have global variables for PL/pgSQL. This week I worked for one customer who had an emulation of global variables based on temporary tables.

Depending on usage, this implementation can be very slow: with frequent use of temporary tables, the pg_class and pg_attribute catalogs can bloat, and these functions get slower and slower. So don't use it. There is an alternative, much better solution based on custom configuration variables:

Original code:

CREATE OR REPLACE FUNCTION public.setvar(_varname text, _value text)
RETURNS void
LANGUAGE plpgsql
AS $function$
begin
  create temporary table if not exists variables(varname text unique, value text);
  insert into variables (varname, value) values(_varname, _value) on conflict (varname) do update set value = _value;
end;
$function$

CREATE OR REPLACE FUNCTION public.getvar(_varname text)
RETURNS text
LANGUAGE plpgsql
AS $function$
begin
  return (select value from variables where varname = _varname);
exception when others then
  return null;
end;
$function$

There are more slow spots: a) creation of the table (the first time in a session can be slow, or very slow if the system catalog is bloated); b) handling an exception (internally a savepoint is created and dropped every time).

Better code:

CREATE OR REPLACE FUNCTION public.setvar2(_varname text, _value text)
RETURNS void
LANGUAGE plpgsql
AS $function$
begin
  perform set_config('variables.' || _varname, _value, false);
end
$function$

CREATE OR REPLACE FUNCTION public.getvar2(_varname text)
RETURNS text
LANGUAGE plpgsql
AS $function$
begin
  return current_setting('variables.' || _varname, true);
end;
$function$
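
Usage is then symmetric to the temporary-table variant (the variable name and value here are arbitrary):

SELECT public.setvar2('answer', '42');
SELECT public.getvar2('answer');  -- returns '42'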

The differences can be measured by a few synthetic benchmarks (attention: because it is tested on a fresh PostgreSQL instance, the result is the best case for the temporary-table solution; realit[...]
Category: postgresql

Paul Ramsey: Keynote @ FOSS4G NA 2019

2019-05-18 (Sat) 01:00:00

Last month I was invited to give a keynote talk at FOSS4G North America in San Diego. I have been speaking about open source economics at FOSS4G conferences more-or-less every two years since 2009, and I look forward to revisiting the topic regularly: the topic is ever-changing, just like the technology.

In 2009, the central pivot of thought about open source in the economy was professional open source firms in the Red Hat model. Since then we’ve taken a ride through a VC-backed “open core” bubble and are now grappling with an environment where the major cloud platforms are absorbing most of the value of open source while contributing back proportionally quite little.

What will the next two years hold? I dunno! But I have increasingly little faith that a good answer will emerge organically via market forces.

If you liked the video and want to use the materials, the slides are available here under CC BY.

Category: postgresql
