Planet PostgreSQL

Subscribe to the Planet PostgreSQL feed
Updated: 2 hours 25 minutes ago

Paul Ramsey: Proj6 in PostGIS

2019-02-22 (Fri) 22:00:00

Map projection is a core feature of any spatial database, taking coordinates from one coordinate system and converting them to another, and PostGIS has depended on the Proj library for coordinate reprojection support for many years.

For most of those years, the Proj library has been extremely slow moving. New projection systems might be added from time to time, and some bugs fixed, but in general it was easy to ignore. How slow was development? So slow that the version number migrated into the name, and everyone just called it “Proj4”.

No more.

Starting a couple years ago, new developers started migrating into the project, and the pace of development picked up. Proj 5 in 2018 dramatically improved the plumbing in the difficult area of geodetic transformation, and promised to begin changing the API. Only a year later, here is Proj 6, with yet more huge infrastructural improvements, and the new API.

Some of this new work was funded via the GDALBarn project, so thanks go out to the sponsors who invested in this incredibly foundational library, and to GDAL maintainer Even Rouault.

For PostGIS that means we have to accommodate ourselves to the new API. Doing so not only makes it easier to track future releases, but gains us access to the fancy new plumbing in Proj.

For example, Proj 6 provides:

Late-binding coordinate operation capabilities, that takes metadata such as area of use and accuracy into account… This can avoid in a number of situations the past requirement of using WGS84 as a pivot system, which could cause unneeded accuracy loss.

Or, put another way: more accurate results for reprojections that involve datum shifts.

Here’s a simple example, converting from an old NAD27/NGVD29 3D coordinate with height in feet, to a new NAD83/NAVD88 coordinate with height in metres.

SELECT ST_AsText( ST_Transform( ST_SetSRID(geometry('POINT(-100 40 100)'), 7406), 5500));

Note that the height in NGVD29 is 100 feet; converted directly to metres, it would be 30.48 metres. The transformed po

[...]
Category: postgresql

Craig Kerstiens: Thinking in MapReduce, but with SQL

2019-02-22 (Fri) 02:44:00

For those considering Citus, if your use case seems like a good fit, we are often willing to spend some time with you to help you get an understanding of the Citus database and what type of performance it can deliver. We commonly do this in a roughly two-hour pairing session with one of our engineers. We’ll talk through the schema, load up some data, and run some queries. If we have time at the end, it is always fun to load the same data and queries into single-node Postgres and see how we compare. After seeing this for years, I still enjoy seeing performance speed-ups of 10 and 20x over a single-node database, and in some cases as high as 100x.

And the best part is that it doesn’t take heavy re-architecting of data pipelines. All it takes is some data modeling and parallelization with Citus.

The first step is sharding

We’ve talked about this before, but the first key to these performance gains is that Citus splits your data under the covers into smaller, more manageable pieces. These shards (which are standard Postgres tables) are spread across multiple physical nodes, which means that you have more collective horsepower within your system to pull from. When you’re targeting a single shard it is pretty simple: the query is re-routed to the underlying data, and once it gets results it returns them.
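As a hedged sketch of what that looks like in practice (the "events" table and the filter value are hypothetical), Citus distributes a table with a function call, and a filter on the distribution column lets the coordinator route the whole query to a single shard:

CREATE TABLE events (user_id bigint, occurred_at timestamptz, payload jsonb);
SELECT create_distributed_table('events', 'user_id');  -- shard by user_id

-- The filter pins the query to one shard, so it is routed to a single node:
SELECT count(*) FROM events WHERE user_id = 42;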

Thinking in MapReduce

MapReduce has been around for a number of years now, and was popularized by Hadoop. The thing about large scale data is in order to get timely answers from it you need to divide up the problem and operate in parallel. Or you find an extremely fast system. The problem with getting a bigger and faster box is that data growth is outpacing hardware improvements.

MapReduce itself is a framework for splitting up data, shuffling the data to nodes as needed, and then performing the work on a subset of data before recombining for the result. Let’s take an example like counting up total pageviews. If we wanted to leverage MapReduce on this, we would split the pageviews into 4 separate buckets. We could do th

[...]
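The excerpt breaks off above; as a hedged sketch of the decomposition it describes (the per-shard table names and counts are hypothetical), a distributed count is a "map" of per-shard counts followed by a "reduce" that sums the partial results:

-- "map": each worker counts the pageviews in its own shard
SELECT count(*) FROM page_views_102008;  -- runs on worker 1
SELECT count(*) FROM page_views_102009;  -- runs on worker 2

-- "reduce": the coordinator sums the partial counts it collected, e.g.
SELECT sum(partial) FROM (VALUES (120403), (119877)) AS t(partial);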
Category: postgresql

Andrew Staller: If PostgreSQL is the fastest growing database, then why is the community so small?

2019-02-22 (Fri) 02:02:45

The database king continues its reign. For the second year in a row, PostgreSQL is still the fastest growing DBMS.

By comparison, in 2018 MongoDB was the second fastest growing, while Oracle, MySQL, and SQL Server all shrank in popularity.

For those who stay on top of news from database land, this should come as no surprise, given the number of PostgreSQL success stories that have been published recently:

Let’s all pat ourselves on the back, shall we? Not quite yet.

The PostgreSQL community by the numbers

As the popularity of PostgreSQL grows, attendance at community events remains small. This is the case even as more and more organizations and developers embrace PostgreSQL, so from our perspective, there seems to be a discrepancy between the size of the Postgres user base and that of the Postgres community.

The two main PostgreSQL community conferences are Postgres Conference (US) and PGConf EU. Below is a graph of Postgres Conference attendance for the last 5 years, with a projection for the Postgres Conference 2019 event occurring in March.

Last year, PGConf EU had around 500 attendees, a 100% increase since 4 years ago.

Combined, that’s about 1,100 attendees for the two largest conferences within the PostgreSQL community. By comparison, Oracle OpenWorld has about 60,000 attendees. Even MongoDB World had over 2,000 attendees in 2018.

We fully recognize that attendance at Postgres community events will be a portion of the user base. We also really enjoy these events and applaud the organizers for the effort they invest in running them. And in-person events may indeed be a lagging indicator of a system's growth in popularity. Let’s just grow faster!

One relatively new gathering point for the Postgres community is the P

[...]
Category: postgresql

Nickolay Ihalainen: Parallel queries in PostgreSQL

2019-02-21 (Thu) 23:05:21

Modern CPU models have a huge number of cores. For many years, applications have been sending queries to databases in parallel. Where reporting queries deal with many table rows, the ability for a single query to use multiple CPUs helps us with faster execution. Parallel queries, available starting from PostgreSQL 9.6, allow a report query to use many CPUs and finish faster.

The initial implementation of parallel query execution took three years. Parallel support requires code changes in many query execution stages. PostgreSQL 9.6 created an infrastructure for further code improvements, and later versions extended parallel execution support to other query types.
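As a minimal sketch of trying it out (assuming a suitably large table named "big"), you can raise the per-query worker limit for a session and inspect the plan:

SET max_parallel_workers_per_gather = 4;
EXPLAIN (ANALYZE) SELECT count(*) FROM big;
-- A parallel plan shows Finalize Aggregate -> Gather -> Partial Aggregate ->
-- Parallel Seq Scan, with "Workers Planned" / "Workers Launched" lines.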

Limitations
  • Do not enable parallel execution if all CPU cores are already saturated. Parallel execution steals CPU time from other queries and increases their response time.
  • Most importantly, parallel processing significantly increases memory usage with high work_mem values, as each hash join or sort operation takes a work_mem amount of memory.
  • Next, low-latency OLTP queries can’t be made any faster with parallel execution. In particular, queries that return a single row can perform badly when parallel execution is enabled.
  • The TPC-H benchmark is a good reference point for developers: check whether you have similar queries to get the best out of parallel execution.
  • Parallel execution supports only SELECT queries that do not lock rows.
  • Proper indexing might be a better alternative to a parallel sequential table scan.
  • There is no support for cursors or suspended queries.
  • Window functions and ordered-set aggregate functions are non-parallel.
  • There is no benefit for an IO-bound workload.
  • There are no parallel sort algorithms. However, queries with sorts can still be parallel in some aspects.
  • Replace a CTE (WITH …) with a sub-select to support parallel execution (see the sketch after this list).
  • Foreign data wrappers do not currently support parallel execution (but they c
[...]
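As a hedged sketch of the CTE point above (the table and filter are hypothetical): in the PostgreSQL versions discussed here, a CTE acts as an optimization fence and its scan is not parallelized, while the equivalent sub-select is eligible for a parallel plan:

-- Not parallel: the CTE scan cannot run below a Gather node
WITH t AS (SELECT * FROM big WHERE category = 7)
SELECT count(*) FROM t;

-- Eligible for a parallel plan:
SELECT count(*) FROM (SELECT * FROM big WHERE category = 7) AS t;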
Category: postgresql

Bruce Momjian: Trusted and Untrusted Languages

2019-02-21 (Thu) 05:00:01

Postgres supports two types of server-side languages, trusted and untrusted. Trusted languages are available for all users because they have safe sandboxes that limit user access. Untrusted languages are only available to superusers because they lack sandboxes.

Some languages have only trusted versions, e.g., PL/pgSQL. Others have only untrusted ones, e.g., PL/Python. Other languages like Perl have both.

Why would you want to have both trusted and untrusted languages available? Well, trusted languages like PL/Perl limit access to only safe resources, while untrusted languages like PL/PerlU allow access to file system and network resources that would be unsafe for non-superusers, i.e., it would effectively give them the same power as superusers. This is why only superusers can use untrusted languages.
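As a hedged illustration (the function bodies are just examples): a PL/Perl function can be created by any user allowed to use the language, while the untrusted PL/PerlU variant is reserved for superusers because it can reach the file system:

CREATE EXTENSION IF NOT EXISTS plperl;
CREATE FUNCTION add_one(i integer) RETURNS integer
LANGUAGE plperl AS $$ return $_[0] + 1; $$;

CREATE EXTENSION IF NOT EXISTS plperlu;  -- superuser only
CREATE FUNCTION read_file(path text) RETURNS text
LANGUAGE plperlu AS $$
    open(my $fh, '<', $_[0]) or die "cannot open $_[0]";
    local $/;          -- slurp mode: read the whole file at once
    return <$fh>;
$$;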

Continue Reading »

Category: postgresql

Paul Ramsey: Upgrading PostGIS on CentOS 7

2019-02-21 (Thu) 01:38:04

New features and better performance get a lot of attention, but one of the relatively unsung improvements in PostGIS over the past ten years has been inclusion in standard software repositories, making installation of this fairly complex extension a "one click" affair.

Once you've got PostgreSQL/PostGIS installed though, how are upgrades handled? The key is having the right versions in place, at the right time, for the right scenario and knowing a little bit about how PostGIS works.
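As a minimal sketch of the SQL side of such an upgrade (the exact target version depends on what your repository installed): check the running version, then update the extension in each database:

SELECT postgis_full_version();
ALTER EXTENSION postgis UPDATE;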

Category: postgresql

Kaarel Moppel: Looking at MySQL 8 with PostgreSQL goggles on

2019-02-20 (Wed) 18:00:55

First off – I’m not trying to kindle any flame wars here, just trying to broaden my (your) horizons a bit, gather some ideas (maybe I’m missing out on something cool; it’s the most used open-source RDBMS after all), and to somewhat compare the two, despite that being a difficult thing to do correctly / objectively. Also, I’m leaving performance comparisons aside and looking just at the available features, general querying experience, and documentation clarity, as these are, I guess, most important for beginners. So here is just the list of points I made for myself, grouped in no particular order.

Disclaimer: the last time I used MySQL for a personal project was 10 years ago, so basically I’m starting from zero, and I only took one and a half days to get to know it – thus if you see that I’ve gotten something screamingly wrong, please do leave a comment and I’ll change it. Also, my bias in this article probably tends to favour Postgres…but I’m pretty sure a MySQL veteran with good knowledge of its pros and cons could write up something similar about Postgres, so my hope is that you can leave this aside and learn a thing or two about either system.

To run MySQL I used the official Docker image, 8.0.14. Whenever I say MySQL, the default InnoDB engine is meant.

docker run --rm -p 3306:3306 -e MYSQL_ROOT_PASSWORD=root mysql:8

“mysql” CLI (“psql” equivalent) and general querying experience

* When the server requires a password, why doesn’t it just ask for it?

mysql -h 0.0.0.0 -u root   # adding '-p' will fix the error
ERROR 1045 (28000): Access denied for user 'root'@'172.17.0.1' (using password: NO)

* Very poor tab-completion compared to “psql”. Using “mycli” instead makes much sense. I’m myself on CLI-s 99% of the time, so it’s essential.
* A lot fewer shortcut helpers to list tables, views, functions, etc…
* Can’t set “extended output” (columns as rows) permanently, only “auto” and “per query”.
* One does not need to specify a DB to connect to – I find it positive actually, as it’s easy to forget those database names, and once in, one can call “sh

[...]
Category: postgresql

Stefan Fercot: Monitor pgBackRest backups with Nagios

2019-02-20 (Wed) 09:00:00

pgBackRest is a well-known powerful backup and restore tool.

Relying on the status information given by the “info” command, we’ve built a specific plugin for Nagios: check_pgbackrest.

This post will help you discover this plugin; it assumes you already know pgBackRest and Nagios.

Let’s assume we have a PostgreSQL cluster with pgBackRest working correctly.

Given this simple configuration:

[global]
repo1-path=/some_shared_space/
repo1-retention-full=2

[mystanza]
pg1-path=/var/lib/pgsql/11/data

Let’s get the status of our backups with the pgbackrest info command:

stanza: mystanza
    status: ok
    cipher: none

    db (current)
        wal archive min/max (11-1): 00000001000000040000003C/000000010000000B0000004E

        full backup: 20190219-121527F
            timestamp start/stop: 2019-02-19 12:15:27 / 2019-02-19 12:18:15
            wal start/stop: 00000001000000040000003C / 000000010000000400000080
            database size: 3.0GB, backup size: 3.0GB
            repository size: 168.5MB, repository backup size: 168.5MB

        incr backup: 20190219-121527F_20190219-121815I
            timestamp start/stop: 2019-02-19 12:18:15 / 2019-02-19 12:20:38
            wal start/stop: 000000010000000400000082 / 0000000100000004000000B8
            database size: 3.0GB, backup size: 2.9GB
            repository size: 175.2MB, repository backup size: 171.6MB
            backup reference list: 20190219-121527F

        incr backup: 20190219-121527F_20190219-122039I
            timestamp start/stop: 2019-02-19 12:20:39 / 2019-02-19 12:22:55
            wal start/stop: 0000000100000004000000C1 / 0000000100000004000000F4
            database size: 3.0GB, backup size: 3.0GB
            repository size: 180.9MB, repository backup size: 177.3MB
            backup reference list: 20190219-121527F, 20190219-121527F_20190219-121815I

        full backup: 20190219-122255F
            timestamp start/stop: 2019-02-19 12:22:55 / 2019-02-19 12:25:47
            wal start/stop: 000000010000000500000000 / 00000001000000050000003[...]
Category: postgresql

Hubert 'depesz' Lubaczewski: why-upgrade updates

2019-02-20 (Wed) 05:00:55
A recent change in the layout of the PG Docs broke my spider for why-upgrade.depesz.com. Today I got some time and decided to bite the bullet. Fixed the spider code, used it to get the new changelog, and while I was at it, did a couple of slight modifications to the site: display the count of all changes that are there in a given … Continue reading "why-upgrade updates"
Category: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Allow user control of CTE materialization, and change the default behavior.

2019-02-19 (Tue) 22:25:15
On 16th of February 2019, Tom Lane committed patch: Allow user control of CTE materialization, and change the default behavior.   Historically we've always materialized the full output of a CTE query, treating WITH as an optimization fence (so that, for example, restrictions from the outer query cannot be pushed into it). This is appropriate … Continue reading "Waiting for PostgreSQL 12 – Allow user control of CTE materialization, and change the default behavior."
Category: postgresql

Gabriele Bartolini: Geo-redundancy of PostgreSQL database backups with Barman

2019-02-19 (Tue) 21:13:30

Barman 2.6 introduces support for geo-redundancy, meaning that Barman can now copy from another Barman instance, not just a PostgreSQL database.

Geographic redundancy (or simply geo-redundancy) is a property of a system that replicates data from one site (primary) to a geographically distant location as redundancy, in case the primary system becomes inaccessible or is lost. From version 2.6, it is possible to configure Barman so that the primary source of backups for a server can also be another Barman instance, not just a PostgreSQL database server as before.

Briefly, you can define a server in your Barman instance (passive, according to the new Barman lingo), and map it to a server defined in another Barman instance (primary).

All you need is an SSH connection between the Barman user in the primary server and the passive one. Barman will then use the rsync copy method to synchronise itself with the origin server, copying both backups and related WAL files, in an asynchronous way. Because Barman shares the same rsync method, geo-redundancy can benefit from key features such as parallel copy and network compression. Incremental backup will be included in future releases.

Geo-redundancy is based on just one configuration option: primary_ssh_command.
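As a hedged sketch (using the hostnames from the scenario below), the passive server definition on the second Barman host might look like this:

[eric]
description = Geo-redundant copy of Eric's backups, taken from jeff
primary_ssh_command = ssh barman@jeff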

Our existing scenario

To explain how geo-redundancy works, we will use the following example scenario. We keep it very simple for now.

We have two identical data centres, one in Europe and one in the US, each with a PostgreSQL database server and a Barman server:

  • Europe:
    • PostgreSQL server: eric
    • Barman server: jeff, backing up the database server hosted on eric
  • US:
    • PostgreSQL server: duane
    • Barman server: gregg, backing up the database server hosted on duane

Let’s have a look at how jeff is configured to back up eric, by reading the content of the /etc/barman.d/eric.conf file:

[eric]
description = Main European PostgreSQL server 'Eric'
conninfo = user=barman-jeff dbname=postgres host=eric
ssh_command = ssh postgres@eric
backup_method = rsync
parallel_jobs = 4
r[...]
Category: postgresql

Bruce Momjian: Order of SELECT Clause Execution

2019-02-19 (Tue) 01:30:01

SQL is a declarative language, meaning you specify what you want rather than how to generate what you want. This leads to a natural-language-like syntax, like the SELECT command. However, once you dig into the behavior of SELECT, it becomes clear that it is necessary to understand the order in which SELECT clauses are executed to take full advantage of the command.

I was going to write up a list of the clause execution ordering, but found this webpage that does a better job of describing it than I could. The ordering bounces from the middle clause (FROM) to the bottom to the top, and then the bottom again. It is hard to remember the ordering, but memorizing it does help in constructing complex SELECT queries.
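As a small hedged illustration (the "orders" table is hypothetical): FROM, WHERE, GROUP BY and HAVING are evaluated before the SELECT list, so they cannot see its aliases, while ORDER BY runs after SELECT and can:

SELECT customer_id, count(*) AS n
FROM orders
GROUP BY customer_id
HAVING count(*) > 10   -- must repeat the expression; "HAVING n > 10" fails
ORDER BY n;            -- the alias is visible here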

Category: postgresql

Robert Haas: Tuning autovacuum_naptime

2019-02-19 (Tue) 00:23:00
One of the things I sometimes get asked to do is review someone's postgresql.conf settings.  From time to time, I run across a configuration where the value of autovacuum_naptime has been increased, often by a large multiple.  The default value of autovacuum_naptime is 1 minute, and I have seen users increase this value to 1 hour, or in one case, 1 day.  This is not a good idea.  In this blog post, I will attempt to explain why it's not a good idea, and also something about the limited circumstances under which you might want to change autovacuum_naptime.

Read more »
Category: postgresql

Jonathan Katz: WITH Queries: Present & Future

2019-02-18 (Mon) 08:46:00

Common table expressions, aka CTEs, aka WITH queries, are not only the gateway to writing recursive SQL queries, but also help developers write maintainable SQL. WITH query clauses can help developers who are more comfortable in imperative languages feel at home writing SQL, as well as help reduce redundant code by reusing a particular common table expression multiple times in a query.

A new patch, scheduled to be part of the PostgreSQL 12 major release later in the year, introduces the ability, under certain conditions, to inline common table expressions within a query. This is a huge feature: many developers could suddenly see their existing queries speed up significantly, and they gain the ability to explicitly specify when to inline a CTE (i.e. the planner "substitutes" the reference to the CTE in the main query and can then optimize further) or, conversely, materialize it (i.e. place the CTE results into memory, but lose out on certain planning & execution optimizations).
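As a hedged sketch of the new syntax (the table and filter are hypothetical):

-- Force the historical behaviour: evaluate the CTE once and keep the result
WITH w AS MATERIALIZED (SELECT * FROM big_table)
SELECT * FROM w WHERE key = 123;

-- Let the planner inline the CTE and push the filter down
WITH w AS NOT MATERIALIZED (SELECT * FROM big_table)
SELECT * FROM w WHERE key = 123;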

But why is this a big deal? Before we look into the future, first let's understand how WITH queries currently work in PostgreSQL.

Category: postgresql

Bruce Momjian: Imperative to Declarative to Imperative

2019-02-16 (Sat) 02:00:01

This email thread from 2017 asks the question of whether there is an imperative language that generates declarative output that can be converted into an imperative program and executed. Specifically, is there an imperative syntax that can output SQL (a declarative language) which can be executed internally (imperatively) by Postgres?

The real jewel in this email thread is from Peter Geoghegan, who has some interesting comments. First, he explains why developers would want an imperative language interface, even if it has to be converted to declarative:

Some developers don't like SQL because they don't have a good intuition for how the relational model works. While SQL does have some cruft — incidental complexity that's a legacy of the past — any language that corrected SQL's shortcomings wouldn't be all that different to SQL, and so wouldn't help with this general problem. QUEL wasn't successful because it was only somewhat better than SQL was at the time.

Continue Reading »

Category: postgresql

Baron Schwartz: Citus: Scale-Out Clustering and Sharding for PostgreSQL

2019-02-15 (Fri) 10:02:02

I wrote yesterday about Vitess, a scale-out sharding solution for MySQL. Another similar product is Citus, which is a scale-out sharding solution for PostgreSQL. Similar to Vitess, Citus is successfully being used to solve problems of scale and performance that have previously required a lot of custom-built middleware.

Category: postgresql

Markus Winand: PostgreSQL 11 Reestablishes Window Functions Leadership

2019-02-14 (Thu) 09:00:00
What’s new in PostgreSQL 11

PostgreSQL 11 was released four months ago and my review is long overdue. Here we go!

With respect to standard SQL, the main theme in PostgreSQL 11 is window functions (over). For almost eight years, from 2009 until 2017, PostgreSQL was the only major free open-source product to support SQL window functions. Just a year later, by September 2018, all open-source competitors had caught up…and some even overtook PostgreSQL. The PostgreSQL community was prepared, however: PostgreSQL 11, also released in 2018, has restored and even expanded its leadership position.

This article explains this race and covers other improvements in PostgreSQL 11.

Complete SQL:2011 Over Clause

The over clause defines which rows are visible to a window function. Window functions were originally standardized with SQL:2003, and PostgreSQL has supported them since PostgreSQL 8.4 (2009). In some areas, the PostgreSQL implementation was less complete than the other implementations (range frames, ignore nulls), but in other areas it was the first major system to support them (the window clause). In general, PostgreSQL was pretty close to the commercial competitors, and it was the only major free database to support window functions at all—until recently.

In 2017, MariaDB introduced window functions. MySQL and SQLite followed in 2018. At that time, the MySQL implementation of the over clause was even more complete than that of PostgreSQL, a gap that PostgreSQL 11 closed. Furthermore, PostgreSQL is again the first to support some aspects of the over clause, namely the frame unit groups and frame exclusion. These are not yet supported by any other major SQL database—neither open-source, nor commercial.
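A hedged sketch of those two additions (the "measurements" table is hypothetical): a groups frame counts peer groups of the ORDER BY value rather than rows, and frame exclusion removes rows around the current one:

SELECT ts, value,
       avg(value) OVER (
           ORDER BY ts
           GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
           EXCLUDE GROUP   -- drop the current row and its peers
       ) AS neighbour_avg
FROM measurements;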

The only over clause features not supported by PostgreSQL 11 are the pattern and related clauses. These clauses were only standardized with SQL:2016 and do framing based on a regular expression. No major database supports this framing yet.

Frame Units

Before looking into the new functionality in PostgreSQL 11, I’l

[...]
Category: postgresql

Jobin Augustine: plprofiler – Getting a Handy Tool for Profiling Your PL/pgSQL Code

2019-02-14 (Thu) 03:20:18

PostgreSQL is emerging as the standard destination for database migrations from proprietary databases. As a consequence, there is an increase in demand for database-side code migration and associated performance troubleshooting. One might be able to trace the latency to a PL/pgSQL function, but explaining what happens within a function can be a difficult question. Things get messier when you know the function call is taking time, but within that function there are calls to other functions as part of its body. It is a very challenging question to identify which line inside a function—or block of code—is causing the slowness. In order to answer such questions, we need to know how much time an execution spends on each line or block of code. The plprofiler project provides great tooling and extensions to address such questions.

Demonstration of plprofiler using an example

The plprofiler source contains a sample for testing plprofiler. This sample serves two purposes: it can be used for testing the configuration of plprofiler, and it is a great place to see how to profile a nested function call. Files related to this can be located inside the “examples” directory. Don’t worry—I’ll be running through the installation of plprofiler later in this article.

$ cd examples/

The example expects you to create a database named “pgbench_plprofiler”:

postgres=# CREATE DATABASE pgbench_plprofiler;
CREATE DATABASE

The project provides a shell script along with a source tree to test plprofiler functionality. So testing is just a matter of running the shell script.

$ ./prepdb.sh
dropping old tables...
....

Running session level profiling

This profiling uses session-level local data. By default the plprofiler extension collects runtime data in per-backend hashtables (in-memory). This data is only accessible in the current session, and is lost when the session ends or the hash tables are explicitly reset. plprofiler’s run command will execute the PL/pgSQL code and capture the profile information.

This is illustrated below:

[...]
Category: postgresql

Joe Conway: PostgreSQL Deep Dive: How Your Data Model Affects Storage

2019-02-14 (Thu) 02:35:00

I want to take a few minutes for a deep dive into the effect your data model has on storage density when using PostgreSQL. When this topic came up with a customer, I explained my thoughts on the matter, but I realized at the time that I had never done a reasonably careful apples-to-apples test to see just exactly what the effect is, at least for a model sample size of one. So here it is.

Category: postgresql

Bruce Momjian: Composite Values

2019-02-14 (Thu) 02:15:01

You might not be aware that you can store a virtual row, called a composite value, inside a database field. Composite values have their own column names and data types. This is useful if you want to group multiple statically-defined columns inside a single column. (The JSON data types are ideal for dynamically-defined columns.)
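A minimal hedged sketch (names are hypothetical): define a composite type, store it in a single column, and select one of its fields (note the parentheses around the column name):

CREATE TYPE address AS (street text, city text, zip text);
CREATE TABLE customers (id integer, home address);
INSERT INTO customers VALUES (1, ROW('1 Main St', 'Springfield', '12345'));
SELECT (home).city FROM customers;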

This email thread explains how to define and use them, I have a presentation that mentions them, and the Postgres manual has a section about them.

Category: postgresql
