planet postgresql

Subscribe to planet postgresql のフィード
Planet PostgreSQL
更新: 2時間 35分 前

Haroon .: PostgreSQL: Regular expressions and pattern matching

2019-07-31(水) 21:31:25
A regular expression is a special text string used to describe a search pattern. PostgreSQL’s regular expressions supports three separate approaches to pattern matching: POSIX-style regular expressions (BREs and EREs) SIMILAR TO operator added in SQL:1999 SQL LIKE operator There are some more advanced techniques for advanced pattern matching requirements but those will very likely […]
カテゴリー: postgresql

Paul Ramsey: Waiting for PostGIS 3: ST_AsMVT Performance

2019-07-31(水) 07:39:17

Vector tiles are the new hotness, allowing large amounts of dynamic data to be sent for rendering right on web clients and mobile devices, and making very beautiful and highly interactive maps possible.

Since the introduction of ST_AsMVT(), people have been generating their tiles directly in the database more and more, and as a result wanting tile generation to go faster and faster.

Every tile generation query has to carry out the following steps:

  • Gather all the relevant rows for the tile
  • Simplify the data appropriately to match the resolution of the tile
  • Clip the data to the bounds of the tile
  • Encode the data into the MVT protobuf format

For PostGIS 3.0, performance of tile generation has been vastly improved.

カテゴリー: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 13 – Add support for –jobs in reindexdb

2019-07-31(水) 07:04:31
On 27th of July 2019, Michael Paquier committed patch: Add support for --jobs in reindexdb   When doing a schema-level or a database-level operation, a list of relations to build is created which gets processed in parallel using multiple connections, based on the recent refactoring for parallel slots in src/bin/scripts/. System catalogs are processed first … Continue reading "Waiting for PostgreSQL 13 – Add support for –jobs in reindexdb"
カテゴリー: postgresql

Paul Ramsey: Simple SQL GIS

2019-07-31(水) 05:16:14

And, late on a Friday afternoon, the plaintive cry was heard!

Anyone got a KML/Shapefile of B.C. elxn boundaries that follows the water (Elections BC's KML has ridings going out into the sea)

— Chad Skelton (@chadskelton) November 16, 2012

And indeed, into the sea they do go!

And ‘lo, the SQL faeries were curious, and gave it a shot!

##### Commandline OSX/Linux ##### # Get the Shape files # wget # Exe? No prob, it's actually a self-extracting ZIP unzip ED_Province # Get a PostGIS database ready for the data createdb ed_clip psql -c "create extension postgis" -d ed_clip # Load into PostGIS # The .prj says it is "Canada Albers Equal Area", but they # lie! It's actually BC Albers, EPSG:3005 shp2pgsql -s 3005 -i -I ED_Province ed | psql -d ed_clip # We need some ocean! Use Natural Earth... # wget unzip # Load the Ocean into PostGIS! shp2pgsql -s 4326 -i -I ne_10m_ocean ocean | psql -d ed_clip # OK, now we connect to PostGIS and start working in SQL psql -e ed_clip -- How big is the Ocean table? SELECT Count(*) FROM ocean; -- Oh, only 1 polygon. Well, that makes it easy... -- For each electoral district, we want to difference away the ocean. -- The ocean is a one big polygon, this will take a while (if we -- were being more subtle, we'd first clip the ocean down to -- a reasonable area around BC.) CREATE TABLE ed_clipped AS SELECT CASE WHEN ST_Intersects(o.geom, ST_Transform(e.geom,4326)) THEN ST_Difference(ST_Transform(e.geom,4326), o.geom) ELSE ST_Transform(e.geom,4326) END AS geom, e.edabbr, e.edname FROM ed e, ocean o; -- Check our geometry types... SELECT DISTINCT ST_GeometryType(geom) FROM ed_clipped; -- Oh, they are heterogeneous. Let's force them all multi U[...]
カテゴリー: postgresql

Avinash Kumar: Using plpgsql_check to Find Compilation Errors and Profile Functions

2019-07-31(水) 03:02:26

There is always a need for profiling tools in databases for admins or developers. While it is easy to understand the point where an SQL is spending more time using

EXPLAINor EXPLAIN ANALYZEin PostgreSQL, the same would not work for functions. Recently, Jobin has published a blog post where he demonstrated how plprofiler can be useful in profiling functions. plprofilerbuilds call graphs and creates flame graphs which make the report very easy to understand. Similarly, there is another interesting project called plpgsql_checkwhich can be used for a similar purpose as plprofiler, while it also looks at code and points out compilation errors. Let us see all of that in action, in this blog post. Installing plpgsql-check

You could use yum on RedHat/CentOS to install this extension from PGDG repository. Steps to perform source installation on Ubuntu/Debian are also mentioned in the following logs.

On RedHat/CentOS

$ sudo yum install plpgsql_check_11

On Ubuntu/Debian

$ sudo apt-get install postgresql-server-dev-11 libicu-dev gcc make $ git clone $ cd plpgsql_check/ $ make && make install Creating and enabling this extension

There are 3 advantages of using

  1. Checking for compilation errors in a function code
  2. Finding dependencies in functions
  3. Profiling functions

When using plpgsql_check for the first 2 requirements, you may not need to add any entry to

shared_preload_libraries. However, if you need to use it for profiling functions (3), then you should add appropriate entries to shared_preload_libraries so that it could load both plpgsqland plpgsql_check. Due to dependencies, plpgsqlmust be before plpgsql_checkin the shared_preload_librariesconfig as you see in the following example : shared_preload_libraries = plpgsql, plpgsql_check

Any change to

shared_preload_librariesrequires a restart. You may see the following error when you do not have plpgsqlbefore plpgsql_checkin the shared_preload_librariesconfig. $ g[...]
カテゴリー: postgresql

Viorel Tabara: Cloud Vendor Deep-Dive: PostgreSQL on AWS Aurora

2019-07-30(火) 23:56:33

How deep should we go with this? I’ll start by saying that as of this writing, I could locate only 3 books on Amazon about PostgreSQL in the cloud, and 117 discussions on PostgreSQL mailing lists about Aurora PostgreSQL. That doesn’t look like a lot, and it leaves me, the curious PostgreSQL end user, with the official documentation as the only place where I could really learn some more. As I don’t have the ability, nor the knowledge to adventure myself much deeper, there is AWS re:Invent 2018 for those who are looking for that kind of thrill. I can settle for Werner’s article on quorums.

To get warmed up, I started from the Aurora PostgreSQL homepage where I noted that the benchmark showing that Aurora PostgreSQL is three times faster than a standard PostgreSQL running on the same hardware dates back to PostgreSQL 9.6. As I’ve learned later, 9.6.9 is currently the default option when setting up a new cluster. That is very good news for those who don’t want to, or cannot upgrade right away. And why only 99.99% availability? One explanation can be found in Bruce Momjian’s article.


According to AWS, Aurora PostgreSQL is a drop-in replacement for PostgreSQL, and the documentation states:

The code, tools, and applications you use today with your existing MySQL and PostgreSQL databases can be used with Aurora.

That is reinforced by Aurora FAQs:

It means that most of the code, applications, drivers and tools you already use today with your PostgreSQL databases can be used with Aurora with little or no change. The Amazon Aurora database engine is designed to be wire-compatible with PostgreSQL 9.6 and 10, and supports the same set of PostgreSQL extensions that are supported with RDS for PostgreSQL 9.6 and 10, making it easy to move applications between the two engines.

“most” in the above text suggests that there isn’t a 100% guarantee in which case those seeking certainty should consider purchasing technical support fro

カテゴリー: postgresql

Ibrar Ahmed: Parallelism in PostgreSQL

2019-07-30(火) 22:31:52

PostgreSQL is one of the finest object-relational databases, and its architecture is process-based instead of thread-based. While almost all the current database systems utilize threads for parallelism, PostgreSQL’s process-based architecture was implemented prior to POSIX threads. PostgreSQL launches a process “postmaster” on startup, and after that spans new process whenever a new client connects to the PostgreSQL.

Before version 10 there was no parallelism in a single connection. It is true that multiple queries from the different clients can have parallelism because of process architecture, but they couldn’t gain any performance benefit from one another. In other words, a single query runs serially and did not have parallelism. This is a huge limitation because a single query cannot utilize the multi-core. Parallelism in PostgreSQL was introduced from version 9.6. Parallelism, in a sense, is where a single process can have multiple threads to query the system and utilize the multicore in a system. This gives PostgreSQL intra-query parallelism. 

Parallelism in PostgreSQL was implemented as part of multiple features which cover sequential scans, aggregates, and joins.

Components of Parallelism in PostgreSQL

There are three important components of parallelism in PostgreSQL. These are the process itself, gather, and workers. Without parallelism the process itself handles all the data, however, when planner decides that a query or part of it can be parallelized, it adds a Gather node within the parallelizable portion of the plan and makes a gather root node of that subtree.  Query execution starts at the process (leader) level and all the serial parts of the plan are run by the leader. However, if parallelism is enabled and permissible for any part (or whole) of the query, then gather node with a set of workers is allocated for it. Workers are the threads that run in parallel with part of the tree (partial-plan) that needs to be parallelized. The relation’s blocks are divided amongst threads su

カテゴリー: postgresql

Paul Ramsey: PostGIS Overlays

2019-07-30(火) 17:00:00

One question that comes up often during our PostGIS training is “how do I do an overlay?” The terminology can vary: sometimes they call the operation a “union” sometimes an “intersect”. What they mean is, “can you turn a collection of overlapping polygons into a collection of non-overlapping polygons that retain information about the overlapping polygons that formed them?”

So an overlapping set of three circles becomes a non-overlapping set of 7 polygons.

Calculating the overlapping parts of a pair of shapes is easy, using the ST_Intersection() function in PostGIS, but that only works for pairs, and doesn’t capture the areas that have no overlaps at all.

How can we handle multiple overlaps and get out a polygon set that covers 100% of the area of the input sets? By taking the polygon geometry apart into lines, and then building new polygons back up.

Let’s construct a synthetic example: first, generate a collection of random points, using a Gaussian distribution, so there’s more overlap in the middle. The crazy math in the SQL below just converts the uniform random numbers from the random() function into normally distributed numbers.

CREATE TABLE pts AS WITH rands AS ( SELECT generate_series as id, random() AS u1, random() AS u2 FROM generate_series(1,100) ) SELECT id, ST_SetSRID(ST_MakePoint( 50 * sqrt(-2 * ln(u1)) * cos(2*pi()*u2), 50 * sqrt(-2 * ln(u1)) * sin(2*pi()*u2)),4326) AS geom FROM rands;

The result looks like this:

Now, we turn the points into circles, big enough to have overlaps.

CREATE TABLE circles AS SELECT id, ST_Buffer(geom, 10) AS geom FROM pts;

Which looks like this:

Now it’s time to take the polygons apart. In this case we’ll take the exterior ring of the circles, using ST_ExteriorRing(). If we were dealing with complex polygons with holes, we’d have to use ST_DumpRings(). Once we have the rings, we want to make sure that everywhere rings cross the lines are broken, so that no lines cross, they only touch at their e

カテゴリー: postgresql

Hans-Juergen Schoenig: Combined indexes vs. separate indexes in PostgreSQL

2019-07-29(月) 19:01:56

A “composite index”, also known as “concatenated index”, is an index on multiple columns in a table. Many people are wondering, what is more beneficial: Using separate or using composite indexes? Whenever we do training, consulting or support this question is high up on the agenda and many people keep asking this question. Therefore, I decided to shed some light on this question.

Which indexes shall one create?

To discuss the topic on a more practical level, I created a table consisting of three columns. Then I loaded 1 million rows and added a composite index covering all three columns:

test=# CREATE TABLE t_data (a int, b int, c int); CREATE TABLE test=# INSERT INTO t_data SELECT random()*100000, random()*100000, random()*100000 FROM generate_series(1, 1000000); INSERT 0 1000000 test=# CREATE INDEX idx_data ON t_data(a, b, c); CREATE INDEX

The layout of the table is therefore as follows:

test=# \d t_data Table "public.t_data" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- a | integer | | | b | integer | | | c | integer | | | Indexes: "idx_data" btree (a, b, c)

Let us run ANALYZE now to ensure that optimizer statistics are there. Usually autovacuum will kick in and create statistics for your table, but to make sure running ANALYZE does not hurt in this case.

test=# ANALYZE t_data; ANALYZE PostgreSQL will rearrange filters for you

The first important thing to observe is that PostgreSQL will try to arrange the filters in your query for you. The following query will filter on all indexed columns:

test=# explain SELECT * FROM t_data WHERE c = 10 AND b = 20 AND a = 10; QUERY PLAN --------------------------------------------------- Index Only Scan [...]
カテゴリー: postgresql

Michael Paquier: Postgres 12 highlight - Replication slot copy

2019-07-29(月) 16:54:51

Replication slots can be used in streaming replication, with physical replication slots, and logical decoding, with logical replication slots, to retain WAL in a more precise way than wal_keep_segments so as past WAL segments are removed at checkpoint using the WAL position a client consuming the slot sees fit. A feature related to replication slots has been committed to PostgreSQL 12:

commit: 9f06d79ef831ffa333f908f6d3debdb654292414 author: Alvaro Herrera date: Fri, 5 Apr 2019 14:52:45 -0300 Add facility to copy replication slots This allows the user to create duplicates of existing replication slots, either logical or physical, and even changing properties such as whether they are temporary or the output plugin used. There are multiple uses for this, such as initializing multiple replicas using the slot for one base backup; when doing investigation of logical replication issues; and to select a different output plugins. Author: Masahiko Sawada Reviewed-by: Michael Paquier, Andres Freund, Petr Jelinek Discussion:

This introduces two new SQL functions adapted for each slot type:

  • pg_copy_logical_replication_slot
  • pg_copy_physical_replication_slot

By default pg_basebackup uses a temporary replication slot to make sure that while transferring the data of the main data folder the WAL segments necessary for recovery from the beginning to the end of the backup are transferred properly, and that the backup does not fail in the middle of processing. In this case the slot is called pg_basebackup_N where N is the PID of the backend process running the replication connection. However there are cases where it makes sense to not use a temporary slot but a permanent one, particularly when reusing a base backup as a standby with no WAL archiving around, so as it is possible to keep WAL around for longer without having a primary’s checkpoint interfere with the recycling of WAL segments. O

カテゴリー: postgresql

Sebastian Insausti: Big Data with PostgreSQL and Apache Spark

2019-07-26(金) 18:48:00

PostgreSQL is well known as the most advanced opensource database, and it helps you to manage your data no matter how big, small or different the dataset is, so you can use it to manage or analyze your big data, and of course, there are several ways to make this possible, e.g Apache Spark. In this blog, we’ll see what Apache Spark is and how we can use it to work with our PostgreSQL database.

For big data analytics, we have two different types of analytics:

  • Batch analytics: Based on the data collected over a period of time.
  • Real-time (stream) analytics: Based on an immediate data for an instant result.
What is Apache Spark?

Apache Spark is a unified analytics engine for large-scale data processing that can work on both batch and real-time analytics in a faster and easier way.

It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.

Apache Spark Components Apache Spark Libraries Related resources  ClusterControl for PostgreSQL  Scaling PostgreSQL for Large Amounts of Data  An Introduction to Data Lakes

Apache Spark includes different libraries:

  • Spark SQL: It’s a module for working with structured data using SQL or a DataFrame API. It provides a common way to access a variety of data sources, including Hive, Avro, Parquet, ORC, JSON, and JDBC. You can even join data across these sources.
  • Spark Streaming: It makes easy to build scalable fault-tolerant streaming applications using a language-integrated API to stream processing, letting you write streaming jobs the same way you write batch jobs. It supports Java, Scala and Python. Spark Streaming recovers both lost work and operator state out of the box, without any extra code on your part. It lets you reuse the same code for batch processing, join streams again
カテゴリー: postgresql

Álvaro Hernández: Benchmarking: Do it with transparency or don't do it at all

2019-07-26(金) 02:30:13
Introduction This post is a reply to MongoDB’s “Benchmarking: Do it right or don’t do it at all” post. Which they wrote as a response to the whitepaper “Performance Benchmark: PostgreSQL / MongoDB”, published and sponsored by EnterpriseDB and performed by OnGres. While a long read at close to 50 pages, we encourage you to at least read the executive summary (2 pages) and any other relevant section, to have the right context.
カテゴリー: postgresql

Paul Ramsey: Waiting for PostGIS 3: Hilbert Geometry Sorting

2019-07-26(金) 02:20:32

With the release of PostGIS 3.0, queries that ORDER BY geometry columns will return rows using a Hilbert curve ordering, and do so about twice as fast.


The history of "ordering by geometry" in PostGIS is mostly pretty bad. Up until version 2.4 (2017), if you did ORDER BY on a geometry column, your rows would be returned using the ordering of the minimum X coordinate value in the geometry.

One of the things users expect of "ordering" is that items that are "close" to each other in the ordered list should also be "close" to each other in the domain of the values. For example, in a set of sales orders ordered by price, rows next to each other have similar prices.

To visualize what geometry ordering looks like, I started with a field of 10,000 random points.

カテゴリー: postgresql

Thom Brown: jsquery vs SQL/JSON

2019-07-25(木) 20:42:00
SQL/JSON is coming to PostgreSQL 12 and provides a native way to query JSON data (although, to be specific, JSON data stored as a JSONB data type).  This also introduces the jsonpath data type which is used for SQL/JSON query expressions.  I'll not be going into its usage in detail, or covering performance characteristics (at least not in this post), but I will compare the syntax and functionality with jsquery.  Note that this may potentially change prior to final release.

jsquery was introduced as a non-core extension for PostgreSQL 9.4 and higher by Teodor Sigaev, Alexander Korotkov and Oleg Bartunov.  Like jsonpath, it also used its own datatype, jsquery.

We have some functions to which we can pass jsonpath expressions to, 2 of which have operators which can be used as shorthand (albeit without additional parameter control, as they exist primarily for indexing purposes):
Function Operator Description jsonb_path_exists @? This will return true if it matched something, false if not, or null if it resulted in an operation on a missing value. jsonb_path_match @@ This does the same thing as jsonb_path_exists, but only first result item is tested. jsonb_path_query None Returns the JSON data resulting from the jsonpath expression. jsonb_path_query_array None Same as jsonb_path_query, but puts the result in a JSON array. jsonb_path_query_first None Same as jsonb_path_query, but only selects the first value.
These will suppress errors where there's a lack of an array element, object field, an unexpected JSON type or numeric errors.

Here are some examples [...]
カテゴリー: postgresql

Johann Oskarsson: Running Pgbadger Automatically with Zsh

2019-07-25(木) 20:11:40

Here I present a simple script that queries the database for the log file locations and automatically chooses the one for yesterday. The trick here is to apply strftime to the log filename as configured in the database. This way, it doesn’t matter how complex the log file name is in the database, it’s alawys easy to guess. All this works as long as the system’s strftime and PostgreSQL’s log file escapes are equivalent; this should never be an issue.

There are some limits to the current script. It assumes log_directory is to be found within data_directory and is not an absolute path. It also assumes there are no strftime escape sequences in the directory names themselves. Fixing either or both of these is not hard if a more general script is needed. It is also hardcoded to choose the csv log file, but this is easy to change.

Finally it runs pgbadger on the log file from yesterday, and outputs html in the webserver’s directory. The assumption here is that PostgreSQL will write its log into a new file every day, possibly in a rotating sequence.

The script is mean to be called every day with cron.

#!/usr/local/bin/zsh zmodload zsh/datetime datadir=`/usr/local/bin/psql -A -t -q -w -c "show data_directory" ;` logdir=`/usr/local/bin/psql -A -t -q -w -c "show log_directory" ;` filename=`/usr/local/bin/psql -A -t -q -w -c "show log_filename" ;` strftime -s log $datadir/$logdir/$filename $(( $epochtime[ 1 ] - 24 * 60 * 60 )) /usr/local/bin/pgbadger -q -I -O /var/www/html/pgbadger/ -o index.html ${log:r}.csv
カテゴリー: postgresql

Liaqat Andrabi: Webinar: Postgres Vacuuming Through Pictures [Follow up]

2019-07-25(木) 19:31:50
Vacuuming is a key feature of PostgreSQL databases to keep databases healthy and optimized. For this, Autovacuum is configured to conserve space by removing obsolete disk usage, and to optimize database performance by speeding up sequential scans, just as one example. To discuss this topic in-depth, 2ndQuadrant hosted the “Postgres Vacuuming Through Pictures” webinar. The […]
カテゴリー: postgresql

Devrim GÜNDÜZ: Installing PostgreSQL on RHEL 8

2019-07-25(木) 17:24:00
RHEL 8 was released 2 months ago, and we already released RPMs for many of the packages. RHEL 8 support is limited, though -- we will release more packages when EPEL 8 is out.

Because of the modularity feature in RHEL 8, installing PGDG RPMson RHEL 8 requires extra step. Continue reading "Installing PostgreSQL on RHEL 8 "
カテゴリー: postgresql

Pavel Stehule: new releases of plpgsql_check and pspg

2019-07-24(水) 22:06:00
Here are plpgsql_check and pspg.

I fixed few significant errors in profiler integrated to plpgsql_check. I hope, so profiler should to work well now.

The bugs fixed in pspg are minor bugfixes.
カテゴリー: postgresql

Tatsuo Ishii: When primary server is far away from standby server

2019-07-24(水) 17:22:00
Sometimes we want to create a cluster configuration in which the primary server is physically located far away from standby servers. For example, the primary server is located in an AWS region A, while the standby server is located in an AWS region B. This type of configuration is suited for a DR (Disaster Recovery) or a company which has a branch in different countries. In the figure below, a company's head quarter is located in region A,  and people in the HQ office access the primary server database through Pgpool-II.

People in a branch office located in region B access standby database in mostly read only manner. In a few occasion they need to write the database but they understand that it needs to access HQ database which would be slow. Problem for them is, even if they just do a read access to the database, Pgpool-II needs to access system catalog in the primary server to obtain meta info of the tables involved in their query.

From Pgpool-II 4.1,  which is under development, user could eliminate the slow system catalog access in from region B to region A by setting relcache_query_target = load_balance_node and backend weight of primary server to 0 in Pgpool-II running in region B.

Now the system catalog access is redirected to the standby server, rather than to the primary server and the access speed should faster.

Please note that , however, there may be replication delay and that could cause a trouble if the target table is pretty new because the system catalog in the standby server may not the have the new table info. So this configuration is best suitable for a system the table creation is rare, or they are created while users do not access the system.

By the way, Pgpool-II developers define the target date for release 4.1 of Pgpool-II in this October. Please stay tuned.

カテゴリー: postgresql

Paul Ramsey: Waiting for PostGIS 3: ST_AsGeoJSON(record)

2019-07-24(水) 02:20:19

With PostGIS 3.0, it is now possible to generate GeoJSON features directly without any intermediate code, using the new ST_AsGeoJSON(record) function.

The GeoJSON format is a common transport format, between servers and web clients, and even between components of processing chains. Being able to create useful GeoJSON is important for integrating different parts in a modern geoprocessing application.

カテゴリー: postgresql