Planet PostgreSQL

Subscribe to the Planet PostgreSQL feed
Planet PostgreSQL
Updated: 2 hours 14 minutes ago

Pavel Stehule: Pager for data

2018-10-08 (Mon) 17:03:00
Any Unix-like system has a great feature: a pager. The history of pagers is pretty old. Originally a pager was invoked manually; the output of a program can also be redirected to another program with the pipe operator.

ls -la | more

The simplest pager is more. It can scroll in only one direction, and it is usually the default pager on Unix systems.

A much more powerful pager is less. It has many advantages: most importantly it can scroll in both directions and offers strong navigation within the result. It can display long lines and allows horizontal scrolling. This pager is necessary for comfortable use of the psql console.

export PAGER="less -S"
psql postgres
postgres=>select * from pg_class;

It is good to know some less commands:
  • g - go to the beginning of the document,
  • G - go to the end of the document,
  • / - search for a string,
  • n - next occurrence of the string,
  • N - previous occurrence of the string,
  • q - quit.
less is a very rich pager and has special support for displaying man pages, but that feature cannot be used for database data. For tabular data there is a special pager, pspg. This pager can freeze the first n columns and the first m rows. It knows almost all keyboard commands of the less pager and adds some new ones (based on mcedit key bindings).
  • Alt k - new bookmark
  • Alt i - previous bookmark
  • Alt j - next bookmark
  • Alt n - show row numbers
  • Alt l - go to line
  • Ctrl Home - move to document begin
  • Ctrl End - move to document end
  • F9 - menu
  • F10 - quit
The usage of pspg is the same:

export PAGER=pspg
psql postgres

This pager is available from the PostgreSQL community repository or from git.
Category: postgresql

Avinash Kumar: PostgreSQL Extensions for an Enterprise-Grade System

2018-10-06 (Sat) 01:43:42

In this current series of blog posts we have been discussing various relevant aspects when building an enterprise-grade PostgreSQL setup, such as security, back up strategy, high availability, and different methods to scale PostgreSQL. In this blog post, we’ll get to review some of the most popular open source extensions for PostgreSQL, used to expand its capabilities and address specific needs. We’ll cover some of them during a demo in our upcoming webinar on October 10.

Expanding with PostgreSQL Extensions

PostgreSQL is one of the world’s most feature-rich and advanced open source RDBMSs. Its features are not limited to those released by the community through major/minor releases. There are hundreds of additional features developed using the extension capabilities in PostgreSQL, which can cater to the needs of specific users. Some of these extensions are very popular and useful for building an enterprise-grade PostgreSQL environment. We previously blogged about a couple of FDW extensions (mysql_fdw and postgres_fdw) which allow PostgreSQL databases to talk to remote homogeneous/heterogeneous databases like PostgreSQL, MySQL, MongoDB, etc. We will now cover a few other additional extensions that can expand your PostgreSQL server capabilities.


The pg_stat_statements module provides a means for tracking execution statistics of all SQL statements executed by a server. The statistics gathered by the module are made available via a view named pg_stat_statements. This extension must be installed in each of the databases you want to track, and like many of the extensions in this list, it is available in the contrib package from the PostgreSQL PGDG repository.
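As a brief sketch of how this looks in practice (assuming pg_stat_statements has been added to shared_preload_libraries and the server restarted; the column names below match the PostgreSQL 9.x–11 era of this post):

```sql
-- Install the view in each database you want to track
CREATE EXTENSION pg_stat_statements;

-- Show the five most time-consuming statements
SELECT query, calls, total_time, rows
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 5;
```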


Tables in PostgreSQL may end up with fragmentation and bloat due to the specific MVCC implementation in PostgreSQL, or simply due to a high number of rows being naturally removed. This could lead to not only unused space being held inside the table but also to sub-optimal execution of SQL statements. pg_repack is the most popul
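A hedged sketch of a typical pg_repack invocation (the database and table names are hypothetical; the extension must be installed in the target database first):

```shell
# Install the extension in the target database
psql -d mydb -c "CREATE EXTENSION pg_repack;"

# Rebuild a bloated table online, reclaiming space while holding only brief locks
pg_repack -d mydb --table my_bloated_table
```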

Category: postgresql

Sebastian Insausti: A New Way to Personalize Your PostgreSQL Monitoring with Prometheus

2018-10-05 (Fri) 17:54:06

Monitoring is one of the fundamental tasks in any system. It can help us to detect problems and take action, or simply to know the current state of our systems. Using visual displays can make us more effective, as we can detect performance problems more easily.

In this blog, we will see how to use SCUMM to monitor our PostgreSQL databases and what metrics we can use for this task. We’ll also go through the dashboards available, so you can easily figure out what’s really happening with your PostgreSQL instances.

What is SCUMM?

First of all, let’s see what SCUMM (Severalnines CMON Unified Monitoring and Management) is.

It’s a new agent-based solution with agents installed on the database nodes.

The SCUMM Agents are Prometheus exporters which export metrics from services like PostgreSQL as Prometheus metrics.

A Prometheus server is used to scrape and store time series data from the SCUMM Agents.

Prometheus is an open-source system monitoring and alerting toolkit originally built at SoundCloud. It is now a standalone open source project and maintained independently.

Prometheus is designed for reliability, to be the system you go to during an outage to allow you to quickly diagnose problems.

How to use SCUMM?

When using ClusterControl, when we select a cluster, we can see an overview of our databases, as well as some basic metrics that can be used to identify a problem. In the below dashboard, we can see a master-slave setup with one master and 2 slaves, with HAProxy and Keepalived.

ClusterControl Overview

If we go to the “Dashboards” option, we can see a message like the following.

ClusterControl Dashboards Disabled

To use this feature, we must enable the agent mentioned above. For this, we only have to press on the "Enable Agent Based Monitoring" button in this section.

ClusterControl Enable Agent Based Monitoring

To enable our agent, we must specify the host where we will install our Prometheus server, which, as we can see in the example, can be our ClusterControl server.

We must also specify:

  • Scrape Interval
Category: postgresql

Niksa Jakovljevic: How to manage Prometheus high-availability with PostgreSQL + TimescaleDB

2018-10-05 (Fri) 04:14:20

Prometheus has become one of the most commonly used open-source monitoring and alerting systems. What makes Prometheus so great is that it solves monitoring problems in a simple and effective way. Yet there are still challenges with operating Prometheus, including long-term storage and managing high-availability.

In a previous post, we describe how and why to use PostgreSQL + TimescaleDB for enabling long-term storage (as well as for simplifying operations, and adding query power/flexibility) to Prometheus.

Uniting SQL and NoSQL for Monitoring: Why PostgreSQL is the ultimate data store for Prometheus

In this post, we describe how PostgreSQL + TimescaleDB can also help with managing high-availability.

There are several reasons why to add high-availability (HA) to Prometheus deployments:

  • Avoiding data loss and a single point of failure
  • Avoiding missing any critical alerts generated by an Alertmanager
  • Replicating Prometheus data (although the data will not be fully consistent between replicas — more on this later)

Setting up HA for Prometheus requires running two or more identically configured Prometheus servers in parallel. Yet this simple setup does come with limitations.

One problem is that it can be hard to keep data in sync. Often parallel Prometheus instances do not have identical data. There are a few reasons for this. One reason is that scrape intervals may differ, since each instance has its own clock. Another reason is that, in case of instance failures, either of the Prometheus nodes may miss some data, which would mean that in the long run none of the Prometheus instances would have a complete data set. In other words, there would be no single source of truth.

Another problem is with remote storage. When running HA Prometheus with remote storage, one can decide to just have a separate remote storage for each Prometheus instance. But this approach, while valid, does introduce other weaknesses: again, no ground truth; the complexity of operating multiple remote storage instances; and hardware costs.


Category: postgresql

Lukas Fittl: New in Postgres 11: Monitoring JIT performance, Auto Prewarm & Stored Procedures

2018-10-04 (Thu) 21:00:00
Everyone’s favorite database, PostgreSQL, has a new release coming out soon: Postgres 11. In this post we take a look at some of the new features that are part of the release, and in particular review the things you may need to monitor, or can utilize to increase your application and query performance. Just-In-Time compilation (JIT) in Postgres 11: Just-In-Time compilation (JIT) for query execution was added in Postgres 11. It's not going to be enabled for queries by default, similar to parallel…
Category: postgresql

Bruce Momjian: Moving Tablespaces

2018-10-04 (Thu) 00:00:01

Tablespaces are designed to allow Postgres clusters to be spread across multiple storage devices. CREATE TABLESPACE creates a symbolic link in the pg_tblspc directory in the cluster's data directory pointing to the newly-created tablespace directory.

Unfortunately, though there is a command to move tables and indexes between tablespaces, there is no command to move tablespaces to different directories. However, since Postgres 9.2, the process of moving tablespaces is quite simple:

  1. Record the OID of the tablespace you want to move
  2. Shut down the Postgres cluster
  3. Move the tablespace directory, either within the same file system or to a different file system
  4. Update the OID symbolic link that represents the moved tablespace to the new tablespace directory location
  5. Restart the server

Here's an example of moving a tablespace:
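A minimal sketch of those five steps (the tablespace OID and the paths shown here are hypothetical):

```shell
# 1. Before shutdown, record the OID:  SELECT oid, spcname FROM pg_tablespace;
# 2. Shut down the cluster
pg_ctl -D /var/lib/postgresql/data stop

# 3. Move the tablespace directory
mv /mnt/old_disk/tblspc /mnt/new_disk/tblspc

# 4. Repoint the OID symbolic link in pg_tblspc (OID 16385 is hypothetical)
ln -sfn /mnt/new_disk/tblspc /var/lib/postgresql/data/pg_tblspc/16385

# 5. Restart the server
pg_ctl -D /var/lib/postgresql/data start
```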

Continue Reading »

Category: postgresql

Liaqat Andrabi: Webinar: New Features in PostgreSQL 11 [Follow Up]

2018-10-03 (Wed) 21:44:24

PostgreSQL 11, the next major release of the world’s most advanced open source database, is just around the corner. The new release of PostgreSQL will include enhancements in partitioning, parallelism, SQL stored procedures and much more.

To give PostgreSQL enthusiasts a deeper look into the upcoming release, 2ndQuadrant hosted a Webinar discussing the new features in PostgreSQL 11. The webinar was presented by Peter Eisentraut, Core Team Member and Major PostgreSQL Contributor.

If you weren’t able to make it to the live session, you can now view the recording here.

Questions that Peter couldn’t respond to during the live webinar have been answered below.

Q: Could you use a custom hash function for partitioning? (or is this in future plans)

A: This is currently not planned.

Q: As far as I understand, there are still no global indexes on partitioned tables. Am I right?

A: Right; there are only local indexes, which get automatically added to new partitions and can be managed more simply. Global indexes would be a separate feature that is not currently on the horizon.

Q: What about commit in function? Is this available as well?

A: Commit and rollback are only possible in procedures.

Q: Will JIT-enabled builds be available through PGDG?

A: Yes, community produced Debian and RPM packages will have LLVM support. It might be in a separate sub-package. But some older operating systems don’t have a recent enough LLVM version, so in that case it won’t be available.

Q: Does JIT inlining work with user-defined operators?

A: Yes. The extension module needs to be built in a certain way to install the associated LLVM bitcode. But the normal extension build system takes care of that.

Q: Are JIT compiled queries cached? So they can be reused either by the same session or another session.

A: No, the result of JIT compilation is not cached. This was considered but was not deemed worthwhile at this time.

Q: What about TDE in PostgreSQL 11?

A: TDE is not in PostgreSQL 11.

Q: Can we expect data at rest encryption enhancements in Postgre

Category: postgresql

Jobin Augustine: Scaling PostgreSQL using Connection Poolers and Load Balancers for an Enterprise Grade environment

2018-10-03 (Wed) 01:15:30

In the previous blog posts in this series we’ve covered some of the essential aspects of an Enterprise-grade solution: security, high availability, and backups. Another important aspect is the scalability of the solution: as our application grows how do we accommodate an increase in traffic while maintaining the quality of the service (response time)? The answer to this question depends on the nature of the workload at play but it is often shaped around:

(a) improving its efficiency and
(b) increasing the resources available.

Why connection pooling ?

When it comes to improving the efficiency of a database workload, one of the first places we start looking at is the list of slow queries; if the most popular ones can be optimized to run faster then we can easily gain some overall performance back. Arguably, we may look next at the number and frequency of client connections: is the workload composed of a high number of very frequent but short-lived connections? Or are client connections of a more moderate number, tending to stick around for longer?

If we consider the first scenario further (a high number of short-lived connections), and recall that each connection spawns a new OS process, the server may hit a practical limit on the number of transactions, or connections, it can manage per second, given the hardware available and the workload being processed. Remember that PostgreSQL is process-based rather than thread-based, and spawning a new process is itself an expensive operation in terms of resources, both CPU and memory.

A possible remedy for this would be the use of a connection pooler, acting as a mediator between the application and the database. The connection pooler keeps a number of connections permanently opened with the database, and receives and manages all incoming requests from clients itself, allowing them to temporarily use one of the connections it already has established with PostgreSQL. This removes the burden of creating a new process each time a client establishes a connection with PostgreSQL, and allows it
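For illustration, a minimal PgBouncer configuration along these lines (the database name, paths, and limits are hypothetical) might look like:

```ini
; pgbouncer.ini - hypothetical minimal setup
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; reuse server connections between transactions
max_client_conn = 500     ; clients PgBouncer will accept
default_pool_size = 20    ; server connections kept open per database/user
```

Applications then connect to port 6432 instead of 5432, and PgBouncer multiplexes their requests over its pool of established server connections.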

Category: postgresql

Oleg Bartunov: Bloom index for bigint

2018-10-03 (Wed) 01:14:17
The bloom index by default works for int4 and text, but other types that have a hash function and an equality operator can be supported. Just use the opclass interface; for example, for type bigint:

OPERATOR 1 =(bigint, bigint),
FUNCTION 1 hashint8(bigint);

Now you can build a bloom index on the bigint data type.
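Pieced together, the full statement the fragment above comes from would presumably look like this (the operator class and table names are illustrative):

```sql
CREATE EXTENSION bloom;

-- Reconstruction of the full operator class definition
CREATE OPERATOR CLASS bigint_ops
DEFAULT FOR TYPE bigint USING bloom AS
    OPERATOR 1 =(bigint, bigint),
    FUNCTION 1 hashint8(bigint);

-- With the opclass in place, a bloom index on bigint columns works
CREATE TABLE t (a bigint, b bigint);
CREATE INDEX t_bloom_idx ON t USING bloom (a, b);
```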
Category: postgresql

Alexey Lesovsky: pgCenter - stress free Postgres stats.

2018-10-02 (Tue) 23:16:00
pgCenter has seen a new release with several significant changes. If you used pgCenter before, you might have noticed that I stopped issuing updates for a while. This is because I was working on this new release with several new features and overall improvements that will make the use of the tool easier and will allow additional functionality to make your life easier when working with Postgres stats.

This new version has been completely rewritten using Golang.

Why Go? To me, Golang is much simpler than pure C, but, at the same time, it is an amazing and powerful language. I had a few requests that I was aiming to address in this release and frankly, it was much easier to implement these using Golang.

With this release pgCenter has become quite a comprehensive toolbox for working with Postgres stats.

In this 0.5.0 version I’ve added two new key tools: one for recording stats into history files and one for building reports based on those files. People who are familiar with pgCenter often asked me about these features, and now they are available and ready to use.

Additional features
There are also a few additional minor changes that were made in this version of pgCenter:
  • pgCenter’s functionality is now split into subcommands, in the style of Git or Perf. pgCenter gained new functions with different aims, distributed across several subcommands with different execution options and parameters.
  • pgCenter uses the lib/pq library for handling connections to Postgres; one of its advantages is built-in support for the .pgpass file and environment variables. And the cherry on the cake is that you can override connection settings using the PGOPTIONS variable.
pgCenter top changes
As mentioned earlier, pgCenter is split into several subcommands; “top” is now a dedicated subcommand, and in the new version it has seen some improvements:
  • Activity statistics are shown by default. In previous versions, database statistics were the default, but the majority of pgCenter users often switched to activity stats, so in this version act
Category: postgresql

Hubert 'depesz' Lubaczewski: Foreign Key to partitioned table

2018-10-02 (Tue) 18:02:48
One of the long-standing limitations of partitions is that you can't have foreign keys pointing to them. Let's see if I can make it possible to have some kind of constraint that would do the same thing as an fkey. First, obviously, we need a partitioned table with some data: =$ CREATE TABLE USERS ( id […]
Category: postgresql

Don Seiler: Sequence Caching: Oracle vs. PostgreSQL

2018-10-02 (Tue) 14:00:00

Many RDBMSes use sequence caching to pre-generate sequence values in advance and store them in memory, allowing them to return new values quickly. If you are doing a lot of inserts that each call the sequence to get the next value, sequence caching is a good thing. Having to wait for the sequence to generate a value on every call could slow things down greatly.

When I made the move from Oracle to PostgreSQL, I noticed some interesting differences in sequence caching behavior and wanted to share them here for those that may be curious about them.

All examples below were conducted in Oracle (12c Enterprise Edition) and PostgreSQL 9.6.9 (via PGDG Yum Repository).

Cache Value Sharing

In Oracle, when a sequence cache is generated, all sessions access the same cache. However in PostgreSQL, each session gets its own cache. We demonstrate this with a couple of quick-and-easy examples below.


First let’s create our sequence, using a cache size of 25.

SQL> CREATE SEQUENCE dts_seq
  2  INCREMENT BY 1
  3  CACHE 25;

Sequence created.

Now I can open a couple of sessions and call nextval on that sequence to increment it from two different sessions:

-- First session
SQL> SELECT dts_seq.nextval FROM dual;
   NEXTVAL
----------
         1
SQL> SELECT dts_seq.nextval FROM dual;
   NEXTVAL
----------
         2
SQL> SELECT dts_seq.nextval FROM dual;
   NEXTVAL
----------
         3

-- Second session
SQL> SELECT dts_seq.nextval FROM dual;
   NEXTVAL
----------
         4
SQL> SELECT dts_seq.nextval FROM dual;
   NEXTVAL
----------
         5
SQL> SELECT dts_seq.nextval FROM dual;
   NEXTVAL
----------
         6

It’s even clearer when alternating between the two sessions:

-- Session A
SQL> SELECT dts_seq.nextval FROM dual;
   NEXTVAL
----------
       151
-- Session B
SQL> SELECT dts_seq.nextval FROM dual;
   NEXTVAL
----------
       152
-- Session A
SQL> SELECT dts_seq.nextval FROM dual;
   NEXTVAL
----------
       153
-- Session B
SQL> SELECT dts_seq.nextval FROM dual[...]
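For contrast, a sketch of the equivalent PostgreSQL experiment (the exact values returned depend on when each session first draws from the sequence):

```sql
CREATE SEQUENCE dts_seq INCREMENT BY 1 CACHE 25;

-- Session A: draws its own private cache of 25 values (1..25)
SELECT nextval('dts_seq');  -- returns 1

-- Session B: allocates a separate cache (26..50),
-- not the next value after session A's
SELECT nextval('dts_seq');  -- returns 26
```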
Category: postgresql

Bruce Momjian: Switchover/Failover and Session Migration

2018-10-02 (Tue) 07:45:01

I have already covered switchover and failover. In the case of failover, the old primary is offline so there are no options for migrating clients from the old primary to the new primary. If there is a switchover, there are options for client migration. Assuming you are using streaming replication, only one server can accept writes. Therefore, for client migration, you can either:

  • Force all clients on the old primary to exit, then promote the new primary
  • Wait for all clients on the old primary to exit, then promote the new primary

If you choose force, it will disrupt applications; they must be designed to handle disconnections and potentially reconfigure their sessions, e.g., session variables, cursors, open transactions. If you choose wait, how do you handle clients that want to connect while you are waiting for the existing clients to disconnect? Minor upgrades, which require a restart of the database server, have similar issues.

The only clean solution is to use multi-master replication so new clients can connect to the new primary while waiting for old-primary clients to finish and disconnect. However, it is operationally expensive to support multi-master just to minimize switchover disruption.

Category: postgresql

Keith Fiske: Per-Table Autovacuum Tuning

2018-10-02 (Tue) 03:43:06

A pattern that drives my blog posts definitely seems to be the frequency of client questions. And that is definitely the case here again. Vacuum tuning to manage bloat and transaction ID wraparound on production systems has been a hot topic, and lately this has even been getting down to tuning autovacuum on an individual table basis. I’ve already discussed bloat pretty extensively in previous posts. While I’d like to get into the details of transaction ID wraparound, that really isn’t the focus of this post, so I’ll defer you to the documentation.

One setting I will discuss first though is autovacuum_freeze_max_age. Basically when any table’s max XID value reaches this, a more aggressive “emergency” autovacuum is kicked off. If many tables hit this at the same time (a common occurrence with data-warehouses that have many large, sparsely written tables), that can kick off some pretty high and long lasting IO caused by these autovacuums. I highly recommend increasing autovacuum_freeze_max_age from the default value of 200 million to 1 billion. However, I don’t recommend doing this unless you have some monitoring in place to watch for tables reaching both autovacuum_freeze_max_age and wraparound, the latter of which can shut your database down for some pretty extensive downtime.

So ideally, we want autovacuum running often enough on its own so you never encounter any of those bad situations above. The next thing I’d recommend tuning on any database before getting down to the table-level methods, no matter its size or traffic, are the default settings that control when autovacuum initially kicks in.

autovacuum_analyze_scale_factor = .10
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = .20
autovacuum_vacuum_threshold = 50

In my opinion, the defaults on these are not ideal for any situation. The scales are too high and the thresholds are too low. The scale factor settings are percentage values that say, “When this percentage of the table’s rows have changed (updates/deletes), run vacuum or ana
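These same knobs can also be overridden per table with storage parameters; for example (the table name and values are illustrative, not a recommendation for every workload):

```sql
-- Vacuum this large table when ~1% of rows change,
-- instead of waiting for the 20% global default
ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 1000,
    autovacuum_analyze_scale_factor = 0.005
);
```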

Category: postgresql

Andreas 'ads' Scherbaum: PGConf.EU 2018 - Speaker Interviews

2018-10-01 (Mon) 19:00:00

Like in previous years, we are conducting interviews with our speakers ahead of this year's PostgreSQL Conference Europe.

The first interviews are already online: 2018 - Speaker Interviews

If you are speaking in Lisbon, and haven't already submitted your answers, please check your mailbox. There should be an email asking you if you want to participate.

Category: postgresql

Paul Ramsey: PostGIS Code Sprint 2018 #2

2018-10-01 (Mon) 17:00:00

An important topic of conversation this sprint was what kinds of core PostgreSQL features might make PostGIS better in the future?

Parallel GIST Scan

The PostGIS spatial index is built using the PostgreSQL GIST index infrastructure, so anything that makes GIST scans faster is a win for us. This would be a big win for folks with large tables (and thus deep trees) and who run scans that return a lot of indexed tuples.

Faster GIST Index Building

B-Tree index builds are accelerated by pre-sorting the inputs; could the same trick be used in building GIST indexes? Again, for large tables, GIST index building is slower than B-Tree, and “faster” is the #1 feature all existing users want.

Multi-Threading in Functions

This isn’t a feature request, so much as a request for clarification and assurance: PostGIS calls out to other libraries, like GEOS, and it’s possible we could make some of our algorithms there faster via parallel processing. If we do our parallel processing within a function call, so the PostgreSQL thread of execution isn’t doing anything until we return, is it OK for us to do threading? We use pthreads, or maybe OpenMP.

Compressed Datum Slicing

“Detoast datum slice” doesn’t actually get around to the slicing step until after the datum is decompressed, which can make some queries quite slow. We already try to read boxes from the headers of our objects, and for large objects that means decompressing the whole thing: it would be nice to only decompress the first few bytes. I have an ugly patch I will be testing to try and get committed.

Forced Inlining

A problem we have with PostgreSQL right now is that we cannot effectively cost our functions due to the current inlining behaviour on our wrapper functions like ST_Intersects(). When raised on the list, the hackers came to a tentative conclusion that improving the value caching behaviour would be a good way to avoid having inlining in incorrect places. It sounded like subtle and difficult work, and nobody jumped to it.

We propose leaving the current inlining

Category: postgresql

Paul Ramsey: PostGIS Code Sprint 2018 #1

2018-09-30 (Sun) 17:00:00

When I tell people I am heading to an open source “code sprint”, which I try to do at least once a year, they ask me “what do you do there?”

When I tell them, “talk, mostly”, they are usually disappointed. There’s a picture, which is not unearned, of programmers bent over their laptops, quietly tapping away. And that happens, but the real value, even when there is lots of tapping, is in the high-bandwidth, face-to-face communication.

So, inevitably I will be asked what I coded this week at the PostGIS Code Sprint, and I will answer… “uhhhhh”. I did a branch of PostgreSQL that will do partial decompression of compressed tuples, but didn’t get around to testing it. I tested some work that others had done. But mostly, we talked.

PostGIS 3

Why move to PostGIS 3 for the next release? Not necessarily because we will have any earth-shattering features, but to carry out a number of larger changes. Unlike most PostgreSQL extensions, PostGIS has a lot of legacy from past releases and has added, removed and renamed functions over time. These things are disruptive, and we’d like to do some of the known disruptive things at one time.

Split Vector and Raster

When we brought raster into PostGIS, we included it in the “postgis” extension, so if you CREATE EXTENSION postgis you get both vector and raster features. The rationale was that if we left it optional, packagers wouldn’t build it, and thus most people wouldn’t have access to the functionality, so it wouldn’t get used, so we’d be maintaining unused garbage code.

Even being included in the extension, by and large people haven’t used it much, and the demand from packagers and other users to have a “thin” PostGIS with only vector functionality have finally prevailed: when you ALTER EXTENSION postgis UPDATE TO '3.0.0' the raster functions will be unbundled from the extension. They can then be re-bundled into a “postgis_raster” dependent package and either dropped or kept around depending on user preference.

Remove Minor Version

For users in production, working with

Category: postgresql

Jobin Augustine: High Availability for Enterprise-Grade PostgreSQL environments

2018-09-29 (Sat) 02:09:33

High availability (HA) and database replication is a major topic of discussion for database technologists. There are a number of informed choices to be made to optimize PostgreSQL replication so that you achieve HA. In this post we introduce an overview of the topic and cover some options available to achieve high availability in PostgreSQL. We’ll then focus on just one way to implement HA for Postgres, using Patroni.

In our previous blog posts, we have discussed the features available to build a secured PostgreSQL environment and the tools available to help you set up a reliable backup strategy. The series of articles is designed to provide a flavor of how you might go about building an enterprise-grade PostgreSQL environment using open source tools. If you’d like to see this implemented, then please do check out our webinar presentation of October 10 – we think you might find it both useful and intriguing!

Replication in PostgreSQL

The first step towards achieving high availability is making sure you don’t rely on a single database server: your data should be replicated to at least one standby replica/slave. Database replication can be done using the two options available with PostgreSQL community software:

  1. Streaming replication
  2. Logical replication & logical decoding

When we set up streaming replication, a standby replica connects to the master (primary) and streams WAL records from it. Streaming replication is considered to be one of the safest and fastest methods for replication in PostgreSQL. A standby server becomes an exact replica of the primary, with potentially minimal lag between primary and standby even on very busy transactional servers. Streaming replication in PostgreSQL can be configured as either synchronous or asynchronous. Synchronous replication ensures that a client is given a success message only when the change is not only committed to the master but also successfully replicated on the standby server as well. As standby servers can accept read requests from clients, we can ma
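A minimal streaming-replication configuration of the kind described above (hostnames, user, and password are hypothetical; recovery.conf applies to pre-12 releases, i.e. the era of this post):

```
# On the primary, postgresql.conf:
wal_level = replica
max_wal_senders = 5

# On the primary, pg_hba.conf (allow the standby to connect for replication):
host  replication  replicator  192.168.0.0/24  md5

# On the standby, recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=primary-host user=replicator password=secret'
```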

Category: postgresql

Achilleas Mantzios: The “O” in ORDBMS: PostgreSQL Inheritance

2018-09-28 (Fri) 18:58:00

In this blog entry we’ll go through PostgreSQL inheritance, traditionally one of PostgreSQL’s top features since the early releases. Some typical uses of inheritance in PostgreSQL are:

  • table partitioning
  • multi-tenancy

Until version 10, PostgreSQL implemented table partitioning using inheritance; PostgreSQL 10 introduced a new way, declarative partitioning. PostgreSQL partitioning using inheritance is a pretty mature technology, well documented and tested; however, inheritance in PostgreSQL from a data model perspective is (in my opinion) not so widespread, therefore we’ll concentrate on more classic use cases in this blog. We saw from the previous blog (multi-tenancy options for PostgreSQL) that one of the methods to achieve multi-tenancy is to use separate tables and then consolidate them via a view. We also saw the drawbacks of this design. In this blog we will enhance this design using inheritance.

Introduction to Inheritance

Looking back at the multi-tenancy method implemented with separate tables and views, we recall that its major disadvantage is the inability to do inserts/updates/deletes. The moment we try an update on the rental view we’ll get this ERROR:

ERROR:  cannot insert into view "rental"
DETAIL:  Views containing UNION, INTERSECT, or EXCEPT are not automatically updatable.
HINT:  To enable inserting into the view, provide an INSTEAD OF INSERT trigger or an unconditional ON INSERT DO INSTEAD rule.

So, we would need to create a trigger or a rule on the rental view specifying a function to handle the insert/update/delete. The alternative is to use inheritance. Let us change the schema of the previous blog:

template1=# create database rentaldb_hier;
template1=# \c rentaldb_hier
rentaldb_hier=# create schema boats;
rentaldb_hier=# create schema cars;

Now let’s create the main parent table:

rentaldb_hier=# CREATE TABLE rental (
    id integer NOT NULL,
    customerid integer NOT NULL,
    vehicleno text,
    datestart date NOT NULL,
    dateend date
);

In OO terms this table corresponds to the superclass.
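The per-tenant child tables then inherit from rental; a sketch (the tenant schemas follow the ones created above, and the inserted values are illustrative):

```sql
-- Each tenant schema gets a child table inheriting the parent's columns
CREATE TABLE boats.rental () INHERITS (rental);
CREATE TABLE cars.rental () INHERITS (rental);

-- Rows inserted into a child are visible through the parent
INSERT INTO cars.rental (id, customerid, vehicleno, datestart)
    VALUES (1, 42, 'ABC-1234', '2018-09-01');
SELECT * FROM rental;  -- includes rows from all children
```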

Category: postgresql

Paul Ramsey: 5x Faster Spatial Join with this One Weird Trick

2018-09-28 (Fri) 17:00:00

My go-to performance test for PostGIS is the point-in-polygon spatial join: given a collection of polygons of variable sizes and a collection of points, count up how many points are within each polygon. It’s a nice way of testing indexing, point-in-polygon calculations and general overhead.


First download some polygons and some points.

Load the shapes into your database.

shp2pgsql -s 4326 -D -I ne_10m_admin_0_countries.shp countries | psql performance
shp2pgsql -s 4326 -D -I ne_10m_populated_places.shp places | psql performance

Now we are ready with 255 countries and 7343 places.

One thing to note about the countries is that they are quite large objects, with 149 of them having enough vertices to be stored in TOAST tuples.

SELECT count(*) FROM countries WHERE ST_NPoints(geom) > (8192 / 16);

Baseline Performance

Now we can run the baseline performance test.

SELECT count(*), c.name
FROM countries c
JOIN places p
ON ST_Intersects(c.geom, p.geom)
GROUP BY c.name;

On my laptop, this query takes 25 seconds.

If you stick the process into a profiler while running it you’ll find that over 20 of those seconds are spent in the pglz_decompress function. Not doing spatial algorithms or computational geometry, just decompressing the geometry before handing it on to the actual processing.

Among the things we talked about this week at our PostGIS code sprint have been clever ways to avoid this overhead:

  • Patch PostgreSQL to allow partial decompression of geometries.
  • Enrich our serialization format to include a unique hash key at the front of geometries.

These are cool have-your-cake-and-eat-too ways to both retain compression for large geometries and be faster when feeding them into the point-in-polygon machinery.

However, they ignore a more brutal and easily testable approach to avoiding decompression: just don’t compress in the first place.

One Weird Trick

PostGIS uses the “main” storage option for its geometry type. The main option tries to keep ge
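The trick hinted at above is, presumably, switching the geometry column from compressed to uncompressed TOAST storage; a hedged sketch, using the countries table loaded earlier:

```sql
-- Switch the column from 'main' (compressed) to 'external' (uncompressed TOAST)
ALTER TABLE countries ALTER COLUMN geom SET STORAGE EXTERNAL;

-- The setting applies only to newly written values, so rewrite existing rows
UPDATE countries SET geom = ST_SetSRID(geom, 4326);
```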

Category: postgresql