planet postgresql

Subscribe to planet postgresql のフィード
Planet PostgreSQL
更新: 2時間 45分 前

Andrew Staller: Space exploration and time-series data: Why the European Space Agency uses PostgreSQL

2018-09-21(金) 01:04:48

Big, small, relational, time-series, & geospatial data — How the European Space Agency uses PostgreSQL & TimescaleDB for the “Digital Library of the Universe”

Our Sun and Heliosphere.

(The following post was written based on a joint talk given at PostgresConf US 2018 between ESA and TimescaleDB.)

Space. That object of endless human curiosity. That setting for countless stories, books, and movies. The subject of large coordinated scientific efforts aimed at exploration and understanding. And a source of a massive amount of data.

The European Space Agency (ESA) is an intergovernmental organization of 22 member states dedicated to the exploration of space for the citizens of the world.

A crucial part of ESA’s efforts is collecting massive amounts of data and putting it to good use. In this post, we take a closer look at one of its departments called the ESDC (“ESA Science Data Center”), and how it chose to standardize on PostgreSQL for its “Digital Library of the Universe”. We also look closely at the role that time-series data plays in space exploration, and how ESDC is using TimescaleDB to harness the power of this data.

Announcing TimescaleDB 1.0: First enterprise-ready time-series database to support full SQL & scale

Developing the Digital Library of the Universe

There are over 20 planetary, heliophysics, and astronomy missions at ESA. ESDC is tasked with archiving the data from all of them into a “Digital Library of the Universe.”

This universal archive serves as a repository of every mission history, hosting data for all research needs with the main goal of serving the public and scientific communities. ESDC makes this archive available by providing scientific teams all around the world privileged access to this data.

The Jedi Library: What we picture when we think of ESA’s “Digital Library of the Universe”

When it came to developing this “Digital Library”, ESDC had to find a database system that was reliable but would also provide the broadest access to the general scientific community at large.

In the p

カテゴリー: postgresql

Craig Kerstiens: Postgres 11 - A First Look

2018-09-21(金) 00:00:29

Postgres 11 is almost here, in fact the latest beta shipped today, and it features a lot of exciting improvements. If you want to get the full list of features it is definitely worth checking out the release notes, but for those who don’t read the release notes I put together a run down of some what I consider the highlight features.

Quitting Postgres

This is a small usability feature, but so long over due. Now you can quit Postgres by simply typing quit or exit. Previously you had to use Ctrl + D or \q. As a begginer it’s one thing to jump into a psql terminal, but once in if you can’t figure out how to quit it’s a frustrating experience. Small usability features, such as this and watch in an earlier release, are often lost in the highlighted features which talk about performance or new data types. Improvements like this really go a long way for making Postgres a better database for everyone.

Fear column addition no more

Brandur had a great in depth write-up on this feature already, but it falls somewhere into the category of the above as well as a performance improvement. Previously when you added a new column that was NOT NULL with a default value Postgres would have to take a lock and re-write the entire table. In a production environment on any sizable table for all practical purposes the result was an outage. The work around was to break your migrations apart to be a several step process.

With Postgres 11 you can add a new column to a table that is not null with a default value. The new row will get materialized on your database without requiring a full re-write. Here is to having to think less about your migrations.

Of course performance is a highlight

No Postgres release would be complete without some performance improvements. This release there are really two areas that feature key improvements around performance.

Parallelism continuing to mature

We first saw parallelism support back in PostgreSQL 9.6. At the time it was primarily for sequential scans, which if you used parallelism for your sequential sc

カテゴリー: postgresql

Álvaro Herrera: Partitioning Improvements in PostgreSQL 11

2018-09-20(木) 04:29:27

A partitioning system in PostgreSQL was first added in PostgreSQL 8.1 by 2ndQuadrant founder Simon Riggs. It was based on relation inheritance and used a novel technique to exclude tables from being scanned by a query, called “constraint exclusion”. While it was a huge step forward at the time, it is nowadays seen as cumbersome to use as well as slow, and thus needing replacement.

In version 10, it was replaced thanks to heroic efforts by Amit Langote with modern-style “declarative partitioning”. This new tech meant you no longer needed to write code manually to route tuples to their correct partitions, and no longer needed to manually declare correct constraints for each partition: the system did those things automatically for you.

Sadly, in PostgreSQL 10 that’s pretty much all it did. Because of the sheer complexity and the time constraints, there were many things in the PostgreSQL 10 implementation that were lacking. Robert Haas gave a talk about it in Warsaw’s PGConf.EU.

Many people worked on improving the situation for PostgreSQL 11; here’s my attempt at a recount. I split these in three areas:

  1. New partitioning features
  2. Better DDL support for partitioned tables
  3. Performance optimizations.
New Partitioning Features

In PostgreSQL 10, your partitioned tables can be so in RANGE and LIST modes. These are powerful tools to base many real-world databases on, but for many others designs you need the new mode added in PostgreSQL 11: HASH partitioning. Many customers need this, and Amul Sul worked hard to make it possible. Here’s a simple example:


It is not mandatory to use the same modulus value for all partitions; this lets you create more partitions later and r

カテゴリー: postgresql

Craig Kerstiens: Use cases for followers (read replicas) in Postgres

2018-09-20(木) 02:04:00

Citus extends Postgres to be a horizontally scalable database. By horizontally scalable, we mean the data is spread across multiple machines, and you’re able to scale not only storage but also memory and compute—thus providing better performance. Without using something like Citus to transform PostgreSQL into a distributed database, sure you can add read replicas to scale, but you’re still maintaining a single copy of your data. When you run into scaling issues with your Postgres database, adding a read replica and offloading some of your traffic to your read replica is a common bandaid to slow down the bleeding, but it is only a matter of time until even that doesn’t work any further. Whereas with Citus, scaling out your database is as simple as dragging a slider and rebalancing your data.

Are read replicas still useful with horizontally scalable databases?

But that leaves a question, are read-replicas still useful? Well, sure they are.

In Citus Cloud (our fully-managed database as a service), we have support for read replicas, in our case known as followers. Follower clusters leverage much of our same underlying disaster recovery infrastructure that forks leverage, but support a very different set of use cases.

Previously we talked about how forks can be used in Citus Cloud to get a production set of data over to staging that can help with testing migrations, rebalancing, or SQL query optimization. Forks are often used as a one-off for a short period of time. In contrast, followers are often long running Citus database clusters that can be a key mechanism to run your business, helping you get insights when you need them.

Follower cluster is often only a few seconds (if any) behind your primary database cluster

A follower cluster receives all updates from the primary Citus cluster in an asynchronous fashion. Often followers are only a few seconds (if any) behind your primary database cluster, though can lag at times by a few minutes. Followers have a full copy of your data, but reside on a separate cluster.

カテゴリー: postgresql

Laurenz Albe: Correlation of PostgreSQL columns explained

2018-09-19(水) 17:00:49

After you ANALYZE a PostgreSQL table to collect value distribution statistics, you will find the gathered statistics for each column in the pg_stats system view. This article will explain the meaning of the correlation column and its impact on index scans.

Physical vs. logical ordering

Most common PostgreSQL data types have an ordering: they support the operators <, <=, =, >= and >.
Such data types can be used with a B-tree index (the “standard” index type).

The values in a column of such a type provide a logical ordering of the table rows. An index on this column will be sorted according to that ordering.

A PostgreSQL table consists of one or more files of 8KB blocks. The order in which the rows are stored in the file is the physical ordering.
You can examine the physical ordering of the rows by selecting the ctid system column: it contains the block number and the item number inside the block, which describe the physical location of the table row.


The correlation for a column is a value between -1 and 1. It tells how good the match between logical and physical ordering is.

  • If the correlation is 1, the rows are stored in the table file in ascending column order; if it is -1, they are stored in descending order.
  • Values between -1 and 1 mean a less perfect match.
  • A value of 0 means that there is no connection between the physical and the logical order.
Why should I care?

You will create indexes on your tables for faster access (but not too many!).
The correlation of a column has an impact on the performance of an index scan.

During an index scan, the whole index or part of it are read in index sequential order. For each entry that is found, the corresponding row is fetched from the table (this is skipped in an “index only scan”, but that is a different story).

If the correlation of the indexed column is close to zero, the fetched rows will be from all over the table. This will result in many randomly distributed reads of many different table blocks.

However, if the correlation is close to 1 or -1,

カテゴリー: postgresql

Sebastian Insausti: Custom Graphs to Monitor your MySQL, MariaDB, MongoDB and PostgreSQL Systems - ClusterControl Tips & Tricks

2018-09-18(火) 22:26:07

Graphs are important, as they are your window onto your monitored systems. ClusterControl comes with a predefined set of graphs for you to analyze, these are built on top of the metric sampling done by the controller. Those are designed to give you, at first glance, as much information as possible about the state of your database cluster. You might have your own set of metrics you’d like to monitor though. Therefore ClusterControl allows you to customize the graphs available in the cluster overview section and in the Nodes -> DB Performance tab. Multiple metrics can be overlaid on the same graph.

Cluster Overview tab

Let’s take a look at the cluster overview - it shows the most important information aggregated under different tabs.

Cluster Overview Graphs

You can see graphs like “Cluster Load” and “Galera - Flow Ctrl” along with couple of others. If this is not enough for you, you can click on “Dash Settings” and then pick “Create Board” option. From there, you can also manage existing graphs - you can edit a graph by double-clicking on it, you can also delete it from the tab list.

Dashboard Settings

When you decide to create a new graph, you’ll be presented with an option to pick metrics that you’d like to monitor. Let’s assume we are interested in monitoring temporary objects - tables, files and tables on disk. We just need to pick all three metrics we want to follow and add them to our new graph.

New Board 1

Next, pick some name for our new graph and pick a scale. Most of the time you want scale to be linear but in some rare cases, like when you mix metrics containing large and small values, you may want to use logarithmic scale instead.

New Board 2

Finally, you can pick if your template should be presented as a default graph. If you tick this option, this is the graph you will see by default when you enter the “Overview” tab.

Once we save the new graph, you can enjoy the result:

New Board 3 Node Overview tab

In addition to the graphs on our cluster, we can also use this functionality on each of o

カテゴリー: postgresql

Jonathan Katz: Why Covering Indexes Are Incredibly Helpful

2018-09-17(月) 20:55:00

The PostgreSQL 11 release is nearly here (maybe in the next couple of weeks?!), and while a lot of the focus will be on the improvements to the overall performance of the system (and rightly so!), it's important to notice some features that when used appropriately, will provide noticeable performance improvements to your applications.

One example of such feature is the introduction of "covering indexes" for B-tree indexes. A covering index allows a user to perform an index-only scan if the looks in the query match the columns that are included in the index. You can specify the additional columns for the index using the "INCLUDE" keyword, e.g.

CREATE INDEX a_b_idx ON x (a,b) INLCUDE (c);

Theoretically, this can reduce the amount of I/O your query needs to use in order to retrieve information (traditionally, I/O is the biggest bottleneck on database systems). Additionally, the data types including in a covering index do not need to be B-tree indexable; you can add any data type to the INCLUDE part of a CREATE INDEX statement.

However, you still need to be careful for how you deploy covering indexes: each column you add to the index still takes up space on disk, and there is still a cost for maintaining the index, for examples, on row updates.

Understanding these trade offs, you can still apply covering indexes in very helpful ways that can significantly help your applications.

A Simple Example: Tracking Coffee Shop Visits
カテゴリー: postgresql

brian davis: Cost of a Join - Part 2: Enums, Wider Tables

2018-09-17(月) 14:00:00

A follow-up to the previous post where the performance of queries with many joins is investigated.

Great discussion on hacker news and r/programming brought up a couple ideas I hadn't considered.

  1. What about enums?
  2. What about tables with more columns?

I also figured it would be neat if other people could run these benchmarks with their own parameters / hardware.

So I adjusted my script to support enums and wider tables, and packaged it up into a tool anyone can use. It supports three different join types: enum, foreign keys, and what I'm calling "chained". The benchmark definitions are described in a json file which looks like this:

$ cat input.json [ { "join-type": "chained", "max-tables": 10, # Queries will start by joining 2 tables, increasing by one until all tables are joined. Number of tables joined will be the X axis on the plot. "max-rows": 10000, # Benchmarks will be performed at 10 rows, 100 rows, etc. until max-rows is reached, creating a separate line on the plot for each. "extra_columns": 2, "max_id": 5, "create-indexes": true, "output-filename": "benchmark_1", "plot-title": "My Chained Benchmark Title" }, { "join-type": "enums", "max-rows": 10000, # Benchmarks will be performed at 10 rows in the primary table, increasing by a factor of 10 until max-rows is reached "max-enums": 100, # Queries will start by selecting (and optionally filtering by) 1 enum column, increasing by one until max-enums is reached "possible-enum-values": 10, "extra-columns": 2, "where-clause": true, "output-filename": "benchmark_1", "plot-title": "My Enum Benchmark Title" }, { "join-type": "foreign-keys", "max-primary-table-rows": 10000, # Benchmarks will be performed at 10 rows in the primary table, increasing by a factor of 10 until max-rows is reached "max-fk-tables": 100, # Queries will start by selecting from (and optionally filtering by) 1 fo[...]
カテゴリー: postgresql

Regina Obe: PostGIS 2.5.0rc2

2018-09-16(日) 09:00:00

The PostGIS development team is pleased to release PostGIS 2.5.0rc2.

Although this release will work for PostgreSQL 9.4 and above, to take full advantage of what PostGIS 2.5 offers, you should be running PostgreSQL 11beta3+ and GEOS 3.7.0 which were released recently.

Best served with PostgreSQL 11beta3.


Changes since PostGIS 2.5.0rc1 release are as follows:

  • 4162, ST_DWithin documentation examples for storing geometry and radius in table (Darafei Praliaskouski, github user Boscop).
  • 4163, MVT: Fix resource leak when the first geometry is NULL (Raúl Marín)
  • 4172, Fix memory leak in lwgeom_offsetcurve (Raúl Marín)
  • 4164, Parse error on incorrectly nested GeoJSON input (Paul Ramsey)
  • 4176, ST_Intersects supports GEOMETRYCOLLECTION (Darafei Praliaskouski)
  • 4177, Postgres 12 disallows variable length arrays in C (Laurenz Albe)
  • 4160, Use qualified names in topology extension install (Raúl Marín)
  • 4180, installed liblwgeom includes sometimes getting used instead of source ones (Regina Obe)

View all closed tickets for 2.5.0.

After installing the binaries or after running pg_upgrade, make sure to do:


— if you use the other extensions packaged with postgis — make sure to upgrade those as well

ALTER EXTENSION postgis_sfcgal UPDATE; ALTER EXTENSION postgis_topology UPDATE; ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

カテゴリー: postgresql

Bruce Momjian: Postgres 11 Features Presentation

2018-09-15(土) 03:30:01

Now that I have given a presentation about Postgres 11 features in New York City, I have made my slides available online.

カテゴリー: postgresql

Avinash Kumar: PostgreSQL Webinar Wed Oct 10th – Enterprise-Grade PostgreSQL: Built on Open Source Tools

2018-09-15(土) 02:23:22

Please join Percona’s PostgreSQL Support Technical Lead,  Avinash Vallarapu; Senior Support Engineer, Fernando Laudares; and Senior Support Engineer, Jobin Augustine, on Wednesday, October 10th, 2018 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4), as they demonstrate an enterprise-grade PostgreSQL® environment built using a combination of open source tools and extensions.

Register Now

“We built our application on top of PostgreSQL. It works great but only now that we took it to the market and it became a hit we realize how much it relies on the database. How can we “harden” PostgreSQL? How can we make the solution we built around PostgreSQL enterprise-grade?”

“I migrated from a proprietary database software to PostgreSQL. I am curious to know whether I can get the same features I used to have in the proprietary database software.”

You’ll find the answer to these questions and more in a series of blog posts we will be publishing on this topic, which will be followed by a live demo we planned for our webinar on October 10th, 2018.

The market coined the term “enterprise grade” or “enterprise ready” to differentiate products and service offerings for licensed database software. For example: there may be a standard database software or an entry-level package that delivers the core functionality and basic features. Likewise, there may be an enterprise version, a more advanced package which goes beyond the essentials to include features and tools indispensable for running critical solutions in production. With such a differentiation found in commercial software, we may wonder whether a solution built on top of an open source database like PostgreSQL can satisfy all the enterprise requirements.

It starts with building a secured PostgreSQL environment, tuning the database for the production workload, building a high availability strategy that avoids single-point-of-failures, scaling PostgreSQL using connection poolers to avoid excessive usage of server resources, and finally load balancing the reads between master and all

カテゴリー: postgresql

Sebastian Insausti: How to Deploy PostgreSQL for High Availability

2018-09-14(金) 17:56:00

Nowadays, high availability is a requirement for many systems, no matter what technology we use. This is especially important for databases, as they store data that applications rely upon. There are different ways to replicate data across multiple servers, and failover traffic when e.g. a primary server stops responding.


There are several architectures for PostgreSQL high availability, but the basic ones would be master-slave and master-master architectures.


This may be the most basic HA architecture we can setup, and often times, the more easy to set and maintain. It is based on one master database with one or more standby servers. These standby databases will remain synchronized (or almost synchronized) with the master, depending on whether the replication is synchronous or asynchronous. If the main server fails, the standby contains almost all of the data of the main server, and can quickly be turned into the new master database server.

We can have two categories of standby databases, based on the nature of the replication:

  • Logical standbys - The replication between the master and the slaves is made via SQL statements.
  • Physical standbys - The replication between the master and the slaves is made via the internal data structure modifications.

In the case of PostgreSQL, a stream of write-ahead log (WAL) records is used to keep the standby databases synchronized. This can be synchronous or asynchronous, and the entire database server is replicated.

From version 10, PostgreSQL includes a built in option to setup logical replication which is based on constructing a stream of logical data modifications from the information in the WAL. This replication method allows the data changes from individual tables to be replicated without the need of designating a master server. It also allows data to flow in multiple directions.

But a master-slave setup is not enough to effectively ensure high availability, as we also need to handle failures. To handle failures, we need to be able t

カテゴリー: postgresql

Liaqat Andrabi: Webinar : Database Security in PostgreSQL [Follow Up]

2018-09-13(木) 21:26:04

Database security is an increasingly critical topic for any business handling personal data. Data breach can have serious ramifications for an organization, especially if the proper security protocols are not in place.

There are many ways to harden your database. As an example PostgreSQL addresses security using firewalls, encryption and authentication levels among other ways.

2ndQuadrant hosted a webinar on Database Security in PostgreSQL to highlight security concepts, features and architecture. The webinar was presented by Kirk Roybal, Principal Consultant at 2ndQuadrant – the recording is now available here.

Some of the questions that Kirk responded to are listed below:

Q1: What are your thoughts on performance of row-level security vs. doing that filtering via WHERE at the application level and how that affects development? I.E. now that you’re filtering via DB capabilities you lose the visibility of that being done at the application level – it becomes a bit of a black box of “it just works” for the development team.

A1: The PostgreSQL query parser is involved in evaluating the constraint either way. Since this is mostly dependent on PostgreSQL, there will be very little or no measurable difference in performance. Putting the security in the database has the advantage of being modifiable without changes to the application layer.

Q2: Do you have any suggestions for encrypting data at rest?

A2: PostgreSQL provides pgcrypto as an extension. PostgreSQL also allows you to create your own datatypes, operators and aggregates. Put the two together and you have encryption at rest.

Q3: Is it possible to configure Azure AD authentication too?

A3: Yes, if you create a bare Linux machine, you can configure anything you want.

Q4: Do you support performance tuning on AWS RDS Postgres?

A4: Yes, we do provide the Performance Tuning service for RDS. Because of the closed nature of the system, however, there might be some advanced settings that we won’t be able to tune.

Q5: What are the main differences between the PostgreS

カテゴリー: postgresql

Christophe Pettus: “Securing PostgreSQL” at PDXPUG PostgreSQL Day 2018

2018-09-13(木) 12:22:26

The slides from my presentation, Securing PostgreSQL at PDXPUG PostgreSQL Day 2018 are now available.

カテゴリー: postgresql

Joe Abbate: The Future of Pyrseas, revisited

2018-09-13(木) 10:55:02

Over two years ago, I lamented that this blog had remained silent for too long and about the lack of development activity on the Pyrseas project. I also announced immediate and longer term plans. Pyrseas 0.8 was finally released before the end of last year.

From time we get interest in Pyrseas, sometimes from people who use it in unexpected ways. However, my own interest in developing it further and actual commits have been declining considerably. I probably spend less than four hours a week on it. Therefore, I’ve decided to put the project on life support. I will attempt to notify owners of open issues via GitHub as to this status. I may continue to work on some issues or enhancements but on an even more reduced scale. If requested, I’ll make maintenance releases as needed. Should any enterprising developer want to take it over, I’ll gladly consider handing the reins.

Thanks to everyone who has contributed, inquired or shown interest in Pyrseas over the past eight years. In particular, I’d like to extend my gratitude to Daniele Varrazzo (not only for his contributions to Pyrseas but also for his work on Psycopg2), Roger Hunwicks and Josep Martínez Vila.

カテゴリー: postgresql

Bruce Momjian: Multi-Host Pg_dump

2018-09-13(木) 02:00:01

You have probably looked at logical dumps as supported by pg_dump and restores by pg_restore or, more simply, psql. What you might not have realized are the many options for dumping and restoring when multiple computers are involved.

The most simple case is dumping and restoring on the same server:

$ pg_dump -h localhost -Fc test > /home/postgres/dump.sql $ pg_restore -h localhost test < /home/postgres/dump.sql

Continue Reading »

カテゴリー: postgresql

Ajay Kulkarni: Announcing TimescaleDB 1.0: First enterprise-ready time-series database to support full SQL & scale

2018-09-12(水) 21:45:40
Announcing TimescaleDB 1.0: The first enterprise-ready time-series database to support full SQL and scale

Over 1M downloads; production deployments at Comcast, Bloomberg, Cray, and more; native Grafana integration; first-class Prometheus support; and dozens of new features signify positive momentum for TimescaleDB and the future of the time-series market

Today, we are excited to officially announce the first release candidate for TimescaleDB 1.0.

If you work in the software industry, you already know that 1.0 announcements generally signify that your product is “production-ready.”

Ironically, just last week we got this question on Twitter from a TimescaleDB user, who founded a weather mapping company:

@TimescaleDB Any reason why Timescale is not yet version 1.0.0? We use it for a few months under heavy load and so far no problems.

 — @ilblog

Yes, our 1.0 release is a little overdue as we’ve actually been production-ready for quite some time now.

Today, just a year and a half after our launch in April 2017, businesses large and small all over the world trust TimescaleDB for powering mission-critical applications including industrial data analysis, complex monitoring systems, operational data warehousing, financial risk management, geospatial asset tracking, and more.

“At Bloomberg, we have millions of data feeds and trillions of data points dating back over 100 years. My team and I have been extremely pleased with TimescaleDB’s capability to accommodate our workload while simplifying geo-financial analytics and data visualization. If you are looking to support large scale time-series datasets, then TimescaleDB is a good fit.” Erik Anderson, Lead Software Engineer at Bloomberg From 0 to over 1 million downloads in less than 18 months

Since our launch, we’ve experienced some significant momentum:

カテゴリー: postgresql

pgCMH - Columbus, OH: Inside PG 11

2018-09-12(水) 13:00:00

The Sep meeting will be held at 18:00 EST on Tues, the 25th. Once again, we will be holding the meeting in the community space at CoverMyMeds. Please RSVP on MeetUp so we have an idea on the amount of food needed.


CoverMyMeds’ very own Andy will be presenting this month. He’s going to tell us all about the upcoming major PostgreSQL release 11. You’ll definitely want to attend this one to find out all that’s new and cool in the upcoming release!


CoverMyMeds has graciously agreed to validate your parking if you use their garage so please park there:

You can safely ignore any sign saying to not park in the garage as long as it’s after 17:30 when you arrive.

Park in any space that is not marked ‘24 hour reserved’.

Once parked, take the elevator/stairs to the 3rd floor to reach the Miranova lobby. Once in the lobby, the elevator bank is in the back (West side) of the building. Take a left and walk down the hall until you see the elevator bank on your right. Grab an elevator up to the 11th floor. (If the elevator won’t let you pick the 11th floor, contact Doug or CJ (info below)). Once you exit the elevator, look to your left and right; one side will have visible cubicles, the other won’t. Head to the side without cubicles. You’re now in the community space:

The kitchen is to your right (grab yourself a drink) and the meeting will be held to your left. Walk down the room towards the stage.

If you have any issues or questions with parking or the elevators, feel free to text/call Doug at +1.614.316.5079 or CJ at +1.740.407.7043

カテゴリー: postgresql

Dimitri Fontaine: PostgreSQL 11 and Just In Time Compilation of Queries

2018-09-12(水) 01:34:00

PostgreSQL 11 is brewing and will be released soon. In the meantime, testing it with your own application is a great way to make sure the community catches all the remaining bugs before the dot-zero release.

One of the big changes in the next PostgreSQL release is the result of Andres Freund’s work on the query executor engine. Andres has been working on this part of the system for a while now, and in the next release we are going to see a new component in the execution engine: a JIT expression compiler!

Benchmarks and TPC-H

Benchmarks are a great tool to show where performance improvements provide a benefit. The JIT expression compiler currently works best in the following situation:

  • the query contains several complex expression such as aggregates.
  • the query reads a fair amount of data but isn’t starved on IO resources.
  • the query is complex enough to warrant spending JIT efforts on it.

A query that fetches some information over a primary key surrogate id would not be a good candidate to see the improvements given by the new JIT infrastructure in PostgreSQL.

The TPC-H benchmark Q1 query is a good candidate for measuring the impact of the new executor stack at its best, so that’s the one we’re using here.

The specifications of the benchmark are available in a 137 pages PDF document named TPC Benchmark™ H. Each query in this specification comes with a business question, so here’s Q1:

Pricing Summary Report Query (Q1)

This query reports the amount of business that was billed, shipped, and returned.

The Pricing Summary Report Query provides a summary pricing report for all lineitems shipped as of a given date. The date is within 60 - 120 days of the greatest ship date contained in the database. The query lists totals for extended price, discounted extended price, discounted extended price plus tax, average quantity, average extended price, and average discount. These aggregates are grouped by RETURNFLAG and LINESTATUS, and listed in ascending order of RETURNFLAG and LINESTATUS. A count of the number of lineitems

カテゴリー: postgresql

Haroon .: PostgreSQL for IoT Data Retention and Archiving

2018-09-11(火) 16:08:38

We do understand that IoT revolution is resulting in enormous amounts of data. With brisk data growth where data is mostly time series append-only, relational databases and DBAs have a rather tough task to store, maintain, archive and in some cases get rid of the old data in an efficient manner. In my previous posts, I talked about various strategies and techniques around better scalability for data storage using PostgreSQL and Postgres-BDR extension. Data retention is becoming ever so important. So let’s see what PostgreSQL 10 and above have to offer to efficiently manage your data retention needs.

PostgreSQL has supported time based partitioning in some form for quite some time. However, it wasn’t part of the core PostgreSQL. PostgreSQL 10 made a major improvement in this area by introducing declarative partitioning.

As Postgres-BDR runs as an extension on top of PostgreSQL 10 and above, we get all partitioning features and improvements of PostgreSQL 10 and above in Postgres-BDR as well. So from the implementation perspective in the IoT domain, you can now create and manage partitions over time and space in PostgreSQL for multimaster environments using Postgres-BDR extension.

So let’s see how easy it is to implement data retention policies with PostgreSQL’s partitioning improvements.

CREATE TABLE iot_temperature_sensor_data ( ts timestamp without time zone, device_id text, reading float ) PARTITION BY RANGE (ts);

PARTITION BY RANGE tells PostgreSQL that we are partitioning this table by range using column logdate.

Let’s create some partitions

CREATE TABLE iot_temperature_sensor_data_2018_february PARTITION OF iot_temperature_sensor_data FOR VALUES FROM ( '2018-02-01') TO ( '2018-03-01' ); CREATE TABLE iot_temperature_sensor_data_2018_march PARTITION OF iot_temperature_sensor_data FOR VALUES FROM ( '2018-03-01') TO ( '2018-04-01' ); CREATE TABLE iot_temperature_sensor_data_2018_april PARTITION OF iot_temperature_sensor_data FOR VALUES FROM ( '2018-04-01') TO ( [...]
カテゴリー: postgresql