Pavan Deolasee: [Video] Power of Indexing in PostgreSQL

planet postgresql - 2018-09-26(水) 02:27:06

The video of my presentation below walks you through ‘Indexing in PostgreSQL’ – a key component to database performance.

This presentation covers the following topics:

  • Various ways to access data in PostgreSQL
  • Different types of indexes supported by PostgreSQL
  • Internals of BTree and BRIN indexes
  • Overview of GIN and GiST indexes
  • How to find missing indexes
  • How to find unused indexes
カテゴリー: postgresql

Jobin Augustine: PostgreSQL Backup Strategy for an Enterprise-Grade Environment

planet postgresql - 2018-09-26(水) 02:13:16

In this post we cover the methods used to achieve an enterprise-grade backup strategy for the PostgreSQL® cluster. In setting up our demonstration system, we use pg_basebackup and continuous archiving. The size of the database and your database environment—if it is on colo or cloud—matters. It is very important to understand how we can ensure minimalistic or no data loss at the time of disasters using our preferred backup tools.

As discussed in the introductory blog post, in our webinar of October 10, 2018 we highlight important aspects an enterprise should consider for their PostgreSQL environments. In our last blog post we looked at security.

Backing up…

The ability to recover a database to a certain point in time is always the ultimate aim of a backup strategy. Without a recoverable backup, you have no backup! So there are always two parts to a backup strategy: backup AND recovery. The backup tool you use to achieve your aims may vary. Let’s discuss some of the options.


The community version of PostgreSQL comes with a trusted backup tool called pg_basebackup. This utility can help you take a consistent online binary backup that is very fast, can be used for point-in-time-recovery, and also for setting up slaves/replicas.

Archiving and continuous archiving

Postgres does not flush data pages to disk upon every transaction commit. To ensure that PostgreSQL achieves durability and endures crash recovery, changes are written to transactions logs (a.k.a. WALs, Write-Ahead Logs) stored on disk.

WALs in PostgreSQL are similar to transaction log files in the InnoDB storage engine for MySQL. However, WAL files are recycled according to the values set to the parameters wal_keep_segments and max_wal_size. Hence, if WAL files are not copied to a safe location—such as a backup server or another file system—it won’t be possible to achieve point-in-time-recovery (PITR).

In order to archive WAL segments to a safe location, the parameter archive_mode must be set to ‘ON’ and we must pass an appropriate shell comma

カテゴリー: postgresql

Joshua Drake: PostgresConf 2019: Back on the island and call for papers

planet postgresql - 2018-09-26(水) 02:04:00
We are pleased to announce that PostgresConf 2019 will be held at the Sheraton Times Square March 18th - 22nd, 2019. 
The call for papers is now open!
Following on the success of PostgresConf 2018, 2019 will include five days with added training and partner summits.

March 18th and 19th will have immersive training. Instructors are encouraged to submit half and full day material for consideration. The preferred topics are centered around Postgres but anything People, Postgres, or Data related will be considered.

Monday, March 18th through Friday, March 22nd will host several partner summits, including popular and upcoming topics within the Postgres community and the annually hosted Regulated Industry Summit. Break out sessions will be held from Wednesday - Friday.

Important Dates:
  • Call for Papers Open: 09/12/2018 
  • Call for Papers Close: 01/11/2019 
  • Confirmation/Acceptance: 01/18/2019
Can't wait until March? Join us at our West Coast event, PostgresConf Silicon Valley, October 15th and 16th, 2018 at the Hilton San Jose.
About PostgresConf:
PostgresConf is a global nonprofit conference series with a focus on growing community through increased awareness and education of Postgres and related technologies. PostgresConf is known for its highly attended national conference with the mission of:

カテゴリー: postgresql

Vasilis Ventirozos: Logical Replication in aws RDS

planet postgresql - 2018-09-25(火) 22:51:00
Recently Amazon annnounced that postgres 10 supports logical replication syntax and that it can work as either a publisher or a subscriber. The announcement can be found here.

In order to test it out i made 2 10.4 RDS instances. Set them up to be in the same security,
subnet and parameter groups so i ended up with something like this :

user : vasilis password : Ap4ssw0rd db instance : lrnode1 & 2 dbname : lrtest port : 5432
only thing i had to change in order to enable logical replication is to change rds.enable_logical_replication to 1 in my parameter group.  
After verifying connectivity
psql -h -U vasilis -d lrtest -W psql -h -U vasilis -d lrtest -W
i created a dummy table on my publisher and subscriber (lrnode1 and 2)
create table lr_test_tbl (     id int primary key,     name text );
on lrnode1 (publisher) i created a publication CREATE PUBLICATION lr_pub FOR TABLE lr_test_tbl ;

on lrnode2 and as vasilis i created the subscription CREATE SUBSCRIPTION lr_sub CONNECTION '  dbname=lrtest user=vasilis password=Ap4ssw0rd' PUBLICATION lr_pub ;
There only 3 things you need to know, you need to adjust your subnet / security groups allowing instances to see each other, you have to change the RDS.enable_logical_replication parameter  and that the user you are supposed to use for this is the user you set when you created the instance.   That was it, pretty fast and simple.
Vasilis Ventirozos OmniTI Computer Consulting Inc.
カテゴリー: postgresql

Hubert 'depesz' Lubaczewski: pg_terminator released

planet postgresql - 2018-09-25(火) 19:04:07
I just released first version of pg_terminator. It's a tool that is supposed to be run on PostgreSQL db server, monitor a database, and cancel or terminate offending queries/connections. It can be used, for example to: kill leftover psql sessions that are not doing anything cancel too long queries on production servers kill connections that […]
カテゴリー: postgresql

Douglas Hunley: Monitoring pgBackRest with tail_n_mail

planet postgresql - 2018-09-24(月) 23:58:22

After a lively discussion at work today about monitoring tools and uses cases, I decided to see if I could use tail_n_mail, which I already use to monitor my PostgreSQL logs, to monitor my pgBackRest logs. It turns out that it can, and can do so fairly trivially.

For reference, our .tail_n_mail.conf looks like this:

## Config file for the tail_n_mail program ## This file is automatically updated LOG_LINE_PREFIX: %t P%p EMAIL: MAILSUBJECT: HOST pgBackRest errors NUMBER INCLUDE: WARN: INCLUDE: ERROR: INCLUDE: FATAL: INCLUDE: PANIC: FILE1: /var/log/pgbackrest/pgbackrest.log

You would need to change the EMAIL and FILE1 and the rest should just work.

To actually invoke the monitoring, simply add a cron entry that looks like:

* * * * * /path/to/tail_n_mail ~/.tailnmail.conf --pgmode=0

(Obviously, you should adjust the periodicity. I highly doubt you need to check every minute.)

Once you’ve saved the crontab, you should be all good. If you doubt the setup, you can add --mailzero to the invocation to get an email even if everything is OK.

See? Easy.

カテゴリー: postgresql

Luca Ferrari: pgenv get configuration!

planet postgresql - 2018-09-23(日) 22:01:43

I have already written about a very useful and powerful small pearl by theory: pgenv. Now the tool does support for user configuration!

pgenv get configuration!

I spent some time implementing a very rudimental approach to configuration for pgenv. The idea was simple: since the program is a single Bash script, the configuration can be done using a single file to source variables in.

But before this was possible, I had to do a little refactoring over here and there in order to make all the commands behave smooth across the configuration. And at least, it seems to work, with some parts that can be improved and implemented better (as always it is!). However, I designed from scratch to support every single version of PostgreSQL, that means configuration could be different depending on the specific version you are running. This allows, for example, to set particular flags for ancient versions, without having to get crazy when switching to more recent ones.

Now pgenv supports a config command that, in turn, support for several subcommands:

  • write to store the configuration in an hidden file named after the PostgreSQL version (e.g., .pgenv.10.5.conf);
  • edit just to launch you $EDITOR to manipulate the configuration;
  • delete to remove a configuration file;
  • show to dump the configuration.

The idea is simple: each time a new PostgreSQL version is built, a configuration file is created for such instance. You can then customize the file in order to make pgenv behave differently for that particular version of PostgreSQL. As an example, you can set different languages (e.g., PL/Perl) or different startup/stop modes. If the configuration file for a particular version is not found, a global configuration is loaded. If neither that is...

カテゴリー: postgresql

Regina Obe: PostGIS 2.5.0

planet postgresql - 2018-09-23(日) 09:00:00

The PostGIS development team is pleased to release PostGIS 2.5.0.

Although this release will work for PostgreSQL 9.4 and above, to take full advantage of what PostGIS 2.5 offers, you should be running PostgreSQL 11beta4+ and GEOS 3.7.0 which were released recently.

Best served with PostgreSQL 11 beta4 and pgRouting 2.6.1.

WARNING: If compiling with PostgreSQL+JIT, LLVM >= 6 is required Supported PostgreSQL versions for this release are: PostgreSQL 9.4 - PostgreSQL 12 (in development) GEOS >= 3.5


Continue Reading by clicking title hyperlink ..
カテゴリー: postgresql

Chris Travers: PostgreSQL at 20TB and Beyond Talk (PGConf Russia 2018)

planet postgresql - 2018-09-22(土) 00:48:00
It came out a while ago but I haven't promoted it much yet.

This is the recorded version of the PostgreSQL at 20TB and Beyond talk.  It covers a large, 500TB analytics pipeline and how we manage data.

For those wondering how well PostgreSQL actually scales, this talk is worth watching.
カテゴリー: postgresql

Chris Travers: Thoughts on the Code of Conduct Controversy

planet postgresql - 2018-09-22(土) 00:37:00
My overall perspective here is that the PostgreSQL community needs a code of conduct, and one which allows the committee to act in some cases for off-infrastructure activity, but that the current code of conduct has some problems which could have been fixed if efforts had been better taken ensure that feedback was gathered when it was actionable.

This piece discusses what I feel was done poorly but also what was done well and why, despite a few significant missteps, I think PostgreSQL as a project is headed in the right direction in this area.

But a second important point here is to defend the importance of a code of conduct to dissenters here, explain why we need one, and why the scope needs to extend where it needs to extend to, and why we should not be overly worried about this going in a very bad direction.  The reason for this direction is that in part I found myself defending the need for a code of conduct to folks I collaborate with in Europe and the context had less to do with PostgreSQL than with the Linux kernel.  But the projects in this regard are far more different than they are similar.
Major Complaint:  Feedback Could Have Been Handled Better (Maybe Next Time)
In early May there was discussion about the formation of a code of conduct committee, in which I argued (successfully) that it was extremely important that the committee be geographically and culturally diverse so as to avoid one country's politics being unintentionally internationalized through a code of conduct.  This was accepted and as I will go into below this is the single most important protection we have against misuse of the code of conduct to push political agendas on the community.  However after this discussion there was no further solicitation for feedback until mid-September.
In Mid-September, the Code of Conduct plan was submitted to the list.  In the new code of conduct was a surprising amendment which had been made the previous month, expanding the code of conduct to all interactions between community members unless anot[...]
カテゴリー: postgresql

Shaun M. Thomas: On Rocks and Sand

planet postgresql - 2018-09-22(土) 00:00:15

When working with database capacity planning, there are a lot of variables to consider, and Postgres is no different in this regard. One of the elements which requires management is storage. However, there’s an aspect of storage that escapes inspection almost without exception, and it’s hiding in the shadows between the columns themselves.

Alignment Basics

In most low-level computer languages like C, in which the venerable Postgres is written, data types are addressed by their maximum size, regardless of how large they actually are. Thus a standard 32-bit integer which can store a value of just over 2-billion, must be read as a whole unit. This means even the value of 0 requires 4 bytes of storage.

Further, Postgres is designed such that its own internal natural alignment is 8 bytes, meaning consecutive fixed-length columns of differing size must be padded with empty bytes in some cases. We can see that with this example:

SELECT pg_column_size(row()) AS empty, pg_column_size(row(0::SMALLINT)) AS byte2, pg_column_size(row(0::BIGINT)) AS byte8, pg_column_size(row(0::SMALLINT, 0::BIGINT)) AS byte16; empty | byte2 | byte8 | byte16 -------+-------+-------+-------- 24 | 26 | 32 | 40

This suggests that an empty Postgres row requires 24 bytes of various header elements, a SMALLINT is 2 bytes, a BIGINT is 8 bytes, and combining them is… 16 bytes? That’s no mistake; Postgres is padding the smaller column to match the size of the following column for alignment purposes. Instead of 2 + 8 = 10, our math becomes 8 + 8 = 16.

Intensity Intervals

By itself, this may not necessarily be a problem. But consider a contrived ordering system with this table:

CREATE TABLE user_order ( is_shipped BOOLEAN NOT NULL DEFAULT false, user_id BIGINT NOT NULL, order_total NUMERIC NOT NULL, order_dt TIMESTAMPTZ NOT NULL, order_type SMALLINT NOT NULL, ship_dt TIMESTAMPTZ, item_ct INT NOT NULL, ship_cost NUMERIC, receive_dt TIMESTAMPTZ, tracking_cd [...]
カテゴリー: postgresql

Avinash Kumar: Securing PostgreSQL as an Enterprise-Grade Environment

planet postgresql - 2018-09-21(金) 22:50:08

In this post, we review how you can build an enhanced and secure PostgreSQL database environment using community software. We look at the features that are available in PostgreSQL that, when implemented, provide improved security.

As discussed in the introductory blog post of this series, in our webinar of October 10, 2018 we highlight important aspects an enterprise should consider for their PostgreSQL environments. This series of blogs addressing particular aspects of the enterprise-grade postgres environment complements the webinar. This post addresses security.

Authentication Layer Client connections to PostgreSQL Server using host based authentication

PostgreSQL uses a host based authentication file (pg_hba.conf) to authorize incoming connections. This file contains entries with a combination of 5 categories: type, database, user, address, and method. A client is allowed to connect to a database only when the combination of username, database and the hostname of the client matches an entry in the pg_hba.conf file.

Consider the following entry in pg_hba.conf file :

# TYPE DATABASE USER ADDRESS METHOD host percona pguser md5

This entry says that connections from server are only allowed from user pguser and only to the database percona. The method md5 forces password authentication.

The order of the entries in the pg_hba.conf file matters. If you have an entry that rejects connections from a given server followed by another that allows connections from it, the first entry in the order is considered. So, in this case, the connection is rejected.

This is the first layer of protection in authentication. If this criteria is not satisfied in this Access Control List (ACL), PostgreSQL will discard the request without considering even the server authentication.

Server Authentication

Historically, PostgreSQL uses MD5 digest as a password hash by default. The problem with pure MD5 hashing is that this function will always return the same hash for a given password, which renders a MD5 digest

カテゴリー: postgresql

Andrew Staller: Space exploration and time-series data: Why the European Space Agency uses PostgreSQL

planet postgresql - 2018-09-21(金) 01:04:48

Big, small, relational, time-series, & geospatial data — How the European Space Agency uses PostgreSQL & TimescaleDB for the “Digital Library of the Universe”

Our Sun and Heliosphere.

(The following post was written based on a joint talk given at PostgresConf US 2018 between ESA and TimescaleDB.)

Space. That object of endless human curiosity. That setting for countless stories, books, and movies. The subject of large coordinated scientific efforts aimed at exploration and understanding. And a source of a massive amount of data.

The European Space Agency (ESA) is an intergovernmental organization of 22 member states dedicated to the exploration of space for the citizens of the world.

A crucial part of ESA’s efforts is collecting massive amounts of data and putting it to good use. In this post, we take a closer look at one of its departments called the ESDC (“ESA Science Data Center”), and how it chose to standardize on PostgreSQL for its “Digital Library of the Universe”. We also look closely at the role that time-series data plays in space exploration, and how ESDC is using TimescaleDB to harness the power of this data.

Announcing TimescaleDB 1.0: First enterprise-ready time-series database to support full SQL & scale

Developing the Digital Library of the Universe

There are over 20 planetary, heliophysics, and astronomy missions at ESA. ESDC is tasked with archiving the data from all of them into a “Digital Library of the Universe.”

This universal archive serves as a repository of every mission history, hosting data for all research needs with the main goal of serving the public and scientific communities. ESDC makes this archive available by providing scientific teams all around the world privileged access to this data.

The Jedi Library: What we picture when we think of ESA’s “Digital Library of the Universe”

When it came to developing this “Digital Library”, ESDC had to find a database system that was reliable but would also provide the broadest access to the general scientific community at large.

In the p

カテゴリー: postgresql

Craig Kerstiens: Postgres 11 - A First Look

planet postgresql - 2018-09-21(金) 00:00:29

Postgres 11 is almost here, in fact the latest beta shipped today, and it features a lot of exciting improvements. If you want to get the full list of features it is definitely worth checking out the release notes, but for those who don’t read the release notes I put together a run down of some what I consider the highlight features.

Quitting Postgres

This is a small usability feature, but so long over due. Now you can quit Postgres by simply typing quit or exit. Previously you had to use Ctrl + D or \q. As a begginer it’s one thing to jump into a psql terminal, but once in if you can’t figure out how to quit it’s a frustrating experience. Small usability features, such as this and watch in an earlier release, are often lost in the highlighted features which talk about performance or new data types. Improvements like this really go a long way for making Postgres a better database for everyone.

Fear column addition no more

Brandur had a great in depth write-up on this feature already, but it falls somewhere into the category of the above as well as a performance improvement. Previously when you added a new column that was NOT NULL with a default value Postgres would have to take a lock and re-write the entire table. In a production environment on any sizable table for all practical purposes the result was an outage. The work around was to break your migrations apart to be a several step process.

With Postgres 11 you can add a new column to a table that is not null with a default value. The new row will get materialized on your database without requiring a full re-write. Here is to having to think less about your migrations.

Of course performance is a highlight

No Postgres release would be complete without some performance improvements. This release there are really two areas that feature key improvements around performance.

Parallelism continuing to mature

We first saw parallelism support back in PostgreSQL 9.6. At the time it was primarily for sequential scans, which if you used parallelism for your sequential sc

カテゴリー: postgresql

Álvaro Herrera: Partitioning Improvements in PostgreSQL 11

planet postgresql - 2018-09-20(木) 04:29:27

A partitioning system in PostgreSQL was first added in PostgreSQL 8.1 by 2ndQuadrant founder Simon Riggs. It was based on relation inheritance and used a novel technique to exclude tables from being scanned by a query, called “constraint exclusion”. While it was a huge step forward at the time, it is nowadays seen as cumbersome to use as well as slow, and thus needing replacement.

In version 10, it was replaced thanks to heroic efforts by Amit Langote with modern-style “declarative partitioning”. This new tech meant you no longer needed to write code manually to route tuples to their correct partitions, and no longer needed to manually declare correct constraints for each partition: the system did those things automatically for you.

Sadly, in PostgreSQL 10 that’s pretty much all it did. Because of the sheer complexity and the time constraints, there were many things in the PostgreSQL 10 implementation that were lacking. Robert Haas gave a talk about it in Warsaw’s PGConf.EU.

Many people worked on improving the situation for PostgreSQL 11; here’s my attempt at a recount. I split these in three areas:

  1. New partitioning features
  2. Better DDL support for partitioned tables
  3. Performance optimizations.
New Partitioning Features

In PostgreSQL 10, your partitioned tables can be so in RANGE and LIST modes. These are powerful tools to base many real-world databases on, but for many others designs you need the new mode added in PostgreSQL 11: HASH partitioning. Many customers need this, and Amul Sul worked hard to make it possible. Here’s a simple example:


It is not mandatory to use the same modulus value for all partitions; this lets you create more partitions later and r

カテゴリー: postgresql

Craig Kerstiens: Use cases for followers (read replicas) in Postgres

planet postgresql - 2018-09-20(木) 02:04:00

Citus extends Postgres to be a horizontally scalable database. By horizontally scalable, we mean the data is spread across multiple machines, and you’re able to scale not only storage but also memory and compute—thus providing better performance. Without using something like Citus to transform PostgreSQL into a distributed database, sure you can add read replicas to scale, but you’re still maintaining a single copy of your data. When you run into scaling issues with your Postgres database, adding a read replica and offloading some of your traffic to your read replica is a common bandaid to slow down the bleeding, but it is only a matter of time until even that doesn’t work any further. Whereas with Citus, scaling out your database is as simple as dragging a slider and rebalancing your data.

Are read replicas still useful with horizontally scalable databases?

But that leaves a question, are read-replicas still useful? Well, sure they are.

In Citus Cloud (our fully-managed database as a service), we have support for read replicas, in our case known as followers. Follower clusters leverage much of our same underlying disaster recovery infrastructure that forks leverage, but support a very different set of use cases.

Previously we talked about how forks can be used in Citus Cloud to get a production set of data over to staging that can help with testing migrations, rebalancing, or SQL query optimization. Forks are often used as a one-off for a short period of time. In contrast, followers are often long running Citus database clusters that can be a key mechanism to run your business, helping you get insights when you need them.

Follower cluster is often only a few seconds (if any) behind your primary database cluster

A follower cluster receives all updates from the primary Citus cluster in an asynchronous fashion. Often followers are only a few seconds (if any) behind your primary database cluster, though can lag at times by a few minutes. Followers have a full copy of your data, but reside on a separate cluster.

カテゴリー: postgresql

Laurenz Albe: Correlation of PostgreSQL columns explained

planet postgresql - 2018-09-19(水) 17:00:49

After you ANALYZE a PostgreSQL table to collect value distribution statistics, you will find the gathered statistics for each column in the pg_stats system view. This article will explain the meaning of the correlation column and its impact on index scans.

Physical vs. logical ordering

Most common PostgreSQL data types have an ordering: they support the operators <, <=, =, >= and >.
Such data types can be used with a B-tree index (the “standard” index type).

The values in a column of such a type provide a logical ordering of the table rows. An index on this column will be sorted according to that ordering.

A PostgreSQL table consists of one or more files of 8KB blocks. The order in which the rows are stored in the file is the physical ordering.
You can examine the physical ordering of the rows by selecting the ctid system column: it contains the block number and the item number inside the block, which describe the physical location of the table row.


The correlation for a column is a value between -1 and 1. It tells how good the match between logical and physical ordering is.

  • If the correlation is 1, the rows are stored in the table file in ascending column order; if it is -1, they are stored in descending order.
  • Values between -1 and 1 mean a less perfect match.
  • A value of 0 means that there is no connection between the physical and the logical order.
Why should I care?

You will create indexes on your tables for faster access (but not too many!).
The correlation of a column has an impact on the performance of an index scan.

During an index scan, the whole index or part of it are read in index sequential order. For each entry that is found, the corresponding row is fetched from the table (this is skipped in an “index only scan”, but that is a different story).

If the correlation of the indexed column is close to zero, the fetched rows will be from all over the table. This will result in many randomly distributed reads of many different table blocks.

However, if the correlation is close to 1 or -1,

カテゴリー: postgresql

Sebastian Insausti: Custom Graphs to Monitor your MySQL, MariaDB, MongoDB and PostgreSQL Systems - ClusterControl Tips & Tricks

planet postgresql - 2018-09-18(火) 22:26:07

Graphs are important, as they are your window onto your monitored systems. ClusterControl comes with a predefined set of graphs for you to analyze, these are built on top of the metric sampling done by the controller. Those are designed to give you, at first glance, as much information as possible about the state of your database cluster. You might have your own set of metrics you’d like to monitor though. Therefore ClusterControl allows you to customize the graphs available in the cluster overview section and in the Nodes -> DB Performance tab. Multiple metrics can be overlaid on the same graph.

Cluster Overview tab

Let’s take a look at the cluster overview - it shows the most important information aggregated under different tabs.

Cluster Overview Graphs

You can see graphs like “Cluster Load” and “Galera - Flow Ctrl” along with couple of others. If this is not enough for you, you can click on “Dash Settings” and then pick “Create Board” option. From there, you can also manage existing graphs - you can edit a graph by double-clicking on it, you can also delete it from the tab list.

Dashboard Settings

When you decide to create a new graph, you’ll be presented with an option to pick metrics that you’d like to monitor. Let’s assume we are interested in monitoring temporary objects - tables, files and tables on disk. We just need to pick all three metrics we want to follow and add them to our new graph.

New Board 1

Next, pick some name for our new graph and pick a scale. Most of the time you want scale to be linear but in some rare cases, like when you mix metrics containing large and small values, you may want to use logarithmic scale instead.

New Board 2

Finally, you can pick if your template should be presented as a default graph. If you tick this option, this is the graph you will see by default when you enter the “Overview” tab.

Once we save the new graph, you can enjoy the result:

New Board 3 Node Overview tab

In addition to the graphs on our cluster, we can also use this functionality on each of o

カテゴリー: postgresql

Jonathan Katz: Why Covering Indexes Are Incredibly Helpful

planet postgresql - 2018-09-17(月) 20:55:00

The PostgreSQL 11 release is nearly here (maybe in the next couple of weeks?!), and while a lot of the focus will be on the improvements to the overall performance of the system (and rightly so!), it's important to notice some features that when used appropriately, will provide noticeable performance improvements to your applications.

One example of such feature is the introduction of "covering indexes" for B-tree indexes. A covering index allows a user to perform an index-only scan if the looks in the query match the columns that are included in the index. You can specify the additional columns for the index using the "INCLUDE" keyword, e.g.

CREATE INDEX a_b_idx ON x (a,b) INLCUDE (c);

Theoretically, this can reduce the amount of I/O your query needs to use in order to retrieve information (traditionally, I/O is the biggest bottleneck on database systems). Additionally, the data types including in a covering index do not need to be B-tree indexable; you can add any data type to the INCLUDE part of a CREATE INDEX statement.

However, you still need to be careful for how you deploy covering indexes: each column you add to the index still takes up space on disk, and there is still a cost for maintaining the index, for examples, on row updates.

Understanding these trade offs, you can still apply covering indexes in very helpful ways that can significantly help your applications.

A Simple Example: Tracking Coffee Shop Visits
カテゴリー: postgresql

brian davis: Cost of a Join - Part 2: Enums, Wider Tables

planet postgresql - 2018-09-17(月) 14:00:00

A follow-up to the previous post where the performance of queries with many joins is investigated.

Great discussion on hacker news and r/programming brought up a couple ideas I hadn't considered.

  1. What about enums?
  2. What about tables with more columns?

I also figured it would be neat if other people could run these benchmarks with their own parameters / hardware.

So I adjusted my script to support enums and wider tables, and packaged it up into a tool anyone can use. It supports three different join types: enum, foreign keys, and what I'm calling "chained". The benchmark definitions are described in a json file which looks like this:

$ cat input.json [ { "join-type": "chained", "max-tables": 10, # Queries will start by joining 2 tables, increasing by one until all tables are joined. Number of tables joined will be the X axis on the plot. "max-rows": 10000, # Benchmarks will be performed at 10 rows, 100 rows, etc. until max-rows is reached, creating a separate line on the plot for each. "extra_columns": 2, "max_id": 5, "create-indexes": true, "output-filename": "benchmark_1", "plot-title": "My Chained Benchmark Title" }, { "join-type": "enums", "max-rows": 10000, # Benchmarks will be performed at 10 rows in the primary table, increasing by a factor of 10 until max-rows is reached "max-enums": 100, # Queries will start by selecting (and optionally filtering by) 1 enum column, increasing by one until max-enums is reached "possible-enum-values": 10, "extra-columns": 2, "where-clause": true, "output-filename": "benchmark_1", "plot-title": "My Enum Benchmark Title" }, { "join-type": "foreign-keys", "max-primary-table-rows": 10000, # Benchmarks will be performed at 10 rows in the primary table, increasing by a factor of 10 until max-rows is reached "max-fk-tables": 100, # Queries will start by selecting from (and optionally filtering by) 1 fo[...]
カテゴリー: postgresql