postgresql

Avinash Kumar: Bloom Indexes in PostgreSQL

planet postgresql - 2019-06-15 (Sat) 04:29:56

There is a wide variety of indexes available in PostgreSQL. While most are common to almost all databases, some index types are more specific to PostgreSQL. For example, GIN indexes are helpful for speeding up searches for element values within documents. GIN and GiST indexes can both be used to make full-text search faster, whereas BRIN indexes are more useful when dealing with large tables, as they only store summary information for each page. We will look at these indexes in more detail in future blog posts. For now, I would like to talk about another of the special index types, one that can speed up searches on a table that has a huge number of columns and is massive in size: the bloom index.

In order to understand the bloom index better, let’s first understand the bloom filter data structure. I will keep the description as short as I can, so that we can move on to how to create this index and when it will be useful.

Most readers will know that an array, in computer science, is a data structure consisting of a collection of elements, while a bit, or binary digit, is the smallest unit of data, represented as either 0 or 1. A bloom filter is a bit array of m bits that are all initially set to 0.

A bit array is an array that stores a certain number of bits (0s and 1s). It is one of the most space-efficient data structures for testing whether an element is in a set or not.
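
To make that concrete before the theory, here is a hedged sketch of how PostgreSQL exposes this structure as an index type through the contrib bloom extension (the table, columns and parameter values below are purely illustrative):

-- length is the signature size in bits; colN is the number of bits
-- generated for each indexed column
CREATE EXTENSION bloom;
CREATE TABLE t_wide (c1 int, c2 int, c3 int, c4 int);
CREATE INDEX idx_bloom ON t_wide USING bloom (c1, c2, c3, c4)
       WITH (length = 80, col1 = 2, col2 = 2, col3 = 2, col4 = 2);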

Why use bloom filters?

Let’s consider some alternatives, such as lists and hash tables. With a list, we need to iterate through each element to search for a specific one. We could instead maintain a hash table in which each element of the list is hashed, and then check whether the hash of the element we are searching for matches a hash in the table. But checking through all the hashes may cost an order of magnitude more than expected, and if there is a hash collision, linear probing follows, which may be time-consuming. When we

[...]
Category: postgresql

Luca Ferrari: A recursive CTE to get information about partitions

planet postgresql - 2019-06-12 (Wed) 09:00:00

I was wondering about writing a function to provide a quick status report on partitioning. But wait, PostgreSQL has recursive CTEs!

A recursive CTE to get information about partitions

I’m used to partitioning; it allows me to quickly and precisely split data across different tables. PostgreSQL 10 introduced native partitioning, and since then I have been using native partitioning over inheritance whenever possible.
But how do you get a quick overview of the partitioning status? I mean, how do you know which partition is growing the most?
At first I thought of writing a function for the task, and quickly found myself iterating recursively over pg_inherits, the catalog that links partitions to their parents. But the keyword here is recursively: PostgreSQL provides recursive Common Table Expressions, and a quick search revealed I was right: it is possible to do it with a single CTE. Taking inspiration from this mailing list message, here is a simple CTE to get the partition status (you can find it on my GitHub repository):

WITH RECURSIVE inheritance_tree AS (
    SELECT c.oid            AS table_oid
         , c.relname        AS table_name
         , NULL::text       AS table_parent_name
         , c.relispartition AS is_partition
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'p'
      AND c.relispartition = false
    UNION ALL
    SELECT inh.inhrelid AS table_oid
         , c.relname    AS table_name
         , ...
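
Since the excerpt cuts the query short, here is a minimal self-contained sketch in the same spirit (my own, not Luca's full query): walk pg_inherits recursively and report each partition's size, which answers the "which partition is growing the most?" question directly.

WITH RECURSIVE tree AS (
    -- root partitioned tables
    SELECT c.oid, c.relname, 0 AS level
    FROM pg_class c
    WHERE c.relkind = 'p' AND NOT c.relispartition
    UNION ALL
    -- their children, recursively
    SELECT c.oid, c.relname, t.level + 1
    FROM pg_inherits i
    JOIN tree t     ON t.oid = i.inhparent
    JOIN pg_class c ON c.oid = i.inhrelid
)
SELECT relname, level, pg_size_pretty(pg_relation_size(oid)) AS size
FROM tree
ORDER BY pg_relation_size(oid) DESC;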
Category: postgresql

Jeff McCormick: What's New in Crunchy PostgreSQL Operator 4.0

planet postgresql - 2019-06-12 (Wed) 00:27:48

Crunchy Data is pleased to release PostgreSQL Operator 4.0.

Category: postgresql

Hans-Juergen Schoenig: Tech preview: How PostgreSQL 12 handles prepared plans

planet postgresql - 2019-06-11 (Tue) 17:00:50

PostgreSQL 12 is just around the corner, so we already want to present some of the new features we like. One important new feature gives users and devops the chance to control the behavior of the PostgreSQL optimizer. Prepared plans are a recurring concern (people moving from Oracle seem to be the most concerned), so it makes sense to discuss the way plans are handled in PostgreSQL 12.

Firing up a PostgreSQL test database

To start I will create a simple table consisting of just two fields:

db12=# CREATE TABLE t_sample (id serial, name text);
CREATE TABLE

Then some data is loaded:

db12=# INSERT INTO t_sample (name) SELECT 'hans' FROM generate_series(1, 1000000);
INSERT 0 1000000
db12=# INSERT INTO t_sample (name) SELECT 'paul' FROM generate_series(1, 2);
INSERT 0 2

Note that 1 million names are identical (“hans”) and just two people are called “paul”. The distribution of data is therefore quite special, which has a major impact as you will see later in this post.

To show how plans can change depending on the setting, an index on “name” is defined as shown in the next listing:

db12=# CREATE INDEX idx_name ON t_sample (name);
CREATE INDEX

The PostgreSQL query optimizer at work

Let us run a simple query and see what happens:

db12=# explain SELECT count(*) FROM t_sample WHERE name = 'hans';
                                      QUERY PLAN
------------------------------------------------------------------
 Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
               ->  Parallel Seq Scan on t_sample  (cost=0.00..10614.34 rows=416668 width=0)
                     Filter: (name = 'hans'::text)
(6 rows)

In this case PostgreSQL decided to ignore the index and go for a sequential scan. It has even seen that the table is already quite large and opted for a parallel query. Still, what we see is a sequential scan. All data in the table has to be

[...]
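
The excerpt ends here; for context, the knob this post builds toward is the new plan_cache_mode setting in PostgreSQL 12. A hedged sketch of its use (the setting and its values are per the PostgreSQL 12 documentation; the prepared statement itself is illustrative):

PREPARE stmt(text) AS
    SELECT count(*) FROM t_sample WHERE name = $1;
-- 'auto' (default), 'force_custom_plan' or 'force_generic_plan'
SET plan_cache_mode = 'force_custom_plan';
EXECUTE stmt('paul');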
Category: postgresql

Luca Ferrari: Checking the sequences status on a single pass

planet postgresql - 2019-06-11 (Tue) 09:00:00

It is quite simple to wrap a couple of queries in a function to get a quick glance at all the sequences and their cycling status.

Checking the sequences status on a single pass

The catalog pg_sequence keeps track of the definition of each sequence, including its increment value and boundaries. Combined with pg_class and a few other functions, it allows building a very simple administrative function to keep track of the overall status of sequences.

I’ve created a seq_check() function that produces output like the following:

testdb=# select * from seq_check() ORDER BY remaining;
        seq_name        | current_value |    lim     | remaining
------------------------+---------------+------------+------------
 public.persona_pk_seq  |       5000000 | 2147483647 |     214248
 public.root_pk_seq     |         50000 | 2147483647 | 2147433647
 public.students_pk_seq |             7 | 2147483647 | 2147483640
(3 rows)

As you can see, the function reports each sequence’s current value, its maximum value (limit), and how many values it can still provide before it overflows or cycles. For example, persona_pk_seq has only 214248 values left to provide. Combined with its current value of 5000000, this hints that the sequence probably has too large an increment.

The code of the function is as follows:

CREATE OR REPLACE FUNCTION seq_check()
RETURNS TABLE( seq_name text,
               current_value bigint,
               lim...
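
For a rough idea of what such a function computes, here is a hedged single-query sketch over the pg_sequences view (PostgreSQL 10+); it is not Luca's implementation, and it ignores descending and cycling sequences:

-- remaining = how many increments fit between last_value and max_value
SELECT schemaname || '.' || sequencename AS seq_name,
       last_value                        AS current_value,
       max_value                         AS lim,
       (max_value - COALESCE(last_value, 0)) / increment_by AS remaining
FROM pg_sequences
ORDER BY remaining;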
Category: postgresql

Paul Ramsey: Parallel PostGIS and PgSQL 12 (2)

planet postgresql - 2019-06-08 (Sat) 01:00:00

In my last post I demonstrated that PostgreSQL 12 with PostGIS 3 will provide, for the first time, automagical parallelization of many common spatial queries.

This is huge news, as it opens up the possibility of extracting more performance from modern server hardware. Commenters on the post immediately began conjuring images of 32-core machines reducing their query times to milliseconds.

So, the next question is: how much more performance can we expect?

To investigate, I acquired a 16 core machine on AWS (m5d.4xlarge), and installed the current development snapshots of PostgreSQL and PostGIS, the code that will become versions 12 and 3 respectively, when released in the fall.

How Many Workers?

The number of workers assigned to a query is determined by PostgreSQL: the system looks at a given query, and the size of the relations to be processed, and assigns workers proportional to the log of the relation size.

For parallel plans, the “explain” output of PostgreSQL will include a count of the workers planned and assigned. That count excludes the leader process, and since the leader actually does work beyond its duties coordinating the query, the number of CPUs actually working is more than num_workers, but slightly less than num_workers+1. For these graphs, we’ll assume the leader fully participates in the work, and that the number of CPUs in play is num_workers+1.

Forcing Workers

PostgreSQL’s automatic calculation of the number of workers could be a blocker to performing analysis of parallel performance, but fortunately there is a workaround.

Tables support a “storage parameter” called parallel_workers. When a relation with parallel_workers set participates in a parallel plan, that value overrides the automatically calculated number of workers.

ALTER TABLE pd SET (parallel_workers = 8);
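
One caveat worth flagging (my assumption, not stated in this excerpt): the per-table override is still capped by the session's parallel limits, so for the larger runs settings along these lines must also be raised:

SET max_parallel_workers_per_gather = 8;  -- per-gather worker cap
SET max_parallel_workers = 16;            -- cluster-wide worker pool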

In order to generate my data, I re-ran my queries, upping the number of parallel_workers on my tables for each run.

Setup

Before running the tests, I set all the global limits o

[...]
Category: postgresql

Bruce Momjian: Exploring Postgres Tips and Tricks

planet postgresql - 2019-06-07 (Fri) 07:00:01

I did a webinar two weeks ago titled "Exploring Postgres Tips and Tricks." The slides are now online, as well as a video recording. I wasn't happy with the transition I used from the PDF to the blog entries, but I now know how to improve that next time.

I think I might do more of these by expanding on some of the topics I covered, like psql and monitoring. Also, a new video is available of the sharding presentation I mentioned previously.

Category: postgresql

Álvaro Hernández: PostgreSQL Ibiza: 2 weeks to go

planet postgresql - 2019-06-07 (Fri) 00:37:11

A five-day PostgreSQL networking experience embedding a two-day conference. Just 2 weeks to go for PostgreSQL Ibiza: the new, innovative PostgreSQL conference that happens 50 meters away from a beach. The conference for thinkers, for networking, for partnering. The conference to be at. But a conference is nothing without great content, and after receiving more than 71 talk submissions, and with the hard work the committee has done to select the talks, PostgreSQL Ibiza will have top-notch talks from top-notch international speakers.
Category: postgresql

Robert Treat: The Lost Art of plpgsql

planet postgresql - 2019-06-06 (Thu) 14:12:00

One of the big features talked about when PostgreSQL 11 was released was the new stored procedure implementation. This gave Postgres a more standard procedure interface compared to the previous use of functions, which is particularly useful for folks doing database migrations who may have been using the standard CALL syntax rather than Postgres' traditional SELECT function(); syntax. So it struck me as odd earlier this year when I noticed that, despite the hoopla, a year later there was almost nothing in the way of presentations and blog posts on either the new stored procedure functionality or the use of plpgsql in general.
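
For readers who haven't seen it, a hedged sketch of the PostgreSQL 11+ procedure interface (the table and logic are illustrative, not from the talk):

CREATE PROCEDURE log_visit(who text)
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO visits (visitor, seen_at) VALUES (who, now());
    COMMIT;  -- unlike functions, procedures may control transactions
END;
$$;

CALL log_visit('robert');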

And so I got the idea that maybe I would write such a talk and present it at PGCon; a nod to the past and the many years I've spent working with plpgsql in a variety of roles. The committee liked the idea (disclosure: I am on the PGCon committee, but didn't advocate for myself) and so this talk was born. For a first-time talk I think it turned out well, though it could definitely use some polish; but I'm happy that it helped spark some conversation and has given me a few items worth following up on, hopefully in future blog posts.

Video should be available in a few weeks, but for now, I've gone ahead and uploaded the slides on slideshare.

Category: postgresql

Robert Haas: The Contributors Team

planet postgresql - 2019-06-06 (Thu) 02:37:00
Recently, the PostgreSQL project spun up a "contributors" team, whose mission is to ensure that the PostgreSQL contributors list is up-to-date and fair. The contributors page has a note which says "To suggest additions to the list, please email contributors@postgresql.org." The current members of the team are Dave Page, Stephen Frost, Vik Fearing, and me.

Category: postgresql

Mark Wong: PDXPUG June Meetup: Accessing Postgres with Java

planet postgresql - 2019-06-06 (Thu) 01:06:21

When: 6-8pm Thursday June 20, 2019
Where: PSU Business Accelerator (Parking is open after 5pm.)
Who: Will McLean

Following the presentations on accessing Postgres from Python and Scala, I will lead a discussion on accessing Postgres with Java. I’ll start with a JDBC tutorial and finish by adding data access to a Spring Boot webapp.

I have twenty years of experience in e-commerce applications, the last eight here in Portland, mostly at Nike. For the last few years everything has been moving to Amazon RDS Postgres; that’s a trend PDXPUG can get behind! I am currently working for Navis on CRM applications for the hospitality industry.

Category: postgresql

Julien Rouhaud: PoWA 4: changes in powa-archivist!

planet postgresql - 2019-06-05 (Wed) 23:26:17

This article is part of the PoWA 4 beta series, and describes the changes done in powa-archivist.

For more information about this v4, you can consult the general introduction article.

Quick overview

First of all, you have to know that no upgrade from v3 to v4 is possible, so a DROP EXTENSION powa is required if you were already using PoWA on any of your servers. This is because v4 involved a lot of changes in the SQL part of the extension, making it the most significant change in the PoWA suite for this new version. Looking at the amount of changes at the time I’m writing this article, I get:

 CHANGELOG.md       |   14 +
 powa--4.0.0dev.sql | 2075 +++++++++++++++++++++-------
 powa.c             |   44 +-
 3 files changed, 1629 insertions(+), 504 deletions(-)

The lack of an upgrade path shouldn’t be a problem in practice, though. PoWA is a performance tool, so it’s intended to hold high-precision data with a very limited history. If you’re looking for a general monitoring solution keeping months of counters, PoWA is definitely not the tool you need.
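
In practice the reinstall the post describes amounts to something like this (a sketch; note that accumulated v3 data is dropped along with the extension):

DROP EXTENSION powa;    -- removes the v3 objects and their data
CREATE EXTENSION powa;  -- installs the v4 SQL objects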

Configuring the list of remote servers

Concerning the features themselves, the first small change is that powa-archivist no longer requires the background worker to be active, as it won’t be used for a remote setup. That means that a PostgreSQL restart is no longer needed to install PoWA. Obviously, a restart is still required if you want to use the local setup, which uses the background worker, or if you want to install additional extensions that themselves require a restart.

Then, as PoWA needs some configuration (snapshot frequency, data retention and so on), some new tables are added to hold all of that. The new powa_servers table stores the configuration for all the remote instances whose data should be stored on this instance. This local PoWA instance is called a repository server (and typically should be dedicated to storing PoWA data), as opposed to the remote instances, which are the instances you want to monitor. The conte

[...]
Category: postgresql

Hans-Juergen Schoenig: Tech preview: Improving COPY and bulkloading in PostgreSQL 12

planet postgresql - 2019-06-05 (Wed) 22:03:47

If you rely heavily on the PostgreSQL COPY command to load data into PostgreSQL quickly, PostgreSQL 12 offers a feature which is most likely very beneficial to you. Bulk loading is an important operation, and every improvement in this area is certainly going to help the many people out there who want to import data into PostgreSQL as fast as possible.

COPY: Loading and unloading data as fast as possible

When taking a closer look at the syntax of the COPY command in PostgreSQL 12 you will quickly see two things:

• \h will now point to the correct page in the documentation
• COPY now supports a WHERE condition

Here is the complete syntax overview:

db12=# \h COPY
Command:     COPY
Description: copy data between a file and a table
Syntax:
COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
    [ WHERE condition ]

COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
    TO { 'filename' | PROGRAM 'command' | STDOUT }
    [ [ WITH ] ( option [, ...] ) ]

where option can be one of:
    FORMAT format_name
    FREEZE [ boolean ]
    DELIMITER 'delimiter_character'
    NULL 'null_string'
    HEADER [ boolean ]
    QUOTE 'quote_character'
    ESCAPE 'escape_character'
    FORCE_QUOTE { ( column_name [, ...] ) | * }
    FORCE_NOT_NULL ( column_name [, ...] )
    FORCE_NULL ( column_name [, ...] )
    ENCODING 'encoding_name'

URL: https://www.postgresql.org/docs/12/sql-copy.html

While having a link to the documentation at hand is certainly beneficial, the WHERE condition added in PostgreSQL 12 might be even more important. What is the purpose of this new feature? So far it was only possible to import a file completely. However, in some cases this was a problem: more often than not, people only wanted to load a subset of the data, and had to write a ton of code to filter it before the import, or after the data had already been written into the database.

COPY … WHERE: Applying filters while importing data

In PostgreSQL, data can be filtered

[...]
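
Though the excerpt breaks off here, the shape of the feature is simple; a hedged example (path, format and filter are illustrative):

COPY t_sample FROM '/tmp/people.csv'
    WITH (FORMAT csv)
    WHERE name = 'hans';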
Category: postgresql

Venkata Nagothi: How to Optimize PostgreSQL Logical Replication

planet postgresql - 2019-06-05 (Wed) 18:48:00

Logical Replication, or pglogical, is a table-level, WAL-based replication mechanism which replicates the data of specific tables between two PostgreSQL instances. There seems to be some confusion between “pglogical” and “Logical Replication”. Both provide the same kind of replication mechanism, with some differences in features and capabilities. Logical Replication was introduced in PostgreSQL 10 as an in-built feature, unlike pglogical, which is an extension. “pglogical”, with ongoing continuous development, remains the only option for implementing Logical Replication in environments using PostgreSQL versions prior to 10. Eventually, all the features of pglogical will be part of Logical Replication. In other words, pglogical (the extension) became Logical Replication (the in-built feature). The basic advantage of Logical Replication is that it does not need any extensions to be installed or created, which is in turn beneficial to environments where installing extensions is restricted.


This blog will focus on optimizing Logical Replication. That means the optimization tips and techniques highlighted here apply to both pglogical and Logical Replication.

Logical Replication is a WAL-based replication, the first of its kind. As a DBA, I find it a much more reliable and performant replication mechanism compared to trigger-based replication solutions. The changes made to the tables that are part of pglogical replication are replicated in real time via WAL records, which makes it highly efficient and uncomplicated. All of the other replication mechanisms on the market are trigger-based, which can pose performance and maintenance challenges. With Logical Replication, the dependency on trigger-based replication is almost gone.

There are other blogs which explain how to configure Logical Replication in quite some detail.
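
For grounding, a hedged sketch of the minimal built-in setup (PostgreSQL 10+; table name and connection string are illustrative):

-- on the publisher:
CREATE PUBLICATION my_pub FOR TABLE t_orders;
-- on the subscriber:
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=pub-host dbname=mydb user=repl'
    PUBLICATION my_pub;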

In

[...]
Category: postgresql

Michael Paquier: Postgres 12 highlight - Table Access Methods and blackholes

planet postgresql - 2019-06-04 (Tue) 14:16:34

Postgres is very nice when it comes to extending it with custom plugins, as many facilities are available for that purpose.

After a heavy refactoring of the code, Postgres 12 ships with a basic infrastructure for table access methods, which allows customizing how table data is stored and accessed. By default, all tables in PostgreSQL use the historical heap, which works with an 8kB page-based method in segment files of 1GB (default sizes), with full tuple versions stored. In simple words, this means that even updating a single attribute of a tuple requires storing a full new version, which also makes the work related to vacuum and autovacuum more costly. Well, the goal of this post is not to discuss that, and there is documentation on the matter, so please feel free to refer to it.

Table access methods are really cool, because they basically allow plugging directly into Postgres a rough equivalent of MySQL storage engines, making it possible to implement things like columnar storage, an area where heap is weak. What is possible can be roughly classified into two categories (a usage sketch follows the list):

  • Access methods going through the storage manager of Postgres, which make use of the existing shared buffer layer, with the existing paging format. This has two advantages: backups and checksums are normally, and mostly, automatically supported.
  • Access methods not going through the storage manager, which have the advantage of not relying on Postgres shared buffers (the page format can be a problem as well), making it possible to rely fully on the OS cache. Note that it is then up to you to add support for checksumming, backups, and such.
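
As promised above, a hedged usage sketch (PostgreSQL 12 syntax; blackhole_am is Michael Paquier's demo module alluded to in the title, and building/installing it is assumed):

CREATE EXTENSION blackhole_am;  -- assumes the demo module is installed
CREATE TABLE t_sink (id int) USING blackhole_am;  -- discards all writes
CREATE TABLE t_normal (id int);                   -- default: USING heap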

Access methods invite a comparison with foreign data wrappers, but the reliability is much different, one big point being that they are fully transactional with the backend they work with, which is usually a big deal for applications, and have transparent DDL and command su

[...]
Category: postgresql

Wim Bertels: PGConf.be 2019

planet postgresql - 2019-06-03 (Mon) 21:47:00
A round up of the first PGConf.be

Stefan F. already wrote a nice blog about the conference.

Most of the presentations, a couple recordings and a few photos are available at the pgconf.be website

Using the well-known and broadly spread technique of inductive reasoning, we came to the conclusion that this first PGConf.be conference was a success, as was the artwork. No animals or elephants were hurt during this event.

The statistics are

  • 53 participants, with a low standard deviation in attendance time

    • depending on the session, an extra 5 to 30 students attended as well

  • 11 speakers

  • 5 sponsors

This conference wouldn’t have been possible without the help of volunteers.
To conclude, a big thank you to all the speakers, sponsors and attendants.
Without them a conference is just like a tea party.

Category: postgresql

Regina Obe: PostGIS 3.0.0alpha2

planet postgresql - 2019-06-02 (Sun) 09:00:00

The PostGIS development team is pleased to release PostGIS 3.0.0alpha2.

This release works with PostgreSQL 9.5-12beta1 and GEOS >= 3.6

Best served with PostgreSQL 12beta1.

Category: postgresql

elein mustain: Beautiful things, strings.

planet postgresql - 2019-06-01 (Sat) 08:45:09
This blog today is going to talk about strings: how they are stored, how they are input, and lots of examples of how to use string operators and functions in order to manipulate them. Strings, strings, strings. What we are not going to cover is regular expressions, although we will use them. The Fine Manual […]
Category: postgresql

Magnus Hagander: Nordic PGDay 2020 - Date and location

planet postgresql - 2019-06-01 (Sat) 06:44:22

We're happy to announce that Nordic PGDay 2020 will be held in Helsinki, Finland, on March 24th, 2020, at the Hilton Helsinki Strand. Join us for a full day of PostgreSQL content!

For now, mark your calendars -- registrations and call for papers will open in the fall!

Category: postgresql
