Planet PostgreSQL

Subscribe to the Planet PostgreSQL feed
Planet PostgreSQL
Updated: 2 hours 14 minutes ago

Álvaro Hernández: Having lunch with PostgreSQL, MongoDB and JSON

2019-04-26 (Fri) 17:15:00

In a post titled “Postgres JSON, Developer Productivity, and The MongoDB Advantage”, Buzz Moschetti discussed PostgreSQL’s handling of JSON and how inconvenient it is for developers, especially when compared to MongoDB. While the post is almost 18 months old, the principles described there have not changed, and I (mostly) respectfully disagree. Here is my opinion on the topic.

Let’s see what’s on today’s menu.

Small bites

SQL syntax and, indeed, the relational model as a whole are designed to work with single, scalar values which carry the same type from row to row, not rich shapes like JSON that can contain substructures and arrays and different elements from row to row.

If anything, SQL is about set operations on tuples, not scalar values. But I get Buzz’s point: he probably meant “columns”. Yet that is still not correct. The SQL standard has had support for arrays as a column type since 1999, including functions to access, construct or create arrays. PostgreSQL is actually more advanced, supporting multidimensional arrays, and even sets of key-value pairs with the hstore datatype (again: all within a single column). On top of that, PostgreSQL also supports custom data types (which can also be row types or data structures) and combinations of all of these. So columns are not limited to simple scalar values. And it obviously supports JSON (with the jsonb data type), which will be further discussed here.
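To make that concrete, here is a minimal sketch of a single table mixing several of these non-scalar column types (the table and column names are mine, purely illustrative):

-- hstore ships as a contrib extension
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE sensor_data (
    id       integer PRIMARY KEY,
    readings numeric[],    -- SQL:1999-standard array column
    grid     integer[][],  -- PostgreSQL multidimensional array
    tags     hstore,       -- key-value pairs in a single column
    payload  jsonb         -- a full JSON document per row
);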

And the extensions to SQL utilized by Postgres to manipulate JSON are proprietary, unknown to most SQL developers, and not supported or even recognized by most 3rd party SQL tools.

I’m not aware of MongoDB’s language being part of any standard, so we should assume Buzz’s comment about proprietary language applies to both MongoDB and PostgreSQL equally. That being true, there are some important catches:

  • PostgreSQL’s proprietary syntax is only for accessing JSON data. The rest of the language, and data accessed via non-JSON data types, is pure, standard SQL. None of MongoDB’s query language is standard.

  • It is ironic to m

[...]
Category: postgresql

Laurenz Albe: Triggers to enforce constraints

2019-04-25 (Thu) 17:00:23
© Laurenz Albe 2019

 

Sometimes you want to enforce a condition on a table that cannot be implemented by a constraint. In such a case it is tempting to use triggers instead. This article describes how to do this and what to watch out for.

It will also familiarize you with the little-known PostgreSQL feature of “constraint triggers”.

A test case

Suppose we have a table of prisons and a table of prison guards:

CREATE SCHEMA jail_app;

CREATE TABLE jail_app.prison (
   prison_id   integer PRIMARY KEY,
   prison_name text NOT NULL
);

INSERT INTO jail_app.prison (prison_id, prison_name)
VALUES (1, 'Karlau'), (2, 'Stein');

CREATE TABLE jail_app.guard (
   guard_id   integer PRIMARY KEY,
   guard_name text NOT NULL
);

INSERT INTO jail_app.guard (guard_id, guard_name)
VALUES (41, 'Alice'), (42, 'Bob'), (43, 'Chris');

Then we have a junction table that stores which guard is on duty in which prison:

CREATE TABLE jail_app.on_duty (
   prison_id integer REFERENCES jail_app.prison,
   guard_id  integer REFERENCES jail_app.guard,
   PRIMARY KEY (prison_id, guard_id)
);

INSERT INTO jail_app.on_duty (prison_id, guard_id)
VALUES (1, 41), (2, 42), (2, 43);

So Alice is on duty in Karlau, and Bob and Chris are on duty in Stein.

Naïve implementation of a constraint as trigger

As guards go on and off duty, rows are added to and deleted from on_duty. We want to establish a constraint that at least one guard has to be on duty in any given prison.

Unfortunately there is no way to write this as a normal database constraint (if you are tempted to write a CHECK constraint that counts the rows in the table, think again).

But it would be easy to write a BEFORE DELETE trigger that ensures the condition:

CREATE FUNCTION jail_app.checkout_trig() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF (SELECT count(*)
       FROM jail_app.on_duty
       WHERE prison_id = OLD.prison_id
      ) < 2
   THEN
      RAISE EXCEPTION 'sorry, you are the only guard on duty';
   END IF;
   RETURN OLD;
END;$$;

CREATE TRIGGER checkout_trig B[...]
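The excerpt truncates the trigger definition; a plausible completion, following the usual BEFORE DELETE pattern (my guess, not the article's verbatim code):

CREATE TRIGGER checkout_trig
   BEFORE DELETE ON jail_app.on_duty
   FOR EACH ROW EXECUTE PROCEDURE jail_app.checkout_trig();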
Category: postgresql

Umair Shahid: Postgres is the coolest database – Reason #2: The License

2019-04-24 (Wed) 23:31:40
Legal documents = SCARY!! That’s the typical equation, and it’s true – except when it comes to PostgreSQL. Let me explain… I have been told by both prospects and clients that when they sit down to negotiate terms with Oracle, they are faced with more lawyers than they have engineers. No wonder one shudders at […]
Category: postgresql

Daniel Vérité: OIDs demoted to normal columns: a glance at the past

2019-04-24 (Wed) 21:17:04

In PostgreSQL 12, oid columns in system tables will lose their “special” nature, and the optional clause WITH OIDS will disappear from CREATE TABLE. As a concrete consequence, oid will now be visible when running select * from the catalogs that have OIDs, as well as when querying information_schema.columns, or with \d inside psql. Until now they were hidden, as are all system columns such as xmin or xmax.
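A trivial illustration (any catalog with OIDs works; output shape as on PostgreSQL 12, one possible row shown):

-- oid is now expanded by * like any ordinary column:
SELECT * FROM pg_namespace LIMIT 1;

--  oid | nspname  | nspowner | nspacl
-- -----+----------+----------+--------
--   99 | pg_toast |       10 |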

The commit message in the source repository mentions this reason for the change:

author Andres Freund <andres (at) anarazel (dot) de>
Wed, 21 Nov 2018 01:36:57 +0200 (15:36 -0800)
[…]
Remove WITH OIDS support, change oid catalog column visibility.
[…]
The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that “specialness” significantly.

Pluggable storage is a step towards the much-anticipated zheap, as well as other storage formats in the future.

Looking back, this can be seen as the continuation of previous changes that also went in the direction of obsoleting OIDs:

  • 7.2 (Feb 2002), the oid column becomes optional.
  • 8.0 (Jan 2005), the default_with_oids parameter is created.
  • 8.1 (Nov 2005), default_with_oids is now false by default.

But why was the OID invented as a special column in the first place? Originally, as the name “Object ID” suggests, the OID was related to object orientation.

A bit of history: object orientation in Postgres

In the mid-80s, the object orientation concept was surfacing, with OO languages like C++ in their early stages of design. In the database world, there was also the idea that maybe in the future, people would want to look at their data primarily through OO lenses.

This can explain why object orientation was a significant component in early versions of Postgres, when it was developed as a research project at the University of California, Berkeley.

In programming languages

[...]
Category: postgresql

Andreas 'ads' Scherbaum: PGConf.EU 2019 - Call for Papers and Call for Sponsorships

2019-04-24 (Wed) 19:46:36

PostgreSQL Conference Europe 2019 takes place in Milan, Italy, on October 15-18. Our Call for Papers is now open.

We are accepting proposals for talks in English. Each session will last 45 minutes, and may be on any topic related to PostgreSQL. The submission deadline is July 15th. Selected speakers will be notified before August 10th, 2019.

Please submit your proposals by going to 2019.pgconf.eu/callforpapers/ and following the instructions.

The proposals will be considered by a committee, who will produce a schedule to be published nearer the conference date. The members of the committee are listed on the website linked above.

All selected speakers will get free entry to the conference (excluding training sessions). We do not in general cover travel and accommodations for speakers, but may be able to do so in limited cases. If you require assistance with funding to be able to attend, please make a note of this in the submission notes field or contact us separately before the submission deadline.

And finally, our Call for Sponsorship is also open. Take the chance to present your services or products to the PostgreSQL community - or see it as a give-back opportunity. All sponsorship levels also include one or more free entrance tickets, depending on the level. Please head to 2019.pgconf.eu/becomesponsor/ for more details.

Please note that the hotel room situation in Milan is tight, so we advise you to book your room as early as possible. We are working with hotels around the venue to provide more options.

As usual, if you have any questions, don't hesitate to contact us at contact@pgconf.eu.

We look forward to seeing you in Milan in October!

Category: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Support foreign keys that reference partitioned tables

2019-04-24 (Wed) 18:48:56
On 3rd of April 2019, Alvaro Herrera committed patch: Support foreign keys that reference partitioned tables. Previously, while primary keys could be made on partitioned tables, it was not possible to define foreign keys that reference those primary keys. Now it is possible to do that. Author: Álvaro Herrera. Discussion: https://postgr.es/m/20181102234158.735b3fevta63msbj@alvherre.pgsql … Continue reading "Waiting for PostgreSQL 12 – Support foreign keys that reference partitioned tables"
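A minimal sketch of what this enables (my own toy example for PostgreSQL 12, not from the post):

CREATE TABLE items (item_id int PRIMARY KEY) PARTITION BY RANGE (item_id);
CREATE TABLE items_low PARTITION OF items FOR VALUES FROM (1) TO (1000);

-- New in PostgreSQL 12: the foreign key may reference the partitioned table
CREATE TABLE orders (
    order_id int PRIMARY KEY,
    item_id  int REFERENCES items (item_id)
);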
Category: postgresql

Viorel Tabara: Benchmarking Managed PostgreSQL Cloud Solutions - Part Three: Google Cloud

2019-04-24 (Wed) 18:48:00

In this 3rd part of Benchmarking Managed PostgreSQL Cloud Solutions, I took advantage of Google’s GCP free tier offering. It has been a worthwhile experience, and as a sysadmin spending most of his time at the console I couldn’t miss the opportunity to try out Cloud Shell, one of the console features that sets Google apart from the cloud provider I’m more familiar with, Amazon Web Services.

To quickly recap, in Part 1 I looked at the available benchmark tools and explained why I chose the AWS Benchmark Procedure for Aurora. I also benchmarked Amazon Aurora for PostgreSQL version 10.6. In Part 2 I reviewed AWS RDS for PostgreSQL version 11.1.

During this round, the tests based on the AWS Benchmark Procedure for Aurora will be run against Google Cloud SQL for PostgreSQL 9.6, since version 11.1 is still in beta.

Cloud Instances Prerequisites

As mentioned in the previous two articles, I opted for leaving PostgreSQL settings at their cloud GUC defaults, unless they prevent tests from running (see further down below). Recall from previous articles that the assumption has been that, out of the box, the cloud provider should have the database instance configured to provide reasonable performance.

The AWS pgbench timing patch for PostgreSQL 9.6.5 applied cleanly to the Google Cloud version of PostgreSQL 9.6.10.

Using the information Google put out in their blog post Google Cloud for AWS Professionals, I matched up the specs for the client and the target instances with respect to the Compute, Storage, and Networking components. For example, the Google Cloud equivalent of AWS Enhanced Networking is achieved by sizing the compute node based on the formula:

max([vCPUs × 2 Gbps per vCPU], 16 Gbps)

For example, a 16-vCPU client node gets max(16 × 2 Gbps, 16 Gbps) = 32 Gbps, while any node with fewer than 8 vCPUs still gets the 16 Gbps floor.

When it comes to setting up the target database instance, similarly to AWS, Google Cloud allows no replicas; however, the storage is encr

[...]
Category: postgresql

Michael Paquier: Postgres 12 highlight - REINDEX CONCURRENTLY

2019-04-22 (Mon) 15:52:09

A lot of work has been put into making Postgres 12 an excellent release, and among the features introduced there is one which found its way into the tree after being first proposed to the community at the end of 2012. Here is the commit which introduced it:

commit: 5dc92b844e680c54a7ecd68de0ba53c949c3d605
author: Peter Eisentraut <peter@eisentraut.org>
date: Fri, 29 Mar 2019 08:25:20 +0100

REINDEX CONCURRENTLY

This adds the CONCURRENTLY option to the REINDEX command. A REINDEX
CONCURRENTLY on a specific index creates a new index (like CREATE INDEX
CONCURRENTLY), then renames the old index away and the new index in place
and adjusts the dependencies, and then drops the old index (like DROP
INDEX CONCURRENTLY). The REINDEX command also has the capability to run
its other variants (TABLE, DATABASE) with the CONCURRENTLY option (but
not SYSTEM).

The reindexdb command gets the --concurrently option.

Author: Michael Paquier, Andreas Karlsson, Peter Eisentraut
Reviewed-by: Andres Freund, Fujii Masao, Jim Nasby, Sergei Kornilov
Discussion: https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee

As pointed out by the documentation, REINDEX needs to take an exclusive lock on the relation being indexed, meaning that for the whole duration of the operation no queries can run on it; they will wait for the REINDEX to finish. REINDEX can be very handy in the event of index corruption, or when an index needs to be rebuilt because of extra bloat. So the longer the operation takes, the longer a production instance is unavailable, which is bad for any deployment, so maintenance windows become mandatory. There is a community tool called pg_reorg (notably used by Instagram) aimed at reducing the impact of a REINDEX at the cost of extra resources, using a trigger-based method to replay tuple changes while an index is rebuilt in parallel to the existing one. Later this

[...]
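Going by the commit message above, usage looks like this (a minimal sketch on a hypothetical index and table):

REINDEX INDEX CONCURRENTLY my_index;
REINDEX TABLE CONCURRENTLY my_table;

-- or from the shell, for a whole database:
-- reindexdb --concurrently mydb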
Category: postgresql

Raghavendra Rao: Fixing up a corrupted TOAST table

2019-04-22 (Mon) 13:30:25
Today, when taking a logical backup (pg_dump) of a database cluster table (PG 9.4), we saw a TOAST table error indicating corruption. To fix this, we don’t need any special software; all we have to do is follow the instructions repeatedly suggested by PostgreSQL community folks on the community channel....
Category: postgresql

Shaun M. Thomas: PG Phriday: Around the World in Two Billion Transactions

2019-04-20 (Sat) 02:00:21

Transaction IDs (XID) have been something of a thorn in Postgres’ side since the dawn of time. On one hand, they’re necessary to differentiate tuple visibility between past, present, and concurrent transactions. On the other hand, the counter that stores them is only 32 bits, meaning it can eventually overflow without some kind of intervention. […]
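As an aside, the standard way to watch how close each database is to wraparound is a query like this (a common monitoring idiom, not from the excerpt):

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;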

The post PG Phriday: Around the World in Two Billion Transactions appeared first on 2ndQuadrant | PostgreSQL.

Category: postgresql

Ernst-Georg Schmid: The Hare and the Hedgehog. Muscle, brain - or both?

2019-04-19 (Fri) 07:22:00
In the famous fairy tale the hedgehog wins the race against the hare because he uses his brain to outwit the much faster hare: Brain beats muscle. But is that always the case? And what if we combine the two virtues?

The case at hand: Screening large sets of molecules for chemical similarity.

Since (sub)graph isomorphism searching faces some mathematical challenges because of its non-polynomial complexity - even if you can use a specialized index, like pgchem::tigress does - fast similarity searching based on binary fingerprints has gained popularity in recent years.

I was tasked with evaluating a solution to the problem of similarity screening large sets of molecules with PostgreSQL where the fingerprints are generated externally, e.g. with the CDK.

This is what I came up with...

Preparing the Racetrack
CREATE TABLE cdk.externalfp (
id int4 NOT NULL,
smiles varchar NOT NULL,
pubchemfp varbit NULL,
"cardinality" int4 NULL,
CONSTRAINT externalfp_pk PRIMARY KEY (id)
);

Above is the definition of the final table. The cardinality column will not be used now, but since it is calculated by the fingerprint generator anyway, keeping it will save some work later. If you want to copy my example code 1:1, please use a database named chemistry and a schema named cdk.

First we need to load some data into the table. I used the free NCISMA99 dataset from the National Cancer Institute, containing 249081 chemical structures in SMILES notation.

COPY cdk.externalfp (id, smiles) FROM '/tmp/NCISMA99' 
WITH (DELIMITER ' ', HEADER false, FORMAT csv);

And a few seconds later you should have 249081 rows in the table. Now we need to generate the fingerprints. The generator code is here; additionally you need CDK 2.2 and a PostgreSQL JDBC driver. After changing the code to reflect your JDBC URL you are good to go.

Running the FingerprintGenerator should show no errors and takes about 30 minutes on my Core i5 Linux notebook. The fingerprint used is the PubChem fingerprint as described here.
Now we can put an index on the cardinality column[...]
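The excerpt cuts off at the index step. As a hedged sketch of where this is presumably heading — an index to pre-filter candidates by set-bit count, then a Tanimoto-style similarity query (the DDL, example molecule, and threshold are my guesses; bit counting goes through text conversion because bit_count() only arrived in PostgreSQL 14):

CREATE INDEX externalfp_cardinality_idx ON cdk.externalfp ("cardinality");

-- Tanimoto similarity T = |A AND B| / |A OR B|, here against the
-- fingerprint of molecule 1 (an arbitrary example query molecule)
SELECT e.id, e.smiles
FROM cdk.externalfp e,
     (SELECT pubchemfp AS qfp FROM cdk.externalfp WHERE id = 1) q
WHERE length(replace((e.pubchemfp & q.qfp)::text, '0', ''))::numeric
    / NULLIF(length(replace((e.pubchemfp | q.qfp)::text, '0', '')), 0) >= 0.85;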
Category: postgresql

Pavel Stehule: new release of plpgsql_check - possibility to check SQL injection issue

2019-04-19 (Fri) 03:22:00
Yesterday I released the next version of plpgsql_check.

With this release, a developer can check for some well-known patterns of SQL injection vulnerabilities. Code in stored-procedure languages like PL/SQL, T-SQL or PL/pgSQL is safe from SQL injection unless dynamic SQL is used (the EXECUTE command in PL/pgSQL). Safe programming requires sanitization of all string variables, using the functions quote_literal, quote_ident or format. This check can be slow, so it has to be enabled by setting the security_warnings parameter:

CREATE OR REPLACE FUNCTION public.foo1(a text)
RETURNS text
LANGUAGE plpgsql
AS $function$
DECLARE result text;
BEGIN
-- secure
EXECUTE 'SELECT $1' INTO result USING a;
-- secure
EXECUTE 'SELECT ' || quote_literal(a) INTO result;
-- secure
EXECUTE format('SELECT %L', a) INTO result;
-- insecure
EXECUTE 'SELECT ''' || a || '''' INTO result;
-- insecure
EXECUTE format(e'SELECT \'%s\'', a) INTO result;
RETURN result;
END;
$function$

postgres=# select * from plpgsql_check_function('foo1');
┌────────────────────────┐
│ plpgsql_check_function │
╞════════════════════════╡
└────────────────────────┘
(0 rows)

postgres=# select * from plpgsql_check_function('foo1', security_warnings => true);
┌──────────────────────────────────────────────────────────────────────────────┐
│                            plpgsql_check_function                             │
╞════════════════════════════════════════════════════════════════════════════════╡
│ security:00000:11:EXECUTE:text type variable is not sanitized                 │
│ Query: SELECT 'SELECT ''' || a || ''''                                        │
│ --                            ^                                               │
│ Detail: The EXECUTE expression is SQL injection vulnerable.                   │
│ Hint: Use quote_ident, quote_literal or format function to secure variable.   │
│ security:00000:13:EXECUTE:text type variable is not sanitized                 │
│ Query: SELECT format(e'SELECT \'%s\'', a) [...]
Category: postgresql

Julien Rouhaud: New in pg12: Statistics on checksums errors

2019-04-18 (Thu) 20:02:26
Data checksums

Added in PostgreSQL 9.3, data checksums can help to detect data corruption happening on the storage side.

Checksums are only enabled if the instance was set up using initdb --data-checksums (which isn’t the default behavior), or if activated afterwards with the new pg_checksums tool, also added in PostgreSQL 12.
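For reference, both paths to enabled checksums look like this (the data directory is a placeholder; pg_checksums --enable requires a cleanly stopped PostgreSQL 12 cluster):

# at cluster creation time
initdb --data-checksums -D /path/to/data

# or afterwards, on a stopped cluster
pg_checksums --enable -D /path/to/data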

When enabled, checksums are written each time a block is written to disk, and verified each time a block is read from disk (or from the operating system cache). If the checksum verification fails, an error is reported in the logs. If the block was read by a backend, the query will obviously fail, but if the block was read by a BASE_BACKUP operation (such as pg_basebackup), the command will continue its processing. While data checksums will only catch a subset of possible problems, they still have some value, especially if you don’t trust your storage reliability.

Up to PostgreSQL 11, any checksum validation error could only be found by looking into the logs, which clearly isn’t convenient if you want to monitor such errors.

New counters available in pg_stat_database

To make checksum errors easier to monitor, and help users to react as soon as such a problem occurs, PostgreSQL 12 adds new counters in the pg_stat_database view:

commit 6b9e875f7286d8535bff7955e5aa3602e188e436
Author: Magnus Hagander <magnus@hagander.net>
Date: Sat Mar 9 10:45:17 2019 -0800

Track block level checksum failures in pg_stat_database

This adds a column that counts how many checksum failures have occurred on
files belonging to a specific database. Both checksum failures during
normal backend processing and those created when a base backup detects a
checksum failure are counted.

Author: Magnus Hagander
Reviewed by: Julien Rouhaud

 

commit 77bd49adba4711b4497e7e39a5ec3a9812cbd52a
Author: Magnus Hagander <magnus@hagander.net>
Date: Fri Apr 12 14:04:50 2019 +0200

Show shared object statistics in pg_stat_database

This adds a row to the pg_stat_database view with datoid 0 and datname NULL fo[...]
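Once on PostgreSQL 12, monitoring these counters is a plain query (the two columns below are the ones these commits add):

SELECT datname, checksum_failures, checksum_last_failure
FROM pg_stat_database;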
Category: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Log all statements from a sample of transactions

2019-04-18 (Thu) 17:01:31
On 3rd of April 2019, Alvaro Herrera committed patch: Log all statements from a sample of transactions. This is useful to obtain a view of the different transaction types in an application, regardless of the durations of the statements each runs. Author: Adrien Nayrat. The commit message makes it pretty clear, so let's see … Continue reading "Waiting for PostgreSQL 12 – Log all statements from a sample of transactions"
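The knob behind this patch is, as far as I can tell, the log_transaction_sample_rate GUC added in PostgreSQL 12; a quick sketch:

-- log every statement of roughly 1% of transactions
-- (set in postgresql.conf, or per session as superuser)
SET log_transaction_sample_rate = 0.01;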
Category: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Report progress of CREATE INDEX operations

2019-04-18 (Thu) 10:15:52
On 2nd of April 2019, Alvaro Herrera committed patch: Report progress of CREATE INDEX operations. This uses the progress reporting infrastructure added by , adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY. There are two pieces to this: one is index-AM-agnostic, and the other is AM-specific. The latter is fairly elaborate … Continue reading "Waiting for PostgreSQL 12 – Report progress of CREATE INDEX operations"
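The progress itself is exposed through the pg_stat_progress_create_index system view (PostgreSQL 12); from another session, while an index build runs:

SELECT pid, phase, blocks_done, blocks_total
FROM pg_stat_progress_create_index;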
Category: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Generated columns

2019-04-18 (Thu) 04:16:03
On 30th of March 2019, Peter Eisentraut committed patch: Generated columns. This is an SQL-standard feature that allows creating columns that are computed from expressions rather than assigned, similar to a view or materialized view but on a column basis. This implements one kind of generated column: stored (computed on write). Another kind, … Continue reading "Waiting for PostgreSQL 12 – Generated columns"
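A minimal illustration of the stored kind (my own example of the feature as described):

CREATE TABLE people (
    height_cm numeric,
    height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);

INSERT INTO people (height_cm) VALUES (180);
SELECT height_in FROM people;  -- ~70.87, computed on write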
Category: postgresql

Ibrar Ahmed: PostgreSQL CommitFest 2019-03 Ends

2019-04-17 (Wed) 19:17:27

PostgreSQL has a unique way to review the code submitted by developers. Most open source projects use GitHub pull requests to accept users’ code. PostgreSQL has a GitHub page but doesn’t manage pull requests there. Many years ago, PostgreSQL introduced the CommitFest to manage its patches: postgres opens a three-to-four-month “festival” to accept patches. The CommitFest is set up to keep track of patches’ statuses; in practice, it is mainly an admin page used to manage patches. Volunteers and community committers review the submitted code, after which committers are able to commit those patches that have been approved. There have been 22 CommitFest events since 2014.

The process

When a patch is submitted, a volunteer can choose to review it. A review can be done by more than one person. At any point, a patch will be in one of the following statuses:

  • Committed
  • Moved to next CF (CommitFest)
  • Needs Review
  • Ready for Committer
  • Rejected
  • Returned with Feedback
  • Waiting on Author
  • Withdrawn

Needs Review indicates that the patch is waiting for a reviewer to analyze it. Once a reviewer has confirmed that the patch in question is good to be committed, a community committer will pick up the patch, review it, and commit it if all is well; otherwise, they reject it or return it with feedback.

Patch categories

Patches are categorized in various ways including:

  • Bug Fixes
  • Clients
  • Code Comments
  • Documentation
  • Miscellaneous
  • Monitoring & Control
  • Performance
  • Procedural Languages
  • Refactoring
  • Replication & Recovery
  • SQL Commands
  • Server Features
  • System Administration

A complete history of a patch, along with the email discussion, is maintained with each revision, which makes it very simple to track the complete process from submission of a patch to its eventual conclusion.

Here are some graphs that show various stats for the past CommitFests. [graphs not preserved in this excerpt]

Now it’s time to end the 22nd CommitFest, the first such festival of 2019. This event saw 207 patches, almost 52% of which were committed to the master b

[...]
Category: postgresql

Hans-Juergen Schoenig: DECLARE CURSOR in PostgreSQL or how to reduce memory consumption

2019-04-17 (Wed) 17:00:59

Have you ever heard about cursors in PostgreSQL, or in SQL in general? If not, you should definitely read this article in depth and learn how to reduce memory consumption in PostgreSQL easily. Cursors have been around for many years and are, in my judgement, one of the most underappreciated features of all time. Therefore it makes sense to take a closer look at cursors and see what they can be used for.

The purpose of a cursor in PostgreSQL

Consider the following example:

test=# CREATE TABLE t_large (id int);
CREATE TABLE
test=# INSERT INTO t_large SELECT * FROM generate_series(1, 10000000);
INSERT 0 10000000

I have created a table containing 10 million rows so that we can play with the data. Let us run a simple query now:

test=# SELECT * FROM t_large;
    id
----------
        1
        2
        3
…

The first thing you will notice is that the query does not return immediately. There is a reason for that: PostgreSQL will send the data to the client, and the call only returns once ALL the data has been received. If you happen to select a couple thousand rows, life is good and everything will be just fine. However, what happens if you do a “SELECT * …” on a table containing 10 billion rows? Usually the client will die with an “out of memory” error, taking your application with it. There is no way to keep such a large table in memory, and throwing ever more RAM at the problem is not feasible either (and pretty stupid too).

Using DECLARE CURSOR and FETCH

DECLARE CURSOR and FETCH can come to the rescue. What is the core idea? We can fetch data in small chunks and only prepare the data at the time it is fetched – not earlier. Here is how it works:

test=# BEGIN;
BEGIN
test=# DECLARE mycur CURSOR FOR SELECT * FROM t_large WHERE id > 0;
DECLARE CURSOR
test=# FETCH NEXT FROM mycur;
 id
----
  1
(1 row)

test=# FETCH 4 FROM mycur;
 id
----
  2
  3
  4
  5
(4 rows)

test=# COMMIT;
COMMIT

The first important thing to notice is that a cursor can only be declared inside a transaction. However,

[...]
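The sentence is cut off, but the usual exception to the transaction-only rule is WITH HOLD, which is presumably where the text is heading. A hedged sketch:

test=# BEGIN;
BEGIN
test=# DECLARE holdcur CURSOR WITH HOLD FOR SELECT * FROM t_large;
DECLARE CURSOR
test=# COMMIT;   -- the result is materialized; the cursor survives
COMMIT
test=# FETCH 2 FROM holdcur;
 id
----
  1
  2
(2 rows)
test=# CLOSE holdcur;
CLOSE CURSOR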
Category: postgresql

Robins Tharakan: Another look at Replica Lag :)

2019-04-17 (Wed) 11:59:00
The other day, I remembered an old 9.0-era mail thread (from when Streaming Replication had just launched) where someone had tried to daisy-chain Postgres replicas and see how many (s)he could muster.

If I recall correctly, the OP could squeeze out only ~120 or so, mostly because the laptop's memory gave way (and not really because of an engine limitation).

I couldn't find that post, but it was intriguing to know whether we could reach (at least) the thousand mark and see what kind of "Replica Lag" that would entail; thus NReplicas.

In a (very) unscientific test, my 4-core 16 GB machine could spin up 1,000 replicas in ~8 minutes (and tear them down in another ~2 minutes). Now I am sure this could get better, but I am not complaining, since this was a breeze to set up (in that it just worked without much tinkering ... besides lowering shared_buffers).

For those interested, a single UPDATE on the master could (nearly consistently) be seen on the last replica in less than half a second, with top showing 65% CPU idle (and 3.5 on the 1-minute load average) during a ~15 minute test.

So although (I hope) this isn't a real-world use-case, I am still impressed that without much tweaking we're way under the 1-second mark, and that's right out of the box.

I am sure there's more to squeeze here, but I still felt this was worthy of a small post nonetheless!
Category: postgresql
