Planet PostgreSQL

Subscribe to the Planet PostgreSQL feed
Updated: 2 hours 48 minutes ago

elein mustain: JOIN LATERAL

2019-03-02 (Sat) 05:13:05
LATERAL

The primary feature of LATERAL JOIN is that it enables a subquery to access elements of the main query, which can be very powerful.

Several common uses of LATERAL are to:

  • flatten denormalized arrays into parent and child tables
  • aggregate across several tables
  • generate rows or actions

Note, however, that the subquery will execute once for each main query row, since the values it uses change from row to row. This can make the query slower.

USAGE

SELECT <target list>
FROM <table>
JOIN LATERAL
(<subquery using table.column>) AS foo ON true;

Here are three examples of using LATERAL. Obviously there are many more:

Normalizing

In the Normalization example we have a table (denorm) containing ids and an array of other ids. We want to flatten the arrays, creating parent and child tables. This is also a good example of using a function as a subquery.

denorm.sql
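As a hedged sketch of the idea (the exact table layouts here are assumed, not taken from denorm.sql):

CREATE TABLE denorm (id int PRIMARY KEY, kids int[]);
INSERT INTO denorm VALUES (1, ARRAY[10, 11]), (2, ARRAY[20]);

CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child  (kid int, parent_id int REFERENCES parent(id));

INSERT INTO parent SELECT id FROM denorm;

-- unnest() in the LATERAL subquery can reference denorm.kids row by row
INSERT INTO child (kid, parent_id)
SELECT k.kid, d.id
FROM denorm d
JOIN LATERAL unnest(d.kids) AS k(kid) ON true;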

Activity log

The Activity log example captures the pg_stat_activity item for the current command, for
auditing, spying or review. There will be lots of garbage collection and room for
further analysis in the slog table.

Logging and auditing is usually done by triggers or rules.
In this case we want to grab the pg_stat_activity data
in the middle of the query. The lateral join is implicitly
on pg_backend_pid().

As you will see, the lateral join is not appropriate for UPDATES and
INSERTS. The slog() function can be called in the
FROM clause in those cases.

log.sql
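A minimal sketch of the lateral capture itself (not the full log.sql setup):

-- attach the current backend's pg_stat_activity row to each result row;
-- the join condition is effectively pg_backend_pid()
SELECT c.relname, a.query
FROM pg_class c
JOIN LATERAL (
    SELECT query FROM pg_stat_activity WHERE pid = pg_backend_pid()
) AS a ON true
LIMIT 3;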

Aggregation

The Aggregation example examines people, books and checkouts. Filling the fields
is amusing, but not the point of explaining LATERAL. At the end we
will look at various uses of LATERAL while executing aggregates.

sumtest.sql
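As a hypothetical flavor of the aggregate usage (persons and co_books appear in the drop list below; the column names here are assumed):

-- one aggregated subquery result per person
SELECT p.name, co.n_checkouts
FROM persons p
JOIN LATERAL (
    SELECT count(*) AS n_checkouts
    FROM co_books cb
    WHERE cb.person_id = p.id
) AS co ON true;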

Drop all example tables:

DROP TABLE IF EXISTS denorm CASCADE;
DROP TABLE IF EXISTS parent CASCADE;
DROP TABLE IF EXISTS child;
DROP TABLE IF EXISTS persons CASCADE;
DROP TABLE IF EXISTS books CASCADE;
DROP TABLE IF EXISTS co_books CASCADE;
DROP TABLE IF EXISTS x2018;
DROP TABLE IF EXISTS abc;
DROP TABLE IF EXISTS slog CASCADE;
DROP FUNCTION IF EXISTS slog();

I hope your examination of LATERAL give

[...]
Category: postgresql

Pavel Golub: Choose plpgsql variable names wisely

2019-03-01 (Fri) 23:04:56


Pavel Stehule recently wrote the post “Don’t use SQL keywords as PLpgSQL variable names” describing the situation when internal stored routine variable names match PostgreSQL keywords.

But the problem is not only with keywords. Consider:

CREATE TABLE human(name varchar, email varchar);

CREATE FUNCTION get_user_by_mail(email varchar)
RETURNS varchar LANGUAGE plpgsql AS $$
DECLARE
    human varchar;
BEGIN
    SELECT name FROM human WHERE email = email INTO human;
    RETURN human;
END
$$;

SELECT get_user_by_mail('foo@bar');

Output:

column reference "email" is ambiguous
LINE 1: SELECT name FROM human WHERE email = email
                                             ^
DETAIL: It could refer to either a PL/pgSQL variable or a table column.

OK, at least we have no hidden error like in Pavel’s case. Let’s try to fix it by specifying an alias for the table name:

CREATE FUNCTION get_user_by_mail(email varchar)
RETURNS varchar LANGUAGE plpgsql AS $$
DECLARE
    human varchar;
BEGIN
    SELECT name FROM human u WHERE u.email = email INTO human;
    RETURN human;
END
$$;

Output:

column reference "email" is ambiguous
LINE 1: SELECT name FROM human u WHERE u.email = email
                                                 ^
DETAIL: It could refer to either a PL/pgSQL variable or a table column.

Seems better, but the parser still cannot distinguish the variable name from the column name. Of course, we may use positional parameter placeholders instead of names. So the quick and dirty fix looks like this:

CREATE FUNCTION get_user_by_mail(email varchar)
RETURNS varchar LANGUAGE plpgsql AS $$
DECLARE
    human varchar;
BEGIN
    SELECT name FROM human u WHERE u.email = $1 INTO human;
    RETURN human;
END
$$;

In addition, notice that the human variable doesn’t produce an error, even though it shares its name with the target table. I personally do not like using $1 placeholders in code, so my suggestion would be (if, of course, you don’t want to change the parameter name):

CREATE FUNCTION get_user_by_mail(email varchar) RETURNS varchar LANGUAGE plpgsql AS $$ DECLA[...]
Category: postgresql

Rafia Sabih: Using parallelism for queries from PL functions in PostgreSQL 10

2019-03-01 (Fri) 19:09:00
Intra-query parallelism was introduced in PostgreSQL in version 9.6. The benefits of parallel scans and joins were talked about, and significant improvements in the benchmark queries at higher scale factors were highlighted. However, one area remained devoid of the benefits: queries from procedural language functions. Precisely, if you fire a query from a PL/pgSQL function, it cannot use parallel scans or joins for that query, even though the query is capable of using them otherwise. Have a look at an example yourself:

-- creating and populating the table
create table foo (i int, j int);
insert into foo values (generate_series(1,500), generate_series(1,500));

-- for experimental purposes we are forcing parallelism by setting relevant parameters
set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;
alter table foo set (parallel_workers = 4);
set max_parallel_workers_per_gather = 4;

-- executing the query as an SQL statement
explain analyse select * from foo where i <= 150;
 Gather  (cost=0.00..4.56 ...) (actual time=0.217..5.614 ...)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on foo  (cost=0.00..4.56 ...) (actual time=0.004..0.018 ...)
         Filter: (i <= 150)
         Rows Removed by Filter: 70
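-- Note: the excerpt does not show how total() is defined; a minimal definition
-- consistent with the query text shown below would be (assumed, not from the post):
create or replace function total() returns bigint
language plpgsql as $$
declare
    cnt bigint;
begin
    select count(*) from foo where i <= 150 into cnt;
    return cnt;
end
$$;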
-- executing the query from a PLpgSQL function in v 9.6
explain analyse select total();
Query Text: SELECT count(*) FROM foo where i <=150
Aggregate  (cost=9.25..9.26 ...)
  ->  Seq Scan on foo  (cost=0.00..8.00 ...)
Query Text: explain analyse select total();
Result  (cost=0.00..0.26 ...)
To your relief, the feature was then added in version 10. Have a look:

-- executing the query from a PLpgSQL function in v 10
explain analyse select total();
Query Text: SELECT count(*) FROM foo where i <=150
Finalize Aggregate  (cost=4.68..4.69 ...)
  ->  Gather  (cost=4.66..4.67 ...)
        Workers Planned: 4
        ->  Partial Aggregate  (cost=4.66..4.67 ...)
              ->  Parallel Seq Scan on foo  (cost=0.00..4.56 ...)
                    Filter: (i <= 150)
This extends the [...]
Category: postgresql

Pavel Stehule: don't use SQL keywords as PLpgSQL variable names

2019-03-01 (Fri) 16:22:00
Yesterday I had the opportunity to see a strange runtime error:

CREATE OR REPLACE FUNCTION public.fx()
RETURNS integer
LANGUAGE plpgsql
AS $function$
DECLARE offset int DEFAULT 0;
BEGIN
RETURN offset + 1;
END;
$function$

postgres=# SELECT fx();
ERROR: query "SELECT offset + 1" returned 0 columns
CONTEXT: PL/pgSQL function fx() line 4 at RETURN

What is the problem? At first glance, the RETURN returns one column, so the error message is strange.

But any PLpgSQL expression is an SQL expression; more precisely, it is executed as an SQL SELECT statement. So SELECT offset + 1 is parsed as a SELECT with an OFFSET clause, which really returns 0 columns.

The basic issue is the poor choice of variable name: it is the same as an SQL reserved keyword. OFFSET is an unfortunate word.
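The obvious fix is simply to pick a non-reserved name; a trivial sketch:

CREATE OR REPLACE FUNCTION public.fx()
RETURNS integer
LANGUAGE plpgsql
AS $function$
DECLARE v_offset int DEFAULT 0;   -- not a reserved keyword
BEGIN
RETURN v_offset + 1;
END;
$function$;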

I wrote a new check for plpgsql_check that raises a warning in this situation.
Category: postgresql

Stefan Fercot: PostgreSQL major version upgrade impact on pgBackRest

2019-03-01 (Fri) 09:00:00

pgBackRest is a well-known powerful backup and restore tool.

While it works with a really simple configuration, a major version upgrade of PostgreSQL has some impact on it.

Immediately after upgrading PostgreSQL to a newer major version, the pg-path for all pgBackRest configurations must be set to the new database location and the stanza-upgrade command must be run.

That command updates the stanza information to reflect the new cluster information and, for example, allows the archiving process to work.
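As a hedged illustration (the stanza name, versions and paths below are assumed), the post-upgrade sequence looks roughly like this:

# /etc/pgbackrest.conf - point the stanza at the new data directory
[demo]
pg1-path=/var/lib/pgsql/11/data

$ sudo -u postgres pgbackrest --stanza=demo stanza-upgrade
$ sudo -u postgres pgbackrest --stanza=demo check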

For the purpose of this post, I’ll use a fresh CentOS 7 install.

We’ll talk about the stanza-upgrade command of pgBackRest but won’t go deeper in the PostgreSQL configuration, nor in the PostgreSQL major version upgrade best practices.

Installation

First of all, install PostgreSQL and pgBackRest packages directly from the PGDG yum repositories:

$ sudo yum install -y https://download.postgresql.org/pub/repos/yum/10/redhat/\
rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm
$ sudo yum install -y postgresql10-server postgresql10-contrib
$ sudo yum install -y pgbackrest

Check that pgBackRest is correctly installed:

$ pgbackrest
pgBackRest 2.10 - General help

Usage:
    pgbackrest [options] [command]

Commands:
    archive-get     Get a WAL segment from the archive.
    archive-push    Push a WAL segment to the archive.
    backup          Backup a database cluster.
    check           Check the configuration.
    expire          Expire backups that exceed retention.
    help            Get help.
    info            Retrieve information about backups.
    restore         Restore a database cluster.
    stanza-create   Create the required stanza data.
    stanza-delete   Delete a stanza.
    stanza-upgrade  Upgrade a stanza.
    start           Allow pgBackRest processes to run.
    stop            Stop pgBackRest processes from running.
    version         Get version.

Use 'pgbackrest help [command]' for more information.

Create a basic PostgreSQL cluster with some data in it:

$ sudo /usr/pgsql-10/bin/postgresql[...]
Category: postgresql

Craig Kerstiens: Approximation algorithms for your database

2019-03-01 (Fri) 01:47:00

In an earlier blog post I wrote about how breaking problems down into a MapReduce style approach can give you much better performance. We’ve seen that Citus is orders of magnitude faster than single node databases when we’re able to parallelize the workload across all the cores in a cluster. And while count(*) and avg are easy to break into smaller parts, I immediately got the question: what about count distinct, the top items from a list, or median?

Exact distinct count is admittedly harder to tackle in a large distributed setup, because it requires a lot of data shuffling between nodes. Count distinct is indeed supported within Citus, but at times can be slow when dealing with especially large datasets. Median across any moderate to large dataset can become completely prohibitive for end users. Fortunately, for nearly all of these there are approximation algorithms which provide close-enough answers and do so with impressive performance characteristics.

Approximate uniques with HyperLogLog

In certain categories of applications, such as web analytics, IoT (internet of things), and advertising, counting the distinct number of times something has occurred is a common goal. HyperLogLog is a PostgreSQL data type extension which allows you to take the raw data and compress it into a value of how many uniques exist for some period of time.
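A hedged sketch of what that roll-up can look like with the postgresql-hll extension (the raw-events table and its columns are assumed; daily_uniques matches the query below):

CREATE EXTENSION IF NOT EXISTS hll;

CREATE TABLE daily_uniques (
    day   date PRIMARY KEY,
    users hll
);

-- compress raw events into one HLL value per day
INSERT INTO daily_uniques
SELECT visited_at::date, hll_add_agg(hll_hash_bigint(user_id))
FROM page_views
GROUP BY 1;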

The result of saving data into the HLL datatype is that you would have a value of 25 uniques for Monday and 20 uniques for Tuesday. This data compresses down much more than the raw data. But where it really shines is that you can then combine these buckets: by unioning two HyperLogLog data types you can get back that there were 35 uniques across Monday and Tuesday, because on Tuesday you had 10 repeat visitors:

SELECT hll_union_agg(users) as unique_visitors
FROM daily_uniques;

 unique_visitors
-----------------
              35
(1 row)

Because HyperLogLog can be split up and composed in this way, it also parallelizes well across all nodes within a Citus cluster.

Finding a list of top things with TopN

Another f

[...]
Category: postgresql

Michael Paquier: Postgres 12 highlight - WITH clause and materialization

2019-02-28 (Thu) 17:20:20

Postgres 12 is bringing a game-changer regarding common table expressions (named also CTE, defined by WITH clauses in SELECT queries) with the following commit:

commit: 608b167f9f9c4553c35bb1ec0eab9ddae643989b
author: Tom Lane <tgl@sss.pgh.pa.us>
date: Sat, 16 Feb 2019 16:11:12 -0500
Allow user control of CTE materialization, and change the default behavior.

Historically we've always materialized the full output of a CTE query, treating
WITH as an optimization fence (so that, for example, restrictions from the outer
query cannot be pushed into it). This is appropriate when the CTE query is
INSERT/UPDATE/DELETE, or is recursive; but when the CTE query is non-recursive
and side-effect-free, there's no hazard of changing the query results by pushing
restrictions down.

Another argument for materialization is that it can avoid duplicate computation
of an expensive WITH query --- but that only applies if the WITH query is called
more than once in the outer query. Even then it could still be a net loss, if
each call has restrictions that would allow just a small part of the WITH query
to be computed.

Hence, let's change the behavior for WITH queries that are non-recursive and
side-effect-free. By default, we will inline them into the outer query (removing
the optimization fence) if they are called just once. If they are called more
than once, we will keep the old behavior by default, but the user can override
this and force inlining by specifying NOT MATERIALIZED. Lastly, the user can
force the old behavior by specifying MATERIALIZED; this would mainly be useful
when the query had deliberately been employing WITH as an optimization fence to
prevent a poor choice of plan.

Andreas Karlsson, Andrew Gierth, David Fetter

Discussion: https://postgr.es/m/87sh48ffhb.fsf@news-spur.riddles.org.uk

Since their introduction, CTEs had always been materialized, meaning that queries in WITH clauses are run once, with the resulting content added into a temporary copy table, which is then reused by the outer query. This can

[...]
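For reference, the user-facing knobs the commit adds look like this (table and column names in this sketch are assumed):

-- keep the old optimization-fence behavior
WITH w AS MATERIALIZED (
    SELECT * FROM big_table
)
SELECT * FROM w WHERE key = 123;

-- force inlining into the outer query, even if w is referenced twice
WITH w AS NOT MATERIALIZED (
    SELECT * FROM big_table
)
SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref;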
Category: postgresql

Sarah Conway: pgBackRest - Performing Backups on a Standby Cluster

2019-02-27 (Wed) 23:26:27

pgBackRest is an open source tool designed for efficiently performing backups and restores for PostgreSQL on up to terabytes of data at a time. Its aim is to be reliable, scalable, and flexible while offering robust performance during the backup process.

Many powerful features are included in pgBackRest, including parallel backup and restore, local or remote operation, full, incremental, and differential backup types, backup rotation, archive expiration, backup integrity, page checksums, backup resume, streaming compression and checksums, delta restore, and much more.

A difficult scenario which may present itself to database engineers is ensuring the high availability of your PostgreSQL cluster, which includes ensuring all backups perform consistently and well without impacting the primary database host. If you have a sizable amount of data, you may occasionally experience a substantial system load on the primary database whenever a pgBackRest backup is running.

An effective solution to this issue is to enable the option to perform backups from the replica rather than the primary; the bulk of the I/O impact then occurs on the replica node, minimizing the load on the primary host. The only connections being made at that point would be from the replica to the primary in order to obtain information and log files.

It is additionally possible to configure more than one replica to work with pgBackRest; in this case, the first running standby found will be chosen and used for the backup process. If one of the standby nodes fails, pgBackRest will automatically find and proceed to run the backup on the next available replica that is configured.

Please note that the ability to define more than one standby was not available until version 1.23 of pgBackRest; in that version, multiple standby support was introduced, allowing up to seven replicas to be configured rather than the previous limit of two. See the release notes here.
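A hedged configuration sketch of backups from a standby (the stanza name, hosts and paths below are assumed):

# /etc/pgbackrest.conf on the repository/backup host
[global]
repo1-path=/var/lib/pgbackrest
backup-standby=y                     # copy files from a standby instead of the primary

[demo]
pg1-host=primary.example.com
pg1-path=/var/lib/pgsql/11/data
pg2-host=standby1.example.com        # first running standby found is used
pg2-path=/var/lib/pgsql/11/data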

Additionally, if you are following this

[...]
Category: postgresql

Achilleas Mantzios: The Current State of Open Source Backup Management for PostgreSQL

2019-02-27 (Wed) 17:11:20

There are many ways to address taking backups of a PostgreSQL cluster. There are several articles and blogs which present the various technologies by which we can save our precious data in PostgreSQL. There are logical backup solutions, physical backup at the OS level, at the filesystem level, and so forth. Here in this blog we are not going to cover the theoretical part, which is adequately covered by various blogs and articles as well as the official documentation.

This blog focuses on the state of the various tools and solutions available and makes an effort to present a thorough comparison based on real-life experiences. This article in no way tries to promote any specific product; I really like all the tools, solutions and technologies described in this blog. The aim here is to note down their strengths and weaknesses and to guide the end user as to which tool would best fit his/her environment, infrastructure and specific requirements. Here is a nice article describing backup tools for PostgreSQL at various levels.


I will not describe how to use the various tools in this blog, since this info is documented in the above blog and also in the official docs as well as other resources over the net. But I will describe the pros and cons as I experienced them in practice. In this blog, we are dealing exclusively with classic PITR-based physical PostgreSQL backups dependent on:

  • pg_basebackup or pg_start_backup()/pg_stop_backup()
  • physical copy
  • archiving of WALs or streaming replication

There are several fine products and solutions, some are open source and free to use while others are commercial. To the best of my knowledge, those are:

  • pgbarman by 2ndquadrant (free)
  • pgbackrest (free)
  • pg_probackup by Postgres Professional (free)
  • BART by EDB (commercial)

I did not have the chance to try out BART since it runs on flavors of Linux that I don’t

[...]
Category: postgresql

Liaqat Andrabi: Webinar: Achieving High Availability with PostgreSQL [Follow Up]

2019-02-27 (Wed) 16:28:27

High availability has become a requirement for any application these days. Achieving high availability in PostgreSQL is not just about replicating data across different nodes, but about doing so in a way that optimizes the replication process.

2ndQuadrant hosted the “Achieving High Availability with PostgreSQL” webinar to explain High Availability concepts in PostgreSQL, best practices, how to configure application setup, backups, and more.

The webinar was presented by Martín Marqués, Deputy Head of Support at 2ndQuadrant. Those who weren’t able to attend the live event can now view the recording here.

For any questions or comments regarding PostgreSQL replication, please send an email to info@2ndQuadrant.com.

Category: postgresql

Martin Davis: Better/Faster ST_PointOnSurface for PostGIS

2019-02-26 (Tue) 09:28:00
And now for the final chapter in the saga of improving InteriorPoint / PointOnSurface.  For those who missed the first two episodes, the series began with a new approach for the venerable JTS Geometry.interiorPoint() for polygons algorithm.  Episode 2 travelled deep into the wilds of C++ with a port to GEOS.  The series finale shows how this results in greatly improved performance of PostGIS ST_PointOnSurface.

The BC Voting Area dataset is a convenient test case, since it has lots of large polygons (shown here with interior points computed).
The query is about as simple as it gets:

   select ST_PointOnSurface(geom) from ebc.voting_area;

Here's the query timings comparison, using the improved GEOS code and the previous implementation:

Data size                           | Time (improved) | Time (old) | Improvement | Time (ST_Centroid)
5,658 polygons (2,171,676 vertices) | 341 ms          | 4,613 ms   | x 13        | 369 ms
As expected, there is a dramatic improvement in performance.  The improved ST_PointOnSurface runs 13 times faster than the old code.  And it's now as fast as ST_Centroid.  It's also more robust and tolerant of invalid input (although this test doesn't show it).

This should show up in PostGIS in the fall release (PostGIS 3 / GEOS 3.8).

On to the next improvement... (and also gotta update the docs and the tutorial!)

Category: postgresql

Hubert 'depesz' Lubaczewski: Another change on why-upgrade.depesz.com – SEARCH!

2019-02-26 (Tue) 01:14:41
I just added search functionality to why-upgrade.depesz.com. Now you can limit the change list to only changes containing specific words. For example, if you just want to see what changed in relation to GIN between versions 10 and 11.2, you can. Hope you'll find it useful 🙂
Category: postgresql

Bruce Momjian: Breaking Backward Compatibility

2019-02-25 (Mon) 23:30:01

As an actively-developed open source project with a long history, Postgres often has to make decisions on how to integrate new features into the existing code base. In some cases, these new features potentially break backward compatibility, i.e., API breakage. This breakage can be caused by:

  • Fixing an incorrect result or behavior
  • Adding a feature that prevents the previous API from working
  • Replacing an existing feature with an improved one that requires API breakage

In these cases, the Postgres project has several options:

  1. Add the new feature and retain the old API forever
  2. Add the new feature and retain the old API until all supported Postgres versions have the new interface (five years)
  3. Add the new feature and remove the old API
  4. Reject the new feature because it would cause API breakage

You might think that #1 is always the best option. However, if you have ever used older software that has several ways to do the same thing, with no logic behind it except the order features were added, you know that choosing #1 has costs. While #1 allows existing users to continue using Postgres unchanged, new users have to navigate the complex API required to maintain backward compatibility. There are some cases where the breakage would be so significant that #1 (or #4) is the only option. However, choosing #2 or #3 allows future users to interact with Postgres using a clean API.

Continue Reading »

Category: postgresql

Jonathan Katz: PostgreSQL BRIN Indexes: Big Data Performance With Minimal Storage

2019-02-25 (Mon) 22:11:00

Many applications today record data from sensors, devices, tracking information, and other things that share a common attribute: a timestamp that is always increasing. This timestamp is very valuable, as it serves as the basis for many types of lookups, analytical queries, and more.
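The excerpt stops here, but the kind of index the title refers to is easy to illustrate (table and column names in this sketch are assumed):

CREATE TABLE sensor_readings (
    recorded_at timestamptz NOT NULL,
    device_id   int,
    reading     numeric
);

-- BRIN stores min/max summaries per block range, so it stays tiny and works well
-- when recorded_at correlates with the physical row order (always-increasing timestamps)
CREATE INDEX ON sensor_readings USING brin (recorded_at);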

Category: postgresql

Laurenz Albe: “Exclusive backup” method is deprecated – what now?

2019-02-25 (Mon) 18:00:09
© Laurenz Albe 2019

 

The “exclusive backup” method of calling pg_start_backup('label') before backup and pg_stop_backup() afterwards is scheduled for removal in the future.

This article describes the problems with the old method and discusses the options for those who still use this backup method.

The “exclusive” backup method

Before pg_basebackup was invented, there was only one online file-system level backup method:

  • call “SELECT pg_start_backup('label')”, where 'label' is an arbitrary string
  • backup all the files in the PostgreSQL data directory with an arbitrary backup method
  • call “SELECT pg_stop_backup()”

This method is called exclusive because only one such backup can be performed simultaneously.
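In psql, the exclusive sequence described above looks like this (the label and the copy tool are arbitrary):

SELECT pg_start_backup('nightly');
-- copy the data directory with tar, rsync, or any other file-level tool
SELECT pg_stop_backup();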

pg_start_backup creates a file called backup_label in the data directory that contains the location of the checkpoint starting the backup. This makes sure that during startup, PostgreSQL does not recover from the latest checkpoint registered in pg_control. Doing so would cause data corruption, since the backup may contain data files from before that checkpoint. Don’t forget that database activity, including checkpointing, continues normally in backup mode!

The problem with the exclusive backup method

This backup method can cause trouble if PostgreSQL or the operating system crashes during backup mode.
When PostgreSQL starts up after such a crash, it will find the backup_label file and deduce that it is recovering a backup. There is no way to distinguish the data directory of a server that crashed while in backup mode from a backup!

Consequently, PostgreSQL will try to recover from the checkpoint in backup_label. Lacking a recovery.conf file with a restore_command, it will resort to the transaction log (=WAL) files in pg_wal (pg_xlog on older versions).

But the database might have been in backup mode for a longer time before the crash. If there has been enough data modification activity in that time, the WAL segment with the starting checkpoint may already have been archived and removed.

The startup process will then fail

[...]
Category: postgresql

David Fetter: You don't need PL/pgsql!

2019-02-24 (Sun) 01:49:00
You don't need PL/pgsql to create functions that do some pretty sophisticated things. Here's a function written entirely in SQL that returns the inverse cumulative distribution function known in Microsoft Excel™ circles as NORMSINV.

CREATE OR REPLACE FUNCTION normsinv(prob float8)
RETURNS float8
STRICT
LANGUAGE SQL
AS $$
WITH constants(a,b,c,d,p_low, p_high) AS (
VALUES(
ARRAY[-3.969683028665376e+01::float8 , 2.209460984245205e+02 , -2.759285104469687e+02 , 1.383577518672690e+02 , -3.066479806614716e+01 , 2.506628277459239e+00],
ARRAY[-5.447609879822406e+01::float8 , 1.615858368580409e+02 , -1.556989798598866e+02 , 6.680131188771972e+01 , -1.328068155288572e+01],
ARRAY[-7.784894002430293e-03::float8 , -3.223964580411365e-01 , -2.400758277161838e+00 , -2.549732539343734e+00 , 4.374664141464968e+00 , 2.938163982698783e+00],
ARRAY[7.784695709041462e-03::float8 , 3.224671290700398e-01 , 2.445134137142996e+00 , 3.754408661907416e+00],
0.02425::float8,
(1 - 0.02425)::float8
)
),
intermediate(p, q, r) AS (
SELECT
prob AS p,
CASE
WHEN prob < p_low AND prob > 0 THEN sqrt(-2*ln(prob))
WHEN prob >= p_low AND prob <= p_high THEN prob - 0.5
WHEN prob > p_high AND prob < 1 THEN sqrt(-2*ln(1-prob))
ELSE NULL
END AS q,
CASE
WHEN prob >= p_low AND prob <= p_high THEN (prob - 0.5)*(prob - 0.5)
ELSE NULL
END AS r
FROM constants
)
SELECT
CASE
WHEN p < 0 OR
p > 1 THEN 'NaN'::float8
WHEN p = 0 THEN '-Infinity'::float8
WHEN p = 1 THEN 'Infinity'::float8
WHEN p < p_low THEN
(((((c[1]*q+c[2])*q+c[3])*q+c[4])*q+c[5])*q+c[6]) /
((((d[1]*q+d[2])*q+d[3])*q+d[4])*q+1)
WHEN p >= p_low AND p <= p_high THEN
(((((a[1]*r+a[2])*r+a[3])*r+a[4])*r+a[5])*r+a[6])*q /
(((((b[1]*r+b[2])*r+b[3])*r+b[4])*r+b[5])*r+1)
WHEN p > p_high THEN
-(((((c[1]*q+c[2])*q+c[3])*q+c[4])*q+c[[...]
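The definition is cut off here, but assuming it is completed, calling it is plain SQL; the 97.5% point of the standard normal distribution should come out near 1.96:

SELECT normsinv(0.975);   -- expect a value close to 1.959964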
Category: postgresql

David Fetter: You don't need PL/pgsql!

2019-02-24 (Sun) 01:49:00
You don't need PL/pgsql to create functions that do some pretty sophisticated things. Here's a function written entirely in SQL that returns the inverse cumulative distribution function known in Microsoft Excel™ circles as NORMSINV.

CREATE OR REPLACE FUNCTION normsinv(prob float8)
RETURNS float8
STRICT
LANGUAGE SQL
AS $$
WITH constants(a,b,c,d,p_low, p_high) AS (
VALUES(
ARRAY[-39.69683028665376::float8, 220.9460984245205, -275.9285104469687, 138.3577518672690, -30.66479806614716, 2.506628277459239],
ARRAY[-54.47609879822406::float8, 161.5858368580409, -155.6989798598866, 66.80131188771972, -13.28068155288572],
ARRAY[-0.007784894002430293::float8, -0.3223964580411365, -2.400758277161838, -2.549732539343734, 4.374664141464968, 2.938163982698783],
ARRAY[0.007784695709041462::float8, 0.3224671290700398, 2.445134137142996, 3.754408661907416],
0.02425::float8,
1 - 0.02425::float8
)
),
intermediate(p, q, r) AS (
SELECT
prob AS p,
CASE
WHEN prob < p_low AND prob > 0 THEN sqrt(-2*ln(prob))
WHEN prob >= p_low AND prob <= p_high THEN prob - 0.5
WHEN prob > p_high AND prob < 1 THEN sqrt(-2*ln(1-prob))
ELSE NULL
END AS q,
CASE
WHEN prob >= p_low AND prob <= p_high THEN (prob - 0.5)*(prob - 0.5)
ELSE NULL
END AS r
FROM constants
)
SELECT
CASE
WHEN p = 0 THEN '-Infinity'::float8
WHEN p = 1 THEN 'Infinity'::float8
WHEN p < p_low AND p > 0 THEN
(((((c[1]*q+c[2])*q+c[3])*q+c[4])*q+c[5])*q+c[6]) / ((((d[1]*q+d[2])*q+d[3])*q+d[4])*q+1)
WHEN p >= p_low AND p <= p_high THEN
(((((a[1]*r+a[2])*r+a[3])*r+a[4])*r+a[5])*r+a[6])*q / (((((b[1]*r+b[2])*r+b[3])*r+b[4])*r+b[5])*r+1)
WHEN p > p_high AND p < 1 THEN
-1 * (((((c[1]*q+c[2])*q+c[3])*q+c[4])*q+c[5])*q+c[6]) / ((((d[1]*q+d[2])*q+d[3])*q+d[4])*q+1)
ELSE /* p < 0 OR p > 1 */
(p*0)/0 /* This should cause the appropriate error */
[...]
Category: postgresql

Bruce Momjian: The Maze of Postgres Options

2019-02-23 (Sat) 05:30:01

I did a webcast earlier this week about the many options available to people choosing Postgres — many more options than are typically available for proprietary databases. I want to share the slides, which cover why open source has more options, how to choose a vendor that helps you be more productive, and specifically tool options for extensions, deployment, and monitoring.

Category: postgresql

Peter Bengtsson: Django ORM optimization story on selecting the least possible

2019-02-23 (Sat) 03:49:29

This is an optimization story that should not surprise anyone using the Django ORM. But I thought I'd share because I have numbers now! The origin of this came from a real requirement. For a given parent model, I'd like to extract the value of the name column of all its child models, and then turn all these name strings into one MD5 checksum string.

Variants

The first attempt looked like this:

artist = Artist.objects.get(name="Bad Religion")
names = []
for song in Song.objects.filter(artist=artist):
    names.append(song.name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()

The SQL used to generate this is as follows:

SELECT "main_song"."id", "main_song"."artist_id", "main_song"."name", "main_song"."text", "main_song"."language", "main_song"."key_phrases", "main_song"."popularity", "main_song"."text_length", "main_song"."metadata", "main_song"."created", "main_song"."modified", "main_song"."has_lastfm_listeners", "main_song"."has_spotify_popularity" FROM "main_song" WHERE "main_song"."artist_id" = 22729;

Clearly, I don't need anything but just the name column, version 2:

artist = Artist.objects.get(name="Bad Religion")
names = []
for song in Song.objects.filter(artist=artist).only("name"):
    names.append(song.name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()

Now, the SQL used is:

SELECT "main_song"."id", "main_song"."name" FROM "main_song" WHERE "main_song"."artist_id" = 22729;

But still, since I don't really need instances of model class Song I can use the .values() method which gives back a list of dictionaries. This is version 3:

names = []
for song in Song.objects.filter(artist=a).values("name"):
    names.append(song["name"])
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()

This time Django figures it doesn't even need the primary key value so it looks like this:

SELECT "main_song"."name" FROM "main_song" WHERE "main_song"."artist_id" = 22729;

Last but not least; there is an even faster one. values_list(). This time it doesn't even bother to ma

[...]
Category: postgresql

Avinash Kumar: PostgreSQL fsync Failure Fixed – Minor Versions Released Feb 14, 2019

2019-02-22 (Fri) 22:47:27

In case you didn’t already see this news, PostgreSQL has had its first minor versions released for 2019. This includes minor version updates for all supported PostgreSQL versions. We indicated in our previous blog post that PostgreSQL 9.3 had gone EOL and would not receive any more updates. This release includes the following PostgreSQL major versions:

What’s new in this release?

One of the common fixes applied to all the supported PostgreSQL versions is: panic instead of retrying after fsync() failure. This fsync failure has been under discussion for a year or two now, so let’s take a look at the implications.

A fix to the Linux fsync issue for PostgreSQL Buffered IO in all supported versions

PostgreSQL performs two types of IO: direct IO (though almost never) and the much more commonly performed buffered IO.

PostgreSQL uses O_DIRECT when writing to WALs (Write-Ahead Logs, aka transaction logs) only when wal_sync_method is set to open_datasync, or to open_sync with no archiving or streaming enabled. The default wal_sync_method may be fdatasync, which does not use O_DIRECT. This means that almost all the time in your production database server, you’ll see PostgreSQL using O_SYNC / O_DSYNC while writing to WALs, whereas writing the modified/dirty buffers from shared buffers to data files is always done through buffered IO. Let’s understand this further.
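You can check what your own server is doing; the second setting below is the new GUC these minor releases add for the panic-vs-retry behavior, assuming the name data_sync_retry from the release notes:

SHOW wal_sync_method;   -- fdatasync on most Linux installations
SHOW data_sync_retry;   -- off (the default) means PANIC instead of retrying after fsync() failure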

Upon checkpoint, dirty buffers in shared buffers are written to the page cache managed by the kernel. Through an fsync(), these modified blocks are applied to disk. If an fsync() call is successful, all dirty pages from the corresponding file are guaranteed to be persisted on disk. When there is an fsync failure while flushing pages to disk, PostgreSQL cannot guarantee a copy of a modified/dirty page. The reason is that writes to storage from the page cache are completely managed by the kernel, and not by PostgreSQL.

This could still be fine if

[...]
Category: postgresql
