Sebastian Insausti: Scaling Postgresql for Large Amounts of Data

planet postgresql - 4時間 36分

Nowadays, it’s common to see a large amount of data in a company’s database, but depending on the size, it could be hard to manage and the performance could be affected during high traffic if we don’t configure or implement it in a correct way. In general, if we have a huge database and we want to have a low response time, we’ll want to scale it. PostgreSQL is not the exception to this point. There are many approaches available to scale PostgreSQL, but first, let’s learn what scaling is.

Scalability is the property of a system/database to handle a growing amount of demands by adding resources.

The reasons for this amount of demands could be temporal, for example, if we’re launching a discount on a sale, or permanent, for an increase of customers or employees. In any case, we should be able to add or remove resources to manage these changes on the demands or increase in traffic.

In this blog, we’ll look at how we can scale our PostgreSQL database and when we need to do it.

Horizontal Scaling vs Vertical Scaling

There are two main ways to scale our database...

  • Horizontal Scaling (scale-out): It’s performed by adding more database nodes creating or increasing a database cluster.
  • Vertical Scaling (scale-up): It’s performed by adding more hardware resources (CPU, Memory, Disk) to an existing database node.

For Horizontal Scaling, we can add more database nodes as slave nodes. It can help us to improve the read performance balancing the traffic between the nodes. In this case, we’ll need to add a load balancer to distribute traffic to the correct node depending on the policy and the node state.

Related resources  ClusterControl for PostgreSQL  Scaling Connections in PostgreSQL using Connection Pooling  How to Deploy PostgreSQL for High Availability

To avoid a single point of failure adding only one lo

カテゴリー: postgresql

php[world] 2019 — 25 years of PHP

php.net - 21時間 33分
カテゴリー: php

Craig Kerstiens: Postgres tips for the average and power user

planet postgresql - 2019-07-17(水) 23:05:00

Personally I’m a big fan of email, just like blogging. To me a good email thread can be like a good novel where you’re following along always curious for what comes next. And no, I don’t mean the ones where there is an email to all-employees@company.com and someone replies all, to only receive reply-all’s to not reply-all. I mean ones like started last week internally among the Azure Postgres team.

The first email was titled: Random Citus development and psql tips, and from there it piled on to be more and more tips and power user suggestions for Postgres. Some of these tips are relevant if you’re working directly on the Citus codebase, others relevant as anyone that works with Postgres, and some useful for debugging Postgres internals. While the thread is still ongoing here is just a few of the great tips:

In psql, tag your queries and use Ctrl+R

Psql supports Ctrl+R to search previous queries you ran. For demos and when testing complex scenarios, I like adding a little comment to queries that then becomes the tag by which I can later find the query:

# SELECT count(*) FROM test; -- full count ┌───────┐ │ count │ ├───────┤ │ 0 │ └───────┘ (1 row) Time: 127.124 ms (reverse-i-search)`f': SELECT count(*) FROM test; -- full count

In most cases, 2-3 letters is going to be enough to find the query.

Better psql output

I find \x lacking, but pspg is great. It is available from PGDG via sudo yum install -y pspg or the equivalent on your system. I have the following .psqlrc which sets up pspg with a very minimalistic configuration:

$ cat > ~/.psqlrc \timing on \pset linestyle unicode \pset border 2 \setenv PAGER 'pspg --no-mouse -bX --no-commandbar --no-topbar' \set HISTSIZE 100000 Get a stack trace for an error

In psql:

# SELECT pg_backend_pid(); ┌────────────────┐ │ pg_backend_pid │ ├────────────────┤ │ 156796 │ └────────────────┘ (1 row)

In another shell:

$ gdb -p 156796 (gdb) b errfinish Breakpoint 1 at 0x83475b: file elog.c, line 251. (gdb) c Continuing.


カテゴリー: postgresql

Interview with David Laietta

planet PHP - 2019-07-17(水) 17:30:00
カテゴリー: php

Kaarel Moppel: The mysterious “backend_flush_after” configuration setting

planet postgresql - 2019-07-17(水) 16:30:06

The above-mentioned PostgreSQL server configuration parameter was introduced already some time ago, in version 9.6, but has been flying under the radar so to say and had not caught my attention previously. Until I recently was pasted (not being on Twitter) a tweet from one of the Postgres core developers Andres Freund, that basically said – if your workload is bigger than Shared Buffers, you should enable the “ backend_flush_after” parameter for improved throughput and also jitter. Hmm, who wouldn’t like an extra boost on performance for free? FOMO kicked in… but before adding this parameter to my “standard setup toolbox” I hurried to test things out – own eye is king! So here a small test and my conclusion on effects of enabling (not enabled by default!) “backend_flush_after”.

What does this parameter actually do?

Trying to interpret the documentation (link here) in my own wording – “backend_flush_after” is basically designed to enable sending “hints” to the OS, that if user has written more than X bytes (configurable from 0 to max. 2MB) it would be very nice if the kernel could already do some flushing of recently changed data files in the background, so that when the “checkpointer” comes or the kernel’s “dirty” limit is reached, there would be less bulk “fsyncing” to do – meaning less IO contention (spikes) for our user sessions, thus smoother response times.

Be warned though – unlike most Postgres settings this one actually is not guaranteed to function, and currently only can work on Linux systems, having sync_file_range() functionality available – which again depends on kernel version and used file system. So in short this explains why the parameter has not gotten too much attention. Similar story actually also with the “sister” parameters – “bgwriter_flush_after”, “checkpoint_flush_after”, “wal_writer_flush_after”…with the difference that they are already enabled by default!

NB! Also note that this parameter, being controlled and initiated by Postgres, might be the only way to influence

カテゴリー: postgresql

Luca Ferrari: Suggesting Single-Column Primary Keys (almost) Automatically

planet postgresql - 2019-07-17(水) 09:00:00
Unable to clean HTML
カテゴリー: postgresql

Jobin Augustine: BRIN Index for PostgreSQL: Don’t Forget the Benefits

planet postgresql - 2019-07-17(水) 01:30:22

BRIN Index was introduced in PostgreSQL 9.5, but many users postponed the usage of it in their design and development just because it was “new”. But now we understand that it has stood the test-of-time! It is time to reconsider BRIN if you have not done it yet. I often see users who forget there is a provision to select the type of Index by specifying USING clause when creating an index.

BRIN Index is a revolutionary idea in indexing first proposed by PostgreSQL contributor Alvaro Herrera. BRIN stands for “Block Range INdex”. A block range is a group of pages adjacent to each other, where summary information about all those pages is stored in Index.  For example, Datatypes like integers – dates where sort order is linear – can be stored as min and max value in the range. Other database systems including Oracle announced similar features later. BRIN index often gives similar gains as Partitioning a table.

BRIN usage will return all the tuples in all the pages in the particular range. So the index is lossy and extra work is needed to further filter out records. So while one might say that is not good, there are a few advantages.

  1. Since only summary information about a range of pages is stored, BRIN indexes are usually very small compared to B-Tree indexes. So if we want to squeeze the working set of data to shared_buffer, this is a great help.
  2. Lossiness of BRIN can be controlled by specifying pages per range (discussed in a later section)
  3. Offloads the summarization work to vacuum or autovacuum. So the overhead of index maintenance on transaction / DML operation is minimal.
Putting BRIN into a test

Let’s take a simple example to examine the benefits of BRIN index by creating a simple table.


Now let’s Insert some data into this table.

postgres=# INSERT INTO testtab (id, date, level, msg) SELECT g, CURRENT_TIMESTAMP + ( g || 'minute' ) :: interval, random[...]
カテゴリー: postgresql

502 Bad Gateway

planet PHP - 2019-07-17(水) 00:00:00

HTTP is a protocol that is implemented by servers and clients, but there is a third category: proxies.

When a system is acting as a proxy for a different server, and that server is misbehaving or doing something unexpected, the proxy can return 502 Bad Gateway to tell a client that the proxy is working fine, but there was something wrong with the ‘origin’ server instead.

A specific example of this could be a CDN in front on a web server, and the web-server is misconfigured and responding with incorrect HTTP responses.

This status should not be used if the origin server just returned a valid HTTP error itself, because these should generally just be forwarded by the proxy (mostly) unaltered.

Example HTTP/1.1 502 Bad Gateway Content-Type text/plain We made a HTTP request to an origin server, but we got a Gopher response back. References
カテゴリー: php

Kirk Roybal: Partitioning enhancements in PostgreSQL 12

planet postgresql - 2019-07-16(火) 01:18:50
Declarative partitioning got some attention in the PostgreSQL 12 release, with some very handy features. There has been some pretty dramatic improvement in partition selection (especially when selecting from a few partitions out of a large set), referential integrity improvements, and introspection. In this article, we’re going to tackle the referential integrity improvement first. This […]
カテゴリー: postgresql

On "10x developers"

planet PHP - 2019-07-15(月) 01:58:00
On "10x developers"

Yesterday, a VC posted a Twitter thread about "10x engineers and how to spot them.'' It is a frankly terrible thread, and predictably, it became the latest Internet Pile On(tm), which we all know is Twitter's favorite pastime. I added my own thoughts in another thread, which I want to replicate here for posterity and then expand on a bit more now that I have a real keyboard and not just my phone.

First, here's my original thread, lightly edited for clarity, paragraphs, and links:

Continue reading this post on SteemIt.

Larry 14 July 2019 - 11:58am
カテゴリー: php

Jonathan Katz: How to Upgrade Your PostgreSQL Passwords to SCRAM

planet postgresql - 2019-07-13(土) 03:27:00

In a lot of PostgreSQL environments, it’s common practice to protect user accounts with a password. Starting with PostgreSQL 10, the way PostgreSQL manages password-based authentication got a major upgrade with the introduction of SCRAM authentication, a well-defined standard that is a significant improvement over the current system in PostgreSQL. What’s better is that almost all PostgreSQL drivers now support this new method of password authentication, which should help drive further adoption of this method.

While it may be easy to take advantage of SCRAM authentication in new PostgreSQL deployments, there are a few steps involved in upgrading your existing systems to utilize this method. This article will briefly explain how SCRAM works in PostgreSQL (to try to encourage you to upgrade!) and then walk you through the steps of how to upgrade your existing PostgreSQL clusters to use SCRAM authentication.

A Very Brief Overview of SCRAM
カテゴリー: postgresql

Kaarel Moppel: Implementing Autonomous Transactions in Postgres

planet postgresql - 2019-07-12(金) 16:30:13

Having recently witnessed quite an ingenious hack to implement some good old “println” style development debugging from stored procedures into a file, it prompted me to post knowledge to the Interwebs on two other ways how such a goal can be implemented more transparently. Also with help of some other good old legacy technology in one case. By the way, the main reason for them going for the hack was that being relatively new to databases they didn’t know how to name this thing that they were implementing, so Google wasn’t able to help – once again proof that naming things is one of the hardest problems of computing

What’s an “autonomous transaction”?

But to start with let’s explain what is an “autonomous transaction” as most people working with databases probably haven’t needed / heard of them and actually they’re not a standard or anything also, thus not too supported by various database engines. The phrase itself comes from the Oracle world I believe and it basically denotes “fire and forget” (sub) transactions that are not connected to the main transactions. And sadly also Postgres does not have direct built-in support for that – you’re always in a real transaction and need to invent a bit if the need arises.

And what would be the common use case? Mostly some logging / auditing / progress tracking into tables, in such a way that the information on the attempt would persist even when the main transaction is rolled back due to an error for example. Remember – in a standard transaction everything is thrown away in case no special measures (savepoints or exception handling sub-blocks in stored procedures) are taken.

The hacky way

So how did the initial implementation that wowed me a bit, looked like?

CREATE FUNCTION public.log(appid text, msg text) RETURNS void LANGUAGE plpgsql SECURITY DEFINER AS $function$ BEGIN execute $$ copy (select $$ || quote_literal(msg) || $$) to program 'tee -a /tmp/$$ ||appid|| $$.log' $$; END; $function$;

Not bad – it works and is relatively short and

カテゴリー: postgresql

Magnus Hagander: PGConf.EU 2019 - Registration and training open

planet postgresql - 2019-07-12(金) 00:19:19

PostgreSQL Conference Europe 2019 in Milan, Italy, on October 15-18 is now open for registrations.

The Tuesday training sessions have also been finalized, and are now available for registration. Six sessions in a mix of full and half day sessions are available. Training sessions are available at an extra cost of €150 per half day. Attendees of training sessions will also receive a €90 discount on the regular conference fee. Seats to these sessions are limited to smaller groups, so make sure you register early!

Our call for papers is still open! If you have already submitted a talk, or are planning to submit one, we suggest you wait to register until you have received a confirmation if the talk was accepted or not. If your talk is accepted, attendance is of course free!

See you in Milan!

カテゴリー: postgresql

Vasilis Ventirozos: Comparing pg 9.4 with pg 12, CTE edition

planet postgresql - 2019-07-11(木) 22:14:00
Postgres 12 is around the corner, and as always is packed with new features. One of them being the option to allow the user to control the behaviour of CTE materialization. You can find the commit from Tom Lane here, which explains everything about this new change but TLDR is that so far, CTE's where fully materialized, so restrictions from the query that uses it won't apply to the CTE. Which is the right way doing this when you are using CTEs to INSERT/UPDATE/DELETE or when they are recursive. This means that when a CTE is side-effect-free and non-recursive it's safe to push the restrictions from the outer query.
So, from postgres 12, when it's safe or when the CTE is called only once, postgres will inline the CTE to the outer query, removing the optimization fence. User will be able to override this behaviour by using  MATERIALIZED / NOT MATERIALIZED keywords.

Here's an example :

drop table if exists test_cte;
create table test_cte as
select generate_series(1,1000000) as id,
floor(random() * 10 + 1)::int as random;
EXPLAIN analyze
SELECT random,count(*) FROM test_cte group by random
SELECT * FROM a_cte WHERE random = 5;
-- NEW
EXPLAIN analyze
SELECT random,count(*) FROM test_cte group by random
SELECT * FROM a_cte WHERE random = 5;

It's not really important to show all the differences in explain plans but rather to see how many rows it had to process in order to create the CTE by applying the condition directly:

-- Materialized (old):
-> Parallel Seq Scan on test_cte (cost=0.00..8591.88 rows=416688 width=4) (actual time=0.031..19.077 rows=333333 loops=3)

-- Not Materialized (New):
-> Parallel Seq Scan on test_cte (cost=0.00..9633.59 rows=2083 width=4) (actual time=0.021..24.469 rows=33222 loops=3)
Filter: (random = 5)
Rows Removed by Filter: 300112

A production system running a 2TB on 9.4 with a dev that has a lot of free diskspace that we plan upgrading soon  ma[...]
カテゴリー: postgresql

PHP 7.4.0 alpha 3 Released

php.net - 2019-07-11(木) 19:15:12
カテゴリー: php

PHP 7.4.0 alpha 3 Released

planet PHP - 2019-07-11(木) 09:00:00
PHP team is glad to announce the release of the third PHP 7.4.0 version, PHP 7.4.0 Alpha 3. This continues the PHP 7.4 release cycle, the rough outline of which is specified in the PHP Wiki. For source downloads of PHP 7.4.0 Alpha 3 please visit the download page. Please carefully test this version and report any issues found in the bug reporting system. Please DO NOT use this version in production, it is an early test version. For more information on the new features and other changes, you can read the NEWS file, or the UPGRADING file for a complete list of upgrading notes. These files can also be found in the release archive. The next release would be Beta 1, planned for July 25th. The signatures for the release can be found in the manifest or on the QA site. Thank you for helping us make PHP better.
カテゴリー: php

Jobin Augustine: PostgreSQL WAL Retention and Clean Up: pg_archivecleanup

planet postgresql - 2019-07-10(水) 23:44:08

WAL retention is a very important topic for PostgreSQL database management. But very often we come across DBAs getting into surprise situations such as:

1. Several TBs of WALs piled up in archive destination

2. WALs filling up pg_wal/pg_xlog directory due to failing archive

3. Necessary WALs are no longer preserved

External backup projects for PostgreSQL are good in addressing retention policies. But there is a simple program named pg_archivecleanup which comes along with PostgreSQL binaries which might be useful in both maintaining the WAL retention as well as handling an emergency situation. pg_archivecleanup is probably the least talked about utility among standard PostgreSQL binaries. It is extremely simple and useful, and it can work in 2 scenarios:

  • Can be used as a standalone program to clean up old WAL files from any file system location.
  • It can also be used in Standby side for cleaning up WAL files which are no longer required for Standby.

This program can operate in dryrun mode (-n option) or delete mode (-d option).

Dryrun (-n)

This feature is useful to list all WALs older than a specific WAL. In the following demonstration, I am listing all WALs which are still in the default wal location (pg_wal) inside my data directory and older than a specific WAL:

$ ./pg_archivecleanup -n ~/bigsql/data/pg11/pg_wal 00000001000000000000001E /home/jobin/bigsql/data/pg11/pg_wal/00000001000000000000001B /home/jobin/bigsql/data/pg11/pg_wal/000000010000000000000017 ... Delete Mode (-d)

In this mode, pg_archivecleanup does the cleanup by removing all the candidate WALs:

$ pg_archivecleanup -d /home/postgres/archive 00000002000000000000006B pg_archivecleanup: keeping WAL file "/home/postgres/archive/00000002000000000000006B" and later pg_archivecleanup: removing file "/home/postgres/archive/000000020000000000000069" pg_archivecleanup: removing file "/home/postgres/archive/00000002000000000000006A" ... Understanding WAL retention criteria

In order to do a clean up

カテゴリー: postgresql

Dave Page: Reverse Proxying to pgAdmin

planet postgresql - 2019-07-10(水) 21:33:00
Reverse proxying requests to a pgAdmin server is becoming more and more popular if posts to the mailing lists are to be taken as an indicative measure; more often than not when using pgAdmin in a container (of which there have now been over 10 million pulls)! Typically users will deploy a reverse proxy for a couple of reasons; to host multiple applications in different subdirectories under the same domain, or to add SSL/TLS support independently of the application.

Because of the number of questions asked, I spent a little time over the last couple of days doing some testing and updating the documentation with some examples. Here's a blog-ified version of that work.

Nginx Nginx is winning the battle of the web servers these days, beating out Lighttpd (which is still a fine piece of software) and the ageing and arguably bloated Apache HTTPD. All of these servers support reverse proxying, and whilst I've looked at Nginx, the examples shown below can easily be translated to the other servers if you prefer to run them instead.
In the following examples, we have pgAdmin running in a Docker container (in which it's hosted under Gunicorn). For simplicity, the examples have Nginx running on the host machine, but it can also be easily run in another container, sharing a Docker network with pgAdmin. In such a configuration there is no need to map the pgAdmin container port to the host.
The container is launched as shown below. See the documentation for information on other useful environment variables you can set and paths you can map.
The commands below will pull the latest version of the container from the repository, and run it with port 5050 on the host mapped to port 80 on the container. It will set the default username and password to user@domain.com and SuperSecret respectively.
docker pull dpage/pgadmin4
docker run -p 5050:80 \
-e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
カテゴリー: postgresql