
Kristi Anderson: Managing High Availability in PostgreSQL – Part II

planet postgresql - 2019-03-13(水) 00:10:46

Are you deploying PostgreSQL in the cloud and want to understand your options for achieving high availability? In our previous blog post, Managing High Availability in PostgreSQL – Part I, we discussed the capabilities and functioning of PostgreSQL Automatic Failover (PAF) by ClusterLabs. In Part II, we’re introducing you to an alternative open source tool, Replication Manager from 2ndQuadrant, to be closely followed by Part III where we dive into our third alternative, Patroni by Zalando.

Replication Manager (repmgr)

repmgr is an open-source tool suite developed by 2ndQuadrant for managing replication and failover of your PostgreSQL clusters. It provides the tools to set up, configure, manage, and monitor replication of PostgreSQL, and also enables you to perform manual switchover and failover tasks using the repmgr utility. This free tool supports and enhances PostgreSQL’s built-in streaming replication.

Replication Manager provides two main tools to manage replication and failover of PostgreSQL.

repmgr
  • A command-line interface utility which enables you to perform various administrative tasks.
  • repmgr enables you to set up standby servers, promote standbys, perform a switchover, and monitor the status of your PostgreSQL cluster.
  • It also provides a dry-run option for almost all of the administrative commands (see the sketch below).
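
For instance, a minimal sketch of a dry run before a switchover (the configuration file path is an assumption; --dry-run and -f are documented repmgr options):

$ repmgr standby switchover -f /etc/repmgr.conf --dry-run
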
repmgrd

This is the daemon which:

  • Actively monitors the PostgreSQL clusters and performs necessary actions based on the state of the cluster.
  • Performs automatic failover in case the primary node goes down by promoting the most eligible standby as the new primary.
  • Provides an option to monitor and store the data related to replication performance.
  • Provides notification by invoking the user scripts for registered events.
How it Works

repmgr not only manages the replication of PostgreSQL clusters, but also has capabilities for setting up the standby servers for replication. Following the initial installation, we need to make changes to the repmgr configuration file (repmgr.conf) with the required details on each server.
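
As a rough illustration, a minimal repmgr.conf might look like this (node name, host, and paths are hypothetical; the parameter names follow repmgr's documentation):

node_id=1
node_name='node1'
conninfo='host=node1.example.com user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'

Each server is then registered with commands such as repmgr primary register and repmgr standby register.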

[...]

Regina Obe: PostGIS 2.5.2, 2.4.7, 2.3.9 Released

planet postgresql - 2019-03-11(月) 09:00:00

The PostGIS development team is pleased to provide bug fix releases 2.5.2, 2.4.7, and 2.3.9 for the 2.5, 2.4, and 2.3 stable branches.

These are the first versions able to compile against Proj 6.0.0; you must upgrade to these if you are using Proj 6.

2.5.2 This release supports PostgreSQL 9.3-11 (it will compile against PostgreSQL 12 but not pass tests; use it with PostgreSQL 12 only for pg_upgrade. You are encouraged to use the unreleased PostGIS 3.0 branch with PostgreSQL 12, which is specifically designed to take advantage of the features new in PostgreSQL 12).

2.4.7 This release supports PostgreSQL 9.3-10.

2.3.9 This release supports PostgreSQL 9.2-10.

View all closed tickets for 2.5.2, 2.4.7, 2.3.9.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

If you use the other extensions packaged with postgis, make sure to upgrade those as well:

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.
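
To confirm the upgrade took effect, you can check the installed version (postgis_full_version() is a standard PostGIS function):

SELECT postgis_full_version();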


Bruce Momjian: The High Value of Data

planet postgresql - 2019-03-09(土) 01:30:01

There was a time when every piece of software had to be purchased: operating systems, compilers, middleware, text editors. Those days are mostly gone, though there are a few holdouts (e.g., MS Windows, vertical applications). What happened is that open source software has come to dominate most uses, and software selection is rarely based on cost.

One of the final holdouts for purchased software is databases. You might think that is because database software is complex, but so is the other software mentioned. The big difference is that while non-database software processes or stores user data in a simple or standard way, databases lock user data inside the database. This data locking is a requirement for fast, reliable, and concurrent data access, but it does place the database on a different monetary plane.

In any organization, it is really their data that is valuable, and because the database is so tightly coupled to that valuable data, database software becomes something that is worth significant investment. This explains why databases have resisted the open source commoditization that has happened to so much other purchased software. (Custom database applications tied to complex business logic have also slowed the migration.)

Continue Reading »


Michael Paquier: Postgres 12 highlight - Connection slots and WAL senders

planet postgresql - 2019-03-08(金) 16:28:51

The maximum number of connections a PostgreSQL server can accept is configured using max_connections. When attempting to connect to a server already at full capacity, logically the server complains:

$ psql
psql: FATAL: sorry, too many clients already

It is possible to define connection policies, for example at the database level with CREATE DATABASE or ALTER DATABASE, and even to keep superuser-only connection slots using superuser_reserved_connections, so that a superuser has reserved room to perform some activities even on a full server.
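
For example, both policies side by side (the database name is hypothetical; the syntax and the GUC are standard):

ALTER DATABASE appdb CONNECTION LIMIT 50;
-- and in postgresql.conf: superuser_reserved_connections = 3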

When creating a connection for replication purposes, the connection is spawned with a special status, in the context of a WAL sender, which is in charge of the communication and speaks the replication protocol, so that it is possible to do replication, take base backups, etc. A lot of those tasks are important for availability. One problem, however, is that this connection uses a shared memory slot which is part of max_connections. Hence, it is possible to get into a case where an application bloats the connections, making it impossible to connect with a replication connection. This can be rather bad for availability, because it could, for example, block the creation of a new standby after a failover.

One way to counter that is to connect to the server for base backups and standbys with a superuser role. Still, this is not completely right either, as by design there can be replication roles, which allow a role to connect to a server in replication mode without being a superuser. In this context, this is where the following commit of Postgres 12 becomes handy:

commit: ea92368cd1da1e290f9ab8efb7f60cb7598fc310
author: Michael Paquier <michael@paquier.xyz>
date: Tue, 12 Feb 2019 10:07:56 +0900
Move max_wal_senders out of max_connections for connection slot handling

Since its introduction, max_wal_senders is counted as part of max_connections when it comes to define how many connection slots can be used for replication connections with a WAL sender context. This can lead t[...]
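
In practice this means that from Postgres 12 onwards the two pools are sized independently; a sketch of the relevant settings (values are illustrative):

# postgresql.conf, PostgreSQL 12 and later
max_connections = 100   # regular client sessions only
max_wal_senders = 10    # WAL senders, no longer taken out of max_connections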

Viorel Tabara: Benchmarking Managed PostgreSQL Cloud Solutions - Part One: Amazon Aurora

planet postgresql - 2019-03-08(金) 07:37:18

This blog starts a multi-part series documenting my journey of benchmarking PostgreSQL in the cloud.

The first part includes an overview of benchmarking tools, and kickstarts the fun with Amazon Aurora PostgreSQL.

Selecting the PostgreSQL Cloud Services Providers

A while ago I came across the AWS benchmark procedure for Aurora, and thought it would be really cool if I could take that test and run it on other cloud hosting providers. To Amazon’s credit, out of the three best-known utility computing providers — AWS, Google, and Microsoft — AWS is the only major contributor to PostgreSQL development, and the first to offer a managed PostgreSQL service (dating back to November 2013).

While managed PostgreSQL services are also available from a plethora of PostgreSQL hosting providers, I wanted to focus on those three cloud computing providers, since their environments are where many organizations looking for the advantages of cloud computing choose to run their applications, provided that they have the required know-how for managing PostgreSQL. I am a firm believer that in today’s IT landscape, organizations working with critical workloads in the cloud would greatly benefit from the services of a specialized PostgreSQL service provider that can help them navigate the complex world of GUCs and the myriad SlideShare presentations.

Selecting the Right Benchmark Tool

Benchmarking PostgreSQL comes up quite often on the performance mailing lists, and, as has been stressed countless times, the tests are not intended to validate a configuration for a real-life application. However, selecting the right benchmark tool and parameters is important in order to gather meaningful results. I would expect every cloud provider to provide procedures for benchmarking their services, especially when the first cloud experience may not start on the right foot. The good news is that two of the three players in this test have included benchmarks in their documentation. The AWS Benchmark Procedure for Aurora guide is easy to find, available right on

[...]

Bruce Momjian: Tool Vendor/Support Options

planet postgresql - 2019-03-08(金) 00:15:02

Having explained that lock-in is not a binary option, what are the Postgres tool support options available, at a high level?

  • Develop in-house database tools and support them yourself
  • Use open source tools and support them yourself
  • Use open source tools with vendor support (hopefully the vendor supports your chosen tools)
  • Use closed-source tools with vendor support

Of course, you can mix and match these options, i.e., use a support vendor for the open source tools they support, use other open source tools they don't support, and use some tools you develop in-house, e.g.:

  • open source Postgres database (vendor support)
  • pgBackRest for backup (vendor support)
  • patroni for failover (community support channels)
  • In-house developed tools (self support)

I went over these options in more detail in this presentation. This diversity of options is rarely available for closed-source, single-vendor database solutions.


Laurenz Albe: “LOCK TABLE” can harm your database’s health

planet postgresql - 2019-03-07(木) 18:00:17

Many people know that explicit table locks with LOCK TABLE are bad style and usually a consequence of bad design. The main reason is that they hamper concurrency and hence performance.

Through a recent support case I learned that there are even worse effects of explicit table locks.

Table locks

Before an SQL statement uses a table, it takes the appropriate table lock. This prevents concurrent use that would conflict with its operation. For example, reading from a table will take an ACCESS SHARE lock, which conflicts with the ACCESS EXCLUSIVE lock that TRUNCATE needs.

You can find a description of the individual lock levels in the documentation. There is also the matrix that shows which lock levels conflict with each other.

You don’t have to take these table locks explicitly; PostgreSQL does it for you automatically.
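
You can watch this happen from inside a transaction; a minimal sketch (atable is a hypothetical table, pg_locks and pg_backend_pid() are standard):

BEGIN;
SELECT * FROM atable LIMIT 1;
-- show the ACCESS SHARE lock the SELECT took automatically
SELECT locktype, mode, relation::regclass FROM pg_locks WHERE pid = pg_backend_pid();
COMMIT;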

Explicit table locks with the LOCK TABLE statement

You can also explicitly request locks on a table with the LOCK statement:

LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

There are some cases where it is useful and indicated to use such an explicit table lock. One example is a bulk update of a table, where you want to avoid deadlocks with other transactions that modify the table at the same time. In that case you would use a SHARE lock on the table that prevents concurrent data modifications:

LOCK atable IN SHARE MODE;

Typical mistakes with LOCK TABLE

Unfortunately, most people don’t think hard enough and just use “LOCK atable” without realizing that the default lock mode is ACCESS EXCLUSIVE, which blocks all concurrent access to the table, even read access. This harms performance more than necessary.

But most of the time, tables are locked because developers don’t know that there are less restrictive ways to achieve what they want:

  • You don’t want concurrent transactions to modify a row between the time you read it and the time you update it? Use SELECT ... FOR UPDATE!
    If concurrent modifications are unlikely and you are not sure that you ar
[...]

Mark Wong: PDXPUG: March Meetup: Let’s talk databases in python!

planet postgresql - 2019-03-06(水) 23:09:45

Meeting: March 21, 2019, 6pm-8pm

Location:

PSU Business Accelerator
2828 SW Corbett Ave · Portland, OR
Parking is open after 5pm.

Speaker: Hannah Stepanek

Let’s talk databases in python! What’s an ORM? Is there a way to write database queries so that they are compatible with multiple types of databases? How do you make database changes (such as adding a new table or a new column) safely? What is a connection pool and why is it useful? What are some things that can go wrong when operating at scale? In this talk we’ll take a deep dive into how the python libraries sqlalchemy and alembic make managing production databases simple, efficient, and painless so you can get back to feature development.

Hannah has been working in industry for over 6 years as a python software engineer. She currently works at Hypothesis, a web application for annotating web pages and PDFs. In her spare time she enjoys riding her horse Sophie and playing board games.

Bruce Momjian: SQL Replay for Replication?

planet postgresql - 2019-03-06(水) 22:45:01

Postgres has had streaming (binary) replication for a long time, and logical (row shipping) replication since Postgres 10. Implementing these was a lot of work, and they work well.

However, the simplest way to do replication is often considered to be replaying SQL queries on standbys. The primary was modified by SQL queries, so isn't the simplest way to replicate replaying SQL? A novice would think so, and many database server developers initially try replication by replaying SQL. It seems simple because SQL queries are more concise than per-row changes. Imagine a DELETE that affects one million rows being shipped to a standby as a single SQL query. The conciseness and simplicity of SQL replication looks promising.

However, if you try implementing replication via SQL, you will realize that SQL runs in a multi-user environment. SQL commands do not contain enough information to replay queries exactly the same way on standbys as on the primary. Concurrent DML, volatile functions, sequence assignment, locks, and cursor contents can all cause inconsistencies. Developers have tried patching over these issues, but eventually the fundamental limitations of this approach become clear. I doubt Postgres will ever implement SQL-level replication for this reason.
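
A minimal illustration of the problem, using a hypothetical table and two standard volatile functions:

-- replayed as SQL on a standby, both statements would compute
-- different values than they did on the primary
UPDATE accounts SET last_seen = now() WHERE id = 1;
INSERT INTO samples (value) VALUES (random());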


Ibrar Ahmed: Settling the Myth of Transparent HugePages for Databases

planet postgresql - 2019-03-06(水) 22:07:25

The concept of Linux HugePages has existed for quite a while: more than 10 years, having been introduced to Debian in 2007 with kernel version 2.6.23. Whilst a smaller page size is useful for general use, some memory-intensive applications may gain performance by using bigger memory pages. By having bigger memory chunks available to them, they can reduce lookup time as well as improve the performance of read/write operations. To be able to make use of HugePages, applications need to carry a specific code directive, and changing applications across the board is not necessarily a simple task. So enter Transparent HugePages (THP).
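
As an aside, on most Linux systems you can check whether THP is enabled through sysfs (the output shown is just an example):

# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never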

By reputation, THPs are said to have a negative impact on performance. For this post, I set out to either prove or debunk the case for the use of THPs for database applications.

The Linux context

On Linux – and for that matter all operating systems that I know of – memory is divided into small chunks called pages. A typical memory page size is 4k. You can obtain the page size on Linux using getconf.

# getconf PAGE_SIZE
4096

Generally, the latest processors support multiple page sizes. However, Linux defaults to a minimal 4k page size. For a system with 64GB of physical memory, this memory will be divided into more than 16 million pages. Mapping between these pages and physical memory (which is called page table walking) is undertaken by the CPU’s memory management unit (MMU). To optimize page lookup, the CPU maintains a cache of recently used pages called the Translation Lookaside Buffer (TLB). The higher the number of pages, the lower the percentage of pages that can be kept in the TLB. This translates to a higher cache miss ratio. With every cache miss, a more expensive search must be done via page table walking. In effect, that leads to a degradation in performance.

So what if we could increase the page size? We could then reduce the number of pages accessed, and reduce the cost of page walking. Cache hit ratio might then improve because more relevant data now fits in one page rather than

[...]

Alexander Sosna: Antivirus and database systems (PostgreSQL in focus)

planet postgresql - 2019-03-06(水) 19:00:00
Many users are required to run virus scanners on all of their IT systems due to vendor requirements or working instructions. This general requirement can lead to problems for IT systems that are neither desktops nor simple file servers. Usual interpretations of baseline security requirements demand...

Bruce Momjian: Lock-In Is Not a Binary Decision

planet postgresql - 2019-03-06(水) 01:00:01

One of the major attractions of Postgres is the ability to stop using database software controlled by a single vendor. Single-vendor control means a single entity controls the software, tooling, training, and support. There are sometimes options from other vendors, but they are usually hampered because the source code is closed.

Postgres is a strong move away from that, but is it always a complete disconnection from lock-in? Well, it can be — you could:

  • Download the Postgres source code and compile it yourself
  • Evaluate, integrate, and test the tooling you need to use Postgres in your organization
  • Create a training program for your employees
  • Develop a Postgres internals team to support Postgres and your tooling

This is not possible for proprietary software, but because Postgres is open source, it is certainly possible.

Continue Reading »


Richard Yen: I Fought the WAL, and the WAL Won: Why hot_standby_feedback can be Misleading

planet postgresql - 2019-03-05(火) 09:15:00
Introduction

When I first got involved in managing a Postgres database, I was quickly introduced to the need for replication. My first project was to get our databases up on Slony, which was the hot new replication technology, replacing our clunky DRBD setup and allowing near-realtime read-only copies of the database. Of course, with time and scale, Slony had a hard time keeping up with the write traffic, especially since it ultimately suffered from write amplification (each write ultimately becomes two or more writes to the database, because of all the under-the-hood work involved). When Postgres Streaming Replication came out in v9.0, everyone felt like they had struck gold. Streaming Replication was fast, and it took advantage of an already-existing feature in Postgres: the WAL stream.

Many years have passed since v9.0 (we’re coming up on v12 very soon). More features have been added, like Hot Standby and Logical Replication, and some two-way master-master replication extensions have been created. This has been quite a path of growth, especially since I remember someone saying at a BOF at PGCon, circa 2010, that Postgres’ roadmap would not include replication.

With all the improvements to Streaming Replication over the years, I think one of the most misunderstood features is hot_standby_feedback, and I hope to clarify that here.
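
For reference, the feature itself is a single standby-side setting, off by default:

# postgresql.conf on the standby
hot_standby_feedback = on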

With Streaming Replication, users are able to stand up any number of standby servers with clones of the primary, and they are free to throw all sorts of load at them. Some will send read-only traffic for their OLTP apps, huge cronjobs, and long-running reporting queries, all without affecting write traffic on the primary. However, some will occasionally see that their queries get aborted for some reason, and in their logs they might see something like:

ERROR: canceling statement due to conflict with recovery

That’s an unfortunate reality that nobody likes. Nobody wants their queries canceled on them, just like nobody likes to order a pastrami sandwich, only to be told 10 minutes later

[...]

Stephen Frost: How to setup Windows Active Directory with PostgreSQL GSSAPI Kerberos Authentication

planet postgresql - 2019-03-05(火) 01:38:24

PostgreSQL provides a bevy of authentication methods to allow you to pick the one that makes the most sense for your environment. One implementation that I have found customers wanting is to use Windows Active Directory with PostgreSQL's GSSAPI authentication interface via Kerberos. I've put together this guide to help you take advantage of this setup in your own environment.
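
The end result is typically a pg_hba.conf entry along these lines (a sketch only; the realm is hypothetical, while gss, include_realm, and krb_realm are documented pg_hba.conf options):

host    all    all    0.0.0.0/0    gss include_realm=0 krb_realm=EXAMPLE.COM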


Bruce Momjian: Corporate Backing

planet postgresql - 2019-03-05(火) 01:15:01

Postgres has long lived in the shadow of proprietary and other open source databases. We kind of got used to that, though we had early support from Fujitsu and NTT. In recent years, Postgres has become more noticed, and the big companies promoting Postgres have become somewhat of a flood.

Even with IBM having DB2 and Microsoft having SQL Server, they still support Postgres.

It is odd having multi-billion-dollar companies asking how they can help the Postgres community, but I guess we will have to get used to it. These companies support the community to varying degrees, but we certainly appreciate all the help we receive. Just having these companies list us as supported is helpful.


Alexey Lesovsky: pgCenter’s wait event profiler

planet postgresql - 2019-03-04(月) 21:17:00
As you might know, in the last pgCenter release a new tool was added: a wait event profiler. In this post, I'd like to explore this tool and propose some use cases for it.

First of all, what are “wait events”? The official PostgreSQL documentation doesn't really explain wait events (it just provides a list of all of them). In short, wait events are points in time where a backend has to wait until a particular event occurs: waiting to obtain a lock, for IO, for inter-process communication, for interaction with the client, or something else. Stats about wait events are provided by the pg_stat_activity view in the wait_event_type and wait_event columns.
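
For example, a quick look at what backends are currently waiting on (standard pg_stat_activity columns):

SELECT pid, wait_event_type, wait_event, state
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;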

Using EXPLAIN we can always understand what a query does. But EXPLAIN is aimed at working with the query planner and doesn't show the time a query spends stuck waiting. For that, you can use pgCenter's wait event profiler.

How does it work? First, you need to know the PID of the backend to be profiled. It can be found using pg_stat_activity or, if you're connected to Postgres directly, with pg_backend_pid(). Next, in a second terminal, run 'pgcenter profile' and pass the backend PID as an argument. That’s it. pgCenter connects to Postgres and, using the wait event stats from pg_stat_activity, starts collecting data. When the query finishes, pgCenter shows you the distribution of wait events, like this:

$ pgcenter profile -U postgres -P 19241
LOG: Profiling process 19241 with 10ms sampling
------ ------------ -----------------------------
% time      seconds wait_event         query: update t1 set a = a + 100;
------ ------------ -----------------------------
 72.15    30.205671 IO.DataFileRead
 20.10     8.415921 Running
  5.50     2.303926 LWLock.WALWriteLock
  1.28     0.535915 IO.DataFileWrite
  0.54     0.225117 IO.WALWrite
  0.36     0.152407 IO.WALInitSync
  0.03     0.011429 IO.WALInitWrite
  0.03     0.011355 LWLock.WALBufMappingLock
------ ------------ -----------------------------
 99.99    41.861741
In this example, a massive UPDATE is profiled. The query took around 40 second[...]

Dave Conlin: Index-only scans in Postgres

planet postgresql - 2019-03-04(月) 19:04:25

Index-only scans can be a really effective way to speed up table reads that hit an index. Of course, they’re not a silver bullet to all your performance problems, but they are a very welcome and useful part of the toolbox.

In order to understand index-only scans and why (and when) they’re valuable, let’s first recap how a “normal” index scan works.

Index scans

An index is just a collection of references to the rows in a table, stored in a data structure (usually a B-tree) based on their values in the indexed columns.

An index scan reads through the index and uses it to quickly look up the rows that match your filter (something like WHERE x > 10), and return them in the order they’re stored in the index.

Postgres then goes to look up the data in these rows from the table, in the heap, where it would have found them if it had done a sequential scan.

It checks that they are visible to the current transaction — for example they haven’t been deleted or replaced by a newer version — and passes them on to the next operation.
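
In plan output this shows up as an Index Scan node; a sketch with a hypothetical table and index:

EXPLAIN SELECT * FROM books WHERE topic = 'soil erosion';
-- Index Scan using books_topic_idx on books ...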


It’s a bit like using an index in a book. Instead of starting at page one and turning over the pages until you find the ones that deal with, say, soil erosion, you skip to “s” in the index, look up “soil erosion” and turn to the listed pages to read about where all the dirt is going.

Enter index-only scans

Index-only scans start off like index scans, but they get all their column information from the index, obviating the need to go back to the table to fetch the row data — the second step in the index scan process.

Returning to our book example, if we want to produce a list of topics in the book, ordered by the number of pages they appear on, then all that information is stored in the book’s index, so we can do so purely from reading the index without ever actually turning to the pages in question.
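
In SQL terms, an index-only scan becomes possible when the index covers every column the query needs; a sketch (hypothetical table; INCLUDE requires Postgres 11 or later):

CREATE INDEX idx_books_topic ON books (topic) INCLUDE (page_count);
EXPLAIN SELECT topic, page_count FROM books WHERE topic = 'soil erosion';
-- Index Only Scan using idx_books_topic on books ...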

As you can imagine, under the right circumstances this can be an incredibly fast way for Postgres to access the table data. In pgMustard, we suggest considering an index-only scan to improve slow index

[...]

LPI-Japan releases an updated version (Ver2.0.0) of its freely available "Open Source Database Standard Textbook - PostgreSQL" (オープンソースデータベース標準教科書 - PostgreSQL-)

www.postgresql.jp news - 2019-03-04(月) 16:08:17

Luca Ferrari: Running pgbackrest on FreeBSD

planet postgresql - 2019-03-04(月) 09:00:00

I tend to use FreeBSD as my PostgreSQL base machine, and it’s not always as simple as it sounds to get software running on it. In this post I share some advice on running pgbackrest on FreeBSD 12.

Running pgbackrest on FreeBSD

pgbackrest is an amazing tool for backup and recovery of a PostgreSQL database. However, and this is not a criticism at all, it has some Linux-isms that make it difficult to run on FreeBSD. I tried to install and run it on FreeBSD 12, stopping immediately at the compilation part. So I opened an issue to get some help, and then tried to experiment a little more to see if at least I could compile it.

The first attempt was to cross-compile: I created the executable (pgbackrest has a single executable) on a Linux machine, then moved it to the FreeBSD machine along with all the ldd libraries (placed into /compat/linux/lib64). But libpthread.so.0 prevented me from starting the command:

% ./pgbackrest
./pgbackrest: error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file or directory

So I switched back to native compilation and, as described in the issue, made a few changes to client.c and the Makefile. Once it compiled (using gmake, of course), I also made a few more changes to the Makefile to compile and install it the FreeBSD way (i.e., under /usr/local/bin). The full diff is the following (some changes are not shown in the issue):

% git diff
diff --git a/src/Makefile b/src/Makefile
index 73672bff..0472c7f1 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -8,7 +8,7 @@ CC=gcc
 # Compile using C99 and Posix 2001...

Pavel Stehule: compiled dll of plpgsql 1.6 for PostgreSQL 10, 11

planet postgresql - 2019-03-03(日) 01:39:00
Adam Bartoszewicz prepared the DLL. Please read his message.

Thank you, Adam
