postgresql

Andreas Scherbaum: How long will a 64 bit Transaction-ID last in PostgreSQL?

planet postgresql - 2019-02-03(Sun) 22:34:00

Andreas 'ads' Scherbaum

At FOSDEM someone asked how long 64 bit Transaction-IDs will last.

To refresh: PostgreSQL is currently using 32 bits for the TXID, and is good for around 4 billion transactions:

fosdem=# SELECT 2^32;
  ?column?
------------
 4294967296
(1 row)

That will not last very long if you have a busy database doing many writes over the day. MVCC keeps both the new and the old version of a row in the table, and the TXID increases with every write transaction. At some point the 4 billion limit is reached, the TXID overflows, and starts again at the beginning. Because of the way transaction visibility works in PostgreSQL, suddenly all the data in your database would become invisible. No one wants that!
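A rough, hedged back-of-the-envelope makes the difference between 32 and 64 bits concrete (the transaction rate below is made up purely for illustration):

-- Assuming a sustained 10,000 write transactions per second:
SELECT (2^32) / 10000 / 86400          AS days_until_32bit_wraparound,   -- roughly 5 days
       (2^64) / 10000 / 86400 / 365.25 AS years_until_64bit_wraparound;  -- roughly 58 million years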

To limit this problem, PostgreSQL has a number of mechanisms in place:

  • PostgreSQL splits the transaction ID space in half: 2 billion IDs in the past are visible, 2 billion in the future are not visible - all visible rows must live in the 2 billion in the past, at all times.
  • Old, deleted row versions are eventually removed by VACUUM (or autovacuum), and their XIDs are no longer used.
  • Old row versions which are still live are marked as "frozen" in the table and assigned a special XID - the previously used XID is no longer needed. The problem here is that every single table in every database must be vacuumed before the 2 billion threshold is reached.
  • PostgreSQL uses lazy XIDs: a "real" transaction ID is only assigned if the transaction changes something on disk - a transaction that is read-only and changes nothing consumes no transaction ID (see the sketch after this list).
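A small sketch of that last point, using standard PostgreSQL functions (the session below is only an illustration):

BEGIN;
SELECT txid_current_if_assigned();  -- NULL: no XID has been consumed yet
SELECT count(*) FROM pg_class;      -- read-only work still consumes no XID
SELECT txid_current();              -- forces a "real" XID to be assigned
COMMIT;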

Continue reading "How long will a 64 bit Transaction-ID last in PostgreSQL?"
Category: postgresql

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Allow COPY FROM to filter data using WHERE conditions

planet postgresql - 2019-02-03(Sun) 17:47:51
On 19th of January 2019, Tomas Vondra committed patch:

Allow COPY FROM to filter data using WHERE conditions

Extends the COPY FROM command with a WHERE condition, which allows doing various types of filtering while importing the data (random sampling, condition on a data column, etc.). Until now such filtering required either preprocessing of …

Continue reading "Waiting for PostgreSQL 12 – Allow COPY FROM to filter data using WHERE conditions"
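As a quick illustration of the new syntax (table name, file path and condition are made up):

COPY measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv) WHERE temperature > 0;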
Category: postgresql

James Coleman: PostgreSQL at Scale: Database Schema Changes Without Downtime

planet postgresql - 2019-02-02(Sat) 08:01:28

Braintree Payments uses PostgreSQL as its primary datastore. We rely heavily on the data safety and consistency guarantees a traditional relational database offers us, but these guarantees come with certain operational difficulties. To make things even more interesting, we allow zero scheduled functional downtime for our main payments processing services.

Several years ago we published a blog post detailing some of the things we had learned about how to safely run DDL (data definition language) operations without interrupting our production API traffic.

Since that time PostgreSQL has gone through quite a few major upgrade cycles — several of which have added improved support for concurrent DDL. We’ve also further refined our processes. Given how much has changed, we figured it was time for a blog post redux.

In this post we’ll address the following topics:

First, some basics

For all code and database changes, we require that:

  • Live code and schemas be forward-compatible with updated code and schemas: this allows us to roll out deploys gradually across a fleet of application servers and database clusters.
  • New code and schemas be backward-compatible with live code and schemas: this allows us to roll back any change to the previous version in the event of unexpected errors.

For all DDL operations we require that:

  • Any exclusive locks acquired on tables or indexes be held for at most ~2 seconds (see the lock_timeout sketch after this list).
  • Rollback strategies do not involve reverting the database schema to its previous version.
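One hedged way to enforce the lock requirement above (not necessarily the exact tooling used at Braintree) is to set lock_timeout before running DDL, so a migration fails fast instead of queueing behind production traffic:

SET lock_timeout = '2s';
ALTER TABLE customers ADD COLUMN notes text;  -- hypothetical change; aborts if the lock is not acquired within 2 seconds
RESET lock_timeout;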

Transactionality

PostgreSQL supports transactional DDL. In most cases, you can execute multiple DDL statements inside an explicit database transaction and take an “all or nothing” approach to a set of changes. However, running multiple DDL statements inside a transaction has one serious downside: if you alter multiple objects, you’ll need to acquire exclusive locks on all of those objects in a single

[...]
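A minimal sketch of that trade-off (table names are hypothetical): both changes commit atomically, but the ACCESS EXCLUSIVE locks on both tables are held until the COMMIT.

BEGIN;
ALTER TABLE payments ADD COLUMN refund_reason text;
ALTER TABLE disputes ADD COLUMN resolved_at timestamptz;
COMMIT;  -- all or nothing, but both locks are held until here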
Category: postgresql

Michael Paquier: Two-phase commit and temporary objects

planet postgresql - 2019-02-01(Fri) 17:01:39

A couple of weeks ago a bug popped up on the community mailing lists about the use of temporary objects in two-phase commit. After discussions, the result is the following commit:

commit: c5660e0aa52d5df27accd8e5e97295cf0e64f7d4
author: Michael Paquier <michael@paquier.xyz>
date: Fri, 18 Jan 2019 09:21:44 +0900

Restrict the use of temporary namespace in two-phase transactions

Attempting to use a temporary table within a two-phase transaction is forbidden for ages. However, there have been uncovered grounds for a couple of other object types and commands which work on temporary objects with two-phase commit. In short, trying to create, lock or drop an object on a temporary schema should not be authorized within a two-phase transaction, as it would cause its state to create dependencies with other sessions, causing all sorts of side effects with the existing session or other sessions spawned later on trying to use the same temporary schema name.

Regression tests are added to cover all the grounds found, the original report mentioned function creation, but monitoring closer there are many other patterns with LOCK, DROP or CREATE EXTENSION which are involved.

One of the symptoms resulting in combining both is that the session which used the temporary schema is not able to shut down completely, waiting for being able to drop the temporary schema, something that it cannot complete because of the two-phase transaction involved with temporary objects. In this case the client is able to disconnect but the session remains alive on the backend-side, potentially blocking connection backend slots from being used. Other problems reported could also involve server crashes.

This is back-patched down to v10, which is where 9b013dc has introduced MyXactFlags, something that this patch relies on.

Reported-by: Alexey Bashtanov
Author: Michael Paquier
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/5d910e2e-0db8-ec06-dd5f-baec420513c3@imap.cc
Backpatch-through: 10
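To see the kind of pattern being restricted, here is a hedged sketch (it assumes max_prepared_transactions is set above zero; the exact error text varies across versions):

BEGIN;
CREATE TEMP TABLE scratch (id int);
PREPARE TRANSACTION 'tx_with_temp';  -- rejected: a two-phase transaction must not depend on temporary objects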

In PostgreSQL, temporary objects are as

[...]
Category: postgresql

Keith Fiske: Managing Transaction ID Exhaustion (Wraparound) in PostgreSQL

planet postgresql - 2019-01-31(Thu) 21:14:00

One of the most critical topics to understand when administering a PostgreSQL database is the concept of transaction IDs (TXID) and that they can be exhausted if not monitored properly. However, this blog post isn't going to go into the details of what TXID exhaustion actually is. The Routine Vacuuming section of the documentation is probably one of the most important to read and understand, so I will refer you there. What this blog post is going to cover is an easy way to monitor for it and what can be done to prevent it ever becoming a problem.
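As one common starting point for such monitoring (a hedged example, not necessarily the exact query from the post), the age of each database's datfrozenxid shows how far it is along the roughly 2 billion XID budget:

SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(age(datfrozenxid) * 100.0 / 2000000000, 2) AS pct_towards_wraparound
  FROM pg_database
 ORDER BY age(datfrozenxid) DESC;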

Category: postgresql

Bruce Momjian: Limiting Superuser Activity

planet postgresql - 2019-01-31(Thu) 02:15:01

This interesting email thread explores the question of how much you can prevent or detect unauthorized database superuser activity. The main conclusions from the thread are:

  • It is impossible to restrict database administrator access without hindering their ability to perform their jobs
  • Monitoring superuser activity is the most reasonable way to detect and hopefully discourage unauthorized activity
  • Monitoring includes:
    • Assign a separate account to each administrator for auditing purposes; do not use generic/shared accounts
    • Use an auditing tool to record database activity, e.g., pgAudit (see the configuration sketch after this list)
    • Use syslog to send database logs to a computer not under database administrators' control
    • Record all shell command activity in a similar way
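A hedged sketch of the logging side, using only core PostgreSQL parameters (pgAudit adds its own pgaudit.* settings once its library is loaded; the values below are illustrative):

ALTER SYSTEM SET log_destination = 'syslog';  -- ship server logs to a host the DBAs do not control
ALTER SYSTEM SET syslog_facility = 'local0';
ALTER SYSTEM SET log_connections = on;        -- record who connected, and as which role
ALTER SYSTEM SET log_statement = 'ddl';       -- minimal statement-level auditing without pgAudit
SELECT pg_reload_conf();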

There is also a helpful summary email.

Category: postgresql

Liaqat Andrabi: Webinar: Banking on Postgres – Financial Application Considerations [Follow up]

planet postgresql - 2019-01-31(Thu) 00:34:37

The demand for PostgreSQL within the financial industry has been rapidly increasing in recent years, mainly due to reduced licensing costs, a better choice of open source tools, and the robust enterprise features that PostgreSQL provides.

2ndQuadrant hosted the “Banking on Postgres” webinar to discuss attributes of financial databases based on Postgres, configuration processes, hardware needs, availability, backups, and more.

The webinar was presented by Shaun Thomas, Principal Consultant at 2ndQuadrant. Those who weren’t able to attend the live event can now view the recording here.

For any questions or comments regarding Postgres-BDR, please send an email to info@2ndQuadrant.com.

Category: postgresql

Pavel Stehule: plpgsql_check - new report for code coverage ratio calculation

planet postgresql - 2019-01-30(Wed) 18:11:00
A few months ago I integrated a profiler into plpgsql_check.

The result of profiling is line-oriented:

postgres=# select lineno, avg_time, source from plpgsql_profiler_function_tb('fx(int)');
┌────────┬──────────┬─────────────────────────────────────────────────────────────────────┐
│ lineno │ avg_time │ source                                                              │
╞════════╪══════════╪═════════════════════════════════════════════════════════════════════╡
│      1 │          │                                                                     │
│      2 │          │ declare result int = 0;                                             │
│      3 │    0.075 │ begin                                                               │
│      4 │    0.202 │   for i in 1..$1 loop                                               │
│      5 │    0.005 │     select result + i into result; select result + i into result;  │
│      6 │          │   end loop;                                                         │
│      7 │        0 │   return result;                                                    │
│      8 │          │ end;                                                                │
└────────┴──────────┴─────────────────────────────────────────────────────────────────────┘
(9 rows)

This format is well readable, but it is not practical for calculating code coverage metrics. So this week I wrote a new function that produces a report based on commands:

CREATE OR REPLACE FUNCTION public.fx1(a integer)
 RETURNS integer
 LANGUAGE plpgsql
1       AS $function$
2       begin
3         if a > 10 then
4           raise notice 'ahoj';
5           return -1;
6         else
7           raise notice 'nazdar';
8           return 1;
9         end if;
10      end;
11      $function$

postgres=# select stmtid, parent_stmtid, parent_note, lineno, exec_stmts, stmtname
from plpgsql_profiler_function_statements_tb('fx1');
┌────────┬───────────────┬─────────────┬────────┬────────────┬─────────────────┐
│ stmtid │ parent_stmtid │ parent_note │ lineno │ exec[...]
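As one hedged way to turn those per-statement counters into a coverage ratio (using only the function and the exec_stmts column shown above, not the new report function itself):

SELECT count(*) FILTER (WHERE exec_stmts > 0) AS executed_statements,
       count(*)                               AS total_statements,
       round(count(*) FILTER (WHERE exec_stmts > 0) * 100.0 / count(*), 1) AS coverage_pct
  FROM plpgsql_profiler_function_statements_tb('fx1');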
Category: postgresql

Dave Cramer: A Guide to Building an Active-Active PostgreSQL Cluster

planet postgresql - 2019-01-29(Tue) 23:39:33

One of the toughest challenges facing database engineers today is ensuring their data is always accessible so they can meet the high-availability requirements for their applications.

While this problem may seem trivial in the world of applications where one can have many instances behind geographically distributed load balancers, in the database world where there is only one version of the data globally, the problem faces many obstacles.

PostgreSQL replication has advanced considerably in recent major releases, including continuous improvements to streaming replication and the addition of logical replication in PostgreSQL 10. While these capabilities represent important enhancements to PostgreSQL, enabling users to address a wide variety of clustered architectures, they do not (easily) address the use cases where the application requires access to an updatable database in two or more geographic regions - often referred to as an "active-active" cluster.
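For context, the logical replication building block mentioned above looks roughly like this (names and connection string are made up); on its own it is one-directional, which is why it does not easily give you active-active:

-- on the publishing instance
CREATE PUBLICATION app_pub FOR TABLE accounts, payments;

-- on the subscribing instance
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=pg-primary dbname=app user=replicator'
    PUBLICATION app_pub;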

More specifically, an active-active cluster is one where the application can write to any instance in the cluster and the data will be written to all of the instances in the cluster, enabling each instance in the cluster to be used to:

  • Provide near zero downtime as the new instance is already in a read/write state; there is no need to reconfigure it.
  • Provide near zero downtime upgrades from one version to another.
  • Improve latency for users in geographically distributed clusters. By providing an instance physically closer to the user, latency is reduced.

While there are a number of proprietary solutions that attempt to address active-active PostgreSQL requirements, this post and a series to follow provides users with potential reference architectures and configurations that enable active-active PostgreSQL configurations using entirely open source software.

This post is of course only one approach to deploying an active-active PostgreSQL cluster. There are other ways to deploy an active-active setup with PostgreSQL. I will cover some ways to do this in the f

[...]
Category: postgresql

Kaarel Moppel: Major feature update for the pgwatch2 Postgres monitoring tool

planet postgresql - 2019-01-29(Tue) 18:13:23

Again I could make good (hopefully) use of the “slow time” around the turn of the year…and managed to push out another set of features for our Open Source PostgreSQL monitoring tool called pgwatch2 – so here is a quick overview of the changes in this post. Continuing the tradition I’m calling it “Feature Pack 4” as it’s mostly about new features. Git and Docker images carry version number 1.5.0. As our last pgwatch2-related blogpost covered only 1.4.0, I’ll also include here the most important stuff from the 1.4.5 minor feature release.

Highlight – Monitoring Postgres with Postgres

This is the biggest one this time – finally, and quite appropriately for “Postgres-minded” people, there’s now a chance to store all the gathered metrics in Postgres! This of course doesn’t necessarily mean that Postgres is best for storing time-series data, although it performs very nicely thanks to JSONB…but in general it’s a good compromise – more disk space (~3-4x) at comparable query times to InfluxDB…but with the full power of SQL! Meaning some saved time learning a new (and quite limited) query language. And after all, only a few people are running dozens and dozens of databases, so performance is mostly not an issue. And on the plus side we can now ask questions that were previously plainly not possible (no joins, remember) or were only possible by storing some extra columns of data (de-normalizing).

The new functionality is designed for the latest Postgres version, 11, but as people run all kinds of different versions and might not want to set up a new cluster, there is also a legacy mode, though it will cost more IO. In total there are 4 different “schema modes” so that people can optimize their IO based on their needs:

  • a separate normal table for each distinct metric (legacy mode)
  • a separate partitioned table for each distinct metric + weekly partitions
  • a separate partitioned table for each distinct metric + separate sub-table for each distinct monitored host + monthly partitions. Best for monitoring 50+ DB-s
  • custom mode – all data inserted into a si
[...]
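As a rough illustration only (not pgwatch2's actual DDL), a per-metric table in the “weekly partitions” mode could look something like this on Postgres 11:

CREATE TABLE metric_db_stats (
    time   timestamptz NOT NULL,
    dbname text        NOT NULL,
    data   jsonb       NOT NULL
) PARTITION BY RANGE (time);

CREATE TABLE metric_db_stats_2019w05
    PARTITION OF metric_db_stats
    FOR VALUES FROM ('2019-01-28') TO ('2019-02-04');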
Category: postgresql
