planet postgresql

Subscribe to planet postgresql のフィード
Planet PostgreSQL
更新: 2時間 17分 前

KUNTAL GHOSH: Data alignment in PostgreSQL

2019-02-13(水) 17:23:00
When data are naturally aligned, CPU can perform read and write to memory efficiently. Hence, each data type in PostgreSQL has a specific alignment requirement. When multiple attributes are stored consecutively in a tuple, padding is inserted before an attribute so that it begins from the required aligned boundary. A better understanding of these alignment requirements may help minimizing the amount of padding required while storing a tuple on disk, thus saving disk space.
Data types in Postgres are divided into following categories:
  • Pass-by-value, fixed length: Data types that are passed by values to Postgres internal routines and have fixed lengths fall into this category.. The length can be 1, 2,  or 4 (or 8 on 64-bit systems) bytes.
  • Pass-by-reference, fixed length: For these data types, an address reference from the in-memory heap page is sent to internal Postgres routines. They also have fixed lengths.
  • Pass-by_reference, variable length: For variable length data types, Postgres prepends a varlena header before the actual data. It stores some information about how the data is actually stored on-disk (uncompressed, compressed or TOASTed) and the actual length of the data. For TOASTed attributes, the actual data is stored in a separate relation. In these cases, the varlena headers follow some information about the actual location of the data in their corresponding TOAST relation. Typically, on-disk size of a varlena header is 1-byte. But, if the data cannot be toasted and size of the uncompressed data crosses 126 bytes, it uses a 4-bytes header. For example, CREATE TABLE t1 (
    , a varchar
    );
    insert into t1 values(repeat('a',126));
    insert into t1 values(repeat('a',127));
    select pg_column_size(a) from t1;
    pg_column_size
    ---------------------
    127
    131 Besides, attributes having 4-bytes varlena header need to be aligned to a 4-bytes aligned memory location. It may waste upto 3-bytes of additional padding space. So, some careful length restrictions on such columns may save space.
  • Pass-by_referen
[...]
カテゴリー: postgresql

Craig Kerstiens: SQL: One of the most valuable skills

2019-02-13(水) 01:52:00

I’ve learned a lot of skills over the course of my career, but no technical skill more useful than SQL. SQL stands out to me as the most valuable skill for a few reasons:

  1. It is valuable across different roles and disciplines
  2. Learning it once doesn’t really require re-learning
  3. You seem like a superhero. You seem extra powerful when you know it because of the amount of people that aren’t fluent

Let me drill into each of these a bit further.

SQL a tool you can use everywhere

Regardless of what role you are in SQL will find a way to make your life easier. Today as a product manager it’s key for me to look at data, analyze how effective we’re being on the product front, and shape the product roadmap. If we just shipped a new feature, the data on whether someone has viewed that feature is likely somewhere sitting in a relational database. If I’m working on tracking key business metrics such as month over month growth, that is likely somewhere sitting in a relational database. At the other end of almost anything we do there is likely a system of record that speaks SQL. Knowing how to access it most natively saves me a significant amount of effort without having to go ask someone else the numbers.

But even before becoming a product manager I would use SQL to inform me about what was happening within systems. As an engineer it could often allow me to pull information I wanted faster than if I were to script it in say Ruby or Python. When things got slow in my webapp having an understanding of the SQL that was executed and ways to optimize it was indespensible. Yes, this was going a little beyond just a basic understanding of SQL… but adding an index to a query instead of rolling my own homegrown caching well that was well worth the extra time learning.

SQL is permanent

I recall roughly 20 years ago creating my first webpage. It was magical, and then I introduced some Javascript to make it even more impressive prompting users to click Yes/No or give me some input. Then about 10 years later jQuery came along and while

[...]
カテゴリー: postgresql

Christophe Pettus: What’s up with SET TRANSACTION SNAPSHOT?

2019-02-12(火) 07:44:33

A feature of PostgreSQL that most people don’t even know exists is the ability to export and import transaction snapshots.

The documentation is accurate, but it doesn’t really describe why one might want to do such a thing.

First, what is a “snapshot”? You can think of a snapshot as the current set of committed tuples in the database, a consistent view of the database. When you start a transaction and set it to REPEATABLE READ mode, the snapshot remains consistent throughout the transaction, even if other sessions commit transactions. (In the default transaction mode, READ COMMITTED, each statement starts a new snapshot, so newly committed work could appear between statements within the transaction.)

However, each snapshot is local to a single transaction. But suppose you wanted to write a tool that connected to the database in multiple sessions, and did analysis or extraction? Since each session has its own transaction, and the transactions start asynchronously from each other, they could have different views of the database depending on what other transactions got committed. This might generate inconsistent or invalid results.

This isn’t theoretical: Suppose you are writing a tool like pg_dump, with a parallel dump facility. If different sessions got different views of the database, the resulting dump would be inconsistent, which would make it useless as a backup tool!

The good news is that we have the ability to “synchronize” various sessions so that they all use the same base snapshot.

First, a transaction opens and sets itself to REPEATABLE READ or SERIALIZABLE mode (there’s no point in doing exported snapshots in READ COMMITTED mode, since the snapshot will get replaced at the very next transaction). Then, that session calls pg_export_snapshot. This creates an identifier for the current transaction snapshot.

Then, the client running the first session passes that identifier to the clients that will be using it. You’ll need to do this via some non-database channel. For example, you can’t use LISTEN / NOTIFY,

[...]
カテゴリー: postgresql

Bruce Momjian: AT TIME ZONE Confusion

2019-02-12(火) 05:00:01

I saw AT TIME ZONE used in a query, and found it confusing. I read the Postgres documentation and was still confused, so I played with some queries and finally figured it out. I then updated the Postgres documentation to explain it better, and here is what I found.

First, AT TIME ZONE has two capabilities. It allows time zones to be added to date/time values that lack them (timestamp without time zone, ::timestamp), and allows timestamp with time zone values (::timestamptz) to be shifted to non-local time zones and the time zone designation removed. In summary, it allows:

  1. timestamp without time zone &roarr timestamp with time zone (add time zone)
  2. timestamp with time zone &roarr timestamp without time zone (shift time zone)

It is kind of odd for AT TIME ZONE to be used for both purposes, but the SQL standard requires this.

Continue Reading »

カテゴリー: postgresql

Regina Obe: Compiling http extension on ubuntu 18.04

2019-02-11(月) 17:31:00

We recently installed PostgreSQL 11 on an Ubuntu 18.04 using apt.postgresql.org. Many of our favorite extensions were already available via apt (postgis, ogr_fdw to name a few), but it didn't have the http extension we use a lot. The http extension is pretty handy for querying things like Salesforce and other web api based systems. We'll outline the basic compile and install steps. While it's specific to the http extension, the process is similar for any other extension you may need to compile.

Continue reading "Compiling http extension on ubuntu 18.04"
カテゴリー: postgresql

Alexey Lesovsky: pgCenter 0.6.0 released.

2019-02-10(日) 04:34:00
Great news for all pgCenter users - a new version 0.6.0 has been released with new features and few minor improvements.

Here are some major changes:
  • new wait events profiler - a new sub-command which allows to inspect long-running queries and understand what query spends its time on.
  • goreleaser support - goreleaser helps to build binary packages for you, so you can find .rpm and .deb packages on the releases page.
  • Goreport card A+ status - A+ status is the little step to make code better and align it to Golang code style
This release also includes following minor improvements and fixes:
  • report tool now has full help list of supported stats, you can, at any time, get a descriptive explanation of stats provided by pgCenter. Check out the “--describe” flag of “pgcenter report”;
  • “pgcenter top” now has been fixed and includes configurable aligning of columns, which make stats viewing more enjoyable (check out builtin help for new hotkeys);
  • wrong handling of group mask has been fixed. It is used for canceling group of queries, or for termination of backends’ groups;
  • also fixed the issue when pgCenter is failed to connect to Postgres with disabled SSL;
  • and done some other minor internal refactoring.
New release is available here. Check it out and have a nice day.
カテゴリー: postgresql

Christophe Pettus: Do not change autovacuum age settings

2019-02-09(土) 10:52:05

PostgreSQL has two autovacuum-age related settings, autovacuum_freeze_max_age, and vacuum_freeze_table_age.

Both of them are in terms of the transaction “age” of a table: That is, how long it has been since the table has been scanned completely for “old” tuples that can be marked as “frozen” (a “frozen” tuple is one that no open transaction can cause to disappear by a rollback). In short, the “oldest” a table can become in PostgreSQL is 2^31-1 transactions; if a table were ever to reach that, data loss would occur. PostgreSQL takes great pains to prevent you from eaching that point.

The “vacuum freeze” process is the process that scans the table and marks these tuples as frozen.

vacuum_freeze_table_age causes a regular autovacuum run to be an “autovacuum (to prevent xid wraparound)” run, that is, an (auto)vacuum freeze, if the age of the table is higher than vacuum_freeze_table_age.

autovacuum_freeze_max_age will cause PostgreSQL to start an “autovacuum (to prevent xid wraparound)” run even if it has no other reason to vacuum the table, should a table age exceed that setting.

By default, vacuum_freeze_table_age = 100000000 (one hundred million), and autovacuum_freeze_max_age = 200000000 (two hundred million).

Do not change them.

In the past, I made a recommendation I now deeply regret. Because, before 9.6, each autovacuum freeze run scanned the entire table, and (on its first pass) potentially rewrote the entire table, it could be very high I/O, and when it woke up suddenly, it could cause performance issues. I thus recommended two things:

  1. Increase autovacuum_freeze_max_age and vacuum_freeze_table_age, and,
  2. Do manual VACUUM FREEZE operations on the “oldest” tables during low-traffic periods.

Unfortunately, far too many installations adopted recommendation #1, but didn’t do #2. The result was that they cranked up autovacuum_freeze_max_age so high that by the time the mandatory autovacuum freeze operation began, they were so close to transaction XID wraparound point, they had no choice but to take the system offl

[...]
カテゴリー: postgresql

Craig Kerstiens: The most useful Postgres extension: pg_stat_statements

2019-02-09(土) 03:59:00

Extensions are capable of extending, changing, and advancing the behavior of Postgres. How? By hooking into low level Postgres API hooks. The open source Citus database that scales out Postgres horizontally is itself implemented as a PostgreSQL extension, which allows Citus to stay current with Postgres releases without lagging behind like other Postgres forks. I’ve previously written about the various types of extensions, today though I want to take a deeper look at the most useful Postgres extension: pg_stat_statements.

You see, I just got back from FOSDEM. FOSDEM is the annual free and open source software conference in Brussels, and at the event I gave a talk in the PostgreSQL devroom about Postgres extensions. By the end of the day, over half the talks that had been given in the Postgres devroom mentioned pg_stat_statements:

Most frequently dispensed #PostgreSQL tip-of-the-day here in the Postgres devroom at #FOSDEM? Use pg_stat_statements! @Xof’s talk on Breaking PostgreSQL at Scale is the 4th talk today to drive this point home HT @craig @net_snow @magnushagander pic.twitter.com/Tcwkhy8W8h

— Claire Giordano (@clairegiordano) February 3, 2019

If you use Postgres and you haven’t yet used pg_stat_statements, it is a must to add it to your toolbox. And even if you are familiar, it may be worth a revisit.

Getting started with pg_stat_statements

Pg_stat_statements is what is known as a contrib extension, found in the contrib directory of a PostgreSQL distribution. This means it already ships with Postgres and you don’t have to go and build it from source or install packages. You may have to enable it for your database if it is not already enabled. This is as simple as:

CREATE EXTENSION pg_stat_statements;

If you run on a major cloud provider there is a strong likelihood they have already installed and enabled it for you.

Once pg_stat_statements is installed, it begins silently going to work under the covers. Pg_stat_statements records queries that are run against your database, strips out a number of va

[...]
カテゴリー: postgresql

Bruce Momjian: PgLife For Familiarization

2019-02-09(土) 00:00:01

I worked with two companies this week to help them build open-source Postgres teams. Hopefully we will start seeing their activity in the community soon.

One tool I used to familiarize them with the Postgres community was PgLife. Written by me in 2013, PgLife presents a live dashboard of all current Postgres activity, including user, developer, and external topics. Not only a dashboard, you can drill down into details too. All the titles on the left are click-able, as are the detail items. The plus sign after each Postgres version shows the source code changes since its release. Twitter and Slack references have recently been added.

I last mentioned PgLife here six years ago, so I thought I would mention it again. FYI, this is my 542nd blog entry. If you missed any of them, see my category index at the top of this page.

カテゴリー: postgresql

Daniel Vérité: Postgres instances open to connections from the Internet

2019-02-08(金) 21:40:00

A PostgreSQL server may be accessible from the Internet, in the sense that it may listen on a public IP address and a TCP port accepting connections from any origin.

With the rising popularity of the DBaaS (“Database As A Service”) model, database servers can be legitimately accessible from the Internet, but it can also be the result of an unintentional misconfiguration.

As a data point, shodan.io, a scanner service that monitors such things, finds currently more than 650,000 listening Postgres instances on the Internet, without prejudging how they’re protected by host-based access rules, strong passwords, and database-level grants.

Such an open configuration at the network level is opposed to the more traditional, secure one where database servers are at least protected by a firewall, or don’t even have a network interface connected to the Internet, or don’t listen on it if they have one.

One consequence of having an instance listening to connections from the Internet is that intrusion attempts on the default port 5432 may happen anytime, just like it happens for other services such as ssh, the mail system or popular web applications like Drupal, Wordpress or phpMyAdmin.

If you have a server on the Internet, you may put its IP address in the search field of shodan.io to see what it knows about it.

The purpose of this post is to put together a few thoughts on this topic, for people who already manage PostgreSQL instances accepting public connections, or plan to do that in the future, or on the contrary, want to make sure that their instances don’t do that.

Do not mistakenly open your instance to the Internet!

When asking “how to enable remote access to PostgreSQL?”, the typical answer is almost invariably to add some rules in pg_hba.conf and set in postgresql.conf:

listen_addresses = *

(replacing the default listen_addresses = localhost).

It does work indeed, by making all the network interfaces to listen, but not necessarily only those where these connections are expected. In the case that they should come onl

[...]
カテゴリー: postgresql

Michael Paquier: Postgres 12 highlight - Functions for partitions

2019-02-08(金) 12:50:47

Partitions in Postgres are a recent concept, being introduced as of version 10 and improved a lot over the last years. It is complicated, and doable, to gather information about them with specific queries working on the system catalogs, still these may not be straight-forward. For example, getting a full partition tree leads to the use of WITH RECURSIVE when working on partitions with multiple layers.

Postgres 12 is coming with improvements in this regard with two commits. The first one introduces a new system function to get easily information about a full partition tree:

commit: d5eec4eefde70414c9929b32c411cb4f0900a2a9 author: Michael Paquier <michael@paquier.xyz> date: Tue, 30 Oct 2018 10:25:06 +0900 Add pg_partition_tree to display information about partitions This new function is useful to display a full tree of partitions with a partitioned table given in output, and avoids the need of any complex WITH RECURSIVE query when looking at partition trees which are deep multiple levels. It returns a set of records, one for each partition, containing the partition's name, its immediate parent's name, a boolean value telling if the relation is a leaf in the tree and an integer telling its level in the partition tree with given table considered as root, beginning at zero for the root, and incrementing by one each time the scan goes one level down. Author: Amit Langote Reviewed-by: Jesper Pedersen, Michael Paquier, Robert Haas Discussion: https://postgr.es/m/8d00e51a-9a51-ad02-d53e-ba6bf50b2e52@lab.ntt.co.jp

The second function is able to find the top-most parent of a partition tree:

commit: 3677a0b26bb2f3f72d16dc7fa6f34c305badacce author: Michael Paquier <michael@paquier.xyz> date: Fri, 8 Feb 2019 08:56:14 +0900 Add pg_partition_root to display top-most parent of a partition tree This is useful when looking at partition trees with multiple layers, and combined with pg_partition_tree, it provides the possibility to show up an entire tree by just knowing one member at any level. Author: Michael Paquier Re[...]
カテゴリー: postgresql

Vasilis Ventirozos: Managing xid wraparound without looking like a (mail) chimp

2019-02-07(木) 05:07:00
My colleague Payal came across an outage that happened to mailchimp's mandrill app yesterday, link can be found HERE.
Since this was PostgreSQL related i wanted to post about the technical aspect of it.
According to the company :

“Mandrill uses a sharded Postgres setup as one of our main datastores,”
the email explains.
“On Sunday, February 3, at 10:30pm EST, 1 of our 5 physical Postgres instances saw a significant spike in writes.” 
The email continues:
The spike in writes triggered a Transaction ID Wraparound issue. When this occurs, database activity is completely halted. The database sets itself in read-only mode until offline maintenance (known as vacuuming) can occur.”

So, lets see what that "transaction id wraparound issue" is and how someone could prevent similar outages from ever happening.

PostgreSQL uses MVCC to control transaction visibility, basically by comparing transaction IDs (XIDs). A row with an insert XID greater than the current transaction  XID shouldn't be visible to the current transaction. But since transaction IDs are not unlimited a cluster will eventually run out after
(2^32 transactions 4+ billion) causing transaction ID wraparound: transaction counter wraps around to zero, and all past transaction would appear to be in the future

This is being taken care of by vacuum that will mark rows as frozen, indicating that they were inserted by a transaction that committed far in the past that can be visible to all current and future transactions. To control this behavior, postgres has a configurable called autovacuum_freeze_max_age, which defaults at 200.000.000 transactions, a very conservative default that must be tuned in larger production systems.

It sounds complicated but its relatively easy not to get to that point,for most people just having autovacuum on will prevent this situation from ever happening. You can simply schedule manual vacuums by getting a list of the tables "closer" to autovacuum_freeze_max_age with a simple query like this:


SELECT 'vacuum analyze ' || c.oid::regcl[...]
カテゴリー: postgresql

Mark Wong: PDXPUG: February Meetup: Temporal Databases: Theory and Postgres

2019-02-07(木) 04:51:29

2019 February 21 Meeting (Note: Back to third Thursday this month!)

Location:

PSU Business Accelerator
2828 SW Corbett Ave · Portland, OR
Parking is open after 5pm.

Speaker: Paul Jungwirth

Temporal databases let you record history: either a history of the database (what the table used to say), a history of the thing itself (what it used to be), or both at once. The theory of temporal databases goes back to the 90s, but standardization has only just begun with some modest recommendations in SQL:2011, and database products (including Postgres) are still missing major functionality.

This talk will cover how temporal tables are structured, how they are queried and updated, what SQL:2011 offers (and doesn’t), what functionality Postgres has already, and what remains to be built.

Paul started programming on a Tandy 1000 at age 8 and hasn’t been able to stop since. He helped build one of the Mac’s first web servers in 1994 and has founded software companies in politics and technical hiring. He works as an independent consultant specializing in Rails, Postgres, and Chef.

カテゴリー: postgresql

Bruce Momjian: Expanding Permission Letters

2019-02-07(木) 03:30:01

Thanks to a comment on my previous blog post by Kaarel, the ability to simply display the Postgres permission letters is not quite as dire as I showed. There is a function, aclexplode(), which expands the access control list (ACL) syntax used by Postgres into a table with full text descriptions. This function exists in all supported versions of Postgres. However, it was only recently documented in this commit based on this email thread, and will appear in the Postgres 12 documentation.

Since aclexplode() exists (undocumented) in all supported versions of Postgres, it can be used to provide more verbose output of the pg_class.relacl permission letters. Here it is used with the test table created in the previous blog entry:

SELECT relacl FROM pg_class WHERE relname = 'test'; relacl -------------------------------------------------------- {postgres=arwdDxt/postgres,bob=r/postgres,=r/postgres} SELECT a.* FROM pg_class, aclexplode(relacl) AS a WHERE relname = 'test' ORDER BY 1, 2; grantor | grantee | privilege_type | is_grantable ---------+---------+----------------+-------------- 10 | 0 | SELECT | f 10 | 10 | SELECT | f 10 | 10 | UPDATE | f 10 | 10 | DELETE | f 10 | 10 | INSERT | f 10 | 10 | REFERENCES | f 10 | 10 | TRIGGER | f 10 | 10 | TRUNCATE | f 10 | 16388 | SELECT | f

Continue Reading »

カテゴリー: postgresql

Peter Eisentraut: PostgreSQL with passphrase-protected SSL keys under systemd

2019-02-06(水) 21:34:56

PostgreSQL supports SSL, and SSL private keys can be protected by a passphrase. Many people choose not to use passphrases with their SSL keys, and that’s perhaps fine. This blog post is about what happens when you do have a passphrase.

If you have SSL enabled and a key with a passphrase and you start the server, the server will stop to ask for the passphrase. This happens automatically from within the OpenSSL library. Stopping to ask for a passphrase obviously prevents automatic starts, restarts, and reboots, but we’re assuming here that you have made that tradeoff consciously.

When you run PostgreSQL under systemd, which is very common nowadays, there is an additional problem. Under systemd, the server process does not have terminal access, and so it cannot ask for any passphrases. By default, the startup will fail in such setups.

As of PostgreSQL 11, it is possible to configure an external program to obtain the SSL passphrase, using the configuration setting ssl_passphrase_command. As a simple example, you can set

ssl_passphrase_command = 'echo "secret"'

and it will apply the passphrase that the program prints out. You can use this to fetch the passphrase from a file or other secret store, for example.

But what if you still want the security of having to manually enter the password? Systemd has a facility that lets services prompt for passwords. You can use that like this::

ssl_passphrase_command = '/bin/systemd-ask-password "%p"'

Except that that doesn’t actually work, because non-root processes are not permitted to use the systemd password system; see this bug. But there are workarounds.

One workaround is to use sudo. So you use

ssl_passphrase_command = 'sudo /bin/systemd-ask-password "%p"'

and then put something like this into /etc/sudoers:

postgres ALL=(root) NOPASSWD: /bin/systemd-ask-password

A more evil workaround (discussed in the above-mentioned bug report) is to override the permissions on the socket file underlying this mechanism. Add this to the postgresql service unit:

ExecStartPre=+/bin/setfa[...]
カテゴリー: postgresql

Jignesh Shah: PGConf.RU 2019: Slides from my sessions

2019-02-06(水) 19:17:00
It was my first visit to Moscow for PGConf.RU 2019. Enjoyed meeting the strong community of PostgreSQL in Russia!


Slides from my sessions:

1. Deep Dive into the RDS PostgreSQL Universe


Deep Dive into the RDS PostgreSQL Universe from Jignesh Shah

2. Tips and Tricks for Amazon RDS for PostgreSQL

Tips and Tricks with Amazon RDS for PostgreSQL from Jignesh Shah

This blog represents my own view points and not of my employer, Amazon Web Services.


カテゴリー: postgresql

James Coleman: We scale both vertically and horizontally with dozens of PostgreSQL clusters across multiple…

2019-02-06(水) 00:03:47

We scale both vertically and horizontally with dozens of PostgreSQL clusters across multiple datacenters.

カテゴリー: postgresql

Hans-Juergen Schoenig: Implementing “AS OF”-queries in PostgreSQL

2019-02-05(火) 23:01:09

Over the years many people have asked for “timetravel” or “AS OF”-queries in PostgreSQL. Oracle has provided this kind of functionality for quite some time already. However, in the PostgreSQL world “AS OF timestamp” is not directly available. The question now is: How can we implement this vital functionality in user land and mimic Oracle functionality?

Implementing “AS OF” and timetravel in user land

Let us suppose we want to version a simple table consisting of just three columns: id, some_data1 and some_data2. To do this we first have to install the btree_gist module, which adds some valuable operators we will need to manage time travel. The table storing the data will need an additional column to handle the validity of a row. Fortunately PostgreSQL supports “range types”, which allow to store ranges in an easy and efficient way. Here is how it works:

CREATE EXTENSION IF NOT EXISTS btree_gist; CREATE TABLE t_object ( id int8, valid tstzrange, some_data1 text, some_data2 text, EXCLUDE USING gist (id WITH =, valid WITH &&) );

Mind the last line here: “EXLUDE USING gist” will ensure that if the “id” is identical the period (“valid”) must not overlap. The idea is to ensure that the same “id” only has one entry at a time. PostgreSQL will automatically create a Gist index on that column. The feature is called “exclusion constraint”. If you are looking for more information about this feature consider checking out the official documentation (https://www.postgresql.org/docs/current/ddl-constraints.html).

If you want to filter on some_data1 and some_data2 consider creating indexes. Remember, missing indexes are in many cases the root cause of bad performance:

CREATE INDEX idx_some_index1 ON t_object (some_data1); CREATE INDEX idx_some_index2 ON t_object (some_data2);

By creating a view, it should be super easy to extract data from the underlying tables:

CREATE VIEW t_object_recent AS SELECT id, some_data1, some_data2 FROM t[...]
カテゴリー: postgresql

Avinash Kumar: Use pg_repack to Rebuild PostgreSQL Database Objects Online

2019-02-05(火) 10:14:49

In this blog post, we’ll look at how to use

pg_repack  to rebuild PostgreSQL database objects online.

We’ve seen a lot of questions regarding the options available in PostgreSQL for rebuilding a table online. We created this blog post to explain the 

pg_repack  extension, available in PostgreSQL for this requirement. pg_repack is a well-known extension that was created and is maintained as an open source project by several authors.

There are three main reasons why you need to use

pg_repack  in a PostgreSQL server:
  1. Reclaim free space from a table to disk, after deleting a huge chunk of records
  2. Rebuild a table to re-order the records and shrink/pack them to lesser number of pages. This may let a query fetch just one page  ( or < n pages) instead of n pages from disk. In other words, less IO and more performance.
  3. Reclaim free space from a table that has grown in size with a lot of bloat due to improper autovacuum settings.

You might have already read our previous articles that explained what bloat is, and discussed the internals of autovacuum. After reading these articles, you can see there is an autovacuum background process that removes dead tuples from a table and allows the space to be re-used by future updates/inserts on that table. Over a period of time, tables that take the maximum number of updates or deletes may have a lot of bloated space due to poorly tuned autovacuum settings. This leads to slow performing queries on these tables. Rebuilding the table is the best way to avoid this. 

Why is just autovacuum not enough for tables with bloat?

We have discussed several parameters that change the behavior of an autovacuum process in this blog post. There cannot be more than

autovacuum_max_workers  number of autovacuum processes running in a database cluster at a time. At the same time, due to untuned autovacuum settings and no manual vacuuming of the database as a weekly or monthy jobs, many tables can be skipped from autovacuum. We have discussed in this post that the default autovacuum settings run autova[...]
カテゴリー: postgresql

Christophe Pettus: “Breaking PostgreSQL at Scale” at FOSDEM 2019

2019-02-05(火) 06:00:49

The slides for my talk, “Breaking PostgreSQL at Scale” at FOSDEM 2019 are available.

カテゴリー: postgresql

ページ