Planet PostgreSQL

Jobin Augustine: PostgreSQL WAL Retention and Clean Up: pg_archivecleanup

2019-07-10 (Wed) 23:44:08

WAL retention is a very important topic in PostgreSQL database management, but we very often come across DBAs caught by surprise in situations such as:

1. Several TBs of WALs piled up in archive destination

2. WALs filling up the pg_wal/pg_xlog directory because archiving is failing

3. Necessary WALs are no longer preserved

External backup projects for PostgreSQL are good at addressing retention policies. But there is a simple program named pg_archivecleanup, shipped with the PostgreSQL binaries, which is useful both for maintaining WAL retention and for handling emergency situations. pg_archivecleanup is probably the least talked about utility among the standard PostgreSQL binaries. It is extremely simple and useful, and it can work in two scenarios:

  • It can be used as a standalone program to clean up old WAL files from any file system location.
  • It can also be used on the standby side to clean up WAL files which are no longer required by the standby.

This program can operate in dryrun mode (-n option) or delete mode (-d option).

Dryrun (-n)

This feature is useful to list all WALs older than a specific WAL. In the following demonstration, I am listing all WALs which are still in the default wal location (pg_wal) inside my data directory and older than a specific WAL:

$ ./pg_archivecleanup -n ~/bigsql/data/pg11/pg_wal 00000001000000000000001E
/home/jobin/bigsql/data/pg11/pg_wal/00000001000000000000001B
/home/jobin/bigsql/data/pg11/pg_wal/000000010000000000000017
...

Delete Mode (-d)

In this mode, pg_archivecleanup does the cleanup by removing all the candidate WALs:

$ pg_archivecleanup -d /home/postgres/archive 00000002000000000000006B
pg_archivecleanup: keeping WAL file "/home/postgres/archive/00000002000000000000006B" and later
pg_archivecleanup: removing file "/home/postgres/archive/000000020000000000000069"
pg_archivecleanup: removing file "/home/postgres/archive/00000002000000000000006A"
...

Understanding WAL retention criteria

In order to do a clean up

[...]
Category: postgresql

Dave Page: Reverse Proxying to pgAdmin

2019-07-10 (Wed) 21:33:00
Reverse proxying requests to a pgAdmin server is becoming more and more popular, if posts to the mailing lists are to be taken as an indicative measure, and more often than not pgAdmin is being run in a container (of which there have now been over 10 million pulls)! Typically users will deploy a reverse proxy for a couple of reasons: to host multiple applications in different subdirectories under the same domain, or to add SSL/TLS support independently of the application.

Because of the number of questions asked, I spent a little time over the last couple of days doing some testing and updating the documentation with some examples. Here's a blog-ified version of that work.

Nginx

Nginx is winning the battle of the web servers these days, beating out Lighttpd (which is still a fine piece of software) and the ageing and arguably bloated Apache HTTPD. All of these servers support reverse proxying, and whilst I've looked at Nginx, the examples shown below can easily be translated to the other servers if you prefer to run them instead.
In the following examples, we have pgAdmin running in a Docker container (in which it's hosted under Gunicorn). For simplicity, the examples have Nginx running on the host machine, but it can also be easily run in another container, sharing a Docker network with pgAdmin. In such a configuration there is no need to map the pgAdmin container port to the host.
The container is launched as shown below. See the documentation for information on other useful environment variables you can set and paths you can map.
The commands below will pull the latest version of the container from the repository, and run it with port 5050 on the host mapped to port 80 on the container. It will set the default username and password to user@domain.com and SuperSecret respectively.
docker pull dpage/pgadmin4
docker run -p 5050:80 \
-e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
-e "PGADMIN_DEFAULT_[...]
Category: postgresql

Paul Ramsey: Serving Dynamic Vector Tiles from PostGIS

2019-07-10 (Wed) 04:09:15

One of the most popular features of PostGIS 2.5 was the introduction of the "vector tile" output format, via the ST_AsMVT() function.
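
For readers who have not tried it yet, the call pattern looks roughly like the following; the table, columns and tile bounds below are purely illustrative (not from the post), and geometries are assumed to be stored in Web Mercator, SRID 3857:

  -- A rough sketch only: emit a single MVT tile from a hypothetical "roads" table.
  SELECT ST_AsMVT(q, 'roads') AS tile
  FROM (
      SELECT id,
             name,
             ST_AsMVTGeom(
                 geom,
                 ST_MakeBox2D(ST_Point(-20037508.34, -20037508.34),
                              ST_Point( 20037508.34,  20037508.34))
             ) AS geom
      FROM roads
  ) AS q;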

Category: postgresql

Laurenz Albe: ICU collations against glibc 2.28 data corruption

2019-07-09 (Tue) 17:00:27
© Laurenz Albe 2019

 

As announced by the glibc mailing list and discussed by Daniel Vérité, version 2.28 of the GNU C library will ship with many changes to the collations.

PostgreSQL uses the operating system’s collations by default. When this version hits your Linux operating system as part of a routine bugfix upgrade, and you don’t use the “C” or “POSIX” collation, you stand a good chance that some indexes on strings will become corrupted.

Since version 10, PostgreSQL supports not only “libc collations” but also “ICU collations”, which are not affected by this particular problem. This article describes ICU collations and how you can switch your database to using them.

Why is it a problem?

Collations determine how strings are compared, so they affect everything in the database that has to do with the order of strings:

  • the result of the ORDER BY clause in SELECT statements
  • the order of the keys in b-tree indexes

Now changes in the behavior of ORDER BY can be annoying, but that doesn’t threaten data integrity.
But if your index is suddenly ordered in the wrong fashion, that is data corruption. It can lead to incorrect query results or to duplicate entries in UNIQUE indexes.

What can you do to avoid the problem

The straightforward remedy is to rebuild all indexes on strings.

The simple way to do this is to run REINDEX INDEX for all affected indexes. However, that will block all attempts to use the index and will prevent data modifications on the table.

In PostgreSQL v12, you will be able to use REINDEX CONCURRENTLY to avoid these disadvantages. In older versions of PostgreSQL you can work around this by using CREATE INDEX CONCURRENTLY to create a copy of the index, then drop the original one. I’ll demonstrate that further down.
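
As a hedged sketch of that workaround for a plain (non-unique) index, with made-up table and index names:

  -- Build a replacement index without blocking writes, then swap it in.
  CREATE INDEX CONCURRENTLY people_name_idx_new ON people (name);
  DROP INDEX people_name_idx;
  ALTER INDEX people_name_idx_new RENAME TO people_name_idx;

Indexes that back UNIQUE or primary key constraints need an extra step, for example attaching the new index with ALTER TABLE ... ADD CONSTRAINT ... UNIQUE USING INDEX.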

But I wanted to tell you about ICU collations and their benefits.

ICU collations

ICU is a set of libraries that provide, among other things, standardized Unicode collations. They are available on many op

[...]
Category: postgresql

Luca Ferrari: Generate Primary Keys (almost) Automatically

2019-07-09 (Tue) 09:00:00

What if your database design is so poor that you need to refactor tables in order to add primary keys?

Generate Primary Keys (almost) Automatically

While playing on a quite large database (in terms of number of tables) with a friend of mine, we discovered that almost all tables did not have a primary key!
Gosh!
This is really baaaad!

Why is that bad? Well, you should not ask, but let’s leave the poor database design alone and focus on some more concrete problems: in particular, not having a primary key prevents a lot of smart software and middleware from working on your database. As you probably know, almost every ORM requires each table to have at least one surrogate key in order to properly identify each row and enable persistence (that is, modification of rows).

Luckily, fixing tables for such software is quite simple: just add a surrogate key and everyone will be happy again. But unluckily, while adding a primary key is a matter of issuing an ALTER TABLE, doing so for a long list of tables is boring.

Here comes the power of PostgreSQL again: thanks to its rich catalog, it is possible to automate the process.



In this post you will see how to go from a single query to a whole procedure that does the trick.

A query to generate the ALTER TABLE commands

A first example is the following query, which searches for every table in the schema public that does not have a constraint of type p (primary key) and issues an ALTER TABLE for each such table:

testdb=# WITH to_be_fixed...
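
The post’s query is truncated in this excerpt; a rough sketch of the same idea (not the author’s exact query, and the generated id column is just an assumption) could look like this:

  -- Generate an ALTER TABLE for every table in schema public that has no
  -- primary key constraint (contype = 'p' in pg_constraint).
  SELECT format('ALTER TABLE %I.%I ADD COLUMN id bigserial PRIMARY KEY;',
                n.nspname, c.relname)
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname = 'public'
    AND c.relkind = 'r'
    AND NOT EXISTS (SELECT 1
                    FROM pg_constraint con
                    WHERE con.conrelid = c.oid
                      AND con.contype = 'p');
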
Category: postgresql

Luca Ferrari: PostgreSQL & recovery.conf

2019-07-08 (Mon) 09:00:00

The coming version of PostgreSQL, 12, will lose the recovery.conf file. It will take some time to get used to!

PostgreSQL & recovery.conf

According to the documentation for the upcoming version 12, the recovery.conf file is gone! The release notes state it clearly: the server will not start if recovery.conf is present, and all the configuration parameters have moved to the classic postgresql.conf (or included files).

The change proposal is quite old, but it represents a deep change in the way PostgreSQL handles server startup and recovery, and it could take a while for all the software out there to handle it too.

Please note that since PostgreSQL 12 is still in beta, things could change a little, even though the discussion and the implementation are nearly finished.

Two files can be created to instrument a standby node:

  • standby.signal: if present in the PGDATA directory, the host will work as a standby, that is, it will wait for new incoming WAL segments and replay them for the rest of its life;
  • recovery.signal: if present, WAL replay will stop as soon as all the WALs have been consumed or the recovery_target parameter has been reached.


It is interesting to note that standby.signal takes precedence over recovery.signal, meaning that if both files exist the node will act as a standby. Both files may be empty: they now act as trigger files rather than configuration files (hence the change in the suffix).

So, what is the rationale for this change? There are several reasons, including not needing to duplicate configuration files. But...

Category: postgresql

Daniel Vérité: The CSV output format in psql

2019-07-06 (Sat) 21:50:12

Exporting query results in CSV has been possible for a long time (since version 8.0), either with
COPY (SELECT ...) TO STDOUT CSV as an SQL command, or with the \copy meta-command in psql, which invokes COPY under the hood and handles the flow of data on the client side.

But there are still a few cases not covered by this functionality, which is why in PostgreSQL 12, CSV has been added to the output formats. It means that we can now issue \pset format csv, so that all commands producing tabular results can output them in the CSV format. It can also be opted for on the command line with the --csv option.

In practice, we need to use an output format instead of \copy:

  • when data must be fetched through a query that is not supported by COPY, for instance a cursor, since fetch from c is valid but copy (fetch from c) to stdout csv is not.

  • when the result is produced by a meta-command instead of a query: \crosstabview, \l, \d etc…

CSV as a replacement for the unaligned format (-A)

Aside from \copy, simple exports in tabular format are often done with the “unaligned” format with its field separator given by the fieldsep parameter (-F option).

But this format has two weaknesses:

  • when the separator appears inside the data, there’s no way to distinguish it (no escape mechanism).
  • when line feeds are present inside fields (in multi-line text), there’s no way to distinguish a line feed within a field from a record separator.

Example:

$ query="select 'ab,cd' as col1, 'ef'||chr(10)||'gh' as col2"
$ psql -AtF',' -c "$query"
ab,cd,ef
gh

In the above example, when trying to interpret the output, we can’t know where the fields start and end, nor how many records there were in the source data.

Although it comes from a single record of two columns, this output could just as well represent a single column with ab,cd,ef as the value of the first record, and gh for the second one.

The CSV format solves this problem:

$ psql --csv -t[...]
Category: postgresql

elein mustain: Postgresql Interval, Date, Timestamp and Time Data Types

2019-07-06 (Sat) 04:42:23
Does anyone really know what time it is? First we have the bare bones of the PostgreSQL Interval, Date and Timestamp Data types. Here are the questions: What types are they? And what options do they have? What postgresql.conf variables affect date and time i/o? Available Date and Time Data types Date day of year […]
Category: postgresql

Louise Grandjonc: Testing your Django app with Citus

2019-07-05 (Fri) 23:02:00

Recently, I started working on the django-multitenant application. The main reason we created it was to help Django developers use Citus in their app. While I was working on it, I wrote unit tests. And to be able to reproduce a customer’s production environment, I wanted the tests to use Citus and not a single-node Postgres. If you are using Citus as your production database, we encourage you to have it running in your development environment as well as your staging environments to be able to minimise the gap between dev and production. To understand better the importance of dev/prod parity, I recommend reading the Twelve-Factor app, which will give you ideas to lower the chances of last-minute surprises when deploying to prod.

The goal of this article is to explain how I set it up, as I think that if you are using Citus in production, it is better to have tests running with Citus.

Setting up docker

I will assume that you already have docker installed; otherwise I recommend taking a look at our article [https://docs.citusdata.com/en/v8.1/installation/singlemachinedocker.html#single-machine-docker]

Once you have docker running, create in your app directory a docker-compose.yml containing:

version: '2.1'
services:
  master:
    image: 'citusdata/citus:8.2.1'
    ports: ['5600:5432']
    labels: ['com.citusdata.role=Master']
    volumes: ['/var/run/postgresql']
  manager:
    container_name: "${COMPOSE_PROJECT_NAME:-citus}_manager"
    image: 'citusdata/membership-manager:0.1.0'
    volumes: ['/var/run/docker.sock:/var/run/docker.sock']
    depends_on: { master: { condition: service_healthy } }
  worker1:
    image: 'citusdata/citus:8.2.1'
    ports: ['5601:5432']
    labels: ['com.citusdata.role=Worker']
    depends_on: { manager: { condition: service_healthy } }
  worker2:
    image: 'citusdata/citus:8.2.1'
    ports: ['5602:5432']
    labels: ['com.citusdata.role=Worker']
    depends_on: { manager: { condition: service_healthy } }
  healthcheck:
    image: busybox
    depends_on: [...]
Category: postgresql

Kirk Roybal: Generated columns in PostgreSQL 12

2019-07-05 (Fri) 22:26:11
Computed columns in PostgreSQL 12 are here, but still have a ways to go.
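
As a minimal sketch of the syntax (table and column names are illustrative):

  -- height_in is computed from height_cm and stored on disk; it cannot be
  -- written to directly.
  CREATE TABLE people (
      height_cm numeric,
      height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
  );

  INSERT INTO people (height_cm) VALUES (180);
  SELECT height_in FROM people;   -- about 70.87
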
Category: postgresql

Pavel Golub: Building PostgreSQL (x86, x64) and OpenSSL using MSYS2 and MinGW under Windows

2019-07-03 (Wed) 17:00:43

I already wrote a blog post, “Building PostgreSQL with MSYS2 and MinGW under Windows”, a year ago. But this time I want to show you two important things:

  1. cross compiling (x86, x64) using the same run-time;
  2. compiling with the latest OpenSSL support.

It’s true that nowadays you won’t find binaries or installations of PostgreSQL without OpenSSL support. It is used not only for encryption but for compression as well.

Even though it’s Windows, let’s do all the steps in the command prompt only. Ya, I told you Linux guys, we can run a console too!

Setting up the environment

First things first. I launch my msys2 shell. I assume you have it on your system; if not, check the installation instructions in my previous post.

c:\msys64\ is the msys2 installation folder.

C:\Users\pasha>c:\msys64\msys2_shell.cmd -mingw64 -conemu

I’m telling msys2 to run a shell and use -mingw64 as the default toolchain. That means we will build x64 binaries. -conemu is an optional parameter; you may want to pass it if you’re a fan of ConEmu as I am. If not, just skip it.

pasha@PG480 MINGW64 ~
$ mkdir /src
pasha@PG480 MINGW64 ~
$ cd /src
pasha@PG480 MINGW64 /src
$ git clone https://github.com/openssl/openssl.git --branch OpenSSL_1_1_1-stable --depth 1
Cloning into 'openssl'...
remote: Enumerating objects: 18297, done.
remote: Counting objects: 100% (18297/18297), done.
remote: Compressing objects: 100% (15182/15182), done.
remote: Total 18297 (delta 884), reused 14976 (delta 582), pack-reused 0
Receiving objects: 100% (18297/18297), 13.48 MiB | 1.38 MiB/s, done.
Resolving deltas: 100% (884/884), done.
Checking out files: 100% (18166/18166), done.
pasha@PG480 MINGW64 /src
$ git clone https://github.com/postgres/postgres.git --branch REL_11_STABLE --depth 1
Cloning into 'postgres'...
remote: Enumerating objects: 5868, done.
remote: Counting objects: 100% (5868/5868), done.
remote: Compressing objects: 100% (5365/5365), done.
remote: Total 5868 (delta 754), reused 1296 (delta 317), pack-reused 0
Receiving objects: 100% (5[...]
Category: postgresql

Doug Hunley: How the CIS Benchmark for PostgreSQL 11 Works

2019-07-02 (Tue) 19:28:00

Crunchy Data has recently announced an update to the CIS PostgreSQL Benchmark by the Center for Internet Security, a nonprofit organization that provides publications around standards and best practices for securing technology systems. This newly published CIS PostgreSQL 11 Benchmark joins the existing CIS Benchmarks for PostgreSQL 9.5, 9.6, and 10 while continuing to build upon Crunchy Data's efforts with the PostgreSQL Security Technical Implementation Guide (PostgreSQL STIG).

Category: postgresql

Louise Grandjonc: PostgreSQL's indexes - GIN algorithms

2019-07-02 (Tue) 02:20:22
Introduction

In the previous article, I introduced the internal data structure of a postgres GIN index. Now it’s time to talk about searching and maintaining them.

Searching in a GIN index

When postgres uses the GIN index, you’ll notice in the EXPLAIN that it’s always doing a bitmap heap scan. It means that the search algorithm returns a map of tids (pointers to rows) in physical memory order. Considering the way the data is stored inside the posting lists and trees, it makes a lot of sense to use this method.
Category: postgresql

Louise Grandjonc: PostgreSQL's indexes - GIN

2019-07-02 (Tue) 02:20:21
Introduction

GIN (Generalized Inverted Index) indexes are often used to index arrays, jsonb, and tsvector (for full-text search) columns. When it comes to arrays, they are, for example, used to verify if an array contains another array or elements (the <@ operator). In the documentation you can see the full list of operators. But the question that I want to answer in this article is: “why should we use GIN indexes for these data types and operators?”
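
As a minimal, made-up illustration of a GIN index on an array column and queries that can use it:

  CREATE TABLE articles (
      id   serial PRIMARY KEY,
      tags text[]
  );

  CREATE INDEX articles_tags_gin ON articles USING gin (tags);

  -- Containment queries on the array column can use the GIN index:
  SELECT id FROM articles WHERE tags @> ARRAY['postgresql'];
  SELECT id FROM articles WHERE ARRAY['postgresql', 'gin'] <@ tags;
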
Category: postgresql

Ibrar Ahmed: Deep Dive Into PostgreSQL Indexes Webinar: Q & A

2019-07-01 (Mon) 22:39:13

I want to thank everybody who attended my session “Deep Dive in PostgreSQL Indexes” in Austin. It was quite a wonderful experience! To reach a bigger audience, I also did a webinar on the same topic. There were many questions in that webinar but, unfortunately, not enough time to cover each and every one, so I decided to follow up with a Q&A session in this blog post.

The recording and slides are available here. Below is the list of your questions.

Q1: Is there a concept of the functional index in PostgreSQL?

A functional index is an index defined on the result of a function, and it has been available since PostgreSQL 7. In later releases of PostgreSQL, you can define an index on the result of an expression, which can be any PostgreSQL expression; a function call is just one kind of expression.
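
For illustration (hypothetical table and column):

  -- Index the lower-cased value so case-insensitive lookups can use it.
  CREATE INDEX users_lower_email_idx ON users (lower(email));

  SELECT * FROM users WHERE lower(email) = 'someone@example.com';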

Q2: Can I create an index using a table row data that is attached to the index? Does my query have to use index-only-scan?

Yes, if all selected columns are in the index, then index-only-scans will be used.
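
As an illustration, a covering index built with the INCLUDE clause (available since PostgreSQL 11; table and column names are hypothetical) lets a query that touches only indexed columns be answered by an index-only scan:

  CREATE INDEX orders_customer_idx ON orders (customer_id) INCLUDE (total);

  -- Reads only columns stored in the index, so an index-only scan is possible
  -- (provided the visibility map is reasonably up to date).
  SELECT customer_id, total FROM orders WHERE customer_id = 42;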

Q3: It’s a little bit hard to choose the right index type. Is there any offer mechanism in Postgres for index types?

Yes, it’s true that it is hard to choose, but it all depends on your data and query types. I have mentioned some recommendations in my slides. There is no such mechanism in PostgreSQL.

Q4: Is it possible to manipulate which index is used by a query if you have multiple indexes?

No, it’s the optimizer’s responsibility to choose between the defined indexes. Trust that the PostgreSQL optimizer will choose the best option for you.

Q5: If there’s a need to index a date column, would it be better to use B-TREE or BRIN index?

A date column is usually well aligned with the physical storage order. If you are not making many update and delete operations, then go for BRIN. But if you have many updates and deletes, then B-TREE is best for you.
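
As a small sketch of the BRIN option for a hypothetical append-only events table:

  -- BRIN works well when created_at correlates with the physical row order.
  CREATE INDEX events_created_brin ON events USING brin (created_at);

  SELECT count(*)
  FROM events
  WHERE created_at >= date '2019-06-01'
    AND created_at <  date '2019-07-01';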

Q6: For time-based filters what kind of index is preferable?

Time is also aligned with physical storag

[...]
Category: postgresql

Hans-Juergen Schoenig: PostgreSQL meets “Data Science and AI”

2019-07-01 (Mon) 18:40:51

Machine Learning, Deep Learning, Artificial Intelligence, and Data Science have become really important topics these days. Everybody is looking for new technologies and tries to make use of those recent developments. PostgreSQL can help to achieve those goals, and it offers reliable backend storage for your data.

PostgreSQL vs. CSV files vs. commercial databases

In the past we have seen that many in the Data Science and AI community are still using CSV files to process data. While there is a lot of tooling available (e.g. the Python pandas library), it might not be the best of all decisions to build your models on CSV files.

So what is wrong with a CSV file? Well, actually nothing as long as …

  • the format of your file is correct
  • the amount of data is reasonable
  • ALL the data in the file is needed
  • nothing changes

All those points might seem obvious BUT my practical experience shows that they are not and that those issues are seriously underappreciated. What do I mean by that?

Recently we saw a customer who wanted to store interest rates for some financial models. What did we find in the data? Let me quote: “3 percent” (as a string). Now: how do you expect your AI models to work if the data is so wrong? To do data mining you will need high-quality data – not some crap. This leads to an important observation:

“If you are doing Artificial Intelligence 80% of your effort has to go into data preparation – only 20% are the real model”

In short: You cannot seriously expect a model to learn something about “interest rates” if the raw data does not even ensure that the interest rate is actually a number.
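
As a small, purely illustrative example of letting the database enforce that rule:

  -- Store the rate as a number with a sanity check, so values like the
  -- string '3 percent' are rejected at load time instead of poisoning the
  -- model later.
  CREATE TABLE interest_rates (
      model_id int     NOT NULL,
      valid_on date    NOT NULL,
      rate     numeric NOT NULL CHECK (rate >= 0 AND rate <= 1),
      PRIMARY KEY (model_id, valid_on)
  );

  INSERT INTO interest_rates VALUES (1, current_date, 0.03);              -- ok
  -- INSERT INTO interest_rates VALUES (2, current_date, '3 percent');    -- fails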

The second remark is about the amount of data: suppose you have a TB of data that you want to process or maybe sort. What do you think? Who is better at sorting: a database, which has been optimized to sort data for 30 years, or some hand-written Python program which you have written over a weekend? From experience I can tell you: you have got no chance. The d

[...]
Category: postgresql

Regina Obe: PostGIS 3.0.0alpha3

2019-07-01 (Mon) 09:00:00

The PostGIS development team is pleased to release PostGIS 3.0.0alpha3.

This release works with PostgreSQL 9.5-12beta2 and GEOS >= 3.6

Best served with PostgreSQL 12beta2.

Category: postgresql

Dave Page: Indexing Documents for Full Text Search

2019-07-01 (Mon) 00:28:00
During a conversation with $BOSS a couple of days ago the topic of the new Full Text Search in MongoDB came up. He (knowing of, but presumably having never worked with FTS in PostgreSQL) suggested I might create a tool like the one the MongoDB guys were proudly showing off in their keynote video from their recent conference to make it easy to index and search HTML documents on disk. I looked at him with a slight tilt of the head and remarked that it's already easy to do that, and just takes a few lines of Python code or similar.

So, here I am on my flight back to the UK having written 36 lines of code whilst munching on my breakfast and enjoying my free seat upgrade. I should probably note that those 36 lines of code include import statements, comments, blank lines for readability, and a nicely formatted 7 line SQL query that could easily be just one line. Plus, the vast majority of the code is really just boilerplate to read a directory of files and extract their contents.

Anyway, before I can go into that, it was necessary to first create a schema to hold the data we'd need to query. This is basically just a table to hold whatever information you like about each document, with a tsvector column for searching. For my test I decided to make the path to the file the primary key, and to store the title of the HTML document and the text in the body. The body text isn't really needed, but it does make it a touch easier to use a trigger to automatically maintain the tsvector data - plus of course, I could also then use that column and get rid of the source documents.

Here's what the table looked like:

  CREATE TABLE docs
  (
      path text NOT NULL,
      title text NOT NULL,
      body text NOT NULL,
      tsv tsvector NOT NULL,
      PRIMARY KEY (path)
  );

  CREATE INDEX docs_tsv_idx
      ON docs USING gin (tsv);

I also added the aforementioned trigger to maintain the tsvector. I got slightly fancy with this, indexing both the title and the body fields, and giving the[...]
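
The excerpt is cut off here, so the following is only a sketch of what such a trigger might look like (not necessarily the author's exact code), weighting the title above the body:

  -- Rebuild the tsvector from title and body on every insert or update.
  CREATE FUNCTION docs_tsv_update() RETURNS trigger AS $$
  BEGIN
      NEW.tsv :=
          setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
          setweight(to_tsvector('english', coalesce(NEW.body, '')), 'B');
      RETURN NEW;
  END
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER docs_tsv_trigger
      BEFORE INSERT OR UPDATE ON docs
      FOR EACH ROW EXECUTE PROCEDURE docs_tsv_update();
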
Category: postgresql

Vasilis Ventirozos: Accessing PostgreSQL data from AWS Lambda

2019-06-29 (Sat) 04:45:00
All databases need some maintenance that we usually perform using scheduled tasks. For example, if you have an RDS instance and you want to get a bloat report once a month, you’ll probably need a small EC2 instance just to do these kinds of things. In this post, we will talk about accessing RDS, getting a result from a SQL statement and reporting it to a slack channel using a scheduled Lambda function; let’s call this poor man’s crontab :)
Before we start, psycopg2 isn’t supported by lambda so it needs to be packaged into the lambda deployment package along with any other dependencies, but since psycopg2 requires libpq it needs to be compiled with libpq statically linked. There are many binaries of psycopg2 that you can download, but I would suggest you compile your own, using the latest PostgreSQL source code. We’ll talk about all these steps throughout this post.
We’ll cover the following:
  • Creating a new AWS postgres RDS instance
  • How to compile psycopg (we’ll use docker for that)
  • Code for the function
  • Packaging
  • Lambda function, trigger, schedule
  • Testing

I won’t go into much detail about the steps of building an RDS instance as it is pretty straightforward. Assuming you have your aws client set up, you’ll need something similar to the following; if not, go to the aws console and clickety-click.
#!/bin/sh
aws rds create-db-instance \
--db-subnet-group-name [your subnet group] \
--db-security-groups [your security group] \
--db-instance-identifier [your instance identifier] \
--db-instance-class db.t2.micro \
--engine postgres \
--allocated-storage 5 \
--no-publicly-accessible \
--db-name [your db name] \
--master-username [your username] \
--master-user-password [your password] \
--backup-retention-period 3
For compiling psycopg2 we need a linux installation; I am on a mac (don’t hate), so I used a docker container (ubuntu bionic) for the compiles. You can find the Dockerfile here

The steps for compiling are the following :
# Download sou[...]
Category: postgresql

Bruce Momjian: Ibiza: A Different Type of Conference

2019-06-28 (Fri) 22:45:01

Having returned from last week's Ibiza conference, I have a new understanding of the event's goals. I know there was some uncertainty about the event, for several reasons:

  • Having a conference at a resort is a new thing for our community. We started years ago with conferences in universities, and steadily grew to hotel-based conferences in minor and then major cities.
  • Ibiza has a reputation in some countries as a party destination. The wildest thing I saw were yelping dogs being walked along the beach.
  • The beach mention often confused people. This was part of an effort to raise the importance of the hallway track, rather than it being just scheduled holes between technical talks. I didn't realize it was possible during a 90-minute lunch break to eat and swim in the ocean, but I did it!
  • There is historical abuse of resort-based conferences as paid or tax-free vacations. This was certainly not the case for Ibiza, but it is an additional hurdle.

I returned from the conference with a warm feeling for the venue, the people I met, and the things I learned, as did my wife and daughter. While resort conferences are not for everybody, they are popular in other industries, and there is certainly a need for this type of event. The next scheduled "beach" conference is in Bali, and I plan to attend.

Category: postgresql
