Planet PostgreSQL

Subscribe to the Planet PostgreSQL feed
Updated: 2 hours 51 minutes ago

Pavel Golub: Building PostgreSQL (x86, x64) and OpenSSL using MSYS2 and MinGW under Windows

2019-07-03 (Wed) 17:00:43

I already wrote a blog post, “Building PostgreSQL with MSYS2 and MinGW under Windows”, a year ago. This time I want to show you two important things:

  1. cross compiling (x86, x64) using the same run-time;
  2. compiling with the latest OpenSSL support.

It’s true that nowadays you won’t find binaries or installations of PostgreSQL without OpenSSL support. It is used not only for encryption but for compression as well.

Even though it’s Windows, let’s do all the steps in the command prompt only. Ya, I told you Linux guys, we can run a console too!

Setting up the environment

First things first. I launch my msys2 shell. I assume you have it on your system; if not, check the installation instructions in my previous post.

c:\msys64\ is the msys2 installation folder.

C:\Users\pasha>c:\msys64\msys2_shell.cmd -mingw64 -conemu

I’m telling msys2 to run the shell and use -mingw64 as the default toolchain. That means we will build x64 binaries. -conemu is an optional parameter; you may want to pass it if you’re a fan of ConEmu as I am. If not, just skip it.

pasha@PG480 MINGW64 ~
$ mkdir /src

pasha@PG480 MINGW64 ~
$ cd /src

pasha@PG480 MINGW64 /src
$ git clone https://github.com/openssl/openssl.git --branch OpenSSL_1_1_1-stable --depth 1
Cloning into 'openssl'...
remote: Enumerating objects: 18297, done.
remote: Counting objects: 100% (18297/18297), done.
remote: Compressing objects: 100% (15182/15182), done.
remote: Total 18297 (delta 884), reused 14976 (delta 582), pack-reused 0
Receiving objects: 100% (18297/18297), 13.48 MiB | 1.38 MiB/s, done.
Resolving deltas: 100% (884/884), done.
Checking out files: 100% (18166/18166), done.

pasha@PG480 MINGW64 /src
$ git clone https://github.com/postgres/postgres.git --branch REL_11_STABLE --depth 1
Cloning into 'postgres'...
remote: Enumerating objects: 5868, done.
remote: Counting objects: 100% (5868/5868), done.
remote: Compressing objects: 100% (5365/5365), done.
remote: Total 5868 (delta 754), reused 1296 (delta 317), pack-reused 0
Receiving objects: 100% (5[...]
Category: postgresql

Doug Hunley: How the CIS Benchmark for PostgreSQL 11 Works

2019-07-02 (Tue) 19:28:00

Crunchy Data has recently announced an update to the CIS PostgreSQL Benchmark by the Center for Internet Security, a nonprofit organization that provides publications around standards and best practices for securing technology systems. This newly published CIS PostgreSQL 11 Benchmark joins the existing CIS Benchmarks for PostgreSQL 9.5, 9.6, and 10 while continuing to build upon Crunchy Data's efforts with the PostgreSQL Security Technical Implementation Guide (PostgreSQL STIG).

Category: postgresql

Louise Grandjonc: PostgreSQL's indexes - GIN algorithms

2019-07-02 (Tue) 02:20:22

Introduction

In the previous article, I introduced the internal data structure of a postgres GIN index. Now it’s time to talk about searching and maintaining them.

Searching in a GIN index

When postgres uses the GIN index, you’ll notice in the EXPLAIN that it’s always doing a bitmap heap scan. It means that the search algorithm returns a map of tids (pointers to rows) in physical memory order. Considering the way the data is stored inside the posting lists and trees, it makes a lot of sense to use this method.
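As a quick illustration (not taken from the post), here is a minimal sketch assuming a tsvector column; with enough rows, EXPLAIN shows the bitmap scan described above:

  -- hypothetical table and GIN index (names are illustrative)
  CREATE TABLE messages (id serial PRIMARY KEY, body_tsv tsvector);
  CREATE INDEX messages_body_gin ON messages USING gin (body_tsv);

  EXPLAIN SELECT * FROM messages WHERE body_tsv @@ to_tsquery('postgres');
  -- typical plan shape once the planner chooses the index:
  --   Bitmap Heap Scan on messages
  --     Recheck Cond: (body_tsv @@ to_tsquery('postgres'::text))
  --     ->  Bitmap Index Scan on messages_body_gin
  --           Index Cond: (body_tsv @@ to_tsquery('postgres'::text))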
Category: postgresql

Louise Grandjonc: PostgreSQL's indexes - GIN

2019-07-02 (Tue) 02:20:21

Introduction

GIN (Generalized Inverted Index) indexes are often used to index arrays, jsonb, and tsvector (for full-text search) columns. When it comes to arrays, they are, for example, used to verify whether an array contains another array or elements (the <@ operator). In the documentation you can see the full list of operators. But the question that I want to answer in this article is: “why should we use GIN indexes for these data types and operators?
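As a brief illustration (not from the post), a GIN index on an array column supports the containment operators mentioned above; a hypothetical sketch:

  -- hypothetical table and GIN index on a text[] column
  CREATE TABLE articles (id serial PRIMARY KEY, tags text[]);
  CREATE INDEX articles_tags_gin ON articles USING gin (tags);

  -- "is tags contained in the given array?" (the <@ operator)
  SELECT * FROM articles WHERE tags <@ ARRAY['postgres', 'index', 'gin'];
  -- "does tags contain the given elements?" (the @> operator)
  SELECT * FROM articles WHERE tags @> ARRAY['gin'];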
Category: postgresql

Ibrar Ahmed: Deep Dive Into PostgreSQL Indexes Webinar: Q & A

2019-07-01 (Mon) 22:39:13

I want to thank everybody who attended my session “Deep Dive in PostgreSQL Indexes” in Austin. It was quite a wonderful experience! To cover a bigger audience, I also did a webinar on the same topic. There were many questions in that webinar, but unfortunately, there was not enough time to cover each and every question, so I decided to have a followup Q&A session with a blog post on the topic.

The recording and slides are available here. Below is the list of your questions.

Q1: Is there a concept of the functional index in PostgreSQL?

A functional index is an index defined on the result of a function. This has been present since PostgreSQL 7. In later releases of PostgreSQL, you can define an index on the result of an expression, which can be any PostgreSQL expression; a function call is just a special case of an expression.
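A minimal sketch of such an expression index (a hypothetical example, not from the webinar):

  -- index on the result of an expression (here, a function call)
  CREATE TABLE users (id serial PRIMARY KEY, email text);
  CREATE INDEX users_email_lower_idx ON users (lower(email));

  -- this predicate matches the indexed expression, so the index can be used
  SELECT * FROM users WHERE lower(email) = 'someone@example.com';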

Q2: Can I create an index using table row data that is attached to the index? Does my query have to use index-only-scan?

Yes, if all selected columns are in the index, then index-only-scans will be used.
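A hypothetical example: with PostgreSQL 11's INCLUDE clause, every selected column lives in the index, so an index-only scan becomes possible:

  -- hypothetical covering index (INCLUDE requires PostgreSQL 11 or later)
  CREATE TABLE orders (id bigint, customer_id bigint, total numeric);
  CREATE INDEX orders_customer_total_idx
      ON orders (customer_id) INCLUDE (total);

  EXPLAIN SELECT customer_id, total FROM orders WHERE customer_id = 42;
  -- on a recently vacuumed table this can produce:
  --   Index Only Scan using orders_customer_total_idx on orders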

Q3: It’s a little bit hard to choose the right index type. Is there any offer mechanism in Postgres for index types?

Yes, it’s true that it is hard to choose, but it all depends on your data and query types. I have mentioned recommendations in my slides. There is no such mechanism in PostgreSQL.

Q4: Is it possible to manipulate which index is used by a query if you have multiple indexes?

No, it’s the optimizer’s responsibility to choose between the defined indexes. Trust that the PostgreSQL optimizer will choose the best option for you.

Q5: If there’s a need to index a date column, would it be better to use B-TREE or BRIN index?

A date data type is mostly aligned with the physical storage order. If you are not making many update and delete operations, then go for BRIN. But if you have too many updates and deletes, then B-TREE is best for you.
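A hypothetical sketch, assuming mostly append-only data so that dates correlate with the physical row order:

  -- hypothetical BRIN index on a date column of append-mostly data
  CREATE TABLE measurements (taken_on date NOT NULL, value numeric);
  CREATE INDEX measurements_taken_on_brin
      ON measurements USING brin (taken_on);

  SELECT count(*) FROM measurements
   WHERE taken_on BETWEEN DATE '2019-01-01' AND DATE '2019-01-31';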

Q6: For time-based filters what kind of index is preferable?

Time is also aligned with physical storage

[...]
Category: postgresql

Hans-Juergen Schoenig: PostgreSQL meets “Data Science and AI”

2019-07-01 (Mon) 18:40:51

Machine Learning, Deep Learning, Artificial Intelligence, and Data Science have become really important topics these days. Everybody is looking for new technologies and tries to make use of those recent developments. PostgreSQL can help to achieve those goals, and it offers reliable backend storage for your data.

PostgreSQL vs. CSV files vs. commercial databases

In the past we have seen that many in the Data Science and AI community are still using CSV files to process data. While there is a lot of tooling available (e.g. the Python pandas library), it might not be the best of all decisions to build your models on CSV files.

So what is wrong with a CSV file? Well, actually nothing as long as …

  • the format of your file is correct
  • the amount of data is reasonable
  • ALL the data in the file is needed
  • nothing changes

All those points might seem obvious BUT my practical experience shows that they are not and that those issues are seriously underappreciated. What do I mean by that?

Recently we have seen a customer who wanted to store interest rates for some financial models. What did we find in the data? Let me quote: “3 percent” (as a string). Now: how do you expect your AI models to work if the data is so wrong? To do data mining you will need high quality data – not some crap. Which leads to an important observation:

“If you are doing Artificial Intelligence 80% of your effort has to go into data preparation – only 20% are the real model”

In short: You cannot seriously expect a model to learn something about “interest rates” if the raw data does not even ensure that the interest rate is actually a number.
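A small, hypothetical illustration of that point: with a proper column type, the database rejects such values at load time.

  -- hypothetical table: the rate is a number, not free-form text
  CREATE TABLE interest_rates (
      valid_from date NOT NULL,
      rate numeric(8,5) NOT NULL CHECK (rate >= 0)
  );

  INSERT INTO interest_rates VALUES ('2019-07-01', 0.03);         -- accepted
  INSERT INTO interest_rates VALUES ('2019-07-01', '3 percent');  -- rejected
  -- ERROR:  invalid input syntax for type numeric: "3 percent"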

The second remark is about the amount of data: suppose you have got a TB of data that you want to process or maybe sort. What do you think? Who is better at sorting? A database, which has been optimized to sort data for 30 years, or some hand-written Python program, which you have written over a weekend? From experience I can tell you: you have got no chance. The d

[...]
Category: postgresql

Regina Obe: PostGIS 3.0.0alpha3

2019-07-01 (Mon) 09:00:00

The PostGIS development team is pleased to release PostGIS 3.0.0alpha3.

This release works with PostgreSQL 9.5-12beta2 and GEOS >= 3.6

Best served with PostgreSQL 12beta2.

Category: postgresql

Dave Page: Indexing Documents for Full Text Search

2019-07-01 (Mon) 00:28:00
During a conversation with $BOSS a couple of days ago the topic of the new Full Text Search in MongoDB came up. He (knowing of, but presumably having never worked with FTS in PostgreSQL) suggested I might create a tool like the one the MongoDB guys were proudly showing off in their keynote video from their recent conference to make it easy to index and search HTML documents on disk. I looked at him with a slight tilt of the head and remarked that it's already easy to do that, and just takes a few lines of Python code or similar.

So, here I am on my flight back to the UK having written 36 lines of code whilst munching on my breakfast and enjoying my free seat upgrade. I should probably note that those 36 lines of code include import statements, comments, blank lines for readability, and a nicely formatted 7 line SQL query that could easily be just one line. Plus, the vast majority of the code is really just boilerplate to read a directory of files and extract their contents.

Anyway, before I can go into that, it was necessary to first create a schema to hold the data we'd need to query. This is basically just a table to hold whatever information you like about each document, with a tsvector column for searching. For my test I decided to make the path to the file the primary key, and to store the title of the HTML document and the text in the body. The body text isn't really needed, but it does make it a touch easier to use a trigger to automatically maintain the tsvector data - plus of course, I could also then use that column and get rid of the source documents.

Here's what the table looked like:

  CREATE TABLE docs
  (
      path text NOT NULL,
      title text NOT NULL,
      body text NOT NULL,
      tsv tsvector NOT NULL,
      PRIMARY KEY (path)
  );

  CREATE INDEX docs_tsv_idx
      ON docs USING gin (tsv);

I also added the aforementioned trigger to maintain the tsvector. I got slightly fancy with this, indexing both the title and the body fields, and giving the[...]
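One plausible shape for such a trigger (an assumption, since the excerpt is cut off here) weights the title above the body:

  -- assumed trigger sketch, not the post's actual code
  CREATE FUNCTION docs_tsv_update() RETURNS trigger AS $$
  BEGIN
      NEW.tsv :=
          setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
          setweight(to_tsvector('english', coalesce(NEW.body, '')), 'D');
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER docs_tsv_trigger
      BEFORE INSERT OR UPDATE ON docs
      FOR EACH ROW EXECUTE FUNCTION docs_tsv_update();
      -- EXECUTE FUNCTION is PostgreSQL 11+; older releases use EXECUTE PROCEDURE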
Category: postgresql

Vasilis Ventirozos: Accessing PostgreSQL data from AWS Lambda

2019-06-29 (Sat) 04:45:00
All databases need some maintenance that we usually perform using scheduled tasks. For example, if you have an RDS instance and you want to get a bloat report once a month, you’ll probably need a small EC2 instance just to do these kinds of things. In this post, we will talk about accessing RDS, getting a result from a SQL statement and reporting it to a slack channel using a scheduled Lambda function; let’s call this poor man’s crontab :)
Before we start, psycopg2 isn’t supported by lambda so it needs to be packaged into the lambda deployment package along with any other dependencies, but since psycopg2 requires libpq it needs to be compiled with libpq statically linked. There are many binaries of psycopg2 that you can download, but I would suggest you compile your own, using the latest PostgreSQL source code. We’ll talk about all these steps throughout this post.
We’ll cover the following:
  • Creating a new AWS postgres RDS instance
  • How to compile psycopg (we’ll use docker for that)
  • Code for the function
  • Packaging
  • Lambda function, trigger, schedule
  • Testing

I won’t get into much detail about the steps of building an RDS instance as it is pretty straightforward. Assuming you have your aws client set up, you’ll need something similar to the following; if not, go to the aws console and clickety-click.
#!/bin/sh
aws rds create-db-instance \
--db-subnet-group-name [your subnet group] \
--db-security-groups [your security group] \
--db-instance-identifier [your instance identifier] \
--db-instance-class db.t2.micro \
--engine postgres \
--allocated-storage 5 \
--no-publicly-accessible \
--db-name [your db name] \
--master-username [your username] \
--master-user-password [your password] \
--backup-retention-period 3
For compiling psycopg2 we need a Linux installation; I am on a mac (don’t hate), so I used a docker container (ubuntu bionic) for the compiles. You can find the Dockerfile here

The steps for compiling are the following:
# Download sou[...]
Category: postgresql

Bruce Momjian: Ibiza: A Different Type of Conference

2019-06-28 (Fri) 22:45:01

Having returned from last week's Ibiza conference, I have a new understanding of the event's goals. I know there was some uncertainty about the event, for several reasons:

  • Having a conference at a resort is a new thing for our community. We started years ago with conferences in universities, and steadily grew to hotel-based conferences in minor and then major cities.
  • Ibiza has a reputation in some countries as a party destination. The wildest thing I saw were yelping dogs being walked along the beach.
  • The beach mention often confused people. This was part of an effort to raise the importance of the hallway track, rather than it being just scheduled holes between technical talks. I didn't realize it was possible during a 90-minute lunch break to eat and swim in the ocean, but I did it!
  • There is historical abuse of resort-based conferences as paid or tax-free vacations. This was certainly not the case for Ibiza, but it is an additional hurdle.

I returned from the conference with a warm feeling for the venue, the people I met, and the things I learned, as did my wife and daughter. While resort conferences are not for everybody, they are popular in other industries, and there is certainly a need for this type of event. The next scheduled "beach" conference is in Bali, and I plan to attend.

Category: postgresql

Bruce Momjian: The Democratization of Databases

2019-06-28 (Fri) 02:15:01

Having delivered my new talk, The Democratization of Databases, at PostGres IBiZa and Postgres Vision, I am now posting my slides online. It covers the history of various governing structures and why democracy provides superior results. It has been well received.

Category: postgresql

Joe Conway: PostgreSQL Deep Dive: PostgreSQL Defaults and Impact on Security - Part 2

2019-06-27 (Thu) 22:17:33

In Part 1 of this blog, we covered quite a bit of information with respect to how a PostgreSQL database is initially configured by default from the perspective of discretionary access control. We also saw how to inspect those default behaviors using the crunchy_check_access extension. In Part 2, we will explore the situation identified in CVE-2018-1058 and discuss how to protect yourself.

CVE-2018-1058: Explained and Exploited

Finally we have come to the Pièce De Résistance!

CVE-2018-1058 describes how a user can create objects named the same as objects in other schemas, which can change the behavior of other users' queries, potentially causing unexpected or malicious behavior. This is also known as a trojan-horse attack. In order to fully understand how this situation may arise, we will first review some fundamental PostgreSQL concepts.
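A rough sketch of the kind of trojan-horse object involved (an illustration, not taken from the post): any user able to create objects in the default public schema could shadow a built-in function for other users' unqualified calls.

  -- illustrative sketch only: a function in public whose argument type is a
  -- closer match than pg_catalog.lower(text) for varchar inputs
  CREATE FUNCTION public.lower(varchar) RETURNS text AS $$
  BEGIN
      -- an attacker could do anything here, then behave as expected
      RETURN pg_catalog.lower($1::text);
  END;
  $$ LANGUAGE plpgsql;

  -- another user's unqualified call such as  SELECT lower(varchar_column)
  -- can now resolve to the attacker's function; one common mitigation:
  REVOKE CREATE ON SCHEMA public FROM PUBLIC;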

Category: postgresql

Luca Ferrari: PostgreSQL Administrator Account WITH NOLOGIN (recover your role)

2019-06-27 (Thu) 09:00:00

Today I got an email from a friend of mine who locked himself out of his own database due to a little mistake.

PostgreSQL Administrator Account WITH NOLOGIN (recover your postgres role)

What if you get locked out of your own cluster due to a simple and, to some extent, stupid error? Let’s see it in a quick list of steps.
First of all, lock the default postgres account so that the default administrator can no longer log in to the cluster:

% psql -U postgres -c "ALTER ROLE postgres WITH NOLOGIN" testdb
ALTER ROLE
% psql -U postgres -c "SELECT version();" testdb
psql: FATAL:  role "postgres" is not permitted to log in

What a mess!

PostgreSQL has a specific recovery mode, called single user mode, that resembles the operating system single user mode and can be used for such situations. Let’s see how.
First of all, shut down the cluster, to avoid doing more damage than you already have!

% sudo service postgresql stop


Now, start the postgres process in single user mode. You need to know the data directory of your cluster in order for it to work:

% sudo -u postgres postgres --single -D /mnt/pg_data/pgdata/11.1
PostgreSQL stand-alone backend 11.3
backend>

What happened? I used the operating system user postgres to launch the operating system process postgres (ok there’s a little name confusion here!) in single (--single) mode for my own data directory (-D). I got a prompt, I’m connected to the backend process directly, so this is not the same as a local or TCP/IP...
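The step the excerpt is cut off before is presumably a single statement at the backend> prompt, followed by CTRL-D to exit and a normal service start:

  ALTER ROLE postgres WITH LOGIN;
  -- then CTRL-D to leave the single-user backend and, for example:
  -- % sudo service postgresql start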

Category: postgresql

Kaarel Moppel: Version 1.6 of pgwatch2 PostgreSQL monitoring tool released

2019-06-26 (Wed) 17:00:20

After almost half a year since the last updates, I’m glad to announce that another set of useful features and improvements have found their way to pgwatch2 – our Open Source PostgreSQL monitoring tool of choice. The new version is incremented to 1.6, but continuing the naming tradition I’m also calling it “Feature Pack 5”, as it’s mostly a feature release (with a couple of bugfixes as well). So here is a quick overview of the changes – for the “full monty”, and when updating from a previous version, please turn to the Github changelog link below.

Highlights – Prometheus and Patroni support

The choice of data storage “backends” has been widened again (the previous version added PostgreSQL) – it’s now possible to leverage the very popular Prometheus TSDB to store metrics gathered by pgwatch2! But a recommendation: I’d use Prometheus only when it’s already running in your organization, and would strongly consider the alternatives (Postgres, InfluxDB) if you’re only planning to monitor Postgres and are starting a fresh monitoring setup, as Prometheus is not able to store string values, which is not optimal for Postgres monitoring. Also remember that we’re talking about the “pull” model now – previously metrics were pushed to the datastore. To get started quickly there’s also a suitable “predefined config” provided and a “DB Overview” dashboard similar to the Postgres / InfluxDB ones.

The second big feature adds support for our favourite Postgres HA cluster manager – Patroni – and should especially appeal to people running Postgres in a “cloudy” manner. In short, the feature means that you can provide connection information for the Distributed Consensus Store (etcd, Zookeeper, Consul) that’s powering Patroni, and pgwatch2 will then periodically consult this DCS and fetch the active “to be monitored” instances listing from there. Remember – the standard way is to state exactly on which IPs / hostnames your Postgres instances are available. For the time being though, to keep it simple, access is only possible to DCSs that are not using passwords (which is the common ca

[...]
Category: postgresql

Michael Paquier: Postgres 12 highlight - More progress reporting

2019-06-26 (Wed) 15:16:35

Since Postgres 9.6, it is possible to monitor the progress of a running manual VACUUM and even autovacuum using a dedicated system catalog called pg_stat_progress_vacuum. This is vital for operators when doing some long-running operations, because it means one does not have to blindly wait for an operation to finish. When doing performance workload analysis, this also proves to be helpful in evaluating VACUUM job progress for tuning system-level parameters, or even relation-level ones, depending on the load pattern.
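For example, a query along these lines (an illustration, not from the post) shows the per-phase progress of a running VACUUM:

  -- per-phase progress of any VACUUM currently running
  SELECT pid, datname, relid::regclass AS relation, phase,
         heap_blks_scanned, heap_blks_total
    FROM pg_stat_progress_vacuum;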

Postgres 12 has added more monitoring in this area thanks to a set of commits. Here is the one for CLUSTER and VACUUM FULL:

commit: 6f97457e0ddd8b421ca5e483439ef0318e6fc89a
author: Robert Haas <rhaas@postgresql.org>
date: Mon, 25 Mar 2019 10:59:04 -0400
Add progress reporting for CLUSTER and VACUUM FULL.

This uses the same progress reporting infrastructure added in commit
c16dc1aca5e01e6acaadfcf38f5fc964a381dc62 and extends it to these
additional cases. We lack the ability to track the internal progress of
sorts and index builds so the information reported is coarse-grained for
some parts of the operation, but it still seems like a significant
improvement over having nothing at all.

Tatsuro Yamada, reviewed by Thomas Munro, Masahiko Sawada, Michael
Paquier, Jeff Janes, Alvaro Herrera, Rafia Sabih, and by me. A fair
amount of polishing also by me.

Discussion: http://postgr.es/m/59A77072.3090401@lab.ntt.co.jp

And here is the second one for CREATE INDEX and REINDEX:

commit: ab0dfc961b6a821f23d9c40c723d11380ce195a6
author: Alvaro Herrera <alvherre@alvh.no-ip.org>
date: Tue, 2 Apr 2019 15:18:08 -0300
Report progress of CREATE INDEX operations

This uses the progress reporting infrastructure added by c16dc1aca5e0,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.

There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific. The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index c[...]
Category: postgresql

Richard Yen: The Challenges of Setting max_connections and Why You Should Use a Connection Pooler

2019-06-26 (Wed) 06:01:09
Introduction

PostgreSQL is “The World’s Most Advanced Open Source Database,” and I believe it. In my 10+ years of working with it, it’s been solid, serving up data to SaaS loads of over 1000 queries per second, rarely going down, surviving all manner of accusations of corruption (which turned out to be software engineering error) and performance degradation (which turned out to be user error). It’s got so many features and extensions that suit every need, maybe 50-60% of which most users don’t use on a regular basis, or haven’t even heard about. Unfortunately, in my recent experiences in technical support, I’ve learned that Postgres is generally very trusting, but that’s the case with many pieces of technology; it makes no judgments of how you should tune the parameters in postgresql.conf, just like the gas pedal on your sports car doesn’t make any judgments about how fast you should be driving. Just because you can put the pedal to the metal doesn’t mean you should do that in rush hour traffic, and just because you can set a parameter very high doesn’t necessarily mean that you should set it that high to serve a high-traffic OLTP application.

One of the parameters that gets the most misunderstanding is max_connections. It’s understandable that on modern systems with lots of CPUs and lots of RAM, serving modern SaaS loads to a global user base, one can see tens of thousands of user sessions at a time, each trying to query the database to update a user’s status, upload a selfie, or whatever else users might do. Naturally, a DBA would want to set max_connections in postgresql.conf to a value that would match the traffic pattern the application would send to the database, but that comes at a cost. One example of such a cost would be connection/disconnection latency; for every connection that is created, the OS needs to allocate memory to the process that is opening the network socket, and PostgreSQL needs to do its own under-the-hood computations to establish that connection. Scale that up to thousands of user sessi

[...]
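For context (an illustration rather than part of the post), the configured ceiling and the number of backends currently in use are easy to compare:

  SHOW max_connections;
  SELECT count(*) AS connections_in_use FROM pg_stat_activity;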
Category: postgresql

Joe Conway: PostgreSQL Deep Dive: PostgreSQL Defaults and Impact on Security - Part 1

2019-06-26 (Wed) 03:25:05

Recently I gave a "deep dive" talk on the topic of PostgreSQL security, and I wanted to capture one part of that content into a blog since this format is both better for making that content stand on its own and for expanding on it a bit.

Specifically, in this two-part blog, we will cover a PostgreSQL extension that I wrote called crunchy_check_access -- the motivation behind it and what it does -- and then use that extension to probe and understand the consequences of the default, out of the box, security of PostgreSQL. We will also examine the implications of choices we make when setting up "users" and "groups". To cap it off, we will explore the situation identified in CVE-2018-1058 and discuss how to protect yourself.

Warning: this two-part blog is a wall of text (and SQL). It would probably be more suited as a chapter in a long book. I encourage you to stick with it though, as I believe even those with a great deal of PostgreSQL experience will find the content very enlightening. At a minimum it will probably remind you of things you once learned but have since forgotten.

Category: postgresql

Mark Wong: PDXPUG July Meetup at OSCON: New things coming to Postgres 12

2019-06-25 (Tue) 08:31:25

Note location for the July meeting is at OSCON!

When: 7-9pm Wednesday July 17, 2019
Where: Oregon Convention Center Room D135/136

New versions of PostgreSQL usually get released every fall with lots of improvements and new features. Come hear about some of the highlighted changes expected to come in the next new release!

This special edition meetup also welcomes any OSCON attendee.  Also come and meet the local Portland PostgreSQL user group to ask any PostgreSQL related questions!
Category: postgresql

Jobin Augustine: Managing Multiple PostgreSQL Instances on Ubuntu/Debian

2019-06-24 (Mon) 23:46:14
Introduction

Those DBAs who are experts in one database system look for other database systems to have “similar features”. It is a human tendency to look at any new technology and compare it with a world they are familiar with. Most of the time, I keep addressing Oracle DBAs who are looking for similar features or setups in PostgreSQL, but this time it is for MySQL DBAs. MySQL historically ships mysqld_multi to manage multiple instances of MySQL on a server. Even though this is not a core feature in MySQL, but rather a wrapper, I understand that it is something widely used, and my colleague Fernando blogged about it.

In a discussion a few months back, one of my friends asked how PostgreSQL manages multiple instances and I agreed to write on that as it will be useful for anyone who looks for a similar setup. In short, the answer is that the PostgreSQL community manages it using wrapper scripts and the Linux service manager. During this time, systemd started becoming a standard among Linux distros and every Linux package started relying on it for service management. To my surprise, some of the MySQL packages started removing mysqld_multi from the bundle and started moving towards systemd as the way to manage multiple instances. So everything is getting converged to systemd in the new world. In this blog, I want to show how this can be done on Ubuntu/Debian Linux because all necessary wrapper scripts already exist on those distros as part of postgres-common package which gets installed by default with every PostgreSQL installation. Nothing prevents us from doing a similar wrapper on other Linux distros also.

$ dpkg -l | grep postgresql-common
ii postgresql-common 201.pgdg18.10+1 all PostgreSQL database-cluster manager

Default Installation

The default installation of PostgreSQL from the postgresql.org apt repo will include the postgresql-common package on Debian/Ubuntu.

$ sudo apt install postgresql-11
Reading package lists... Done
...
The following additional packages will be installed:
  libllvm7 libpq5 libsensors4 pgdg-ke[...]
Category: postgresql
