
Elein Mustain: Having Group By Clauses (elein’s GeneralBits)

planet postgresql - 2019-03-18 (Mon) 11:33:59

Some people go to great lengths to avoid GROUP BY and HAVING clauses in their queries. The error messages are fussy, but they are usually right. The GROUP BY and HAVING keywords are essential for good SQL reporting.

The primary reason for GROUP BY is to reduce the number of rows, usually by aggregation. It produces only one row for each matching grouping from the input. This allows you to make sophisticated calculations via ordinary SQL.

Fruit Example:

We have some fruit:

  item   | source  | amt | fresh_until
---------+---------+-----+-------------
 bananas | Chile   |  50 | 2019-05-01
 bananas | Bolivia |  25 | 2019-04-15
 bananas | Chile   | 150 | 2019-07-10
 apples  | USA-WA  |  75 | 2019-07-01
 apples  | USA-CA  |  75 | 2019-08-15
 apples  | Canada  |  80 | 2019-08-01
 grapes  | USA-CA  | 120 | 2019-07-15
(7 rows)

This next case allows us to look forward. Mid-year, what fruits will be available? We do this with the same query as above; however, once the rows are grouped, we check the value of min(fresh_until) using a HAVING clause. HAVING is how you qualify an aggregate.

select item, count(source) as srcs, sum(amt) as tot_amt,
       min(fresh_until) as min_fresh_until
from fruit
group by item
having min(fresh_until) > '30-jun-2019';

  item  | srcs | tot_amt | min_fresh_until
--------+------+---------+-----------------
 grapes |    1 |     120 | 2019-07-15
 apples |    3 |     230 | 2019-07-01
(2 rows)

All of the apples and grapes will be available mid-year.

A target list may contain both aggregates and non-aggregates. Any non-aggregate column in the target list must appear in the GROUP BY clause; the error message says so. The order of the columns in the GROUP BY clause also matters: it determines how the rows are grouped, and the grouping is often hierarchical. What that means for your columns depends on your focus. It could be fruit, or sources, and/or the fresh_until date.
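For instance, a quick sketch (my own, reusing the fruit table above) of grouping by item and then source; the two Chilean banana shipments collapse into a single row:

select item, source, sum(amt) as tot_amt
from fruit
group by item, source
order by item, source;

  item   | source  | tot_amt
---------+---------+---------
 apples  | Canada  |      80
 apples  | USA-CA  |      75
 apples  | USA-WA  |      75
 bananas | Bolivia |      25
 bananas | Chile   |     200
 grapes  | USA-CA  |     120
(6 rows)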

Playing Cards Examples

Let’s look at another set of examples that illustrate extracting information on playing cards. You can learn about cards on Wikipedia St

[...]

Venkata Nagothi: How to Replicate PostgreSQL Data to Remote Sites

planet postgresql - 2019-03-15 (Fri) 21:08:35

In a busy database environment with larger size databases, the need for real-time data replication is a common occurrence. Applications often need the production data to be replicated in real-time to remote sites for analytics and other critical business operations needs.

DBAs also need to ensure that the data is replicated continuously to the remote sites to meet various requirements. These requirements, though, may not always be to replicate the whole database; there can also be a need to replicate only a subset of the data (such as a table, a set of tables, or data from multiple tables pulled together with SQL) for analytics, reporting, etc.

In this blog, we will focus on how to replicate tables to remote databases in real-time.

What is Table-Level Replication?

Table-level replication is the mechanism of replicating the data of a specific table or set of tables from one database (source) to another database (target) hosted remotely in a distributed environment. Table-level replication ensures that table data is distributed continuously and remains consistent across the replicated (target) sites.
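As one concrete illustration (a sketch only; the post goes on to survey the available options), the logical replication built into PostgreSQL 10 and later can replicate a single table to a remote database. All object and connection names below are hypothetical:

-- on the source database (requires wal_level = logical)
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- on the remote target database (the table definition must already exist there)
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=source.example.com dbname=prod user=replicator'
    PUBLICATION orders_pub;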

Why Use Table-Level Replication?

Table-level replication is an essential need in larger, complex, highly distributed environments. In my experience, there was always a need to replicate a set of tables from a production database to a data warehouse for reporting purposes. The data has to be replicated continuously to ensure reports are getting the latest data. In critical environments, stale data cannot be tolerated, so the data changes happening on production must be replicated immediately to the target site. This can be a real challenge for DBAs, who have to weigh various factors to ensure efficient and smooth table replication.

Let us look at some requirements that table-level replication solves:

  • Reports can run on a database in an environment other than production, such as a data warehouse
  • A distributed database environment with distributed applications extracting data from multiple sites. In case of distributed web or
[...]

Baron Schwartz: New Book: DevOps for the Database

planet postgresql - 2019-03-15 (Fri) 04:17:36

I’ve written a new 65-page ebook, titled DevOps for the Database. Drawn from what I’ve observed over the last 15 or so years in my own experience as well as that of many others, it tries to answer why and how some organizations deliver value to customers fast, run data-intensive applications with high quality, and have great engineering cultures—and why some don’t.


Craig Kerstiens: Fun with SQL: Text and system functions

planet postgresql - 2019-03-14 (Thu) 01:26:00

SQL by itself is great and powerful, and Postgres supports a broad array of more modern SQL, including things like window functions and common table expressions. But rarely do I write a query where I don’t want to tweak or format the data I’m getting back out of the database. Thankfully Postgres has a rich array of functions to help with converting or formatting data. These built-in functions save me from having to do the logic elsewhere or write my own functions; in other words, I have to do less work because Postgres has already done it for me, which I’m always happy about.

We’ve covered a set of functions earlier, today we’re going to look at some different categories of functions to dive deeper.

Manipulating strings

When building a nice clean report I’m often doing some cleaning up of the data. This could be as simple as combining my first_name and last_name columns. In that case a simple concatenation is quite useful:

SELECT first_name || ' ' || last_name AS name FROM users;

But as you might expect, I can do much more. If I’m performing a basic search for a keyword or delimiter, I can search for the position of that string. Then I can feed that into a CASE statement:

SELECT CASE WHEN position(',' in my_string) > 0 then True ELSE False END FROM my_table;

Of course, if you’re working with a field that is delimited, you may want to use the above to first confirm the delimiter is present. From there you would likely want to split that delimited field up. For that we can use split_part and grab the appropriate section of our delimited field:

SELECT CASE WHEN position(',' in my_string) > 0 then split_part(my_string, ',', 1) ELSE null END FROM my_table;
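For instance, against a literal the behavior is easy to see (a quick sketch):

SELECT position(',' in 'a,b,c');      -- returns 2
SELECT split_part('a,b,c', ',', 1);   -- returns 'a'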

Finally, if you’re dealing with messy input or machine data, there are a number of functions that can be handy for cleaning up that data (see the quick sketch after this list):

  • substr - allows you to extract a substring
  • rtrim - removes the longest string containing only the given characters from the right side
  • reverse - for the next time you’re asked to reverse a string in an interview :)
  • regexp_replace - for
[...]
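To make the first three of those concrete, a quick sketch (sample strings are my own):

SELECT substr('PostgreSQL', 1, 8);   -- 'Postgres'
SELECT rtrim('42.5000', '0');        -- '42.5'
SELECT reverse('stressed');          -- 'desserts'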

Michael Banck: Integrating Patroni into Debian

planet postgresql - 2019-03-13 (Wed) 21:16:14
Patroni is a PostgreSQL high availability solution with a focus on containers and Kubernetes. Until recently, the available Debian packages had to be configured manually and did not integrate well with the rest of the distribution. For the upcoming Debian 10 "Buster" release, the Patroni packages...

Hans-Juergen Schoenig: unaccent: Getting rid of umlauts, accents and special characters

planet postgresql - 2019-03-13 (Wed) 18:20:07

Database people dealing with natural languages are all painfully aware of the fact that encodings, special characters, accents and alike are usually hard to deal with. This is especially true if you want to implement search in a user friendly way.

Consider the following example in PostgreSQL: my name contains a couple of super inconvenient special characters, which will cause issues for people around the globe. The correct spelling of my surname is “Schönig”, which is pretty hard to type on most keyboards I have seen around the world. And who cares about my special characters anyway? People might just want to type “Schonig” into some search form and find information about me (ideally related to PostgreSQL and database work).

Here is the problem:

test=# SELECT 'Schönig' = 'Schonig';
 ?column?
----------
 f
(1 row)

test=# SELECT 'Schönig' = 'Schoenig';
 ?column?
----------
 f
(1 row)

The “=” operator compares those two strings and comes to the conclusion that they are not identical, so the correct answer is “false”. While that is true from a technical point of view, it can be a real issue, and end users might be unsatisfied with the result. Problems like that can make daily life pretty hard, so a solution is needed.

unaccent: PostgreSQL provides a useful extension

If you want to improve your user experience you can turn to the “unaccent” extension, which is shipped as part of the PostgreSQL contrib package. Installing “unaccent” is really easy:

test=# CREATE EXTENSION unaccent;
CREATE EXTENSION

In the next step you can call the “unaccent” function to clean a string and turn it into something more useful. This is what happens when we use the “unaccent” function on my name and the name of my PostgreSQL support company:

test=# SELECT unaccent('Hans-Jürgen Schönig, Gröhrmühlgasse 26, Wiener Neustadt');
                        unaccent
---------------------------------------------------------
 Hans-Jurgen Schonig, Grohrmuhlgasse 26, Wiener[...]
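A natural application (a sketch; not necessarily how the article continues) is to unaccent both sides of a search comparison:

test=# SELECT unaccent('Schönig') = unaccent('Schonig');
 ?column?
----------
 t
(1 row)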

Kristi Anderson: Managing High Availability in PostgreSQL – Part II

planet postgresql - 2019-03-13 (Wed) 00:10:46

Are you deploying PostgreSQL in the cloud and want to understand your options for achieving high availability? In our previous blog post, Managing High Availability in PostgreSQL – Part I, we discussed the capabilities and functioning of PostgreSQL Automatic Failover (PAF) by ClusterLabs. In Part II, we’re introducing you to an alternative open source tool, Replication Manager from 2ndQuadrant, to be closely followed by Part III where we dive into our third alternative, Patroni by Zalando.

Replication Manager (repmgr)

repmgr is an open-source tool suite developed by 2ndQuadrant for managing replication and failover of your PostgreSQL clusters. It provides the tools to set up, configure, manage, and monitor replication of PostgreSQL, and also enables you to perform manual switchover and failover tasks using the repmgr utility. This free tool supports and enhances PostgreSQL’s built-in streaming replication.

Replication Manager provides two main tools to manage replication and failover of PostgreSQL.

repmgr
  • A command-line interface utility which enables you to perform various administrative tasks.
  • repmgr enables you to set up standby servers, promote standbys, do a switchover, and monitor the status of your PostgreSQL cluster.
  • It also provides a dry-run option for almost all of the administrative commands (see the sample invocations below).
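For illustration, a few typical invocations, sketched from the repmgr documentation (hostnames and paths are hypothetical):

$ repmgr -h primary.example.com -U repmgr -d repmgr -f /etc/repmgr.conf standby clone
$ repmgr -f /etc/repmgr.conf standby register
$ repmgr -f /etc/repmgr.conf standby switchover --dry-run
$ repmgr -f /etc/repmgr.conf cluster show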
repmgrd

This is the daemon which:

  • Actively monitors the PostgreSQL clusters and performs necessary actions based on the state of the cluster.
  • Performs automatic failover in case the primary node goes down by promoting the most eligible standby as the new primary.
  • Provides an option to monitor and store the data related to replication performance.
  • Provides notification by invoking the user scripts for registered events.
How it Works

repmgr not only manages the replication of PostgreSQL clusters, but also has capabilities for setting up the standby servers for replication. Following the initial installation, we need to make changes to the repmgr configuration file (repmgr.conf) with the required details on each server.

[...]
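A minimal sketch of what repmgr.conf might contain on one node (parameter names per the repmgr documentation; values hypothetical):

node_id=1
node_name='node1'
conninfo='host=node1.example.com user=repmgr dbname=repmgr'
data_directory='/var/lib/postgresql/data'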

Regina Obe: PostGIS 2.5.2, 2.4.7, 2.3.9 Released

planet postgresql - 2019-03-11 (Mon) 09:00:00

The PostGIS development team is pleased to provide bug fix releases 2.5.2, 2.4.7, and 2.3.9 for the 2.5, 2.4, and 2.3 stable branches.

These are the first versions able to compile against Proj 6.0.0. You must upgrade to these if you are using Proj 6.

2.5.2 This release supports PostgreSQL 9.3-11 (it will compile against PostgreSQL 12 but will not pass tests; use it only for pg_upgrade. You are encouraged to use the unreleased PostGIS 3.0 branch with PostgreSQL 12, which has features specifically designed to take advantage of what is new in PostgreSQL 12).

2.4.7 This release supports PostgreSQL 9.3-10.

2.3.9 This release supports PostgreSQL 9.2-10.

View all closed tickets for 2.5.2, 2.4.7, 2.3.9.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

If you use the other extensions packaged with postgis, make sure to upgrade those as well:

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.


Bruce Momjian: The High Value of Data

planet postgresql - 2019-03-09 (Sat) 01:30:01

There was a time when every piece of software had to be purchased: operating systems, compilers, middleware, text editors. Those days are mostly gone, though there are a few holdouts (e.g., MS Windows, vertical applications). What happened is that open source software has come to dominate most uses, and software selection is rarely based on cost requirements.

One of the final holdouts for purchased software is databases. You might think that is because database software is complex, but so is the other software mentioned. The big difference is that while non-database software processes or stores user data in a simple or standard way, databases lock user data inside the database. This data locking is a requirement for fast, reliable, and concurrent data access, but it does place the database on a different monetary plane.

In any organization, it is really their data that is valuable, and because the database is so tightly coupled to that valuable data, database software becomes something that is worth significant investment. This explains why databases have resisted the open source commoditization that has happened to so much other purchased software. (Custom database applications tied to complex business logic have also slowed the migration.)



Michael Paquier: Postgres 12 highlight - Connection slots and WAL senders

planet postgresql - 2019-03-08 (Fri) 16:28:51

The maximum number of connections a PostgreSQL server can accept is configured using max_connections. When attempting to connect to a server already at full capacity, the server logically complains:

$ psql
psql: FATAL:  sorry, too many clients already

It is possible to define connection policies, for example at the database level with CREATE DATABASE or ALTER DATABASE, and even to have superuser-only connection slots using superuser_reserved_connections, so that a superuser has reserved space to perform some activities even on a full server.
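For example, both policies are one-liners (the database name is hypothetical):

ALTER DATABASE appdb CONNECTION LIMIT 50;   -- per-database cap

# in postgresql.conf, keep a few slots free for superusers:
superuser_reserved_connections = 3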

When creating a connection for replication purposes, the connection is spawned under a special status, with the context of a WAL sender which is in charge of the communication and speaks the replication protocol, so that it is possible to do replication, take base backups, etc. A lot of those tasks are important for availability. One problem, however, is that this connection uses a shared memory slot which is part of max_connections. Hence, it is possible to get into a case where an application bloats the connections, and it becomes impossible to connect with a replication connection. This can be rather bad for availability, because it could prevent the creation of a new standby after a failover, for example.

One way to counter that is to connect to the server for base backups and standbys with a superuser role. Still, this is not completely right either, as by design there are replication roles, which allow a role to connect to a server in replication mode without being a superuser. In this context, this is where the following commit of Postgres 12 becomes handy:

commit: ea92368cd1da1e290f9ab8efb7f60cb7598fc310
author: Michael Paquier <michael@paquier.xyz>
date: Tue, 12 Feb 2019 10:07:56 +0900

Move max_wal_senders out of max_connections for connection slot handling

Since its introduction, max_wal_senders is counted as part of
max_connections when it comes to define how many connection slots can be
used for replication connections with a WAL sender context. This can lead t[...]
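The practical effect, sketched with hypothetical values, is that the two pools are now sized independently:

# postgresql.conf
max_connections = 100   # slots for regular client sessions
max_wal_senders = 10    # slots for WAL senders, counted separately since this commit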

Viorel Tabara: Benchmarking Managed PostgreSQL Cloud Solutions - Part One: Amazon Aurora

planet postgresql - 2019-03-08 (Fri) 07:37:18

This blog starts a multi-series documenting my journey on benchmarking PostgreSQL in the cloud.

The first part includes an overview of benchmarking tools, and kickstarts the fun with Amazon Aurora PostgreSQL.

Selecting the PostgreSQL Cloud Services Providers

A while ago I came across the AWS benchmark procedure for Aurora, and thought it would be really cool if I could take that test and run it on other cloud hosting providers. To Amazon’s credit, of the three best-known utility computing providers (AWS, Google, and Microsoft), AWS is the only major contributor to PostgreSQL development, and the first to offer a managed PostgreSQL service (dating back to November 2013).

While managed PostgreSQL services are also available from a plethora of PostgreSQL hosting providers, I wanted to focus on those three cloud computing providers, since their environments are where many organizations looking for the advantages of cloud computing choose to run their applications, provided they have the required know-how for managing PostgreSQL. I am a firm believer that in today’s IT landscape, organizations working with critical workloads in the cloud would greatly benefit from the services of a specialized PostgreSQL service provider that can help them navigate the complex world of GUCs and myriads of SlideShare presentations.

Selecting the Right Benchmark Tool

Benchmarking PostgreSQL comes up quite often on the performance mailing lists, and, as stressed countless times, the tests are not intended to validate a configuration for a real-life application. However, selecting the right benchmark tool and parameters is important in order to gather meaningful results. I would expect every cloud provider to provide procedures for benchmarking their services, especially when the first cloud experience may not start on the right foot. The good news is that two of the three players in this test have included benchmarks in their documentation. The AWS Benchmark Procedure for Aurora guide is easy to find, available right on

[...]
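For reference, pgbench, the benchmark tool such procedures commonly build on, is typically driven along these lines (scale factor and duration are hypothetical):

$ pgbench -i -s 100 benchdb           # initialize with scale factor 100
$ pgbench -c 32 -j 4 -T 600 benchdb   # 32 clients, 4 threads, 10 minutes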

Bruce Momjian: Tool Vendor/Support Options

planet postgresql - 2019-03-08 (Fri) 00:15:02

Having explained that lock-in is not a binary option, what are the Postgres tool support options available, at a high level?

  • Develop in-house database tools and support them yourself
  • Use open source tools and support them yourself
  • Use open source tools with vendor support (hopefully the vendor supports your chosen tools)
  • Use closed-source tools with vendor support

Of course, you can mix and match these options, i.e., use a support vendor for the open source tools they support, use other open source tools they don't support, and use some tools you develop in-house, e.g.:

  • open source Postgres database (vendor support)
  • pgBackRest for backup (vendor support)
  • patroni for failover (community support channels)
  • In-house developed tools (self support)

I went over these options in more detail in this presentation. This diversity of options is rarely available for closed-source, single-vendor database solutions.


Laurenz Albe: “LOCK TABLE” can harm your database’s health

planet postgresql - 2019-03-07 (Thu) 18:00:17
© Laurenz Albe 2019

 

Many people know that explicit table locks with LOCK TABLE are bad style and usually a consequence of bad design. The main reason is that they hamper concurrency and hence performance.

Through a recent support case I learned that there are even worse effects of explicit table locks.

Table locks

Before an SQL statement uses a table, it takes the appropriate table lock. This prevents concurrent use that would conflict with its operation. For example, reading from a table will take an ACCESS SHARE lock, which conflicts with the ACCESS EXCLUSIVE lock that TRUNCATE needs.

You can find a description of the individual lock levels in the documentation, along with a matrix that shows which lock levels conflict with each other.

You don’t have to take these table locks explicitly; PostgreSQL does it for you automatically.
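A quick two-session sketch of the conflict described above (the table name is hypothetical):

-- session 1
BEGIN;
SELECT count(*) FROM atable;   -- takes ACCESS SHARE, held until COMMIT

-- session 2
TRUNCATE atable;               -- needs ACCESS EXCLUSIVE, so it blocks until session 1 commits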

Explicit table locks with the LOCK TABLE statement

You can also explicitly request locks on a table with the LOCK statement:

LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

There are some cases where such an explicit table lock is useful and appropriate. One example is a bulk update of a table, where you want to avoid deadlocks with other transactions that modify the table at the same time. In that case you would take a SHARE lock on the table, which prevents concurrent data modifications:

LOCK atable IN SHARE MODE;

Typical mistakes with LOCK TABLE

Unfortunately, most people don’t think hard enough and just use “LOCK atable” without realizing that the default lock mode is ACCESS EXCLUSIVE, which blocks all concurrent access to the table, even read access. This harms performance more than necessary.

But most of the time, tables are locked because developers don’t know that there are less restrictive ways to achieve what they want:

  • You don’t want concurrent transactions to modify a row between the time you read it and the time you update it? Use SELECT ... FOR UPDATE!
    If concurrent modifications are unlikely and you are not sure that you ar
[...]
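As a sketch of that first alternative (table and column names hypothetical):

BEGIN;
SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;  -- locks only this row
UPDATE accounts SET balance = balance - 100 WHERE id = 42;
COMMIT;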
