postgresql

Bruce Momjian: Limiting Superuser Activity

planet postgresql - 2019-01-31 (Thu) 02:15:01

This interesting email thread explores the question of how much you can prevent or detect unauthorized database superuser activity. The main conclusions from the thread are:

  • It is impossible to restrict database administrator access without hindering their ability to perform their jobs
  • Monitoring superuser activity is the most reasonable way to detect and hopefully discourage unauthorized activity
  • Monitoring includes:
    • Assign a separate account to each administrator for auditing purposes; do not use generic/shared accounts
    • Use an auditing tool to record database activity, e.g., pgAudit
  • Use syslog to send database logs to a computer not under database administrators' control (a minimal configuration sketch follows this list)
    • Record all shell command activity in a similar way
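For the syslog item, the Postgres side is only a few settings; a minimal sketch (the syslog daemon itself must still be separately configured to forward log lines to the remote, DBA-inaccessible host):

-- route server logs to syslog so the daemon can ship them off-host
alter system set log_destination = 'syslog';
alter system set syslog_facility = 'local0';
select pg_reload_conf();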

There is also a helpful summary email.

Category: postgresql

Liaqat Andrabi: Webinar: Banking on Postgres – Financial Application Considerations [Follow up]

planet postgresql - 2019-01-31 (Thu) 00:34:37

The demand for PostgreSQL within the financial industry has been rapidly increasing in recent years, mainly due to reduced licensing costs, a better choice of open source tools, and the robust enterprise features that PostgreSQL provides.

2ndQuadrant hosted the “Banking on Postgres” webinar to discuss attributes of financial databases based on Postgres, configuration processes, hardware needs, availability, backups, and more.

The webinar was presented by Shaun Thomas, Principal Consultant at 2ndQuadrant. Those who weren’t able to attend the live event can now view the recording here.

For any questions or comments regarding Postgres-BDR, please send an email to info@2ndQuadrant.com.

Category: postgresql

Pavel Stehule: plpgsql_check - new report for code coverage ratio calculation

planet postgresql - 2019-01-30 (Wed) 18:11:00
A few months ago I integrated a profiler into plpgsql_check.
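For context, collecting a profile typically looks like the following sketch (based on the plpgsql_check README; double-check the GUC name against the version you run):

load 'plpgsql_check';
set plpgsql_check.profiler to on;  -- enable the integrated profiler
select fx(1000);                   -- run the function so per-line timings accumulate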

The result of profiling is line-oriented:

postgres=# select lineno, avg_time, source from plpgsql_profiler_function_tb('fx(int)');
┌────────┬──────────┬────────────────────────────────────────────────────────────────────┐
│ lineno │ avg_time │ source                                                             │
╞════════╪══════════╪════════════════════════════════════════════════════════════════════╡
│      1 │          │                                                                    │
│      2 │          │ declare result int = 0;                                            │
│      3 │    0.075 │ begin                                                              │
│      4 │    0.202 │   for i in 1..$1 loop                                              │
│      5 │    0.005 │     select result + i into result; select result + i into result; │
│      6 │          │   end loop;                                                        │
│      7 │        0 │   return result;                                                   │
│      8 │          │ end;                                                               │
└────────┴──────────┴────────────────────────────────────────────────────────────────────┘
(9 rows)

This format is quite readable, but it is not practical for calculating code coverage metrics. So this week I wrote a new function that produces a report based on statements:

CREATE OR REPLACE FUNCTION public.fx1(a integer)
RETURNS integer
LANGUAGE plpgsql
1 AS $function$
2 begin
3 if a > 10 then
4 raise notice 'ahoj';
5 return -1;
6 else
7 raise notice 'nazdar';
8 return 1;
9 end if;
10 end;
11 $function$

postgres=# select stmtid, parent_stmtid, parent_note, lineno, exec_stmts, stmtname
from plpgsql_profiler_function_statements_tb('fx1');
┌────────┬───────────────┬─────────────┬────────┬────────────┬─────────────────┐
│ stmtid │ parent_stmtid │ parent_note │ lineno │ exec[...]
Category: postgresql

Dave Cramer: A Guide to Building an Active-Active PostgreSQL Cluster

planet postgresql - 2019-01-29 (Tue) 23:39:33

One of the toughest challenges facing database engineers today is ensuring their data is always accessible so they can meet the high-availability requirements for their applications.

While this problem may seem trivial in the world of applications where one can have many instances behind geographically distributed load balancers, in the database world where there is only one version of the data globally, the problem faces many obstacles.

PostgreSQL replication has advanced considerably in recent major releases, including continuous improvements to streaming replication and the addition of logical replication in PostgreSQL 10. While these capabilities represent important enhancements to PostgreSQL, enabling users to address a wide variety of clustered architectures, they do not (easily) address the use cases where the application requires access to an updatable database in two or more geographic regions - often referred to as an "active-active" cluster.

More specifically, an active-active cluster is one where the application can write to any instance in the cluster and the data will be written to all of the instances in the cluster, enabling each instance in the cluster to be used to:

  • Provide near zero downtime, as the new instance is already in a read/write state; there is no need to reconfigure it.
  • Provide near zero downtime upgrades from one version to another.
  • Improve latency for users in geographically distributed clusters: by providing an instance physically closer to the user, latency is reduced.

While there are a number of proprietary solutions that attempt to address active-active PostgreSQL requirements, this post and the series to follow provide users with potential reference architectures and configurations that enable active-active PostgreSQL setups using entirely open source software.

This post is of course only one approach to deploying an active-active PostgreSQL cluster.  There are other ways to deploy an active-active setup with PostgreSQL.  I will cover some ways to do this in the f

[...]
Category: postgresql

Kaarel Moppel: Major feature update for the pgwatch2 Postgres monitoring tool

planet postgresql - 2019-01-29 (Tue) 18:13:23

Again I could make good (hopefully) use of the “slow time” around the turn of the year…and managed to push out another set of features for our Open Source PostgreSQL monitoring tool called pgwatch2 – so here’s a quick overview of the changes in this post. Continuing the tradition, I’m calling it “Feature Pack 4”, as it’s mostly about new features. Git and Docker images carry version number 1.5.0. As our last pgwatch2-related blogpost covered only 1.4.0, I’ll also include here the most important stuff from the 1.4.5 minor feature release.

Highlight – Monitoring Postgres with Postgres

This is the biggest one this time – finally, and quite appropriately for “Postgres-minded” people, there’s now a chance to store all the gathered metrics in Postgres! This of course doesn’t necessarily mean that Postgres is best for storing time-series data, although it performs very nicely thanks to JSONB…but in general it’s a good compromise – more disk space (~3-4x) at comparable query times to InfluxDB…but with the full power of SQL! Meaning some saved time learning a new (and quite limited) query language. And after all, only a few people are running dozens and dozens of databases, so performance is mostly not an issue. And on the plus side, we can now ask questions that were previously plainly not possible (no joins, remember) or were only possible by storing some extra columns of data (de-normalizing).

The new functionality is designed for the latest Postgres version, 11, but as people run all kinds of different versions and might not want to set up a new cluster, there is also a legacy mode, though it will cost more IO. In total there are 4 different “schema modes”, so that people can optimize their IO based on their needs:

  • a separate normal table for each distinct metric (legacy mode)
  • a separate partitioned table for each distinct metric + weekly partitions
  • a separate partitioned table for each distinct metric + separate sub-table for each distinct monitored host + monthly partitions. Best for monitoring 50+ DB-s
  • custom mode – all data inserted into a si
[...]
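To make the schema modes concrete, below is a hypothetical sketch of the second mode, a per-metric table with weekly time partitions (illustrative only; pgwatch2's actual DDL may differ):

-- one partitioned table per metric, partitioned by time (PostgreSQL 10+)
create table metric_db_stats (
    time   timestamptz not null,
    dbname text        not null,
    data   jsonb       not null
) partition by range (time);

create table metric_db_stats_y2019w05 partition of metric_db_stats
    for values from ('2019-01-28') to ('2019-02-04');  -- one week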
Category: postgresql

Bruce Momjian: Postgres Encryption Maze

planet postgresql - 2019-01-28 (Mon) 23:30:01

This wide-ranging email thread covers many of the challenges of adding encryption to Postgres. There is discussion of:

  • The need to understand the threats you are protecting against, "For anyone to offer a proper solution, you need to say what purpose your encryption will serve."
  • The need for layers of protection
  • The questionable usefulness of storage encryption, "Thus, unless you move your DB server on a regular basis, I can't see the usefulness of whole database encryption (WDE) on a static machine."
  • The value of encrypting network storage, "Having the 'disk' associated with a specific server encrypted can provide some level of protection from another machine which also has access to the underlying infrastructure from being able to access that data."
  • Credit Card industry requirements, "Non-Public Information (NPI) data should not be logged nor stored on a physical device in non-encrypted mode."
  • The limits of per-column encryption, "It is extremely unlikely you just want all the data in the database encrypted." (These five emails from another thread, 1, 2, 3, 4, 5, also discuss this topic; a small per-column sketch follows this list.)
  • The many other database products that support built-in column-level encryption
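As a concrete reference for the per-column point, a minimal pgcrypto sketch looks like this (illustrative; key management, which is the hard part, is exactly what is not shown):

create extension if not exists pgcrypto;
create table customer (id serial primary key, card bytea);
-- encrypt on the way in, decrypt on the way out; the key lives with the application
insert into customer (card) values (pgp_sym_encrypt('4111-1111-1111-1111', 'app-secret-key'));
select pgp_sym_decrypt(card, 'app-secret-key') from customer;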

As you can see, the discussion was all over the map. The Postgres project probably needs to do a better job communicating about these options and their benefits.

Category: postgresql

Sebastian Insausti: PostgreSQL High Availability with Master-Slave & Master-Master Architectures

planet postgresql - 2019-01-28 (Mon) 16:23:15

Below is an excerpt from our whitepaper “PostgreSQL Management and Automation with ClusterControl” which can be downloaded for free.

Database servers can work together to allow a second server to take over quickly if the primary server fails (high availability), or to allow several computers to serve the same data (load balancing).

For HA configuration we can have several architectures, but the basic ones would be master-slave and master-master architectures.

PostgreSQL Master-Slave Architectures

These architectures enable us to maintain a master database with one or more standby servers ready to take over operations if the primary server fails. These standby databases will remain synchronized (or almost synchronized) with the master.

The replication between the master and the slaves can be done via SQL statements (logical standbys) or via internal data structure modifications (physical standbys). PostgreSQL uses a stream of write-ahead log (WAL) records to keep the standby databases synchronized. If the main server fails, the standby contains almost all of the data of the main server and can quickly be made the new master database server. This can be synchronous or asynchronous, and can only be done for the entire database server.

Setting up streaming replication is a task that requires some steps to be followed thoroughly. For those steps and some more background on this subject, please see: Become a PostgreSQL DBA - How to Setup Streaming Replication for High Availability.
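As a taste of those steps, the primary-side pieces can be expressed in plain SQL; a hedged sketch (pg_hba.conf entries and the standby's recovery configuration are not shown, and names are placeholders):

alter system set wal_level = replica;
alter system set max_wal_senders = 5;
create role replicator with replication login password 'changeme';
select pg_create_physical_replication_slot('standby1');  -- optional, but retains needed WAL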

From version 10, PostgreSQL includes the option to set up logical replication.

Logical replication allows a database server to send a stream of data modifications to another server. PostgreSQL logical replication constructs a stream of logical data modifications from the WAL. Logical replication allows the data changes from individual tables to be replicated. It doesn’t require a particular server to be designated as a ma

[...]
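To make that concrete, the minimal publication/subscription DDL from PostgreSQL 10 onward looks like this (table name and connection details are placeholders):

-- on the publisher:
create publication my_pub for table my_table;
-- on the subscriber:
create subscription my_sub
    connection 'host=primary dbname=app user=replicator'
    publication my_pub;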
Category: postgresql

Adrien Nayrat: pg_sampletolog: An extension to log a sample of statements

planet postgresql - 2019-01-28 (Mon) 15:00:00
This article will introduce you to an extension that I developed in order to log a sample of statements. When a DBA is faced with a performance problem, they will inspect the logs, but also the pg_stat_statements view. An expensive query will appear in pg_stat_statements, and in the logs if the query exceeds log_min_duration_statement. Thus, the DBA can replay the query and obtain its execution plan to investigate. To go even further, it is possible to enable the auto_explain extension.
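For reference, enabling the stock auto_explain module for a session looks like this (the threshold value is illustrative):

load 'auto_explain';
set auto_explain.log_min_duration = '250ms';  -- log plans for statements slower than this
set auto_explain.log_analyze = on;            -- include actual row counts and timings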
Category: postgresql

Dan Langille: Fixing a sequence value

planet postgresql - 2019-01-28 (Mon) 03:58:59
Yesterday I copied data from the old production server to the new production server. One thing I missed, but did think about at the time, was updating the sequence used by the table in question. Looking at the table definition: The report_log_id_seq value will be wrong. When the reports run, they will use values for [...]
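The usual fix is to bump the sequence to the table's current maximum; a sketch (the id column name is an assumption here):

select setval('report_log_id_seq', (select max(id) from report_log));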
Category: postgresql

Dan Langille: Moving the email notifications to the new server

planet postgresql - 2019-01-27 (Sun) 05:43:54
Ever since the new server went into production, sometime in 2017, the notification emails, which tell you what has changed, have been coming from the old server. I never did that changeover. The old server was still processing incoming commits, so the notifications were accurate. However, new accounts would not get notifications and changes to [...]
Category: postgresql

Sebastian Insausti: Top GUI Tools for PostgreSQL

planet postgresql - 2019-01-26 (Sat) 04:49:42

Managing databases from the command line does come with a learning curve to get the most out of it.

The command line can sometimes be arduous and the display may not be optimal for what you are doing.

Browsing through databases and tables, checking indexes or user privileges, monitoring, managing, and even coding can get really messy when trying to handle it through the console.

It’s not that you don't need to know the command-line commands (it's for sure a must), but there are some tools that can help you speed up many of the daily DBA tasks.

Let's look at what these tools are about and review some of them.

What is a GUI Tool?

A GUI, or Graphical User Interface, is software that simplifies the tasks of users through graphical icons and visual indicators. Actions are performed by using graphical elements.

Why Should I Use a GUI Tool?

Using a GUI is not a must, but it can be useful. One of the main advantages of GUIs is that they are, in general, easier to learn than a long list of commands, and one action in the GUI may generate the several commands needed to perform the task.

Another advantage could be that the GUI is more friendly than the command line, and in most cases, you don't need any programming or sysadmin knowledge to use it.

But you should be careful before performing a task from the GUI: by pressing the wrong button, you could cause a big issue, like deleting a table; for this reason, do be careful when using this kind of tool.

Top GUI Tools for PostgreSQL

Now, let's look at some of the most common GUI tools for PostgreSQL.

Note that, for the installation examples, we'll test it on Ubuntu 18.04 Bionic.

pgAdmin

pgAdmin is one of the most popular Open Source administration and development platforms for PostgreSQL.

It's designed to meet the needs of both novice and experienced PostgreSQL users alike, providing a powerful graphical interface that simplifies the creation, maintenance and use of database objects.

It's supported on Linux, Mac OS X, and Windows. It supports all PostgreSQL

[...]
Category: postgresql

Shaun M. Thomas: PG Phriday: Terrific Throughput Tracking

planet postgresql - 2019-01-26 (Sat) 02:00:24

Postgres has a lot of built-in information functions that most people don’t know about. Some of these are critical components for identifying replication lag, but what if we could use them for something else, like throughput?

This man’s one simple trick can track actual database throughput; DBAs hate him!

Everybody Knows

Let’s take a look at a common query we might use to track replication lag between a Postgres 11 Primary and one or more Replica nodes.

SELECT client_addr,
       pg_wal_lsn_diff( pg_current_wal_lsn(), sent_lsn ) AS sent_lag,
       pg_wal_lsn_diff( pg_current_wal_lsn(), write_lsn ) AS write_lag,
       pg_wal_lsn_diff( pg_current_wal_lsn(), flush_lsn ) AS flush_lag,
       pg_wal_lsn_diff( pg_current_wal_lsn(), replay_lsn ) AS replay_lag
  FROM pg_stat_replication;

This gives us the number of bytes the Primary has transmitted to remote servers, given a number of different metrics. How much have we sent? Has that data been written to disk? How much has been synced and is thus safe from crashes? How much has actually been replayed on the Postgres data files? For more information on the pg_stat_replication view, check out the documentation.

If we were using slots instead, we might do this:

SELECT slot_name,
       pg_wal_lsn_diff( pg_current_wal_lsn(), restart_lsn ) as restart_lag,
       pg_wal_lsn_diff( pg_current_wal_lsn(), confirmed_flush_lsn ) as flush_lag
  FROM pg_replication_slots;

There is somewhat less information in pg_replication_slots, as it only really tells us the minimum LSN position the Primary needs to retain for that replica. Of course, if that stops advancing while the Primary keeps working, that’s lag we can see even while the replica is disconnected.

This is especially important with replication slots, since they necessarily mean the Primary is retaining WAL files a replica may need. We don’t want the replay lag v

[...]
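For the curious, the same LSN arithmetic can be turned into a throughput number from psql; a sketch of the idea (my own illustration, not necessarily the exact trick the post lands on):

select pg_current_wal_lsn() as lsn \gset start_
select pg_sleep(60);  -- wait one minute
select pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn') as wal_bytes_per_minute;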
Category: postgresql

Bruce Momjian: Pooler Authentication

planet postgresql - 2019-01-26 (Sat) 00:15:01

One frequent complaint about connection poolers is the limited number of authentication methods they support. While some of this is caused by the large amount of work required to support all 14 Postgres authentication methods, the bigger reason is that only a few authentication methods allow for the clean passing of authentication credentials through an intermediate server.

Specifically, all the password-based authentication methods (scram-sha-256, md5, password) can easily pass credentials from the client through the pooler to the database server. (This is not possible using SCRAM with channel binding.) Many of the other authentication methods, e.g. cert, are designed to prevent man-in-the-middle attacks and therefore actively thwart passing through of credentials. For these, effectively, you have to set up two sets of credentials for each user — one from client to pooler, and another from pooler to database server, and keep them synchronized.

A pooler built into Postgres would have fewer authentication pass-through problems, though internal poolers have some downsides too, as I already stated.

Category: postgresql

Vasilis Ventirozos: PostgreSQL 12 : Where in copy from

planet postgresql - 2019-01-25 (Fri) 19:44:00
Five days ago, a commit made it into 12devel that implements a WHERE clause in COPY FROM.
Today we're gonna see how it works and how someone could achieve the same thing by using file_fdw.

To begin with, let's create a table and put some data in it.

create table test (
    id int,
    date timestamp without time zone,
    name text
);

insert into test (id,date) select generate_series (1,1000000),'2015-01-01';
insert into test (id,date) select generate_series (1000001,2000000),'2016-01-01';
insert into test (id,date) select generate_series (2000001,3000000),'2017-01-01';
insert into test (id,date) select generate_series (3000001,4000000),'2018-01-01';
insert into test (id,date) select generate_series (4000001,5000000),'2019-01-01';
Now, let's make this a bit larger than 170MB, dump the data to csv, and truncate the table:
monkey=# insert into test select * from test;
INSERT 0 5000000
monkey=# insert into test select * from test;
INSERT 0 10000000
monkey=# select count(*) from test;
  count
----------
 20000000
(1 row)
monkey=# copy test to '/home/vasilis/test.csv' with csv;
COPY 20000000
monkey=# truncate test;
TRUNCATE TABLE
vasilis@Wrath > ls -lh ~/test.csv
-rw-r--r-- 1 vasilis vasilis 759M Jan 25 12:24 /home/vasilis/test.csv
Our test file is about 750MB. Now, with an empty table, let's import only the rows that are up to 2 years old:
monkey=# copy test from '/home/vasilis/test.csv' with csv where date >= now() - INTERVAL '2 year';
COPY 8000000
Time: 17770.267 ms (00:17.770)
It worked, awesome!
Now, let's compare to the alternative, file_fdw:
monkey=# CREATE EXTENSION file_fdw;
CREATE EXTENSION
Time: 7.288 ms
monkey=# CREATE SERVER pgtest FOREIGN DATA WRAPPER file_fdw;
CREATE SERVER
Time: 3.957 ms
monkey=# create foreign table file_test (id int, date timestamp without time zone, name text) server pgtest options (filename '/home/vasilis/test.csv', format 'csv');
CREATE FOREIGN TABLE
Time: 16.463 ms
monkey=# truncate test;
TRUNCATE TABLE
Time: 15.123 ms
monkey=# insert into test select * from file_test [...]
Category: postgresql

Jehan-Guillaume (ioguix) de Rorthais: Build a PostgreSQL Automated Failover in 5 minutes

planet postgresql - 2019-01-25 (Fri) 06:20:00

I’ve been working with Vagrant to quickly build a fresh test cluster for PAF development. Combined with virsh snapshot-related commands, I save a lot of time during my tons of tests by quickly rolling back the whole cluster to its initial state.

Continue Reading »
Category: postgresql

Robert Haas: How Much maintenance_work_mem Do I Need?

planet postgresql - 2019-01-25 (Fri) 02:04:00
While I generally like PostgreSQL's documentation quite a bit, there are some areas where it is not nearly specific enough for users to understand what they need to do. The documentation for maintenance_work_mem is one of those places. It says, and I quote, "Larger settings might improve performance for vacuuming and for restoring database dumps," but that isn't really very much help, because if it might improve performance, it also might not improve performance, and you might like to know which is the case before deciding to raise the value, so that you don't waste memory.  TL;DR: Try maintenance_work_mem = 1GB.  Read on for more specific advice.
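A quick way to try that TL;DR for a single session (the table name is hypothetical; ALTER SYSTEM would persist the setting instead):

set maintenance_work_mem = '1GB';
vacuum verbose big_table;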

Read more »
Category: postgresql

Umur Cubukcu: Microsoft Acquires Citus Data: Creating the World’s Best Postgres Experience Together

planet postgresql - 2019-01-25 (Fri) 01:41:00

Today, I’m very excited to announce the next chapter in our company’s journey: Microsoft has acquired Citus Data.

When we founded Citus Data eight years ago, the world was different. Clouds and big data were newfangled. The common perception was that relational databases were, by design, scale up only—limiting their ability to handle cloud scale applications and big data workloads. This brought the rise of Hadoop and all the other NoSQL databases people were creating at the time. At Citus Data, we had a different idea: that we would embrace the relational database, while also extending it to make it horizontally scalable, resilient, and worry-free. That instead of re-implementing the database from scratch, we would build upon PostgreSQL and its open and extensible ecosystem.

Fast forward to 2019 and today’s news: we are thrilled to join a team who deeply understands databases and is keenly focused on meeting customers where they are. Both Citus and Microsoft share a mission of openness, empowering developers, and choice. And we both love PostgreSQL. We are excited about joining forces, and the value that doing so will create: Delivering to our community and our customers the world’s best PostgreSQL experience.

As I reflect on our Citus Data journey, I am very proud of what our team has accomplished. We created Citus to transform PostgreSQL into a distributed database—giving developers game-changing performance improvements and delivering queries that are orders of magnitude faster than proprietary implementations of Postgres. We packaged Citus as an open source extension of PostgreSQL—so you could always stay current with the latest version of Postgres, unlike all forks of databases prior to it. We launched our Citus Cloud database as a service and grew it to power billions of transactions every day—creating the world’s first horizontally scalable relational database that you can run both on premises, and as a fully-managed service on the cloud.

The most fulfilling part of this journey has been seeing all the things our

[...]
Category: postgresql

Jeff McCormick: What's New in Crunchy PostgreSQL Operator 3.5

planet postgresql - 2019-01-25 (Fri) 01:12:00

Crunchy Data is happy to announce the release of the open source PostgreSQL Operator 3.5 for Kubernetes project, which you can find here: https://github.com/CrunchyData/postgres-operator/

This latest release provides further feature enhancements designed to support users intending to deploy large-scale PostgreSQL clusters on Kubernetes, with enterprise high-availability and disaster recovery requirements.

When combined with the Crunchy PostgreSQL Container Suite, the PostgreSQL Operator provides an open source, Kubernetes-native PostgreSQL-as-a-Service capability.

Read on to see what is new in PostgreSQL Operator 3.5.

Category: postgresql

Robert Haas: Who Contributed to PostgreSQL Development in 2018?

planet postgresql - 2019-01-24 (Thu) 03:57:00
This is my third annual post on who contributes to PostgreSQL development.  I have been asked a few times to include information on who employs these contributors, but I have chosen not to do that, partly but not only because I couldn't really vouch for the accuracy of any such information, nor would I be able to make it complete.  The employers of several people who contributed prominently in 2018 are unknown to me.
Read more »
Category: postgresql

Bruce Momjian: Synchronizing Authentication

planet postgresql - 2019-01-24 (Thu) 02:00:01

I have already talked about external password security. What I would like to talk about now is keeping an external-password data store synchronized with Postgres.

Synchronizing the password is not the problem (the password is only stored in the external password store), but what about the existence of the user? If you create a user in LDAP or PAM, you would like that user to also be created in Postgres. Another synchronization problem is role membership. If you add or remove someone from a role in LDAP, it would be nice if the user's Postgres role membership were also updated.

ldap2pg can do this in batch mode. It will compare LDAP and Postgres and modify Postgres users and role membership to match LDAP. This email thread talks about a custom solution that instantly creates users in Postgres when they are created in LDAP, rather than waiting for a periodic run of ldap2pg.
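On the Postgres side, such a sync ultimately boils down to ordinary role DDL; an illustrative sketch with hypothetical names (ldap2pg generates statements like these for you):

create role alice login;       -- user appeared in LDAP
grant app_readers to alice;    -- LDAP group membership mirrored as a role grant
revoke app_admins from alice;  -- membership removed in LDAP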

Category: postgresql
