planet postgresql

Subscribe to planet postgresql のフィード
Planet PostgreSQL
更新: 34分 47秒 前

Bruce Momjian: Cryptographically Authenticated Rows

2018-09-01(土) 08:15:01

When storing data in the database, there is an assumption that you have to trust the database administrator to not modify data in the database. While this is generally true, it is possible to detect changes (but not removal) of database rows.

To illustrate this, let's first create a table:

CREATE TABLE secure_demo ( id SERIAL, car_type TEXT, license TEXT, activity TEXT, event_timestamp TIMESTAMP WITH TIME ZONE, username NAME, hmac BYTEA);

Continue Reading »

カテゴリー: postgresql

Ibrar Ahmed: Tuning PostgreSQL Database Parameters to Optimize Performance

2018-09-01(土) 00:38:27

Out of the box, the default PostgreSQL configuration is not tuned for any particular workload. Default values are set to ensure that PostgreSQL runs everywhere, with the least resources it can consume and so that it doesn’t cause any vulnerabilities. It has default settings for all of the database parameters. It is primarily the responsibility of the database administrator or developer to tune PostgreSQL according to their system’s workload. In this blog, we will establish basic guidelines for setting PostgreSQL database parameters to improve database performance according to workload.

Bear in mind that while optimizing PostgreSQL server configuration improves performance, a database developer must also be diligent when writing queries for the application. If queries perform full table scans where an index could be used or perform heavy joins or expensive aggregate operations, then the system can still perform poorly even if the database parameters are tuned. It is important to pay attention to performance when writing database queries.

Nevertheless, database parameters are very important too, so let’s take a look at the eight that have the greatest potential to improve performance

PostgreSQL’s Tuneable Parameters shared_buffer

PostgreSQL uses its own buffer and also uses kernel buffered IO. That means data is stored in memory twice, first in PostgreSQL buffer and then kernel buffer. Unlike other databases, PostgreSQL does not provide direct IO. This is called double buffering. The PostgreSQL buffer is called shared_buffer which is the most effective tunable parameter for most operating systems. This parameter sets how much dedicated memory will be used by PostgreSQL for cache.

The default value of shared_buffer is set very low and you will not get much benefit from that. It’s low because certain machines and operating systems do not support higher values. But in most modern machines, you need to increase this value for optimal performance.

The recommended value is 25% of your total machine RAM. You should try

[...]
カテゴリー: postgresql

Sebastian Insausti: How to Monitor PostgreSQL using Nagios

2018-08-31(金) 18:33:28
Introduction

Regardless of database technology, it is necessary to have a monitoring setup,both to detect problems and take action, or simply to know the current state of our systems.

For this purpose there are several tools, paid and free. In this blog we will focus on one in particular: Nagios Core.

What is Nagios Core?

Nagios Core is an Open Source system for monitoring hosts, networks and services. It allows to configure alerts and has different states for them. It allows the implementation of plugins, developed by the community, or even allows us to configure our own monitoring scripts.

How to Install Nagios?

The official documentation shows us how to install Nagios Core on CentOS or Ubuntu systems.

Let's see an example of the necessary steps for the installation on CentOS 7.

Packages required

[root@Nagios ~]# yum install -y wget httpd php gcc glibc glibc-common gd gd-devel make net-snmp unzip

Download Nagios Core, Nagios Plugins and NRPE

[root@Nagios ~]# wget https://assets.nagios.com/downloads/nagioscore/releases/nagios-4.4.2.tar.gz [root@Nagios ~]# wget http://nagios-plugins.org/download/nagios-plugins-2.2.1.tar.gz [root@Nagios ~]# wget https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-3.2.1/nrpe-3.2.1.tar.gz

Add Nagios User and Group

[root@Nagios ~]# useradd nagios [root@Nagios ~]# groupadd nagcmd [root@Nagios ~]# usermod -a -G nagcmd nagios [root@Nagios ~]# usermod -a -G nagios,nagcmd apache

Nagios Installation

[root@Nagios ~]# tar zxvf nagios-4.4.2.tar.gz [root@Nagios ~]# cd nagios-4.4.2 [root@Nagios nagios-4.4.2]# ./configure --with-command-group=nagcmd [root@Nagios nagios-4.4.2]# make all [root@Nagios nagios-4.4.2]# make install [root@Nagios nagios-4.4.2]# make install-init [root@Nagios nagios-4.4.2]# make install-config [root@Nagios nagios-4.4.2]# make install-commandmode [root@Nagios nagios-4.4.2]# make install-webconf [root@Nagios nagios-4.4.2]# cp -R contrib/eventhandlers/ /usr/local/nagios/libexec/ [root@Nagios nagios-4.4.2]# chown -R nagios:nagios /usr/local/nagios/libexec/eve[...]
カテゴリー: postgresql

Shaun M. Thomas: [Video] Data Integration with PostgreSQL

2018-08-30(木) 15:02:01

Just in case you missed the live broadcast, the video of my presentation below covers various topics around integration of PostgreSQL with other data sources and database technologies.

This presentation covers the following topics:

  • What is a Foreign Data Wrapper?
  • How to query MySQL, a Flat file, a Python script, a REST interface and a different Postgres Node
  • Perform all of the above simultaneously
  • Take snapshots of this data for fast access
  • Tweak remote systems for better performance
  • Package as an API for distribution
  • Stream to Kafka
  • Write data to… MongoDB!?
  • What does all of this have in common?

It’s an exciting topic, and I hope more developers and admins begin to see Postgres as the global integration system it really is.

カテゴリー: postgresql

Luca Ferrari: Managing Multiple PostgreSQL Installations with pgenv

2018-08-30(木) 09:00:00

pgenv is a shell script that allows you to quickly manage multiple PostgreSQL installations within the same host. It reminds somehow perlbrew (for Perl 5) and systems like that. In this post I briefly show how to use pgenv as well as I explain which changes I made to it.

Managing Multiple PostgreSQL Installations with pgenv

pgenv is another pearl from theory. It is a bash single script that allows you to download, build, start and stop (as well as nuke) several PostgreSQL installations within the same host.
It is worth noting that pgenv is not, at least now, an enterprise-level PostgreSQL management tool, rather an easy way to keep test instances clean and organized. It can be very useful to keep several clusters on which doing experiments, testing, and so on.

I first discovered pgenv reading this blog post by David, and I thought it was cool to have a single script to help me manage several environments. I must be honest, this is not the first tool like this I have seen for PostgreSQL, but somehow it caught my attention. I then cloned the repository and start using it. And since I’m curious, I read the source code. Well, ehm, bash? Ok, it is not my favourite shell anymore, but surely it can speed up development while shorting the code with respect to more portable shells.

pgenv works with a command-oriented interface: as in git or other developer-oriented tools you specify a command (e.g., build) and optionally a specific PostgreSQL version to apply the command to. pgenv works on a single cluster at time, by linking and unlinking the specific instance...

カテゴリー: postgresql

Dimitri Fontaine: Preventing SQL Injections

2018-08-30(木) 08:17:00

An SQL Injection is a security breach, one made famous by the Exploits of a Mom xkcd comic episode in which we read about little Bobby Tables:

PostgreSQL implements a protocol level facility to send the static SQL query text separately from its dynamic arguments. An SQL injection happens when the database server is mistakenly led to consider a dynamic argument of a query as part of the query text. Sending those parts as separate entities over the protocol means that SQL injection is no longer possible.

カテゴリー: postgresql

Craig Kerstiens: Postgres data types you should consider using

2018-08-30(木) 03:07:00

Postgres is a rich and powerful database. And the existence of PostgreSQL extension APIs have enabled Postgres to expand its capabilities beyond the boundaries of what you would expect in a traditional relational database. Examples of popular Postgres extensions today include HyperLogLog, which gives you approximate distincts with a small footprint—to rich geospatial support via PostGIS—to Citus which helps you scale out your Postgres database across multiple nodes to improve performance for multi-tenant SaaS applications and real-time analytics dashboards—to the built-in full text search capabilities in PostgreSQL. With all the bells and whistles you can layer into Postgres, sometimes the most basic built-ins get overlooked.

PostgreSQL has nearly 100 different data types, and these data types can come with their own tuned indexing or their own specialized functions. You probably already use the basics such as integers and text, and today we’re going to take a survey of less-used but incredibly powerful PostgreSQL data types.

JSONB tops the list of Postgres data types

Postgres first received JSON support in Postgres 9.2. But the initial JSON support in 9.2 was about JSON validation, hence was less ideal for many use cases that needed JSON as well as fast query performance.

A couple of years later we got the the successor to the JSON datatype: JSONB. JSONB is a binary version of JSON stored on disk. JSONB compresses, so you lose whitespace, but it comes with some powerful index types to allow you to work much more flexibly with your JSON data.

JSONB is great for unstructured data, and with Postgres you can easily join JSONB data to your other relational models. We use JSONB ourselves heavily at Citus for things such as feature flags, event observation data, and recording logs. You can index JSONB data with GIN indexes which allow you to query on keys and speed up your lookups, since the GIN index makes the full document automatically available for you.

Range types are a calendar app’s best friend

Let’s face it,

[...]
カテゴリー: postgresql

Bruce Momjian: Foreign Data Wrappers and Passwords

2018-08-30(木) 00:45:01

Foreign data wrappers (FDW) allow data to be read and written to foreign data sources, like NoSQL stores or other Postgres servers. Unfortunately the authentication supported by FDWs is typically limited to passwords defined using CREATE USER MAPPING. For example, postgres_fdw only supports password-based authentication, e.g. SCRAM. Though only the database administrator can see the password, this can still be a security issue.

Ideally, at least some of the Postgres FDWs should support more sophisticated authentication methods, particularly SSL certificates. Another option would be to allow user authentication to be sent through FDWs, so the user has the same permissions on the FDW source and target. There is no technical reason FDW authentication is limited to passwords. This problem has been discussed, and it looks like someone has a plan for solving it, so hopefully it will be improved soon.

カテゴリー: postgresql

Ibrar Ahmed: Tune Linux Kernel Parameters For PostgreSQL Optimization

2018-08-29(水) 21:33:46

For optimum performance, a PostgreSQL database depends on the operating system parameters being defined correctly. Poorly configured OS kernel parameters can cause degradation in database server performance. Therefore, it is imperative that these parameters are configured according to the database server and its workload. In this post, we will discuss some important kernel parameters that can affect database server performance and how these should be tuned.

SHMMAX / SHMALL

SHMMAX is a kernel parameter used to define the maximum size of a single shared memory segment a Linux process can allocate. Until version 9.2, PostgreSQL uses System V (SysV) that requires SHMMAX setting. After 9.2, PostgreSQL switched to POSIX shared memory. So now it requires fewer bytes of System V shared memory.

Prior to version 9.3 SHMMAX was the most important kernel parameter. The value of SHMMAX is in bytes.

Similarly SHMALL is another kernel parameter used to define system wide total amount of shared memory pages. To view the current values for SHMMAX, SHMALL or SHMMIN, use the ipcs command.

$ ipcs -lm ------ Shared Memory Limits -------- max number of segments = 4096 max seg size (kbytes) = 1073741824 max total shared memory (kbytes) = 17179869184 min seg size (bytes) = 1 $ ipcs -M IPC status from as of Thu Aug 16 22:20:35 PKT 2018 shminfo: shmmax: 16777216 (max shared memory segment size) shmmin: 1 (min shared memory segment size) shmmni: 32 (max number of shared memory identifiers) shmseg: 8 (max shared memory segments per process) shmall: 1024 (max amount of shared memory in pages)

PostgreSQL uses System V IPC to allocate the shared memory. This parameter is one of the most important kernel parameters. Whenever you get following error messages, it means that you have an older version PostgreSQL and you have a very low SHMMAX value. Users are expected to adjust and increase the value according to the shared memory they are going to use.

Possible misconfiguration e[...]
カテゴリー: postgresql

Brian Fehrle: A Guide to Partitioning Data In PostgreSQL

2018-08-29(水) 18:58:00
What is Data Partitioning?

For databases with extremely large tables, partitioning is a wonderful and crafty trick for database designers to improve database performance and make maintenance much easier. The maximum table size allowed in a PostgreSQL database is 32TB, however unless it’s running on a not-yet-invented computer from the future, performance issues may arise on a table with only a hundredth of that space.

Partitioning splits a table into multiple tables, and generally is done in a way that applications accessing the table don’t notice any difference, other than being faster to access the data that it needs. By splitting the table into multiple tables, the idea is to allow the execution of the queries to have to scan much smaller tables and indexes to find the data needed. Regardless of how efficient an index strategy is, scanning an index for a table that’s 50GB will always be much faster than an index that’s for a table at 500GB. This applies to table scans as well, because sometimes table scans are just unavoidable.

When introducing a partitioned table to the query planner, there are a few things to know and understand about the query planner itself. Before any query is actually executed, the query planner will take the query and plan out the most efficient way it will access the data. By having the data split up across different tables, the planner can decide what tables to access, and what tables to completely ignore, based on what each table contains.

This is done by adding constraints to the split up tables that define what data is allowed in each table, and with a good design, we can have the query planner scan a small subset of data rather than the whole thing.

Should A Table Be Partitioned?

Partitioning can drastically improve performance on a table when done right, but if done wrong or when not needed, it can make performance worse, even unusable.

How big is the table?

There is no real hardline rule for how big a table must be before partitioning is an option, but based on database access

[...]
カテゴリー: postgresql

Christophe Pettus: Don’t LOCK tables. Just don’t.

2018-08-29(水) 09:00:10

It’s not uncommon that an application needs to serialize access to one or more resources. The temptation is very high to use the LOCK TABLE SQL statement to do this.

Resist the temptation.

There are many issues with using LOCK:

  • It blocks autovacuum, which can cause bloat or even transaction ID wraparound in extreme cases.
  • An ACCESS EXCLUSIVE lock (the default mode) is passed down to secondaries, which can block queries there, or even cause deadlock-type situations.
  • It’s easy to cause deadlocks with bad LOCKing order.

If the goal is to serialize access, consider using advisory locks instead. They have all of the benefits of a LOCK on a table, while not actually blocking access to autovacuum, or access on secondaries.

(Yes, some database tools may need to take explicit locks for a variety of reasons; that’s a different matter, of course.)

カテゴリー: postgresql

Hans-Juergen Schoenig: What hot_standby_feedback in PostgreSQL really does

2018-08-28(火) 19:23:24

Many of you out there using PostgreSQL streaming replication might wonder what this hot_standby_feedback parameter in postgresql.conf really does. Support customers keep asking this question so it might be useful to share this knowledge with a broader audience of PostgreSQL users out there.

What VACUUM does in PostgreSQL

VACUUM is an essential command in PostgreSQL its goal is to clean out dead rows, which are not needed by anyone anymore. The idea is to reuse space inside a table later as new data comes in. The important thing is: The purpose of VACUUM is to reuse space inside a table – this does not necessarily imply that a relation will shrink. Also: Keep in mind that VACUUM can only clean out dead rows, if they are not need anymore by some other transaction running on your PostgreSQL server.

Consider the following image:

How hot_standby_feedback and VACUUM work together in PostgreSQL

As you can see we have two connections here. The first connection on the left side is running a lengthy SELECT statement. Now keep in mind: An SQL statement will basically “freeze” its view of the data. Within an SQL statement the world does not “change” – the query will always see the same set of data regardless of changes made concurrently. That is really really important to understand.

Let us take a look at the second transaction. It will delete some data and commit. The question that naturally arises is: When can PostgreSQL really delete this row from disk? DELETE itself cannot really clean the row from disk because there might still be a ROLLBACK instead of a COMMIT. In other words a rows must not be deleted on DELETE. PostgreSQL can only mark it as dead for the current transaction. As you can see other transactions might still be able to see those deleted rows.
However, even COMMIT does not have the right to really clean out the row. Remember: The transaction on the left side can still see the dead row because the SELECT statement does not change its snapshot while it is running. COMMIT is therefore too early to clean out

[...]
カテゴリー: postgresql

Achilleas Mantzios: Multitenancy Options for PostgreSQL

2018-08-28(火) 18:58:00

Multi-tenancy in a software system is called the separation of data according to a set of criteria in order to satisfy a set of objectives. The magnitude/extend, the nature and the final implementation of this separation is dependent on those criteria and objectives. Multi-tenancy is basically a case of data partitioning but we’ll try to avoid this term for the obvious reasons (the term in PostgreSQL has a very specific meaning and is reserved, as declarative table partitioning was introduced in postgresql 10).

The criteria might be:

  1. according to the id of an important master table, which symbolizes the tenant id which might represent:
    1. a company/organization within a larger holding group
    2. a department within a company/organization
    3. a regional office/branch of the same company/organization
  2. according to a user’s location/IP
  3. according to a user’s position inside the company/organization

The objectives might be:

  1. separation of physical or virtual resources
  2. separation of system resources
  3. security
  4. accuracy and convenience of management/users at the various levels of the company/organization

Note by fulfilling an objective we also fulfill all the objectives beneath, i.e. by fulfilling A we also fulfill B, C and D, by fulfilling B we also fulfill C and D, and so forth.

If we want to fulfill objective A we may choose to deploy each tenant as a separate database cluster within its own physical/virtual server. This gives maximum separation of resources and security but gives poor results when we need to see the whole data as one, i.e. the consolidated view of the whole system.

If we want to only achieve objective B we might deploy each tenant as a separate postgresql instance in the same server. This would give us control over how much space would be assigned to each instance, and also some control (depending on the OS) on CPU/mem utilization. This case is not essentially different than A. In the modern cloud computing era, the gap between A and B tends to get smaller and smaller, so that A will be most probably the p

[...]
カテゴリー: postgresql

Federico Campoli: A mechanical elephant

2018-08-28(火) 09:00:00

In the previous post we modified the apt role for controlling the setup in a declarative way. Then we added a ssh role for configuring the three devuan servers. The role is used to configure the server’s postgres process owner for ssh passwordless connection.

In this tutorial we’ll complete the setup for the postgres user and then we’ll configure the database clusters with a new role.

カテゴリー: postgresql

Bruce Momjian: Certificate Revocation Lists

2018-08-28(火) 08:30:01

If you are setting up Postgres server or client TLS/SSL certificates, be sure to also configure support for a certificate revocation list (CRL). This list, distributed by the certificate authority, lists certificates that should no longer be trusted.

While the CRL will initially likely be empty, a time will come when a private key used by a certificate or device is exposed in an unauthorized manner, or an employee who had access to private keys leaves your organization. When that happens, you will need the ability to invalidate certificates — having that ability pre-configured will help, especially during a crisis.

カテゴリー: postgresql

Daniel Vérité: Beware of your next glibc upgrade

2018-08-28(火) 01:15:12

GNU libc 2.28, released on August 1, 2018, has among its new features a major update of its Unicode locale data with new collation information.

From the announcement:

The localization data for ISO 14651 is updated to match the 2016 Edition 4 release of the standard, this matches data provided by Unicode 9.0.0. This update introduces significant improvements to the collation of Unicode characters. […] With the update many locales have been updated to take advantage of the new collation information. The new collation information has increased the size of the compiled locale archive or binary locales.

For Postgres databases using language and region-sensitive collations, which tend to be the default nowadays, it means that certain strings might sort differently after this upgrade. A critical consequence is that indexes that depend on such collations must be rebuilt immediately after the upgrade. Servers in WAL-based/streaming replication setups should also be upgraded together since a standby must run the same libc/locales as its primary.

The risk otherwise is index corruption issues, as mentioned for instance in these two threads from pgsql-general: “Issues with german locale on CentOS 5,6,7”, and “The dangers of streaming across versions of glibc: A cautionary tale”

So while this issue is not new, what’s special about glibc-2.28 is the scale of the update in locales, which is unprecedented in recent times. Previously and since year 2000, according to bug#14095, the locale data in glibc were modified on a case-by-case basis. This time, there’s a big merge to close the gap with the standard.

To get a feel for the extent of these changes, I’ve installed ArchLinux which already has glibc-2.28, along with PostgreSQL 10.5, and compared some query results against the same Postgres on Debian 9 (“stretch”) with glibc-2.24.

I expected changes, but not so broad. Simple tests on plain ASCII strings reveal obvious differences immediately. For instance, with the en_US.UTF-8 locale:

Debian stretch (glibc 2.24)

=# select ver[...]
カテゴリー: postgresql

Pavel Trukhanov: Real world SSD wearout

2018-08-28(火) 00:16:45

A year ago we’ve added SMART metrics collection to our monitoring agent that collects disk drive attributes on clients servers.

So here a couple of interesting cases from the real world.

Because we needed it to work without installing any additional software, like smartmontools, we implemented collection not of all the attributes, but only basic and not vendor-specific ones — to be able to provide consistent experience. And also that way we skipped burdensome task of maintaining a knowledge base of specific stuff — and I like that a lot :)

This time we’ll discuss only SMART attribute named “media wearout indicator”. Normalized, it shows a percentage of “write resource” left in the device. Under the hood the device keeps track of the number of cycles the NAND media has undergone, and the percentage is calculated against the maximum number of cycles for that device. The normalized value declines linearly from 100 to 1 as the average erase cycle count increases from 0.

Are there any actually dead SSDs?

Though SSDs are pretty common nowadays, just couple of years earlier you could hear a lot of fear talk about SSD wearout. So we wanted to see if some of it were true. So we searched for the maximum wearout across all the devices of all of our clients.

It was just 1%

Reading the docs says it just won’t go below 1%. So it is weared out.

We notified this client. Turns out it was a dedicated server in Hetzner. Their support replaced the device:

Do SSDs die fast?

As we introduced SMART monitoring for some of the clients already some time ago, we have accumulated history. And now we can see it on a timeline.

A server with highest wearout rate we have across our clients servers unfortunately was added to okmeter.io monitoring only two month ago:

This chart indicates that during these two month only, it burned through 8% of “write resource”.

So 100% of this SSD lifetime under that load will end in 100/(8/2) = 2 years.

Is that a lot or too little? I don’t know. But let’s check what kind of load it’s serving?

As you can see

[...]
カテゴリー: postgresql

Vladimir Svedov: Understanding System Columns in PostgreSQL

2018-08-27(月) 18:58:00

So you sit with your hands over a keyboard and think “what fun I can have to make my lifetime even curiouser?..” Well - create a table of course!

vao=# create table nocol(); CREATE TABLE vao=# select * from nocol; -- (0 rows)

What fun is it about a table with no data?.. Absolutely none! But I can easily fix it:

vao=# insert into nocol default values; INSERT 0 1

It looks weird and quite stupid to have a table with no columns and one row. Not to mention it is not clear what “default values” there were inserted… Well - reading few lines from docs reveals that “All columns will be filled with their default values.” Yet I have no columns! Well - I surely have some:

vao=# select attname, attnum, atttypid::regtype, attisdropped::text from pg_attribute where attrelid = 'nocol'::regclass; attname | attnum | atttypid | attisdropped ----------+--------+----------+-------------- tableoid | -7 | oid | false cmax | -6 | cid | false xmax | -5 | xid | false cmin | -4 | cid | false xmin | -3 | xid | false ctid | -1 | tid | false (6 rows)

So these six are definitely not the ALTER TABLE DROP COLUMN zombies because attisdropped is false. Also I see that the type name of those columns end up with “id”. Reading the bottom section of Object Identifier Types will give the idea. Another funny observation is - the -2 is missing! I wonder where I could have lost it - I just created a table after all! Hm, what object identifier is missing in my table? By definition I mean. I have tuple, command and xact ids. Well unless some “global over whole db identifier”, like oid?.. Checking is easy - I will create table with OIDS:

vao=# create table nocol_withoid() with oids; CREATE TABLE vao=# select attname, attnum, atttypid::regtype, attisdropped::text from pg_attribute where attrelid = 'nocol_withoid'::regclass; attname | attnum | atttypid | attisdropped ----------+--------+----------+-------------- tableoid | -7 | oid | false cmax | -6 | c[...]
カテゴリー: postgresql

Pavel Stehule: New release of pspg pager

2018-08-27(月) 17:00:00
I redesigned some work with mouse - ncurses native implementation is simple, but slow by design.

A default layout of pspg is based on old Norton Commander layout. It is good for beginners, because almost all controls are visible. Probably, when you work with pspg longer time, then you prefer more visible content against auxiliary lines. The lines (bars) can be disabled now - you can run pspg with option --no-bars. The pspg is available from github https://github.com/okbob/pspg
カテゴリー: postgresql

Jobin Augustine: PostgreSQL Accessing MySQL as a Data Source Using mysqsl_fdw

2018-08-25(土) 00:38:55

There are many organizations where front/web-facing applications use MySQL and back end processing uses PostgreSQL®. Any system integration between these applications generally involves the replication—or duplication—of data from system to system. We recently blogged about pg_chameleon which can be used replicate data from MySQL® to PostgreSQL. mysql_fdw can play a key role in eliminating the problem of replicating/duplicating data. In order to eliminate maintaining the same data physically in both postgres and MySQL, we can use mysql_fdw. This allows PostgreSQL to access MySQL tables and to use them as if they are local tables in PostgreSQL. mysql_fdw can be used, too, with Percona Server for MySQL, our drop-in replacement for MySQL.

This post is to showcase how easy it is to set that up and get them working together. We will address a few points that we skipped while discussing about FDWs in general in our previous post

Preparing MySQL for fdw connectivity

On the MySQL server side, we need to set up a user to allow for access to MySQL from the PostgreSQL server side. We recommend Percona Server for MySQL if you are setting it up for the first time.

mysql> create user 'fdw_user'@'%' identified by 'Secret!123';

This user needs to have privileges on the tables which are to be presented as foreign tables in PostgreSQL.

mysql> grant select,insert,update,delete on EMP to fdw_user@'%'; Query OK, 0 rows affected (0.00 sec) mysql> grant select,insert,update,delete on DEPT to fdw_user@'%'; Query OK, 0 rows affected (0.00 sec) Installing mysql_fdw on PostgreSQL server

Under the hood, MySQL FDW (mysql_fdw) facilitates the use of PostgreSQL server as a client for MySQL Server, which means it can then fetch data from the MySQL database as a client. Obviously, mysql_fdw uses MySQL client libraries. Nowadays, many Linux distributions are packaged with MariaDB® libraries. This works well enough for mysql_fdw to function. If we install mysql_fdw from the PGDG repo, then mariadb-devel.x86_64 packages will be installed alongsid

[...]
カテゴリー: postgresql

ページ