Michael Paquier: Two-phase commit and temporary objects

planet postgresql - 2019-02-01(Fri) 17:01:39

A couple of weeks ago, a bug popped up on the community mailing lists about the use of temporary objects in two-phase commit. After discussion, the result is the following commit:

commit: c5660e0aa52d5df27accd8e5e97295cf0e64f7d4
author: Michael Paquier <michael@paquier.xyz>
date: Fri, 18 Jan 2019 09:21:44 +0900

Restrict the use of temporary namespace in two-phase transactions

Attempting to use a temporary table within a two-phase transaction is forbidden for ages. However, there have been uncovered grounds for a couple of other object types and commands which work on temporary objects with two-phase commit. In short, trying to create, lock or drop an object on a temporary schema should not be authorized within a two-phase transaction, as it would cause its state to create dependencies with other sessions, causing all sorts of side effects with the existing session or other sessions spawned later on trying to use the same temporary schema name.

Regression tests are added to cover all the grounds found, the original report mentioned function creation, but monitoring closer there are many other patterns with LOCK, DROP or CREATE EXTENSION which are involved.

One of the symptoms resulting in combining both is that the session which used the temporary schema is not able to shut down completely, waiting for being able to drop the temporary schema, something that it cannot complete because of the two-phase transaction involved with temporary objects. In this case the client is able to disconnect but the session remains alive on the backend-side, potentially blocking connection backend slots from being used. Other problems reported could also involve server crashes.

This is back-patched down to v10, which is where 9b013dc has introduced MyXactFlags, something that this patch relies on.

Reported-by: Alexey Bashtanov
Author: Michael Paquier
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/5d910e2e-0db8-ec06-dd5f-baec420513c3@imap.cc
Backpatch-through: 10

In PostgreSQL, temporary objects are as

Category: postgresql

Blast from the Past

planet PHP - 2019-02-01(Fri) 16:00:00
Category: php

CoderCruise 2019 - Call for Speakers

php.net - 2019-02-01(Fri) 07:01:27
Category: php

DDD Europe notes - Day 1

planet PHP - 2019-02-01(Fri) 04:00:00
Eric Evans: Keynote ("Language in Context")

Starting out with the basics (words have meaning within a context; when we make the boundary of this context explicit we end up with a bounded context), Eric discussed two main topics: the Big Ball of Mud, and bounded contexts in the context (no pun intended) of microservices.

Legacy applications are always framed in a negative way, as if it's something to get away from. Personally, I've come to enjoy them a lot. However, there will always be the urge to work around legacy code. The Bubble Context (PDF) can be a great way of creating a new model that works well next to the already existing models. To keep a safe buffer between the new and the old model, you could build an Anti-Corruption Layer (ACL). A fun thing Eric mentioned is that the buffer works in two directions. The ACL also allows the old model to keep functioning without being disturbed by all the new things that are going on in the Bubble Context.

Given that a bounded context may or may not align with an actual subdomain, it may very well be that a legacy context is actually a Big Ball of Mud, with unified models, and simply just a lot of domain knowledge written between the many spaghetti lines that are in there. However, even though it is a mess, and it's getting harder to work with it every day, it may still be what you could call a "Mature Productive Context". The question is: is it still aligned with business views? If it is, we could improve at least maintainability and the cost of change by performing local refactorings. If it isn't, it'll be very hard to change anything. If the basic assumptions of the model change, rework will be very costly.

As a matter of fact, for a project I'm currently working on, we're looking into a module (or context), which requires some heavy refactoring, because it has become technically very hard to work with it. However, the business is still quite happy about it, and it's quite central to its processes.

An important Domain-Driven approach which can be used in the area of legacy code is where you analyze the different subdomains, and find out which ones are generic, and which ones are "core" to the business. As an example, in the aforementioned project there are actually two candidates for context-level improvements. One is related to Sales (which is the heart of this financial application), and one is related to Addressbook records (which is very much supportive to the Sales part). One could say it's even generic, in the sense that an off-the-shelf solution might be preferable. We wouldn't want to spend a lot of design or development effort there either.

Eric mentioned the term "Quaint Context" as a suitable name for a context that one would consider "legacy". It uses outdated technology probably, and has become hard to maintain. It won't be possible to make big changes there (as mentioned, because these basic assumptions can't easily be changed), so another good name could be "Patch-by-Patch Context".

With a microservice architecture, another option to deal with legacy contexts becomes what Eric calls the "Exposed Legacy Asset" (yet another nice term!). This will be a legacy application which starts to adapt to the microservices environment by producing messages that will be useful for actual microservices in that environment. For instance, database triggers could be used to produce events. The external events themselves don't have to be as low-level as the internal events that caused them.

Eric touches on several other interesting aspects of microservice architecture, but I wanted to briefly mention some other relevant ideas here. Eric looked back at 15 years of Domain-Driven Design and proposed that by now we maybe need a definition of DDD itself. He doesn't want DDD to be just a club, but asks for intellectual honesty. If you try to apply DDD and somehow it fails, you should share this story. If you're skeptical about some aspect of DDD, talk about it. I like how it boils down to focusing on the core domain, exploring models together, and speaking a ubiquitous language in an explicitly bounded context. Nice!

Rebecca Wirfs-Brock: Growing Your Design Heuristics Toolkit

This one was a workshop with limited access, so I was lucky I could attend it. Rebecca had spoken at a previous edition of the conference about heuristics, which triggered my interest in the idea. The workshop was about the process behind it. It had some interesting pointers, like a PDF about the concept and a book by Billy Vaughn Koen: Discussion of the Method. Definitely things to check out.

Truncated by Planet PHP, read more at the original (another 2340 bytes)

Category: php

Keith Fiske: Managing Transaction ID Exhaustion (Wraparound) in PostgreSQL

planet postgresql - 2019-01-31(Thu) 21:14:00

One of the most critical topics to understand when administering a PostgreSQL database is the concept of transaction IDs (TXID) and the fact that they can be exhausted if not monitored properly. However, this blog post isn't going to go into the details of what TXID exhaustion actually is. The Routine Vacuuming section of the documentation is probably one of the most important to read and understand, so I will refer you there. What this blog post is going to cover is an easy way to monitor for it and what can be done to prevent it from ever being a problem.
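To make that concrete: the usual check compares `SELECT datname, age(datfrozenxid) FROM pg_database;` against the roughly 2-billion-transaction hard limit. A minimal sketch of the arithmetic behind such a monitor (the function name, threshold, and sample values are illustrative, not from the post):

```python
# Transaction IDs are 32-bit and wrap around; vacuum must freeze rows
# before any database's oldest unfrozen XID falls ~2^31 transactions behind.
XID_HARD_LIMIT = 2 ** 31

def wraparound_pct(datfrozenxid_age: int) -> float:
    """Percent of the way toward XID exhaustion for one database.

    datfrozenxid_age is the per-database value returned by
    SELECT age(datfrozenxid) FROM pg_database;
    """
    return 100.0 * datfrozenxid_age / XID_HARD_LIMIT

# e.g. alert when any database crosses 50% (hypothetical sample values)
ages = {"appdb": 2 ** 30, "postgres": 120_000_000}
alerts = [db for db, age in ages.items() if wraparound_pct(age) >= 50.0]
```

A monitoring system would feed real `age(datfrozenxid)` values into such a check and page well before the limit, leaving time for an aggressive vacuum.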

Category: postgresql

Using img2lambda to publish your Serverless PHP layer

planet PHP - 2019-01-31(Thu) 20:02:00

This interesting tweet by Clare Liguori came to my attention last week:

This new img2lambda tool will take the layers of a Docker container and convert them to AWS layers for use in Lambda.

I poked around with Clare's example and updated my lambda-php project in order to understand how it works. I also rewrote my runtime's bootstrap to make it clearer.

The clever thing from my point of view is that you can build your PHP runtime layer locally using Docker and then publish the layers to AWS for use in your Lambda functions. This means you can now use the layer in different projects and have them all reference the same PHP runtime.

The magic of Docker

This is all done with the magic of Docker. Specifically, we create a Dockerfile that creates two containers:

  • A container to build the PHP binary
  • A container containing the layers that img2lambda will upload to AWS

Simplified, it looks like this:

# Build PHP in the Lambda container
FROM amazonlinux:2017.03.1.20170812 as builder

# PHP version to create
ARG ver="7.3.1"

RUN sed -i 's;^releasever.*;releasever=2017.03;;' /etc/yum.conf && \
    yum clean all && \
    yum install -y autoconf bison gcc gcc-c++ make libcurl-devel {etc}

RUN curl -sL https://github.com/php/php-src/archive/php-${ver}.tar.gz | tar -xvz && \
    cd php-src-php-${ver} && \
    ./buildconf --force && \
    ./configure --prefix=/opt/php/ --with-openssl --with-curl --{etc} && \
    make install && \
    /opt/php/bin/php -v

# copy php binary into /runtime/bin/php
RUN mkdir -p /runtime/bin && \
    cp /opt/php/bin/php /runtime/bin/php

# copy bootstrap files into /runtime
COPY src/* /runtime/

# Create runtime container for use with img2lambda
FROM lambci/lambda:provided as runtime
COPY --from=builder /runtime /opt/

The first part downloads and compiles the PHP binary, puts it into /runtime/bin/php, and also copies in the bootstrap files required to make the layer act as a Lambda runtime.

The second part simply copies all the files in /runtime from the first container into a new container. This new container contains a single layer, which is our PHP runtime.

To build it we do:

$ docker build -t lambda-php-runtime .

The tag name can be anything.

Create the layer in AWS

To create the AWS layer we use img2lambda:

$ img2lambda -i lambda-php-runtime:latest -r eu-west-2 -n lambda-php73

This tool will find the layer in the runtime container, upload it to eu-west-2, and then store the identifier in output/layers.json, which looks like this:

[ "arn:aws:lambda:eu-west-2:66...06:layer:lambda-php73-sha256-e3c4...2e618b:1" ]

Using in Serverless Framework

You can take the ARN number and use it for your actions in serverless.yml like this:

functions:
  hello:
    handler: handler.hello
    layers:
      - "arn:aws:lambda:eu-west-2:66...06:layer:lambda-php73-sha256-e3c4...2e618b:1"

This is the way that we can re-use our PHP runtime in all our actions across multiple projects.

Serverless also allows you to import data from JSON files into your serverless.yml file, so you can also do:

functions:
  hello:
    # {file name}.{function name}. In this case: hello() in handler.php
    handler: handler.hello
    layers: ${file(../php-runtime/output/layers.json)}

This has the advantage that when you re-build and upload the PHP runtime layer, your project picks it up, which could also be convenient.

Testing locally

One of the benefits of building the runtime in Docker is that we can run it locally for testing our Lambda function. As our runtime container inherits from lambci/lambda:provided, we get an environment that looks very much like Lambda itself.

To test locally, we use another container that inherits from our runtime one and copies in the PHP files for our function. The Dockerfile looks something like this:

FROM lambda-php-runtime as function
COPY handler.php /var/task/src/handler.php

We create our cont

Truncated by Planet PHP, read more at the original (another 1321 bytes)

Category: php

Bruce Momjian: Limiting Superuser Activity

planet postgresql - 2019-01-31(Thu) 02:15:01

This interesting email thread explores the question of how much you can prevent or detect unauthorized database superuser activity. The main conclusions from the thread are:

  • It is impossible to restrict database administrator access without hindering their ability to perform their jobs
  • Monitoring superuser activity is the most reasonable way to detect and hopefully discourage unauthorized activity
  • Monitoring includes:
    • Assign a separate account to each administrator for auditing purposes; do not use generic/shared accounts
    • Use an auditing tool to record database activity, e.g., pgAudit
    • Use syslog to send database logs to a computer not under database administrators' control
    • Record all shell command activity in a similar way

There is also a helpful summary email.

Category: postgresql

Ketting 2.5 release.

planet PHP - 2019-01-31(Thu) 02:00:00

Ketting is an attempt at making a generic Hypermedia API client for javascript. It treats links, and the following of links, as its most important primitive.

Last week I released version 2.5.0 on NPM so maybe it’s time to list some of the improvements since the last update in order from most to least interesting.

JSON:API support

JSON:API is a standard for making JSON-based APIs, and it has some support for hypermedia ideas as well.

In particular, the standard has a links object, and it also defines a relationship between collections and members of collections.

This release now ships with a ‘representor’ for JSON API, which means you can now do something like this:

const myResource = await home
  .follow('next')   // This might be a html5 document
  .follow('author') // This might be a HAL document
  .follow('about')  // This might be a JSON:API document

All of these will now be seamless regardless of which format any of these hops used; the format only becomes relevant again when you get() and put().

A lot of help in getting this right came from JSON:API community members such as Gabe Sullice and Ethan Resnick. If you’re interested in the full discussion about how JSON:API maps to HATEOAS read the Github issue.
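In spirit, a representor normalizes each format's link conventions into one uniform rel-to-href map. A much-simplified sketch of that idea (not Ketting's actual API; the HAL and JSON:API link shapes are reduced to their simplest common cases):

```python
def extract_links(content_type: str, body: dict) -> dict:
    """Normalize links from different hypermedia formats into rel -> href."""
    if 'hal+json' in content_type:
        # HAL keeps links under "_links", each entry carrying an "href"
        return {rel: link['href']
                for rel, link in body.get('_links', {}).items()}
    if 'vnd.api+json' in content_type:
        # JSON:API keeps a top-level "links" object; values may be plain
        # strings or objects with an "href" key
        return {rel: (link['href'] if isinstance(link, dict) else link)
                for rel, link in body.get('links', {}).items()}
    # Unknown formats expose no links in this toy model
    return {}
```

With a normalization layer like this, a client can chain follow('rel') hops without caring which format served each hop.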

go() function

Not every endpoint will have every appropriate link to another resource. Sometimes you just have to work with what you have, and fetch a resource yourself.

The go() function exists on a resource and simply gives you a new resource based on the url you passed. The url can be relative, and will be resolved against the 'current' resource.
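The resolution follows standard RFC 3986 relative-reference rules; a quick illustration with Python's urllib, just to show the semantics (the URLs are made up):

```python
from urllib.parse import urljoin

# Hypothetical "current" resource
base = 'https://api.example.org/books/?page=2'

# A query-only reference keeps the path and replaces the query
same_path = urljoin(base, '?q=hello')
# -> 'https://api.example.org/books/?q=hello'

# Dot segments are resolved against the base path
sibling = urljoin(base, '../authors/1')
# -> 'https://api.example.org/authors/1'
```

Ketting's go() applies the same rules, so '?q=hello' in the snippet below yields a resource at the current path with a new query string.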

const anotherResource = resource.go('?q=hello');

Resource is generic

If you use typescript with this package, it might be desirable to have some typing.

Adding good support for this will be an ongoing process, but as a first step the Resource class is now generic.

This means that you can define a function such as:

function foo(): Resource<MyBodyType> {
  // ...
}

Typescript will then use MyBodyType as the type that will be returned from get() and refresh() and will demand that the value passed to put() has this type.

By default this is any.

I think this is a decent first step, but I also imagine this will have to continue to evolve.

esModuleInterop is false

The Typescript build now has esModuleInterop off. Apparently keeping it set to true caused errors for anyone using the library with this setting set to false.

So in order to not force people to change this setting, esModuleInterop is now off, which means the library will work regardless of your own preference.

Category: php

Liaqat Andrabi: Webinar: Banking on Postgres – Financial Application Considerations [Follow up]

planet postgresql - 2019-01-31(Thu) 00:34:37

The demand for PostgreSQL within the financial industry has been rapidly increasing in recent years, mainly due to reduced licensing costs, a better choice of open source tools, and the robust enterprise features that PostgreSQL provides.

2ndQuadrant hosted the “Banking on Postgres” webinar to discuss attributes of financial databases based on Postgres, configuration processes, hardware needs, availability, backups, and more.

The webinar was presented by Shaun Thomas, Principal Consultant at 2ndQuadrant. Those who weren’t able to attend the live event can now view the recording here.

For any questions or comments regarding Postgres-BDR, please send an email to info@2ndQuadrant.com.

Category: postgresql

Pavel Stehule: plpgsql_check - new report for code coverage ratio calculation

planet postgresql - 2019-01-30(Wed) 18:11:00
A few months ago I integrated a profiler into plpgsql_check.

The result of profiling is line-oriented:

postgres=# select lineno, avg_time, source from plpgsql_profiler_function_tb('fx(int)');
│ lineno │ avg_time │ source │
│ 1 │ │ │
│ 2 │ │ declare result int = 0; │
│ 3 │ 0.075 │ begin │
│ 4 │ 0.202 │ for i in 1..$1 loop │
│ 5 │ 0.005 │ select result + i into result; select result + i into result; │
│ 6 │ │ end loop; │
│ 7 │ 0 │ return result; │
│ 8 │ │ end; │
(9 rows)

This format is well readable, but it is not practical for calculating code coverage metrics. So this week I wrote a new function that produces a report based on commands:

CREATE OR REPLACE FUNCTION public.fx1(a integer)
RETURNS integer
LANGUAGE plpgsql
1 AS $function$
2 begin
3 if a > 10 then
4 raise notice 'ahoj';
5 return -1;
6 else
7 raise notice 'nazdar';
8 return 1;
9 end if;
10 end;
11 $function$

postgres=# select stmtid, parent_stmtid, parent_note, lineno, exec_stmts, stmtname
from plpgsql_profiler_function_statements_tb('fx1');
│ stmtid │ parent_stmtid │ parent_note │ lineno │ exec[...]
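From such a per-statement report, a coverage ratio is just the share of statements with a nonzero execution count. A sketch of that arithmetic (my own illustration, not the actual function plpgsql_check ships):

```python
def coverage_ratio(exec_stmts: list) -> float:
    """Share of statements executed at least once.

    exec_stmts mirrors the exec_stmts column of
    plpgsql_profiler_function_statements_tb: one execution count per
    statement, with None (NULL) for statements that were never reached.
    """
    executed = sum(1 for n in exec_stmts if n)  # None and 0 both count as unexecuted
    return executed / len(exec_stmts)

# For fx1(5): the else-branch runs, the if-branch statements do not
ratio = coverage_ratio([1, 1, None, None, 1, 1])
```

For the fx1 example above, one call exercises only one branch of the if, so a single test run leaves coverage well below 100%.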
Category: postgresql

php[tek] 2019

php.net - 2019-01-30(Wed) 02:59:31
Category: php

409 Conflict

planet PHP - 2019-01-30(Wed) 00:00:00

409 Conflict is used in cases where the HTTP request was valid, but the current state of the server prevents it from being executed.

This is useful for APIs. A few hypothetical examples:

  1. A user wants to add a file to a folder, but the folder doesn’t exist yet.
  2. Amazon S3 uses it when you try to delete a bucket that’s not empty.
  3. You’re creating a new blog post, but it refers to a category that was deleted.

The key here is that it might be possible to run the same request again after this conflict is resolved, by for example:

  1. Creating the folder you’re trying to upload to.
  2. Removing all items from the S3 bucket before removing it.
  3. Re-creating or undeleting the category you’re trying to assign to the blog post.

The HTTP specification recommends that the response body contain information that the client or user can use to resolve the conflict.

Example

HTTP/1.1 409 Conflict
Content-Type: text/plain

You tried to upload a file to a folder that doesn't exist. Create the folder and try again!

Usage

If you are building an API, it’s a good idea to emit 409 conflict for these situations. It can be seen as a more specific code than 400 and sometimes 403.

You might even be able to define a response body that a client can use to automatically resolve the conflict, or at least inform the user in a standard way what was wrong.
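One standardized shape for such a machine-readable body is the RFC 7807 "problem details" JSON format. A sketch of what a server might emit (the "type" URL is a made-up example, not a real registry entry):

```python
import json

# An RFC 7807-style body for the missing-folder conflict from example 1.
problem = {
    "type": "https://example.org/errors/folder-missing",  # hypothetical
    "title": "Conflict: target folder does not exist",
    "status": 409,
    "detail": "Create the folder, then retry the upload.",
}

# Served with Content-Type: application/problem+json
response_body = json.dumps(problem)
```

A client can branch on the "type" value to resolve the conflict automatically, falling back to showing "title" and "detail" to the user.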

Category: php

Dave Cramer: A Guide to Building an Active-Active PostgreSQL Cluster

planet postgresql - 2019-01-29(Tue) 23:39:33

One of the toughest challenges facing database engineers today is ensuring their data is always accessible so they can meet the high-availability requirements for their applications.

While this problem may seem trivial in the world of applications where one can have many instances behind geographically distributed load balancers, in the database world where there is only one version of the data globally, the problem faces many obstacles.

PostgreSQL replication has advanced considerably in recent major releases, including continuous improvements to streaming replication and the addition of logical replication in PostgreSQL 10. While these capabilities represent important enhancements to PostgreSQL, enabling users to address a wide variety of clustered architectures, they do not (easily) address the use cases where the application requires access to an updatable database in two or more geographic regions - often referred to as an "active-active" cluster.

More specifically, an active-active cluster is one where the application can write to any instance in the cluster and the data will be written to all of the instances in the cluster, enabling each instance in the cluster to be used to:

  • Provide near zero downtime as the new instance is already in a read/write state; there is no need to reconfigure it.
  • Provide near zero downtime upgrades from one version to another
  • Improve latency for users in geographically distributed clusters. By providing an instance physically closer to the user, latency is reduced.

While there are a number of proprietary solutions that attempt to address active-active PostgreSQL requirements, this post and a series to follow provides users with potential reference architectures and configurations that enable active-active PostgreSQL configurations using entirely open source software.

This post is of course only one approach to deploying an active-active PostgreSQL cluster.  There are other ways to deploy an active-active setup with PostgreSQL.  I will cover some ways to do this in the f

Category: postgresql

Kaarel Moppel: Major feature update for the pgwatch2 Postgres monitoring tool

planet postgresql - 2019-01-29(Tue) 18:13:23

Again I could make good (hopefully) use of the "slow time" around the turn of the year…and managed to push out another set of features for our Open Source PostgreSQL monitoring tool called pgwatch2 – so here's a quick overview of the changes. Continuing the tradition, I'm calling it "Feature Pack 4" as it's mostly about new features. Git and Docker images carry version number 1.5.0. As our last pgwatch2-related blogpost covered only 1.4.0, I'll also include the most important stuff from the 1.4.5 minor feature release.

Highlight – Monitoring Postgres with Postgres

This is the biggest one this time – finally, and quite appropriately for "Postgres-minded" people, there's now a chance to store all the gathered metrics in Postgres! This of course doesn't necessarily mean that Postgres is best for storing time-series data, although it performs very nicely thanks to JSONB…but in general it's a good compromise – more disk space (~3-4x) at query times comparable to InfluxDB…but with the full power of SQL! Meaning some saved time learning a new (and quite limited) query language. And after all, only a few people are running dozens and dozens of databases, so performance is mostly not an issue. And on the plus side, we can now ask questions that were previously plainly not possible (no joins, remember) or were only possible by storing some extra columns of data (de-normalizing).

The new functionality is designed for the latest Postgres version, 11, but as people run all kinds of different versions and might not want to set up a new cluster, there is also a legacy mode, which will cost more IO though. In total there are 4 different "schema modes" so that people can optimize their IO based on needs:

  • a separate normal table for each distinct metric (legacy mode)
  • a separate partitioned table for each distinct metric + weekly partitions
  • a separate partitioned table for each distinct metric + separate sub-table for each distinct monitored host + monthly partitions. Best for monitoring 50+ DB-s
  • custom mode – all data inserted into a si
Category: postgresql