フィードアグリゲーター

Louise Grandjonc: PostgreSQL's indexes - GIN algorithms

planet postgresql - 2019-07-02(火) 02:20:22
Introduction In the previous article, I introduced the internal data structure of a postgres GIN index. Now it’s time to talk about searching and maintaining them. Searching in a GIN index When postgres uses the GIN index, you’ll notice in the EXPLAIN that it’s always doing a bitmap heap scan. It means that the search algorithm returns a map of tids (pointers to rows) in physical memory order. Considering the way the data is stored inside the posting lists and trees, it makes a lot of sense to use this method.
カテゴリー: postgresql

Louise Grandjonc: PostgreSQL's indexes - GIN

planet postgresql - 2019-07-02(火) 02:20:21
Introduction GIN (Generalized Invertes Index) indexes are often used to index arrays, jsonb, and tsvector (for fulltext search) columns. When it comes to array, they are, for example, used to verify if an array contains an other array or elements (the <@ operator). In the documentation you can see the full list of operators. But the question that I want to answer in this article is: “why should we use GIN indexes for this data types and operators ?
カテゴリー: postgresql

Ibrar Ahmed: Deep Dive Into PostgreSQL Indexes Webinar: Q & A

planet postgresql - 2019-07-01(月) 22:39:13

I want to thank everybody who attended my session “Deep Dive in PostgreSQL Indexes” in Austin. It was quite a wonderful experience! To cover a bigger audience, I also did a webinar on the same topic. There were many questions in that webinar, but unfortunately, there was not enough time to cover each and every question, so I decided to have a followup Q&A session with a blog post on the topic.

The recording and slides are available here. Below is the list of your questions.

Q1: Is there a concept of the functional index in PostgreSQL?

The functional index is when an index is defined on the result of a function. This is present in PostgreSQL 7. In later releases of PostgreSQL, you can define the index on the result of an expression. The expression can be any PostgreSQL expression. The function is a subset of expression.

Q2: Can I create an index using a table row data that is attached to the index? Does my query have to use index-only-scan?

Yes, if all selected columns are in the index, then index-only-scans will be used.

Q3: It’s a little bit hard to choose the right index type. Is there any offer mechanism in Postgres for index types?

Yes, that’s true that is hard to choose, but it all depends on your data and queries types. I have mentioned recommendations in my slides. There is no mechanism for that in PostgreSQL.

Q4: Is it possible to manipulate which index is used by a query if you have multiple indexes?

No, it’s optimizer’s responsibility to choose between defined the indexes. Believe that the PostgreSQL optimizer will choose the best option for you.

Q5: If there’s a need to index a date column, would it be better to use B-TREE or BRIN index?

A date data type is mostly aligned with physical storage. If you are not making many updates and delete operations, then go for the BRIN. But if you have too many updates and deletes, then B-TREE is best for you.

Q6: For time-based filters what kind of index is preferable?

Time is also aligned with physical storag

[...]
カテゴリー: postgresql

Hans-Juergen Schoenig: PostgreSQL meets “Data Science and AI”

planet postgresql - 2019-07-01(月) 18:40:51

Machine Learning, Deep Learning, Artificial Intelligence, and Data Science have become really important topics these days. Everybody is looking for new technologies and tries to make use of those recent developments. PostgreSQL can help to achieve those goals, and it offers reliable backend storage for your data.

PostgreSQL vs. CSV files vs. commercial databases

In the past we have seen that many in the Data Science and AI community are still using CSV files to process data. While there is a lot of tooling available (e.g. the Python pandas library) it might not be the best of all decision to build your models on CSV files.

So what is wrong with a CSV file? Well, actually nothing as long as …

  • the format of your file is correct
  • the amount of data is reasonable
  • ALL the data in the file is needed
  • nothing changes

All those points might seem obvious BUT my practical experience shows that they are not and that those issues are seriously underappreciated. What do I mean by that?

Recently we have seen a customer, who wanted to store interest rates for some financial models. What did we find in the data? Let me quote: “3 percent” (as a string). Now: How do you expect your AI models to work if the data is so wrong? To do data mining you will need high quality data – not some crap. Which leads to an important observation:

“If you are doing Artificial Intelligence 80% of your effort has to go into data preparation – only 20% are the real model”

In short: You cannot seriously expect a model to learn something about “interest rates” if the raw data does not even ensure that the interest rate is actually a number.

The second remark is about the amount of data: Suppose you got a TB of data. If you want to process or maybe sort your data. What do you think? Who is better at sorting? A database, which has been optimized to sort data for 30 years or some hand written Python program, which you have written over a weekend? From experience I can tell you: You got no chance. The d

[...]
カテゴリー: postgresql

Regina Obe: PostGIS 3.0.0alpha3

planet postgresql - 2019-07-01(月) 09:00:00

The PostGIS development team is pleased to release PostGIS 3.0.0alpha3.

This release works with PostgreSQL 9.5-12beta2 and GEOS >= 3.6

Best served with PostgreSQL 12beta2.

Continue Reading by clicking title hyperlink ..
カテゴリー: postgresql

Dave Page: Indexing Documents for Full Text Search

planet postgresql - 2019-07-01(月) 00:28:00
During a conversation with $BOSS a couple of days ago the topic of the new Full Text Search in MongoDB came up. He (knowing of, but presumably having never worked with FTS in PostgreSQL) suggested I might create a tool like the one the MongoDB guys were proudly showing off in their keynote video from their recent conference to make it easy to index and search HTML documents on disk. I looked at him with a slight tilt of the head and remarked that it's already easy to do that, and just takes a few lines of Python code or similar.

So, here I am on my flight back to the UK having written 36 lines of code whilst munching on my breakfast and enjoying my free seat upgrade. I should probably note that those 36 lines of code include import statements, comments, blank lines for readability, and a nicely formatted 7 line SQL query that could easily be just one line. Plus, the vast majority of the code is really just boilerplate to read a directory of files and extract their contents.

Anyway, before I can go into that, it was necessary to first create a schema to hold the data we'd need to query. This is basically just a table to hold whatever information you like about each document, with a tsvector column for searching. For my test I decided to make the path to the file the primary key, and to store the title of the HTML document and the text in the body. The body text isn't really needed, but it does make it a touch easier to use a trigger to automatically maintain the tsvector data - plus of course, I could also then use that column and get rid of the source documents.

Here's what the table looked like:

  CREATE TABLE docs
  (
      path text NOT NULL,
      title text NOT NULL,
      body text NOT NULL,
      tsv tsvector NOT NULL,
      PRIMARY KEY (path)
  );

  CREATE INDEX docs_tsv_idx
      ON docs USING gin (tsv);

I also added the aforementioned trigger to maintain the tsvector. I got slightly fancy with this, indexing both the title and the body fields, and giving the[...]
カテゴリー: postgresql

Laravel News: Debugging .gitignore

phpdeveloper.org - 2019-06-30(日) 00:00:01
Sometimes you might create a file that is unexpectedly ignored by git. Although it doesn’t happen often, it can be tricky to figure out which .gitignore file is responsible for ignoring a file in your project. It is super easy once you know about a handy git co...
カテゴリー: php

colinodell.com - Blog: league/commonmark 1.0.0 has been released!

phpdeveloper.org - 2019-06-29(土) 23:22:21

After 5 years of development, 3,000,000 downloads, and 58 releases, I'm extremely pleased to announce that league/commonmark version 1.0.0 has been released! What is league/commonmark? league/commonmark is a highly-extensible PHP Markdown parser which is fully compliant with the CommonMark spec. ...

カテゴリー: php

php|architect: Internal Apparatus: Memoization

phpdeveloper.org - 2019-06-29(土) 06:30:01

By Edward Barnard Memoization is another form of memory use. I see it regularly used for improving application performance, but it’s also used to improve compiler performance. To learn the concept, we’ll use a PHP project with database tables, then look at other ways to use the technique....

カテゴリー: php

PHP: Hypertext Preprocessor: PHP 7.4.0 alpha 2 Released

phpdeveloper.org - 2019-06-29(土) 06:30:01

PHP team is glad to announce the release of the second PHP 7.4.0 version, PHP 7.4.0 Alpha 2. This continues the PHP 7.4 release cycle, the rough outline of which is specified in the PHP Wiki. For source downloads of PHP 7.4.0 Alpha 2 please visit the download page. Please carefully test this version...

カテゴリー: php

Tom&aacute;&scaron; Votruba Blog: 7 News and Changes in Symplify 6

phpdeveloper.org - 2019-06-29(土) 06:30:01

Do you use Easy Coding Standard, Package Builder or Statie? Do you need to upgrade safely? Do you want to benefit from new features?

This post shows 7 news and changes, that might affect you (in a good way).

カテゴリー: php

Vasilis Ventirozos: Accessing PostgreSQL data from AWS Lambda

planet postgresql - 2019-06-29(土) 04:45:00
All databases need some maintenance that we usually perform using scheduled tasks. For example, if you have an RDS instance and you want to get a bloat report once a month, you’ll probably need a small EC2 instance just to do these kinds of things. In this post, we will talk about accessing RDS, getting a result from a SQL statement and reporting it to a slack channel using a scheduled Lambda function; let’s call this poor man’s crontab :)>
Before we start, psycopg2 isn’t supported by lambda so it needs to be packaged into the lambda deployment package along with any other dependencies, but since psycopg2 requires libpq it needs to be compiled with libpq statically linked. There are many binaries of psycopg2 that you can download, but I would suggest you compile your own, using the latest PostgreSQL source code. We’ll talk about all these steps throughout this post.
We’ll cover the following:
  • Creating a new AWS postgres RDS instance
  • How to compile psycopg (we’ll use docker for that)
  • Code for the function
  • Packaging
  • Lambda function, trigger, schedule
  • Testing

I won’t get into much detail about the steps of building an RDS instance as it is pretty straight forward, assuming you have your aws client setup, you’ll need something similar to the following, if not, go to aws console and clickety-click.
#!/bin/sh
aws rds create-db-instance \
--db-subnet-group-name [your subnet group] \
--db-security-groups [your security group] \
--db-instance-identifier [your instance identifier] \
--db-instance-class db.t2.micro \
--engine postgres \
--allocated-storage 5 \
--no-publicly-accessible \
--db-name [your db name] \
--master-username [your username] \
--master-user-password [your password]\
--backup-retention-period 3
For compiling psycopg2 we need a linux installation, I am on a mac (don’t hate) so I used a docker container (ubuntu bionic) for the compiles. You can find the Dockerfile here

The steps for compiling are the following :
# Download sou[...]
カテゴリー: postgresql

Bruce Momjian: Ibiza: A Different Type of Conference

planet postgresql - 2019-06-28(金) 22:45:01

Having returned from last week's Ibiza conference, I have a new understanding of the event's goals. I know there was some uncertainty about the event, for several reasons:

  • Having a conference at a resort is a new thing for our community. We started years ago with conferences in universities, and steadily grew to hotel-based conferences in minor and then major cities.
  • Ibiza has a reputation in some countries as a party destination. The wildest thing I saw were yelping dogs being walked along the beach.
  • The beach mention often confused people. This was part of an effort to raise the importance of the hallway track, rather than it being just scheduled holes between technical talks. I didn't realize it was possible during a 90-minute lunch break to eat and swim in the ocean, but I did it!
  • There is historical abuse of resort-based conferences as paid or tax-free vacations. This was certainly not the case for Ibiza, but it is an additional hurdle.

I returned from the conference with a warm feeling for the venue, the people I met, and the things I learned, as did my wife and daughter. While resort conferences are not for everybody, they are popular in other industries, and there is certainly a need for this type of event. The next scheduled "beach" conference is in Bali, and I plan to attend.

カテゴリー: postgresql

第 162 回理事会議事録 (2019-6)

www.postgresql.jp news - 2019-06-27(木) 19:09:23
第 162 回理事会議事録 (2019-6) anzai 2019/06/27 (木) - 19:09
カテゴリー: postgresql

ページ