PostgreSQL

/pg138

A place to discuss the PostgreSQL RDBMS

UPDATE-heavy workloads on PG are known to scatter rows across pages, which can increase IO when a single query scans many rows. Unless you are getting 100% HOT updates (most aren't).

To solve this in the past, I've used CLUSTER, which isn't great: it takes an ACCESS EXCLUSIVE lock on the table for the whole rewrite, and that can take a long time. On a busy table it's almost unusable.

pg_repack solves this by doing online re-clustering and storage reclamation.
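
A minimal sketch of how to check whether you're getting HOT updates, plus what the CLUSTER call looks like (table and index names here are made up):

-- How HOT are the updates on your busiest tables?
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC
LIMIT 10;

-- CLUSTER rewrites the table in index order, but holds an ACCESS EXCLUSIVE lock:
CLUSTER orders USING orders_created_at_idx;
-- pg_repack does roughly the same rewrite online, e.g.: pg_repack --table=orders mydb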

https://reorg.github.io/pg_repack/
A rebuttal to Uber’s PG/MySQL paper.

Keep in mind, PG has changed since this discussion. A lot of the replication issues have long since been put to rest.

The storage subsystems are the interesting point of difference between these systems. MySQL's (InnoDB's) secondary indexes point to the primary key, while PG's indexes point to a physical location (the CTID). An updated MySQL row keeps the same logical address, but in PG the new row version gets a new CTID, and possibly a new page, unless it was a Heap-Only Tuple (HOT) update. On the other hand, because PG indexes point straight at the heap rather than going through the primary-key index, an index lookup is one B-tree traversal instead of two, which can reduce IO for reads.
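
A small illustration of the CTID point, using a throwaway table (names made up):

CREATE TABLE hot_demo (id int PRIMARY KEY, v int);
INSERT INTO hot_demo VALUES (1, 0);
SELECT ctid FROM hot_demo WHERE id = 1;      -- e.g. (0,1)
UPDATE hot_demo SET v = v + 1 WHERE id = 1;
SELECT ctid FROM hot_demo WHERE id = 1;      -- a new ctid, e.g. (0,2)
-- The update is HOT when no indexed column changed and the page has free space:
-- the new version stays on the same page and the indexes need no new entries.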

As with anything, it’s always a tradeoff. However, database systems in particular are especially sensitive to workload details. The tradeoffs really matter!

https://thebuild.com/presentations/uber-perconalive-2017.pdf
initial thoughts on AWS DSQL

1. PG compatible (pg is dominant at this point)
2. Most of the article talked about HA
3. AWS has a proprietary, internal distributed transaction log service that is supposedly the most important piece of AWS, and everything in AWS depends on it. Hope the folks on that team make $10M annual TC each.
4. It depresses me to think that the future of compute and storage is: run it on aws

It's really neat to think about having a network storage layer that can support many database frontends. This seems like the future. Neon et al. are moving in this direction, and it's also something Stonebraker has been talking about for a while now.

https://aws.amazon.com/blogs/database/introducing-amazon-aurora-dsql/
It's quite rich that the "Tablespaces" section follows the "Destroying a Database" section, since misconfiguring tablespaces is an excellent way to absolutely hose your database.
https://imagedelivery.net/BXluQx4ige9GuW0Ia56BHw/256edf86-79ed-42ef-18dc-95a83e9ab000/original
pg_column_size is nice. I knew a bigint was 8 bytes, but I was curious to see the storage size for comparable numeric data.
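Something along these lines shows the comparison (exact byte counts depend on the value and version):

SELECT pg_column_size(1::bigint)                   AS bigint_bytes,
       pg_column_size(1::numeric)                  AS small_numeric_bytes,
       pg_column_size(123456789012345678::numeric) AS big_numeric_bytes;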
https://imagedelivery.net/BXluQx4ige9GuW0Ia56BHw/a5f983a7-32b2-4a09-6b42-c5c2a83daf00/original
Fun PG-related meetup for the SF folks
Harry
@htormey·17:05 30/10/2024
Hosting a panel of expert speakers from engineering at Roblox, YugabyteDB, and Coinbase tonight in SF. The subject is SQL at scale; if you are interested in joining, sign up below. Should be an interesting discussion. https://lu.ma/rcr09cp4
Did you know that Postgres was originally written in LISP? (and C)

"We expected that it would be especially easy to write the optimizer and inference engine in LISP, since both are mostly tree processing modules"

https://dsf.berkeley.edu/papers/ERL-M90-34.pdf
PostgreSQL's C driver is called: libpq

What's PQ?

Q was short for QUEL. QUEL is a dead query language once used by Ingres. Postgres was post-Ingres.

At one time PG supported a query language called PostQUEL.
just added a new chain to @indexsupply and used PG 17!

eager to see how the new vector IO impacts table scans
Very slick: https://postgres.new

Now make a system that takes your schema and inspects pg_stat_statements on your production system and makes actionable recommendations of migrations to improve performance.
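
Until that exists, the manual version is something like this (needs the pg_stat_statements extension; column names as of PG 13+):

-- Top statements by total execution time
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;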
One of those ideas that sounds good at first but turns out otherwise on deeper reflection (the project has been abandoned). It might still be useful for querying information about your infrastructure, but not for creating it.
https://iasql.com
Supabase CEO on user complaints that the self-hosted version is limited and sucks compared to the cloud version

https://www.reddit.com/r/selfhosted/s/v02qg93ao9
How does Postgres manage to do this? 🧐

SELECT S
WHERE W
ORDER BY O
> takes 0.193s

-----
SELECT S
WHERE W
ORDER BY O
LIMIT 20
> takes 74.129s
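My hedged guess, without seeing the plans: with a LIMIT, the planner can switch to walking an index on the ORDER BY column, betting it will find 20 matching rows quickly, and that bet goes badly when the WHERE clause filters almost everything out. Comparing the two plans would confirm it (S/W/O left as placeholders, the_table is made up):

EXPLAIN (ANALYZE, BUFFERS)
SELECT S FROM the_table WHERE W ORDER BY O;

EXPLAIN (ANALYZE, BUFFERS)
SELECT S FROM the_table WHERE W ORDER BY O LIMIT 20;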
I enjoy using PGTune when setting up a fresh pg server. Maybe you will too:

https://pgtune.leopard.in.ua
Anyone got a feature / fix that they are eager to see land in 17?

Here are a few I'm looking forward to (rough sketches below):
- COPY option ON_ERROR ignore to discard error rows
- to_bin() and to_oct()
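
Rough sketches of both, using the PG 17 syntax as I understand it (file path and table name made up):

-- Skip rows that fail to parse instead of aborting the whole COPY
COPY staging_events FROM '/tmp/events.csv' WITH (FORMAT csv, ON_ERROR ignore);

-- New integer-to-text conversions
SELECT to_bin(42) AS binary, to_oct(42) AS octal;   -- '101010', '52'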
Impressive tool from Xata providing a solution for zero-downtime, REVERSIBLE (!!!) migrations: the holy grail.

Excited for this to eventually reach v1. The current feature set is very impressive, but it can't yet handle every possible migration. Something to keep an eye on.
https://xata.io/blog/pgroll-schema-migrations-postgres
Great post on PG lock behavior during schema changes. It's surprising (but obvious in hindsight) that a statement taking a lock which doesn't conflict with an in-progress schema change can still end up blocked behind it: the lock queue is FIFO, so anything queued behind a waiting ACCESS EXCLUSIVE request has to wait too.
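
A sketch of that scenario as three concurrent sessions against a hypothetical table t:

-- Session 1: long-running transaction holding ACCESS SHARE on t
BEGIN;
SELECT count(*) FROM t;
-- (transaction left open)

-- Session 2: DDL needs ACCESS EXCLUSIVE, so it queues behind session 1
ALTER TABLE t ADD COLUMN c int;

-- Session 3: a plain read takes only ACCESS SHARE, which doesn't conflict with
-- session 1 at all, yet it queues behind the waiting ALTER because the lock
-- queue is FIFO
SELECT * FROM t LIMIT 1;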

https://xata.io/blog/migrations-and-exclusive-locks
A CTE (WITH) can be MATERIALIZED or NOT MATERIALIZED.

When MATERIALIZED, the CTE is computed only once and its result is reused by the outer query. Good for reducing repeated work. Not so good when you reference the CTE multiple times with different predicates, because those predicates can't be pushed down into the materialized result; the whole CTE gets computed regardless.

NOT MATERIALIZED forces PG to "inline" the CTE, which allows predicate pushdown but can duplicate work if the CTE is referenced more than once.

The default is MATERIALIZED when a CTE is referenced more than once, and NOT MATERIALIZED (inlined) when it's referenced only once.

https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-CTE-MATERIALIZATION
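
A small sketch of the difference, against a hypothetical events table:

-- NOT MATERIALIZED: the planner can push user_id = 42 down into the scan of events
WITH recent AS NOT MATERIALIZED (
    SELECT * FROM events WHERE created_at > now() - interval '1 day'
)
SELECT * FROM recent WHERE user_id = 42;

-- MATERIALIZED: events is filtered to the last day once, the result is kept,
-- and user_id = 42 is applied to that stored result afterwards
WITH recent AS MATERIALIZED (
    SELECT * FROM events WHERE created_at > now() - interval '1 day'
)
SELECT * FROM recent WHERE user_id = 42;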
Have you talked with your loved ones
about independent ordering options lately?
https://imagedelivery.net/BXluQx4ige9GuW0Ia56BHw/11f55890-e26d-44bb-fa15-018bd367a300/original
without looking it up, please guess the following limits:

1. relations per database
2. relation size
What's a good rule of thumb for shared_buffers size? 25% of RAM?
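For what it's worth, 25% of RAM on a dedicated box is the usual starting point in the docs; e.g. on a 32 GB server, something like:

-- roughly 25% of 32 GB; needs a restart to take effect, and it's only a starting point
ALTER SYSTEM SET shared_buffers = '8GB';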
I sometimes find myself thinking: this would be a great case for PG ARRAYs

But it always comes back to bite me
most projects that "embed" pg binaries into their test suites end up using this one Maven repo: https://mvnrepository.com/artifact/io.zonky.test.postgres

ZONKY