Latest Posts

Introducing the TPC series - TPC-H Query 1: Column Storage and Local Aggregation

Reading Time: 9 min, Date: 7/14/2025

After the wonderful feedback on the previous blog about Iceberg - it is now time to switch gears. Databases are more than row storage engines. They are algorithm machines, helping that...

Iceberg, The Right Idea - The Wrong Spec - Part 2 of 2: The Spec

Reading Time: 21 min, Date: 7/10/2025

Let us finally look at what is so wrong with the Iceberg spec and why this simply isn't a serious attempt at solving the metadata problem of large Data Lakes. In the first part of this I took...

Iceberg, The Right Idea - The Wrong Spec - Part 1 of 2: History

Reading Time: 17 min, Date: 7/6/2025

Iceberg: The great unifying vision finally allowing us to escape the vendor lock-in of our database engines. One table and metadata format to find them ... And in the darkness bind I the...

Greed vs Bravery Based Engineering

Reading Time: 13 min, Date: 3/1/2025

It is difficult to find words that accurately describe the cruelty, selfishness and outright evil on display from the White House these days. The guiding principle of Gordon Gekko: is...

Coupling, Complexity, and Coding

Reading Time: 12 min, Date: 2/23/2025

Why is the IT industry obsessed with decoupling? Does breaking systems into smaller parts you can understand individually really make them easier to manage and scale? Today, we the of...

Making Decent Python Libraries - Part 1

Reading Time: 6 min, Date: 12/16/2024

Python has now infected computer science departments and data analysts across the planet. The resulting ecosystem is a mess of libraries - that are often poorly designed or outright...

Why are Databases so Hard to Make? Part 4 - Digging up Graves

Reading Time: 9 min, Date: 10/4/2024

In my last post about high speed DML, I talked about how it is possible to modify tables at the kind of speeds that a modern SSD can deliver. I sketched an outline of an algorithm that can easily us...

Why are Databases so Hard to Make? Part 3 - High Speed DML

Reading Time: 14 min, Date: 9/24/2024

After a brief intermezzo about testing (read about my thoughts here: Testing is Hard and we often use the wrong Incentives) - it is time to continue our journey together to where we will A...

Testing is Hard and we often use the wrong Incentives

Reading Time: 14 min, Date: 6/30/2024

I have been spending a lot of time thinking about testing and reviewing testing lately. At a superficial level - testing looks simple: Write test matrix, code tests, run tests, learn we...

Why are Databases so Hard to Make? Part 2 - Logging to Disk

Reading Time: 8 min, Date: 6/23/2024

Transaction logs. Why are they so important and why are they so hard to make?