What is a good database key?

A central value add of modeling databases for analytical purposes and speed is that you can restore the sanity, often lost in the source system, that comes from using good keys in each table. But what exactly does it mean for a key to be “good”? Key Properties Keys, they refer to “something” that is … Read more

Row or Column based Storage?

These days, columnar storage formats are getting a lot more attention in relational databases. Parquet, with its superior compression, is quickly taking over from CSV formats. SAP Hana, Vertica, Yellowbrick, Databricks, SQL Azure and many others all promote columnar representations as the best option for analytical workloads. It is not hard to see why. But … Read more

Recommended Reading List

This page maintains my list of recommended reading if you are interested in learning about databases. This is very much work in progress – so check back here later for more. Query Optimisation Dan Tow: “Generating Optimal Execution Plans” A great resource to answer the question: What query plan do you want to get? Anjo … Read more

What is Cost in a Query Planner?

If you ever looked at query plan from a SQL databases – you are likely to have come into something called “cost”. You may even have heard that most advanced database use “cost based optimisers” But what exactly is cost and how does the database use it? SQL Query optimisation – Introduction The goal of … Read more

Join Elimination in Query Planners

Most relational databases are both by human SQL query writers and tools that automatically generate queries. Query tools frequently add joins into the queries, even when those joins are not needed. This results in unnecessary query slowdowns and increase CPU usage. For example, consider the classic Star Schema model with a fact table in the … Read more