Using change data capture to perform flexible aggregations with DynamoDB and Druid

DynamoDB is often a perfect fit as the primary, operational system-of-record store for many types of application. It is fast, maintenance-free and (if you use it well) economical. However, it cannot perform aggregations or provide analytics on the data it holds, so reflecting the same data in another store such as Apache Druid is commonplace. The video below demonstrates this idea in operation. The DynamoDB system of record is updated and Apache Druid is then used to perform aggregations on up-to-date values....

August 7, 2022 · Alex Reid

Running Druid on Cloud Dataproc

Today I discovered a ridiculously easy way to run a Druid cluster on GCP: flick a switch when creating a Cloud Dataproc cluster. It’s even a recent version (0.17 at time of writing). Great, right? (Assuming you don’t mind using something labelled alpha by Google.) Customisation: There is literally no documentation other than the page I stumbled across: Cloud Dataproc Druid Component. After running up a small cluster, I noticed some things were missing:...

April 16, 2020 · Alex Reid

Squeezing ClickHouse into Cloud Run

Here is one of my bad ideas that was nevertheless fun to think through. I am not suggesting you actually do this for anything serious. Really, I’m not. Serverless data technologies already exist. The idea: I really like ClickHouse. Compared to the expanse of complex software in the big data space, it’s refreshing to run a single process. Although not without its foibles, it’s very fast and versatile. Running it on Cloud Run is likely a bad idea....

January 23, 2020 · Alex Reid

Exploring Druid

Update, February 2019: this post still gets a few views. Some of the content probably still makes sense, but Druid has moved on a lot: managed services, SQL support, BI tools and so on. I’m still happily using it in production! The big data technology space is vibrant but crowded. There is a plethora of technologies, often appearing to be in competition with each other. There are several excellent SQL-on-Hadoop projects, for instance....

January 23, 2017 · Alex Reid