Using change data capture to perform flexible aggregations with DynamoDB and Druid

DynamoDB is often a perfect fit as the primary, operational system of record store for many types of application. It is fast, maintenance free and (if you use it well) economical. However it cannot perform aggregations or provide analytics on the data it holds. Reflecting the same data in another store like Apache Druid is commonplace. The below video demonstrates this idea in operation. The DynamoDB system of record is updated and Apache Druid is then used to perform aggregations on up to date values....

August 7, 2022 · Alex Reid

Squeezing ClickHouse into Cloud Run

Here is one of my bad ideas that was nevertheless fun to think through. I am not suggesting you actually do this for anything serious. Really, I’m not. Serverless data technologies already exist. The idea I really like ClickHouse. Compared to the expanse of complex software in the big data space, it’s refreshing to run a single process. Although not without its foibles, it’s very fast and versatile. Running it on Cloud Run is likely a bad idea....

January 23, 2020 · Alex Reid

Exploring Druid

Update February 2019 — this post still gets a few views. Some of the content probably still makes sense, but Druid has moved on a lot.Managed services, SQL support, BI tools and so on. I’m still happily using it in production! The big data technology space is vibrant but crowded. There is a plethora of technologies, often appearing to be in competition with each other. There are several excellent SQL-on-Hadoop projects, for instance....

January 23, 2017 · Alex Reid