Using change data capture to perform flexible aggregations with DynamoDB and Druid

DynamoDB is often a perfect fit as the primary, operational system-of-record store for many types of application. It is fast, maintenance-free and (if you use it well) economical. However, it cannot perform aggregations or provide analytics on the data it holds. Reflecting the same data in another store like Apache Druid is commonplace. The video below demonstrates this idea in operation. The DynamoDB system of record is updated and Apache Druid is then used to perform aggregations on up-to-date values....

August 7, 2022 · Alex Reid

Patching in a development service

Composing systems out of smaller microservices has been commonplace for several years now. One trade-off is increased complexity around development environments. Suppose the system you are working on consists of hundreds of services that all potentially make requests to one another. If you are unlucky, you might be faced with the task of spinning everything up locally or within a new cloud provider account. This is costly and often too much work....

July 14, 2022 · Alex Reid

DynamoDB pagination with page numbers in URLs

Back when we all used SQL databases, it was common to paginate through large result sets by appending LIMIT offset, rows per page to a SELECT query. Depending on the schema, data volume and database engine, this was inefficient to varying degrees. On smaller result sets and with the right indexes, it was… possibly OK. On larger result sets, the high page numbers would get progressively slower. Databases like DynamoDB prevent this inefficiency by handling pagination differently....

October 27, 2021 · Alex Reid

Filtering and pagination with Cloud Bigtable

In the previous series of posts, we built a data model capable of filtering and paginating product comments with DynamoDB. This post explores how we could solve the same problem with Cloud Bigtable. You might wonder why another technology is now being discussed. It is my belief that a lot of the thinking that goes into a data model design is somewhat portable, whether it be DynamoDB, Cloud Bigtable, Cassandra, HBase, or maybe even Redis....

December 2, 2020 · Alex Reid

Storage and retrieval of comment statistics in DynamoDB, using index overloading and sparse indexes

This series of posts demonstrates efficient filtering and pagination with DynamoDB. Part 1: Duplicating data with Lambda and DynamoDB streams to support filtering Part 2: Using global secondary indexes and parallel queries to reduce storage footprint and write less code Part 3: How to make pagination work when the output of multiple queries have been combined Part 4: Storage and retrieval of comment statistics using index overloading and sparse indexes...

November 25, 2020 · Alex Reid

DynamoDB pagination when multiple queries have been combined

This series of posts demonstrates efficient filtering and pagination with DynamoDB. Part 1: Duplicating data with Lambda and DynamoDB streams to support filtering Part 2: Using global secondary indexes and parallel queries to reduce storage footprint and write less code Part 3: How to make pagination work when the output of multiple queries have been combined Part 4: Storage and retrieval of comment statistics using index overloading and sparse indexes...

November 24, 2020 · Alex Reid

Filtering with GSIs and parallel queries in DynamoDB

This series of posts demonstrates efficient filtering and pagination with DynamoDB. Part 1: Duplicating data with Lambda and DynamoDB streams to support filtering Part 2: Using global secondary indexes and parallel queries to reduce storage footprint and write less code Part 3: How to make pagination work when the output of multiple queries have been combined Part 4: Storage and retrieval of comment statistics using index overloading and sparse indexes...

November 21, 2020 · Alex Reid

Filtering without using filters in DynamoDB

It never ceases to amaze me just how much is possible through the seemingly constrained model that DynamoDB gives us. It’s a fun puzzle to try to support access patterns beyond a simple key value lookup, or the retrieval of an ordered set of items. The NoSQL gods teach us to store data in a way that mirrors our application’s functionality. This is often achieved by duplicating data so that it appears in multiple predefined sets for inexpensive retrieval....

November 9, 2020 · Alex Reid

Running Druid on Cloud Dataproc

Today I discovered a ridiculously easy way to run a Druid cluster on GCP: flick a switch when creating a Cloud Dataproc cluster. It’s even a recent version (0.17 at time of writing). Great, right? (Assuming you don’t mind using something labelled alpha by Google.) Customisation: There is literally no documentation other than the page I stumbled across: Cloud Dataproc Druid Component. After running up a small cluster, I noticed some things were missing:...

April 16, 2020 · Alex Reid

Squeezing ClickHouse into Cloud Run

Here is one of my bad ideas that was nevertheless fun to think through. I am not suggesting you actually do this for anything serious. Really, I’m not. Serverless data technologies already exist. The idea: I really like ClickHouse. Compared to the expanse of complex software in the big data space, it’s refreshing to run a single process. Although not without its foibles, it’s very fast and versatile. Running it on Cloud Run is likely a bad idea....

January 23, 2020 · Alex Reid

Hello Hugo and Cloud Run!

I’ve been meaning to get off Medium for a while so decided to self-host these posts. Things have changed quite a lot since the last time I did this, which is probably approaching twenty years ago. The page you’re seeing is coming from an nginx image hosted on Google Cloud Run, which contains a site generated by Hugo. Google Cloud Build is used to build this image and deploy it to Google Cloud Run....

January 19, 2020 · Alex Reid

Driving an OLED display with a Raspberry Pi and AWS IoT

This project started when my son asked me for a replica train departures board for Christmas. I thought this was a great idea and this looks to be a really neat implementation, but I wanted us to have a go at building one ourselves. I promised that if we failed miserably I’d buy him one! After getting some off-the-shelf code up and running, I wanted to rethink the software. The resulting code is available on GitHub....

December 19, 2019 · Alex Reid

Exploring Druid

Update February 2019: this post still gets a few views. Some of the content probably still makes sense, but Druid has moved on a lot: managed services, SQL support, BI tools and so on. I’m still happily using it in production! The big data technology space is vibrant but crowded. There is a plethora of technologies, often appearing to be in competition with each other. There are several excellent SQL-on-Hadoop projects, for instance....

January 23, 2017 · Alex Reid

There's no shame in code that is simply good enough

Back in my early teens when I started developing what could loosely be called software, I didn’t know what I was doing. If it compiled, ran and produced mostly the expected results, then the job was done. As a new programmer, I was immensely productive. Of course, problems came when it was time to fix bugs or extend the software. It was often easier to just start again than to try and understand the rat’s nest of poorly structured and unintelligible code....

November 21, 2014 · Alex Reid