Tech | Alex Reid

Taming real-time data in Excel with rtd.pub

rtd.pub is a platform for easily connecting any real-time data source to Microsoft Excel. You can write code in Go and Python, or simply configure pre-built open source connectors. This is a design and approach I’ve been mulling over for the past couple of months. I have now implemented it. This, and of course spreadsheets in general, are somewhat boring. However, they’re the original low code/RAD tool: in terms of bang for buck and the control they put into the hands of domain experts, nothing comes close....

NATS as a web application backend

I have used NATS on various projects over the years. It provides a high performance transport layer that can be used to connect applications and services. If you haven’t already heard of it, this video is a fantastic primer. NATS supports websocket connections. nats.ws runs in browsers. This means we can directly use NATS as the backend for browser-based applications that need to display real-time streams of data. Why is this interesting?...

End of 2022

Some technologies I’ve enjoyed using this year: Go: still a lovely, simple to use and learn language. I took the time to understand the fairly new (?) support for generics. When I return to a Go codebase I feel right at home. The code is incredibly easy to read and the standard library is a joy. It remains my preference for writing APIs. Consul and Envoy: when you have a lot of services to securely link together, this is a great combination....

Using change data capture to perform flexible aggregations with DynamoDB and Druid

DynamoDB is often a perfect fit as the primary, operational system of record store for many types of application. It is fast, maintenance free and (if you use it well) economical. However it cannot perform aggregations or provide analytics on the data it holds. Reflecting the same data in another store like Apache Druid is commonplace. The below video demonstrates this idea in operation. The DynamoDB system of record is updated and Apache Druid is then used to perform aggregations on up to date values....

Patching in a development service

Composing systems out of smaller microservices has been commonplace for several years now. One trade-off is increased complexity around development environments. Suppose the system you are working on consists of hundreds of services that all potentially make requests to one another. If you are unlucky you might be faced with the task of spinning everything up locally or within a new cloud provider account. This is costly and often too much work....

DynamoDB pagination with page numbers in URLs

Back when we all used SQL databases, it was common to paginate through large result sets by appending LIMIT offset, rows per page to a SELECT query. Depending on the schema, data volume and database engine, this was inefficient to varying degrees. On smaller result sets and with the right indexes, it was… possibly OK. On larger result sets, the high page numbers would get progressively slower. Databases like DynamoDB prevent this inefficiency by handling pagination differently....
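As a rough illustration of the LIMIT offset, rows per page approach described above, here is a minimal sketch using Python's built-in sqlite3 module. The comments table, its columns and the fetch_page helper are hypothetical names chosen for this example; they are not taken from the post itself.

```python
import sqlite3


def fetch_page(conn, page, rows_per_page):
    """Return one page of rows using the "LIMIT offset, count" form."""
    # Page numbers start at 1, so page 3 with 10 rows per page skips 20 rows.
    offset = (page - 1) * rows_per_page
    cur = conn.execute(
        "SELECT id, body FROM comments ORDER BY id LIMIT ?, ?",
        (offset, rows_per_page),
    )
    return cur.fetchall()


# Hypothetical in-memory table holding 25 comments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comments (id, body) VALUES (?, ?)",
    [(i, f"comment {i}") for i in range(1, 26)],
)

page_3 = fetch_page(conn, page=3, rows_per_page=10)
```

The database still has to step past every skipped row before it can return the requested page, which is why high page numbers get slower as the offset grows.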

Filtering and pagination with Cloud Bigtable

In the previous series of posts, we built a data model capable of filtering and paginating product comments with DynamoDB. This post explores how we could solve the same problem with Cloud Bigtable. You might wonder why another technology is now being discussed. It is my belief that a lot of the thinking that goes into a data model design is somewhat portable, whether it be DynamoDB, Cloud Bigtable, Cassandra, HBase, or maybe even Redis....

Storage and retrieval of comment statistics in DynamoDB, using index overloading and sparse indexes

This series of posts demonstrates efficient filtering and pagination with DynamoDB. Part 1: Duplicating data with Lambda and DynamoDB streams to support filtering Part 2: Using global secondary indexes and parallel queries to reduce storage footprint and write less code Part 3: How to make pagination work when the output of multiple queries have been combined Part 4: Storage and retrieval of comment statistics using index overloading and sparse indexes...

DynamoDB pagination when multiple queries have been combined

This series of posts demonstrates efficient filtering and pagination with DynamoDB. Part 1: Duplicating data with Lambda and DynamoDB streams to support filtering Part 2: Using global secondary indexes and parallel queries to reduce storage footprint and write less code Part 3: How to make pagination work when the output of multiple queries have been combined Part 4: Storage and retrieval of comment statistics using index overloading and sparse indexes...

Filtering with GSIs and parallel queries in DynamoDB

This series of posts demonstrates efficient filtering and pagination with DynamoDB. Part 1: Duplicating data with Lambda and DynamoDB streams to support filtering Part 2: Using global secondary indexes and parallel queries to reduce storage footprint and write less code Part 3: How to make pagination work when the output of multiple queries have been combined Part 4: Storage and retrieval of comment statistics using index overloading and sparse indexes...

Filtering without using filters in DynamoDB

It never ceases to amaze me just how much is possible through the seemingly constrained model that DynamoDB gives us. It’s a fun puzzle to try to support access patterns beyond a simple key value lookup, or the retrieval of an ordered set of items. The NoSQL gods teach us to store data in a way that mirrors our application’s functionality. This is often achieved by duplicating data so that it appears in multiple predefined sets for inexpensive retrieval....

Running Druid on Cloud Dataproc

Today I discovered a ridiculously easy way to run a Druid cluster on GCP: flick a switch when creating a Cloud Dataproc cluster. It’s even a recent version (0.17 at time of writing). Great, right? (Assuming you don’t mind using something labelled alpha by Google.) Customisation: There is literally no documentation other than the page I stumbled across: Cloud Dataproc Druid Component. After running up a small cluster, I noticed some things were missing:...

Squeezing ClickHouse into Cloud Run

Here is one of my bad ideas that was nevertheless fun to think through. I am not suggesting you actually do this for anything serious. Really, I’m not. Serverless data technologies already exist. The idea: I really like ClickHouse. Compared to the expanse of complex software in the big data space, it’s refreshing to run a single process. Although not without its foibles, it’s very fast and versatile. Running it on Cloud Run is likely a bad idea....

Hello Hugo and Cloud Run!

I’ve been meaning to get off Medium for a while so decided to self-host these posts. Things have changed quite a lot since the last time I did this, which is probably approaching twenty years ago. The page you’re seeing is coming from an nginx image hosted on Google Cloud Run, which contains a site generated by Hugo. Google Cloud Build is used to build this image and deploy it to Google Cloud Run....

Driving an OLED display with a Raspberry Pi and AWS IoT

This project started when my son asked me for a replica train departures board for Christmas. I thought this was a great idea and this looks to be a really neat implementation, but I wanted us to have a go at building one ourselves. I promised that if we failed miserably I’d buy him one! After getting some off-the-shelf code up and running, I wanted to rethink the software. The resulting code is available on GitHub....

Exploring Druid

Update February 2019: this post still gets a few views. Some of the content probably still makes sense, but Druid has moved on a lot. Managed services, SQL support, BI tools and so on. I’m still happily using it in production! The big data technology space is vibrant but crowded. There is a plethora of technologies, often appearing to be in competition with each other. There are several excellent SQL-on-Hadoop projects, for instance....

There's no shame in code that is simply good enough

Back in my early teens when I started developing what could loosely be called software, I didn’t know what I was doing. If it compiled, ran and produced mostly the expected results, then the job was done. As a new programmer, I was immensely productive. Of course, problems came when it was time to fix bugs or extend the software. It was often easier to just start again than to try and understand the rat’s nest of poorly structured and unintelligible code....