Jul 15, 2017 • Programming • 280 words • 2 min read

Cistern Design Notes

Cistern is a project I started 3 years ago, back when I ran a hosting business and wanted a simple tool to aggregate network flow datagrams from my switches.

Cistern hasn’t been a priority for me for a while since I stopped running that business and haven’t touched a physical switch in a long time. At a certain point, I wanted it to support more than just layer 2 and layer 3 network flow information, so I added support (via my appflow package) for generic HTTP application flows.

Development basically stopped at that point. There was a bunch of stuff I didn’t like about the implementation. I wrote a custom time series storage engine for it, but it’s hard to work with just metrics for flow data. What I really wanted was raw events that I could group in arbitrary ways. The internal architecture of Cistern had also turned into a really complicated message passing system with lots of channels, goroutines, and callbacks.

It’s time for the third rewrite.

I don’t have a detailed design since I’m just getting started with the rewrite, but here are my high-level notes:

- Events data model
  - Group by and aggregate events in arbitrary ways
  - Automatic roll-up and dimension reduction
- “State sharing architecture” instead of internal message passing
  - Inspired by Ken Duda’s talk about Arista’s EOS architecture (on YouTube)
- Cloud native
  - Support for AWS VPC Flow Logs and CloudWatch Logs
  - Automatic backup and restore from S3
  - Maybe an AMI to deploy into a VPC
- Simple CLI tool to “query” a Cistern node
- (Eventually) Grafana support
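
To make the "group by and aggregate" idea a bit more concrete, here is a minimal Go sketch. It is not Cistern's actual design: the `Event` type, its `Dimensions` and `Measures` fields, and the `SumBy` helper are all hypothetical names made up for illustration. The point is only that once raw events are kept around, grouping by any subset of dimensions is cheap to express, and grouping by fewer dimensions is effectively a roll-up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Event is a hypothetical raw flow event: string dimensions to group by
// and numeric measures to aggregate. Not Cistern's real data model.
type Event struct {
	Dimensions map[string]string
	Measures   map[string]float64
}

// groupKey builds a key such as "src=10.0.0.1,proto=tcp" from the chosen
// dimensions, in the order given. A missing dimension yields an empty value.
func groupKey(e Event, dims []string) string {
	parts := make([]string, len(dims))
	for i, d := range dims {
		parts[i] = d + "=" + e.Dimensions[d]
	}
	return strings.Join(parts, ",")
}

// SumBy groups events by the given dimensions and sums one measure per group.
// Passing fewer dimensions collapses more events into each group.
func SumBy(events []Event, dims []string, measure string) map[string]float64 {
	totals := make(map[string]float64)
	for _, e := range events {
		totals[groupKey(e, dims)] += e.Measures[measure]
	}
	return totals
}

func main() {
	events := []Event{
		{map[string]string{"src": "10.0.0.1", "proto": "tcp"}, map[string]float64{"bytes": 100}},
		{map[string]string{"src": "10.0.0.1", "proto": "udp"}, map[string]float64{"bytes": 200}},
		{map[string]string{"src": "10.0.0.2", "proto": "tcp"}, map[string]float64{"bytes": 50}},
	}

	// Group by source address only; the "proto" dimension is dropped.
	totals := SumBy(events, []string{"src"}, "bytes")

	// Sort keys so the output order is stable (Go map iteration is unordered).
	keys := make([]string, 0, len(totals))
	for k := range totals {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s %.0f\n", k, totals[k])
	}
}
```

Calling `SumBy` with `[]string{"src", "proto"}` instead would keep all three events in separate groups. A real implementation would also need time bucketing and aggregates other than sum, which this sketch leaves out.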

So yeah, lots of neat stuff coming soon!

The goal is to keep things simple and developer-friendly, and to make Cistern a great foundation to build on.

Misframe