Minimum Viable Database

Jul 15 2015

At Grand Round Table our web app has a Mongo database for saving user data for authentication. Can't go wrong, right? Mongo is easy to install and use. You can just throw data into it and get it back out.

Sure, but once you deploy that app to a server that's another process that you have to make sure is running. Init scripts to make sure it boots right. Data migrations to plan for. Then your app won't load correctly if there's something wrong with the database. Meanwhile it's using up memory on your machine.

In our case, it's really a waste. Our app is stateless except for that user data and there isn't much of it. Mongo was fine, but overkill. So when I was doing some work on the app recently I thought why not use something simpler, like Sqlite?

Yeah. Sqlite is pretty simple. And since it's just a file it's portable. I was intrigued with the idea since a few months earlier I had created a Sqlite database of over million traffic court cases in Philadelphia. I did it because although the data was rich, it could use some cleaning up and I knew it would be a pain for someone to load multiple files for analysis in R or whatever. Python's got a Sqlite driver in the standard library so it seemed like an easy thing to do. The output was a single file on Amazon S3 that anyone can download and instantly write SQL queries against, don't even need to get a programming language involved.

Back to work though. As I continued to think about it, did I even really need SQL?

Would a key/value store be enough, like LevelDB? Without getting into too much detail, I wasn't worried about key collisions in this case (in which, for example, a key of dave is meant to represent two different things). I just had to put objects in and get them back out later. If I just used LevelDB I wouldn't have to worry about schemas and I would have the same portability as I did with Sqlite.

The exercise made me think more about the database choices that we make. It's almost natural to want to use whatever is new and hot. Maybe the marketing people behind that new shiny database talk about how "web scale" and faster it is, but this isn't always true.

Are you storing relational data (ie. you'll have to join different data types based on a shared value)? You should definitely use a relational database even though it isn't sexy (you really can't go wrong with PostgreSQL by the way). Is your data of varying structures and you don't really care about a schema? This could be one of the reasons that NoSQL could perhaps work for you. Need to do some spatial queries on GIS point data? PostgreSQL with PostGIS is a fantasic, full-featured solution, but check first if SpatiaLite is enough for your needs.

There's been an explosion in databases over the last decade or so as every niche in the market gets filled. With it comes to marketing hype about speed and size ("Big Data!") but it's our responsibility to take the time to distinguish between talking points and reality. Choose only the database you need, not the one that you want.

Discuss this post with me on Twitter.

Send a pull request for this post on GitHub.

Dave Walk is a software developer, basketball nerd and wannabe runner living in Philadelphia. He enjoys constantly learning and creating solutions with Go, JavaScript and Python. This is his website.