I decided to look into using a Graph Database, which is able to elegantly handle relationships and data that doesn’t fit well into simple tables. Fundamentally a Graph Database thinks in terms of vertices (things) and edges (connections between things).
A graph database can model complex relationships: for example here the Pali Nikayas are children of “su” (Suttas) and “pi” (Pali). There is no need for one of these to be the root of the data.
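As a rough illustration of vertices and edges, here is a minimal in-memory sketch in plain JavaScript. It is not ArangoDB code; the id `pali-nikayas` and the helper names `parentsOf` and `childrenOf` are made up for the example. It only shows that one vertex can have two parents without either being the root.

```javascript
// Hypothetical in-memory model: vertices are things, edges are connections.
const vertices = new Map([
  ["su", { name: "Suttas" }],
  ["pi", { name: "Pali" }],
  ["pali-nikayas", { name: "Pali Nikayas" }], // id invented for this sketch
]);

// Each edge points from a child to one of its parents.
const edges = [
  { from: "pali-nikayas", to: "su" },
  { from: "pali-nikayas", to: "pi" },
];

// Collect every parent of a vertex by following its outgoing edges.
function parentsOf(id) {
  return edges.filter((e) => e.from === id).map((e) => e.to);
}

// Collect every child of a vertex by following incoming edges.
function childrenOf(id) {
  return edges.filter((e) => e.to === id).map((e) => e.from);
}

console.log(parentsOf("pali-nikayas")); // [ 'su', 'pi' ]
console.log(childrenOf("pi")); // [ 'pali-nikayas' ]
```

Neither "su" nor "pi" has to own the other: both are simply endpoints of edges.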
I found a database called ArangoDB that thoroughly exceeds expectations. ArangoDB is a multi-model database, which means it can act as a document store (like MongoDB), a key-value store (like Redis) and a graph database (like Neo4j). Not only that, but it caters to providing JSON data to client-side rendering frameworks - in that sense it is highly practical.
Some highlights:
- Natively JSON, basically it speaks, thinks in and understands JSON.
- Can act as a standalone backend server, presenting a REST API. It has authentication and permissions.
- Is a turnkey application. You just install it and it’s ready to go - no dependencies or complications.
- The devs really care about performance. It has a fast C/C++ core and a V8 JavaScript layer on top of the core (Foxx microservices), rather than being a Java monstrosity.
- It has a web interface with a built-in visualizer and query analyzer.
- There is a consistently clear emphasis on ease of use and ease of learning.
- ArangoDB decreases the number of things you need to know, rather than increases.
Some things it does right:
- Plugins are implemented in JavaScript and sit close to the core, so they are ideal for implementing custom logic exposed on API endpoints. It is 1000000% easier to implement a plugin for ArangoDB than for one of the Java monstrosities like Elasticsearch.
- Database for a modern PWA. It is practically a drop-in replacement for something like MySQL if you want to use JSON and have a REST API - and that’s to say nothing of its multi-model capabilities.
- Microservices are a big buzzword but they can also be a nightmare to manage. But with ArangoDB instead of running a bunch of different services, you run just one: ArangoDB. The microservices are js plugins that run inside it - snug up against the data where there is no communication overhead. Instead of proliferation of services you get consolidation. At first it seemed like a weird bolted on feature. Now I recognize it’s bloody brilliant, in fact it’s practically common sense.
- The Arango Query Language (AQL) resembles Python and JavaScript by using `for ... in ...` constructions and such. The flow of AQL is decidedly straightforward and it is far more readable than SQL. A good language provides expressive power; that is, relatively few words are required to express your will and make the software carry it out. Like SQL, AQL fulfills this promise, while being more readable.
- Pragmatic. A “pure” Graph DB can be rendered impractical by adhering to an ideology of graph purity. A multi-model database does not suffer from ideology: by also acting as a document store and key-value store it can enjoy extremely high performance for operations which a graph model is ill-suited for.
- Import/Export is as JSON files, cleanly separated into structure and data.
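To give a feel for the `for ... in ...` shape of AQL mentioned above, here is a small sketch that holds an AQL query as a string next to the equivalent logic in plain JavaScript. The collection name `texts`, the attributes `lang` and `uid`, and the sample records are all made up for illustration, and the query is never sent to a database here.

```javascript
// A sketch of an AQL query, kept as a string only for side-by-side reading.
// The collection `texts` and its attributes are hypothetical.
const aql = `
  FOR t IN texts
    FILTER t.lang == "pi"
    SORT t.uid
    RETURN t.uid
`;

// Made-up sample documents standing in for the `texts` collection.
const texts = [
  { uid: "mn1", lang: "pi" },
  { uid: "dn1", lang: "pi" },
  { uid: "ma1", lang: "lzh" },
];

// The same logic written out in plain JavaScript.
function paliUids(docs) {
  const result = [];
  for (const t of docs) {    // FOR t IN texts
    if (t.lang === "pi") {   // FILTER t.lang == "pi"
      result.push(t.uid);    // RETURN t.uid
    }
  }
  return result.sort();      // SORT t.uid, applied to the collected uids
}

console.log(paliUids(texts)); // [ 'dn1', 'mn1' ]
```

The query reads top to bottom in the same order as the loop, which is much of what makes it feel familiar coming from Python or JavaScript.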
There are a lot of things that traditional databases do wrong… to be fair, MySQL is over 20 years old now; it comes from a completely different era of computing. ArangoDB is only 5 years old, it has grown up in the modern era of computing, and it shows. It also seems to be a product of some very fine German engineering (it is an open source project, but backed by a German database company).
I believe that ArangoDB can be a complete backend solution for delivering data to the frontend, using custom endpoints implemented in JavaScript for any logic too fancy for the standard API functions. So we could use it, but what are the compelling reasons?
- To get a consistency guarantee. When all the data is loaded into a graph database and you try to link everything together, you find out if there are problems like typos in uids.
- To get data wrangling functions and a REST API for free - it also has a nice web interface baked in.
- To have the data in a format which can be exported, and then imported into other applications like visualizers.
- A farewell to Python (at least in the web server). While I love Python, there are advantages in using one language consistently.
- Infinite possibilities. ArangoDB offers both great flexibility and ease of expression with blazing fast performance (contrast Elasticsearch, which is powerful but too sluggish for many tasks I can imagine) and it is easy to implement highly performant plugins for custom logic.
- A clear winner. ArangoDB is clearly better than the alternatives - the closest competitor is OrientDB, but ArangoDB offers a host of side benefits, most of which come from the C/C++ core + V8 architecture.
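The consistency point in the list above (catching typos in uids once everything is linked) can be sketched in plain JavaScript. This is only an illustration of the idea, not ArangoDB's own mechanism; the uids, the deliberate typo `pj`, and the function name `findDanglingLinks` are assumptions made for the example.

```javascript
// Hypothetical set of uids that actually exist as vertices.
const knownUids = new Set(["su", "pi", "dn", "mn"]);

// Hypothetical links between uids; the last one contains a typo.
const links = [
  { from: "dn", to: "su" },
  { from: "mn", to: "su" },
  { from: "mn", to: "pj" }, // typo for "pi"
];

// Return every link whose source or target is not a known uid.
function findDanglingLinks(uids, candidateLinks) {
  return candidateLinks.filter((l) => !uids.has(l.from) || !uids.has(l.to));
}

const broken = findDanglingLinks(knownUids, links);
console.log(broken); // [ { from: 'mn', to: 'pj' } ]
```

Once the data is expressed as vertices and links, a check like this falls out almost for free, which is the appeal.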
Why not:
- The more a service on a server is leveraged, the harder it might be to implement functionality in offline mode. Flipside: with ArangoDB any custom logic is implemented in JavaScript, which can be shared with the client, and at the end of the day the data is internally basically JSON, so it can be understood readily by JavaScript - just some of the data-wrangling functions we got for free earlier have to be reproduced.
- Not all beer and pizzas. For example, vinaya parallels in the graph are basically a bomb (if there are 20 things all parallel to each other, that’s 400 links if expressed in the most naive way as a graph). That said, the fact that ArangoDB is a multi-model database with a powerful plugin architecture entirely mitigates this issue - that is to say, even though it doesn’t solve all problems, it also doesn’t get in the way of solutions.
- Graph databases aren’t very well known, and the most well known one (Neo4j) is kind of esoteric. ArangoDB is the youngest of those to appear on the radar.
- Technology lock-in. This is obviously unavoidable; you have to use some technology or another. A good strategy is probably to minimize how esoteric your technologies of choice are. ArangoDB isn’t popular, I believe mainly due to being new, but it’s also very straightforward in every way.
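The arithmetic behind the parallels "bomb" in the list above can be made concrete with a tiny sketch. The function names are made up, and the last one (every member pointing at one shared group record) is only an assumed alternative for comparison, not something the design has settled on.

```javascript
// Most naive count: every member links to every member, itself included.
function naiveLinks(n) {
  return n * n;
}

// Directed links between distinct members only.
function directedLinks(n) {
  return n * (n - 1);
}

// Undirected pairs of distinct members.
function undirectedPairs(n) {
  return (n * (n - 1)) / 2;
}

// Assumed alternative: each member links once to a shared group record.
function groupLinks(n) {
  return n;
}

console.log(naiveLinks(20), directedLinks(20), undirectedPairs(20), groupLinks(20));
// 400 380 190 20
```

The pairwise counts grow quadratically with the size of the parallel group, while a shared record grows linearly, which is why being able to fall back to a document model matters here.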
I’m surprised I didn’t look into a Graph database before now - although that’s because I barely even knew they existed. No Graph DB has risen to prominence and it could easily be assumed they are only suitable for esoteric purposes. Multi-model databases are even more unusual and even newer. It’s probably only in the past 2 or 3 years that it would have started looking like a good idea.