Graph Library Nonsense

Bits and pieces of open source thinking. This isn’t a central part of the redesign, but it’s one of a few general tidy-ups I wanted to get done as part of releasing some long-running open source code.

API Cleanup

A big part of why this work is still not done is the divergent APIs between the JS and Ruby versions.

gem install mementus
npm install mementus

So really, the main cleanup task here is to go through and review each part of the API, then choose the best one and port it across to the other language while documenting the reference implementation.

Creating graphs with entities and values

I decided to move in some code from another library. Although I don’t like adding code or making things bigger than they need to be, this solves some questionable loose behaviour with untyped property objects being scattered across the graph. While this style of dynamic stringly typed API does go with the flow in both Ruby and JS, it also leads to a certain level of cruft and imprecision when building content and knowledge structures.

The first step of getting this in will be to switch it on as an optional extension to the Node and Edge property accessors. Instead of having direct attribute access to a hash/associative array to get and set properties, the graph elements will support new accessors for entity and/or value with an object type that binds to the element label.

// Import model
import { Planet, Moon, Orbit } from "solarsystem.js"

// Definition step
const graph = new Graph(builder => {
  builder.defineNode("planet", Planet)
  builder.defineNode("moon", Moon)
  builder.defineEdge("orbit", Orbit)

  // Create node instances with values assigned
  const fromNeptune = new Node("planet", Planet.fromArray(["Neptune", 1.638]))
  const toTriton = new Node("moon", Moon.fromArray(["Triton", 2.061]))

  // Model the orbit as an edge rather than a node just to show the API.
  // I can’t remember if relationship properties are indexed or not, but if not,
  // don’t do it this way; use the index queries on nodes to make things simple
  // and fast.
  // See: https://community.neo4j.com/t/heavy-relationships-vs-node-hops-which-is-better/4466/2
  builder.addEdge("orbit", fromNeptune, toTriton, Orbit.fromArray([
    "retrograde", 354759, 0.000016, 5.876854, 4.39
  ]))
})

// Basic example of attaching value objects to graph elements
const node = graph.node(id)
const bodyDensity = node.value.density

// This contrasts with the old API that supports dynamic attribute access
const bodyDensityFromStr = node.props["density"]
const bodyDensityFromMetaPrg = node.props.density

// Hopscotch through each connected element to get to the moon value
graph.n("planet").out_e("orbit").out("moon", { name: "Triton" }).value.density

After writing all of that as an API sketch, am I convinced yet? I think I pretty much am at this point. Let’s lean in to the fact that this is a memory store, not a database, and use the host language more expressively.

What this unlocks

Cross-project extensibility. Jumping over from JavaScript into Ruby now. Previous instances of model code similar to this included the whole wrangled setup of generic value object and entity classes.

MyBlog = Maetl::Model::Blog

post = MyBlog::Post.new(load_text_doc(path))
tags = [
  MyBlog::Tag.new(name: "web"),
  MyBlog::Tag.new(name: "design")
]

MyJournal = Maetl::Model::Journal
entry = MyJournal::Entry.new(load_text_doc(path))
entry.link
entry.title
entry.summary
entry.created
entry.published
entry.updated

If Model::Blog and Model::Journal here are part of a thin blog definition layer that implements the generic value type from the graph package, there are emergent possibilities for layering custom wrappers over the graph query API. This would make the graph API much easier to use in specific contexts.

Forwarding Post#tags to an edge traversal might look like the following:

require "date"

module Maetl
  module Model
    ARCHIVE_DATE = DateTime.new(2020)

    # ...
  end

  class Graph
    attr_reader :graph

    def initialize(builder)
      @graph = Mementus::Graph.new(builder)
    end

    def posts
      graph.n("post")
    end

    def archive
      graph.n("post")
           .sort_asc(:published)
           .take_until { |p| p.published > Model::ARCHIVE_DATE }
    end
  end
end

Already got a bunch of code that looks and works like this. The main task ahead is going to be prying all the value objects free and bundling that back into the Mementus API.
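For the Post#tags forwarding idea specifically, here is a minimal sketch of the shape I have in mind. StubGraph and StubEdge are hypothetical stand-ins written for this example, not the Mementus API, and the :tagged edge label is an assumption too. The point is only that the model object holds a reference to the graph and delegates to a traversal instead of keeping its own list.

```ruby
# Hypothetical stand-ins, NOT the Mementus API: just enough structure to
# show a model method delegating to an edge lookup.
StubEdge = Struct.new(:label, :from, :to)

class StubGraph
  def initialize(edges)
    @edges = edges
  end

  # Follow outgoing edges with the given label from a node id.
  def out(from, label)
    @edges.select { |e| e.from == from && e.label == label }.map(&:to)
  end
end

class Post
  attr_reader :id

  def initialize(id, graph)
    @id = id
    @graph = graph
  end

  # Forward to an edge traversal rather than storing tags on the post.
  def tags
    @graph.out(id, :tagged)
  end
end

graph = StubGraph.new([
  StubEdge.new(:tagged, "post-1", "web"),
  StubEdge.new(:tagged, "post-1", "design"),
  StubEdge.new(:tagged, "post-2", "ruby")
])

Post.new("post-1", graph).tags # => ["web", "design"]
```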

Fixing inconsistencies in the APIs isn’t on the critical path for the redesign but is helpful for a whole lot of other (hidden) stuff in the background.

The main thing is that I want to play around with object models and prototype new content ideas in code, so just need to take care of all this annoying plumbing before I’m properly able to do that in a satisfactory way.

Object-oriented fuckery

Neither JavaScript nor Ruby properly supports value objects, so organising data in this way requires library support or an app-specific pattern to copy.

For fine-grained memory graph models that have some crossover with a traditional programming approach using a database, questions come up around mutability and shared-memory deduplication versus allocating multiple objects representing the same ‘thing’. I think it’s a good idea to avoid this as much as possible and use immutable structs for everything, but if that isn’t possible for whatever reason, we’d need a way to separately handle objects with equality based on identity as well as equality based on value.

Those equality semantics could potentially mark out the essential difference between Entity and Value base types. They should have a near-identical attribute API. I think I will port Value over for starters and not worry about Entity until I have a specific need for it on a project.

trig_station1 = Point.new(latitude, longitude, elevation)
trig_station2 = Point.new(latitude, longitude, elevation)

if trig_station1 == trig_station2
  puts "equality based on value"
else
  puts "equality based on identity"
end
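To make the distinction concrete, here is a small runnable version using a plain stdlib Struct as a stand-in for Point (the coordinates are made-up sample values). Struct gives value equality through == while equal? still exposes identity, and freeze approximates the immutable behaviour.

```ruby
# Plain stdlib Struct standing in for the Point value type.
Point = Struct.new(:latitude, :longitude, :elevation)

# Sample coordinates, invented for illustration.
trig_station1 = Point.new(-41.2865, 174.7762, 196)
trig_station2 = Point.new(-41.2865, 174.7762, 196)

# Value equality: Struct#== compares the members of both structs.
trig_station1 == trig_station2       # => true

# Identity equality: equal? is only true for the very same object.
trig_station1.equal?(trig_station2)  # => false

# Equal values hash the same, so they deduplicate as Hash keys.
{ trig_station1 => "a" }.key?(trig_station2) # => true

# Freezing blocks attribute writes after construction.
trig_station1.freeze
begin
  trig_station1.elevation = 200
rescue FrozenError
  # Writes to a frozen struct raise instead of mutating shared state.
end
```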

Lightweight struct API

I’ve introduced two methods of generating struct classes with attribute accessors that match a particular schema.

For basic untyped attribute models that simply require a value object as a placeholder for node or edge data, the Value constructor follows the calling convention of the stdlib Struct in Ruby, with attributes specified as a list of slots.

Point = Value.new(:latitude, :longitude, :elevation)

For models where attribute type checking is required, use the alternative keyword-based constructor to generate a Struct.

Point = Value.new(latitude: String, longitude: String, elevation: Integer)

Both methods of defining a Value will generate a Struct with a patched constructor API that supports keyword arguments and type validation.
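As a rough illustration of how little code this needs, here is a sketch of a Value factory built on the stdlib Struct. This is an assumption about the shape of the implementation rather than the actual library code: the types class method, the error messages, and freezing in the constructor are all choices made for this example.

```ruby
# Hypothetical sketch of a Value factory; not the real library implementation.
module Value
  # Value.new(:a, :b)                 -> untyped slots
  # Value.new(a: Integer, b: String)  -> slots with type validation
  def self.new(*slots, **types)
    names = slots + types.keys
    raise ArgumentError, "at least one attribute is required" if names.empty?

    Struct.new(*names, keyword_init: true) do
      # Keep the schema on the generated class so instances can validate.
      define_singleton_method(:types) { types }

      def initialize(**attrs)
        self.class.types.each do |name, type|
          next if attrs[name].is_a?(type)

          raise TypeError, "#{name} must be a #{type}, got #{attrs[name].inspect}"
        end
        super(**attrs) # Struct rejects unknown keywords with ArgumentError
        freeze         # assumption: values are immutable once built
      end
    end
  end
end

Point = Value.new(:latitude, :longitude, :elevation)
TypedPoint = Value.new(latitude: String, longitude: String, elevation: Integer)

Point.new(latitude: -41.29, longitude: 174.78, elevation: 196)
TypedPoint.new(latitude: "41.29S", longitude: "174.78E", elevation: 196)
```

Passing elevation: "high" to TypedPoint raises a TypeError in this sketch, and accessors and equality come for free from Struct.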

The great thing about essential model plumbing code is how logical unit testing becomes. The objects and attributes are self-contained and exercising their different options translates directly to the setup-expectation-teardown pattern of many testing frameworks (e.g. RSpec here).

specify "value equality" do
  Weight = Value.new(value: Integer, unit: String)
  w1 = Weight.new(value: 50, unit: "kgs")
  w2 = Weight.new(value: 50, unit: "kgs")
  expect(w1 == w2).to be(true)
  expect(w1 === w2).to be(true)
  expect(w1).to eq(w2)
end

This contrivance is possibly an idiosyncratic approach that goes against a lot of people’s tastes. I get the argument that using manually declared ‘PORO’ classes with an attribute builder helper would be clearer and easier to explain, but there are some benefits to leveraging the stdlib Struct here.

Firstly, the code required is very minimal because the struct library itself handles the underlying accessor methods and value equality. Less code leads to less implementation complexity and (hopefully) fewer maintenance issues.

Secondly, in MRI the underlying RVALUE C type is a union that includes a specialised RStruct type, so stdlib structs seem to have special affordances. I’ve generally found them to be faster than alternatives (this may change between Ruby versions though and I haven’t benchmarked this latest version yet so ymmv).

If anything here becomes an annoyance, it’s easy to introduce an alternative Entity model with plain object and attr_reader semantics. How this translates to JavaScript implementation patterns, I’m not entirely sure, but if I procrastinate on it for long enough, specialised Tuple and Record types might land in browsers and I’ll finally be able to do what I want properly.

Now that there’s an initial commit landing this functionality, it’s time to wire up the graph builder API to support a mapping between node and edge labels and defined value types.

Graph builder recap

The default immutable graph provides access to the underlying write methods during the initialization phase when the graph is first constructed. Here, the graph is presumed to be in an intermediate state and once the object is returned, capability to add nodes and edges is no longer available. (Overriding this requires creating a mutable graph—out of scope here, but the write methods should be basically the same.)
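A stripped-down sketch of that construction phase, to show the idea rather than the real internals: ImmutableGraph and its nested Builder are hypothetical names for this example, and only add_node is borrowed from the API described here. The write method lives on a builder that is yielded during initialization, and the collected nodes are frozen before the graph is returned.

```ruby
# Hypothetical stand-in for the immutable graph; not the real internals.
class ImmutableGraph
  # Write methods live on the builder, which is only handed out
  # while the graph is being constructed.
  class Builder
    attr_reader :nodes

    def initialize
      @nodes = {}
    end

    def add_node(id:, label:, props: {})
      @nodes[id] = { label: label, props: props.dup.freeze }.freeze
    end
  end

  def initialize
    builder = Builder.new
    yield builder if block_given?
    # Freezing the shared hash also shuts down any retained builder.
    @nodes = builder.nodes.freeze
  end

  def node(id)
    @nodes.fetch(id)
  end
end

graph = ImmutableGraph.new do |g|
  g.add_node(id: 1, label: :planet, props: { name: "Neptune" })
end

graph.node(1)[:label]         # => :planet
graph.respond_to?(:add_node)  # => false
```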

There are three different ways to write a node with associated metadata: set_node, add_node, and create_node, but we’ll ignore set_node for now.

graph.add_node(id: id, label: label, props: props)

graph.create_node do |node|
  node.id = id
  node.label = label
  node.props = props
end

This translates to JavaScript as:

graph.addNode({id, label, props})

graph.createNode(node => {
  node.id = id
  node.label = label
  node.props = props
})

A quick win would be to assign a value binding instead of a props object/hash here.

graph.create_node do |node|
  node.label = :location
  node.value = Location.new(lat: latitude, lon: longitude)
end

A simple way to hook this in as a schema validation layer would be to provide a way to define a mapping between labels and value objects.

graph.define_node(:location, Location)

Could this be opt-in and called automatically on the first instance of create_node being defined? Or should it be required, which might make it easier to generate an alternative schema validation mapping for binding add_node parameters to labels?

The schema validation could automatically accept either an instance of the value type or a plain hash and ensure that only allowed parameters make it through in the right format.

graph.add_node(label: label, value: { lat: latitude, lon: longitude })
graph.add_node(label: label, value: Location.new(lat: latitude, lon: longitude))
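One way the coercion step could work, sketched under assumptions: NodeSchema and coerce are hypothetical names for this example, and Location is a plain keyword Struct standing in for a real value type. The label looks up its bound type, instances pass straight through, and hashes are splatted into the constructor so unknown keys get rejected.

```ruby
# Stand-in value type; a keyword Struct rejects unknown attributes.
Location = Struct.new(:lat, :lon, keyword_init: true)

# Hypothetical registry mapping labels to value types.
class NodeSchema
  def initialize
    @types = {}
  end

  def define_node(label, type)
    @types[label] = type
  end

  # Return an instance of the bound type, building one from a hash if needed.
  def coerce(label, value)
    type = @types.fetch(label) do
      raise ArgumentError, "no value type defined for #{label.inspect}"
    end

    case value
    when type then value
    when Hash then type.new(**value)
    else
      raise TypeError, "expected #{type} or Hash for #{label.inspect}, got #{value.class}"
    end
  end
end

schema = NodeSchema.new
schema.define_node(:location, Location)

from_hash = schema.coerce(:location, { lat: -41.29, lon: 174.78 })
from_value = schema.coerce(:location, Location.new(lat: -41.29, lon: 174.78))
from_hash == from_value # => true
```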

Sketching all this out highlights the lack of clarity in the API around autogenerated versus user-specified IDs. Incremental or hash ID generation is unrelated to this new feature, but making these other changes to the builder API means maybe it’s a good time to fix IDs as well.
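To pin down what an explicit ID strategy might mean, here is a sketch with two interchangeable generators and a rule that a user-specified ID always wins. IncrementalId, ContentHashId and resolve_id are hypothetical names invented for this example; nothing here reflects how the library currently assigns IDs.

```ruby
require "digest"

# Hypothetical strategy: hand out sequential integers.
class IncrementalId
  def initialize(start = 1)
    @next = start
  end

  def call(_label, _value)
    id = @next
    @next += 1
    id
  end
end

# Hypothetical strategy: derive the ID from the label and attributes,
# so identical content always maps to the same ID.
class ContentHashId
  def call(label, value)
    pairs = value.to_h.sort_by { |key, _| key.to_s }
    Digest::SHA256.hexdigest("#{label}:#{pairs.inspect}")[0, 12]
  end
end

# A user-specified ID takes precedence over the configured strategy.
def resolve_id(strategy, label:, value:, id: nil)
  id || strategy.call(label, value)
end

incremental = IncrementalId.new
resolve_id(incremental, label: :location, value: { lat: 1 })            # => 1
resolve_id(incremental, label: :location, value: { lat: 2 })            # => 2
resolve_id(incremental, label: :location, value: { lat: 3 }, id: "nz")  # => "nz"
```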

Work in progress

Fixing the builder API. Asking more of the user in terms of configuring the graph for specific type mappings and ID strategies. Once it’s all done it shouldn’t need to change much for a while.

Next step

Stepping through each section of the API, there are a few things to push over to the JS side from Ruby, but most of the changes are backporting new naming conventions and improved ways of doing things from JS into Ruby.