As we know, Neo4j uses master-slave replication with eventual consistency, so the typical ACID requirements do not apply. The usual way is to write to the master, which then pushes the changes to the slaves. It is also possible to write to the slaves directly, which is very safe but much slower, since synchronization between the slaves is required.
In general (not very specific to Neo4j) there are a few concerns:
- Cluster management: how to handle machines joining or leaving the cluster, as well as heartbeat messages. This also covers failover (master election, distribution of master status).
- Replication (synchronized ID generation, distributed locks, and so on)
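To make the cluster management concern a bit more concrete, here is a minimal sketch of heartbeat-based membership tracking. It is not how Neo4j or ZooKeeper do it; the class `HeartbeatMonitor`, its methods, and the timeout value are all hypothetical names chosen for illustration, and timestamps are passed in explicitly so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: tracks when each cluster member was last heard from.
class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record a heartbeat; the first heartbeat also acts as "member joined".
    void heartbeat(String memberId, long nowMillis) {
        lastSeen.put(memberId, nowMillis);
    }

    // A member that leaves cleanly is removed right away.
    void leave(String memberId) {
        lastSeen.remove(memberId);
    }

    // Members whose last heartbeat is not older than the timeout.
    Set<String> aliveMembers(long nowMillis) {
        Set<String> alive = new TreeSet<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (nowMillis - entry.getValue() <= timeoutMillis) {
                alive.add(entry.getKey());
            }
        }
        return alive;
    }
}
```

A failover mechanism would sit on top of something like this: once the current master drops out of `aliveMembers`, the remaining members have to agree on a new one, which is exactly the part that needs a consensus protocol.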
Neo4j originally built on Apache ZooKeeper to take care of these concerns. Michael points out that there were problems with using ZooKeeper:
- how to coordinate ZooKeeper with the Neo4j cluster
- unreliable operations
- people did not like the topology required by the ZooKeeper architecture
- ZooKeeper elects a new master too often, which is especially bad in a heavy-load environment
- no dynamic reconfiguration of the ZooKeeper cluster
Neo4j's solution was to write its own implementation of the Multi-Paxos protocol and replace ZooKeeper. Michael especially recommends reading the paper "Paxos Made Simple" by Leslie Lamport. The core consists of state machines implemented using Java enums.
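The idea of state machines built from Java enums can be sketched as follows. This is only an illustration of the general pattern, not Neo4j's actual code: the states, the `ClusterMessage` type, and the transitions are assumptions made up for this example.

```java
// Hypothetical messages a cluster member might react to.
enum ClusterMessage { JOIN, ELECTED, DEMOTED, LEAVE }

// Each enum constant is one state and decides the next state itself.
enum MemberState {
    START {
        @Override
        MemberState handle(ClusterMessage message) {
            // A fresh member joins the cluster as a slave.
            return message == ClusterMessage.JOIN ? SLAVE : this;
        }
    },
    SLAVE {
        @Override
        MemberState handle(ClusterMessage message) {
            if (message == ClusterMessage.ELECTED) return MASTER;
            if (message == ClusterMessage.LEAVE) return START;
            return this; // ignore everything else
        }
    },
    MASTER {
        @Override
        MemberState handle(ClusterMessage message) {
            if (message == ClusterMessage.DEMOTED) return SLAVE;
            if (message == ClusterMessage.LEAVE) return START;
            return this; // ignore everything else
        }
    };

    // Returns the state the member is in after processing the message.
    abstract MemberState handle(ClusterMessage message);
}
```

Usage is a simple loop of `state = state.handle(message)`. A nice property of this style is that all transitions of one state live in one place and the compiler forces every state to implement `handle`.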
I still remember a lot of discussions in the reading club on distributed graph databases. We never actually looked into Apache ZooKeeper and the Paxos protocol, which would certainly be an interesting technique to learn!
The next part contained a lot of detailed discussion that was hard for me to follow, since I am not yet familiar with Paxos.
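For readers who are equally new to Paxos, here is a minimal sketch of the acceptor role from single-decree Paxos as described in "Paxos Made Simple". It is not taken from the talk or from Neo4j; the class names `Acceptor` and `Promise` and the use of `String` values are assumptions for illustration, and Multi-Paxos adds more on top of this (running many such instances with a stable leader).

```java
import java.util.Optional;

// What an acceptor sends back when it promises: the highest-numbered
// proposal it has accepted so far (number -1 and value null if none).
class Promise {
    final long acceptedNumber;
    final String acceptedValue;

    Promise(long acceptedNumber, String acceptedValue) {
        this.acceptedNumber = acceptedNumber;
        this.acceptedValue = acceptedValue;
    }
}

// Hypothetical single-decree Paxos acceptor (no networking, no persistence).
class Acceptor {
    private long promised = -1;        // highest prepare number responded to
    private long acceptedNumber = -1;  // number of the accepted proposal, -1 if none
    private String acceptedValue = null;

    // Phase 1: respond only if n is greater than any prepare seen so far.
    Optional<Promise> prepare(long n) {
        if (n > promised) {
            promised = n;
            return Optional.of(new Promise(acceptedNumber, acceptedValue));
        }
        return Optional.empty();
    }

    // Phase 2: accept unless a prepare with a number greater than n was answered.
    boolean accept(long n, String value) {
        if (n >= promised) {
            acceptedNumber = n;
            acceptedValue = value;
            return true;
        }
        return false;
    }
}
```

A proposer would send `prepare(n)` to a majority of acceptors, pick the value of the highest-numbered accepted proposal among the promises (or its own value if there is none), and then send `accept(n, value)` to them.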
If you are curious about Neo4j HA (and you can bet I am), you can look at Peter’s screencast, which walks you through setting up Neo4j HA:
Setting up a local HA cluster in Neo4j 1.9 from Peter Neubauer on Vimeo.