Setting up a Neo4j Causal Cluster

Apr 12, 2021

Overview

Neo4j’s Causal Clustering provides three main features:

Safety: Core Servers provide a fault-tolerant platform for transaction processing which will remain available while a simple majority of those Core Servers are functioning.
Scale: Read Replicas provide a massively scalable platform for graph queries that enables very large graph workloads to be executed in a widely distributed topology.
Causal consistency: when invoked, a client application is guaranteed to read at least its own writes.

Together, these features allow the end-user system to remain fully functional, able to both read from and write to the database, in the event of multiple hardware and network failures. They also make reasoning about database interactions straightforward.

Operational view

From an operational point of view, it is useful to view the cluster as being composed of servers with two different roles: Cores and Read Replicas.

[Figure: Causal Cluster Architecture]

The two roles are foundational in any production deployment, but they are managed at different scales and play different parts in the fault tolerance and scalability of the overall cluster.

Core Servers

The main responsibility of Core Servers is to safeguard data. Core Servers do so by replicating all transactions using the Raft protocol. Raft ensures that the data is safely durable before confirming transaction commitment to the end-user application. In practice, this means once a majority of Core Servers in a cluster (N/2+1) have accepted the transaction, it is safe to acknowledge the commit to the end-user application.

The safety requirement has an impact on write latency. Writes are implicitly acknowledged by the fastest majority, but as the number of Core Servers in the cluster grows, so does the size of the majority needed to acknowledge a write.
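To make the latency point concrete, here is a toy model in Python. It is not Neo4j code: it assumes each Core Server acknowledges a write after a fixed latency, counts the leader as one of the acknowledgers, and uses made-up latency figures. The function names are illustrative only.

```python
def majority_size(cluster_size: int) -> int:
    """Smallest number of Core Servers that forms a majority (N/2 + 1, integer division)."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def commit_latency_ms(ack_latencies_ms: list[float]) -> float:
    """Time until the fastest majority has acknowledged a write.

    ack_latencies_ms holds one acknowledgement latency per Core Server,
    including the leader (modelled here as 0 ms for its own local write).
    """
    needed = majority_size(len(ack_latencies_ms))
    # The write is confirmed when the 'needed'-th fastest acknowledgement arrives.
    return sorted(ack_latencies_ms)[needed - 1]


if __name__ == "__main__":
    # Three Cores: the leader plus the faster follower are enough.
    print(commit_latency_ms([0, 4, 30]))
    # Five Cores: the write now waits for the third-fastest acknowledgement.
    print(commit_latency_ms([0, 4, 30, 12, 50]))
```

In this model the slow 30 ms server never delays a commit, but adding two more Cores raises the commit latency from 4 ms to 12 ms because one more acknowledgement is required.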

In practice, this means that there are relatively few machines in a typical Core Server cluster, enough to provide sufficient fault tolerance for the specific deployment. This is calculated with the formula M = 2F + 1 where M is the number of Core Servers required to tolerate F faults.

For example:

In order to tolerate two failed Core Servers, we would need to deploy a cluster of five Cores.
The smallest fault-tolerant cluster, a cluster that can tolerate one fault, must have three Cores.
It is also possible to create a Causal Cluster consisting of only two Cores. However, that cluster will not be fault-tolerant. If one of the two servers fails, the remaining server will become read-only.
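The sizing rule M = 2F + 1 and the three examples above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are my own and do not correspond to any Neo4j API.

```python
def cores_required(faults_to_tolerate: int) -> int:
    """Number of Core Servers needed to tolerate F faults: M = 2F + 1."""
    if faults_to_tolerate < 0:
        raise ValueError("faults_to_tolerate must not be negative")
    return 2 * faults_to_tolerate + 1


def faults_tolerated(core_count: int) -> int:
    """Largest F for which 2F + 1 <= M, i.e. how many Cores may fail."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return (core_count - 1) // 2


if __name__ == "__main__":
    for cores in (2, 3, 4, 5):
        print(f"{cores} Cores tolerate {faults_tolerated(cores)} fault(s)")
```

Note that an even number of Cores buys no extra fault tolerance: four Cores tolerate one fault, the same as three.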

Read Replicas

The main responsibility of Read Replicas is to scale out graph workloads. Read Replicas act like caches for the graph data that the Core Servers safeguard and are fully capable of executing arbitrary (read-only) queries and procedures.

Read Replicas are asynchronously replicated from Core Servers via transaction log shipping. They periodically poll an upstream server for new transactions and have these shipped over. Many Read Replicas can be fed data from a relatively small number of Core Servers, allowing for a large fan-out of the query workload for scale.
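The polling idea can be illustrated with a small in-memory model. This is a deliberately simplified sketch, not how Neo4j implements log shipping: the classes UpstreamServer and ReadReplica, and the use of plain strings as transactions, are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class UpstreamServer:
    """Stand-in for a server that holds an ordered transaction log."""
    log: list[str] = field(default_factory=list)

    def commit(self, tx: str) -> None:
        self.log.append(tx)

    def transactions_after(self, last_tx_id: int) -> list[str]:
        """Return every transaction with an id greater than last_tx_id (ids start at 1)."""
        return self.log[last_tx_id:]


@dataclass
class ReadReplica:
    """Stand-in for a replica that applies transactions in log order."""
    applied: list[str] = field(default_factory=list)

    def poll(self, upstream: UpstreamServer) -> int:
        """Pull and apply transactions not seen yet; return how many were applied."""
        new_txs = upstream.transactions_after(len(self.applied))
        self.applied.extend(new_txs)
        return len(new_txs)


if __name__ == "__main__":
    core = UpstreamServer()
    replica = ReadReplica()
    core.commit("CREATE (:Person {name: 'Ada'})")
    core.commit("CREATE (:Person {name: 'Bob'})")
    print(replica.applied)      # still empty: replication is asynchronous
    print(replica.poll(core))   # number of transactions shipped by this poll
    print(replica.applied == core.log)
```

Between polls the replica can lag behind the upstream server, which is why reads from a replica may be slightly stale unless the client asks for causal consistency.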

Read Replicas should typically be run in relatively large numbers and treated as disposable. Losing a Read Replica does not impact the cluster’s availability, aside from the loss of its fraction of graph query throughput. It does not affect the fault tolerance capabilities of the cluster.

Configure a Core-only cluster

The following configuration settings are important to consider when deploying a new Causal Cluster.

Important settings for a new Causal Cluster

dbms.default_listen_address: The address or network interface this machine uses to listen for incoming messages. Setting this value to 0.0.0.0 makes Neo4j bind to all available network interfaces.

dbms.default_advertised_address: The address that other machines are told to connect to. In the typical case, this should be set to the fully qualified domain name or the IP address of this server.

dbms.mode: The operating mode of a single server instance. For Causal Clustering, there are two possible modes: CORE or READ_REPLICA.

causal_clustering.minimum_core_cluster_size_at_formation: The minimum number of Core machines in the cluster at formation. A cluster will not form without the number of Cores defined by this setting, and it should in general be set to the full, fixed number of Cores.

causal_clustering.minimum_core_cluster_size_at_runtime: The minimum number of Core instances which will exist in the consensus group.

causal_clustering.initial_discovery_members: The network addresses of an initial set of Core cluster members that are available to bootstrap this Core or Read Replica instance. In the default case, the initial discovery members are given as a comma-separated list of address/port pairs, and the default port for the discovery service is :5000. It is good practice to set this parameter to the same value on all Core Servers.
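Since the discovery members value is a comma-separated list of address/port pairs, it is easy to sanity-check before starting a server. The helper below is a hypothetical sketch, not part of Neo4j. Falling back to port 5000 when an entry has no port is an assumption of this example, and IPv6 literals are not handled.

```python
DEFAULT_DISCOVERY_PORT = 5000  # default discovery port mentioned above


def parse_discovery_members(value: str) -> list[tuple[str, int]]:
    """Split 'host:port,host:port,...' into (host, port) pairs."""
    members = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty items such as a trailing comma
        host, separator, port = entry.rpartition(":")
        if not separator:
            # No port written: assume the default discovery port (sketch-only behaviour).
            host, port = entry, str(DEFAULT_DISCOVERY_PORT)
        members.append((host, int(port)))
    return members


if __name__ == "__main__":
    setting = "core1_ip_address:5000,core2_ip_address:5000,core3_ip_address:5000"
    for host, port in parse_discovery_members(setting):
        print(host, port)
```

Comparing the parsed list across all Core Servers is a quick way to confirm that the parameter really has the same value everywhere.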

neo4j.conf on CORE 1:

dbms.default_listen_address=core1_ip_address
dbms.default_advertised_address=core1_ip_address
dbms.mode=CORE
causal_clustering.initial_discovery_members=core1_ip_address:5000,core2_ip_address:5000,core3_ip_address:5000

neo4j.conf on CORE 2:

dbms.default_listen_address=core2_ip_address
dbms.default_advertised_address=core2_ip_address
dbms.mode=CORE
causal_clustering.initial_discovery_members=core1_ip_address:5000,core2_ip_address:5000,core3_ip_address:5000

neo4j.conf on CORE 3:

dbms.default_listen_address=core3_ip_address
dbms.default_advertised_address=core3_ip_address
dbms.mode=CORE
causal_clustering.initial_discovery_members=core1_ip_address:5000,core2_ip_address:5000,core3_ip_address:5000
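The three files differ only in the server's own address, so they can be generated from a single list of Core addresses. The following Python sketch shows one way to do that. The function render_core_conf is hypothetical, and it emits only the four settings used in this article, not a complete neo4j.conf.

```python
def render_core_conf(own_address: str, all_core_addresses: list[str],
                     discovery_port: int = 5000) -> str:
    """Build the clustering lines of neo4j.conf for one Core Server."""
    members = ",".join(f"{address}:{discovery_port}" for address in all_core_addresses)
    lines = [
        f"dbms.default_listen_address={own_address}",
        f"dbms.default_advertised_address={own_address}",
        "dbms.mode=CORE",
        # Same value on every Core, as recommended above.
        f"causal_clustering.initial_discovery_members={members}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cores = ["core1_ip_address", "core2_ip_address", "core3_ip_address"]
    for address in cores:
        print(f"# neo4j.conf on {address}")
        print(render_core_conf(address, cores))
```

Generating the snippets this way avoids copy-paste mistakes such as leaving another server's address in the advertised address setting.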

Now we are ready to start the Neo4j servers. The startup order does not matter.

After the cluster has started, we can connect to any of the instances and run CALL dbms.cluster.overview() to check the status of the cluster. This will show information about each member of the cluster.

We now have a Neo4j Causal Cluster of three instances running.

Seed a cluster

We learned how to create a cluster with empty databases. However, regardless of whether we are just playing around with Neo4j or setting up a production environment, it is likely that we have some existing data that we wish to transfer into our cluster.

This section outlines how to create a Causal Cluster containing data either seeded from an existing online or offline Neo4j database or imported from some other data source using the import tool. The general steps to seed a cluster will follow the same pattern, regardless of which format our data is in:

Create a new Neo4j Core-only cluster.
Seed the cluster.
Start the cluster.

Taking a backup of the existing Neo4j data and restoring it onto the cluster servers:

Steps:

Stop the running database that holds the existing data:
service neo4j stop
Go to its Neo4j home directory, i.e. /usr/share/neo4j/, and dump the database:
bin/neo4j-admin dump --database=<database_name> --to=<destination-path>
Copy the dump file to the other servers in the cluster:
scp <filename> user@server_ip_address:
Once each server has its copy of the dump, stop Neo4j on it:
service neo4j stop
On each of those servers, go to the Neo4j config directory, i.e. /etc/neo4j, and load the dump:
neo4j-admin load --from=<backup-path> --database=<database> --force
This restores the dumped database on each server in the cluster.
Next, unbind each server from its cluster state (do this on every server in the cluster):
neo4j-admin unbind
Now start each server in the cluster using the command:
service neo4j start
Wait until the remote interface is available, checking the status of each server with:
service neo4j status
Once the remote interface of each server is available, open its URL in your browser, check that your data is there, and try ingesting some test data.
This completes seeding your cluster with existing data.
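The steps above can be collected into a small dry-run planner that prints the commands in order for each machine. This is only a sketch: it uses the commands from this article verbatim, while the database name, dump path, user, and host names in the usage section are hypothetical placeholders. It prints the commands rather than executing them.

```python
import shlex


def dump_commands(database: str, dump_path: str) -> list[list[str]]:
    """Commands for the server that holds the existing data (run from the Neo4j home directory)."""
    return [
        ["service", "neo4j", "stop"],
        ["bin/neo4j-admin", "dump", f"--database={database}", f"--to={dump_path}"],
    ]


def copy_command(dump_path: str, user: str, host: str) -> list[str]:
    """Copy the dump file to one cluster server."""
    return ["scp", dump_path, f"{user}@{host}:"]


def restore_commands(database: str, dump_path: str) -> list[list[str]]:
    """Commands to run on each cluster server once the dump file has arrived."""
    return [
        ["service", "neo4j", "stop"],
        ["neo4j-admin", "load", f"--from={dump_path}", f"--database={database}", "--force"],
        ["neo4j-admin", "unbind"],
        ["service", "neo4j", "start"],
        ["service", "neo4j", "status"],
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    database, dump_path, user = "neo4j", "/tmp/neo4j.dump", "user"
    hosts = ["core2_ip_address", "core3_ip_address"]

    print("# On the source server")
    for command in dump_commands(database, dump_path):
        print(shlex.join(command))
    for host in hosts:
        print(shlex.join(copy_command(dump_path, user, host)))
    print("# On each cluster server")
    for command in restore_commands(database, dump_path):
        print(shlex.join(command))
```

To actually run a step, each command list could be passed to subprocess.run(command, check=True) on the appropriate machine, but reviewing the printed plan first is the safer habit because load --force overwrites the target database.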

Happy setting up your Neo4j Causal Cluster!