Benchmarking Neo4j and TigerGraph

Graph databases are getting a lot of interest and gaining wider adoption across various industries. The most popular graph database today is Neo4j, ranked #1 by DB-Engines.

TigerGraph, which claims to be the world's fastest and most scalable graph platform, recently released a free developer edition and published benchmark results against Amazon Neptune. This benchmark adds to an earlier report comparing the performance of TigerGraph with Neo4j and TitanDB, which can also be found on their website. According to these reports, TigerGraph outperforms the other graph databases by a large margin across all of the benchmark tests. Moreover, TigerGraph shows more efficient storage usage, reducing the original data size rather than expanding it, as is the case for the other databases.

Needless to say, these outstanding benchmark results got me interested in TigerGraph, so I decided to run benchmark tests myself comparing Neo4j and TigerGraph. For a summary of the results, skip to the end; for a detailed description of the benchmark, read the entire post.

This blog post covers the benchmark setup, a description of the tests and their results, and a conclusion.

Benchmark Setup

Hardware

Both Neo4j and TigerGraph tests were performed on EC2 instances with the following characteristics:

| EC2 Instance | vCPU | Memory | Disk Size | IOPS | Volume Type | OS |
|---|---|---|---|---|---|---|
| r4.4xlarge | 16 | 122 GiB | 300 GiB | 900/3000 | General Purpose SSD (GP2) | Ubuntu 14.04 |

Software

To get the best Neo4j performance, I tuned its memory configuration properties by increasing the heap size and the page cache size.

Dataset

I used the Friendster social network dataset provided by Stanford SNAP. The Friendster network consists of users connected to each other by friendship edges. The dataset is 31 GB in size and is formatted as a tab-separated edge list.
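As a rough illustration of this format, a tab-separated edge list can be read line by line in Python. This is only a sketch, not the loader used in the benchmark: the names `read_edges` and `collect_node_ids` are hypothetical, and skipping `#` comment lines is an assumption about the file layout.

```python
import io
from typing import Iterable, Iterator, Set, TextIO, Tuple


def read_edges(lines: TextIO) -> Iterator[Tuple[int, int]]:
    """Yield (source, target) pairs from a tab-separated edge list."""
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no edges.
        if not line or line.startswith("#"):
            continue
        source, target = line.split("\t")
        yield int(source), int(target)


def collect_node_ids(edges: Iterable[Tuple[int, int]]) -> Set[int]:
    """Collect the distinct node ids that appear on either end of an edge."""
    nodes: Set[int] = set()
    for source, target in edges:
        nodes.add(source)
        nodes.add(target)
    return nodes


# Toy stand-in for the real file: three edges between three users.
sample = io.StringIO("# toy data\n1\t2\n1\t3\n2\t3\n")
edges = list(read_edges(sample))
node_ids = collect_node_ids(edges)
print(len(edges), sorted(node_ids))
```

Deriving the set of distinct node ids from the edges is the kind of work a separate node file requires when a loader expects nodes and edges as separate inputs.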

Friendster statistics:

Description of Tests and Results

Loading Time

Here I measure the time it takes to load the dataset into each database. The loading methods used in this benchmark are as follows.

Table 1 - Data loading time
| | Neo4j | TigerGraph |
|---|---|---|
| Load time | 2146.385 s | 3026.33 s |
| Node file preparation time | 1912.664 s | - |
| Total | 4059.049 s | 3026.33 s |

TigerGraph's raw load time is longer than Neo4j's (3026.33 s vs. 2146.385 s), and it comes out ahead only once Neo4j's node file preparation is counted, finishing about 25% sooner overall (3026.33 s vs. 4059.049 s). TigerGraph's real advantage here is that it doesn't require any preprocessing of the data before loading (e.g. node file preparation). Moreover, Neo4j doesn't automatically index data during load, so additional indexing time is needed before the data is ready to use. Neo4j's indexing time is not included in the total loading time here.

Storage Size

Here I compare the storage size of the loaded data against the original dataset size. Neo4j's storage size was measured after the index on node ids was created.

Table 2 - Storage size of loaded data
| Dataset | Original | Neo4j | TigerGraph |
|---|---|---|---|
| Friendster | 31 GB | 62 GB | 29 GB |

This shows that TigerGraph compresses data efficiently during ingestion: the loaded graph takes 29 GB, slightly less than the 31 GB original, while Neo4j's store is twice the original size at 62 GB.

Query Performance

This part of the benchmark captures query execution times for k-step neighborhood queries. Given a start node, the k-step neighborhood query counts the number of nodes within k steps of it, including the start node itself.
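To make the definition concrete, here is a minimal in-memory sketch of a k-step neighborhood count using breadth-first expansion. It is not how either database executes the query; the function names and the toy graph are made up for illustration, and friendship edges are assumed to be undirected.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def k_step_neighborhood_size(adjacency, start, k):
    """Count nodes reachable from `start` in at most k steps, start included."""
    visited = {start}
    frontier = {start}
    for _ in range(k):
        next_frontier = set()
        for node in frontier:
            next_frontier.update(adjacency.get(node, ()))
        # Keep only the nodes seen for the first time in this step.
        frontier = next_frontier - visited
        if not frontier:
            break
        visited |= frontier
    return len(visited)


# Toy path graph 1 - 2 - 3 - 4 with an extra branch 2 - 5.
toy = build_adjacency([(1, 2), (2, 3), (3, 4), (2, 5)])
print([k_step_neighborhood_size(toy, 1, k) for k in (0, 1, 2, 3)])
# prints [1, 2, 4, 5]
```

The count grows with every extra step until the whole connected component is covered, which is why large values of k are so much more expensive on a densely connected social graph.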

The query performance test covers 1-step, 3-step, and 6-step neighborhood queries.

All query performance tests use the same file of 10 randomly selected start vertices, and the average time is measured over the 10 query runs for each test. The query timeout was set to 180 seconds for the 1-step neighborhood and to 9000 seconds (2.5 hours) for the 3-step and 6-step queries; if a query did not complete within the timeout, it was terminated and the computation proceeded to the next query.
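The measurement procedure can be sketched roughly as follows. This is not the actual benchmark harness: `run_query` is a hypothetical callable standing in for a database call, and it is assumed to raise `TimeoutError` when the database aborts a query at its timeout.

```python
import time
from statistics import mean


def time_queries(run_query, start_nodes):
    """Time run_query(node) for each start node; None marks a timed-out run."""
    timings = {}
    for node in start_nodes:
        started = time.perf_counter()
        try:
            run_query(node)
        except TimeoutError:
            # Assumption: the hypothetical run_query raises TimeoutError
            # when the query is terminated at its timeout.
            timings[node] = None
            continue
        timings[node] = time.perf_counter() - started
    return timings


def summarize(timings):
    """Return (average seconds over completed runs or None, timeout count)."""
    completed = [t for t in timings.values() if t is not None]
    timeouts = len(timings) - len(completed)
    average = mean(completed) if completed else None
    return average, timeouts


def fake_query(node):
    # Stand-in for a real database call; node 3 simulates a timeout.
    if node == 3:
        raise TimeoutError
    return node


timings = time_queries(fake_query, [1, 2, 3])
average, timeouts = summarize(timings)
print(f"completed: {len(timings) - timeouts}, timed out: {timeouts}")
```

Reporting the timeout count next to the average matters because an average taken over only the runs that finished can badly understate the real cost of a query.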

Here is the k-step neighborhood query written in Cypher (Neo4j), where the node type is User and the edge type is Friendship; {k} and {start_node} are placeholders filled in for each run:

MATCH (n1:User)-[:Friendship*0..{k}]-(n2:User)
WHERE n1.id={start_node}
RETURN count(distinct n2);

And the GSQL (TigerGraph) equivalent of the query:

CREATE QUERY kstep(VERTEX<User> start_node, INT k) FOR GRAPH friendster {
    INT i = 0;
    // Result accumulates every vertex reached so far; Start is the current frontier.
    Result = {start_node};
    Start = {start_node};

    // Expand the frontier by one hop, k times.
    WHILE (i < k) DO
        Start = SELECT v
                FROM Start:u -(Friendship:e)->:v;
        Result = Result UNION Start;
        i = i + 1;
    END;

    PRINT Result.size();
}

The neighborhood sizes computed by Neo4j and TigerGraph for each start vertex were compared to make sure the results are consistent and that both systems discover neighborhoods of the same size.

Table 3 - Average Time for k-step Neighborhood
| | Neo4j | TigerGraph |
|---|---|---|
| 1-step | 39.07 ms | 4.827 ms |
| 3-step | 347.391 s | 0.377 s |
| 6-step | N/A (9/10 timeout) | 153.749 s |

TigerGraph greatly outperforms Neo4j on k-step neighborhood queries, finishing all the queries within the set timeout. Neo4j is able to complete the 1-step and 3-step queries within the timeout as well, although far behind TigerGraph. For the 6-step neighborhood, 9 out of 10 Neo4j queries timed out, i.e. could not complete within 9000 seconds. Since only one query completed, it would not be reasonable to report its execution time as an average, as it doesn't reflect the average at all. This query, with start node 5,832,221, completed in 23.221 seconds; in comparison, the TigerGraph query with the same start node completed in just 0.517 seconds. Moreover, start node 5,832,221 has the smallest 6-step neighborhood of all the start nodes, which explains why it was the only query able to complete in under 9000 seconds. Clearly, TigerGraph is faster than Neo4j on the 6-step neighborhood query as well.
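To put the gap in numbers, the speedup factors can be worked out directly from the figures in Table 3 and from the one 6-step query that finished on both systems. The snippet below only does that arithmetic; the variable names are arbitrary.

```python
# Average times from Table 3, converted to seconds.
neo4j = {"1-step": 39.07e-3, "3-step": 347.391}
tigergraph = {"1-step": 4.827e-3, "3-step": 0.377}

# Speedup = Neo4j time divided by TigerGraph time for the same query type.
speedups = {name: neo4j[name] / tigergraph[name] for name in neo4j}

# The single 6-step query (start node 5,832,221) that completed on both.
single_6_step = 23.221 / 0.517

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
print(f"6-step (one start node): {single_6_step:.1f}x")
```

This gives a factor of roughly 8 for 1-step, over 900 for 3-step, and about 45 for the single comparable 6-step query. The 6-step figure comes from one start node only, so it should be read as a data point rather than an average.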

Conclusion

This benchmark clearly shows the advantage of TigerGraph over Neo4j. TigerGraph provides better performance in total loading time (once Neo4j's node file preparation is counted) and in graph traversal queries, while using less than half the storage space. TigerGraph's native query language, GSQL, is powerful enough to express most (if not all) graph traversal and graph analytics queries.

Summary of benchmark results:

I strongly recommend checking out the TigerGraph developer edition and seeing it for yourself.