Commit a24bbae

Merged last version
2 parents 16a039d + 28bb794 commit a24bbae

36 files changed

Lines changed: 841 additions & 415 deletions

.gitignore

Lines changed: 5 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,8 @@
11
target/*
22
*.class
33
/target
4-
/data
4+
/data
5+
/.project
6+
/.gitignore
7+
/.settings
8+
/.classpath

README.md

Lines changed: 101 additions & 51 deletions
@@ -1,73 +1,123 @@
11
graphdb-benchmarks
22
==================
3-
The project graphdb-benchmarks is a benchmark between three popular graph dataases, Titan, OrientDB and Neo4j.The purpose of this benchmark is to examine the performance of each graph database both in terms of execution time and memory consumption. The benchmark is composed of four workloads, Clustering, Massive Insertion, Single Insertion and Query Workload. Every workload has been designed to simulate common operations in graph database systems
3+
The graphdb-benchmarks project is a benchmark of popular graph databases. Currently the framework supports [Titan](http://thinkaurelius.github.io/titan/), [OrientDB](http://www.orientechnologies.com/orientdb/), [Neo4j](http://neo4j.com/) and [Sparksee](http://www.sparsity-technologies.com/). The purpose of this benchmark is to examine the performance of each graph database in terms of execution time. The benchmark is composed of four workloads: Clustering, Massive Insertion, Single Insertion and Query Workload. Every workload has been designed to simulate common operations in graph database systems.
44

55
- *Clustering Workload (CW)*: CW consists of a well-known community detection algorithm for modularity optimization, the Louvain Method. We adapt the algorithm on top of the benchmarked graph databases and employ cache techniques to take advantage of both graph database capabilities and in-memory execution speed. We measure the time the algorithm needs to converge.
66
- *Massive Insertion Workload (MIW)*: we create the graph database and configure it for massive loading, then we populate it with a particular dataset. We measure the time for the creation of the whole graph.
7-
- *Single Insertion Workload (SIW)*: we create the graph database and load it with a particular dataset. Every object insertion (node or edge) is committed directly and the graph is constructed incrementally. We measure the insertion time per block, which consists of one thousand nodes and the edges that appear during the insertion of these nodes.
7+
- *Single Insertion Workload (SIW)*: we create the graph database and load it with a particular dataset. Every object insertion (node or edge) is committed directly and the graph is constructed incrementally. We measure the insertion time per block, which consists of one thousand edges and the nodes that appear during the insertion of these edges.
88
- *Query Workload (QW)*: we execute three common queries:
99
* FindNeighbours (FN): finds the neighbours of all nodes.
1010
* FindAdjacentNodes (FA): finds the adjacent nodes of all edges.
1111
* FindShortestPath (FS): finds the shortest path between the first node and 100 randomly picked nodes.
1212

1313
Here we measure the execution time of each query.
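
The per-block measurement described for SIW can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `BlockTimer`, `EdgeInserter` and `timeBlocks` are hypothetical names, and recording a trailing partial block is an assumption made here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of graphdb-benchmarks): records the
// wall-clock time of every block of `blockSize` edge insertions.
class BlockTimer {

    /** Callback that inserts (and commits) a single edge into some graph store. */
    interface EdgeInserter {
        void insert(int source, int target);
    }

    /** Returns the elapsed seconds of each block, in insertion order. */
    static List<Double> timeBlocks(int[][] edges, int blockSize, EdgeInserter inserter) {
        List<Double> blockSeconds = new ArrayList<Double>();
        long blockStart = System.nanoTime();
        int insertedInBlock = 0;
        for (int[] edge : edges) {
            inserter.insert(edge[0], edge[1]);
            insertedInBlock++;
            if (insertedInBlock == blockSize) {
                long now = System.nanoTime();
                blockSeconds.add((now - blockStart) / 1e9);
                blockStart = now;
                insertedInBlock = 0;
            }
        }
        // Assumption: a final, smaller block is also reported.
        if (insertedInBlock > 0) {
            blockSeconds.add((System.nanoTime() - blockStart) / 1e9);
        }
        return blockSeconds;
    }
}
```

With a block size of 1000, a dataset of 2,500 edges would yield three measurements under these assumptions: two full blocks and one partial block.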
1414

15-
For our evaluation we use both synthetic and real data. More specifically, we execute MIW, SIW and QW with real data derived from the SNAP dataset collection ([Enron Dataset](http://snap.stanford.edu/data/email-Enron.html), [Amazon dataset](http://snap.stanford.edu/data/amazon0601.html), [Youtube dataset](http://snap.stanford.edu/data/com-Youtube.html) and [LiveJournal dataset](http://snap.stanford.edu/data/com-LiveJournal.html)). On the other hand, with the CW we use synthetic data generated with the [LFR-Benchmark generator](https://sites.google.com/site/andrealancichinetti/files) which produces networks with power-law
16-
degree distribution and implanted communities within the network.
15+
For our evaluation we use both synthetic and real data. More specifically, we execute MIW, SIW and QW with real data derived from the SNAP dataset collection ([Enron Dataset](http://snap.stanford.edu/data/email-Enron.html), [Amazon dataset](http://snap.stanford.edu/data/amazon0601.html), [Youtube dataset](http://snap.stanford.edu/data/com-Youtube.html) and [LiveJournal dataset](http://snap.stanford.edu/data/com-LiveJournal.html)). On the other hand, with the CW we use synthetic data generated with the [LFR-Benchmark generator](https://sites.google.com/site/andrealancichinetti/files), which produces networks with power-law degree distribution and implanted communities within the network. The synthetic data can be downloaded from [here](http://figshare.com/articles/Synthetic_Data_for_graphdb_benchmark/1221760).
16+
17+
For further information about the study, please refer to the [published paper](http://link.springer.com/chapter/10.1007/978-3-319-10518-5_1) on the Springer site and the presentation on [Slideshare](http://www.slideshare.net/sympapadopoulos/adbis2014-presentation).
18+
19+
**Note 1:** The published paper contains the experimental study of Titan, OrientDB and Neo4j. After the publication we included the Sparksee graph database.
20+
21+
**Note 2:** Following the very useful comments and contributions of the OrientDB developers, we updated the benchmark implementations and reran the experiments. We have updated the initial presentation with the new results and uploaded a new version of the paper at the following [link](http://mklab.iti.gr/files/beis_adbis2014_corrected.pdf).
1722

1823
Instructions
1924
------------
20-
To run the project firstly you should download one of the above datasets. You can download any dataset you want, but because there is not any utility class το convert the dataset in the appropriate format (for now), the format of the data must be identical with the tested datasets. From the config/input.properties file you should choose the dataset (aslo the dataset path) and the workload you want to run. Moreover you should specify the path you want to write the results. For the CW the cache values should be specified from the properties file. When the configuration is done open the GraphDatabaseBenchmark class, which is the main class and run the benchmark. For more details about the code, please check the comments in the code.
25+
To run the project, first choose one of the aforementioned datasets. You can select any dataset, but because there is no utility class to convert a dataset into the appropriate format (for now), the format of the data must be identical to that of the tested datasets. The input parameters are configured in the config/input.properties file. Please follow the instructions in this file to select the correct parameters.
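
As a rough illustration of how a properties file of this kind can be read from Java, the sketch below parses two settings with `java.util.Properties`. The key names `dataset` and `workload` and the class `BenchmarkConfig` are hypothetical; the real keys are the ones documented in config/input.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical holder for two benchmark settings; the real project may
// organise its configuration differently.
class BenchmarkConfig {
    final String dataset;
    final String workload;

    BenchmarkConfig(String dataset, String workload) {
        this.dataset = dataset;
        this.workload = workload;
    }

    /** Parses properties-formatted text; the keys used here are assumed, not the project's. */
    static BenchmarkConfig parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // A StringReader should not fail, so treat this as a programming error.
            throw new IllegalStateException(e);
        }
        String dataset = props.getProperty("dataset");
        String workload = props.getProperty("workload");
        if (dataset == null || workload == null) {
            throw new IllegalArgumentException("both 'dataset' and 'workload' must be set");
        }
        return new BenchmarkConfig(dataset.trim(), workload.trim());
    }
}
```

To read an actual file, the same `Properties.load` call accepts a `java.io.FileReader` opened on the file path in place of the `StringReader`.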
2126

2227
Results
2328
-------
24-
Below we list the results of MIW and QW for each dataset. The time is measured in seconds.
25-
26-
| Dataset | Workload | Titan | OrientDB | Neo4j |
27-
| ------- | -------- | ----- | -------- | ----- |
28-
| EN | MIW |9.36 |62.77 |**6.77**|
29-
| AM | MIW |34.00 |97.00 |**10.61**|
30-
| YT | MIW |104.27 |252.15 |**24.69**|
31-
| LJ | MIW | | | |
32-
| |
33-
| EN | QW-FN |2.75 |1.15 |**0.61** |
34-
| AM | QW-FN |8.56 |6.63 |**1.74** |
35-
| YT | QW-FN |29.56 |21.32 |**5.98** |
36-
| LJ | QW-FN | | | |
37-
| |
38-
| EN | QW-FA | | | |
39-
| AM | QW-FA | | | |
40-
| YT | QW-FA | | | |
41-
| LJ | QW-FA | | | |
42-
| |
43-
| EN | QW-FS | | | |
44-
| AM | QW-FS | | | |
45-
| YT | QW-FS |9.21 |15.33 |**0.31** |
46-
| LJ | QW-FS | | | |
47-
48-
Below we list the results of the CW for graphs with 1,000, 5,000 and 10,0000 nodes. Here the time is also measured in seconds.
49-
50-
| Graph-Cache | Titan | OrientDB | Neo4j |
51-
| ----------- | ----- | -------- | ----- |
52-
|Graph1000-5% |2.49 |**0.91** |2.88 |
53-
|Graph1000-10% |1.48 |**0.61** |2.12 |
54-
|Graph1000-15% |1.35 |**0.57** |2.03 |
55-
|Graph1000-20% |1.32 |**0.52** |1.91 |
56-
|Graph1000-25% |1.30 |**0.50** |1.69 |
57-
| |
58-
|Graph5000-5% |16.62 |**5.85** |14.06 |
59-
|Graph5000-10% |15.84 |**5.63** |13.18 |
60-
|Graph5000-15% |15.15 |**4.78** |12.96 |
61-
|Graph5000-20% |14.24 |**4.51** |12.89 |
62-
|Graph5000-25% |14.10 |**4.60** |12.19 |
63-
| |
64-
|Graph10000-5% |49.45 |**18.26** |37.37 |
65-
|Graph10000-10% |46.97 |**17.73** |35.50 |
66-
|Graph10000-15% |47.84 |**17.47** |34.70 |
67-
|Graph10000-20% |44.86 |**17.03** |37.62 |
68-
|Graph10000-25% |44.01 |**16.87** |33.18 |
29+
This section contains the results of each benchmark. All the measurements are in seconds.
30+
31+
32+
#### CW results
33+
Below we list the results of the CW for graphs with 1,000, 5,000, 10,000, 20,000, 30,000, 40,000 and 50,000 nodes.
34+
35+
| Graph-Cache | Titan | OrientDB | Neo4j |
36+
| ----------- | ----- | -------- | ----- |
37+
|Graph1k-5% |2.39 |**0.92** |2.46 |
38+
|Graph1k-10% |1.45 |**0.59** |2.07 |
39+
|Graph1k-15% |1.30 |**0.58** |1.88 |
40+
|Graph1k-20% |1.25 |**0.55** |1.72 |
41+
|Graph1k-25% |1.19 |**0.49** |1.67 |
42+
|Graph1k-30% |1.15 |**0.48** |1.55 |
43+
| |
44+
|Graph5k-5% |16.01 |**5.88** |12.80 |
45+
|Graph5k-10% |15.10 |**5.67** |12.13 |
46+
|Graph5k-15% |14.63 |**4.81** |11.91 |
47+
|Graph5k-20% |14.16 |**4.62** |11.68 |
48+
|Graph5k-25% |13.76 |**4.51** |11.31 |
49+
|Graph5k-30% |13.38 |**4.45** |10.94 |
50+
| |
51+
|Graph10k-5% |46.06 |**18.20** |34.05 |
52+
|Graph10k-10% |44.59 |**17.92** |32.88 |
53+
|Graph10k-15% |43.68 |**17.31** |31.91 |
54+
|Graph10k-20% |42.48 |**16.88** |31.01 |
55+
|Graph10k-25% |41.32 |**16.58** |30.74 |
56+
|Graph10k-30% |39.98 |**16.34** |30.13 |
57+
| |
58+
|Graph20k-5% |140.46 |**54.01** |87.04 |
59+
|Graph20k-10% |138.10 |**52.51** |85.49 |
60+
|Graph20k-15% |137.25 |**52.12** |82.88 |
61+
|Graph20k-20% |133.11 |**51.68** |82.16 |
62+
|Graph20k-25% |122.48 |**50.79** |79.87 |
63+
|Graph20k-30% |120.94 |**50.49** |78.81 |
64+
| |
65+
|Graph30k-5% |310.25 |**96.38** |154.60 |
66+
|Graph30k-10% |301.80 |**94.98** |151.81 |
67+
|Graph30k-15% |299.27 |**94.85** |151.12 |
68+
|Graph30k-20% |296.43 |**94.67** |146.25 |
69+
|Graph30k-25% |294.33 |**92.62** |144.08 |
70+
|Graph30k-30% |288.50 |**90.13** |142.33 |
71+
| |
72+
|Graph40k-5% |533.29 |**201.19**|250.79 |
73+
|Graph40k-10% |505.91 |**199.18**|244.79 |
74+
|Graph40k-15% |490.39 |**194.34**|242.55 |
75+
|Graph40k-20% |478.31 |**183.14**|241.47 |
76+
|Graph40k-25% |467.18 |**177.55**|237.29 |
77+
|Graph40k-30% |418.07 |**174.65**|229.65 |
78+
| |
79+
|Graph50k-5% |642.42 |**240.58**|348.33 |
80+
|Graph50k-10% |624.36 |**238.35**|344.06 |
81+
|Graph50k-15% |611.70 |**237.65**|340.20 |
82+
|Graph50k-20% |610.40 |**230.76**|337.36 |
83+
|Graph50k-25% |596.29 |**230.03**|332.01 |
84+
|Graph50k-30% |580.44 |**226.31**|325.88 |
85+
86+
87+
#### MIW & QW results
88+
Below we list the results of MIW and QW for each dataset.
89+
90+
| Dataset | Workload | Titan | OrientDB | Neo4j |
91+
| ------- | -------- | ----- | -------- | ----- |
92+
| EN | MIW |9.36 |62.77 |**6.77** |
93+
| AM | MIW |34.00 |97.00 |**10.61** |
94+
| YT | MIW |104.27 |252.15 |**24.69** |
95+
| LJ | MIW |663.03 |9416.74 |**349.55**|
96+
| |
97+
| EN | QW-FN |1.87 |**0.56** |0.95 |
98+
| AM | QW-FN |6.47 |3.50 |**1.85** |
99+
| YT | QW-FN |20.71 |9.34 |**4.51** |
100+
| LJ | QW-FN |213.41 |303.09 |**47.07** |
101+
| |
102+
| EN | QW-FA |3.78 |0.71 |**0.16** |
103+
| AM | QW-FA |13.77 |2.30 |**0.36** |
104+
| YT | QW-FA |42.82 |6.15 |**1.46** |
105+
| LJ | QW-FA |460.25 |518.12 |**16.53** |
106+
| |
107+
| EN | QW-FS |1.63 |3.09 |**0.16** |
108+
| AM | QW-FS |0.12 |83.29 |**0.302** |
109+
| YT | QW-FS |24.87 |23.47 |**0.08** |
110+
| LJ | QW-FS |123.50 |86.87 |**18.13** |
111+
112+
113+
#### SIW results
114+
Below we list the results of SIW for each dataset.
115+
![SIW results for the Enron dataset](https://raw.githubusercontent.com/socialsensor/graphdb-benchmarks/master/images/SIWEnron.png)
116+
![SIW results for the Amazon dataset](https://raw.githubusercontent.com/socialsensor/graphdb-benchmarks/master/images/SIWAmazon.png)
117+
![SIW results for the Youtube dataset](https://raw.githubusercontent.com/socialsensor/graphdb-benchmarks/master/images/SIWYoutube.png)
118+
![SIW results for the LiveJournal dataset](https://raw.githubusercontent.com/socialsensor/graphdb-benchmarks/master/images/SIWLivejournal.png)
69119

70120

71121
Contact
72122
-------
73-
For more information or support, please contact: sotbeis@iti.gr or sot.beis@gmail.com
123+
For more information or support, please contact: sotbeis@iti.gr, sot.beis@gmail.com or papadop@iti.gr.

pom.xml

Lines changed: 109 additions & 25 deletions
@@ -1,49 +1,54 @@
11
<?xml version="1.0" encoding="utf-8"?>
2-
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3-
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
2+
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
3+
44
<modelVersion>4.0.0</modelVersion>
55
<groupId>eu.socialsensor</groupId>
66
<artifactId>graphdb-benchmarks</artifactId>
7-
<version>0.0.1-SNAPSHOT</version>
7+
<version>0.1-SNAPSHOT</version>
8+
89
<name>graphdb-benchmarks</name>
910
<url>https://github.com/socialsensor/graphdb-benchmarks</url>
1011
<description>Performance benchmark between popular graph databases.</description>
12+
13+
<parent>
14+
<groupId>org.sonatype.oss</groupId>
15+
<artifactId>oss-parent</artifactId>
16+
<version>7</version>
17+
</parent>
18+
1119
<organization>
1220
<name>SocialSensor</name>
1321
<url>http://www.socialsensor.eu/</url>
1422
</organization>
23+
1524
<developers>
1625
<developer>
1726
<id>sarovios</id>
1827
<name>Sotiris Beis</name>
1928
<email>sotbeis@iti.gr</email>
2029
</developer>
2130
</developers>
31+
2232
<licenses>
2333
<license>
2434
<name>The Apache Software License, Version 2.0</name>
2535
<url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
2636
<distribution>repo</distribution>
2737
</license>
2838
</licenses>
29-
<!-- required to include Tinkerpop SNAPSHOT dependencies -->
30-
<repositories>
31-
<repository>
32-
<id>sonatype-nexus-snapshots</id>
33-
<name>Sonatype Nexus Snapshots</name>
34-
<url>https://oss.sonatype.org/content/repositories/snapshots</url>
35-
<releases>
36-
<enabled>false</enabled>
37-
</releases>
38-
<snapshots>
39-
<enabled>true</enabled>
40-
</snapshots>
41-
</repository>
42-
</repositories>
39+
40+
<scm>
41+
<connection>scm:git:git@github.com:socialsensor/graphdb-benchmarks.git</connection>
42+
<developerConnection>scm:git:git@github.com:socialsensor/graphdb-benchmarks.git</developerConnection>
43+
<url>git@github.com:socialsensor/graphdb-benchmarks.git</url>
44+
<tag>graphdb-benchmarks-0.1</tag>
45+
</scm>
46+
4347
<properties>
4448
<blueprints.version>2.6.0</blueprints.version>
4549
<orientdb.version>2.0-SNAPSHOT</orientdb.version>
4650
</properties>
51+
4752
<dependencies>
4853
<dependency>
4954
<groupId>colt</groupId>
@@ -146,11 +151,11 @@
146151
<!-- <artifactId>neo4j-lucene-index</artifactId> -->
147152
<!-- <version>2.1.0-M01</version> -->
148153
<!-- </dependency> -->
149-
<dependency>
150-
<groupId>org.neo4j</groupId>
151-
<artifactId>neo4j</artifactId>
152-
<version>2.1.3</version>
153-
</dependency>
154+
<!-- <dependency> -->
155+
<!-- <groupId>org.neo4j</groupId> -->
156+
<!-- <artifactId>neo4j</artifactId> -->
157+
<!-- <version>2.1.3</version> -->
158+
<!-- </dependency> -->
154159
<dependency>
155160
<groupId>com.orientechnologies</groupId>
156161
<artifactId>orient-commons</artifactId>
@@ -221,18 +226,97 @@
221226
<artifactId>sparkseejava</artifactId>
222227
<version>5.0.0</version>
223228
</dependency>
229+
230+
231+
<dependency>
232+
<groupId>org.neo4j</groupId>
233+
<artifactId>neo4j</artifactId>
234+
<version>2.1.3</version>
235+
</dependency>
224236
</dependencies>
237+
238+
225239
<build>
226-
<sourceDirectory>src</sourceDirectory>
240+
241+
<pluginManagement>
242+
<plugins>
243+
<plugin>
244+
<groupId>org.apache.maven.plugins</groupId>
245+
<artifactId>maven-release-plugin</artifactId>
246+
<version>2.5</version>
247+
<configuration>
248+
<useReleaseProfile>false</useReleaseProfile>
249+
<releaseProfiles>release</releaseProfiles>
250+
<goals>deploy</goals>
251+
</configuration>
252+
</plugin>
253+
</plugins>
254+
</pluginManagement>
255+
227256
<plugins>
257+
258+
<plugin>
259+
<groupId>org.sonatype.plugins</groupId>
260+
<artifactId>nexus-staging-maven-plugin</artifactId>
261+
<version>1.6.3</version>
262+
<extensions>true</extensions>
263+
<configuration>
264+
<serverId>sonatype-nexus-staging</serverId>
265+
<nexusUrl>https://oss.sonatype.org/</nexusUrl>
266+
<autoReleaseAfterClose>true</autoReleaseAfterClose>
267+
</configuration>
268+
</plugin>
269+
228270
<plugin>
229271
<artifactId>maven-compiler-plugin</artifactId>
230272
<version>3.1</version>
231273
<configuration>
232-
<source>1.7</source>
233-
<target>1.7</target>
274+
<source>1.6</source>
275+
<target>1.6</target>
234276
</configuration>
235277
</plugin>
278+
279+
<plugin>
280+
<groupId>org.apache.maven.plugins</groupId>
281+
<artifactId>maven-source-plugin</artifactId>
282+
<executions>
283+
<execution>
284+
<id>attach-sources</id>
285+
<goals>
286+
<goal>jar-no-fork</goal>
287+
</goals>
288+
</execution>
289+
</executions>
290+
</plugin>
291+
292+
<plugin>
293+
<groupId>org.apache.maven.plugins</groupId>
294+
<artifactId>maven-javadoc-plugin</artifactId>
295+
<executions>
296+
<execution>
297+
<id>attach-javadocs</id>
298+
<goals>
299+
<goal>jar</goal>
300+
</goals>
301+
</execution>
302+
</executions>
303+
</plugin>
304+
305+
<plugin>
306+
<groupId>org.apache.maven.plugins</groupId>
307+
<artifactId>maven-gpg-plugin</artifactId>
308+
<version>1.5</version>
309+
<executions>
310+
<execution>
311+
<id>sign-artifacts</id>
312+
<phase>verify</phase>
313+
<goals>
314+
<goal>sign</goal>
315+
</goals>
316+
</execution>
317+
</executions>
318+
</plugin>
319+
236320
</plugins>
237321
</build>
238322
</project>
File renamed without changes.

src/eu/socialsensor/benchmarks/ClusteringBenchmark.java renamed to src/main/java/eu/socialsensor/benchmarks/ClusteringBenchmark.java

File renamed without changes.

src/eu/socialsensor/benchmarks/FindNeighboursOfAllNodesBenchmark.java renamed to src/main/java/eu/socialsensor/benchmarks/FindNeighboursOfAllNodesBenchmark.java

Lines changed: 7 additions & 2 deletions
@@ -52,8 +52,13 @@ public void startBenchmark() {
5252
try {
5353
permutation.invoke(this, null);
5454
utils.clearGC();
55-
} catch (IllegalAccessException | IllegalArgumentException
56-
| InvocationTargetException e) {
55+
} catch (IllegalAccessException e) {
56+
e.printStackTrace();
57+
}
58+
catch (IllegalArgumentException e) {
59+
e.printStackTrace();
60+
}
61+
catch (InvocationTargetException e) {
5762
e.printStackTrace();
5863
}
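
The hunk above replaces a multi-catch clause, which requires Java 7, with one catch block per exception type. This presumably goes with the compiler level being lowered from 1.7 to 1.6 in pom.xml. A minimal, self-contained sketch of the same pattern is shown below. `ReflectiveRunner`, `sampleBenchmark` and `invokeByName` are hypothetical names used only for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical example class: invokes one of its own public no-arg
// methods by name, using only constructs available in Java 6.
class ReflectiveRunner {
    int runs = 0;

    /** Stand-in for a benchmark step that would be selected reflectively. */
    public void sampleBenchmark() {
        runs++;
    }

    /** Returns true if the named method was found and completed normally. */
    boolean invokeByName(String methodName) {
        try {
            Method method = getClass().getMethod(methodName);
            method.invoke(this);
            return true;
        } catch (NoSuchMethodException e) {
            return false; // no public no-arg method with that name
        } catch (IllegalAccessException e) {
            e.printStackTrace();
        } catch (IllegalArgumentException e) {
            e.printStackTrace();
        } catch (InvocationTargetException e) {
            // The invoked method itself threw; the cause holds the real error.
            e.getCause().printStackTrace();
        }
        return false;
    }
}
```

Separate catch blocks are more verbose than the Java 7 form `catch (A | B | C e)`, but they compile at source level 1.6 and allow each exception type to be handled differently if needed.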
5964
