Adding triples to MDB store slows down superlinearly #8

Open
opened 2026-01-29 03:00:57 +00:00 by scossu · 2 comments

Using a simple benchmark that inserts triples into an MDB-backed graph, the time per triple increases with the number of triples already inserted.

The same method using a hashmap shows linear growth in total time. Likewise, inserting random keys and values into a bare MDB store grows linearly (a sketch of such a control benchmark follows the numbers below). That bare-store process also keeps memory usage constant, while the MDB graph store's memory usage keeps growing.

```
$ ./mdbperf_test_add_triples 10000
10000 records entered in 0.051866 s (192804.534763 records/s)

$ ./mdbperf_test_add_triples 100000
100000 records entered in 0.816743 s (122437.535430 records/s)

$ ./mdbperf_test_add_triples 1000000
1000000 records entered in 12.517030 s (79891.156289 records/s)

$ ./mdbperf_test_add_triples 10000000
10000000 records entered in 513.634070 s (19469.113488 records/s)
```
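
For reference, the bare-store control mentioned above can be reproduced directly against the LMDB C API. The following is a minimal sketch under stated assumptions (the `./testdb` path, map size, and key scheme are made up, and error checks are elided), not the project's actual test code:

```
/* bare_mdb_bench.c -- hedged sketch of a bare-LMDB control benchmark:
 * insert n random key/value pairs in a single write transaction.
 * Assumes ./testdb exists; build with: cc bare_mdb_bench.c -llmdb */
#include <lmdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[])
{
    size_t n = argc > 1 ? strtoull(argv[1], NULL, 10) : 100000;
    MDB_env *env;
    MDB_txn *txn;
    MDB_dbi dbi;
    MDB_val key, val;
    char kbuf[32], vbuf[32];
    struct timespec t0, t1;

    mdb_env_create(&env);
    mdb_env_set_mapsize(env, (size_t)1 << 34);  /* 16 GiB map */
    mdb_env_open(env, "./testdb", 0, 0664);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    mdb_txn_begin(env, NULL, 0, &txn);
    mdb_dbi_open(txn, NULL, 0, &dbi);           /* unnamed (main) DB */
    for (size_t i = 0; i < n; i++) {
        key.mv_size = snprintf(kbuf, sizeof(kbuf), "%x", rand());
        key.mv_data = kbuf;
        val.mv_size = snprintf(vbuf, sizeof(vbuf), "%zx", i);
        val.mv_data = vbuf;
        mdb_put(txn, dbi, &key, &val, 0);       /* error checks elided */
    }
    mdb_txn_commit(txn);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double el = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%zu records entered in %f s (%f records/s)\n", n, el, n / el);
    mdb_env_close(env);
    return 0;
}
```

With a bare insert of this kind the records/s rate stays roughly flat as n grows, which is the baseline the graph benchmark above falls short of.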

The slowdown becomes more severe when decoding very large TTL graphs, to the point that large graphs (9M+ triples) take an order of magnitude more time per triple than graphs in the 10-100K range.

There may be room to investigate the cause of this performance degradation and remedy it.

scossu (Owner) commented:
Script is at https://git.knowledgetx.com/scossu/volksdata/src/branch/master/test/debug/mdbperf_test_add_triples.c
scossu (Owner) commented:

Adding a small dataset to an MDB store that already contains a large dataset takes the same time as adding it to an empty store, so the amount of pre-existing data by itself doesn't change the timing of the small insertion.

A long-running transaction may be the cause, but that alone doesn't explain why adding raw data to a generic LMDB store in one large transaction does not slow down the same way. One experiment worth trying is sketched below.
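
To test that hypothesis on the graph store, one could split the load into multiple write transactions and watch whether the per-triple rate flattens. A hedged sketch against the plain LMDB API; put_one() and insert_batched() are hypothetical stand-ins for illustration, not volksdata functions:

```
#include <lmdb.h>
#include <stddef.h>

#define BATCH 100000  /* commit every BATCH insertions (tunable) */

/* put_one() is a hypothetical stand-in for a single triple insertion;
 * here it just writes the loop counter as both key and value. */
static int put_one(MDB_txn *txn, MDB_dbi dbi, size_t i)
{
    MDB_val key = { sizeof(i), &i }, val = { sizeof(i), &i };

    return mdb_put(txn, dbi, &key, &val, 0);
}

/* Insert n records, committing every BATCH records so that no single
 * write txn accumulates an unbounded dirty-page list.
 * Error checks elided for brevity. */
static int insert_batched(MDB_env *env, MDB_dbi dbi, size_t n)
{
    MDB_txn *txn;

    mdb_txn_begin(env, NULL, 0, &txn);
    for (size_t i = 0; i < n; i++) {
        put_one(txn, dbi, i);
        if ((i + 1) % BATCH == 0) {
            /* The DBI handle stays valid across the commit. */
            mdb_txn_commit(txn);
            mdb_txn_begin(env, NULL, 0, &txn);
        }
    }
    return mdb_txn_commit(txn);
}
```

If the rate flattens with batched commits, the long-lived transaction's dirty-page bookkeeping is the likely suspect; if it degrades either way, the cause more likely lies in the graph's own index lookups.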
