Adding triples to MDB store scales in exponential time #8
In a simple benchmark that inserts triples into an MDB-backed graph, the time per triple increases with the number of triples already inserted.
The same method using a hash map shows linear growth, and inserting random keys and values into a bare MDB store also grows linearly. The bare-store run additionally keeps memory usage steady, while the MDB-backed graph store's memory usage keeps growing.
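For illustration, here is a minimal sketch of the bare-LMDB control described above: random keys and values inserted directly through the LMDB C API, with per-batch timing. The path, map size, key length, and batch size are arbitrary assumptions, and error checks are omitted for brevity; the actual benchmark is the linked `mdbperf_test_add_triples.c` script, which this does not reproduce.

```c
/* Sketch of the bare-LMDB control: insert random key/value pairs
 * in one write transaction and report wall-clock time per batch.
 * Illustrative only; error checking omitted for brevity. */
#include <lmdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N_KEYS  1000000
#define BATCH   100000
#define KEY_LEN 16

int main(void)
{
    MDB_env *env;
    MDB_txn *txn;
    MDB_dbi dbi;
    MDB_val key, data;
    char kbuf[KEY_LEN];
    struct timespec t0, t1;

    mdb_env_create(&env);
    mdb_env_set_mapsize(env, (size_t)1 << 30);  /* 1 GiB map */
    mdb_env_open(env, "./testdb", 0, 0664);     /* directory must exist */

    mdb_txn_begin(env, NULL, 0, &txn);
    mdb_dbi_open(txn, NULL, 0, &dbi);           /* default (unnamed) DB */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N_KEYS; i++) {
        for (int j = 0; j < KEY_LEN; j++)
            kbuf[j] = (char)(rand() & 0xff);    /* random key bytes */
        key.mv_size = data.mv_size = KEY_LEN;
        key.mv_data = data.mv_data = kbuf;      /* value = key */
        mdb_put(txn, dbi, &key, &data, 0);

        if ((i + 1) % BATCH == 0) {             /* per-batch timing */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double dt = (t1.tv_sec - t0.tv_sec)
                      + (t1.tv_nsec - t0.tv_nsec) / 1e9;
            printf("%zu inserts: %.3f us/insert\n",
                   i + 1, dt / BATCH * 1e6);
            t0 = t1;
        }
    }
    mdb_txn_commit(txn);
    mdb_env_close(env);
    return 0;
}
```

If the reported per-insert time stays roughly flat across batches here but grows in the graph-store benchmark, the degradation lies in the triple-store layer rather than in LMDB itself.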
This becomes more severe when decoding very large TTL graphs: graphs with 9M+ triples take an order of magnitude more time per triple than ones in the 10-100K range.
It is worth investigating the cause of this performance degradation and trying to remedy it.
Script is at https://git.knowledgetx.com/scossu/volksdata/src/branch/master/test/debug/mdbperf_test_add_triples.c
Inserting a small dataset takes the same time whether the MDB store already contains a large dataset or is empty.
A long-running transaction may be the cause, but that doesn't explain why adding raw data to a generic LMDB store in one large transaction does not slow down in the same way.
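One way to isolate the transaction hypothesis would be to rerun the raw-insert control with periodic commits and compare its timing curve against the single-transaction run: if per-insert time grows in the graph store but in neither LMDB variant, transaction length alone is unlikely to be the cause. A sketch under the same assumptions as the control above (batch size and path are arbitrary):

```c
/* Variant of the control above: commit every BATCH inserts instead
 * of holding one long write transaction, to check whether transaction
 * length affects per-insert timing. Error checks omitted for brevity. */
#include <lmdb.h>
#include <stdlib.h>

#define N_KEYS  1000000
#define BATCH   10000
#define KEY_LEN 16

int main(void)
{
    MDB_env *env;
    MDB_txn *txn;
    MDB_dbi dbi;
    MDB_val key, data;
    char kbuf[KEY_LEN];

    mdb_env_create(&env);
    mdb_env_set_mapsize(env, (size_t)1 << 30);
    mdb_env_open(env, "./testdb_batched", 0, 0664);

    mdb_txn_begin(env, NULL, 0, &txn);
    mdb_dbi_open(txn, NULL, 0, &dbi);   /* dbi handle survives commits */

    for (size_t i = 0; i < N_KEYS; i++) {
        for (int j = 0; j < KEY_LEN; j++)
            kbuf[j] = (char)(rand() & 0xff);
        key.mv_size = data.mv_size = KEY_LEN;
        key.mv_data = data.mv_data = kbuf;
        mdb_put(txn, dbi, &key, &data, 0);

        if ((i + 1) % BATCH == 0) {
            /* End the transaction periodically so no single write
             * txn ever spans the whole run. */
            mdb_txn_commit(txn);
            mdb_txn_begin(env, NULL, 0, &txn);
        }
    }
    mdb_txn_commit(txn);
    mdb_env_close(env);
    return 0;
}
```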