Adding triples to MDB store scales in exponential time #8
In a simple benchmark that inserts triples into an MDB-backed graph, the time per triple increases with the number of triples already inserted.
The same method using a hash map shows linear growth, and inserting random keys and values into a bare MDB store also grows linearly. The bare-store run additionally keeps memory usage steady, while the MDB-backed graph store's memory usage keeps growing.
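For illustration, here is a minimal sketch of the bare-LMDB control described above: random keys and values inserted directly through the LMDB C API, with per-batch timing. The path, map size, key length, and batch size are arbitrary assumptions, and error checks are omitted for brevity; the actual benchmark is the linked `mdbperf_test_add_triples.c` script, which this does not reproduce.

```c
/* Sketch of the bare-LMDB control: insert random key/value pairs
 * in one write transaction and report wall-clock time per batch.
 * Illustrative only; error checking omitted for brevity. */
#include <lmdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N_KEYS  1000000
#define BATCH   100000
#define KEY_LEN 16

int main(void)
{
    MDB_env *env;
    MDB_txn *txn;
    MDB_dbi dbi;
    MDB_val key, data;
    char kbuf[KEY_LEN];
    struct timespec t0, t1;

    mdb_env_create(&env);
    mdb_env_set_mapsize(env, (size_t)1 << 30);  /* 1 GiB map */
    mdb_env_open(env, "./testdb", 0, 0664);     /* directory must exist */

    mdb_txn_begin(env, NULL, 0, &txn);
    mdb_dbi_open(txn, NULL, 0, &dbi);           /* default (unnamed) DB */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N_KEYS; i++) {
        for (int j = 0; j < KEY_LEN; j++)
            kbuf[j] = (char)(rand() & 0xff);    /* random key bytes */
        key.mv_size = data.mv_size = KEY_LEN;
        key.mv_data = data.mv_data = kbuf;      /* value = key */
        mdb_put(txn, dbi, &key, &data, 0);

        if ((i + 1) % BATCH == 0) {             /* per-batch timing */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double dt = (t1.tv_sec - t0.tv_sec)
                      + (t1.tv_nsec - t0.tv_nsec) / 1e9;
            printf("%zu inserts: %.3f us/insert\n",
                   i + 1, dt / BATCH * 1e6);
            t0 = t1;
        }
    }
    mdb_txn_commit(txn);
    mdb_env_close(env);
    return 0;
}
```

If the reported per-insert time stays roughly flat across batches here but grows in the graph-store benchmark, the degradation lies in the triple-store layer rather than in LMDB itself.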
This becomes more severe when decoding very large TTL graphs: graphs with 9M+ triples take an order of magnitude more time per triple than ones in the 10-100K range.
It is worth investigating the cause of this performance degradation and trying to remedy it.
Script is at https://git.knowledgetx.com/scossu/volksdata/src/branch/master/test/debug/mdbperf_test_add_triples.c
Inserting a small dataset takes the same time whether the MDB store already contains a large dataset or is empty.
A long-running transaction may be the cause, but that doesn't explain why adding raw data to a generic LMDB store in one large transaction does not slow down in the same way.
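One way to isolate the transaction hypothesis would be to rerun the raw-insert control with periodic commits and compare its timing curve against the single-transaction run: if per-insert time grows in the graph store but in neither LMDB variant, transaction length alone is unlikely to be the cause. A sketch under the same assumptions as the control above (batch size and path are arbitrary):

```c
/* Variant of the control above: commit every BATCH inserts instead
 * of holding one long write transaction, to check whether transaction
 * length affects per-insert timing. Error checks omitted for brevity. */
#include <lmdb.h>
#include <stdlib.h>

#define N_KEYS  1000000
#define BATCH   10000
#define KEY_LEN 16

int main(void)
{
    MDB_env *env;
    MDB_txn *txn;
    MDB_dbi dbi;
    MDB_val key, data;
    char kbuf[KEY_LEN];

    mdb_env_create(&env);
    mdb_env_set_mapsize(env, (size_t)1 << 30);
    mdb_env_open(env, "./testdb_batched", 0, 0664);

    mdb_txn_begin(env, NULL, 0, &txn);
    mdb_dbi_open(txn, NULL, 0, &dbi);   /* dbi handle survives commits */

    for (size_t i = 0; i < N_KEYS; i++) {
        for (int j = 0; j < KEY_LEN; j++)
            kbuf[j] = (char)(rand() & 0xff);
        key.mv_size = data.mv_size = KEY_LEN;
        key.mv_data = data.mv_data = kbuf;
        mdb_put(txn, dbi, &key, &data, 0);

        if ((i + 1) % BATCH == 0) {
            /* End the transaction periodically so no single write
             * txn ever spans the whole run. */
            mdb_txn_commit(txn);
            mdb_txn_begin(env, NULL, 0, &txn);
        }
    }
    mdb_txn_commit(txn);
    mdb_env_close(env);
    return 0;
}
```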