Storage Implementation
======================

Lakesuperior stores non-RDF (“binary”) data in the filesystem and RDF
data in an embedded key-value store, `LMDB <https://symas.com/lmdb/>`__.
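
As a minimal sketch of this split (the checksum-derived path layout is
an assumption made for illustration, not necessarily Lakesuperior’s
actual scheme), a binary payload could land on disk like this:

.. code-block:: python

   import hashlib
   from pathlib import Path

   STORE_ROOT = Path('/data/lakesuperior')  # hypothetical location

   def store_binary(data: bytes) -> Path:
       # Content-addressed layout (assumed): the path is derived from
       # the payload's checksum, so identical payloads are stored once.
       digest = hashlib.sha256(data).hexdigest()
       path = STORE_ROOT / 'binaries' / digest[:2] / digest[2:4] / digest
       path.parent.mkdir(parents=True, exist_ok=True)
       path.write_bytes(data)
       return path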

RDF Storage Design
------------------

LMDB is a very fast, very lightweight C library. It is inspired by
BerkeleyDB but introduces significant improvements in terms of
efficiency and stability.

The Lakesuperior RDF store consists of two files: the main data store
and the indices (plus two lock files that are generated at runtime). A
good deal of effort has gone into an indexing strategy that balances
write performance, read performance, and data size, with no compromise
made on consistency.
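
To illustrate the general idea (the database names and key encoding
below are hypothetical, chosen for the example rather than taken from
Lakesuperior’s actual schema), a common way to index triples in a
key-value store is to keep canonical subject–predicate–object records
in one database and permuted key orderings in the index databases:

.. code-block:: python

   import lmdb

   # One LMDB environment, one sub-database per key ordering. 'spo'
   # plays the role of the main data store; 'pos' and 'osp' play the
   # role of indices and can be derived entirely from 'spo'.
   env = lmdb.open('/tmp/triples', max_dbs=4, map_size=2**30)
   spo = env.open_db(b'spo')
   pos = env.open_db(b'pos')
   osp = env.open_db(b'osp')

   def add_triple(s: bytes, p: bytes, o: bytes):
       # All orderings are written in a single transaction, so the
       # indices cannot drift from the main store.
       with env.begin(write=True) as txn:
           txn.put(b'\x00'.join((s, p, o)), b'', db=spo)
           txn.put(b'\x00'.join((p, o, s)), b'', db=pos)
           txn.put(b'\x00'.join((o, s, p)), b'', db=osp)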

The main data store is the one containing the preservation-worthy data.
While the indices are necessary for Lakesuperior to function, they can
be entirely rebuilt from the main data store in case of file corruption
(recovery tools are on the TODO list).
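
Continuing the hypothetical layout above, such a rebuild amounts to a
single pass over the canonical records:

.. code-block:: python

   def rebuild_indices():
       # Empty the derived orderings, then repopulate them from the
       # canonical 'spo' records in one write transaction.
       with env.begin(write=True) as txn:
           txn.drop(pos, delete=False)
           txn.drop(osp, delete=False)
           for key, _ in txn.cursor(db=spo):
               s, p, o = bytes(key).split(b'\x00')
               txn.put(b'\x00'.join((p, o, s)), b'', db=pos)
               txn.put(b'\x00'.join((o, s, p)), b'', db=osp)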

Detailed notes about the various strategies researched can be found
`here <indexing_strategy.md>`__.

Scalability
-----------

Since Lakesuperior is focused on design simplicity, efficiency and
reliability, its RDF store is embedded and not horizontally scalable.

However, Lakesuperior is quite frugal with disk space: about 55 million
triples can be stored in 8 GB of space (mileage may vary depending on
how heterogeneous the triples are). This makes it easier to use
expensive SSD drives for the RDF store in order to improve performance.
A single LMDB environment can reportedly scale up to 128 terabytes.
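
As a back-of-the-envelope figure, 8 GB for 55 million triples works out
to roughly 145 bytes per triple, indices included.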

Maintenance
-----------

LMDB has a very simple configuration, and all options are hardcoded in
Lakesuperior in order to exploit its features. The database
automatically recovers from a crash.

The Lakesuperior RDF store abstraction maintains a registry of unique
terms. These terms are not deleted when a triple is deleted, even if no
other triple is using them, because it would be too expensive to look
up orphaned terms during a delete request. While these terms are
relatively lightweight, it is advisable to run a periodic clean-up job.
Tools will be developed in the near future to facilitate this
maintenance task.
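
A minimal sketch of such a sweep, extending the hypothetical layout
above with a ``terms`` database that maps a term key to its serialized
value (again an illustrative schema, not Lakesuperior’s):

.. code-block:: python

   terms = env.open_db(b'terms')  # hypothetical term registry

   def sweep_orphan_terms():
       # Collect every term key still referenced by a triple, then
       # delete registry entries that no triple uses. A real tool
       # would batch this; one pass is enough for the sketch.
       with env.begin(write=True) as txn:
           used = set()
           for key, _ in txn.cursor(db=spo):
               used.update(bytes(key).split(b'\x00'))
           orphans = [bytes(k) for k in
                      txn.cursor(db=terms).iternext(values=False)
                      if bytes(k) not in used]
           for k in orphans:
               txn.delete(k, db=terms)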

Consistency
-----------

Lakesuperior wraps each LDP operation in a transaction. The indices are
updated synchronously within the same transaction in order to guarantee
consistency. If the system loses power or crashes, only the pending
transaction is lost, and the last successful write will include both
primary and index data.
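
In terms of the earlier sketch, an LDP write maps onto a single
transaction (``replace_resource`` and its arguments are hypothetical
names): either every primary and index record changes, or none does.

.. code-block:: python

   def replace_resource(old_triples, new_triples):
       # One write transaction per LDP operation. Leaving the 'with'
       # block commits; an exception inside it aborts the whole batch.
       with env.begin(write=True) as txn:
           for s, p, o in old_triples:
               txn.delete(b'\x00'.join((s, p, o)), db=spo)
               txn.delete(b'\x00'.join((p, o, s)), db=pos)
               txn.delete(b'\x00'.join((o, s, p)), db=osp)
           for s, p, o in new_triples:
               txn.put(b'\x00'.join((s, p, o)), b'', db=spo)
               txn.put(b'\x00'.join((p, o, s)), b'', db=pos)
               txn.put(b'\x00'.join((o, s, p)), b'', db=osp)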

Concurrency
-----------

LMDB employs
`MVCC <https://en.wikipedia.org/wiki/Multiversion_concurrency_control>`__
to achieve fully ACID transactions. This implies that the whole
database is locked during a write. Multiple writes can be initiated
concurrently, but the performance gain of doing so may be small,
because only one write operation can be performed at a time. Reasonable
efforts have been made to keep write transactions as short as possible
(and more can be done). This also rules out a priori the option of
implementing long-running atomic operations, unless one is willing to
block writes on the application for an indefinite length of time. On
the other hand, read operations never block and are never blocked, so
an application with a high read-to-write ratio may still benefit from
multi-threaded requests.
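
The practical consequence, demonstrated with the sketch above: read
transactions work on a snapshot taken when they begin and proceed
regardless of a concurrent writer, while a second writer simply queues
behind the first.

.. code-block:: python

   import threading

   def count_triples():
       # A read transaction sees a consistent snapshot; it neither
       # blocks the writer nor is blocked by it.
       with env.begin() as txn:
           print('triples in this snapshot:', txn.stat(db=spo)['entries'])

   workers = [threading.Thread(target=count_triples) for _ in range(4)]
   workers.append(threading.Thread(target=add_triple,
                                   args=(b's1', b'p1', b'o1')))
   for w in workers:
       w.start()
   for w in workers:
       w.join()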

Performance
-----------

The `Performance Benchmark Report <performance.md>`__ contains benchmark
results.

Write performance is lower than Modeshape/Fedora4. This may be mostly
due to the indices being written synchronously in a blocking
transaction; the LMDB B+Tree structure, which is optimized for read
performance rather than write performance, also plays a role. Some
optimizations could still be made on the application layer.

Reads are faster than Modeshape/Fedora.

All tests so far have been performed in a single thread.