# `lsup_rdf`

**This project is work in progress.**

Embedded RDF (and maybe later, generic graph) store and manipulation library.

## Purpose

The goal of this library is to provide efficient and compact handling of RDF data. At least a complete C API and Python bindings are planned.

This library can be thought of as SQLite or BerkeleyDB for graphs. It can be embedded directly in a program and store persistent data without the need to run a server. In addition, `lsup_rdf` can perform in-memory graph operations such as validation, de/serialization, boolean operations, lookup, etc.

Two graph back ends are available: a memory one based on hash maps and a disk-based one based on [LMDB](https://symas.com/lmdb/), an extremely fast and compact embedded key-value store. Graphs can be created independently with either back end within the same program. Triples in the persistent back end are fully indexed and optimized for a balance of lookup speed, data compactness, and write performance (in order of importance).

The code offers an interface to write a custom back end implementation with minimal changes to the core. More documentation on the topic will follow.

This library was initially meant to replace the RDFLib dependency and Cython code in [Lakesuperior](https://notabug.org/scossu/lakesuperior) in an effort to reduce code clutter and speed up RDF handling; it is now a project for an independent RDF library, but unless the contributor base expands, it will remain focused on serving Lakesuperior.

## Development Status

**Alpha.** Considered feature-complete from an MVP standpoint. The API may still change significantly. The code may not compile, or throw a fit when run. The Python API may be behind the C API and not work. Test coverage is not sufficient. Documentation is fairly extensive but needs reformatting.

This code is being integrated in the higher-level `lsup_repo` project and is being improved as issues arise.
The status will move to beta as soon as `lsup_repo` covers a significant range of `lsup_rdf` features.

This is also my first stab at writing a C library (coming from Python) and an unpaid fun project, so don't be surprised if you find some gross stuff.

## Road Map

### In Scope – Short Term

The short-term goal is to support usage in Lakesuperior and a workable set of features as a standalone library:

- ✓ Handling of graphs, triples, terms
- ✓ Memory- and disk-backed (persistent) graph storage
- ✓ Contexts (disk-backed only)
- ⚒ Handling of blank nodes
- ✓ Namespace prefixes
- ✓ Validation of literal and URI terms
- ✓ Validation of RDF triples
- ✓ Fast graph lookup using matching patterns
- ✓ Graph boolean operations
- ✓ Serialization and de-serialization to/from N-Triples and N-Quads
- ✓ Serialization and de-serialization to/from Turtle and TriG
- ✓ Compile-time configuration of max graph size (efficiency vs. capacity)
- ⚒ Python bindings
- ⚒ Basic command line utilities
- ⚒ Store interface for custom back end implementations

### Possibly In Scope – Long Term

- Binary serialization and hashing of graphs
- Binary protocol for synchronizing remote replicas
- True plug-in architecture for 3rd-party store implementations

### Likely Out of Scope

(Unless provided and maintained by external contributors)

- JSON-LD de/serialization
- SPARQL support

## Building

### Requirements

- It is recommended to build and run `lsup_rdf` on a Linux system. No other OS has been tested so far.
- A C compiler. This has only been tested with `gcc` so far.
- [LMDB](https://symas.com/lmdb/) libraries and headers.
- [XXHash](https://github.com/Cyan4973/xxHash) >=0.8 libraries and headers.
- [re2c](https://re2c.org/) to build the RDF language lexers.
- [cinclude2dot](https://www.flourish.org/cinclude2dot) and [Graphviz](https://graphviz.org/) for generating a dependency graph (optional).

### `make` commands

The default `make` command compiles the library.
Enter `make help` to get an overview of the other available commands.

`make install` installs libraries and headers under the prefix set by the environment variable `$PREFIX`. If this is unset, the default `/usr/local` prefix is used.

Options to compile with debug symbols are available.

### Compile-Time defines (`-D[...]`)

`DEBUG`: Set debug mode: memory map is at reduced size, logging is forced to TRACE level, etc.

`LSUP_RDF_STREAM_CHUNK_SIZE`: Size of the RDF decoding buffer, i.e., the maximum size of a chunk of RDF data fed to the parser when decoding an RDF file into a graph. This should be larger than the maximum expected size of a single term in your RDF source. The default value is 8192, which is mildly conservative. If you experience parsing errors on decoding, and they happen to be on a term such as a very long string literal, try recompiling the library with a larger value.

## Embedding

The generated `liblsuprdf.so` and `liblsuprdf.a` libraries can be linked dynamically or statically to your code. Only the `lsup_rdf.h` header, which recursively includes other headers in the `include` directory, needs to be `#include`d in the embedding code.

Environment variables and/or compiler options might have to be set in order to find the dynamic libraries and headers in their install locations.

For compilation and linking examples, refer to `test`, `memtest`, `perftest` and other actions in the current Makefile.

### Environment Variables

`LSUP_MDB_STORE_PATH`: The file path for the persistent store back end. For production use it is strongly recommended to set this to a permanent location on the fastest storage volume available. If unset, the current directory will be used. The directory must exist.

`LSUP_LOGLEVEL`: A number between 0 and 5, corresponding to:

- 0: `TRACE`
- 1: `DEBUG`
- 2: `INFO`
- 3: `WARN`
- 4: `ERROR`
- 5: `FATAL`

If unspecified, it is set to 3.

`LSUP_MDB_MAPSIZE`: Virtual memory map size. It is recommended to leave this alone.
By default, it is set to 1 TB for 64-bit systems and 4 GB for 32-bit systems. The map size by itself does not use up any extra resources.

### C API Documentation

Almost all header files are documented. Run `doxygen` (see [Doxygen](https://www.doxygen.nl/index.html)) to generate HTML documentation in `docs/html`.

### Python API Documentation

*TODO*