C and Python RDF library. ALPHA
lsup_rdf
This project is a work in progress.
Embedded RDF (and maybe later, generic graph) store and manipulation library.
The goal of this library is to provide efficient and compact handling of RDF data. At least a complete C API and Python bindings are planned.
This library can be thought of as SQLite or BerkeleyDB for graphs. It can be embedded directly in a program and store persistent data without the need to run a server. In addition, `lsup_rdf` can perform in-memory graph operations such as validation, de/serialization, boolean operations, lookup, etc.
Two graph back ends are available: a memory one based on hash maps and a disk-based one built on LMDB, an extremely fast and compact embedded key-value store. Graphs can be created independently with either back end within the same program. Triples in the persistent back end are fully indexed and optimized for a balance of lookup speed, data compactness, and write performance (in order of importance).
The code offers an interface to write a custom back end implementation with minimal changes to the core. More documentation on the topic will follow.
This library was initially meant to replace the RDFLib dependency and the Cython code in Lakesuperior, in an effort to reduce code clutter and speed up RDF handling; it is now a project for an independent RDF library, but unless the contributor base expands, it will remain focused on serving Lakesuperior.
Alpha. Considered feature-complete from an MVP standpoint. The API may still change significantly. The code may not compile, or may throw a fit when run. The Python API may lag behind the C API and not work. Test coverage is not sufficient. Documentation is fairly extensive but needs reformatting. This code is being integrated into the higher-level `lsup_repo` project and is being improved as issues arise. The status will move to beta as soon as `lsup_repo` covers a significant range of `lsup_rdf` features.
This is also my first stab at writing a C library (coming from Python) and an unpaid fun project, so don't be surprised if you find some gross stuff.
The short-term goal is to support usage in Lakesuperior and a workable set of features as a standalone library:
(Unless provided and maintained by external contributors)
Build requirements:

- A Linux system. `lsup_rdf` has not been tested on any other OS so far.
- `gcc`. No other compiler has been tested so far.
- `make`.

`make` commands

The default `make` command compiles the library. Enter `make help` to get an overview of the other available commands.
`make install` installs libraries and headers under the directory prefix set by the environment variable `$PREFIX`. If this is unset, the default `/usr/local` prefix is used.
Options to compile with debug symbols are available. The following constants can be set at compile time (`-D[...]`):

- `DEBUG`: Set debug mode: the memory map is at reduced size, logging is forced to TRACE level, etc.
- `LSUP_RDF_STREAM_CHUNK_SIZE`: Size of the RDF decoding buffer, i.e., the maximum size of a chunk of RDF data fed to the parser when decoding an RDF file into a graph. This should be larger than the maximum expected size of a single term in your RDF source. The default value is 8192, which is mildly conservative. If you experience parsing errors on decoding, and they happen to be on a term such as a very long string literal, try recompiling the library with a larger value.
The generated `liblsuprdf.so` and `liblsuprdf.a` libraries can be linked dynamically or statically to your code. Only the `lsup_rdf.h` header, which recursively includes other headers in the `include` directory, needs to be `#include`d in the embedding code.
Environment variables and/or compiler options might have to be set in order to find the dynamic libraries and headers in their install locations.
For compilation and linking examples, refer to the `test`, `memtest`, `perftest` and other targets in the current Makefile.
Environment variables:

- `LSUP_MDB_STORE_PATH`: The path to the directory of the persistent store back end. For production use it is strongly recommended to set this to a permanent location on the fastest storage volume available. If unset, the current directory will be used. The directory must exist.
- `LSUP_LOGLEVEL`: A number between 0 and 5, corresponding to TRACE (0), DEBUG (1), INFO (2), WARN (3), ERROR (4), FATAL (5). If unspecified, it is set to 3 (WARN).
- `LSUP_MDB_MAPSIZE`: Virtual memory map size. It is recommended to leave this alone, unless you are running Valgrind or other tools that limit memory usage. The map size by itself does not preallocate any resources and is safe to increase beyond the physical capacity of the host system. By default, it is set to 1 TB for 64-bit systems and 4 GB for 32-bit systems.
Run `doxygen` (see Doxygen) to generate HTML documentation in `docs/html`.
TODO