C and Lua RDF library. ALPHA

`lsup_rdf`

This project is work in progress.

Embedded RDF (and maybe later, generic graph) store and manipulation library.

Purpose

The goal of this library is to provide extremely efficient and compact handling of RDF data. At least a C API and Python bindings are planned.

This library can be thought of as SQLite or BerkeleyDB for graphs. It can be embedded directly in a program and store persistent data without the need to run a server.

Two graph back ends are available: an in-memory one based on hash maps and a disk-based one based on LMDB, an extremely fast and compact embedded key-value store. Graphs can be created independently with either back end within the same program. Triples in the persistent back end are fully indexed and optimized for a balance of lookup speed, data compactness, and write performance (in order of importance).

This library was initially meant to replace the RDFLib dependency and Cython code in Lakesuperior, in an effort to reduce code clutter and speed up RDF handling. It has since become an independent RDF library project, but unless the contributor base expands, it will remain focused on serving Lakesuperior.

Development Status

Pre-alpha. The API is not yet defined and may change radically. The code may not compile, or may throw a fit when run. At the moment this project is only intended for curious developers and researchers.

This is also my first stab at writing a C library (coming from Python) and an unpaid fun project, so don't be surprised if you find some gross stuff.

Road Map

In Scope – Short Term

The short-term goal is to support usage in Lakesuperior and a workable set of features as a standalone library:

Handling of graphs, triples, terms
Memory- and disk-backed (persistent) graph storage
Contexts (disk-backed only)
Handling of blank nodes
Validation of literal and URI terms
Validation of RDF triples
Fast graph lookup using matching patterns
Graph boolean operations
Serialization and de-serialization to/from N-Triples and N-Quads
Serialization and de-serialization to/from Turtle and TriG
Compile-time configuration of max graph size (efficiency vs. capacity)
Python bindings
Basic command line utilities
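Lookup using matching patterns means querying with a triple in which each position is either bound to a specific term or left open. The sketch below illustrates the idea with a linear scan over an in-memory array, where a NULL pattern component matches any term. The type `triple_t` and the function `graph_lookup` are hypothetical and are not the lsup_rdf API; since the persistent back end is described as fully indexed, it would not need a full scan like this one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical triple of term strings (not the library's internal format). */
typedef struct {
    const char *s, *p, *o;
} triple_t;

/* A NULL pattern component matches any term. */
static int term_matches(const char *pattern, const char *term)
{
    return pattern == NULL || strcmp(pattern, term) == 0;
}

/* Scan `n` triples, copy those matching (s, p, o) into `out`,
 * and return the number of matches. `out` must hold up to `n` triples. */
static size_t graph_lookup(
        const triple_t *trp, size_t n,
        const char *s, const char *p, const char *o,
        triple_t *out)
{
    size_t ct = 0;
    for (size_t i = 0; i < n; i++) {
        if (term_matches(s, trp[i].s) && term_matches(p, trp[i].p)
                && term_matches(o, trp[i].o))
            out[ct++] = trp[i];
    }
    return ct;
}
```

For example, the pattern `(NULL, "urn:p:a", "urn:o:x")` returns every triple with that predicate and object, regardless of subject.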

Possibly In Scope – Long Term

Binary serialization and hashing of graphs
Binary protocol for synchronizing remote replicas
Lua bindings

Likely Out of Scope

(Unless provided and maintained by external contributors)

C++ bindings
JSON-LD de/serialization
SPARQL queries (We'll see... Will definitely need help)

Usage

Compile-Time Constants

DEBUG: Set debug mode: memory map is at reduced size, logging is forced to TRACE level, etc.

Environment Variables

LSUP_MDB_STORE_PATH: The file path for the persistent store back end. For production use it is strongly recommended to set this to a permanent location on the fastest storage volume available. If unset, the current directory will be used. The directory must exist.
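A minimal sketch of the store path resolution described above, assuming a POSIX system. The function `resolve_store_path` is hypothetical. It falls back to the current directory when `LSUP_MDB_STORE_PATH` is unset or empty and, as an assumption about how the "directory must exist" requirement might be enforced, returns NULL when the path is not an existing directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: return LSUP_MDB_STORE_PATH if set and non-empty,
 * otherwise the current directory. Returns NULL if the chosen path is
 * not an existing directory (the store directory is not created here). */
static const char *resolve_store_path(void)
{
    const char *path = getenv("LSUP_MDB_STORE_PATH");
    if (path == NULL || *path == '\0') path = ".";

    struct stat st;
    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode)) return NULL;
    return path;
}
```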

LSUP_LOGLEVEL: A number between 0 and 5, corresponding to:

0: TRACE
1: DEBUG
2: INFO
3: WARN
4: ERROR
5: FATAL

If unspecified, it defaults to 3 (WARN).
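The log level lookup can be sketched as follows. `get_log_level` and `LOG_LEVEL_NAMES` are hypothetical names. Treating unparsable or out-of-range values as the default is an assumption, as is implementing the documented DEBUG override (logging forced to TRACE) with a preprocessor check.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Level names in the order documented for LSUP_LOGLEVEL (0 through 5). */
static const char *const LOG_LEVEL_NAMES[] = {
    "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"
};

#define DEFAULT_LOGLEVEL 3  /* WARN, the documented default. */

/* Hypothetical helper: read LSUP_LOGLEVEL, falling back to the default
 * if it is unset, empty, not a number, or outside 0..5. */
static int get_log_level(void)
{
#ifdef DEBUG
    return 0;  /* DEBUG builds force TRACE regardless of the environment. */
#else
    const char *env = getenv("LSUP_LOGLEVEL");
    if (env == NULL || *env == '\0') return DEFAULT_LOGLEVEL;

    char *end;
    long lvl = strtol(env, &end, 10);
    if (*end != '\0' || lvl < 0 || lvl > 5) return DEFAULT_LOGLEVEL;
    return (int)lvl;
#endif
}
```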

LSUP_MDB_MAPSIZE: Virtual memory map size. It is recommended to leave this unset and let the software adjust it to the hardware architecture. By default, it is set to 1 TB on 64-bit systems and 4 GB on 32-bit systems. The map size by itself does not use up any extra resources.
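A sketch of how a default map size could be picked from the platform word size, with an optional override. The function names are hypothetical, and reading `LSUP_MDB_MAPSIZE` as a plain byte count is an assumption, since its unit is not specified here. Exactly 4 GiB cannot be represented in a 32-bit `size_t`, so the sketch uses `SIZE_MAX`, one byte short of it, on 32-bit platforms.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical default chosen from the width of size_t. */
static size_t default_mapsize(void)
{
#if SIZE_MAX > 0xFFFFFFFFu
    return (size_t)1 << 40;  /* 1 TiB on 64-bit systems. */
#else
    return SIZE_MAX;         /* Just under 4 GiB: 4 GiB overflows a 32-bit size_t. */
#endif
}

/* Use LSUP_MDB_MAPSIZE (assumed to be in bytes) if set and valid,
 * otherwise the architecture default. */
static size_t get_mapsize(void)
{
    const char *env = getenv("LSUP_MDB_MAPSIZE");
    if (env == NULL || *env == '\0') return default_mapsize();

    char *end;
    unsigned long long val = strtoull(env, &end, 10);
    if (*end != '\0' || val == 0 || val > SIZE_MAX) return default_mapsize();
    return (size_t)val;
}
```

In an LMDB-based back end, such a value would typically be passed to `mdb_env_set_mapsize()` before the environment is opened.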

LSUP_RDF_STREAM_CHUNK_SIZE: Size of the RDF decoding buffer, i.e., the maximum size of a chunk of RDF data fed to the parser when decoding an RDF file into a graph. This should be larger than the maximum expected size of a single term in your RDF source. The default value is 8192, which is mildly conservative. If you experience parsing errors on decoding, and they happen to be on a term such as a very long string literal, try recompiling the library with a larger value.
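The chunked decoding described above can be sketched as a loop that reads at most `LSUP_RDF_STREAM_CHUNK_SIZE` bytes at a time and hands each chunk to a parser callback. Here the size is treated as a preprocessor macro, consistent with the advice to recompile with a larger value; `feed_stream`, `chunk_cb`, and `count_bytes` are hypothetical. The loop does not stitch together a term split across two chunks, which illustrates one plausible reason a term larger than the buffer can cause parsing errors.

```c
#include <stddef.h>
#include <stdio.h>

#ifndef LSUP_RDF_STREAM_CHUNK_SIZE
#define LSUP_RDF_STREAM_CHUNK_SIZE 8192  /* Documented default. */
#endif

/* Hypothetical parser callback: receives one chunk; nonzero means error. */
typedef int (*chunk_cb)(const char *buf, size_t len, void *ctx);

/* Feed a stream to `cb` in chunks of at most LSUP_RDF_STREAM_CHUNK_SIZE
 * bytes. Returns the number of chunks fed, or -1 on a read or callback error. */
static long feed_stream(FILE *fh, chunk_cb cb, void *ctx)
{
    char buf[LSUP_RDF_STREAM_CHUNK_SIZE];
    long nchunks = 0;
    size_t len;

    while ((len = fread(buf, 1, sizeof(buf), fh)) > 0) {
        if (cb(buf, len, ctx) != 0) return -1;
        nchunks++;
    }
    return ferror(fh) ? -1 : nchunks;
}

/* Example callback: accumulate the total number of bytes received. */
static int count_bytes(const char *buf, size_t len, void *ctx)
{
    (void)buf;
    *(size_t *)ctx += len;
    return 0;
}
```

With the default size, a 20000-byte input is delivered as three chunks of 8192, 8192, and 3616 bytes.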
