# `lsup_rdf`

**This project is work in progress.**

Embedded RDF (and maybe later, generic graph) store and manipulation library.

## Purpose

The goal of this library is to provide extremely efficient and compact
handling of RDF data. At least a C API and Python bindings are planned.

This library can be thought of as SQLite or BerkeleyDB for graphs. It can be
embedded directly in a program and store persistent data without the need to
run a server.

Two graph back ends are available: an in-memory one based on hash maps and a
persistent one based on [LMDB](https://symas.com/lmdb/), an extremely fast and
compact embedded key-value store. Graphs can be created independently with
either back end within the same program. Triples in the persistent back end are
fully indexed and optimized for a balance of lookup speed, data compactness,
and write performance (in order of importance).
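
To give a rough idea of what a hash-based in-memory triple store can look like,
here is a minimal sketch. It is not the actual `lsup_rdf` implementation or API:
every name (`TripleKey`, `TripleSet`, `triple_set_add`, ...) is hypothetical,
terms are assumed to be already reduced to non-zero 64-bit keys, and the table
has a small fixed capacity instead of growing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the lsup_rdf API. */
#define SET_CAPACITY 64 /* Fixed power of two; a real store would grow. */

typedef struct {
    uint64_t s, p, o; /* Term keys; s == 0 marks an empty slot. */
} TripleKey;

typedef struct {
    TripleKey slots[SET_CAPACITY];
    size_t size;
} TripleSet;

/* FNV-1a hash over the bytes of the three 64-bit keys. */
static uint64_t triple_hash(const TripleKey *t)
{
    const uint64_t parts[3] = {t->s, t->p, t->o};
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < 3; i++) {
        for (int shift = 0; shift < 64; shift += 8) {
            h ^= (parts[i] >> shift) & 0xff;
            h *= 0x100000001b3ULL;
        }
    }
    return h;
}

static bool triple_eq(const TripleKey *a, const TripleKey *b)
{
    return a->s == b->s && a->p == b->p && a->o == b->o;
}

/* Insert with linear probing. Returns false if the triple is already
 * present, if its subject key is the reserved value 0, or if the set is
 * full (one slot is always kept empty so that probing terminates). */
bool triple_set_add(TripleSet *set, TripleKey t)
{
    assert(set != NULL);
    if (t.s == 0 || set->size >= SET_CAPACITY - 1) return false;
    size_t i = (size_t)(triple_hash(&t) & (SET_CAPACITY - 1));
    while (set->slots[i].s != 0) {
        if (triple_eq(&set->slots[i], &t)) return false;
        i = (i + 1) & (SET_CAPACITY - 1);
    }
    set->slots[i] = t;
    set->size++;
    return true;
}

/* Look up a triple by probing from its hash position. */
bool triple_set_contains(const TripleSet *set, TripleKey t)
{
    assert(set != NULL);
    if (t.s == 0) return false;
    size_t i = (size_t)(triple_hash(&t) & (SET_CAPACITY - 1));
    while (set->slots[i].s != 0) {
        if (triple_eq(&set->slots[i], &t)) return true;
        i = (i + 1) & (SET_CAPACITY - 1);
    }
    return false;
}
```

A zero-initialized `TripleSet` is an empty set. Adding the same triple twice
leaves the size unchanged, which matches the set semantics of an RDF graph.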

This library was initially meant to replace the RDFLib dependency and Cython
code in [Lakesuperior](https://notabug.org/scossu/lakesuperior) in an effort to
reduce code clutter and speed up RDF handling; it is now a project for an
independent RDF library, but unless the contributor base expands, it will
remain focused on serving Lakesuperior.


## Development Status

**Pre-alpha.** The API is not yet defined and may change radically. The code
may not compile, or may throw a fit when run. At the moment this project is
only intended for curious developers and researchers.

This is also my first stab at writing a C library (coming from Python) and an
unpaid fun project, so don't be surprised if you find some gross stuff.


## Road Map

### In Scope – Short Term

The short-term goal is to support usage in Lakesuperior and a workable set
of features as a standalone library:

- Handling of graphs, triples, terms
- Memory- and disk-backed (persistent) graph storage
- Contexts (disk-backed only)
- Handling of blank nodes
- Validation of literal and URI terms
- Validation of RDF triples
- Fast graph lookup using matching patterns
- Graph boolean operations
- Serialization and de-serialization to/from N-Triples and N-Quads
- Serialization and de-serialization to/from Turtle and TriG
- Compile-time configuration of max graph size (efficiency vs. capacity)
- Python bindings
- Basic command line utilities
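
As an illustration of the pattern-based lookup item in the list above, here is
a minimal sketch of triple pattern semantics, where an unbound position acts as
a wildcard. All names (`Triple`, `match_pattern`) are hypothetical and are not
part of the `lsup_rdf` API, which is not yet defined. Terms are plain strings,
and the sketch uses a simple linear scan; it does not reflect the indexed
lookup planned for the persistent back end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types and names; this is not the lsup_rdf API. */
typedef struct {
    const char *s, *p, *o; /* In a pattern, NULL means "match anything". */
} Triple;

/* A pattern term matches if it is unbound (NULL) or equal to the term. */
static int term_matches(const char *pattern, const char *term)
{
    return pattern == NULL || strcmp(pattern, term) == 0;
}

/* Scan `triples` and store pointers to those matching `pattern` in `out`,
 * up to `out_cap` entries. Returns the total number of matches, which may
 * exceed `out_cap`. */
size_t match_pattern(
        const Triple *triples, size_t n, const Triple *pattern,
        const Triple **out, size_t out_cap)
{
    assert(pattern != NULL);
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (term_matches(pattern->s, triples[i].s)
                && term_matches(pattern->p, triples[i].p)
                && term_matches(pattern->o, triples[i].o)) {
            if (found < out_cap) out[found] = &triples[i];
            found++;
        }
    }
    return found;
}
```

A pattern such as `{"ex:alice", NULL, NULL}` selects every triple with the
subject `ex:alice`, while a pattern with all three positions unbound selects
the whole graph.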

### Possibly In Scope – Long Term

- Binary serialization and hashing of graphs
- Binary protocol for synchronizing remote replicas
- Lua bindings

### Likely Out of Scope

(Unless provided and maintained by external contributors)

- C++ bindings
- JSON-LD de/serialization
- SPARQL queries (We'll see... Will definitely need help)