3 years ago · 45e395a356
--- a/README.md
+++ b/README.md
@@ -0,0 +1,160 @@
 
				+# `lsup_rdf`
			
 
				+
			
 
				+**This project is work in progress.**
			
 
				+
			
 
				+Embedded RDF (and maybe later, generic graph) store and manipulation library.
			
 
				+
			
 
				+## Purpose
			
 
				+
			
 
				+The goal of this library is to provide efficient and compact handling of RDF
			
 
				+data. At least a complete C API and Python bindings are planned.
			
 
				+
			
 
				+This library can be thought of as SQLite or BerkeleyDB for graphs. It can be
			
 
				+embedded directly in a program and store persistent data without the need to
			
 
				+run a server. In addition, `lsup_rdf` can perform in-memory graph
			
 
				+operations such as validation, de/serialization, boolean operations, lookup,
			
 
				+etc.
			
 
				+
			
 
				+Two graph back ends are available: a memory one based on hash maps and a
			
 
				+disk-based one based on [LMDB](https://symas.com/lmdb/), an extremely fast and
			
 
				+compact embedded key-value store. Graphs can be created independently with
			
 
				+either back end within the same program. Triples in the persistent back end are
			
 
				+fully indexed and optimized for a balance of lookup speed, data compactness,
			
 
				+and write performance (in order of importance).
			
 
				+
			
 
				+This library was initially meant to replace the RDFLib dependency and Cython code
			
 
				+in [Lakesuperior](https://notabug.org/scossu/lakesuperior) in an effort to
			
 
				+reduce code clutter and speed up RDF handling; it is now a project for an
			
 
				+independent RDF library, but unless the contributor base expands, it will
			
 
				+remain focused on serving Lakesuperior.
			
 
				+
			
 
				+
			
 
				+## Development Status
			
 
				+
			
 
				+**Alpha.** The API structure is not yet stable and may change radically. The
			
 
				+code may not compile, or may throw a fit when run. Testing is minimal. At the
			
 
				+moment this project is only intended for curious developers and researchers.
			
 
				+
			
 
				+This is also my first stab at writing a C library (coming from Python) and an
			
 
				+unpaid fun project, so don't be surprised if you find some gross stuff.
			
 
				+
			
 
				+
			
 
				+## Road Map
			
 
				+
			
 
				+### In Scope – Short Term
			
 
				+
			
 
				+The short-term goal is to support usage in Lakesuperior and a workable set
			
 
				+of features as a standalone library:
			
 
				+
			
 
				+- Handling of graphs, triples, terms
			
 
				+- Memory- and disk-backed (persistent) graph storage
			
 
				+- Contexts (disk-backed only)
			
 
				+- Handling of blank nodes
			
 
				+- Namespace prefixes
			
 
				+- Validation of literal and URI terms
			
 
				+- Validation of RDF triples
			
 
				+- Fast graph lookup using matching patterns
			
 
				+- Graph boolean operations
			
 
				+- Serialization and de-serialization to/from N-Triples and N-Quads
			
 
				+- Serialization and de-serialization to/from Turtle and TriG
			
 
				+- Compile-time configuration of max graph size (efficiency vs. capacity)
			
 
				+- Python bindings
			
 
				+- Basic command line utilities
			
 
				+
			
 
				+### Possibly In Scope – Long Term
			
 
				+
			
 
				+- Binary serialization and hashing of graphs
			
 
				+- Binary protocol for synchronizing remote replicas
			
 
				+- Backend for massive distributed storage (possibly Ceph)
			
 
				+- Lua bindings
			
 
				+
			
 
				+### Likely Out of Scope
			
 
				+
			
 
				+(Unless provided and maintained by external contributors)
			
 
				+
			
 
				+- C++ bindings
			
 
				+- JSON-LD de/serialization
			
 
				+- SPARQL queries (We'll see... Will definitely need help)
			
 
				+
			
 
				+## Building
			
 
				+
			
 
				+### Requirements
			
 
				+
			
 
				+- It is recommended to build and run LSUP_RDF on a Linux system. No other
			
 
				+  OS has been tested so far.
			
 
				+- A C compiler. This has only been tested with `gcc` so far.
			
 
				+- [re2c](https://re2c.org/) and [Lemon](https://www.sqlite.org/lemon.html) to
			
 
				+  build the RDF language parsers.
			
 
				+- [cinclude2dot](https://www.flourish.org/cinclude2dot) and
			
 
				+  [Graphviz](https://graphviz.org/) for generating the dependency graph (optional).
			
 
				+
			
 
				+
			
 
				+### `make` commands
			
 
				+
			
 
				+The default `make` command compiles the library. Enter `make help` to get an
			
 
				+overview of the other available commands.
			
 
				+
			
 
				+`make install` installs libraries and headers in the directories set by the
			
 
				+environment variable `$PREFIX`. If this is unset, the default `/usr/local`
			
 
				+prefix is used.
			
 
				+
			
 
				+Options to compile with debug symbols are available.
			
 
				+
			
 
				+
			
 
				+### Compile-Time Constants
			
 
				+
			
 
				+`DEBUG`: Set debug mode: memory map is at reduced size, logging is forced to
			
 
				+TRACE level, etc.
			
 
				+
			
 
				+`LSUP_RDF_STREAM_CHUNK_SIZE`: Size of RDF decoding buffer, i.e., maximum size
			
 
				+of a chunk of RDF data fed to the parser when decoding an RDF file into a graph.
			
 
				+This should be larger than the maximum expected size of a single term in your
			
 
				+RDF source. The default value is 8192, which is mildly conservative. If you
			
 
				+experience parsing errors on decoding, and they happen to be on a term such as a
			
 
				+very long string literal, try recompiling the library with a larger value.
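One rough way to judge whether the default chunk size is enough for a line-oriented source such as N-Triples is to measure the longest line, which is an upper bound on the longest term. The sketch below is a hypothetical helper and not part of `lsup_rdf`; the names `longest_line` and `fits_in_chunk` are made up for illustration, and the bound is only meaningful for formats that keep each statement on one line.

```c
#include <stdio.h>
#include <stddef.h>

/* Default value of LSUP_RDF_STREAM_CHUNK_SIZE as stated in the README. */
#define DEFAULT_CHUNK_SIZE 8192

/*
 * Hypothetical helper, not part of lsup_rdf: return the length in bytes of
 * the longest line in a stream, excluding the newline character.
 */
size_t longest_line(FILE *fh)
{
    size_t longest = 0, current = 0;
    int c;

    while ((c = fgetc(fh)) != EOF) {
        if (c == '\n') {
            if (current > longest) longest = current;
            current = 0;
        } else {
            current++;
        }
    }
    /* Account for a last line without a trailing newline. */
    if (current > longest) longest = current;

    return longest;
}

/*
 * Nonzero if every line is strictly shorter than the chunk size, so that
 * no single term can exceed it. Reads the stream from its current position.
 */
int fits_in_chunk(FILE *fh, size_t chunk_size)
{
    return longest_line(fh) < chunk_size;
}
```

If `fits_in_chunk(fh, DEFAULT_CHUNK_SIZE)` returns zero for a file that fails to parse, a larger `LSUP_RDF_STREAM_CHUNK_SIZE` is worth trying.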
			
 
				+
			
 
				+## Embedding
			
 
				+
			
 
				+The generated `liblsuprdf.so` and `liblsuprdf.a` libraries can be linked
			
 
				+dynamically or statically to your code. Only the `lsup_rdf.h` header, which
			
 
				+recursively includes other headers in the `include` directory, needs to be
			
 
				+`#include`d in the embedding code.
			
 
				+
			
 
				+Environment variables and/or compiler options might have to be set in order to
			
 
				+find the dynamic libraries and headers in their install locations.
			
 
				+
			
 
				+For compilation and linking examples, refer to `test`, `memtest`, `perftest`
			
 
				+and other actions in the current Makefile.
			
 
				+
			
 
				+
			
 
				+### Environment Variables
			
 
				+
			
 
				+`LSUP_MDB_STORE_PATH`: The file path for the persistent store back end. For
			
 
				+production use it is strongly recommended to set this to a permanent location
			
 
				+on the fastest storage volume available. If unset, the current directory will
			
 
				+be used. The directory must exist.
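Since the directory must already exist, an embedding program may want to check it before opening a store. The following is a minimal sketch under stated assumptions: `resolve_store_path` and `is_existing_dir` are hypothetical helpers, not `lsup_rdf` functions, an empty value is treated like an unset one (an assumption of this sketch), and `stat` requires a POSIX system.

```c
#include <stdlib.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of lsup_rdf: pick the store location from
 * the raw value of LSUP_MDB_STORE_PATH, falling back to the current
 * directory when the variable is unset. Treating an empty string as unset
 * is an assumption of this sketch.
 */
const char *resolve_store_path(const char *raw)
{
    return (raw != NULL && *raw != '\0') ? raw : ".";
}

/* Nonzero if the path exists and is a directory (POSIX stat). */
int is_existing_dir(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

A caller could pass `getenv("LSUP_MDB_STORE_PATH")` to `resolve_store_path` and stop with a clear message when `is_existing_dir` returns zero.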
			
 
				+
			
 
				+`LSUP_LOGLEVEL`: A number between 0 and 5, corresponding to:
			
 
				+
			
 
				+- 0: `TRACE`
			
 
				+- 1: `DEBUG`
			
 
				+- 2: `INFO`
			
 
				+- 3: `WARN`
			
 
				+- 4: `ERROR`
			
 
				+- 5: `FATAL`
			
 
				+
			
 
				+If unspecified, it is set to 3.
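The mapping above can be sketched in a few lines of C. This is an illustration only and not the library's own code: `parse_loglevel` and `loglevel_name` are hypothetical names, and falling back to the default on invalid input is an assumption, since the README only states what happens when the variable is unspecified.

```c
#include <stdlib.h>

/* Level names in the documented order (0 = TRACE ... 5 = FATAL). */
static const char *const level_names[] = {
    "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"
};

#define DEFAULT_LOGLEVEL 3  /* WARN, the documented default. */

/*
 * Hypothetical helper, not part of lsup_rdf: turn the raw value of
 * LSUP_LOGLEVEL into a level number. NULL, empty, non-numeric, or
 * out-of-range input falls back to the default (an assumption of this
 * sketch; the library may treat bad input differently).
 */
int parse_loglevel(const char *raw)
{
    if (raw == NULL || *raw == '\0') return DEFAULT_LOGLEVEL;

    char *end;
    long level = strtol(raw, &end, 10);
    if (*end != '\0' || level < 0 || level > 5) return DEFAULT_LOGLEVEL;

    return (int) level;
}

/* Name of a level number, or "UNKNOWN" if it is out of range. */
const char *loglevel_name(int level)
{
    if (level < 0 || level > 5) return "UNKNOWN";
    return level_names[level];
}
```

For example, `loglevel_name(parse_loglevel(getenv("LSUP_LOGLEVEL")))` gives the name of the level that would be in effect under these assumptions.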
			
 
				+
			
 
				+`LSUP_MDB_MAPSIZE`: Virtual memory map size. It is recommended to leave this
			
 
				+alone. By default, it is set to 1 TB for 64-bit systems and 4 GB for 32-bit
			
 
				+systems. The map size by itself does not use up any extra resources.
			
 
				+
			
 
				+
			
 
				+### C API Documentation
			
 
				+
			
 
				+*TODO*  Almost all header files are documented. Need a doc generator.
			
 
				+
			
 
				+
			
 
				+### Python API Documentation
			
 
				+
			
 
				+*TODO*
			
--- a/TODO.md
+++ b/TODO.md
@@ -0,0 +1,47 @@
 
				+# Quick TODO list
			
 
				+
			
 
				+*P* = pending; *W* = working on it; *D* = done.
			
 
				+
			
 
				+## Critical for MVP
			
 
				+
			
 
				+- *D* LMDB back end
			
 
				+- *D* Hash table back end
			
 
				+- *D* Namespace manager
			
 
				+- *D* N3 serialization / deserialization
			
 
				+- *D* Environment
			
 
				+- *D* Better error handling
			
 
				+- *D* Logging
			
 
				+- *D* Store graph
			
 
				+- *D* Python bindings
			
 
				+    - *D* Basic module framework
			
 
				+    - *D* term, triple, graph modules
			
 
				+    - *D* Codec integration
			
 
				+    - *D* Graph remove and lookup ops
			
 
				+    - *D* Namespace module
			
 
				+    - *D* Tests (basic)
			
 
				+    - *D* Subclass term types
			
 
				+- *D* Namespaced IRIs
			
 
				+- *D* Relative IRIs
			
 
				+- *D* Flexible store interface
			
 
				+- *D* Transaction control
			
 
				+- *W* Turtle serialization / deserialization
			
 
				+- *P* Full UTF-8 support
			
 
				+- *P* Extended tests
			
 
				+    - *P* C API
			
 
				+    - *P* Python API
			
 
				+
			
 
				+
			
 
				+## Non-critical for MVP
			
 
				+
			
 
				+- Term and triple validation
			
 
				+- Enhanced graph operations
			
 
				+    - Extract unique terms and 2-term tuples
			
 
				+- NQ codec
			
 
				+- TriG codec
			
 
				+- UTF-16 support
			
 
				+
			
 
				+
			
 
				+## Long-term
			
 
				+
			
 
				+- Generic graph (non RDF constrained)
			
 
				+- Lua bindings