LAKEsuperior
LAKEsuperior is an experimental Fedora Repository
implementation.
Guiding Principles
LAKEsuperior aims at being an uncomplicated, efficient Fedora 4 implementation.
Its main goals are:
- Simplicity of design: LAKEsuperior relies on LMDB,
an embedded, high-performance key-value store, for storing metadata and on
the filesystem to store binaries.
- Efficiency: while raw speed is important, LAKEsuperior also aims at being
conservative with resources. Its memory and CPU footprint are small. Python C
extensions are used where possible to improve performance.
- Reliability: fully ACID-compliant writes guarantee consistency of data.
- Ease of management: Contents can be queried directly via term search or
SPARQL without the aid of external indices. Scripts and interfaces for
repository administration and monitoring are shipped with the standard release.
- Portability: aims at maintaining a minimal set of dependencies.
Key features
- Drop-in replacement for Fedora 4 (with some caveats: see the Delta document);
currently being tested with Hyrax 2
- Term-based search (planned) and SPARQL Query API + UI
- No performance penalty for storing many resources under the same container; no
kudzu pairtree segmentation (see note 1)
- Constant performance writing to a resource with
many children or members; option to omit children in retrieval
- Migration tools (planned)
- Python API (planned): Authors of Python clients can use LAKEsuperior as an
embedded repository with no HTTP traffic or interim RDF serialization &
de-serialization involved.
- Fits in a pocket: you can carry over 50M triples in an 8 GB memory stick.
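The option to omit children in retrieval is usually requested from LDP servers through a Prefer header. The sketch below builds such a header value with the standard LDP containment preference URI. Whether LAKEsuperior honors exactly this URI is an assumption to check against the Delta document, and prefer_header is a hypothetical helper, not part of the project.

```python
# Hypothetical client-side helper; not part of LAKEsuperior.
# Standard LDP preference URI for containment triples (W3C LDP spec).
LDP_CONTAINMENT = "http://www.w3.org/ns/ldp#PreferContainment"


def prefer_header(omit):
    """Build a Prefer header value asking the server to omit the given URIs."""
    uris = " ".join(omit)  # multiple URIs are space-separated inside the quotes
    return 'return=representation; omit="{}"'.format(uris)


# Headers a client could send with a GET request on a large container.
headers = {"Prefer": prefer_header([LDP_CONTAINMENT])}
```

The resulting dictionary can be passed to any HTTP client library when retrieving a resource with many children.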
Implementation of the official Fedora API specs
(Fedora 5.x and beyond) is not
foreseen in the short term; however, it would be a natural evolution of this
project if it gains support.
Please make sure you read the Delta document for
divergences from the official Fedora 4 implementation.
Installation
Dependencies
- Python 3.5 or greater.
- The LMDB database library. It should be included
in most Linux distributions' standard package repositories.
- A message broker supporting the STOMP protocol. For testing and evaluation
purposes, CoilMQ is included with the
dependencies and should be automatically installed.
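A quick way to confirm two of the prerequisites above is a small check script. This is a minimal sketch, not something shipped with LAKEsuperior: the function names are hypothetical, and port 61613 is assumed as the conventional STOMP default, which your broker configuration may override. It only tests that a TCP connection can be opened, not that the listener actually speaks STOMP.

```python
import socket
import sys

MIN_PYTHON = (3, 5)   # minimum version stated in the dependency list
STOMP_PORT = 61613    # assumption: conventional default STOMP port


def python_ok(version=sys.version_info):
    """Return True if the interpreter meets the minimum version."""
    return tuple(version[:2]) >= MIN_PYTHON


def broker_reachable(host="localhost", port=STOMP_PORT, timeout=2.0):
    """Return True if a TCP connection to the broker can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Python >= 3.5:", python_ok())
    print("STOMP broker reachable:", broker_reachable())
```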
Installation steps
- Install dependencies as indicated above
- Create a virtualenv in a project folder: virtualenv -p <python 3.5+ exec path> <virtualenv folder>
- Initialize the virtualenv: source <path_to_virtualenv>/bin/activate
- Clone this repo
- cd into repo folder
- Install dependencies: pip install -r requirements.txt
- Copy the etc.skeleton folder to a separate location
- Set the configuration folder location in the environment: export FCREPO_CONFIG_DIR=<your config dir location>
(alternatively you can add this line to your virtualenv activate script)
- Configure the application
- Start your STOMP broker, e.g.: coilmq &
- Run util/bootstrap.py to initialize the binary and graph stores
- Run ./fcrepo for a single-threaded server (Bjoern) or ./fcrepo-mt for a
multi-threaded development server (Gunicorn).
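Before running the bootstrap script, it can help to verify that the configuration folder from the steps above is actually visible to the process. The following is a minimal sketch under that assumption. resolve_config_dir is a hypothetical helper and does not reflect how LAKEsuperior itself reads its configuration; only the FCREPO_CONFIG_DIR variable name comes from the steps above.

```python
import os


def resolve_config_dir(env=os.environ):
    """Return the absolute path of the folder named by FCREPO_CONFIG_DIR.

    Raises RuntimeError if the variable is unset or is not a directory.
    """
    path = env.get("FCREPO_CONFIG_DIR")
    if not path:
        raise RuntimeError("FCREPO_CONFIG_DIR is not set")
    path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(path):
        raise RuntimeError("not a directory: {}".format(path))
    return path
```

Calling resolve_config_dir() in the activated virtualenv either returns the folder path or fails with a message that points at the missing step.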
Production deployment
If you like fried repositories for lunch, deploy before 11AM.
Status and development
LAKEsuperior is in alpha status. Please see the TODO list
for a rudimentary road map and status.
Technical documentation
- Architecture Overview
- Content Model
- Storage Implementation
- Performance Benchmarks
- TODO list
1 However, if your client splits pairtrees upstream, as Hyrax does, that
behavior needs to change to get rid of the path segments.
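To illustrate what getting rid of the path segments means on the client side, here is a minimal sketch. It assumes a layout of four two-character segments followed by the full identifier (for example ab/cd/ef/gh/abcdefghi), which is an assumption about the client rather than something this document specifies. Both function names are hypothetical.

```python
def pairtree_path(uid, width=2, levels=4):
    """Prefix an identifier with fixed-width segments taken from its head."""
    if len(uid) < width * levels:
        raise ValueError("identifier too short for this pairtree layout")
    segments = [uid[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(segments + [uid])


def strip_pairtree(path, width=2):
    """Return the bare identifier if the path is a pairtree layout.

    Paths whose leading segments do not repeat the identifier's prefix
    are returned unchanged (minus surrounding slashes).
    """
    clean = path.strip("/")
    *head, uid = clean.split("/")
    offset = 0
    for seg in head:
        if len(seg) != width or uid[offset:offset + width] != seg:
            return clean  # not a pairtree layout; leave as-is
        offset += width
    return uid
```

A client that currently stores resources under segmented paths could apply strip_pairtree to each path before sending requests, so that all resources land directly under the same container.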