Legacy Lakesuperior code.

Stefano Cossu ad9f67b4bf Move bootstrap to admin CLI; add other method stubs. 7 years ago

LAKEsuperior

LAKEsuperior is an experimental Fedora Repository implementation.

Guiding Principles

LAKEsuperior aims at being an uncomplicated, efficient Fedora 4 implementation.

Its main goals are:

  • Simplicity of design: LAKEsuperior relies on LMDB, an embedded, high-performance key-value store, for storing metadata, and on the filesystem for storing binaries.
  • Efficiency: while raw speed is important, LAKEsuperior also aims at being conservative with resources. Its memory and CPU footprint are small. Python C extensions are used where possible to improve performance.
  • Reliability: fully ACID-compliant writes guarantee consistency of data.
  • Ease of management: contents can be queried directly via term search or SPARQL without the aid of external indices. Scripts and interfaces for repository administration and monitoring are shipped with the standard release.
  • Portability: LAKEsuperior aims to maintain a minimal set of dependencies.
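One common way for a store to answer term lookups without an external index is to keep permuted indices over its triples (subject-first, predicate-first, object-first), so that a pattern with any one bound term avoids a full scan. The sketch below illustrates that general technique in memory, assuming plain string terms. The class `TripleIndex` and its methods are hypothetical and do not describe LAKEsuperior's actual LMDB key layout.

```python
from collections import defaultdict


class TripleIndex(object):
    """Toy in-memory triple store with permuted indices (hypothetical).

    spo: subject   -> {(predicate, object)}
    pos: predicate -> {(object, subject)}
    osp: object    -> {(subject, predicate)}
    """

    def __init__(self):
        self.triples = set()
        self.spo = defaultdict(set)
        self.pos = defaultdict(set)
        self.osp = defaultdict(set)

    def add(self, s, p, o):
        """Insert a triple into the full set and into all three indices."""
        self.triples.add((s, p, o))
        self.spo[s].add((p, o))
        self.pos[p].add((o, s))
        self.osp[o].add((s, p))

    def match(self, s=None, p=None, o=None):
        """Yield triples matching a pattern; None acts as a wildcard.

        The index keyed on the first bound term (in s, p, o order) supplies
        the candidates; any other bound terms are then checked as filters.
        """
        if s is not None:
            candidates = ((s, p2, o2) for p2, o2 in self.spo.get(s, ()))
        elif p is not None:
            candidates = ((s2, p, o2) for o2, s2 in self.pos.get(p, ()))
        elif o is not None:
            candidates = ((s2, p2, o) for s2, p2 in self.osp.get(o, ()))
        else:
            candidates = iter(self.triples)  # nothing bound: full scan
        for triple in candidates:
            if (p is None or triple[1] == p) and (o is None or triple[2] == o):
                yield triple


idx = TripleIndex()
idx.add("/res1", "dcterms:title", "A title")
idx.add("/res1", "ldp:contains", "/res1/child")
idx.add("/res2", "dcterms:title", "Another title")
print(sorted(idx.match(p="dcterms:title")))
# [('/res1', 'dcterms:title', 'A title'), ('/res2', 'dcterms:title', 'Another title')]
```

Keeping three indices trades extra write work and storage for lookups that touch only the relevant entries, which is the usual compromise for stores that must serve arbitrary triple patterns.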

Key features

  • Drop-in replacement for Fedora 4 (with some caveats: see the Delta document); currently being tested with Hyrax 2
  • Term-based search (planned) and SPARQL Query API + UI
  • No performance penalty for storing many resources under the same container; no kudzu pairtree segmentation [1]
  • Constant performance writing to a resource with many children or members; option to omit children in retrieval
  • Migration tools (planned)
  • Python API (planned): Authors of Python clients can use LAKEsuperior as an embedded repository with no HTTP traffic or interim RDF serialization & de-serialization involved.
  • Fits in a pocket: you can carry over 50M triples on an 8 GB memory stick.
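As a rough sanity check of the "fits in a pocket" figure, dividing the stick's capacity by the triple count gives the largest average footprint per triple (indices included) that the claim allows, assuming the whole stick holds the store and nothing else. A few lines of Python make the arithmetic explicit; the helper name is illustrative only.

```python
def max_bytes_per_triple(capacity_bytes, n_triples):
    """Largest average footprint per triple that still fits in the capacity."""
    return capacity_bytes / n_triples


N_TRIPLES = 50 * 10**6  # "over 50M triples"

decimal_gb = max_bytes_per_triple(8 * 10**9, N_TRIPLES)  # 8 GB read as 8 * 10^9 bytes
binary_gib = max_bytes_per_triple(8 * 2**30, N_TRIPLES)  # 8 GiB read as 8 * 2^30 bytes

print("decimal: %.1f bytes/triple" % decimal_gb)  # decimal: 160.0 bytes/triple
print("binary:  %.1f bytes/triple" % binary_gib)  # binary:  171.8 bytes/triple
```

In other words, the claim corresponds to an average budget of roughly 160 to 172 bytes per stored triple, depending on how "8 GB" is read.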

Implementation of the official Fedora API specs (Fedora 5.x and beyond) is not foreseen in the short term; however, it would be a natural evolution of this project if it gains support.

Please make sure you read the Delta document for divergences from the official Fedora 4 implementation.

Installation

Dependencies

  1. Python 3.5 or greater.
  2. The LMDB database library. It should be included in most Linux distributions' standard package repositories.
  3. A message broker supporting the STOMP protocol. For testing and evaluation purposes, CoilMQ is included with the dependencies and should be automatically installed.
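STOMP is a simple text-based framing protocol: a command line, `name:value` header lines, a blank line, an optional body, and a terminating NUL byte. The sketch below serializes such a frame following the STOMP 1.2 specification, which may help when inspecting what a broker like CoilMQ receives. The function name `stomp_frame` and the destination `/topic/fcrepo` are hypothetical; the destination LAKEsuperior actually publishes to is set in its configuration. No network connection is opened.

```python
def _escape(text):
    """STOMP 1.2 header escaping; the backslash must be escaped first."""
    return (text.replace("\\", "\\\\")
                .replace("\r", "\\r")
                .replace("\n", "\\n")
                .replace(":", "\\c"))


def stomp_frame(command, headers, body=b""):
    """Serialize a STOMP 1.2 frame to bytes.

    Layout: COMMAND line, "name:value" header lines, a blank line, the body,
    and a terminating NUL byte. `headers` is a list of (name, value) pairs so
    that header order is preserved.
    """
    # The spec exempts CONNECT and CONNECTED frames from header escaping.
    escaping = command not in ("CONNECT", "CONNECTED")
    lines = [command]
    for name, value in headers:
        if escaping:
            name, value = _escape(name), _escape(value)
        lines.append(name + ":" + value)
    # content-length tells the receiver exactly how many body bytes to read.
    lines.append("content-length:" + str(len(body)))
    return ("\n".join(lines) + "\n\n").encode("utf-8") + body + b"\x00"


frame = stomp_frame(
    "SEND",
    [("destination", "/topic/fcrepo"),  # hypothetical destination
     ("content-type", "application/ld+json")],
    b"{}",
)
print(frame)
# b'SEND\ndestination:/topic/fcrepo\ncontent-type:application/ld+json\ncontent-length:2\n\n{}\x00'
```

Sending `content-length` explicitly is optional in STOMP, but it lets a receiver handle bodies that themselves contain NUL bytes.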

Installation steps

  1. Install dependencies as indicated above
  2. Create a virtualenv in a project folder: virtualenv -p <python 3.5+ exec path> <virtualenv folder>
  3. Activate the virtualenv: source <path_to_virtualenv>/bin/activate
  4. Clone this repo
  5. cd into repo folder
  6. Install the Python dependencies: pip install -r requirements.txt
  7. Copy the etc.skeleton folder to a separate location
  8. Set the configuration folder location in the environment: export FCREPO_CONFIG_DIR=<your config dir location> (alternatively you can add this line to your virtualenv activate script)
  9. Configure the application
  10. Start your STOMP broker, e.g.: coilmq &
  11. Run util/bootstrap.py to initialize the binary and graph stores
  12. Run ./fcrepo for a single-threaded server (Bjoern) or ./fcrepo_mt for a multi-threaded development server (Gunicorn).
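Steps 7 and 8 amount to copying a folder and exporting an environment variable. A minimal Python sketch of the same checks follows, assuming only the `etc.skeleton` folder and the `FCREPO_CONFIG_DIR` variable named above; the helper names `setup_config_dir` and `resolve_config_dir` are hypothetical and not part of LAKEsuperior.

```python
import os
import shutil


def setup_config_dir(skeleton_dir, target_dir):
    """Step 7: copy the skeleton config folder, leaving an existing copy alone."""
    if not os.path.isdir(target_dir):
        shutil.copytree(skeleton_dir, target_dir)
    return os.path.abspath(target_dir)


def resolve_config_dir(environ=None):
    """Step 8: read FCREPO_CONFIG_DIR and fail early if it is unusable."""
    environ = os.environ if environ is None else environ
    path = environ.get("FCREPO_CONFIG_DIR")
    if not path:
        raise RuntimeError("FCREPO_CONFIG_DIR is not set")
    if not os.path.isdir(path):
        raise RuntimeError("FCREPO_CONFIG_DIR is not a directory: " + path)
    return os.path.abspath(path)
```

For example, `setup_config_dir("etc.skeleton", "/path/to/my_config")` performs step 7 once. Note that assigning `os.environ["FCREPO_CONFIG_DIR"]` from Python only affects the current process and its children, which is why the shell `export` (or the virtualenv activate script) remains the practical way to set it.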

Production deployment

If you like fried repositories for lunch, deploy before 11AM.

Status and development

LAKEsuperior is in alpha status. Please see the TODO list for a rudimentary road map and status.

Technical documentation

Architecture Overview

Content Model

Storage Implementation

Performance Benchmarks

TODO list


[1] However, if your client splits pairtrees upstream, as Hyrax does, that logic obviously needs to change to get rid of the path segments.
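For clients that do split pairtrees, the cleanup is mostly a path rewrite. The sketch below assumes an ActiveFedora-style scheme, as used by Hyrax-era clients: four two-character prefixes of the identifier followed by the identifier itself (e.g. `ab/cd/ef/12/abcdef123`). Verify the segment length and count against your own client. The helpers `treeify` and `strip_pairtree` are hypothetical.

```python
def treeify(resource_id, segment_len=2, n_segments=4):
    """Build a pairtree-style path: 'abcdef123' -> 'ab/cd/ef/12/abcdef123'."""
    prefixes = [resource_id[i * segment_len:(i + 1) * segment_len]
                for i in range(n_segments)]
    return "/".join(prefixes + [resource_id])


def strip_pairtree(path, segment_len=2, n_segments=4):
    """Remove pairtree segments that only repeat prefixes of the final ID.

    '/prod/ab/cd/ef/12/abcdef123' -> '/prod/abcdef123'. Paths that do not end
    in such a run of segments are returned unchanged (with a leading slash).
    """
    parts = path.strip("/").split("/")
    if len(parts) > n_segments:
        resource_id = parts[-1]
        expected = [resource_id[i * segment_len:(i + 1) * segment_len]
                    for i in range(n_segments)]
        # Only strip when the segments really are prefixes of the final ID.
        if parts[-(n_segments + 1):-1] == expected:
            parts = parts[:-(n_segments + 1)] + [resource_id]
    return "/" + "/".join(parts)


print(strip_pairtree("/prod/ab/cd/ef/12/abcdef123"))  # /prod/abcdef123
```

Checking that the segments match the identifier's prefixes, rather than blindly dropping four path components, keeps ordinary nested containers intact.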