Legacy Lakesuperior code.

Stefano Cossu ecca1777ea More docs. 7 years ago
data dc50b0a51d Make extract_imr compatible with bdb back end; add RDF types for resource graphs. 7 years ago
doc ecca1777ea More docs. 7 years ago
etc.skeleton 0db6d2b584 Remove deadlock by using async workers in GUnicorn. 7 years ago
lakesuperior 42ab1a3efb Remove pairtrees; only use full LDPC for intermediate paths. 7 years ago
static 4090e51570 SPARQL query UI and API. 7 years ago
tests 610c21b057 Update documentation: README, benchmarks and storage. 7 years ago
util f6cceb8da1 Close store outside of store layout bootstrap() method 7 years ago
.gitignore 2fdc1b902e Initial commit: some boilerplate borrowed from Combine, basic folder structure and documentation. 7 years ago
LICENSE 2fdc1b902e Initial commit: some boilerplate borrowed from Combine, basic folder structure and documentation. 7 years ago
README.md ecca1777ea More docs. 7 years ago
conftest.py 73f0d191e0 Put test_delete_from_ctx on hold (inconsistent results). 7 years ago
fcrepo d49241b86b GUnicorn. 7 years ago
profiler.py 854b6e4296 Update performance doc. 7 years ago
requirements.txt de3e8dbc03 Build minimal requirements list with pipdeptree. 7 years ago
server.py 653a629afc Bare-bones Web UI to browse resources. 7 years ago

README.md

LAKEsuperior

LAKEsuperior is an experimental Fedora Repository implementation.

Guiding Principles

LAKEsuperior aims to be an uncomplicated, efficient Fedora 4 implementation.

Its main goals are:

  • Simplicity of design: LAKEsuperior relies on LMDB, an embedded, high-performance key-value store, for storing metadata and on the filesystem to store binaries.
  • Efficiency: while raw speed is important, LAKEsuperior also aims at being conservative with resources. Its memory and CPU footprint are small. Python C extensions are used where possible to improve performance.
  • Reliability: fully ACID-compliant writes guarantee consistency of data.
  • Ease of management: contents can be queried directly via term search or SPARQL without the aid of external indices. Scripts and interfaces for repository administration and monitoring are shipped with the standard release.
  • Portability: LAKEsuperior aims to maintain a minimal set of dependencies.

Key features

  • Drop-in replacement for Fedora4 (with some caveats: see Delta document); currently being tested with Hyrax 2
  • Term-based search (planned) and SPARQL Query API + UI
  • No performance penalty for storing many resources under the same container; no kudzu pairtree segmentation ¹
  • Constant performance writing to a resource with many children or members; option to omit children in retrieval
  • Migration tools (planned)
  • Python API (planned): Authors of Python clients can use LAKEsuperior as an embedded repository with no HTTP traffic or interim RDF serialization & de-serialization involved.
  • Fits in a pocket: you can carry over 50M triples on an 8 GB memory stick.
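
To make the pairtree point above concrete, here is a minimal sketch of what upstream pairtree splitting looks like and how a client might undo it. The layout (four two-character segments followed by the full identifier) and the helper names `pairtree_path` and `strip_pairtree` are assumptions made for illustration; they are not part of LAKEsuperior or of any client library.

```python
def pairtree_path(uid, pair_len=2, levels=4):
    """Build a segmented path from the leading characters of uid.

    Hypothetical layout: `levels` segments of `pair_len` characters,
    followed by the full identifier.
    """
    candidates = (uid[i:i + pair_len] for i in range(0, pair_len * levels, pair_len))
    # Drop empty segments so short identifiers do not produce "//".
    segments = [segment for segment in candidates if segment]
    return "/".join(segments + [uid])


def strip_pairtree(path):
    """Return the flat identifier, i.e. the last segment of a pairtree path."""
    return path.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    segmented = pairtree_path("a1b2c3d4e5")
    print(segmented)                  # a1/b2/c3/d4/a1b2c3d4e5
    print(strip_pairtree(segmented))  # a1b2c3d4e5
```

Since LAKEsuperior claims no penalty for many resources under one container, a client could address the resource by the flat identifier and skip the intermediate segments entirely.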

Implementation of the official Fedora API specs (Fedora 5.x and beyond) is not foreseen in the short term; however, it would be a natural evolution of this project if it gains support.

Please make sure you read the Delta document for divergences from the official Fedora4 implementation.

Installation

Dependencies

  1. Python 3.5 or greater.
  2. The LMDB database library. It should be included in most Linux distributions' standard package repositories.
  3. A message broker supporting the STOMP protocol. For testing and evaluation purposes, CoilMQ is included in the dependencies and should be automatically installed.

Installation steps

  1. Install dependencies as indicated above
  2. Create a virtualenv in a project folder: virtualenv -p <python 3.5+ exec path> <virtualenv folder>
  3. Initialize the virtualenv: source <path_to_virtualenv>/bin/activate
  4. Clone this repo
  5. cd into repo folder
  6. Install dependencies: pip install -r requirements.txt
  7. Copy the etc.skeleton folder to a separate location
  8. Set the configuration folder location in the environment: export FCREPO_CONFIG_DIR=<your config dir location> (alternatively you can add this line to your virtualenv activate script)
  9. Configure the application
  10. Start your STOMP broker
  11. Run util/bootstrap.py to initialize the binary and graph stores
  12. Run ./fcrepo for a multi-threaded server or flask run for a single-threaded development server
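
Before running the bootstrap and server steps, it can help to confirm that the basics from the list above are in place. The following is a minimal sketch of such a check, assuming only what the steps state: a Python 3.5+ interpreter and an `FCREPO_CONFIG_DIR` variable pointing to the copied configuration folder. The `preflight` function is a hypothetical helper, not something shipped with LAKEsuperior.

```python
import os
import sys
from pathlib import Path

MIN_PYTHON = (3, 5)  # minimum version named in the Dependencies list


def preflight(environ=None, version_info=None):
    """Return a list of problems found; an empty list means the basics look fine.

    Hypothetical helper: `environ` and `version_info` can be passed in
    explicitly so the check is easy to exercise without touching the
    real environment.
    """
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info
    problems = []

    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.5 or greater is required")

    config_dir = environ.get("FCREPO_CONFIG_DIR")
    if not config_dir:
        problems.append("FCREPO_CONFIG_DIR is not set")
    elif not Path(config_dir).is_dir():
        problems.append("FCREPO_CONFIG_DIR does not point to a directory: " + config_dir)

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Run inside the activated virtualenv, the script prints nothing when both conditions hold. It does not check that the STOMP broker is running or that the stores have been bootstrapped.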

Production deployment

If you like fried repositories for lunch, deploy before 11AM.

Status and development

LAKEsuperior is in alpha status. Please see the TODO list for a rudimentary road map and status.

Technical documentation

Storage Implementation

Performance benchmarks

TODO list


¹ However, if your client splits pairtrees upstream, as Hyrax does, the client needs to change to get rid of the path segments.