
Remove SPARQL connector; update documentation.

Stefano Cossu · 7 years ago · commit 00e4c33cc4

+ 34 - 20
README.md

@@ -3,24 +3,39 @@
 LAKEsuperior is an experimental [Fedora Repository](http://fedorarepository.org)
 implementation.
 
-## Basic concepts
+## Guiding Principles
 
 LAKEsuperior aims at being an uncomplicated, efficient Fedora 4 implementation.
 
-Key features:
+Its main goals are:
+
+- *Simplicity of design:* LAKEsuperior relies on [LMDB](https://symas.com/lmdb/),
+an embedded, high-performance key-value store, for storing metadata, and on
+the filesystem for storing binaries.
+- *Efficiency:* while raw speed is important, LAKEsuperior also aims at being
+conservative with resources. Its memory and CPU footprint are small. Python C
+extensions are used where possible to improve performance.
+- *Reliability:* fully ACID-compliant writes guarantee consistency of data.
+- *Ease of management:* Contents can be queried directly via term search or
+SPARQL without the aid of external indices. Scripts and interfaces for
+repository administration and monitoring are shipped with the standard release.
+- *Portability:* LAKEsuperior aims at maintaining a minimal set of dependencies.
+
+## Key features
 
 - Drop-in replacement for Fedora4 (with some caveats: see
   [Delta document](doc/notes/fcrepo4_deltas.md))—currently being tested with
   Hyrax 2
-- Stores metadata in a graph store, binaries in filesystem
-- Simple search and SPARQL Query API via back-end triplestore (planned)
-- No performance issues storing many resources under the same container; no
+- Term-based search (*planned*) and SPARQL Query API + UI
+- No performance penalty for storing many resources under the same container; no
   [kudzu](https://www.nature.org/ourinitiatives/urgentissues/land-conservation/forests/kudzu.xml)
   pairtree segmentation <sup id="a1">[1](#f1)</sup>
-- Mitigates "many member" issue: constant performance writing to a resource with
+- Constant performance writing to a resource with
   many children or members; option to omit children in retrieval
-- Flexible back-end layouts: options to organize information in back end
-- Migration tool (planned)
+- Migration tools (*planned*)
+- Python API (*planned*): Authors of Python clients can use LAKEsuperior as an
+  embedded repository with no HTTP traffic or interim RDF serialization &
+  de-serialization involved.
 
 Implementation of the official [Fedora API specs](https://fedora.info/spec/)
 (Fedora 5.x and beyond) is not
@@ -30,20 +45,16 @@ project if it gains support.
 Please make sure you read the [Delta document](doc/notes/fcrepo4_deltas.md) for
 divergences with the official Fedora4 implementation.
 
-The application code strives to maintain a linear, intuitive code structure to
-foster collaboration. *TODO link to tech overview and approach*
-
 ## Installation
 
 ### Dependencies
 
-1. A triplestore.
-   [Fuseki](https://jena.apache.org/documentation/fuseki2/#download-fuseki)
-   is the benchmark used so far in development. Other implementations are
-   possible as long as they support RDF 1.1 and SPARQL over HTTP
-1. A message broker supporting the STOMP protocol. If you have a separate
-   instance of official Fedora listening to port 61613, that will do the job
-1. Python 3.5 or greater
+1. Python 3.5 or greater.
+1. The [LMDB](https://symas.com/lmdb/) database library. It should be included
+in most Linux distributions' standard package repositories.
+1. A message broker supporting the STOMP protocol. For testing and evaluation
+purposes, CoilMQ is included in the dependencies and should be installed
+automatically.
 
 ### Installation steps
 
@@ -59,7 +70,7 @@ foster collaboration. *TODO link to tech overview and approach*
    `export FCREPO_CONFIG_DIR=<your config dir location>` (alternatively you can
    add this line to your virtualenv `activate` script)
 1. Configure the application
-1. Start your triplestore and STOMP broker
+1. Start your STOMP broker
 1. Run `util/bootstrap.py` to initialize the binary and graph stores
 1. Run `./fcrepo` for a multi-threaded server or `flask run` for a
    single-threaded development server
@@ -75,7 +86,10 @@ for a rudimentary road map and status.
 
 ## Further documentation
 
-The design documents are in the [doc/pdf](doc/pdf) folder. *@TODO needs update*
+Miscellaneous documents are in the [doc](doc) folder. They will be organized
+and linked better some day.
+
+---
 
 <b id="f1">1</b> However if your client splits pairtrees upstream, such as
 Hyrax does, that obviously needs to change to get rid of the path

+ 13 - 16
lakesuperior/store_layouts/ldp_rs/lmdb_store.py

@@ -170,29 +170,26 @@ class LmdbStore(Store):
     in the same environment due to complications in handling transaction
     contexts.
 
-    There are 3 main data sets (preservation worthy data):
+    There are 4 main data sets (preservation worthy data):
 
-    - tk:t (triple key: pickled triple; unique keys)
-    - tk:c (Triple key: pickled context; multi-valued keys)
+    - t:st (term key: serialized term; 1:1)
+    - spo:c (joined S, P, O keys: context key; dupsort, dupfixed)
+    - c: (context keys only, values are the empty bytestring; 1:1)
     - pfx:ns (prefix: pickled namespace; unique)
 
-    And 8 indices to optimize lookup for all possible bound/unbound term
+    And 6 indices to optimize lookup for all possible bound/unbound term
     combination in a triple:
 
-    - c:tk (pickled context URI: triple key)
-    - sk:tk (subject key: triple key)
-    - pk:tk (pred key: triple key)
-    - ok:tk (object key: triple key)
-    - spk:tk (subject + predicate key: triple key)
-    - sok:tk (subject + object key: triple key)
-    - pok:tk (predicate + object key: triple key)
+    - th:t (term hash: term key; 1:1)
+    - s:po (S key: joined P, O keys; dupsort, dupfixed)
+    - p:so (P key: joined S, O keys; dupsort, dupfixed)
+    - o:sp (O key: joined S, P keys; dupsort, dupfixed)
+    - c:spo (context → triple association; dupsort, dupfixed)
     - ns:pfx (pickled namespace: prefix; unique)
 
-    The above indices (except for ns:pfx) are all multi-valued and store
-    fixed-length hash values referring to triples for economy's sake.
-
-    The search keys for terms are hashed on lookup. @TODO If this is too slow,
-    we may want to index term hashes.
+    The main data set and the indices are stored in two separate environments,
+    i.e. separate files in the filesystem. The indices can be recreated from
+    the main data set in case of a disaster.
     '''
     context_aware = True
     # This is a hassle to maintain for no apparent gain. If some use is devised

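The docstring changes above describe the new index scheme: fixed-length term keys, joined key pairs, and dupsort databases that hold many fixed-length values per key. A toy sketch can make the layout concrete. Note this is a dict-based model, not the module's actual API: `ToyTripleIndex`, `term_key`, and the 5-byte key length are illustrative assumptions, and Python sets stand in for LMDB's dupsort/dupfixed databases.

```python
import hashlib

def term_key(term: str) -> bytes:
    # Fixed-length key derived from the serialized term (stand-in for
    # the th:t hash index; 5 bytes is an arbitrary choice here).
    return hashlib.sha1(term.encode()).digest()[:5]

class ToyTripleIndex:
    """Dict-based stand-in for the t:st data set and the s:po / p:so /
    o:sp lookup indices described in the docstring."""

    def __init__(self):
        self.t_st = {}   # term key -> serialized term (t:st)
        self.s_po = {}   # S key -> {joined P+O keys} (s:po)
        self.p_so = {}   # P key -> {joined S+O keys} (p:so)
        self.o_sp = {}   # O key -> {joined S+P keys} (o:sp)

    def add(self, s, p, o):
        sk, pk, ok = term_key(s), term_key(p), term_key(o)
        self.t_st.update({sk: s, pk: p, ok: o})
        # LMDB dupsort allows many fixed-length values under one key;
        # a set of joined byte keys plays that role here.
        self.s_po.setdefault(sk, set()).add(pk + ok)
        self.p_so.setdefault(pk, set()).add(sk + ok)
        self.o_sp.setdefault(ok, set()).add(sk + pk)

    def triples_for_subject(self, s):
        # Resolve (?p, ?o) for a bound subject via the s:po index,
        # then look the term keys back up in t:st.
        for po in self.s_po.get(term_key(s), ()):
            pk, ok = po[:5], po[5:]
            yield s, self.t_st[pk], self.t_st[ok]

idx = ToyTripleIndex()
idx.add('urn:s1', 'urn:p1', 'urn:o1')
idx.add('urn:s1', 'urn:p2', 'urn:o2')
print(sorted(o for _, _, o in idx.triples_for_subject('urn:s1')))
# → ['urn:o1', 'urn:o2']
```

The same pattern extends to the other bound/unbound combinations (`p:so`, `o:sp`), which is why only a small number of indices covers all lookup shapes.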
+ 0 - 58
lakesuperior/store_layouts/ldp_rs/sparql_connector.py

@@ -1,58 +0,0 @@
-import logging
-
-from abc import ABCMeta
-from pprint import pformat
-
-from rdflib import Dataset
-from rdflib.term import URIRef
-from rdflib.plugins.stores.sparqlstore import SPARQLStore, SPARQLUpdateStore
-from SPARQLWrapper.Wrapper import POST
-
-from lakesuperior.dictionaries.namespaces import ns_collection as nsc
-from lakesuperior.store_layouts.ldp_rs.base_connector import BaseConnector
-
-
-class SparqlConnector(BaseConnector):
-    '''
-    Handles the connection and dataset information.
-
-    This is independent from the application context (production/test) and can
-    be passed any configuration options.
-    '''
-
-    # N.B. This is Fuseki-specific.
-    UNION_GRAPH_URI = URIRef('urn:x-arq:UnionGraph')
-
-    _logger = logging.getLogger(__name__)
-
-    def _init_connection(self, location, query_ep, update_ep=None,
-            autocommit=False):
-        '''
-        Initialize the connection to the SPARQL endpoint.
-
-        If `update_ep` is not specified, the store is initialized as read-only.
-        '''
-        if update_ep:
-            self.store = SPARQLUpdateStore(
-                    queryEndpoint=location + query_ep,
-                    update_endpoint=location + update_ep,
-                    autocommit=autocommit,
-                    dirty_reads=not autocommit)
-
-            self.readonly = False
-        else:
-            self.store = SPARQLStore(
-                    location + query_ep, default_query_method=POST)
-            self.readonly = True
-
-        self.ds = Dataset(self.store, default_union=True)
-
-
-    def optimize_edits(self):
-        opt_edits = [
-                l for l in self.store._edits
-                if not l.strip().startswith('PREFIX')]
-        #opt_edits = list(ns_pfx_sparql.values()) + opt_edits
-        self.store._edits = opt_edits
-        self._logger.debug('Changes to be committed: {}'.format(
-            pformat(self.store._edits)))