
Remove SPARQL connector; update documentation.

Stefano Cossu committed 6 years ago
commit 00e4c33cc4

+ 34 - 20
README.md

@@ -3,24 +3,39 @@
 LAKEsuperior is an experimental [Fedora Repository](http://fedorarepository.org)
 implementation.
 
-## Basic concepts
+## Guiding Principles
 
 LAKEsuperior aims at being an uncomplicated, efficient Fedora 4 implementation.
 
-Key features:
+Its main goals are:
+
+- *Simplicity of design:* LAKEsuperior relies on [LMDB](https://symas.com/lmdb/),
+an embedded, high-performance key-value store, for storing metadata, and on
+the filesystem for storing binaries.
+- *Efficiency:* while raw speed is important, LAKEsuperior also aims to be
+conservative with resources. Its memory and CPU footprints are small. Python C
+extensions are used where possible to improve performance.
+- *Reliability:* fully ACID-compliant writes guarantee consistency of data.
+- *Ease of management:* contents can be queried directly via term search or
+SPARQL, without the aid of external indices (see the query sketch after this
+list). Scripts and interfaces for repository administration and monitoring are
+shipped with the standard release.
+- *Portability:* the project aims to maintain a minimal set of dependencies.
+
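The following is a minimal, self-contained sketch of the kind of SPARQL query the *Ease of management* point refers to. It uses rdflib's default in-memory store and made-up resource data purely for illustration; it is not LAKEsuperior's own query interface or API.

```python
# Illustrative only: a SPARQL query over RDF metadata, using rdflib's
# in-memory store and fabricated data as a stand-in for repository contents.
from rdflib import Dataset, Literal, URIRef
from rdflib.namespace import DCTERMS

ds = Dataset()
graph = ds.graph(URIRef('http://example.org/context/main'))
graph.add((
    URIRef('http://example.org/resource/1'),
    DCTERMS.title,
    Literal('A sample resource'),
))

# Find every resource with a dcterms:title, across all named graphs.
query = '''
    SELECT ?resource ?title WHERE {
        GRAPH ?g { ?resource <http://purl.org/dc/terms/title> ?title }
    }
'''
for row in ds.query(query):
    print(row.resource, row.title)
```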
+## Key features
 
 - Drop-in replacement for Fedora4 (with some caveats: see
   [Delta document](doc/notes/fcrepo4_deltas.md))—currently being tested with
   Hyrax 2
-- Stores metadata in a graph store, binaries in filesystem
-- Simple search and SPARQL Query API via back-end triplestore (planned)
-- No performance issues storing many resources under the same container; no
+- Term-based search (*planned*) and SPARQL Query API + UI
+- No performance penalty for storing many resources under the same container; no
   [kudzu](https://www.nature.org/ourinitiatives/urgentissues/land-conservation/forests/kudzu.xml)
   pairtree segmentation <sup id="a1">[1](#f1)</sup>
-- Mitigates "many member" issue: constant performance writing to a resource with
+- Constant write performance on a resource with many children or members;
   option to omit children in retrieval (see the request sketch after this list)
-- Flexible back-end layouts: options to organize information in back end
-- Migration tool (planned)
+- Migration tools (*planned*)
+- Python API (*planned*): authors of Python clients can use LAKEsuperior as an
+  embedded repository with no HTTP traffic or intermediate RDF serialization
+  and deserialization involved.
 
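As a concrete illustration of the "omit children in retrieval" option above, the sketch below shows how a Fedora 4 style client can request a container representation without its containment triples, using the standard LDP `Prefer` header. The base URL is a placeholder rather than a documented LAKEsuperior default, and actual behaviour may differ per the Delta document.

```python
# Hedged example: retrieving a container while omitting its children, via the
# standard LDP Prefer header understood by Fedora 4 style servers.
# The base URL below is a placeholder, not a documented LAKEsuperior default.
import requests

resp = requests.get(
    'http://localhost:8000/ldp/my-container',
    headers={
        'Accept': 'text/turtle',
        'Prefer': ('return=representation; '
                   'omit="http://www.w3.org/ns/ldp#PreferContainment"'),
    },
)
resp.raise_for_status()
print(resp.text)  # container triples, without ldp:contains statements
```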
 Implementation of the official [Fedora API specs](https://fedora.info/spec/)
 (Fedora 5.x and beyond) is not
@@ -30,20 +45,16 @@ project if it gains support.
 Please make sure you read the [Delta document](doc/notes/fcrepo4_deltas.md) for
 divergences with the official Fedora4 implementation.
 
-The application code strives to maintain a linear, intuitive code structure to
-foster collaboration. *TODO link to tech overview and approach*
-
 ## Installation
 
 ### Dependencies
 
-1. A triplestore.
-   [Fuseki](https://jena.apache.org/documentation/fuseki2/#download-fuseki)
-   is the benchmark used so far in development. Other implementations are
-   possible as long as they support RDF 1.1 and SPARQL over HTTP
-1. A message broker supporting the STOMP protocol. If you have a separate
-   instance of official Fedora listening to port 61613, that will do the job
-1. Python 3.5 or greater
+1. Python 3.5 or greater.
+1. The [LMDB](https://symas.com/lmdb/) database library. It should be included
+in most Linux distributions' standard package repositories.
+1. A message broker supporting the STOMP protocol. For testing and evaluation
+purposes, CoilMQ is included in the dependencies and should be automatically
+installed.
 
 ### Installation steps
 
@@ -59,7 +70,7 @@ foster collaboration. *TODO link to tech overview and approach*
    `export FCREPO_CONFIG_DIR=<your config dir location>` (alternatively you can
    add this line to your virtualenv `activate` script)
 1. Configure the application
-1. Start your triplestore and STOMP broker
+1. Start your STOMP broker
 1. Run `util/bootstrap.py` to initialize the binary and graph stores
 1. Run `./fcrepo` for a multi-threaded server or `flask run` for a
    single-threaded development server
@@ -75,7 +86,10 @@ for a rudimentary road map and status.
 
 ## Further documentation
 
-The design documents are in the [doc/pdf](doc/pdf) folder. *@TODO needs update*
+Miscellaneous documents are in the [doc](doc) folder. They will be organized
+and linked better some day.
+
+---
 
 <b id="f1">1</b> However if your client splits pairtrees upstream, such as
 Hyrax does, that obviously needs to change to get rid of the path

+ 13 - 16
lakesuperior/store_layouts/ldp_rs/lmdb_store.py

@@ -170,29 +170,26 @@ class LmdbStore(Store):
     in the same environment due to complications in handling transaction
     contexts.
 
-    There are 3 main data sets (preservation worthy data):
+    There are 4 main data sets (preservation worthy data):
 
-    - tk:t (triple key: pickled triple; unique keys)
-    - tk:c (Triple key: pickled context; multi-valued keys)
+    - t:st (term key: serialized term; 1:1)
+    - spo:c (joined S, P, O keys: context key; dupsort, dupfixed)
+    - c: (context keys only, values are the empty bytestring; 1:1)
     - pfx:ns (prefix: pickled namespace; unique)
 
-    And 8 indices to optimize lookup for all possible bound/unbound term
+    And 6 indices to optimize lookup for all possible bound/unbound term
     combinations in a triple:
 
-    - c:tk (pickled context URI: triple key)
-    - sk:tk (subject key: triple key)
-    - pk:tk (pred key: triple key)
-    - ok:tk (object key: triple key)
-    - spk:tk (subject + predicate key: triple key)
-    - sok:tk (subject + object key: triple key)
-    - pok:tk (predicate + object key: triple key)
+    - th:t (term hash: term key; 1:1)
+    - s:po (S key: joined P, O keys; dupsort, dupfixed)
+    - p:so (P key: joined S, O keys; dupsort, dupfixed)
+    - o:sp (O key: joined S, P keys; dupsort, dupfixed)
+    - c:spo (context → triple association; dupsort, dupfixed)
     - ns:pfx (pickled namespace: prefix; unique)
 
-    The above indices (except for ns:pfx) are all multi-valued and store
-    fixed-length hash values referring to triples for economy's sake.
-
-    The search keys for terms are hashed on lookup. @TODO If this is too slow,
-    we may want to index term hashes.
+    The main data set and the indices are stored in separate environments,
+    i.e. separate files in the filesystem. The indices could be recreated
+    from the main data set in case of a disaster.
     '''
     context_aware = True
     # This is a hassle to maintain for no apparent gain. If some use is devised
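The data set and index names in the docstring above map naturally onto LMDB named databases. Below is a minimal sketch, using the py-lmdb bindings, of how the two environments and their sub-databases could be opened; the paths, map sizes, and key/value lengths are assumptions for illustration, not LAKEsuperior's actual initialization code.

```python
# Illustrative sketch (not LAKEsuperior code): opening the main and index LMDB
# environments with the named databases described in the docstring above.
import lmdb

main_env = lmdb.open('/tmp/lsup_main', max_dbs=8, map_size=1024 ** 3)
idx_env = lmdb.open('/tmp/lsup_index', max_dbs=8, map_size=1024 ** 3)

# Main (preservation-worthy) data sets.
t_st = main_env.open_db(b't:st')      # term key: serialized term (1:1)
spo_c = main_env.open_db(b'spo:c', dupsort=True, dupfixed=True)
c_only = main_env.open_db(b'c:')      # context keys; values are empty
pfx_ns = main_env.open_db(b'pfx:ns')  # prefix: pickled namespace

# Lookup indices, all recreatable from the main data set.
th_t = idx_env.open_db(b'th:t')       # term hash: term key (1:1)
s_po = idx_env.open_db(b's:po', dupsort=True, dupfixed=True)
p_so = idx_env.open_db(b'p:so', dupsort=True, dupfixed=True)
o_sp = idx_env.open_db(b'o:sp', dupsort=True, dupfixed=True)
c_spo = idx_env.open_db(b'c:spo', dupsort=True, dupfixed=True)
ns_pfx = idx_env.open_db(b'ns:pfx')   # pickled namespace: prefix

# Example write: associate a triple key with a context key. Key and value
# lengths here are placeholders; dupfixed only requires that all values in a
# given database share one fixed length.
with main_env.begin(write=True, db=spo_c) as txn:
    txn.put(b'\x00' * 15, b'\x01' * 5, dupdata=True)
```

Under such a layout, a fully bound triple lookup can hit `spo:c` directly, while a lookup bound on a single term would go through one of the `s:po`, `p:so`, or `o:sp` indices.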

+ 0 - 58
lakesuperior/store_layouts/ldp_rs/sparql_connector.py

@@ -1,58 +0,0 @@
-import logging
-
-from abc import ABCMeta
-from pprint import pformat
-
-from rdflib import Dataset
-from rdflib.term import URIRef
-from rdflib.plugins.stores.sparqlstore import SPARQLStore, SPARQLUpdateStore
-from SPARQLWrapper.Wrapper import POST
-
-from lakesuperior.dictionaries.namespaces import ns_collection as nsc
-from lakesuperior.store_layouts.ldp_rs.base_connector import BaseConnector
-
-
-class SparqlConnector(BaseConnector):
-    '''
-    Handles the connection and dataset information.
-
-    This is independent from the application context (production/test) and can
-    be passed any configuration options.
-    '''
-
-    # N.B. This is Fuseki-specific.
-    UNION_GRAPH_URI = URIRef('urn:x-arq:UnionGraph')
-
-    _logger = logging.getLogger(__name__)
-
-    def _init_connection(self, location, query_ep, update_ep=None,
-            autocommit=False):
-        '''
-        Initialize the connection to the SPARQL endpoint.
-
-        If `update_ep` is not specified, the store is initialized as read-only.
-        '''
-        if update_ep:
-            self.store = SPARQLUpdateStore(
-                    queryEndpoint=location + query_ep,
-                    update_endpoint=location + update_ep,
-                    autocommit=autocommit,
-                    dirty_reads=not autocommit)
-
-            self.readonly = False
-        else:
-            self.store = SPARQLStore(
-                    location + query_ep, default_query_method=POST)
-            self.readonly = True
-
-        self.ds = Dataset(self.store, default_union=True)
-
-
-    def optimize_edits(self):
-        opt_edits = [
-                l for l in self.store._edits
-                if not l.strip().startswith('PREFIX')]
-        #opt_edits = list(ns_pfx_sparql.values()) + opt_edits
-        self.store._edits = opt_edits
-        self._logger.debug('Changes to be committed: {}'.format(
-            pformat(self.store._edits)))