
Remove SPARQL connector; update documentation.

Stefano Cossu · 7 years ago · commit 00e4c33cc4

+ 34 - 20
README.md

@@ -3,24 +3,39 @@
 LAKEsuperior is an experimental [Fedora Repository](http://fedorarepository.org)
 implementation.
 
-## Basic concepts
+## Guiding Principles
 
 LAKEsuperior aims at being an uncomplicated, efficient Fedora 4 implementation.
 
-Key features:
+Its main goals are:
+
+- *Simplicity of design:* LAKEsuperior relies on [LMDB](https://symas.com/lmdb/),
+an embedded, high-performance key-value store, for storing metadata, and on
+the filesystem for storing binaries.
+- *Efficiency:* while raw speed is important, LAKEsuperior also aims at being
+conservative with resources. Its memory and CPU footprint are small. Python C
+extensions are used where possible to improve performance.
+- *Reliability:* fully ACID-compliant writes guarantee consistency of data.
+- *Ease of management:* Contents can be queried directly via term search or
+SPARQL without the aid of external indices. Scripts and interfaces for
+repository administration and monitoring are shipped with the standard release.
+- *Portability:* LAKEsuperior aims at maintaining a minimal set of dependencies.
+
+## Key features
 
 - Drop-in replacement for Fedora4 (with some caveats: see
   [Delta document](doc/notes/fcrepo4_deltas.md))—currently being tested with
   Hyrax 2
-- Stores metadata in a graph store, binaries in filesystem
-- Simple search and SPARQL Query API via back-end triplestore (planned)
-- No performance issues storing many resources under the same container; no
+- Term-based search (*planned*) and SPARQL Query API + UI
+- No performance penalty for storing many resources under the same container; no
   [kudzu](https://www.nature.org/ourinitiatives/urgentissues/land-conservation/forests/kudzu.xml)
   pairtree segmentation <sup id="a1">[1](#f1)</sup>
-- Mitigates "many member" issue: constant performance writing to a resource with
+- Constant performance writing to a resource with
   many children or members; option to omit children in retrieval
-- Flexible back-end layouts: options to organize information in back end
-- Migration tool (planned)
+- Migration tools (*planned*)
+- Python API (*planned*): Authors of Python clients can use LAKEsuperior as an
+  embedded repository with no HTTP traffic or interim RDF serialization &
+  de-serialization involved.
 
 Implementation of the official [Fedora API specs](https://fedora.info/spec/)
 (Fedora 5.x and beyond) is not
@@ -30,20 +45,16 @@ project if it gains support.
 Please make sure you read the [Delta document](doc/notes/fcrepo4_deltas.md) for
 divergences with the official Fedora4 implementation.
 
-The application code strives to maintain a linear, intuitive code structure to
-foster collaboration. *TODO link to tech overview and approach*
-
 ## Installation
 
 ### Dependencies
 
-1. A triplestore.
-   [Fuseki](https://jena.apache.org/documentation/fuseki2/#download-fuseki)
-   is the benchmark used so far in development. Other implementations are
-   possible as long as they support RDF 1.1 and SPARQL over HTTP
-1. A message broker supporting the STOMP protocol. If you have a separate
-   instance of official Fedora listening to port 61613, that will do the job
-1. Python 3.5 or greater
+1. Python 3.5 or greater.
+1. The [LMDB](https://symas.com/lmdb/) database library. It should be included
+in most Linux distributions' standard package repositories.
+1. A message broker supporting the STOMP protocol. For testing and evaluation
+purposes, CoilMQ is included in the dependencies and should be installed
+automatically.
 
 ### Installation steps
 
@@ -59,7 +70,7 @@ foster collaboration. *TODO link to tech overview and approach*
    `export FCREPO_CONFIG_DIR=<your config dir location>` (alternatively you can
    add this line to your virtualenv `activate` script)
 1. Configure the application
-1. Start your triplestore and STOMP broker
+1. Start your STOMP broker
 1. Run `util/bootstrap.py` to initialize the binary and graph stores
 1. Run `./fcrepo` for a multi-threaded server or `flask run` for a
    single-threaded development server
@@ -75,7 +86,10 @@ for a rudimentary road map and status.
 
 ## Further documentation
 
-The design documents are in the [doc/pdf](doc/pdf) folder. *@TODO needs update*
+Miscellaneous documents are in the [doc](doc) folder. They will be organized
+and linked better some day.
+
+---
 
 <b id="f1">1</b> However if your client splits pairtrees upstream, such as
 Hyrax does, that obviously needs to change to get rid of the path

+ 13 - 16
lakesuperior/store_layouts/ldp_rs/lmdb_store.py

@@ -170,29 +170,26 @@ class LmdbStore(Store):
     in the same environment due to complications in handling transaction
     contexts.
 
-    There are 3 main data sets (preservation worthy data):
+    There are 4 main data sets (preservation worthy data):
 
-    - tk:t (triple key: pickled triple; unique keys)
-    - tk:c (Triple key: pickled context; multi-valued keys)
+    - t:st (term key: serialized term; 1:1)
+    - spo:c (joined S, P, O keys: context key; dupsort, dupfixed)
+    - c: (context keys only, values are the empty bytestring; 1:1)
     - pfx:ns (prefix: pickled namespace; unique)
 
-    And 8 indices to optimize lookup for all possible bound/unbound term
+    And 6 indices to optimize lookup for all possible bound/unbound term
     combination in a triple:
 
-    - c:tk (pickled context URI: triple key)
-    - sk:tk (subject key: triple key)
-    - pk:tk (pred key: triple key)
-    - ok:tk (object key: triple key)
-    - spk:tk (subject + predicate key: triple key)
-    - sok:tk (subject + object key: triple key)
-    - pok:tk (predicate + object key: triple key)
+    - th:t (term hash: term key; 1:1)
+    - s:po (S key: joined P, O keys; dupsort, dupfixed)
+    - p:so (P key: joined S, O keys; dupsort, dupfixed)
+    - o:sp (O key: joined S, P keys; dupsort, dupfixed)
+    - c:spo (context → triple association; dupsort, dupfixed)
     - ns:pfx (pickled namespace: prefix; unique)
 
-    The above indices (except for ns:pfx) are all multi-valued and store
-    fixed-length hash values referring to triples for economy's sake.
-
-    The search keys for terms are hashed on lookup. @TODO If this is too slow,
-    we may want to index term hashes.
+    The main data set and the indices are stored in two separate environments,
+    i.e. separate files in the filesystem. The indices can be recreated from
+    the main data set in case of a disaster.
     '''
     context_aware = True
     # This is a hassle to maintain for no apparent gain. If some use is devised

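The docstring changes above describe the new index scheme: fixed-length term keys, joined key pairs, and dupsort databases that hold many fixed-length values per key. A toy sketch can make the layout concrete. Note this is a dict-based model, not the module's actual API: `ToyTripleIndex`, `term_key`, and the 5-byte key length are illustrative assumptions, and Python sets stand in for LMDB's dupsort/dupfixed databases.

```python
import hashlib

def term_key(term: str) -> bytes:
    # Fixed-length key derived from the serialized term (stand-in for
    # the th:t hash index; 5 bytes is an arbitrary choice here).
    return hashlib.sha1(term.encode()).digest()[:5]

class ToyTripleIndex:
    """Dict-based stand-in for the t:st data set and the s:po / p:so /
    o:sp lookup indices described in the docstring."""

    def __init__(self):
        self.t_st = {}   # term key -> serialized term (t:st)
        self.s_po = {}   # S key -> {joined P+O keys} (s:po)
        self.p_so = {}   # P key -> {joined S+O keys} (p:so)
        self.o_sp = {}   # O key -> {joined S+P keys} (o:sp)

    def add(self, s, p, o):
        sk, pk, ok = term_key(s), term_key(p), term_key(o)
        self.t_st.update({sk: s, pk: p, ok: o})
        # LMDB dupsort allows many fixed-length values under one key;
        # a set of joined byte keys plays that role here.
        self.s_po.setdefault(sk, set()).add(pk + ok)
        self.p_so.setdefault(pk, set()).add(sk + ok)
        self.o_sp.setdefault(ok, set()).add(sk + pk)

    def triples_for_subject(self, s):
        # Resolve (?p, ?o) for a bound subject via the s:po index,
        # then look the term keys back up in t:st.
        for po in self.s_po.get(term_key(s), ()):
            pk, ok = po[:5], po[5:]
            yield s, self.t_st[pk], self.t_st[ok]

idx = ToyTripleIndex()
idx.add('urn:s1', 'urn:p1', 'urn:o1')
idx.add('urn:s1', 'urn:p2', 'urn:o2')
print(sorted(o for _, _, o in idx.triples_for_subject('urn:s1')))
# → ['urn:o1', 'urn:o2']
```

The same pattern extends to the other bound/unbound combinations (`p:so`, `o:sp`), which is why only a small number of indices covers all lookup shapes.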
+ 0 - 58
lakesuperior/store_layouts/ldp_rs/sparql_connector.py

@@ -1,58 +0,0 @@
-import logging
-
-from abc import ABCMeta
-from pprint import pformat
-
-from rdflib import Dataset
-from rdflib.term import URIRef
-from rdflib.plugins.stores.sparqlstore import SPARQLStore, SPARQLUpdateStore
-from SPARQLWrapper.Wrapper import POST
-
-from lakesuperior.dictionaries.namespaces import ns_collection as nsc
-from lakesuperior.store_layouts.ldp_rs.base_connector import BaseConnector
-
-
-class SparqlConnector(BaseConnector):
-    '''
-    Handles the connection and dataset information.
-
-    This is independent from the application context (production/test) and can
-    be passed any configuration options.
-    '''
-
-    # N.B. This is Fuseki-specific.
-    UNION_GRAPH_URI = URIRef('urn:x-arq:UnionGraph')
-
-    _logger = logging.getLogger(__name__)
-
-    def _init_connection(self, location, query_ep, update_ep=None,
-            autocommit=False):
-        '''
-        Initialize the connection to the SPARQL endpoint.
-
-        If `update_ep` is not specified, the store is initialized as read-only.
-        '''
-        if update_ep:
-            self.store = SPARQLUpdateStore(
-                    queryEndpoint=location + query_ep,
-                    update_endpoint=location + update_ep,
-                    autocommit=autocommit,
-                    dirty_reads=not autocommit)
-
-            self.readonly = False
-        else:
-            self.store = SPARQLStore(
-                    location + query_ep, default_query_method=POST)
-            self.readonly = True
-
-        self.ds = Dataset(self.store, default_union=True)
-
-
-    def optimize_edits(self):
-        opt_edits = [
-                l for l in self.store._edits
-                if not l.strip().startswith('PREFIX')]
-        #opt_edits = list(ns_pfx_sparql.values()) + opt_edits
-        self.store._edits = opt_edits
-        self._logger.debug('Changes to be committed: {}'.format(
-            pformat(self.store._edits)))