Merge pull request #34 from scossu/development

Development
Stefano Cossu 7 years ago
commit b6a77d1d0b

+ 5 - 1
README.md

@@ -2,7 +2,7 @@
 
 [![build status](
   http://img.shields.io/travis/scossu/lakesuperior/master.svg?style=flat)](
- https://travis-ci.org/username/repo)
+ https://travis-ci.org/scossu/lakesuperior)
 
 LAKEsuperior is an alternative [Fedora Repository](http://fedorarepository.org)
 implementation.
@@ -158,6 +158,10 @@ meant to live as a community project.
 
 [Content Model](doc/notes/model.md)
 
+[Messaging](doc/notes/messaging.md)
+
+[Migration, Backup & Restore](doc/notes/migration.md)
+
 [Command-Line Reference](doc/notes/cli.md)
 
 [Storage Implementation](doc/notes/storage.md)

+ 2 - 0
data/bootstrap/rsrc_centric_layout.sparql

@@ -12,6 +12,8 @@ INSERT DATA {
     <info:fcres/> a
       fcrepo:RepositoryRoot , fcrepo:Resource , fcrepo:Container ,
       ldp:Container , ldp:BasicContainer , ldp:RDFSource ;
+      fcrepo:created "$timestamp"^^xsd:dateTime ;
+      fcrepo:lastModified "$timestamp"^^xsd:dateTime ;
     .
   }
 

+ 0 - 2
doc/notes/architecture.md

@@ -1,7 +1,5 @@
 # LAKEsuperior Architecture
 
-**DOCUMENTATION AND IMPLEMENTATION OF THIS SECTION ARE WORK-IN-PROGRESS!**
-
 LAKEsuperior is written in Python. Parts of the code may be rewritten in
 [Cython](http://cython.readthedocs.io/) for performance.
 

+ 13 - 4
doc/notes/fcrepo4_deltas.md

@@ -167,13 +167,22 @@ specs should not notice any difference.
 The following are improvements in performance or usability that can only be taken
 advantage of if client code is adjusted.
 
-### LDP-NR metadata by content negotiation
+### LDP-NR content and metadata
 
 FCREPO4 relies on the `/fcr:metadata` identifier to retrieve RDF metadata about
 an LDP-NR. LAKEsuperior supports this as a legacy option, but encourages the
-use of content negotiation to do the same. Any request to an LDP-NR with an
-`Accept` header set to one of the supported RDF serialization formats will
-yield the RDF metadata of the resource instead of the binary contents.
+use of content negotiation to do the same, while also offering explicit
+endpoints for RDF and non-RDF content retrieval.
+
+Any request to an LDP-NR with an `Accept` header set to one of the supported
+RDF serialization formats will yield the RDF metadata of the resource instead
+of the binary contents.
+
+The `fcr:metadata` URI returns the RDF metadata of an LDP-NR.
+
+The `fcr:content` URI returns the non-RDF content.
+
+The two options above return an HTTP error if requested for an LDP-RS.
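+
+For example, with the Python `requests` library (a minimal sketch; the base
+URL and resource path are hypothetical):
+
+```python
+import requests
+
+base = 'http://localhost:8000/ldp'  # hypothetical LAKEsuperior root
+
+# RDF metadata of an LDP-NR via content negotiation:
+rsp = requests.get(base + '/my-binary', headers={'Accept': 'text/turtle'})
+
+# The same metadata via the explicit endpoint:
+rsp = requests.get(base + '/my-binary/fcr:metadata')
+
+# The raw binary content via the explicit endpoint:
+rsp = requests.get(base + '/my-binary/fcr:content')
+```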
 
 ### "Include" and "Omit" options for children
 

+ 27 - 0
doc/notes/messaging.md

@@ -0,0 +1,27 @@
+# LAKEsuperior Messaging
+
+LAKEsuperior implements a messaging system based on ActivityStreams, as
+indicated by the
+[Fedora API specs](https://fedora.info/2017/06/30/spec/#notifications).
+The metadata set provided is currently quite minimal but can be easily
+enriched by extending the
+[default formatter class](https://github.com/scossu/lakesuperior/blob/master/lakesuperior/messaging/messenger.py).
+
+STOMP is the only supported protocol at the moment. More protocols may be made
+available at a later time.
+
+LAKEsuperior can send messages to any number of destinations: see
+[configuration](https://github.com/scossu/lakesuperior/blob/master/etc.defaults/application.yml#L79).
+By default, CoilMQ is provided for testing purposes and listens to
+`localhost:61613`. The default route sends messages to `/topic/fcrepo`.
+
+A small command-line utility, also provided with the Python dependencies,
+allows you to watch incoming messages. To monitor messages, enter the
+following *after activating your virtualenv*:
+
+```
+stomp -H localhost -P 61613 -L /topic/fcrepo
+```
+
+See the [stomp.py library reference page](https://github.com/jasonrbriggs/stomp.py/wiki/Command-Line-Access)
+for details.
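+
+Messages can also be consumed programmatically with the same library (a
+minimal sketch, assuming the stomp.py 4.x API and the default broker settings
+above):
+
+```python
+import time
+
+import stomp
+
+
+class FcrepoListener(stomp.ConnectionListener):
+    def on_message(self, headers, body):
+        # Each message body is an ActivityStreams JSON payload.
+        print(body)
+
+
+conn = stomp.Connection([('localhost', 61613)])
+conn.set_listener('fcrepo', FcrepoListener())
+conn.start()
+conn.connect(wait=True)
+conn.subscribe(destination='/topic/fcrepo', id=1, ack='auto')
+time.sleep(60)  # Keep the script alive while messages arrive.
+```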

+ 57 - 0
doc/notes/migration.md

@@ -0,0 +1,57 @@
+# Migration, Backup & Restore
+
+The LAKEsuperior dataset is by default fully contained in a folder. This means
+that only the data, configuration and code are needed for it to run; no
+external services such as Postgres or Redis are required. These folders can be
+moved around as needed.
+
+## Migration Tool
+
+Migration is the process of importing and converting data from a different
+Fedora or LDP implementation into a new LAKEsuperior instance. This process
+uses the HTTP/LDP API of the original repository. A command-line utility is
+available as part of the `lsup-admin` suite to assist in this operation.
+
+A repository can be migrated with a one-line command such as:
+
+```
+./lsup-admin migrate http://source-repo.edu/rest /local/dest/folder
+```
+
+For more options, enter
+
+```
+./lsup-admin migrate --help
+```
+
+The script crawls through the resources and follows outbound links within
+them. In order to do this, resources are added as raw triples (i.e. no
+consistency checks are made).
+
+**Note:** the consistency check tool has not been implemented yet, but its
+release should follow shortly. It will verify that all links between
+resources maintain referential integrity.
+
+This script will create a full dataset in the specified destination folder,
+complete with a default configuration that allows the LAKEsuperior server to
+be started immediately after the migration is complete.
+
+Two approaches to migration are possible, as shown in the Python sketch after
+this list:
+
+1. By providing a starting point on the source repository. E.g. if the
+   repository you want to migrate is at `http://repo.edu/rest/prod` you can add
+   the `-s /prod` option to the script to avoid migrating irrelevant branches.
+   Note that the script will still reach outside of the starting point if
+   resources are referencing other resources outside of it.
+2. By providing a file containing a list of resources to migrate. This is
+   useful if a source repository cannot produce a full list (e.g. the root node
+   has more children than the server can handle) but a list of individual
+   resources is available via an external index (Solr, triplestore, etc.).
+   The resources can be indicated by their fully qualified URIs or paths
+   relative to the repository root. (*TODO latter option needs testing*)
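+
+Both approaches are also exposed programmatically by the new
+`lakesuperior.api.admin` module (a sketch; the source URL and local paths are
+hypothetical):
+
+```python
+from lakesuperior.api import admin
+
+# Approach 1: migrate only the `/prod` branch of the source repository.
+admin.migrate('http://repo.edu/rest', '/local/dest/folder', start_pts='/prod')
+
+# Approach 2: migrate resources listed in a local file, one URI per line.
+admin.migrate(
+        'http://repo.edu/rest', '/local/dest/folder',
+        list_file='/path/to/uri_list.txt')
+```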
+
+## Backup & Restore
+
+A backup of a LAKEsuperior repository consists of copying the RDF and non-RDF
+data folders. The location of these folders is indicated in the application
+configuration. The default commands provided by your OS (`cp`, `rsync`,
+`tar`, etc. on Unix) are all that is needed.
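+
+For example, with Python's standard library (a sketch; the data paths are
+hypothetical and should be taken from your `application.yml`):
+
+```python
+import shutil
+
+# Copy both data stores to a backup location while the server is stopped.
+shutil.copytree('/path/to/data/ldprs_store', '/backups/ldprs_store')
+shutil.copytree('/path/to/data/ldpnr_store', '/backups/ldpnr_store')
+```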

+ 1 - 1
fcrepo

@@ -2,4 +2,4 @@
 default_conf_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/etc.defaults"
 conf_dir=${FCREPO_CONFIG_DIR:-$default_conf_dir}
 
-gunicorn -c "${conf_dir}/gunicorn.py" server:fcrepo
+gunicorn -c "${conf_dir}/gunicorn.py" server:fcrepo --preload

+ 42 - 6
lakesuperior/api/admin.py

@@ -1,27 +1,63 @@
 import logging
 
+from lakesuperior.config_parser import parse_config
 from lakesuperior.env import env
+from lakesuperior.globals import AppGlobals
+from lakesuperior.migrator import Migrator
+from lakesuperior.store.ldp_nr.default_layout import DefaultLayout as FileLayout
 from lakesuperior.store.ldp_rs.lmdb_store import TxnManager
 
-__doc__ = '''
+__doc__ = """
 Admin API.
 
 This module contains maintenance utilities and stats.
-'''
+"""
 
 logger = logging.getLogger(__name__)
-app_globals = env.app_globals
 
 
 def stats():
-    '''
+    """
     Get repository statistics.
 
-    @return dict Store statistics, resource statistics.
-    '''
+    :rtype: dict
+    :return: Store statistics, resource statistics.
+    """
+    import lakesuperior.env_setup
     repo_stats = {'rsrc_stats': env.app_globals.rdfly.count_rsrc()}
     with TxnManager(env.app_globals.rdf_store) as txn:
         repo_stats['store_stats'] = env.app_globals.rdf_store.stats()
 
     return repo_stats
 
+
+def migrate(src, dest, start_pts=None, list_file=None, **kwargs):
+    """
+    Migrate an LDP repository to a new LAKEsuperior instance.
+
+    See :py:meth:`Migrator.__init__`.
+    """
+    if start_pts:
+        if not isinstance(
+                start_pts, list) and not isinstance(start_pts, tuple):
+            start_pts = (start_pts,)
+    elif not list_file:
+        start_pts = ('/',)
+
+    return Migrator(src, dest, **kwargs).migrate(start_pts, list_file)
+
+
+def integrity_check(config_dir=None):
+    """
+    Check integrity of the data set.
+
+    At the moment this is limited to referential integrity. Other checks can
+    be added and triggered by different argument flags.
+    """
+    if config_dir:
+        env.config = parse_config(config_dir)[0]
+        env.app_globals = AppGlobals(env.config)
+    else:
+        import lakesuperior.env_setup
+    with TxnManager(env.app_globals.rdfly.store):
+        return { t for t in env.app_globals.rdfly.find_refint_violations()}
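+
+
+# Usage sketch for the new checker (run against the default configuration):
+#
+#   from lakesuperior.api import admin
+#   violations = admin.integrity_check()
+#   if violations:
+#       print('Dangling references found:', len(violations))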

+ 39 - 20
lakesuperior/api/resource.py

@@ -20,7 +20,6 @@ from lakesuperior.store.ldp_rs.lmdb_store import TxnManager
 
 
 logger = logging.getLogger(__name__)
-app_globals = env.app_globals
 
 __doc__ = '''
 Primary API for resource manipulation.
@@ -75,12 +74,11 @@ def transaction(write=False):
             # update timestamps on resources.
             env.timestamp = arrow.utcnow()
             env.timestamp_term = Literal(env.timestamp, datatype=XSD.dateTime)
-            with TxnManager(app_globals.rdf_store, write=write) as txn:
+            with TxnManager(env.app_globals.rdf_store, write=write) as txn:
                 ret = fn(*args, **kwargs)
-            if len(app_globals.changelog):
+            if len(env.app_globals.changelog):
                 job = Thread(target=process_queue)
                 job.start()
-            logger.debug('Deleting timestamp: {}'.format(getattr(env, 'timestamp')))
             delattr(env, 'timestamp')
             delattr(env, 'timestamp_term')
             return ret
@@ -94,8 +92,8 @@ def process_queue():
     '''
     lock = Lock()
     lock.acquire()
-    while len(app_globals.changelog):
-        send_event_msg(*app_globals.changelog.popleft())
+    while len(env.app_globals.changelog):
+        send_event_msg(*env.app_globals.changelog.popleft())
     lock.release()
 
 
@@ -118,11 +116,35 @@ def send_event_msg(remove_trp, add_trp, metadata):
     subjects = set(remove_dict.keys()) | set(add_dict.keys())
     for rsrc_uri in subjects:
         logger.debug('Processing event for subject: {}'.format(rsrc_uri))
-        app_globals.messenger.send(rsrc_uri, **metadata)
+        env.app_globals.messenger.send(rsrc_uri, **metadata)
 
 
 ### API METHODS ###
 
+@transaction()
+def exists(uid):
+    '''
+    Return whether a resource exists (is stored) in the repository.
+
+    @param uid (string) Resource UID.
+    '''
+    try:
+        exists = LdpFactory.from_stored(uid).is_stored
+    except ResourceNotExistsError:
+        exists = False
+    return exists
+
+
+@transaction()
+def get_metadata(uid):
+    '''
+    Get metadata (admin triples) of an LDPR resource.
+
+    @param uid (string) Resource UID.
+    '''
+    return LdpFactory.from_stored(uid).metadata
+
+
 @transaction()
 def get(uid, repr_options={}):
     '''
@@ -229,13 +251,10 @@ def update(uid, update_str, is_metadata=False):
     If False, and the resource being updated is a LDP-NR, an error is raised.
     '''
     rsrc = LdpFactory.from_stored(uid)
-    if LDP_NR_TYPE in rsrc.ldp_types:
-        if is_metadata:
-            rsrc.patch_metadata(update_str)
-        else:
-            raise InvalidResourceError(uid)
-    else:
-        rsrc.patch(update_str)
+    if LDP_NR_TYPE in rsrc.ldp_types and not is_metadata:
+        raise InvalidResourceError(uid)
+
+    rsrc.sparql_update(update_str)
 
     return rsrc
 
@@ -266,11 +285,11 @@ def delete(uid, soft=True):
     '''
     # If referential integrity is enforced, grab all inbound relationships
     # to break them.
-    refint = app_globals.rdfly.config['referential_integrity']
+    refint = env.app_globals.rdfly.config['referential_integrity']
     inbound = True if refint else inbound
     repr_opts = {'incl_inbound' : True} if refint else {}
 
-    children = app_globals.rdfly.get_descendants(uid)
+    children = env.app_globals.rdfly.get_descendants(uid)
 
     if soft:
         rsrc = LdpFactory.from_stored(uid, repr_opts)
@@ -279,16 +298,16 @@ def delete(uid, soft=True):
         for child_uri in children:
             try:
                 child_rsrc = LdpFactory.from_stored(
-                    app_globals.rdfly.uri_to_uid(child_uri),
+                    env.app_globals.rdfly.uri_to_uid(child_uri),
                     repr_opts={'incl_children' : False})
             except (TombstoneError, ResourceNotExistsError):
                 continue
             child_rsrc.bury_rsrc(inbound, tstone_pointer=rsrc.uri)
     else:
-        ret = app_globals.rdfly.forget_rsrc(uid, inbound)
+        ret = env.app_globals.rdfly.forget_rsrc(uid, inbound)
         for child_uri in children:
-            child_uid = app_globals.rdfly.uri_to_uid(child_uri)
-            ret = app_globals.rdfly.forget_rsrc(child_uid, inbound)
+            child_uid = env.app_globals.rdfly.uri_to_uid(child_uri)
+            ret = env.app_globals.rdfly.forget_rsrc(child_uid, inbound)
 
     return ret
 

+ 69 - 46
lakesuperior/config_parser.py

@@ -5,49 +5,72 @@ from os import path, environ
 import hiyapyco
 import yaml
 
-configs = (
-    'application',
-    'logging',
-    'namespaces',
-    'flask',
-)
-
-# This will hold a dict of all configuration values.
-config = {}
-
-# Parse configuration
-CONFIG_DIR = environ.get(
-        'FCREPO_CONFIG_DIR',
-        path.dirname(path.dirname(path.abspath(__file__))) + '/etc.defaults')
-
-print('Reading configuration at {}'.format(CONFIG_DIR))
-
-for cname in configs:
-    file = '{}/{}.yml'.format(CONFIG_DIR , cname)
-    with open(file, 'r') as stream:
-        config[cname] = yaml.load(stream, yaml.SafeLoader)
-
-# Merge default and test configurations.
-error_msg = '''
-**************
-** WARNING! **
-**************
-
-Your test {} store location is set to be the same as the production location.
-This means that if you run a test suite, your live data may be wiped clean!
-
-Please review your configuration before starting.
-'''
-
-test_config = {'application': hiyapyco.load(CONFIG_DIR + '/application.yml',
-        CONFIG_DIR + '/test.yml', method=hiyapyco.METHOD_MERGE)}
-
-if config['application']['store']['ldp_rs']['location'] \
-        == test_config['application']['store']['ldp_rs']['location']:
-            raise RuntimeError(error_msg.format('RDF'))
-            sys.exit()
-
-if config['application']['store']['ldp_nr']['path'] \
-        == test_config['application']['store']['ldp_nr']['path']:
-            raise RuntimeError(error_msg.format('binary'))
-            sys.exit()
+
+def parse_config(config_dir=None):
+    """
+    Parse configuration from a directory.
+
+    This is normally called by the standard endpoints (``lsup_admin``, web
+    server, etc.) or by a Python client by importing
+    :py:mod:`lakesuperior.env_setup` but an application using a non-default
+    configuration may specify an alternative configuration directory.
+
+    The directory must have the same structure as the one provided in
+    ``etc.defaults``.
+
+    :param config_dir: Location on the filesystem of the configuration
+    directory. The default is set by the ``FCREPO_CONFIG_DIR`` environment
+    variable or, if this is not set, the ``etc.defaults`` stock directory.
+    """
+    configs = (
+        'application',
+        'logging',
+        'namespaces',
+        'flask',
+    )
+
+    if not config_dir:
+        config_dir = environ.get('FCREPO_CONFIG_DIR', path.dirname(
+                path.dirname(path.abspath(__file__))) + '/etc.defaults')
+
+    # This will hold a dict of all configuration values.
+    _config = {}
+
+    print('Reading configuration at {}'.format(config_dir))
+
+    for cname in configs:
+        file = '{}/{}.yml'.format(config_dir, cname)
+        with open(file, 'r') as stream:
+            _config[cname] = yaml.load(stream, yaml.SafeLoader)
+
+    error_msg = '''
+    **************
+    ** WARNING! **
+    **************
+
+    Your test {} store location is set to be the same as the production
+    location. This means that if you run a test suite, your live data may be
+    wiped clean!
+
+    Please review your configuration before starting.
+    '''
+
+    # Merge default and test configurations.
+    _test_config = {'application': hiyapyco.load(
+            config_dir + '/application.yml',
+            config_dir + '/test.yml', method=hiyapyco.METHOD_MERGE)}
+
+    if _config['application']['store']['ldp_rs']['location'] \
+            == _test_config['application']['store']['ldp_rs']['location']:
+        raise RuntimeError(error_msg.format('RDF'))
+
+    if _config['application']['store']['ldp_nr']['path'] \
+            == _test_config['application']['store']['ldp_nr']['path']:
+        raise RuntimeError(error_msg.format('binary'))
+
+    return _config, _test_config
+
+
+# Load default configuration.
+config, test_config = parse_config()
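+
+# Usage sketch for a non-default setup (the directory path is hypothetical):
+#
+#   from lakesuperior.config_parser import parse_config
+#   config, test_config = parse_config('/opt/lakesuperior/etc')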

+ 16 - 17
lakesuperior/endpoints/ldp.py

@@ -107,9 +107,11 @@ def log_request_end(rsp):
 
 @ldp.route('/<path:uid>', methods=['GET'], strict_slashes=False)
 @ldp.route('/', defaults={'uid': '/'}, methods=['GET'], strict_slashes=False)
-@ldp.route('/<path:uid>/fcr:metadata', defaults={'force_rdf' : True},
+@ldp.route('/<path:uid>/fcr:metadata', defaults={'out_fmt' : 'rdf'},
+        methods=['GET'])
+@ldp.route('/<path:uid>/fcr:content', defaults={'out_fmt' : 'non_rdf'},
         methods=['GET'])
-def get_resource(uid, force_rdf=False):
+def get_resource(uid, out_fmt=None):
     '''
     https://www.w3.org/TR/ldp/#ldpr-HTTP_GET
 
@@ -117,9 +119,9 @@ def get_resource(uid, force_rdf=False):
 
     @param uid (string) UID of resource to retrieve. The repository root has
     an empty string for UID.
-    @param force_rdf (boolean) Whether to retrieve RDF even if the resource is
+    @param out_fmt (string) Force output to RDF or non-RDF if the resource is
     a LDP-NR. This is not available in the API but is used e.g. by the
-    `*/fcr:metadata` endpoint. The default is False.
+    `*/fcr:metadata` and `*/fcr:content` endpoints. The default is None.
     '''
     logger.info('UID: {}'.format(uid))
     out_headers = std_headers
@@ -137,17 +139,22 @@ def get_resource(uid, force_rdf=False):
     except TombstoneError as e:
         return _tombstone_response(e, uid)
     else:
+        if out_fmt is None:
+            out_fmt = (
+                    'rdf'
+                    if isinstance(rsrc, LdpRs) or is_accept_hdr_rdf_parsable()
+                    else 'non_rdf')
         out_headers.update(_headers_from_metadata(rsrc))
         uri = g.tbox.uid_to_uri(uid)
-        if (
-                isinstance(rsrc, LdpRs)
-                or is_accept_hdr_rdf_parsable()
-                or force_rdf):
+        if out_fmt == 'rdf':
             ggr = g.tbox.globalize_graph(rsrc.out_graph)
             ggr.namespace_manager = nsm
             return _negotiate_content(ggr, out_headers, uid=uid, uri=uri)
         else:
-            logger.info('Streaming out binary content.')
+            if not getattr(rsrc, 'local_path', False):
+                return ('{} has no binary content.'.format(rsrc.uid), 404)
+
+            logger.debug('Streaming out binary content.')
             rsp = make_response(send_file(
                     rsrc.local_path, as_attachment=True,
                     attachment_filename=rsrc.filename,
@@ -265,7 +272,6 @@ def put_resource(uid):
     rsp_headers = {'Content-Type' : 'text/plain; charset=utf-8'}
 
     handling, disposition = set_post_put_params()
-    #import pdb; pdb.set_trace()
     stream, mimetype = _bistream_from_req()
 
     if LdpFactory.is_rdf_parsable(mimetype):
@@ -494,13 +500,6 @@ def _bistream_from_req():
     return stream, mimetype
 
 
-def _get_bitstream(rsrc):
-    # @TODO This may change in favor of more low-level handling if the file
-    # system is not local.
-    return send_file(rsrc.local_path, as_attachment=True,
-            attachment_filename=rsrc.filename)
-
-
 def _tombstone_response(e, uid):
     headers = {
         'Link': '<{}/fcr:tombstone>; rel="hasTombstone"'.format(request.url),

+ 11 - 6
lakesuperior/endpoints/templates/resource.html

@@ -23,15 +23,20 @@
 </nav>
 {% endblock %}
 {% block content %}
+{% if gr[gr.identifier : nsc['rdf'].type : nsc['ldp'].NonRDFSource] %}
+<div class="pull-right">
+    <a href="{{ gr.identifier }}/fcr:content" class="btn btn-success btn-lg">
+        <span class="glyphicon glyphicon-download" aria-hidden="true"></span>
+        Download Content</a>
+</div>
+{% endif %}
 {% set created_ts = arrow.get(
-    gr.value(gr.identifier, nsc['fcrepo'].created)).replace(
-    tzinfo='local') %}
+    gr.value(gr.identifier, nsc['fcrepo'].created)).to('local') %}
 {% set updated_ts = arrow.get(
-    gr.value(gr.identifier, nsc['fcrepo'].lastModified)).replace(
-    tzinfo='local') %}
-<p><strong>Created on:</strong>&nbsp;{{ created_ts }}&nbsp;
+    gr.value(gr.identifier, nsc['fcrepo'].lastModified)).to('local') %}
+<p><strong>Created on:</strong>&nbsp;{{ created_ts.format('YYYY-MM-DD HH:mm:ss ZZ') }}&nbsp;
 ({{created_ts.humanize() }})</p>
-<p><strong>Last updated on:</strong>&nbsp;{{ updated_ts }}&nbsp;
+<p><strong>Last updated on:</strong>&nbsp;{{ updated_ts.format('YYYY-MM-DD HH:mm:ss ZZ') }}&nbsp;
 ({{updated_ts.humanize() }})</p>
 <p><strong>Types:</strong>
 {% for t in gr[gr.identifier : nsc['rdf'].type :] | sort %}

+ 277 - 0
lakesuperior/migrator.py

@@ -0,0 +1,277 @@
+import logging
+import shutil
+
+from contextlib import ContextDecorator
+from os import makedirs, path
+from urllib.parse import urldefrag
+
+import requests
+import yaml
+
+from rdflib import Graph, URIRef
+
+from lakesuperior.dictionaries.namespaces import ns_collection as nsc
+from lakesuperior.env import env
+from lakesuperior.globals import AppGlobals, ROOT_UID
+from lakesuperior.config_parser import parse_config
+from lakesuperior.store.ldp_rs.lmdb_store import TxnManager
+
+
+logger = logging.getLogger(__name__)
+
+
+class StoreWrapper(ContextDecorator):
+    """
+    Open and close a store.
+    """
+    def __init__(self, store):
+        self.store = store
+
+    def __enter__(self):
+        self.store.open(
+                env.config['application']['store']['ldp_rs'])
+
+    def __exit__(self, *exc):
+        self.store.close()
+
+
+class Migrator:
+    """
+    Class to handle a database migration.
+
+    This class holds state of progress and shared variables as it crawls
+    through linked resources in an LDP server.
+
+    Since a repository migration can be a very long operation but it is
+    impossible to know the number of the resources to gather by LDP interaction
+    alone, a progress ticker outputs the number of processed resources at
+    regular intervals.
+    """
+
+    """
+    LMDB database parameters.
+
+    See :meth:`lmdb.Environment.__init__`
+    """
+    db_params = {
+        'map_size': 1024 ** 4,
+        'metasync': False,
+        'readahead': False,
+        'meminit': False,
+    }
+
+    """List of predicates to ignore when looking for links."""
+    ignored_preds = (
+        nsc['fcrepo'].hasParent,
+        nsc['fcrepo'].hasTransactionProvider,
+        nsc['fcrepo'].hasFixityService,
+    )
+
+
+    def __init__(
+            self, src, dest, zero_binaries=False, compact_uris=False,
+            skip_errors=False):
+        """
+        Set up base paths and clean up existing directories.
+
+        :param rdflib.URIRef src: Webroot of source repository. This must
+        correspond to the LDP root node (for Fedora it can be e.g.
+        ``http://localhost:8080fcrepo/rest/``) and is used to determine if URIs
+        retrieved are managed by this repository.
+        :param str dest: Destination repository path. If the location exists
+        it must be a writable directory. It will be deleted and recreated. If
+        it does not exist, it will be created along with its parents if
+        missing.
+        :param bool zero_binaries: If True, write zero-byte stand-ins instead
+        of fetching the full binary contents.
+        :param bool skip_errors: If True, report HTTP errors from the source
+        repository and continue crawling instead of raising an exception.
+        :param bool compact_uris: NOT IMPLEMENTED. Whether the process should
+        attempt to compact URIs generated with broken up path segments. If the
+        UID matches a pattern such as `/12/34/56/123456...` it is converted to
+        `/123456...`. This would remove a lot of cruft caused by the pairtree
+        segments. Note that this will change the publicly exposed URIs. If
+        durability is a concern, a rewrite directive can be added to the HTTP
+        server that proxies the WSGI endpoint.
+        """
+        # Set up repo folder structure and copy default configuration to
+        # destination file.
+        cur_dir = path.dirname(path.dirname(path.abspath(__file__)))
+        self.dbpath = '{}/data/ldprs_store'.format(dest)
+        self.fpath = '{}/data/ldpnr_store'.format(dest)
+        self.config_dir = '{}/etc'.format(dest)
+
+        shutil.rmtree(dest, ignore_errors=True)
+        shutil.copytree(
+                '{}/etc.defaults'.format(cur_dir), self.config_dir)
+
+        # Modify and overwrite destination configuration.
+        orig_config, _ = parse_config(self.config_dir)
+        orig_config['application']['store']['ldp_rs']['location'] = self.dbpath
+        orig_config['application']['store']['ldp_nr']['path'] = self.fpath
+
+        with open('{}/application.yml'.format(self.config_dir), 'w') \
+                as config_file:
+            config_file.write(yaml.dump(orig_config['application']))
+
+        env.config = parse_config(self.config_dir)[0]
+        env.app_globals = AppGlobals(env.config)
+
+        self.rdfly = env.app_globals.rdfly
+        self.nonrdfly = env.app_globals.nonrdfly
+
+        with TxnManager(env.app_globals.rdf_store, write=True) as txn:
+            self.rdfly.bootstrap()
+            self.rdfly.store.close()
+        env.app_globals.nonrdfly.bootstrap()
+
+        self.src = src.rstrip('/')
+        self.zero_binaries = zero_binaries
+        self.skip_errors = skip_errors
+
+
+    def migrate(self, start_pts=None, list_file=None):
+        """
+        Migrate the database.
+
+        This method creates a fully functional and configured LAKEsuperior
+        data set contained in a folder from an LDP repository.
+
+        :param start_pts: List of starting points to retrieve resources
+        from. It would typically be the repository root in case of a full
+        dump, or one or more resources in the repository for a partial one.
+        :type start_pts: tuple or list
+        :param str list_file: Path to a local file containing a list of URIs,
+        one per line.
+        """
+        from lakesuperior.api import resource as rsrc_api
+        self._ct = 0
+        with StoreWrapper(self.rdfly.store):
+            if start_pts:
+                for start in start_pts:
+                    if not start.startswith('/'):
+                        raise ValueError(
+                            'Starting point {} does not begin with a slash.'
+                            .format(start))
+
+                    if start != ROOT_UID:
+                        # Create the full hierarchy with link to the parents.
+                        rsrc_api.create_or_replace(start)
+                    # Then populate the new resource and crawl for more
+                    # relationships.
+                    self._crawl(start)
+            elif list_file:
+                with open(list_file, 'r') as fp:
+                    for uri in fp:
+                        uid = uri.strip().replace(self.src, '')
+                        if uid != ROOT_UID:
+                            rsrc_api.create_or_replace(uid)
+                        self._crawl(uid)
+        logger.info('Dumped {} resources.'.format(self._ct))
+
+        return self._ct
+
+
+    def _crawl(self, uid):
+        """
+        Get the contents of a resource and its relationships recursively.
+
+        This method recurses into itself each time a reference to a resource
+        managed by the repository is encountered.
+
+        :param str uid: The path relative to the source server webroot
+        pointing to the resource to crawl, effectively the resource UID.
+        """
+        ibase = str(nsc['fcres'])
+        # Public URI of source repo.
+        uri = self.src + uid
+        # Internal URI of destination.
+        iuri = ibase + uid
+
+        rsp = requests.head(uri)
+        if not self.skip_errors:
+            rsp.raise_for_status()
+        elif rsp.status_code > 399:
+            print('Error retrieving resource {} headers: {} {}'.format(
+                uri, rsp.status_code, rsp.text))
+
+        # Determine LDP type.
+        ldp_type = 'ldp_nr'
+        try:
+            for link in requests.utils.parse_header_links(
+                    rsp.headers.get('link')):
+                if (
+                        link.get('rel') == 'type'
+                        and (
+                            link.get('url') == str(nsc['ldp'].RDFSource)
+                            or link.get('url') == str(nsc['ldp'].Container))
+                ):
+                    # Resource is an LDP-RS.
+                    ldp_type = 'ldp_rs'
+                    break
+        except TypeError:
+            ldp_type = 'ldp_rs'
+            #raise ValueError('URI {} is not an LDP resource.'.format(uri))
+
+        # Get the whole RDF document now because we have to know all outbound
+        # links.
+        get_uri = (
+                uri if ldp_type == 'ldp_rs' else '{}/fcr:metadata'.format(uri))
+        get_rsp = requests.get(get_uri)
+        if not self.skip_errors:
+            get_rsp.raise_for_status()
+        elif get_rsp.status_code > 399:
+            print('Error retrieving resource {} body: {} {}'.format(
+                uri, get_rsp.status_code, get_rsp.text))
+
+        data = get_rsp.content.replace(
+                self.src.encode('utf-8'), ibase.encode('utf-8'))
+        gr = Graph(identifier=iuri).parse(data=data, format='turtle')
+
+        # Store raw graph data. No checks.
+        with TxnManager(self.rdfly.store, True):
+            self.rdfly.modify_rsrc(uid, add_trp=set(gr))
+
+        # Grab binary and set new resource parameters.
+        if ldp_type == 'ldp_nr':
+            provided_imr = gr.resource(URIRef(iuri))
+            if self.zero_binaries:
+                data = b''
+            else:
+                bin_rsp = requests.get(uri)
+                if not self.skip_errors:
+                    bin_rsp.raise_for_status()
+                elif bin_rsp.status_code > 399:
+                    print('Error retrieving resource {} body: {} {}'.format(
+                        uri, bin_rsp.status_code, bin_rsp.text))
+                data = bin_rsp.content
+            uuid = str(gr.value(
+                URIRef(iuri), nsc['premis'].hasMessageDigest)).split(':')[-1]
+            fpath = self.nonrdfly.local_path(
+                    self.nonrdfly.config['path'], uuid)
+            makedirs(path.dirname(fpath), exist_ok=True)
+            with open(fpath, 'wb') as fh:
+                fh.write(data)
+
+        self._ct += 1
+        if self._ct % 10 == 0:
+            print('{} resources processed so far.'.format(self._ct))
+
+        # Now, crawl through outbound links.
+        # LDP-NR fcr:metadata must be checked too.
+        for pred, obj in gr.predicate_objects():
+            obj_uid = obj.replace(ibase, '')
+            with TxnManager(self.rdfly.store, True):
+                conditions = bool(
+                    isinstance(obj, URIRef)
+                    and obj.startswith(iuri)
+                    # Avoid ∞ loop with fragment URIs.
+                    and str(urldefrag(obj).url) != str(iuri)
+                    # Avoid ∞ loop with circular references.
+                    and not self.rdfly.ask_rsrc_exists(obj_uid)
+                    and pred not in self.ignored_preds
+                )
+            if conditions:
+                print('Object {} will be crawled.'.format(obj_uid))
+                self._crawl(urldefrag(obj_uid).url)
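+
+
+# Usage sketch (source URL and destination path are hypothetical); this is
+# normally invoked through `lakesuperior.api.admin.migrate()`:
+#
+#   migrator = Migrator('http://source-repo.edu/rest', '/dest/folder')
+#   migrator.migrate(start_pts=('/',))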

+ 7 - 4
lakesuperior/model/ldp_factory.py

@@ -78,7 +78,8 @@ class LdpFactory:
 
 
     @staticmethod
-    def from_provided(uid, mimetype, stream=None, **kwargs):
+    def from_provided(
+            uid, mimetype=None, stream=None, provided_imr=None, **kwargs):
        '''
         Determine LDP type from request content.
 
@@ -87,15 +88,15 @@ class LdpFactory:
         @param stream (IOStream | None) The provided data stream. This can be
         RDF or non-RDF content, or None. In the latter case, an empty container
         is created.
+        @param **kwargs Arguments passed to the LDP class constructor.
         '''
         uri = nsc['fcres'][uid]
 
-        if not stream:
+        if not stream and not mimetype:
             # Create empty LDPC.
             logger.info('No data received in request. '
                     'Creating empty container.')
             inst = Ldpc(uid, provided_imr=Resource(Graph(), uri), **kwargs)
-
         elif __class__.is_rdf_parsable(mimetype):
             # Create container and populate it with provided RDF data.
             input_rdf = stream.read()
@@ -125,7 +126,9 @@ class LdpFactory:
 
         else:
             # Create a LDP-NR and equip it with the binary file provided.
-            provided_imr = Resource(Graph(), uri)
+            # The IMR can also be provided for additional metadata.
+            if not provided_imr:
+                provided_imr = Resource(Graph(), uri)
             inst = LdpNr(uid, stream=stream, mimetype=mimetype,
                     provided_imr=provided_imr, **kwargs)
 

+ 2 - 12
lakesuperior/model/ldp_nr.py

@@ -63,7 +63,8 @@ class LdpNr(Ldpr):
     def local_path(self):
         cksum_term = self.imr.value(nsc['premis'].hasMessageDigest)
         cksum = str(cksum_term.identifier.replace('urn:sha1:',''))
-        return nonrdfly.local_path(cksum)
+        return nonrdfly.__class__.local_path(
+                nonrdfly.root, cksum, nonrdfly.bl, nonrdfly.bc)
 
 
     def create_or_replace(self, create_only=False):
@@ -87,17 +88,6 @@ class LdpNr(Ldpr):
             return ev_type
 
 
-    def patch_metadata(self, update_str):
-        '''
-        Update resource metadata by applying a SPARQL-UPDATE query.
-
-        @param update_str (string) SPARQL-Update staements.
-        '''
-        self.handling = 'lenient' # FCREPO does that and Hyrax requires it.
-
-        return self._sparql_update(update_str)
-
-
     ## PROTECTED METHODS ##
 
     def _add_srv_mgd_triples(self, create=False):

+ 133 - 164
lakesuperior/model/ldpr.py

@@ -1,5 +1,4 @@
 import logging
-import pdb
 
 from abc import ABCMeta
 from collections import defaultdict
@@ -7,12 +6,9 @@ from uuid import uuid4
 
 import arrow
 
-from flask import current_app
 from rdflib import Graph, URIRef, Literal
 from rdflib.resource import Resource
 from rdflib.namespace import RDF
-from rdflib.plugins.sparql.algebra import translateUpdate
-from rdflib.plugins.sparql.parser import parseUpdate
 
 from lakesuperior.env import env
 from lakesuperior.globals import (
@@ -32,7 +28,7 @@ logger = logging.getLogger(__name__)
 
 
 class Ldpr(metaclass=ABCMeta):
-    '''LDPR (LDP Resource).
+    """LDPR (LDP Resource).
 
     Definition: https://www.w3.org/TR/ldp/#ldpr-resource
 
@@ -55,7 +51,7 @@ class Ldpr(metaclass=ABCMeta):
 
     The data passed to the store layout for processing should be in a graph.
     All conversion from request payload strings is done here.
-    '''
+    """
 
     EMBED_CHILD_RES_URI = nsc['fcrepo'].EmbedResources
     FCREPO_PTREE_TYPE = nsc['fcrepo'].Pairtree
@@ -97,21 +93,31 @@ class Ldpr(metaclass=ABCMeta):
     }
 
 
+    # Predicates to remove when a resource is replaced.
+    delete_preds_on_replace = {
+        nsc['ebucore'].hasMimeType,
+        nsc['fcrepo'].lastModified,
+        nsc['fcrepo'].lastModifiedBy,
+        nsc['premis'].hasSize,
+        nsc['premis'].hasMessageDigest,
+    }
+
+
     ## MAGIC METHODS ##
 
     def __init__(self, uid, repr_opts={}, provided_imr=None, **kwargs):
-        '''Instantiate an in-memory LDP resource that can be loaded from and
+        """Instantiate an in-memory LDP resource that can be loaded from and
         persisted to storage.
 
-        @param uid (string) uid of the resource. If None (must be explicitly
+        :param str uid: uid of the resource. If None (must be explicitly
         set) it refers to the root node. It can also be the full URI or URN,
         in which case it will be converted.
-        @param repr_opts (dict) Options used to retrieve the IMR. See
+        :param dict repr_opts: Options used to retrieve the IMR. See
         `parse_rfc7240` for format details.
-        @Param provd_rdf (string) RDF data provided by the client in
+        :param provided_imr: RDF data provided by the client in
         operations such as `PUT` or `POST`, serialized as a string. This sets
         the `provided_imr` property.
-        '''
+        """
         self.uid = (
             rdfly.uri_to_uid(uid) if isinstance(uid, URIRef) else uid)
         self.uri = nsc['fcres'][uid]
@@ -121,15 +127,17 @@ class Ldpr(metaclass=ABCMeta):
 
         self.provided_imr = provided_imr
 
+        # Disable all internal checks e.g. for raw I/O.
+
 
     @property
     def rsrc(self):
-        '''
+        """
         The RDFLib resource representing this LDPR. This is a live
         representation of the stored data if present.
 
         @return rdflib.resource.Resource
-        '''
+        """
         if not hasattr(self, '_rsrc'):
             self._rsrc = rdfly.ds.resource(self.uri)
 
@@ -138,14 +146,14 @@
 
     @property
     def imr(self):
-        '''
+        """
         Extract an in-memory resource from the graph store.
 
         If the resource is not stored (yet), a `ResourceNotExistsError` is
         raised.
 
         @return rdflib.resource.Resource
-        '''
+        """
         if not hasattr(self, '_imr'):
             if hasattr(self, '_imr_options'):
                 logger.debug(
@@ -163,12 +171,12 @@
 
     @imr.setter
     def imr(self, v):
-        '''
+        """
         Replace in-memory buffered resource.
 
-        @param v (set | rdflib.Graph) New set of triples to populate the IMR
-        with.
-        '''
+        :param v: New set of triples to populate the IMR with.
+        :type v: set or rdflib.Graph
+        """
         if isinstance(v, Resource):
             v = v.graph
         self._imr = Resource(Graph(), self.uri)
@@ -178,17 +186,17 @@
 
     @imr.deleter
     def imr(self):
-        '''
+        """
         Delete in-memory buffered resource.
-        '''
+        """
         delattr(self, '_imr')
 
 
     @property
     def metadata(self):
-        '''
+        """
         Get resource metadata.
-        '''
+        """
         if not hasattr(self, '_metadata'):
             if hasattr(self, '_imr'):
                 logger.info('Metadata is IMR.')
@@ -203,59 +211,32 @@
 
     @metadata.setter
     def metadata(self, rsrc):
-        '''
+        """
         Set resource metadata.
-        '''
+        """
         if not isinstance(rsrc, Resource):
             raise TypeError('Provided metadata is not a Resource object.')
         self._metadata = rsrc
 
 
-    @property
-    def stored_or_new_imr(self):
-        '''
-        Extract an in-memory resource for harmless manipulation and output.
-
-        If the resource is not stored (yet), initialize a new IMR with basic
-        triples.
-
-        @return rdflib.resource.Resource
-        '''
-        if not hasattr(self, '_imr'):
-            if hasattr(self, '_imr_options'):
-                #logger.debug('IMR options:{}'.format(self._imr_options))
-                imr_options = self._imr_options
-            else:
-                imr_options = {}
-            options = dict(imr_options, strict=True)
-            try:
-                self._imr = rdfly.extract_imr(self.uid, **options)
-            except ResourceNotExistsError:
-                self._imr = Resource(Graph(), self.uri)
-                for t in self.base_types:
-                    self.imr.add(RDF.type, t)
-
-        return self._imr
-
-
     @property
     def out_graph(self):
-        '''
+        """
         Return a graph of the resource's IMR formatted for output.
-        '''
+        """
         out_gr = Graph(identifier=self.uri)
 
         for t in self.imr.graph:
             if (
                 # Exclude digest hash and version information.
                 t[1] not in {
-                    nsc['premis'].hasMessageDigest,
+                    #nsc['premis'].hasMessageDigest,
                     nsc['fcrepo'].hasVersion,
                 }
             ) and (
                 # Only include server managed triples if requested.
-                self._imr_options.get('incl_srv_mgd', True)
-                or not self._is_trp_managed(t)
+                self._imr_options.get('incl_srv_mgd', True) or
+                not self._is_trp_managed(t)
             ):
                 out_gr.add(t)
 
@@ -264,9 +245,9 @@
 
     @property
     def version_info(self):
-        '''
+        """
         Return version metadata (`fcr:versions`).
-        '''
+        """
         if not hasattr(self, '_version_info'):
             try:
                 #@ TODO get_version_info should return a graph.
@@ -279,12 +260,12 @@
 
     @property
     def version_uids(self):
-        '''
+        """
         Return a generator of version UIDs (relative to their parent resource).
-        '''
+        """
         gen = self.version_info[
-                self.uri :
-                nsc['fcrepo'].hasVersion / nsc['fcrepo'].hasVersionLabel :]
+            self.uri:
+            nsc['fcrepo'].hasVersion / nsc['fcrepo'].hasVersionLabel:]
 
         return {str(uid) for uid in gen}
 
@@ -302,10 +283,10 @@
 
     @property
     def types(self):
-        '''All RDF types.
+        """All RDF types.
 
         @return set(rdflib.term.URIRef)
-        '''
+        """
         if not hasattr(self, '_types'):
             if len(self.metadata.graph):
                 metadata = self.metadata
@@ -322,10 +303,10 @@
 
     @property
     def ldp_types(self):
-        '''The LDP types.
+        """The LDP types.
 
         @return set(rdflib.term.URIRef)
-        '''
+        """
         if not hasattr(self, '_ldp_types'):
             self._ldp_types = {t for t in self.types if nsc['ldp'] in t}
 
@@ -335,9 +316,9 @@
     ## LDP METHODS ##
 
     def head(self):
-        '''
+        """
         Return values for the headers.
-        '''
+        """
         out_headers = defaultdict(list)
 
         digest = self.metadata.value(nsc['premis'].hasMessageDigest)
@@ -358,22 +339,22 @@
 
 
     def get_version(self, ver_uid, **kwargs):
-        '''
+        """
         Get a version by label.
-        '''
+        """
         return rdfly.extract_imr(self.uid, ver_uid, **kwargs).graph
 
 
     def create_or_replace(self, create_only=False):
-        '''
+        """
         Create or update a resource. PUT and POST methods, which are almost
         identical, are wrappers for this method.
 
-        @param create_only (boolean) Whether this is a create-only operation.
-        '''
+        :param boolean create_only: Whether this is a create-only operation.
+        """
         create = create_only or not self.is_stored
-        ev_type = RES_CREATED if create else RES_UPDATED
 
+        ev_type = RES_CREATED if create else RES_UPDATED
         self._add_srv_mgd_triples(create)
         ref_int = rdfly.config['referential_integrity']
         if ref_int:
@@ -384,10 +365,10 @@
             rdfly.truncate_rsrc(self.uid)
 
         remove_trp = {
-            (self.uri, nsc['fcrepo'].lastModified, None),
-            (self.uri, nsc['fcrepo'].lastModifiedBy, None),
-        }
-        add_trp = set(self.provided_imr.graph) | self._containment_rel(create)
+            (self.uri, pred, None) for pred in self.delete_preds_on_replace}
+        add_trp = (
+            set(self.provided_imr.graph) |
+            self._containment_rel(create))
 
         self._modify_rsrc(ev_type, remove_trp, add_trp)
         new_gr = Graph()
@@ -399,33 +380,15 @@
         return ev_type
 
 
-    def put(self):
-        '''
-        https://www.w3.org/TR/ldp/#ldpr-HTTP_PUT
-        '''
-        return self.create_or_replace()
-
-
-    def patch(self, update_str):
-        '''
-        Update an existing resource by applying a SPARQL-UPDATE query.
-
-        @param update_str (string) SPARQL-Update staements.
-        '''
-        self.handling = 'lenient' # FCREPO does that and Hyrax requires it.
-
-        return self._sparql_update(update_str)
-
-
     def bury_rsrc(self, inbound, tstone_pointer=None):
-        '''
+        """
         Delete a single resource and create a tombstone.
 
-        @param inbound (boolean) Whether to delete the inbound relationships.
-        @param tstone_pointer (URIRef) If set to a URN, this creates a pointer
+        :param boolean inbound: Whether to delete the inbound relationships.
+        :param URIRef tstone_pointer: If set to a URN, this creates a pointer
         to the tombstone of the resource that used to contain the deleted
         resource. Otherwise the deleted resource becomes a tombstone.
-        '''
+        """
         logger.info('Burying resource {}'.format(self.uid))
         # Create a backup snapshot for resurrection purposes.
         self.create_rsrc_snapshot(uuid4())
@@ -457,9 +420,9 @@ class Ldpr(metaclass=ABCMeta):
 
 
 
 
     def forget_rsrc(self, inbound=True):
     def forget_rsrc(self, inbound=True):
-        '''
+        """
         Remove all traces of a resource and versions.
         Remove all traces of a resource and versions.
-        '''
+        """
         logger.info('Purging resource {}'.format(self.uid))
         logger.info('Purging resource {}'.format(self.uid))
         refint = env.config['store']['ldp_rs']['referential_integrity']
         refint = env.config['store']['ldp_rs']['referential_integrity']
         inbound = True if refint else inbound
         inbound = True if refint else inbound
@@ -470,9 +433,9 @@ class Ldpr(metaclass=ABCMeta):
 
 
 
 
     def create_rsrc_snapshot(self, ver_uid):
     def create_rsrc_snapshot(self, ver_uid):
-        '''
+        """
         Perform version creation and return the version UID.
         Perform version creation and return the version UID.
-        '''
+        """
         # Create version resource from copying the current state.
         # Create version resource from copying the current state.
         logger.info(
         logger.info(
             'Creating version snapshot {} for resource {}.'.format(
             'Creating version snapshot {} for resource {}.'.format(
@@ -510,27 +473,27 @@ class Ldpr(metaclass=ABCMeta):
             (self.uri, nsc['fcrepo'].hasVersion, ver_uri),
             (self.uri, nsc['fcrepo'].hasVersion, ver_uri),
             (self.uri, nsc['fcrepo'].hasVersions, nsc['fcres'][vers_uid]),
             (self.uri, nsc['fcrepo'].hasVersions, nsc['fcres'][vers_uid]),
         }
         }
-        self._modify_rsrc(RES_UPDATED, add_trp=rsrc_add_gr, notify=False)
+        self._modify_rsrc(RES_UPDATED, add_trp=rsrc_add_gr)
 
 
         return ver_uid
         return ver_uid
 
 
 
 
     def resurrect_rsrc(self):
     def resurrect_rsrc(self):
-        '''
+        """
         Resurrect a resource from a tombstone.
         Resurrect a resource from a tombstone.
 
 
         @EXPERIMENTAL
         @EXPERIMENTAL
-        '''
+        """
         tstone_trp = set(rdfly.extract_imr(self.uid, strict=False).graph)
         tstone_trp = set(rdfly.extract_imr(self.uid, strict=False).graph)
 
 
-        ver_rsp = self.version_info.graph.query('''
+        ver_rsp = self.version_info.graph.query("""
         SELECT ?uid {
         SELECT ?uid {
           ?latest fcrepo:hasVersionLabel ?uid ;
           ?latest fcrepo:hasVersionLabel ?uid ;
             fcrepo:created ?ts .
             fcrepo:created ?ts .
         }
         }
         ORDER BY DESC(?ts)
         ORDER BY DESC(?ts)
         LIMIT 1
         LIMIT 1
-        ''')
+        """)
         ver_uid = str(ver_rsp.bindings[0]['uid'])
         ver_uid = str(ver_rsp.bindings[0]['uid'])
         ver_trp = set(rdfly.get_metadata(self.uid, ver_uid).graph)
         ver_trp = set(rdfly.get_metadata(self.uid, ver_uid).graph)
 
 
@@ -541,9 +504,9 @@ class Ldpr(metaclass=ABCMeta):
             }:
             }:
                 laz_gr.add((self.uri, t[1], t[2]))
                 laz_gr.add((self.uri, t[1], t[2]))
         laz_gr.add((self.uri, RDF.type, nsc['fcrepo'].Resource))
         laz_gr.add((self.uri, RDF.type, nsc['fcrepo'].Resource))
-        if nsc['ldp'].NonRdfSource in laz_gr[: RDF.type :]:
+        if nsc['ldp'].NonRdfSource in laz_gr[:RDF.type:]:
             laz_gr.add((self.uri, RDF.type, nsc['fcrepo'].Binary))
             laz_gr.add((self.uri, RDF.type, nsc['fcrepo'].Binary))
-        elif nsc['ldp'].Container in laz_gr[: RDF.type :]:
+        elif nsc['ldp'].Container in laz_gr[:RDF.type:]:
             laz_gr.add((self.uri, RDF.type, nsc['fcrepo'].Container))
             laz_gr.add((self.uri, RDF.type, nsc['fcrepo'].Container))
 
 
         laz_set = set(laz_gr) | self._containment_rel()
         laz_set = set(laz_gr) | self._containment_rel()
@@ -554,16 +517,16 @@ class Ldpr(metaclass=ABCMeta):
 
 
 
 
     def create_version(self, ver_uid=None):
     def create_version(self, ver_uid=None):
-        '''
+        """
         Create a new version of the resource.
         Create a new version of the resource.
 
 
         NOTE: This creates an event only for the resource being updated (due
         NOTE: This creates an event only for the resource being updated (due
         to the added `hasVersion` triple and possibly to the `hasVersions` one)
         to the added `hasVersion` triple and possibly to the `hasVersions` one)
         but not for the version being created.
         but not for the version being created.
 
 
-        @param ver_uid Version ver_uid. If already existing, an exception is
+        :param ver_uid: Version UID. If not provided or already existing, a
+        new random UID is minted instead; no exception is
         raised.
         raised.
-        '''
+        """
         if not ver_uid or ver_uid in self.version_uids:
         if not ver_uid or ver_uid in self.version_uids:
             ver_uid = str(uuid4())
             ver_uid = str(uuid4())
 
 
@@ -571,13 +534,13 @@ class Ldpr(metaclass=ABCMeta):
 
 
 
 
     def revert_to_version(self, ver_uid, backup=True):
     def revert_to_version(self, ver_uid, backup=True):
-        '''
+        """
         Revert to a previous version.
         Revert to a previous version.
 
 
-        @param ver_uid (string) Version UID.
-        @param backup (boolean) Whether to create a backup snapshot. Default is
+        :param str ver_uid: Version UID.
+        :param boolean backup: Whether to create a backup snapshot. Default is
         true.
         true.
-        '''
+        """
         # Create a backup snapshot.
         # Create a backup snapshot.
         if backup:
         if backup:
             self.create_version()
             self.create_version()
@@ -598,45 +561,50 @@ class Ldpr(metaclass=ABCMeta):
     ## PROTECTED METHODS ##
     ## PROTECTED METHODS ##
 
 
     def _is_trp_managed(self, t):
     def _is_trp_managed(self, t):
-        '''
+        """
         Whether a triple is server-managed.
         Whether a triple is server-managed.
 
 
-        @return boolean
-        '''
+        :param tuple t: Triple as a 3-tuple of terms.
+
+        :rtype: boolean
+        """
         return t[1] in srv_mgd_predicates or (
         return t[1] in srv_mgd_predicates or (
             t[1] == RDF.type and t[2] in srv_mgd_types)
             t[1] == RDF.type and t[2] in srv_mgd_types)
 
 
 
 
     def _modify_rsrc(
     def _modify_rsrc(
-            self, ev_type, remove_trp=set(), add_trp=set(), notify=True):
-        '''
+            self, ev_type, remove_trp=set(), add_trp=set()):
+        """
         Low-level method to modify a graph for a single resource.
         Low-level method to modify a graph for a single resource.
 
 
         This is a crucial point for messaging. Any write operation on the RDF
         This is a crucial point for messaging. Any write operation on the RDF
         store that needs to be notified should be performed by invoking this
         store that needs to be notified should be performed by invoking this
         method.
         method.
 
 
-        @param ev_type (string) The type of event (create, update, delete).
-        @param remove_trp (set) Triples to be removed.
-        @param add_trp (set) Triples to be added.
-        @param notify (boolean) Whether to send a message about the change.
-        '''
+        :param ev_type: The type of event (create, update,
+        delete) or None. In the latter case, no notification is sent.
+        :type ev_type: str or None
+        :param set remove_trp: Triples to be removed.
+        :param set add_trp: Triples to be added.
+        """
         rdfly.modify_rsrc(self.uid, remove_trp, add_trp)
         rdfly.modify_rsrc(self.uid, remove_trp, add_trp)
 
 
-        if notify and env.config['application'].get('messaging'):
+        if (
+                ev_type is not None and
+                env.config['application'].get('messaging')):
             logger.debug('Enqueuing message for {}'.format(self.uid))
             logger.debug('Enqueuing message for {}'.format(self.uid))
             self._enqueue_msg(ev_type, remove_trp, add_trp)
             self._enqueue_msg(ev_type, remove_trp, add_trp)
 
 
 
 
     def _enqueue_msg(self, ev_type, remove_trp=None, add_trp=None):
     def _enqueue_msg(self, ev_type, remove_trp=None, add_trp=None):
-        '''
+        """
         Compose a message about a resource change.
         Compose a message about a resource change.
 
 
         The message is enqueued for asynchronous processing.
         The message is enqueued for asynchronous processing.
 
 
-        @param ev_type (string) The event type. See global constants.
-        @param remove_trp (set) Triples removed. Only used if the 
-        '''
+        :param str ev_type: The event type. See global constants.
+        :param set remove_trp: Triples removed. Only used if the
+        """
         try:
         try:
             rsrc_type = tuple(str(t) for t in self.types)
             rsrc_type = tuple(str(t) for t in self.types)
             actor = self.metadata.value(nsc['fcrepo'].createdBy)
             actor = self.metadata.value(nsc['fcrepo'].createdBy)
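
Folding the old `notify` flag into `ev_type` means the event type argument now doubles as the messaging switch: passing `None` suppresses the message. A minimal sketch of the two call styles, using names from the diff:

    # Sketch of the new calling convention (names from the diff):
    self._modify_rsrc(RES_UPDATED, remove_trp, add_trp)   # message enqueued
    self._modify_rsrc(None, add_trp=extra_trp)            # silent write
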
@@ -674,11 +642,11 @@ class Ldpr(metaclass=ABCMeta):
 
 
 
 
     def _check_mgd_terms(self, gr):
     def _check_mgd_terms(self, gr):
-        '''
+        """
         Check whether server-managed terms are in an RDF payload.
         Check whether server-managed terms are in an RDF payload.
 
 
-        @param gr (rdflib.Graph) The graph to validate.
-        '''
+        :param rdflib.Graph gr: The graph to validate.
+        """
         offending_subjects = set(gr.subjects()) & srv_mgd_subjects
         offending_subjects = set(gr.subjects()) & srv_mgd_subjects
         if offending_subjects:
         if offending_subjects:
             if self.handling == 'strict':
             if self.handling == 'strict':
@@ -715,11 +683,11 @@ class Ldpr(metaclass=ABCMeta):
 
 
 
 
     def _add_srv_mgd_triples(self, create=False):
     def _add_srv_mgd_triples(self, create=False):
-        '''
+        """
         Add server-managed triples to a provided IMR.
         Add server-managed triples to a provided IMR.
 
 
-        @param create (boolean) Whether the resource is being created.
-        '''
+        :param bool create: Whether the resource is being created.
+        """
         # Base LDP types.
         # Base LDP types.
         for t in self.base_types:
         for t in self.base_types:
             self.provided_imr.add(RDF.type, t)
             self.provided_imr.add(RDF.type, t)
@@ -747,7 +715,7 @@ class Ldpr(metaclass=ABCMeta):
 
 
 
 
     def _containment_rel(self, create):
     def _containment_rel(self, create):
-        '''Find the closest parent in the path indicated by the uid and
+        """Find the closest parent in the path indicated by the uid and
         establish a containment triple.
         establish a containment triple.
 
 
         Check the path-wise parent of the new resource. If it exists, add the
         Check the path-wise parent of the new resource. If it exists, add the
@@ -763,9 +731,9 @@ class Ldpr(metaclass=ABCMeta):
         - If fcres:/e is being created, the root node becomes container of
         - If fcres:/e is being created, the root node becomes container of
           fcres:/e.
           fcres:/e.
 
 
-        @param create (bool) Whether the resource is being created. If false,
+        :param bool create: Whether the resource is being created. If false,
         the parent container is not updated.
         the parent container is not updated.
-        '''
+        """
         from lakesuperior.model.ldp_factory import LdpFactory
         from lakesuperior.model.ldp_factory import LdpFactory
 
 
         if '/' in self.uid.lstrip('/'):
         if '/' in self.uid.lstrip('/'):
@@ -789,13 +757,13 @@ class Ldpr(metaclass=ABCMeta):
             parent_uid = ROOT_UID
             parent_uid = ROOT_UID
 
 
         parent_rsrc = LdpFactory.from_stored(
         parent_rsrc = LdpFactory.from_stored(
-            parent_uid, repr_opts={'incl_children' : False}, handling='none')
+            parent_uid, repr_opts={'incl_children': False}, handling='none')
 
 
         # Only update parent if the resource is new.
         # Only update parent if the resource is new.
         if create:
         if create:
             add_gr = Graph()
             add_gr = Graph()
             add_gr.add(
             add_gr.add(
-                    (nsc['fcres'][parent_uid], nsc['ldp'].contains, self.uri))
+                (nsc['fcres'][parent_uid], nsc['ldp'].contains, self.uri))
             parent_rsrc._modify_rsrc(RES_UPDATED, add_trp=add_gr)
             parent_rsrc._modify_rsrc(RES_UPDATED, add_trp=add_gr)
 
 
         # Direct or indirect container relationship.
         # Direct or indirect container relationship.
@@ -803,13 +771,13 @@ class Ldpr(metaclass=ABCMeta):
 
 
 
 
     def _dedup_deltas(self, remove_gr, add_gr):
     def _dedup_deltas(self, remove_gr, add_gr):
-        '''
+        """
         Remove duplicate triples from add and remove delta graphs, which would
         Remove duplicate triples from add and remove delta graphs, which would
         otherwise contain unnecessary statements that annul each other.
         otherwise contain unnecessary statements that annul each other.
 
 
         @return tuple 2 "clean" sets of remove statements and add statements,
         @return tuple 2 "clean" sets of remove statements and add statements,
         respectively.
         respectively.
-        '''
+        """
         return (
         return (
             remove_gr - add_gr,
             remove_gr - add_gr,
             add_gr - remove_gr
             add_gr - remove_gr
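
Because the deltas behave as sets, the dedup is just two set differences. A worked example, with plain sets standing in for the delta graphs:

    # Worked example, with plain sets standing in for the delta graphs:
    remove_gr = {('s', 'p', 'o1'), ('s', 'p', 'o2')}
    add_gr    = {('s', 'p', 'o2'), ('s', 'p', 'o3')}
    remove_gr - add_gr   # {('s', 'p', 'o1')}: the only real removal
    add_gr - remove_gr   # {('s', 'p', 'o3')}: the only real addition
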
@@ -817,11 +785,11 @@ class Ldpr(metaclass=ABCMeta):
 
 
 
 
     def _add_ldp_dc_ic_rel(self, cont_rsrc):
     def _add_ldp_dc_ic_rel(self, cont_rsrc):
-        '''
+        """
         Add relationship triples from a parent direct or indirect container.
         Add relationship triples from a parent direct or indirect container.
 
 
-        @param cont_rsrc (rdflib.resource.Resouce)  The container resource.
-        '''
+        :param rdflib.resource.Resource cont_rsrc: The container resource.
+        """
         cont_p = set(cont_rsrc.metadata.graph.predicates())
         cont_p = set(cont_rsrc.metadata.graph.predicates())
 
 
         logger.info('Checking direct or indirect containment.')
         logger.info('Checking direct or indirect containment.')
@@ -842,8 +810,9 @@ class Ldpr(metaclass=ABCMeta):
                 o = self.uri
                 o = self.uri
 
 
             elif (
             elif (
-                    cont_rsrc.metadata[RDF.type: nsc['ldp'].IndirectContainer]
-                    and self.INS_CNT_REL_URI in cont_p):
+                    cont_rsrc.metadata[
+                        RDF.type: nsc['ldp'].IndirectContainer] and
+                    self.INS_CNT_REL_URI in cont_p):
                 logger.info('Parent is an indirect container.')
                 logger.info('Parent is an indirect container.')
                 cont_rel_uri = cont_rsrc.metadata.value(
                 cont_rel_uri = cont_rsrc.metadata.value(
                     self.INS_CNT_REL_URI).identifier
                     self.INS_CNT_REL_URI).identifier
@@ -857,22 +826,21 @@ class Ldpr(metaclass=ABCMeta):
         return add_trp
         return add_trp
 
 
 
 
-    def _sparql_update(self, update_str, notify=True):
-        '''
+    def sparql_update(self, update_str):
+        """
         Apply a SPARQL update to a resource.
         Apply a SPARQL update to a resource.
 
 
-        @param update_str (string) SPARQL-Update string. All URIs are local.
-
-        @return 
-        '''
-        self.handling = 'lenient' # FCREPO does that and Hyrax requires it.
+        :param str update_str: SPARQL-Update string. All URIs are local.
+        """
+        # FCREPO does that and Hyrax requires it.
+        self.handling = 'lenient'
         delta = self._sparql_delta(update_str)
         delta = self._sparql_delta(update_str)
 
 
-        return self._modify_rsrc(RES_UPDATED, *delta, notify=notify)
+        self._modify_rsrc(RES_UPDATED, *delta)
 
 
 
 
     def _sparql_delta(self, q):
     def _sparql_delta(self, q):
-        '''
+        """
         Calculate the delta obtained by a SPARQL Update operation.
         Calculate the delta obtained by a SPARQL Update operation.
 
 
         This is a critical component of the SPARQL update process and does a
         This is a critical component of the SPARQL update process and does a
@@ -890,10 +858,11 @@ class Ldpr(metaclass=ABCMeta):
         modified. If a server-managed term is present in the query but does not
         modified. If a server-managed term is present in the query but does not
         cause any change in the updated resource, no error is raised.
         cause any change in the updated resource, no error is raised.
 
 
-        @return tuple(rdflib.Graph) Remove and add graphs. These can be used
-        with `BaseStoreLayout.update_resource` and/or recorded as separate
+        :rtype: tuple(rdflib.Graph)
+        :return: Remove and add graphs. These can be used
+        with ``BaseStoreLayout.update_resource`` and/or recorded as separate
         events in a provenance tracking system.
         events in a provenance tracking system.
-        '''
+        """
         logger.debug('Provided SPARQL query: {}'.format(q))
         logger.debug('Provided SPARQL query: {}'.format(q))
         pre_gr = self.imr.graph
         pre_gr = self.imr.graph
 
 

+ 33 - 27
lakesuperior/store/ldp_nr/default_layout.py

@@ -15,6 +15,36 @@ class DefaultLayout(BaseNonRdfLayout):
     '''
     '''
     Default file layout.
     Default file layout.
     '''
     '''
+    @staticmethod
+    def local_path(root, uuid, bl=4, bc=4):
+        '''
+        Generate the resource path splitting the resource checksum according to
+        configuration parameters.
+
+        @param root (string) Filesystem root of the binary store.
+        @param uuid (string) The resource UUID. This corresponds to the content
+        checksum.
+        @param bl (int) Pairtree branch length.
+        @param bc (int) Number of pairtree branches.
+        '''
+        logger.debug('Generating path from uuid: {}'.format(uuid))
+        term = len(uuid) if bc == 0 else min(bc * bl, len(uuid))
+
+        path = [uuid[i : i + bl] for i in range(0, term, bl)]
+
+        if bc > 0:
+            path.append(uuid[term :])
+        path.insert(0, root)
+
+        return '/'.join(path)
+
+
+    def __init__(self, *args, **kwargs):
+        '''
+        Set up path segmentation parameters.
+        '''
+        super().__init__(*args, **kwargs)
+
+        self.bl = self.config['pairtree_branch_length']
+        self.bc = self.config['pairtree_branches']
+
 
 
     ## INTERFACE METHODS ##
     ## INTERFACE METHODS ##
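
The pairtree split is easiest to see with a worked example. Assuming the default branch length of 4 and 4 branches, and the SHA-1 checksum of `b'foobar'`:

    # Worked example with the defaults bl=4, bc=4 and the SHA-1 of b'foobar':
    DefaultLayout.local_path(
        '/data/ldpnr', '8843d7f92416211de9ebb963ff4ce28125932878')
    # -> '/data/ldpnr/8843/d7f9/2416/211d/e9ebb963ff4ce28125932878'
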
 
 
@@ -60,11 +90,11 @@ class DefaultLayout(BaseNonRdfLayout):
             os.unlink(tmp_file)
             os.unlink(tmp_file)
             raise
             raise
         if size == 0:
         if size == 0:
-            logger.warn('Zero-file size received.')
+            logger.warn('Zero-length file received.')
 
 
         # Move temp file to final destination.
         # Move temp file to final destination.
         uuid = hash.hexdigest()
         uuid = hash.hexdigest()
-        dst = self.local_path(uuid)
+        dst = __class__.local_path(self.root, uuid, self.bl, self.bc)
         logger.debug('Saving file to disk: {}'.format(dst))
         logger.debug('Saving file to disk: {}'.format(dst))
         if not os.access(os.path.dirname(dst), os.X_OK):
         if not os.access(os.path.dirname(dst), os.X_OK):
             os.makedirs(os.path.dirname(dst))
             os.makedirs(os.path.dirname(dst))
@@ -84,28 +114,4 @@ class DefaultLayout(BaseNonRdfLayout):
         '''
         '''
         See BaseNonRdfLayout.delete.
         See BaseNonRdfLayout.delete.
         '''
         '''
-        os.unlink(self.local_path(uuid))
-
-
-    ## PROTECTED METHODS ##
-
-    def local_path(self, uuid):
-        '''
-        Generate the resource path splitting the resource checksum according to
-        configuration parameters.
-
-        @param uuid (string) The resource UUID. This corresponds to the content
-        checksum.
-        '''
-        logger.debug('Generating path from uuid: {}'.format(uuid))
-        bl = self.config['pairtree_branch_length']
-        bc = self.config['pairtree_branches']
-        term = len(uuid) if bc==0 else min(bc*bl, len(uuid))
-
-        path = [ uuid[i:i+bl] for i in range(0, term, bl) ]
-
-        if bc > 0:
-            path.append(uuid[term:])
-        path.insert(0, self.root)
-
-        return '/'.join(path)
+        os.unlink(__class__.local_path(self.root, uuid, self.bl, self.bc))

+ 24 - 0
lakesuperior/store/ldp_rs/lmdb_store.py

@@ -595,6 +595,30 @@ class LmdbStore(Store):
                 yield self._from_key(spok), contexts
                 yield self._from_key(spok), contexts
 
 
 
 
+    def all_terms(self, term_type):
+        """
+        Return all terms of a type (``s``, ``p``, or ``o``) in the store.
+
+        :param str term_type: one of ``s``, ``p`` or ``o``.
+
+        :rtype: Iterator(rdflib.term.Identifier)
+        :return: Iterator of all terms.
+        :raise ValueError: if the term type is not one of the expected values.
+        """
+        if term_type == 's':
+            idx_label = 's:po'
+        elif term_type == 'p':
+            idx_label = 'p:so'
+        elif term_type == 'o':
+            idx_label = 'o:sp'
+        else:
+            raise ValueError('Term type must be \'s\', \'p\' or \'o\'.')
+
+        with self.cur(idx_label) as cur:
+            for key in cur.iternext_nodup():
+                yield self._from_key(key)[0]
+
+
     def bind(self, prefix, namespace):
     def bind(self, prefix, namespace):
         '''
         '''
         Bind a prefix to a namespace.
         Bind a prefix to a namespace.
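
A hedged usage sketch for `all_terms()`; the store path and the read-only `TxnManager` call are assumptions, not confirmed API:

    from lakesuperior.store.ldp_rs.lmdb_store import LmdbStore, TxnManager

    store = LmdbStore('/path/to/ldprs_store')   # hypothetical location
    with TxnManager(store):                     # read-only txn (assumed)
        preds = set(store.all_terms('p'))       # every distinct predicate
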

+ 29 - 17
lakesuperior/store/ldp_rs/rsrc_centric_layout.py

@@ -2,6 +2,9 @@ import logging
 
 
 from collections import defaultdict
 from collections import defaultdict
 from itertools import chain
 from itertools import chain
+from string import Template
+
+import arrow
 
 
 from rdflib import Dataset, Graph, Literal, URIRef, plugin
 from rdflib import Dataset, Graph, Literal, URIRef, plugin
 from rdflib.namespace import RDF
 from rdflib.namespace import RDF
@@ -178,7 +181,8 @@ class RsrcCentricLayout:
         store.open()
         store.open()
         with TxnManager(store, True):
         with TxnManager(store, True):
             with open('data/bootstrap/rsrc_centric_layout.sparql', 'r') as f:
             with open('data/bootstrap/rsrc_centric_layout.sparql', 'r') as f:
-                self.ds.update(f.read())
+                data = Template(f.read())
+                self.ds.update(data.substitute(timestamp=arrow.utcnow()))
 
 
 
 
     def get_raw(self, uri, ctx=None):
     def get_raw(self, uri, ctx=None):
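
The bootstrap statements are now run through `string.Template`, so a `$timestamp` placeholder in the SPARQL file is filled in at load time. A minimal sketch of the mechanism (the statement itself is illustrative):

    from string import Template

    import arrow

    tpl = Template(
        '<info:fcres/> fcrepo:created "$timestamp"^^xsd:dateTime .')
    stmt = tpl.substitute(timestamp=arrow.utcnow())
    # `stmt` now carries the current UTC time in place of `$timestamp`.
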
@@ -491,7 +495,8 @@ class RsrcCentricLayout:
             # Add metadata.
             # Add metadata.
             meta_gr.set(
             meta_gr.set(
                     (gr_uri, nsc['foaf'].primaryTopic, nsc['fcres'][uid]))
                     (gr_uri, nsc['foaf'].primaryTopic, nsc['fcres'][uid]))
-            meta_gr.set((gr_uri, nsc['fcrepo'].created, env.timestamp_term))
+            ts = getattr(env, 'timestamp_term', Literal(arrow.utcnow()))
+            meta_gr.set((gr_uri, nsc['fcrepo'].created, ts))
             if historic:
             if historic:
                 # @FIXME Ugly reverse engineering.
                 # @FIXME Ugly reverse engineering.
                 ver_uid = uid.split(VERS_CONT_LABEL)[1].lstrip('/')
                 ver_uid = uid.split(VERS_CONT_LABEL)[1].lstrip('/')
@@ -529,21 +534,6 @@ class RsrcCentricLayout:
         return '{}/{}/{}'.format(uid, VERS_CONT_LABEL, ver_uid)
         return '{}/{}/{}'.format(uid, VERS_CONT_LABEL, ver_uid)
 
 
 
 
-    def clear_smt(self, uid):
-        '''
-        This is an ugly way to deal with lenient SPARQL update statements
-        that may insert server-managed triples into a user graph.
-
-        @TODO Deprecate when a solution to provide a sanitized SPARQL update
-        sring is found.
-        '''
-        gr = self.ds.graph(nsc['fcmain'][uid])
-        for p in srv_mgd_predicates:
-            gr.remove((None, p, None))
-        for t in srv_mgd_types:
-            gr.remove((None, RDF.type, t))
-
-
     def uri_to_uid(self, uri):
     def uri_to_uid(self, uri):
         '''
         '''
         Convert an internal URI to a UID.
         Convert an internal URI to a UID.
@@ -551,6 +541,28 @@ class RsrcCentricLayout:
         return str(uri).replace(nsc['fcres'], '')
         return str(uri).replace(nsc['fcres'], '')
 
 
 
 
+    def find_refint_violations(self):
+        """
+        Find all referential integrity violations.
+
+        This method looks for dangling relationships within a repository by
+        checking the objects of each triple; if the object is an in-repo
+        resource reference, and no resource with that URI is found in the
+        repo, that triple is reported.
+
+        :rtype: Iterator(tuple)
+        :return: Triples referencing a repository URI that is not a resource.
+        """
+        for obj in self.store.all_terms('o'):
+            if (
+                    isinstance(obj, URIRef)
+                    and str(obj).startswith(nsc['fcres'])
+                    and not self.ask_rsrc_exists(self.uri_to_uid(obj))):
+                print('Object not found: {}'.format(obj))
+                for trp in self.store.triples((None, None, obj)):
+                    yield trp
+
+
     ## PROTECTED MEMBERS ##
     ## PROTECTED MEMBERS ##
 
 
     def _check_rsrc_status(self, rsrc):
     def _check_rsrc_status(self, rsrc):
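
Note that `find_refint_violations()` yields its results lazily. A hedged usage sketch; `rdfly` and the transaction setup are assumed:

    # Hedged sketch; `rdfly` and the transaction setup are assumed.
    with TxnManager(rdfly.store):
        violations = list(rdfly.find_refint_violations())
    print('{} dangling references found.'.format(len(violations)))
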

+ 87 - 61
lsup-admin

@@ -1,19 +1,18 @@
 #!/usr/bin/env python
 #!/usr/bin/env python
 import click
 import click
+import click_log
 import json
 import json
+import logging
 import os
 import os
 import sys
 import sys
 
 
-import lakesuperior.env_setup
-
 from lakesuperior.api import admin as admin_api
 from lakesuperior.api import admin as admin_api
 from lakesuperior.config_parser import config
 from lakesuperior.config_parser import config
-from lakesuperior.globals import AppGlobals
 from lakesuperior.env import env
 from lakesuperior.env import env
 from lakesuperior.store.ldp_rs.lmdb_store import TxnManager
 from lakesuperior.store.ldp_rs.lmdb_store import TxnManager
 
 
-rdfly = env.app_globals.rdfly
-nonrdfly = env.app_globals.nonrdfly
+logger = logging.getLogger(__name__)
+click_log.basic_config(logger)
 
 
 
 
 @click.group()
 @click.group()
@@ -22,7 +21,7 @@ def admin():
 
 
 @click.command()
 @click.command()
 def bootstrap():
 def bootstrap():
-    '''
+    """
     Bootstrap binary and graph stores.
     Bootstrap binary and graph stores.
 
 
     This script will parse configuration files and initialize a filesystem and
     This script will parse configuration files and initialize a filesystem and
@@ -30,7 +29,12 @@ def bootstrap():
     It is used in test suites and on a first run.
     It is used in test suites and on a first run.
 
 
     Additional scaffolding files may be parsed to create initial contents.
     Additional scaffolding files may be parsed to create initial contents.
-    '''
+    """
+    import lakesuperior.env_setup
+
+    rdfly = env.app_globals.rdfly
+    nonrdfly = env.app_globals.nonrdfly
+
     click.echo(
     click.echo(
             click.style(
             click.style(
                 'WARNING: This operation will WIPE ALL YOUR DATA.\n',
                 'WARNING: This operation will WIPE ALL YOUR DATA.\n',
@@ -58,12 +62,12 @@ def bootstrap():
     '--human', '-h', is_flag=True, flag_value=True,
     '--human', '-h', is_flag=True, flag_value=True,
     help='Print a human-readable string. By default, JSON is printed.')
     help='Print a human-readable string. By default, JSON is printed.')
 def stats(human=False):
 def stats(human=False):
-    '''
+    """
     Print repository statistics.
     Print repository statistics.
 
 
     @param human (bool) Whether to output the data in human-readable
     @param human (bool) Whether to output the data in human-readable
     format.
     format.
-    '''
+    """
     stat_data = admin_api.stats()
     stat_data = admin_api.stats()
     if human:
     if human:
         click.echo(
         click.echo(
@@ -75,43 +79,46 @@ def stats(human=False):
 
 
 @click.command()
 @click.command()
 def check_fixity(uid):
 def check_fixity(uid):
-    '''
+    """
     [STUB] Check fixity of a resource.
     [STUB] Check fixity of a resource.
-    '''
+    """
     pass
     pass
 
 
 
 
+@click.option(
+    '--config-folder', '-c', default=None, help='Alternative configuration '
+    'folder to look up. If not set, the location set in the environment or '
+    'the default configuration is used.')
 @click.command()
 @click.command()
-def check_refint():
-    '''
-    [STUB] Check referential integrity.
+def check_refint(config_folder=None):
+    """
+    Check referential integrity.
 
 
     This command scans the graph store to verify that all references to
     This command scans the graph store to verify that all references to
     resources within the repository are effectively pointing to existing
     resources within the repository are effectively pointing to existing
-    resources. For repositories set up with the `referencial_integrity` option
+    resources. For repositories set up with the `referential_integrity` option
     (the default), this is a pre-condition for a consistent data set.
     (the default), this is a pre-condition for a consistent data set.
-    '''
-    pass
+
+    Note: this check is run regardless of whether the repository enforces
+    referential integrity.
+    """
+    check_results = admin_api.integrity_check(config_folder)
+    click.echo('Integrity check results:')
+    if len(check_results):
+        click.echo(click.style('Inconsistencies found!', fg='red', bold=True))
+        click.echo('Missing object in the following triples:')
+        for trp in check_results:
+            click.echo(' '.join([str(t) for t in trp[0]]))
+    else:
+        click.echo(click.style('Clean. ', fg='green', bold=True)
+                + 'No inconsistency found.')
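
A hypothetical invocation of the reworked command, with the output produced for a clean store:

    $ ./lsup-admin check_refint -c /path/to/alt/config
    Integrity check results:
    Clean. No inconsistency found.
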
 
 
 
 
 @click.command()
 @click.command()
 def cleanup():
 def cleanup():
-    '''
+    """
     [STUB] Clean up orphan database items.
     [STUB] Clean up orphan database items.
-    '''
-    pass
-
-
-@click.command()
-def copy():
-    '''
-    [STUB] Copy (backup) repository data.
-
-    This s a low-level copy, which backs up the data directories containing
-    graph and binary data. It may not even be a necessary command since to
-    back up the repository one just needs to copy the binary and metadata
-    folders.
-    '''
+    """
     pass
     pass
 
 
 
 
@@ -119,41 +126,60 @@ def copy():
 @click.argument('src')
 @click.argument('src')
 @click.argument('dest')
 @click.argument('dest')
 @click.option(
 @click.option(
-    '--binaries', '-b', show_default=True,
-    help='If set to `include`, full binaries are included in the dump. If '
-    'set to `truncate`, binaries are created as zero-byte files in the proper '
-    'folder structure. If set to `skip`, binaries are not exported. Data '
-    'folders are not created.')
-def dump(src, dest, binaries='include'):
-    '''
-    [STUB] Dump repository to disk.
-
-    Dump a Fedora 4 repository to disk. The Fedora repo can be
-    LAKEsuperior or another compatible implementation.
-    '''
-    pass
-
-
-@click.command()
-@click.argument('src')
-@click.argument('dest')
-def load(src, dest):
-    '''
-    [STUB] Load serialized repository data.
-
-    Load serialized data from a filesystem location into a Fedora repository.
-    The Fedora repo can be LAKEsuperior or another compatible implementation.
-    '''
-    pass
+    '--start', '-s', default='/', show_default=True,
+    help='Starting point for looking for resources in the repository.\n'
+    'The default `/` value starts at the root, i.e. migrates the whole '
+    'repository.')
+@click.option(
+    '--list-file', '-l', help='Path to a local file containing URIs to be '
+    'used as starting points, one per line. Use this alternatively to `-s`. '
+    'The URIs can be relative to the repository root (e.g. `/a/b/c`) or fully '
+    'qualified (e.g. `https://example.edu/fcrepo/rest/a/b/c`).')
+@click.option(
+    '--zero-binaries', '-z', is_flag=True,
+    help='If set, binaries are created as zero-byte files in the proper '
+    'folder structure rather than having their full content copied.')
+@click.option(
+    '--skip-errors', '-e', is_flag=True,
+    help='If set, when the application encounters an error while retrieving '
+    'a resource from the source repository, it will log the error rather than '
+    'quitting. Other exceptions caused by the application will terminate the '
+    'process as usual.')
+@click_log.simple_verbosity_option(logger)
+def migrate(src, dest, start, list_file, zero_binaries, skip_errors):
+    """
+    Migrate an LDP repository to LAKEsuperior.
+
+    This utility creates a fully functional LAKEsuperior repository from an
+    existing repository. The source repo can be LAKEsuperior or
+    another LDP-compatible implementation.
+
+    A folder will be created in the location indicated by ``dest``. If the
+    folder exists already, it will be deleted and recreated. The folder will be
+    populated with the RDF and binary data directories and a default
+    configuration directory. The new repository can be immediately started
+    from this location.
+    """
+    logger.info('Migrating {} into a new repository at {}.'.format(
+            src, dest))
+    entries = admin_api.migrate(
+            src, dest, start_pts=start, list_file=list_file,
+            zero_binaries=zero_binaries, skip_errors=skip_errors)
+    logger.info('Migrated {} resources.'.format(entries))
+    logger.info("""Migration complete. To start the new repository, from the
+    directory you launched this script run:
+
+    FCREPO_CONFIG_DIR="{}/etc" ./fcrepo
+
+    Make sure that the default port is not being used by another repository.
+    """.format(dest))
 
 
 
 
 admin.add_command(bootstrap)
 admin.add_command(bootstrap)
 admin.add_command(check_fixity)
 admin.add_command(check_fixity)
 admin.add_command(check_refint)
 admin.add_command(check_refint)
 admin.add_command(cleanup)
 admin.add_command(cleanup)
-admin.add_command(copy)
-admin.add_command(dump)
-admin.add_command(load)
+admin.add_command(migrate)
 admin.add_command(stats)
 admin.add_command(stats)
 
 
 if __name__ == '__main__':
 if __name__ == '__main__':
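
A hypothetical `migrate` invocation tying the options together (the source URL and destination path are examples only):

    $ ./lsup-admin migrate http://example.edu/fcrepo/rest /tmp/lsup_repo \
          --start / --zero-binaries

Per the docstring, `/tmp/lsup_repo` is created (or wiped and recreated) and the new repository can then be started with `FCREPO_CONFIG_DIR="/tmp/lsup_repo/etc" ./fcrepo`.
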

+ 1 - 0
requirements.txt

@@ -5,6 +5,7 @@ Pillow==4.3.0
 PyYAML==3.12
 PyYAML==3.12
 arrow==0.10.0
 arrow==0.10.0
 click==6.7
 click==6.7
+click-log==0.2.1
 gevent==1.2.2
 gevent==1.2.2
 gunicorn==19.7.1
 gunicorn==19.7.1
 lmdb==0.93
 lmdb==0.93

+ 26 - 38
util/benchmark.py

@@ -7,10 +7,11 @@ from uuid import uuid4
 import arrow
 import arrow
 import requests
 import requests
 
 
-from rdflib import Graph, URIRef, Literal
-
-from util.generators import random_utf8_string
+from util.generators import random_image, random_graph, random_utf8_string
 
 
+__doc__ = '''
+Benchmark script to measure write performance.
+'''
 
 
 default_n = 10000
 default_n = 10000
 webroot = 'http://localhost:8000/ldp'
 webroot = 'http://localhost:8000/ldp'
@@ -31,7 +32,9 @@ if choice and choice.lower() not in ('post', 'put'):
     raise ValueError('Not a valid verb.')
     raise ValueError('Not a valid verb.')
 method = choice.lower() or 'put'
 method = choice.lower() or 'put'
 
 
-# Generate 10,000 children of root node.
+sys.stdout.write('RDF Sources (r), Non-RDF (n), or Both 50/50 (b)? [b] >')
+choice = input().lower()
+res_type = choice or 'b'
 
 
 if del_cont  == 'y':
 if del_cont  == 'y':
     requests.delete(container_uri, headers={'prefer': 'no-tombstone'})
     requests.delete(container_uri, headers={'prefer': 'no-tombstone'})
@@ -44,52 +47,37 @@ ckpt = start
 print('Inserting {} children.'.format(n))
 print('Inserting {} children.'.format(n))
 
 
 # URI used to establish an in-repo relationship.
 # URI used to establish an in-repo relationship.
-prev_uri = container_uri
-size = 50 # Size of graph to be multiplied by 4.
+ref = container_uri
+size = 200 # Size of graph.
 
 
 try:
 try:
-    for i in range(1, n):
+    for i in range(1, n + 1):
         url = '{}/{}'.format(container_uri, uuid4()) if method == 'put' \
         url = '{}/{}'.format(container_uri, uuid4()) if method == 'put' \
                 else container_uri
                 else container_uri
 
 
-        # Generate synthetic graph.
-        #print('generating graph: {}'.format(i))
-        g = Graph()
-        for ii in range(size):
-            g.add((
-                URIRef(''),
-                URIRef('urn:inturi_p:{}'.format(ii % size)),
-                URIRef(prev_uri)
-            ))
-            g.add((
-                URIRef(''),
-                URIRef('urn:lit_p:{}'.format(ii % size)),
-                Literal(random_utf8_string(64))
-            ))
-            g.add((
-                URIRef(''),
-                URIRef('urn:lit_p:{}'.format(ii % size)),
-                Literal(random_utf8_string(64))
-            ))
-            g.add((
-                URIRef(''),
-                URIRef('urn:exturi_p:{}'.format(ii % size)),
-                URIRef('http://exmple.edu/res/{}'.format(ii // 10))
-            ))
-
-        # Send request.
-        rsp = requests.request(
-                method, url, data=g.serialize(format='ttl'),
-                headers={ 'content-type': 'text/turtle'})
+        if res_type == 'r' or (res_type == 'b' and i % 2 == 0):
+            data = random_graph(size, ref).serialize(format='ttl')
+            headers = {'content-type': 'text/turtle'}
+        else:
+            img = random_image(name=uuid4(), ts=16, ims=512)
+            data = img['content']
+            data.seek(0)
+            headers = {
+                    'content-type': 'image/png',
+                    'content-disposition': 'attachment; filename="{}"'
+                        .format(uuid4())}
+
+        #import pdb; pdb.set_trace()
+        rsp = requests.request(method, url, data=data, headers=headers)
         rsp.raise_for_status()
         rsp.raise_for_status()
-        prev_uri = rsp.headers['location']
+        ref = rsp.headers['location']
         if i % 10 == 0:
         if i % 10 == 0:
             now = arrow.utcnow()
             now = arrow.utcnow()
             tdelta = now - ckpt
             tdelta = now - ckpt
             ckpt = now
             ckpt = now
             print('Record: {}\tTime elapsed: {}'.format(i, tdelta))
             print('Record: {}\tTime elapsed: {}'.format(i, tdelta))
 except KeyboardInterrupt:
 except KeyboardInterrupt:
-    print('Interruped after {} iterations.'.format(i))
+    print('Interrupted after {} iterations.'.format(i))
 
 
 tdelta = arrow.utcnow() - start
 tdelta = arrow.utcnow() - start
 print('Total elapsed time: {}'.format(tdelta))
 print('Total elapsed time: {}'.format(tdelta))
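
In the 50/50 mode the 1-based counter's parity picks the payload, so the first request sends a binary and the second a graph. A minimal illustration of the branch:

    # 50/50 mode ('b'): the 1-based counter's parity picks the payload type.
    for i in (1, 2, 3, 4):
        is_rdf = (i % 2 == 0)   # i=1: binary, i=2: RDF, i=3: binary, ...
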

+ 45 - 0
util/generators.py

@@ -2,11 +2,14 @@ import io
 import random
 import random
 
 
 from hashlib import sha1
 from hashlib import sha1
+from math import floor
 
 
 import requests
 import requests
 import numpy
 import numpy
 
 
 from PIL import Image
 from PIL import Image
+from rdflib import Graph, URIRef, Literal
+from rdflib.namespace import Namespace, NamespaceManager
 
 
 
 
 # @TODO Update this to include code point ranges to be sampled
 # @TODO Update this to include code point ranges to be sampled
@@ -51,3 +54,45 @@ def random_image(name, ts=8, ims=256):
     }
     }
 
 
 
 
+nsm = NamespaceManager(Graph())
+nsc = {
+    'extp': Namespace('http://ex.org/exturi_p#'),
+    'intp': Namespace('http://ex.org/inturi_p#'),
+    'litp': Namespace('http://ex.org/lit_p#'),
+}
+for pfx, ns in nsc.items():
+    nsm.bind(pfx, ns)
+
+def random_graph(size, ref):
+    '''
+    Generate a synthetic graph.
+
+    @param size (int) Size of the graph. It will be rounded down to a
+    multiple of 4.
+    '''
+    gr = Graph()
+    gr.namespace_manager = nsm
+    for ii in range(floor(size / 4)):
+        gr.add((
+            URIRef(''),
+            nsc['intp'][str(ii % size)],
+            URIRef(ref)
+        ))
+        gr.add((
+            URIRef(''),
+            nsc['litp'][str(ii % size)],
+            Literal(random_utf8_string(64))
+        ))
+        gr.add((
+            URIRef(''),
+            nsc['litp'][str(ii % size)],
+            Literal(random_utf8_string(64))
+        ))
+        gr.add((
+            URIRef(''),
+            nsc['extp'][str(ii % size)],
+            URIRef('http://example.edu/res/{}'.format(ii // 10))
+        ))
+
+    #print('Graph: {}'.format(gr.serialize(format='turtle').decode('utf-8')))
+    return gr
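
A hedged usage sketch of the new generator; the endpoint URI mirrors the benchmark's `webroot` and is otherwise arbitrary:

    from util.generators import random_graph

    gr = random_graph(200, 'http://localhost:8000/ldp')   # 200 triples
    data = gr.serialize(format='ttl')   # ready for a text/turtle PUT
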