Browse Source

Merge sphinx branch; resolve conflicts.

Stefano Cossu 7 years ago
parent
commit
cf2fc146df
60 changed files with 2279 additions and 1678 deletions
  1. 0 177
      README.md
  2. 64 0
      README.rst
  3. 0 43
      doc/notes/architecture.md
  4. 0 33
      doc/notes/cli.md
  5. 0 213
      doc/notes/fcrepo4_deltas.md
  6. 0 260
      doc/notes/indexing_strategy.md
  7. 0 27
      doc/notes/messaging.md
  8. 0 57
      doc/notes/migration.md
  9. 0 63
      doc/notes/model.md
  10. 0 112
      doc/notes/performance.md
  11. 0 82
      doc/notes/storage.md
  12. 22 0
      docs/Makefile
  13. 32 0
      docs/api.rst
  14. 46 0
      docs/apidoc/lakesuperior.model.rst
  15. 85 0
      docs/apidoc/lakesuperior.rst
  16. 7 0
      docs/apidoc/modules.rst
  17. 48 0
      docs/architecture.rst
  18. 0 0
      docs/assets/lakesuperior_arch.png
  19. 0 0
      docs/assets/lakesuperior_recommendations.pdf
  20. BIN
      docs/assets/profile_1K_children_get.pdf
  21. 33 0
      docs/cli.rst
  22. 178 0
      docs/conf.py
  23. 0 0
      docs/examples/store_layouts/graph_per_aspect.trig
  24. 0 0
      docs/examples/store_layouts/graph_per_resource+.trig
  25. 0 0
      docs/examples/store_layouts/graph_per_resource.trig
  26. 239 0
      docs/fcrepo4_deltas.rst
  27. 133 0
      docs/index.rst
  28. 311 0
      docs/indexing_strategy.rst
  29. 30 0
      docs/messaging.rst
  30. 65 0
      docs/migration.rst
  31. 66 0
      docs/model.rst
  32. 0 0
      docs/notes/TODO.historic
  33. 131 0
      docs/performance.rst
  34. 90 0
      docs/setup.rst
  35. 0 0
      docs/src/lakesuperior_arch.graphml
  36. 0 0
      docs/src/lakesuperior_content_model.graphml
  37. 0 0
      docs/src/lakesuperior_recommendations.md
  38. 0 0
      docs/src/template.latex
  39. 0 0
      docs/src/use_cases_transactions.md
  40. 94 0
      docs/storage.rst
  41. 9 8
      lakesuperior/api/query.py
  42. 65 62
      lakesuperior/api/resource.py
  43. 2 2
      lakesuperior/config_parser.py
  44. 8 8
      lakesuperior/endpoints/admin.py
  45. 41 41
      lakesuperior/endpoints/ldp.py
  46. 2 6
      lakesuperior/endpoints/main.py
  47. 5 5
      lakesuperior/endpoints/query.py
  48. 14 18
      lakesuperior/messaging/formatters.py
  49. 4 8
      lakesuperior/messaging/handlers.py
  50. 6 8
      lakesuperior/messaging/messenger.py
  51. 20 18
      lakesuperior/migrator.py
  52. 28 26
      lakesuperior/model/ldp_factory.py
  53. 14 13
      lakesuperior/model/ldp_nr.py
  54. 15 17
      lakesuperior/model/ldp_rs.py
  55. 44 44
      lakesuperior/model/ldpr.py
  56. 7 13
      lakesuperior/store/ldp_nr/base_non_rdf_layout.py
  57. 15 18
      lakesuperior/store/ldp_nr/default_layout.py
  58. 184 184
      lakesuperior/store/ldp_rs/lmdb_store.py
  59. 90 85
      lakesuperior/store/ldp_rs/rsrc_centric_layout.py
  60. 32 27
      lakesuperior/toolbox.py

+ 0 - 177
README.md

@@ -1,177 +0,0 @@
-# LAKEsuperior
-
-[![build status](
-  http://img.shields.io/travis/scossu/lakesuperior/master.svg?style=flat)](
- https://travis-ci.org/scossu/lakesuperior)
-
-LAKEsuperior is an alternative [Fedora Repository](http://fedorarepository.org)
-implementation.
-
-## Guiding Principles
-
-LAKEsuperior aims at being an uncomplicated, efficient Fedora 4 implementation.
-
-Its main goals are:
-
-- **Reliability:** Based on solid technologies with stability in mind.
-- **Efficiency:** Small memory and CPU footprint, high scalability.
-- **Ease of management:** Tools to perform monitoring and maintenance included.
-- **Simplicity of design:** Straight-forward architecture, robustness over
-  features.
-
-## Key features
-
-- Drop-in replacement for Fedora4 (with some
-  [caveats](doc/notes/fcrepo4_deltas.md)); currently being tested with Hyrax 2
-- Very stable persistence layer based on [LMDB](https://symas.com/lmdb/) and
-  filesystem. Fully ACID-compliant writes guarantee consistency of data.
-- Term-based search (*planned*) and SPARQL Query API + UI
-- No performance penalty for storing many resources under the same container; no
-  [kudzu](https://www.nature.org/ourinitiatives/urgentissues/land-conservation/forests/kudzu.xml)
-  pairtree segmentation <sup id="a1">[1](#f1)</sup>
-- Extensible [provenance metadata](doc/notes/model.md) tracking
-- [Multi-modal access](doc/notes/architecture.md#multi-modal-access): HTTP
-  (REST), command line interface and native Python API.
-- Fits in a pocket: you can carry 50M triples in an 8Gb memory stick.
-
-Implementation of the official [Fedora API specs](https://fedora.info/spec/)
-(Fedora 5.x and beyond) is not
-foreseen in the short term; however, it would be a natural evolution of this
-project if it gains support.
-
-Please make sure you read the [Delta document](doc/notes/fcrepo4_deltas.md) for
-divergences with the official Fedora4 implementation.
-
-## Target Audience
-
-LAKEsuperior is for anybody who cares about preserving data in the long term.
-
-Less vaguely, LAKEsuperior is targeted at anyone who needs to store large
-quantities of highly linked metadata and documents.
-
-Its Python/C environment and API make it particularly well suited for academic
-and scientific environments that can embed it in a Python application as a
-library or extend it via plug-ins.
-
-LAKEsuperior can be exposed to the Web as a
-[Linked Data Platform](https://www.w3.org/TR/ldp-primer/) server. It also acts
-as a SPARQL query (read-only) endpoint; however, it is not meant to be used as
-a full-fledged triplestore at the moment.
-
-In its current status, LAKEsuperior is aimed at developers and
-hands-on managers who are interested in evaluating this project.
-
-## Quick Install: Running in Docker
-
-Thanks to @mbklein for the Docker image and instructions.
-
-You can run LAKEsuperior in Docker for a hands-off quickstart.
-
-[Docker](http://docker.com/) is a containerization platform that allows you to run
-services in lightweight virtual machine environments without having to worry about
-installing all of the prerequisites on your host machine.
-
-1. Install the correct [Docker Community Edition](https://www.docker.com/community-edition)
-   for your operating system.
-1. Clone this repo: `git clone https://github.com/scossu/lakesuperior.git`
-1. `cd` into repo folder
-1. Run `docker-compose up`
-
-LAKEsuperior should now be available at `http://localhost:8000/`.
-
-The provided Docker configuration includes persistent storage as a self-contained
-Docker volume, meaning your data will persist between runs. If you want to clear the decks,
-simply run `docker-compose down -v`.
-
-## Manual Install (a bit less quick, a bit more power)
-
-**Note:** These instructions have been tested on Linux. They may work on Darwin
-with little modification, and possibly on Windows with some
-modifications. Feedback is welcome.
-
-### Dependencies
-
-1. Python 3.5 or greater.
-1. A message broker supporting the STOMP protocol. For testing and evaluation
-purposes, [CoilMQ](https://github.com/hozn/coilmq) is included with the
-dependencies and should be automatically installed.
-
-### Installation steps
-
-1. Create a virtualenv in a project folder:
-   `virtualenv -p <python 3.5+ exec path> <virtualenv folder>`
-1. Activate the virtualenv: `source <path_to_virtualenv>/bin/activate`
-1. Clone this repo: `git clone https://github.com/scossu/lakesuperior.git`
-1. `cd` into repo folder
-1. Install dependencies: `pip install -r requirements.txt`
-1. Start your STOMP broker, e.g.: `coilmq &`. If you have another queue manager
-   listening to port 61613 you can either configure a different port on the
-   application configuration, or use the existing message queue.
-1. Run `./lsup-admin bootstrap` to initialize the binary and graph stores
-1. Run `./fcrepo`.
-
-### Configuration
-
-The app should run for testing and evaluation purposes without any further
-configuration. All the application data are stored by default in the `data`
-directory.
-
-To change the default configuration you should:
-
-1. Copy the `etc.skeleton` folder to a separate location
-1. Set the configuration folder location in the environment:
-   `export FCREPO_CONFIG_DIR=<your config dir location>` (you can
-   add this line at the end of your virtualenv `activate` script)
-1. Configure the application
-1. Bootstrap the app or copy the original data folders to the new location if
-   any location options changed
-1. (Re)start the server: `./fcrepo`
-
-The configuration options are documented in the files.
-
-**Note:** `test.yml` must specify a different location for the graph and for
-the binary stores than the default one, otherwise running a test suite will
-destroy your main data store. The application will issue an error message and
-refuse to start if these locations overlap.
-
-### Production deployment
-
-If you like fried repositories for lunch, deploy before 11AM.
-
-## Status and development
-
-LAKEsuperior is in **alpha** status. Please see the
-[project issues](https://github.com/scossu/lakesuperior/issues) list for a
-rudimentary road map.
-
-## Contributing
-
-This has been so far a single person's off-hours project (with much input from
-several sides). In order to turn into anything close to a Beta release and
-eventually to a production-ready implementation, it needs some community love.
-
-Contributions are welcome in all forms, including ideas, issue reports, or
-even just spinning up the software and providing some feedback. LAKEsuperior is
-meant to live as a community project.
-
-## Technical documentation
-
-[Architecture Overview](doc/notes/architecture.md)
-
-[Content Model](doc/notes/model.md)
-
-[Messaging](doc/notes/messaging.md)
-
-[Migration, Backup & Restore](doc/notes/migration.md)
-
-[Command-Line Reference](doc/notes/cli.md)
-
-[Storage Implementation](doc/notes/storage.md)
-
-[Performance Benchmarks](doc/notes/performance.md)
-
----
-
-<b id="f1">1</b> However if your client splits pairtrees upstream, such as
-Hyrax does, that obviously needs to change to get rid of the path
-segments. [↩](#a1)

+ 64 - 0
README.rst

@@ -0,0 +1,64 @@
+LAKEsuperior
+============
+
+|build status|
+
+LAKEsuperior is an alternative `Fedora
+Repository <http://fedorarepository.org>`__ implementation.
+
+Documentation
+-------------
+
+The full documentation is maintained in `Read The Docs
+<http://lakesuperior.readthedocs.io/>`__. Please refer to that for more info.
+
+Installation
+------------
+
+The following instructions are aimed at a manual install using this git
+repository. For a hands-off install using Docker, see
+:doc:`the setup documentation <setup>`.
+
+Dependencies
+~~~~~~~~~~~~
+
+1. Python 3.5 or greater.
+2. A message broker supporting the STOMP protocol. For testing and
+   evaluation purposes, `CoilMQ <https://github.com/hozn/coilmq>`__ is
+   included with the dependencies and should be automatically installed.
+
+Installation steps
+~~~~~~~~~~~~~~~~~~
+
+1. Create a virtualenv in a project folder:
+   ``virtualenv -p <python 3.5+ exec path> <virtualenv folder>``
+2. Activate the virtualenv: ``source <path_to_virtualenv>/bin/activate``
+3. Clone this repo:
+   ``git clone https://github.com/scossu/lakesuperior.git``
+4. ``cd`` into repo folder
+5. Install dependencies: ``pip install -r requirements.txt``
+6. Start your STOMP broker, e.g.: ``coilmq &``. If you have another
+   queue manager listening to port 61613 you can either configure a
+   different port on the application configuration, or use the existing
+   message queue.
+7. Run ``./lsup-admin bootstrap`` to initialize the binary and graph
+   stores
+8. Run ``./fcrepo``.
+
+Contributing
+------------
+
+This has been so far a single person’s off-hours project (with much
+input from several sides). In order to turn into anything close to a
+Beta release and eventually to a production-ready implementation, it
+needs some community love.
+
+Contributions are welcome in all forms, including ideas, issue reports,
+or even just spinning up the software and providing some feedback.
+LAKEsuperior is meant to live as a community project.
+
+See :doc:`related document <contributing>` for further details on how to fork,
+improve, document and test the project.
+
+.. |build status| image:: http://img.shields.io/travis/scossu/lakesuperior/master.svg?style=flat
+   :target: https://travis-ci.org/scossu/lakesuperior

+ 0 - 43
doc/notes/architecture.md

@@ -1,43 +0,0 @@
-# LAKEsuperior Architecture
-
-LAKEsuperior is written in Python. It is not excluded that parts of the code
-may be rewritten in [Cython](http://cython.readthedocs.io/) for performance.
-
-
-## Multi-Modal Access
-
-LAKEsuperior services and data are accessible in multiple ways:
-
-- Via HTTP. This is the canonical way to interact with LDP resources and
-  conforms quite closely to the Fedora specs (currently v4).
-- Via command line. This method includes long-running admin tasks which are not
-  available via HTTP.
-- Via a Python API. This method allows the use of Python scripts to access the
-  same methods available to the two access modes above in a programmatic way.
-  It is possible to write Python plugins or even to embed LAKEsuperior in a
-  Python application, even without running a web server.
-
-
-## Architecture Overview
-
-![LAKEsuperior Architecture](../assets/lakesuperior_arch.png)
-
-The LAKEsuperior REST API provides access to the underlying Python API. All
-REST and CLI operations can be replicated by a Python program accessing this
-API.
-
-The main advantage of the Python API is that it makes it very easy to manipulate
-graph and binary data without the need to serialize or deserialize native data
-structures. This matters when handling large ETL jobs, for example.
-
-The Python API is divided in three main areas:
-
-- [Resource API](../../lakesuperior/api/resource.py). This API is in charge of
-  all the resource CRUD operations and implements the majority of the Fedora
-  specs.
-- [Admin API](../../lakesuperior/api/admin.py). This exposes utility methods,
-  mostly long-running maintenance jobs.
-- [Query API](../../lakesuperior/api/query.py). This provides several
-  facilities for querying repository data.
-
-

+ 0 - 33
doc/notes/cli.md

@@ -1,33 +0,0 @@
-# LAKEsuperior Command Line Reference
-
-The LAKEsuperior command line tool is used for maintenance and administration
-purposes.
-
-The script is invoked from the main install directory. The tool is
-self-documented, so this is just a redundant overview:
-
-```
-$ ./lsup_admin
-Usage: lsup-admin [OPTIONS] COMMAND [ARGS]...
-
-Options:
-  --help  Show this message and exit.
-
-  bootstrap     Bootstrap binary and graph stores.
-  check_fixity  [STUB] Check fixity of a resource.
-  check_refint  [STUB] Check referential integrity.
-  cleanup       [STUB] Clean up orphan database items.
-  copy          [STUB] Copy (backup) repository data.
-  dump          [STUB] Dump repository to disk.
-  load          [STUB] Load serialized repository data.
-  stats         Print repository statistics.
-
-```
-
-All entries marked `[STUB]` are not yet implemented, however the
-`lsup_admin <command> --help` command will issue a description of what the
-command is meant to do. Please see the [TODO](TODO) document for a rough road
-map.
-
-All of the above commands are also available via, and based upon, the native
-Python API.

+ 0 - 213
doc/notes/fcrepo4_deltas.md

@@ -1,213 +0,0 @@
-# Divergencies between lakesuperior and FCREPO4
-
-This is a (vastly incomplete) list of discrepancies between the current FCREPO4
-implementation and LAKEsuperior. More will be added as more clients use
-it.
-
-
-## Not yet implemented (but in the plans)
-
-- Various headers handling
-- Versioning (incomplete)
-- AuthN/Z
-- Fixity check
-- Blank nodes
-
-
-## Potentially breaking changes
-
-The following divergences may lead to incompatibilities with some clients.
-
-### Atomicity
-
-FCREPO4 supports batch atomic operations whereby a transaction can be opened
-and a number of operations (i.e. multiple R/W requests to the repository) can
-be performed. The operations are persisted in the repository only if and when
-the transaction is committed.
-
-LAKEsuperior only supports atomicity for a single HTTP request. I.e. a single
-HTTP request that should result in multiple write operations to the storage
-layer is only persisted if no exception is thrown. Otherwise, the operation is
-rolled back to prevent resources from being left in an inconsistent state.
-
-### Tombstone methods
-
-If a client requests a tombstone resource in
-FCREPO4 with a method other than DELETE, the server will return `405 Method Not
-Allowed` regardless of whether the tombstone exists or not.
-
-LAKEsuperior will return `405` only if the tombstone actually exists, `404`
-otherwise.
-
-### Web UI
-
-FCREPO4 includes a web UI for simple CRUD operations.
-
-Such a UI is not in the immediate LAKEsuperior development plans. However, a
-basic UI is available for read-only interaction: LDP resource browsing, SPARQL
-query and other search facilities, and administrative tools. Some of the latter
-*may* involve write operations, such as clean-up tasks.
-
-### Automatic path segment generation
-
-A `POST` request without a slug in FCREPO4 results in a pairtree consisting of
-several intermediate nodes leading to the automatically minted identifier. E.g.
-
-    POST /rest
-
-results in `/rest/8c/9a/07/4e/8c9a074e-dda3-5256-ea30-eec2dd4fcf61` being
-created.
-
-The same request in LAKEsuperior would create
-`/rest/8c9a074e-dda3-5256-ea30-eec2dd4fcf61` (obviously the identifiers will be
-different).
-
-This seems to break Hyrax at some point, but might have been fixed. This needs
-to be verified further.
-
-
-## Non-standard client breaking changes
-
-The following changes may be incompatible with clients relying on some FCREPO4
-behavior not endorsed by LDP or other specifications.
-
-### Pairtrees
-
-FCREPO4 generates "pairtree" resources if a resource is created in a path whose
-segments are missing. E.g. when creating `/a/b/c/d`, if `/a/b` and `/a/b/c` do
-not exist, FCREPO4 will create two Pairtree resources. POSTing and PUTting into
-Pairtrees is not allowed. Also, a containment triple is established between the
-closest LDPC and the created resource, e.g. if `a` exists, a `</a> ldp:contains
-</a/b/c/d>` triple is created.
-
-LAKEsuperior does not employ Pairtrees. In the example above LAKEsuperior would
-create a fully qualified LDPC for each missing segment, which can be POSTed and
-PUT to. Containment triples are created between each link in the path, i.e.
-`</a> ldp:contains </a/b>`, `</a/b> ldp:contains </a/b/c>` etc. This may
-potentially break clients relying on the direct containment model.
-
-The rationale behind this change is that Pairtrees are the byproduct of a
-limitation imposed by Modeshape and introduce complexity in the software stack
-and confusion for the client. LAKEsuperior aligns with the more intuitive UNIX
-filesystem model, where each segment of a path is a "folder" or container
-(except for the leaf nodes that can be either folders or files). In any
-case, clients are discouraged from generating deep paths in LAKEsuperior
-without a specific purpose because these resources create unnecessary data.
-
-### Non-mandatory, non-authoritative slug in version POST
-
-FCREPO4 requires a `Slug` header to POST to `fcr:versions` to create a new
-version.
-
-LAKEsuperior adheres to the more general FCREPO POST rule and if no slug is
-provided, an automatic ID is generated instead. The ID is a UUID4.
-
-Note that internally this ID is not called "label" but "uid" since it
-is treated as a fully qualified identifier. The `fcrepo:hasVersionLabel`
-predicate, however ambiguous in this context, will be kept until the adoption
-of Memento, which will change the retrieval mechanisms.
-
-Also, if a POST is issued on the same resource `fcr:versions` location using
-a version ID that already exists, LAKEsuperior will just mint a random
-identifier rather than returning an error.
-
-
-## Deprecation track
-
-LAKEsuperior offers some "legacy" options to replicate the FCREPO4 behavior,
-however encourages new development to use a different approach for some types
-of interaction.
-
-### Endpoints
-
-The FCREPO root endpoint is `/rest`. The LAKEsuperior root endpoint is `/ldp`.
-
-This should not pose a problem if a client does not have `rest` hard-coded in
-its code, but in any event, the `/rest` endpoint is provided for backwards
-compatibility.
-
-LAKEsuperior adds the (currently stub) `query` endpoint. Other endpoints for
-non-LDP services may be opened in the future.
-
-### Automatic LDP class assignment
-
-Since LAKEsuperior rejects client-provided server-managed triples, and since
-the LDP types are among them, the LDP container type is inferred from the
-provided properties: if the `ldp:hasMemberRelation` and
-`ldp:membershipResource` properties are provided, the resource is a Direct
-Container. If in addition to these the `ldp:insertedContentRelation` property
-is present, the resource is an Indirect Container. If any of the first two are
-missing, the resource is a Container (@TODO discuss: shall it be a Basic
-Container?)
-
-Clients are encouraged to omit LDP types in PUT, POST and PATCH requests.
-
-### Lenient handling
-
-FCREPO4 requires server-managed triples to be expressly indicated in a PUT
-request, unless the `Prefer` header is set to
-`handling=lenient; received="minimal"`, in which case the RDF payload must not
-have any server-managed triples.
-
-LAKEsuperior works under the assumption that clients should never provide
-server-managed triples. It automatically handles PUT requests sent to existing
-resources by returning a 412 if any server managed triples are included in the
-payload. This is the same as setting `Prefer` to `handling=strict`, which is
-the default.
-
-If `Prefer` is set to `handling=lenient`, all server-managed triples sent with
-the payload are ignored.
-
-Clients using the `Prefer` header to control PUT behavior as advertised by the
-specs should not notice any difference.
-
-
-## Optional improvements
-
-The following are improvements in performance or usability that can only be taken
-advantage of if client code is adjusted.
-
-### LDP-NR content and metadata
-
-FCREPO4 relies on the `/fcr:metadata` identifier to retrieve RDF metadata about
-an LDP-NR. LAKEsuperior supports this as a legacy option, but encourages the
-use of content negotiation to do the same while offering explicit endpoints
-for RDF and non-RDF content retrieval.
-
-Any request to an LDP-NR with an `Accept` header set to one of the supported
-RDF serialization formats will yield the RDF metadata of the resource instead
-of the binary contents.
-
-The `fcr:metadata` URI returns the RDF metadata of an LDP-NR.
-
-The `fcr:content` URI returns the non-RDF content.
-
-The two options above return an HTTP error if requested for an LDP-RS.
-
-### "Include" and "Omit" options for children
-
-LAKEsuperior offers an additional `Prefer` header option to exclude all
-references to child resources (i.e. by removing all the `ldp:contains` triples)
-while leaving the other server-managed triples when retrieving a resource:
-
-    Prefer: return=representation; [include | omit]="http://fedora.info/definitions/v4/repository#Children"
-
-The default behavior is to include all children URIs.
-
-### Soft-delete and purge
-
-**NOTE**: The implementation of this section is incomplete and debated.
-
-In FCREPO4 a deleted resource leaves a tombstone, and all traces of the
-previous resource are deleted.
-
-In LAKEsuperior, a normal DELETE creates a new version snapshot of the resource
-and puts a tombstone in its place. The resource versions are still available
-in the `fcr:versions` location. The resource can be "resurrected" by
-issuing a POST to its tombstone. This will result in a `201`.
-
-If a tombstone is deleted, the resource and its versions are completely deleted
-(purged).
-
-Moreover, setting the `Prefer:no-tombstone` header option on DELETE allows a
-resource and its versions to be deleted directly without leaving a tombstone.
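
As a quick illustration of the `Prefer` children option described in the notes above (and carried over into `docs/fcrepo4_deltas.rst` below), here is a minimal sketch using the `requests` library; the base URL and container path are placeholders for the example, not values taken from the project:

```python
# Hypothetical example: retrieve a container with and without its
# ldp:contains references, per the "Include" and "Omit" options above.
import requests

BASE = 'http://localhost:8000/ldp'   # assumed local endpoint
CHILDREN = 'http://fedora.info/definitions/v4/repository#Children'

# Default behavior: children URIs are included.
full = requests.get(BASE + '/my-container')

# Omit all ldp:contains triples while keeping other server-managed ones.
slim = requests.get(
    BASE + '/my-container',
    headers={
        'Prefer': 'return=representation; omit="{}"'.format(CHILDREN)})

print(len(full.text), len(slim.text))
```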

+ 0 - 260
doc/notes/indexing_strategy.md

@@ -1,260 +0,0 @@
-# LMDB Store design for RDFLib
-
-This is a log of subsequent strategies employed to store triples in LMDB.
-
-Strategy #5a is the one currently used. The rest is kept for historic reasons
-and academic curiosity (and also because it was too much work to just wipe out
-of memory).
-
-## Storage approach
-
-- Pickle quad and create MD5 or SHA1 hash.
-- Store triples in one database paired with key; store indices separately.
-
-Different strategies involve layout and number of databases.
-
-## Strategy #1
-
-- kq: key: serialized triple (1:1)
-- sk: Serialized subject: key (1:m)
-- pk: Serialized predicate: key (1:m)
-- ok: Serialized object: key (1:m)
-- (optional) lok: Serialized literal object: key (1:m)
-- (optional) tok: Serialized RDF type: key (1:m)
-- ck: Serialized context: key (1:m)
-
-### Retrieval approach
-
-To find all matches for a quad:
-
-- If all terms in the quad are bound, generate the key from the pickled
-quad and look up the triple in `kt`
-- If all terms are unbound, return an iterator of all values in `kt`.
-- If some values are bound and some unbound (most common query):
-  - Get a base list of keys associated with the first bound term
-  - For each subsequent bound term, check if each key associated with the term
-  matches a key in the base list
-  - Continue through all the bound terms. If a match is not found at any point,
-  continue to the next term
-  - If a match is found in all the bound term databases, look up the pickled quad
-  matching the key in `kq` and yield it
-
-More optimization can be introduced later, e.g. separating literal and RDF
-type objects in separate databases. Literals can have very long values and a
-database with a longer key setting may be useful. RDF terms can be indexed
-separately because they are the most common bound term.
-
-### Example lookup
-
-Keys and Triples (should actually be quads but this is a simplified version):
-
-A: s1 p1 o1
-B: s1 p2 o2
-C: s2 p3 o1
-D: s2 p3 o3
-
-Indices:
-
-- SK:
-  - s1: A, B
-  - s2: C, D
-- PK:
-  - p1: A
-  - p2: B
-  - p3: C, D
-- OK:
-  - o1: A, C
-  - o2: B
-  - o3: D
-
-Queries:
-
-- s1 ?p ?o → {A, B}
-- s1 p2 ?o → {A, B} & {B} = {B}
-- ?s ?p o3 → {D}
-- s1 p2 o5 → {} (Exit at OK: no term matches 'o5')
-- s2 p3 o2 → {C, D} & {C, D} & {B} = {}
-
-
-## Strategy #2
-
-Separate data and indices in two environments.
-
-### Main data store
-
-Key to quad; main keyspace; all unique.
-
-### Indices
-
-None of these databases is of critical preservation concern. They can be
-rebuilt from the main data store.
-
-All dupsort and dupfixed.
-
-@TODO The first three may not be needed if computing term hash is fast enough.
-
-- t2k (term to term key)
-- lt2k (literal to term key: longer keys)
-- k2t (term key to term)
-
-- s2k (subject key to quad key)
-- p2k (pred key to quad key)
-- o2k (object key to quad key)
-- c2k (context key to quad key)
-
-- sc2qk (subject + context keys to quad key)
-- po2qk (predicate + object keys to quad key)
-
-- sp2qk (subject + predicate keys to quad key)
-- oc2qk (object + context keys to quad key)
-
-- so2qk (subject + object keys to quad key)
-- pc2qk (predicate + context keys to quad key)
-
-
-## Strategy #3
-
-Contexts are much fewer (even in graph per aspect, 5-10 triples per graph)
-
-### Main data store
-
-Preservation-worthy data
-
-- tk:t (triple key: triple; dupsort, dupfixed)
-- tk:c (context key: triple; unique)
-
-### Indices
-
-Rebuildable from main data store
-
-- s2k (subject key: triple key)
-- p2k (pred key: triple key)
-- o2k (object key: triple key)
-- sp2k
-- so2k
-- po2k
-- spo2k
-
-### Lookup
-
-1. Look up triples by s, p, o, sp, so, po and get keys
-2. If a context is specified, for each key try to seek to (context, key) in ct
-   to verify it exists
-3. Intersect sets
-4. Match triple keys with data using kt
-
-#### Shortcuts
-
-- Get all contexts: return list of keys from ct
-- Get all triples for a context: get all values for a context from ct and match
-  triple data with kt
-- Get one triple match for all contexts: look up in triple indices and match
-  triple data with kt
-
-
-## Strategy #4
-
-Terms are entered individually in main data store. Also, shorter keys are
-used rather than hashes. These two aspects save a great deal of space and I/O,
-but require an additional index to put the terms together in a triple.
-
-### Main Data Store
-
-- t:st (term key: serialized term; 1:1)
-- spo:c (joined S, P, O keys: context key; 1:m)
-- c: (context keys only, values are the empty bytestring)
-
-Storage total: variable
-
-### Indices
-
-- th:t (term hash: term key; 1:1)
-- c:spo (context key: joined triple keys; 1:m)
-- s:po (S key: P + O key; 1:m)
-- p:so (P key: S + O keys; 1:m)
-- o:sp (object key: triple key; 1:m)
-- sp:o (S + P keys: O key; 1:m)
-- so:p (S + O keys: P key; 1:m)
-- po:s (P + O keys: S key; 1:m)
-
-Storage total: 143 bytes per triple
-
-### Disadvantages
-
-- Lots of indices
-- Terms can get orphaned:
-  - No easy way to know if a term is used anywhere in a quad
-  - Needs some routine cleanup
-  - On the other hand, terms are relatively light-weight and can be reused
-  - Almost surely not reusable are UUIDs, message digests, timestamps etc.
-
-
-## Strategy #5
-
-Reduce number of indices and rely on parsing and splitting keys to find triples
-with two bound parameters.
-
-This is especially important for keeping indexing synchronous to achieve fully
-ACID writes.
-
-### Main data store
-
-Same as Strategy #4:
-
-- t:st (term key: serialized term; 1:1)
-- spo:c (joined S, P, O keys: context key; dupsort, dupfixed)
-- c: (context keys only, values are the empty bytestring; 1:1)
-
-Storage total: variable (same as #4)
-
-### Indices
-
-- th:t (term hash: term key; 1:1)
-- s:po (S key: joined P, O keys; dupsort, dupfixed)
-- p:so (P key: joined S, O keys; dupsort, dupfixed)
-- o:sp (O key: joined S, P keys; dupsort, dupfixed)
-- c:spo (context → triple association; dupsort, dupfixed)
-
-Storage total: 95 bytes per triple
-
-### Lookup strategy
-
-- ? ? ? c: [c:spo] all SPO for C → split key → [t:st] term from term key
-- s p o c: [c:spo] exact SPO & C match → split key → [t:st] term from term key
-- s ? ?: [s:po] All PO for S → split key → [t:st] term from term key
-- s p ?: [s:po] All PO for S → filter result by P in split key
-    → [t:st] term from term key
-
-### Advantages
-
-- Less indices: smaller index size and less I/O
-
-### Disadvantages
-
-- Possibly slower retrieval for queries with 2 bound terms (run metrics)
-
-### Further optimization
-
-In order to minimize traversing and splitting results, the first retrieval
-should be made on the term with less average keys. Search order can be balanced
-by establishing a lookup order for indices.
-
-This can be achieved by calling stats on the index databases and looking up the
-database with the *most* keys. Since there is an equal number of entries in each
-of the (s:po, p:so, o:sp) indices, the one with the most keys will have the least
-average number of values per key. If that lookup is done first, the initial
-data set to traverse and filter will be smaller.
-
-
-## Strategy #5a
-
-This is a slightly different implementation of #5 that somewhat simplifies and
-perhaps speeds up things a bit. It is the currently employed solution.
-
-The indexing and lookup strategy is the same, but instead of using a separator
-byte for splitting compound keys, the logic relies on the fact that keys have
-a fixed length and are sliced instead. This *should* result in faster key
-manipulation, also because in most cases `memoryview` buffers can be used
-directly instead of being copied from memory.
-
-Index storage is 90 bytes per triple.
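
To make the fixed-length key slicing of Strategy #5a concrete, here is a small self-contained sketch; the 5-byte key size and the sample keys are assumptions for illustration and do not reflect the store's actual key derivation:

```python
# Assumed 5-byte term keys; a joined S+P+O key is simply sliced, not parsed.
KEY_LEN = 5

def split_triple_key(spo_key):
    """Slice a joined S+P+O key into its three term keys without copying."""
    view = memoryview(spo_key)
    return (view[:KEY_LEN],
            view[KEY_LEN:2 * KEY_LEN],
            view[2 * KEY_LEN:3 * KEY_LEN])

s, p, o = split_triple_key(b'AAAAABBBBBCCCCC')
print(bytes(s), bytes(p), bytes(o))  # b'AAAAA' b'BBBBB' b'CCCCC'
```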

+ 0 - 27
doc/notes/messaging.md

@@ -1,27 +0,0 @@
-# LAKEsuperior Messaging
-
-LAKEsuperior implements a messaging system based on ActivityStreams, as
-indicated by the
-[Fedora API specs](https://fedora.info/2017/06/30/spec/#notifications).
-The metadata set provided is currently quite minimal but can be easily
-enriched by extending the
-[default formatter class](https://github.com/scossu/lakesuperior/blob/master/lakesuperior/messaging/messenger.py).
-
-STOMP is the only supported protocol at the moment. More protocols may be made
-available at a later time.
-
-LAKEsuperior can send messages to any number of destinations: see
-[configuration](https://github.com/scossu/lakesuperior/blob/master/etc.defaults/application.yml#L79).
-By default, CoilMQ is provided for testing purposes and listens to
-`localhost:61613`. The default route sends messages to `/topic/fcrepo`.
-
-A small command-line utility, also provided with the Python dependencies,
-allows you to watch incoming messages. To monitor messages, enter the following
-*after activating your virtualenv*:
-
-```
-stomp -H localhost -P 61613 -L /topic/fcrepo
-```
-
-See the [stomp.py library reference page](https://github.com/jasonrbriggs/stomp.py/wiki/Command-Line-Access)
-for details.
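
For programmatic monitoring rather than the command line, a hedged sketch using the same stomp.py library might look like the following; host, port and destination mirror the defaults mentioned above, and the listener callback signature may need adjusting to your stomp.py version:

```python
import time
import stomp

class FcrepoListener(stomp.ConnectionListener):
    # Signature kept generic: depending on the stomp.py version, on_message
    # receives either (headers, body) or a single frame object.
    def on_message(self, *args):
        print(*args)

conn = stomp.Connection([('localhost', 61613)])
conn.set_listener('', FcrepoListener())
conn.connect(wait=True)
conn.subscribe(destination='/topic/fcrepo', id=1, ack='auto')

time.sleep(60)  # Listen for a minute, then disconnect.
conn.disconnect()
```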

+ 0 - 57
doc/notes/migration.md

@@ -1,57 +0,0 @@
-# Migration, Backup & Restore
-
-The LAKEsuperior dataset is by default fully contained in a folder. This means
-that only the data, configuration and code are needed for it to run.
-No Postgres, Redis, or such. These folders can be moved around as needed.
-
-## Migration Tool
-
-Migration is the process of importing and converting data from a different
-Fedora or LDP implementation into a new LAKEsuperior instance. This process
-uses the HTTP/LDP API of the original repository. A command-line utility is
-available as part of the `lsup-admin` suite to assist in this operation.
-
-A repository can be migrated with a one-line command such as:
-
-```
-./lsup-admin migrate http://source-repo.edu/rest /local/dest/folder
-```
-
-For more options, enter
-
-```
-./lsup-admin migrate --help
-```
-
-The script will crawl through the resources and follow outbound links
-within them. In order to do this, resources are added as raw triples (i.e.
-no consistency checks are made).
-
-**Note:** the consistency check tool has not been implemented yet, but its
-release should follow shortly. This will ensure that all the links
-between resources are consistent in regard to referential integrity.
-
-This script will create a full dataset in the specified destination folder,
-complete with a default configuration that allows the LAKEsuperior server to
-be started immediately after the migration is complete.
-
-Two approaches to migration are possible:
-
-1. By providing a starting point on the source repository. E.g. if the
-   repository you want to migrate is at `http://repo.edu/rest/prod` you can add
-   the `-s /prod` option to the script to avoid migrating irrelevant branches.
-   Note that the script will still reach outside of the starting point if
-   resources are referencing other resources outside of it.
-2. By providing a file containing a list of resources to migrate. This is
-   useful if a source repository cannot produce a full list (e.g. the root node
-   has more children than the server can handle) but a list of individual
-   resources is available via an external index (Solr, triplestore, etc.).
-   The resources can be indicated by their fully qualified URIs or paths
-   relative to the repository root. (*TODO latter option needs testing*)
-
-## Backup & Restore
-
-A backup of a LAKEsuperior repository consists of copying the RDF and non-RDF
-data folders. The location of these folders is indicated in the application
-configuration. The default commands provided by your OS (`cp`, `rsync`,
-`tar` etc. for Unix) are all that is needed.
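
If the migration described above needs to be scripted (e.g. for several source repositories), the documented command can simply be driven from Python; the source URL and destination folder below are the same placeholders used above:

```python
# Run the documented migration command for a list of source repositories.
import subprocess

sources = ['http://source-repo.edu/rest']   # placeholder source(s)

for src in sources:
    subprocess.run(
        ['./lsup-admin', 'migrate', src, '/local/dest/folder'],
        check=True)
```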

+ 0 - 63
doc/notes/model.md

@@ -1,63 +0,0 @@
-# LAKEsuperior Content Model Rationale
-
-## Internal and Public URIs; Identifiers
-
-Resource URIs are stored internally in LAKEsuperior as domain-agnostic URIs
-with the scheme `info:fcres<resource UID>`. This allows resources to be
-portable across systems. E.g. a resource with an internal URI of
-`info:fcres/a/b/c`, when accessed via the `http://localhost:8000/ldp`
-endpoint, will be found at `http://localhost:8000/ldp/a/b/c`.
-
-The resource UID making up the internal URI looks like a UNIX
-filesystem path, i.e. it always starts with a forward slash and can be made up
-of multiple segments separated by slashes. E.g. `/` is the root node UID and
-`/a` is a resource UID just below root. Their internal URIs are `info:fcres/`
-and `info:fcres/a` respectively.
-
-In the Python API, the UID and internal URI of an LDP resource can be accessed
-via the `uid` and `uri` properties respectively:
-
-```
->>> import lakesuperior.env_setup
->>> from lakesuperior.api import resource
->>> rsrc = resource.get('/a/b/c')
->>> rsrc.uid
-'/a/b/c'
->>> rsrc.uri
-rdflib.term.URIRef('info:fcres/a/b/c')
-```
-
-## Store Layout
-
-One of the key concepts in LAKEsuperior is the store layout. This is a
-module built with a
-specific purpose in mind, i.e. allowing fine-grained recording of provenance
-metadata while providing reasonable performance.
-
-Store layout modules could be replaceable (work needs to
-be done to develop an interface to allow that). The default (and only at the
-moment) layout shipped with LAKEsuperior is the
-[resource-centric layout](../../lakesuperior/store/ldp_rs/rsrc_centric_layout).
-This layout implements a so-called
-[graph-per-aspect pattern](http://patterns.dataincubator.org/book/graph-per-aspect.html)
-which stores different sets of statements about a resource in separate named
-graphs.
-
-The named graphs used for each resource are:
-
-- An admin graph (`info:fcsystem/graph/admin<resource UID>`) which stores
-  administrative metadata, mostly server-managed triples such as LDP types,
-  system create/update timestamps and agents, etc.
-- A structure graph (`info:fcsystem/graph/structure<resource UID>`) reserved for
-  containment triples. The reason
-  for this separation is purely convenience, since it makes it easy to retrieve
-  all the properties of a large container without its child references.
-- One (and, possibly, in the future, more user-defined) named graph for
-  user-provided data (`info:fcsystem/graph/userdata/_main<resource UID>`).
-
-Each of these graphs can be annotated with provenance metadata. The layout
-decides which triples go in which graph based on the predicate or RDF type
-contained in the triple. Adding logic to support arbitrary named graphs based
-e.g. on user agent, or to add more provenance information, should be relatively
-simple.
-
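
A minimal sketch of the public-URL-to-internal-URI mapping described in the content model notes above; the endpoint root is an assumed local default, while the `info:fcres` scheme is as documented:

```python
from rdflib import URIRef

ENDPOINT = 'http://localhost:8000/ldp'   # assumed local endpoint root

def uid_from_public_url(url):
    """Strip the endpoint root, leaving the UNIX-path-like UID."""
    return url[len(ENDPOINT):] or '/'

def internal_uri(uid):
    return URIRef('info:fcres' + uid)

uid = uid_from_public_url('http://localhost:8000/ldp/a/b/c')
print(uid)                # /a/b/c
print(internal_uri(uid))  # info:fcres/a/b/c
```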

+ 0 - 112
doc/notes/performance.md

@@ -1,112 +0,0 @@
-# Performance Benchmark Report
-
-## Environment
-
-### Hardware
-
-#### ‘Rather Snappy’ Laptop
-
-- Dell Precison M3800 Laptop
-- 4x Intel(R) Core(TM) i7-4712HQ CPU @ 2.30GHz
-- 12Gb RAM
-- SSD
-
-#### ‘Ole Workhorse’ server
-
-- 8x Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
-- 16Gb RAM
-- Magnetic drive, XXX RPM
-
-### Software
-
-- Arch Linux OS
-- glibc 2.26-11
-- python 3.5.4
-- lmdb 0.9.21-1
-
-### Benchmark script
-
-[Generator script](../../util/benchmark.py)
-
-The script was run with default values: 10,000 children under the same parent,
-PUT requests.
-
-### Data Set
-
-Synthetic graph created by the benchmark script. The graph is unique for each
-request and consists of 200 triples which are partly random data, with a
-consistent size and variation:
-
-- 50 triples have an object that is a URI of an external resource (50 unique
-  predicates; 5 unique objects).
-- 50 triples have an object that is a URI of a repository-managed resource
-  (50 unique predicates; 5 unique objects).
-- 100 triples have an object that is a 64-character random Unicode string
-  (50 unique predicates; 100 unique objects).
-
-## Results
-
-### ‘Rather Snappy’ Laptop
-
-#### FCREPO/Modeshape 4.7.5
-
-15'45" running time
-
-0.094" per resource (100%—reference point)
-
-3.4M triples total in repo at the end of the process
-
-Retrieval of parent resource (~10000 triples), pipe to /dev/null: 3.64" (100%)
-
-Peak memory usage: 2.47Gb
-
-Database size: 3.3 Gb
-
-
-#### LAKEsuperior Alpha 6, LMDB Back End
-
-25' running time
-
-0.152" per resource (161%)
-
-*Some gaps every ~40-50 requests, probably disk flush*
-
-Retrieval of parent resource (10K triples), pipe to /dev/null: 2.13" (58%)
-
-Peak memory usage: ~650 Mb (3 idle workers, 1 active)
-
-Database size: 523 Mb (16%)
-
-### ‘Ole Workhorse’ server
-
-#### FCREPO
-
-0:47:38 running time
-
-0.285" per resource (100%)
-
-Retrieval of parent resource: 9.6" (100%)
-
-#### LAKEsuperior
-
-1:14:19 running time
-
-0.446" per resource (156%)
-
-Retrieval of parent resource: 5.58" (58%)
-
-## Conclusions
-
-LAKEsuperior appears to be markedly slower on writes and markedly faster on
-reads. Both these factors are very likely related to the underlying LMDB store
-which is optimized for read performance.
-
-Comparison of results between the laptop and the server shows that the relative
-read and write performance gaps are nearly identical in the two environments,
-while disk speed severely affects the absolute numbers.
-
-**Note:** As you can guess, these are only very partial and specific results. They
-should not be taken as a thorough performance assessment. Such an assessment
-may be impossible and pointless to make given the very different nature of
-the storage models, which may behave radically differently depending on many
-variables.

+ 0 - 82
doc/notes/storage.md

@@ -1,82 +0,0 @@
-# Storage Implementation
-
-LAKEsuperior stores non-RDF ("binary") data in the filesystem and RDF data in
-an embedded key-value store, [LMDB](https://symas.com/lmdb/).
-
-## RDF Storage design
-
-LMDB is a very fast, very lightweight C library. It is inspired by BerkeleyDB
-but introduces significant improvements in terms of efficiency and stability.
-
-The LAKEsuperior RDF store consists of two files: the main data store and the
-indices (plus two lock files that are generated at runtime). A good amount of
-effort has been put into developing an indexing strategy that is balanced between
-write performance, read performance, and data size, with no compromise made on
-consistency.
-
-The main data
-store is the one containing the preservation-worthy data. While the indices are
-necessary for LAKEsuperior to function, they can be entirely rebuilt from the
-main data store in case of file corruption (recovery tools are on the TODO
-list).
-
-Detailed notes about the various strategies researched can be found
-[here](indexing_strategy.md).
-
-## Scalability
-
-Since LAKEsuperior is focused on design simplicity, efficiency and reliability,
-its RDF store is embedded and not horizontally scalable. However, LAKEsuperior
-is quite frugal with disk space. About 55 million triples can be
-stored in 8Gb of space (mileage can vary depending on how heterogeneous the
-triples are). This makes it easier to use expensive SSD drives for
-the RDF store, in order to improve performance. A single LMDB environment can
-reportedly scale up to 128 terabytes.
-
-## Maintenance
-
-LMDB has a very simple configuration, and all options are hardcoded
-in LAKEsuperior in order to exploit its features. A database automatically
-recovers from a crash.
-
-The LAKEsuperior RDF store abstraction maintains a registry of unique terms.
-These terms are not deleted if a triple is deleted, even if no triple is using
-them, because it would be too expensive to look for orphaned terms during a
-delete request. While these terms are relatively lightweight, it would be good
-to run a periodic clean-up job. Tools will be developed in the near future to
-facilitate this maintenance task.
-
-## Consistency
-
-LAKEsuperior wraps each LDP operation in a transaction. The indices are updated
-synchronously within the same transaction in order to guarantee
-consistency. If a system loses power or crashes, only the last transaction is
-lost, and the last successful write will include primary and index data.
-
-## Concurrency
-
-LMDB employs
-[MVCC](https://en.wikipedia.org/wiki/Multiversion_concurrency_control)
-to achieve fully ACID transactions. This implies that during
-a write, the whole database is locked. Multiple writes can be initiated
-concurrently, but the performance gain of doing so may be little because
-only one write operation can be performed at a time. Reasonable efforts have
-been put to make write transactions as short as possible (and more can be
-done). Also, this excludes a priori the option to implement long-running atomic
-operations, unless one is willing to block writes on the application for an
-indefinite length of time. On the other hand, read operations never block and
-are never blocked, so an application with a high read-to-write ratio may still
-benefit from multi-threaded requests.
-
-## Performance
-
-The [Performance Benchmark Report](performance.md) contains benchmark results.
-
-Write performance is lower than Modeshape/Fedora4; this may be mostly due to
-the fact that indices are written synchronously in a blocking transaction;
-also, the LMDB B+Tree structure is optimized for read performance rather than
-write performance. Some optimizations on the application layer could be made.
-
-Reads are faster than Modeshape/Fedora.
-
-All tests so far have been performed in a single thread.
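
The consistency approach above (indices updated synchronously within the same transaction) can be illustrated with the py-lmdb binding directly; the path, database names and keys below are made up for the example and do not reflect the actual store layout:

```python
import lmdb

env = lmdb.open('/tmp/demo_store', max_dbs=2, map_size=2 ** 24)
main_db = env.open_db(b'main')
idx_db = env.open_db(b'index', dupsort=True)

with env.begin(write=True) as txn:
    # Both writes commit together; a crash before commit loses both,
    # never one without the other.
    txn.put(b'triple-key-1', b'serialized triple', db=main_db)
    txn.put(b's-key', b'triple-key-1', db=idx_db)

with env.begin() as txn:
    print(txn.get(b'triple-key-1', db=main_db))
```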

+ 22 - 0
docs/Makefile

@@ -0,0 +1,22 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line.
+SPHINXOPTS    =
+# Workaround to prevent lmdb from throwing an exception when loaded from Sphinx
+SPHINXBUILD   = python -m sphinx
+#SPHINXBUILD   = sphinx-build
+SPHINXPROJ    = lakesuperior
+SOURCEDIR     = .
+BUILDDIR      = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

+ 32 - 0
docs/api.rst

@@ -0,0 +1,32 @@
+API Documentation
+==================
+
+.. module:: lakesuperior
+
+Resource API
+~~~~~~~~~~~~
+
+.. automodule:: lakesuperior.api.resource
+   :members:
+
+Query API
+~~~~~~~~~
+
+.. automodule:: lakesuperior.api.query
+   :members:
+
+Admin API
+~~~~~~~~~
+
+.. automodule:: lakesuperior.api.admin
+   :members:
+
+
+Full API docs
+~~~~~~~~~~~~~
+
+.. toctree::
+   :maxdepth: 3
+   :glob:
+
+   apidoc/*

+ 46 - 0
docs/apidoc/lakesuperior.model.rst

@@ -0,0 +1,46 @@
+lakesuperior\.model package
+===========================
+
+Submodules
+----------
+
+lakesuperior\.model\.ldp\_factory module
+----------------------------------------
+
+.. automodule:: lakesuperior.model.ldp_factory
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+lakesuperior\.model\.ldp\_nr module
+-----------------------------------
+
+.. automodule:: lakesuperior.model.ldp_nr
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+lakesuperior\.model\.ldp\_rs module
+-----------------------------------
+
+.. automodule:: lakesuperior.model.ldp_rs
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+lakesuperior\.model\.ldpr module
+--------------------------------
+
+.. automodule:: lakesuperior.model.ldpr
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: lakesuperior.model
+    :members:
+    :undoc-members:
+    :show-inheritance:

+ 85 - 0
docs/apidoc/lakesuperior.rst

@@ -0,0 +1,85 @@
+lakesuperior package
+====================
+
+Subpackages
+-----------
+
+.. toctree::
+
+    lakesuperior.model
+
+Submodules
+----------
+
+lakesuperior\.app module
+------------------------
+
+.. automodule:: lakesuperior.app
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+lakesuperior\.config\_parser module
+-----------------------------------
+
+.. automodule:: lakesuperior.config_parser
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+lakesuperior\.env module
+------------------------
+
+.. automodule:: lakesuperior.env
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+lakesuperior\.env\_setup module
+-------------------------------
+
+.. automodule:: lakesuperior.env_setup
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+lakesuperior\.exceptions module
+-------------------------------
+
+.. automodule:: lakesuperior.exceptions
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+lakesuperior\.globals module
+----------------------------
+
+.. automodule:: lakesuperior.globals
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+lakesuperior\.migrator module
+-----------------------------
+
+.. automodule:: lakesuperior.migrator
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+lakesuperior\.toolbox module
+----------------------------
+
+.. automodule:: lakesuperior.toolbox
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+
+Module contents
+---------------
+
+.. automodule:: lakesuperior
+    :members:
+    :undoc-members:
+    :show-inheritance:

+ 7 - 0
docs/apidoc/modules.rst

@@ -0,0 +1,7 @@
+lakesuperior
+============
+
+.. toctree::
+   :maxdepth: 4
+
+   lakesuperior

+ 48 - 0
docs/architecture.rst

@@ -0,0 +1,48 @@
+LAKEsuperior Architecture
+=========================
+
+LAKEsuperior is written in Python. It is not excluded that parts of the
+code may be rewritten in `Cython <http://cython.readthedocs.io/>`__ for
+performance.
+
+Multi-Modal Access
+------------------
+
+LAKEsuperior services and data are accessible in multiple ways:
+
+-  Via HTTP. This is the canonical way to interact with LDP resources
+   and conforms quite closely to the Fedora specs (currently v4).
+-  Via command line. This method includes long-running admin tasks which
+   are not available via HTTP.
+-  Via a Python API. This method allows the use of Python scripts to access
+   the same methods available to the two access modes above in a
+   programmatic way. It is possible to write Python plugins or even to embed
+   LAKEsuperior in a Python application, even without running a web
+   server.
+
+Architecture Overview
+---------------------
+
+.. figure:: assets/lakesuperior_arch.png
+   :alt: LAKEsuperior Architecture
+
+   LAKEsuperior Architecture
+
+The LAKEsuperior REST API provides access to the underlying Python API.
+All REST and CLI operations can be replicated by a Python program
+accessing this API.
+
+The main advantage of the Python API is that it makes it very easy to
+manipulate graph and binary data without the need to serialize or
+deserialize native data structures. This matters when handling large ETL
+jobs, for example.
+
+The Python API is divided in three main areas:
+
+-  Resource API: this API is in charge of all the resource CRUD operations and
+   implements the majority of the Fedora specs.
+-  Admin API: exposes utility methods, mostly long-running maintenance jobs.
+-  Query API: provides several facilities for querying repository data.
+
+
+See :doc:`API documentation<api>` for more details.
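
A short sketch of the native Python API access mode listed above, based on the usage shown in the content model notes elsewhere in this changeset (run inside the project virtualenv; the resource UID is an example value):

```python
import lakesuperior.env_setup   # sets up configuration and store connections
from lakesuperior.api import resource

rsrc = resource.get('/a/b/c')   # same operations as the REST and CLI front ends
print(rsrc.uid)   # /a/b/c
print(rsrc.uri)   # info:fcres/a/b/c
```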

+ 0 - 0
doc/assets/lakesuperior_arch.png → docs/assets/lakesuperior_arch.png


+ 0 - 0
doc/assets/lakesuperior_recommendations.pdf → docs/assets/lakesuperior_recommendations.pdf


BIN
docs/assets/profile_1K_children_get.pdf


+ 33 - 0
docs/cli.rst

@@ -0,0 +1,33 @@
+LAKEsuperior Command Line Reference
+===================================
+
+The LAKEsuperior command line tool is used for maintenance and
+administration purposes.
+
+The script is invoked from the main install directory. The tool is
+self-documented, so this is just a redundant overview:
+
+::
+
+    $ ./lsup_admin
+    Usage: lsup-admin [OPTIONS] COMMAND [ARGS]...
+
+    Options:
+      --help  Show this message and exit.
+
+      bootstrap     Bootstrap binary and graph stores.
+      check_fixity  [STUB] Check fixity of a resource.
+      check_refint  [STUB] Check referential integrity.
+      cleanup       [STUB] Clean up orphan database items.
+      copy          [STUB] Copy (backup) repository data.
+      dump          [STUB] Dump repository to disk.
+      load          [STUB] Load serialized repository data.
+      stats         Print repository statistics.
+
+All entries marked ``[STUB]`` are not yet implemented, however the
+``lsup_admin <command> --help`` command will issue a description of what
+the command is meant to do. Please see the `TODO <TODO>`__ document for
+a rough road map.
+
+All of the above commands are also available via, and based upon, the
+native Python API.

+ 178 - 0
docs/conf.py

@@ -0,0 +1,178 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+#
+# lakesuperior documentation build configuration file, created by
+# sphinx-quickstart on Sat Mar 24 23:05:46 2018.
+#
+# This file is execfile()d with the current directory set to its
+# containing dir.
+#
+# Note that not all possible configuration values are present in this
+# autogenerated file.
+#
+# All configuration values have a default; values that are commented out
+# serve to show the default.
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+import os
+import sys
+#sys.path.insert(0, os.path.abspath('../'))
+sys.path.append(os.path.abspath('../'))
+import lakesuperior.env_setup
+
+# -- General configuration ------------------------------------------------
+
+# If your documentation needs a minimal Sphinx version, state it here.
+#
+# needs_sphinx = '1.0'
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = ['sphinx.ext.autodoc',
+    'sphinx.ext.intersphinx',
+    'sphinx.ext.todo',
+    'sphinx.ext.coverage',
+    'sphinx.ext.imgmath',
+    'sphinx.ext.viewcode']
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# The suffix(es) of source filenames.
+# You can specify multiple suffix as a list of string:
+#
+# source_suffix = ['.rst', '.md']
+source_suffix = '.rst'
+
+# The master toctree document.
+master_doc = 'index'
+
+# General information about the project.
+project = 'lakesuperior'
+copyright = '2018, Everybody & Nobody'
+author = 'Stefano Cossu'
+
+# The version info for the project you're documenting, acts as replacement for
+# |version| and |release|, also used in various other places throughout the
+# built documents.
+#
+# The short X.Y version.
+version = '1.0-alpha'
+# The full version, including alpha/beta/rc tags.
+release = '1.0.0-alpha.8'
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#
+# This is also used if you do content translation via gettext catalogs.
+# Usually you set "language" from the command line for these cases.
+language = None
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This patterns also effect to html_static_path and html_extra_path
+exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = 'sphinx'
+
+# If true, `todo` and `todoList` produce output, else they produce nothing.
+todo_include_todos = True
+
+
+# -- Options for HTML output ----------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'sphinx_rtd_theme'
+
+# Theme options are theme-specific and customize the look and feel of a theme
+# further.  For a list of options available for each theme, see the
+# documentation.
+#
+# html_theme_options = {}
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+# Custom sidebar templates, must be a dictionary that maps document names
+# to template names.
+#
+# This is required for the alabaster theme
+# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
+html_sidebars = {
+    '**': [
+        'relations.html',  # needs 'show_related': True theme option to display
+        'searchbox.html',
+    ]
+}
+
+
+# -- Options for HTMLHelp output ------------------------------------------
+
+# Output file base name for HTML help builder.
+htmlhelp_basename = 'lakesuperiordoc'
+
+
+# -- Options for LaTeX output ---------------------------------------------
+
+latex_elements = {
+    # The paper size ('letterpaper' or 'a4paper').
+    #
+    # 'papersize': 'letterpaper',
+
+    # The font size ('10pt', '11pt' or '12pt').
+    #
+    # 'pointsize': '10pt',
+
+    # Additional stuff for the LaTeX preamble.
+    #
+    # 'preamble': '',
+
+    # Latex figure (float) alignment
+    #
+    # 'figure_align': 'htbp',
+}
+
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title,
+#  author, documentclass [howto, manual, or own class]).
+latex_documents = [
+    (master_doc, 'lakesuperior.tex', 'lakesuperior Documentation',
+     'Stefano Cossu', 'manual'),
+]
+
+
+# -- Options for manual page output ---------------------------------------
+
+# One entry per manual page. List of tuples
+# (source start file, name, description, authors, manual section).
+man_pages = [
+    (master_doc, 'lakesuperior', 'lakesuperior Documentation',
+     [author], 1)
+]
+
+
+# -- Options for Texinfo output -------------------------------------------
+
+# Grouping the document tree into Texinfo files. List of tuples
+# (source start file, target name, title, author,
+#  dir menu entry, description, category)
+texinfo_documents = [
+    (master_doc, 'lakesuperior', 'lakesuperior Documentation',
+     author, 'lakesuperior', 'One line description of project.',
+     'Miscellaneous'),
+]
+
+
+
+
+# Example configuration for intersphinx: refer to the Python standard library.
+intersphinx_mapping = {'https://docs.python.org/': None}

+ 0 - 0
doc/examples/store_layouts/graph_per_aspect.trig → docs/examples/store_layouts/graph_per_aspect.trig


+ 0 - 0
doc/examples/store_layouts/graph_per_resource+.trig → docs/examples/store_layouts/graph_per_resource+.trig


+ 0 - 0
doc/examples/store_layouts/graph_per_resource.trig → docs/examples/store_layouts/graph_per_resource.trig


+ 239 - 0
docs/fcrepo4_deltas.rst

@@ -0,0 +1,239 @@
+Divergences between LAKEsuperior and FCREPO4
+=============================================
+
+This is a (vastly incomplete) list of discrepancies between the current
+FCREPO4 implementation and LAKEsuperior. More will be added as more
+clients use it.
+
+Not yet implemented (but in the plans)
+--------------------------------------
+
+-  Various headers handling
+-  Versioning (incomplete)
+-  AuthN/Z
+-  Fixity check
+-  Blank nodes
+
+Potentially breaking changes
+----------------------------
+
+The following divergences may lead to incompatibilities with some
+clients.
+
+Atomicity
+~~~~~~~~~
+
+FCREPO4 supports batch atomic operations whereby a transaction can be
+opened and a number of operations (i.e. multiple R/W requests to the
+repository) can be performed. The operations are persisted in the
+repository only if and when the transaction is committed.
+
+LAKEsuperior only supports atomicity for a single HTTP request, i.e. a
+single HTTP request that results in multiple write operations to the
+storage layer is only persisted if no exception is thrown. Otherwise,
+the operation is rolled back in order to prevent resources from being
+left in an inconsistent state.
+
+Tombstone methods
+~~~~~~~~~~~~~~~~~
+
+If a client requests a tombstone resource in FCREPO4 with a method other
+than DELETE, the server will return ``405 Method Not Allowed``
+regardless of whether the tombstone exists or not.
+
+LAKEsuperior will return ``405`` only if the tombstone actually exists,
+``404`` otherwise.
+
+Web UI
+~~~~~~
+
+FCREPO4 includes a web UI for simple CRUD operations.
+
+Such a UI is not in the immediate LAKEsuperior development plans.
+However, a basic UI is available for read-only interaction: LDP resource
+browsing, SPARQL query and other search facilities, and administrative
+tools. Some of the latter *may* involve write operations, such as
+clean-up tasks.
+
+Automatic path segment generation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A ``POST`` request without a slug in FCREPO4 results in a pairtree
+consisting of several intermediate nodes leading to the automatically
+minted identifier. E.g.
+
+::
+
+    POST /rest
+
+results in ``/rest/8c/9a/07/4e/8c9a074e-dda3-5256-ea30-eec2dd4fcf61``
+being created.
+
+The same request in LAKEsuperior would create
+``/rest/8c9a074e-dda3-5256-ea30-eec2dd4fcf61`` (obviously the
+identifiers will be different).
+
+This seems to have broken Hyrax at some point, but the issue might have
+been fixed. This needs to be verified further.
+
+Non-standard client breaking changes
+------------------------------------
+
+The following changes may be incompatible with clients relying on some
+FCREPO4 behavior not endorsed by LDP or other specifications.
+
+Pairtrees
+~~~~~~~~~
+
+FCREPO4 generates “pairtree” resources if a resource is created in a
+path whose segments are missing. E.g. when creating ``/a/b/c/d``, if
+``/a/b`` and ``/a/b/c`` do not exist, FCREPO4 will create two Pairtree
+resources. POSTing and PUTting into Pairtrees is not allowed. Also, a
+containment triple is established between the closest LDPC and the
+created resource, e.g. if ``a`` exists, a
+``</a> ldp:contains </a/b/c/d>`` triple is created.
+
+LAKEsuperior does not employ Pairtrees. In the example above
+LAKEsuperior would create a fully qualified LDPC for each missing
+segment, which can be POSTed and PUT to. Containment triples are created
+between each link in the path, i.e. ``</a> ldp:contains </a/b>``,
+``</a/b> ldp:contains </a/b/c>`` etc. This may potentially break clients
+relying on the direct containment model.
+
+The rationale behind this change is that Pairtrees are the byproduct of
+a limitation imposed by Modeshape and introduce complexity in the
+software stack and confusion for the client. LAKEsuperior aligns with
+the more intuitive UNIX filesystem model, where each segment of a path
+is a “folder” or container (except for the leaf nodes that can be either
+folders or files). In any case, clients are discouraged from generating
+deep paths in LAKEsuperior without a specific purpose because these
+resources create unnecessary data.
+
+Non-mandatory, non-authoritative slug in version POST
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+FCREPO4 requires a ``Slug`` header to POST to ``fcr:versions`` to create
+a new version.
+
+LAKEsuperior adheres to the more general FCREPO POST rule and if no slug
+is provided, an automatic ID is generated instead. The ID is a UUID4.
+
+Note that internally this ID is not called “label” but “uid” since it is
+treated as a fully qualified identifier. The ``fcrepo:hasVersionLabel``
+predicate, however ambiguous in this context, will be kept until the
+adoption of Memento, which will change the retrieval mechanisms.
+
+Another notable difference is that if a POST is issued to the same resource's
+``fcr:versions`` location using a version ID that already exists, LAKEsuperior
+will just mint a random identifier rather than returning an error.
+
+Deprecation track
+-----------------
+
+LAKEsuperior offers some “legacy” options to replicate the FCREPO4
+behavior; however, it encourages new development to use a different
+approach for some types of interaction.
+
+Endpoints
+~~~~~~~~~
+
+The FCREPO root endpoint is ``/rest``. The LAKEsuperior root endpoint is
+``/ldp``.
+
+This should not pose a problem if a client does not have ``rest``
+hard-coded in its code, but in any event, the ``/rest`` endpoint is
+provided for backwards compatibility.
+
+Automatic LDP class assignment
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since LAKEsuperior rejects client-provided server-managed triples, and
+since the LDP types are among them, the LDP container type is inferred
+from the provided properties: if the ``ldp:hasMemberRelation`` and
+``ldp:membershipResource`` properties are provided, the resource is a
+Direct Container. If in addition to these the
+``ldp:insertedContentRelation`` property is present, the resource is an
+Indirect Container. If any of the first two are missing, the resource is
+a Container (@TODO discuss: shall it be a Basic Container?)
+
+Clients are encouraged to omit LDP types in PUT, POST and PATCH
+requests.
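+
+As an illustrative sketch (not actual LAKEsuperior code), a client payload
+containing only the two properties below, built here with rdflib and
+hypothetical member relation and target URIs, would be typed as a Direct
+Container; adding ``ldp:insertedContentRelation`` as well would make it an
+Indirect Container:
+
+::
+
+    from rdflib import Graph, Namespace, URIRef
+
+    LDP = Namespace('http://www.w3.org/ns/ldp#')
+    subj = URIRef('info:fcres/dc_example')  # Hypothetical new resource.
+
+    gr = Graph()
+    # The presence of these two properties triggers the Direct Container type.
+    gr.add((subj, LDP.hasMemberRelation,
+            URIRef('http://purl.org/dc/terms/hasPart')))
+    gr.add((subj, LDP.membershipResource, URIRef('info:fcres/target')))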
+
+Lenient handling
+~~~~~~~~~~~~~~~~
+
+FCREPO4 requires server-managed triples to be expressly indicated in a
+PUT request, unless the ``Prefer`` header is set to
+``handling=lenient; received="minimal"``, in which case the RDF payload
+must not have any server-managed triples.
+
+LAKEsuperior works under the assumption that clients should never provide
+server-managed triples. It automatically handles PUT requests sent to
+existing resources by returning a 412 if any server-managed triples are
+included in the payload. This is the same as setting ``Prefer`` to
+``handling=strict``, which is the default.
+
+If ``Prefer`` is set to ``handling=lenient``, all server-managed triples
+sent with the payload are ignored.
+
+Clients using the ``Prefer`` header to control PUT behavior as
+advertised by the specs should not notice any difference.
+
+Optional improvements
+---------------------
+
+The following are improvements in performance or usability that can only
+be taken advantage of if client code is adjusted.
+
+LDP-NR content and metadata
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+FCREPO4 relies on the ``/fcr:metadata`` identifier to retrieve RDF
+metadata about an LDP-NR. LAKEsuperior supports this as a legacy option,
+but encourages the use of content negotiation to do the same while
+offering explicit endpoints for RDF and non-RDF content retrieval.
+
+Any request to an LDP-NR with an ``Accept`` header set to one of the
+supported RDF serialization formats will yield the RDF metadata of the
+resource instead of the binary contents.
+
+The ``fcr:metadata`` URI returns the RDF metadata of an LDP-NR.
+
+The ``fcr:content`` URI returns the non-RDF content.
+
+The two options above return an HTTP error if requested for an LDP-RS.
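+
+A minimal sketch of the two retrieval modes using the ``requests`` library
+(the resource path is hypothetical, and Turtle is assumed to be among the
+supported RDF serializations):
+
+::
+
+    import requests
+
+    url = 'http://localhost:8000/ldp/my-binary'
+
+    # Content negotiation: an RDF Accept header yields the LDP-NR metadata.
+    meta_rsp = requests.get(url, headers={'Accept': 'text/turtle'})
+
+    # Without an RDF Accept header, the binary content is returned.
+    bin_rsp = requests.get(url)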
+
+“Include” and “Omit” options for children
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+LAKEsuperior offers an additional ``Prefer`` header option to exclude
+all references to child resources (i.e. by removing all the
+``ldp:contains`` triples) while leaving the other server-managed triples
+when retrieving a resource:
+
+::
+
+    Prefer: return=representation; [include | omit]="http://fedora.info/definitions/v4/repository#Children"
+
+The default behavior is to include all child URIs.
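+
+For example, using the ``requests`` library against a hypothetical container
+with many children, the ``ldp:contains`` triples can be left out as follows:
+
+::
+
+    import requests
+
+    prefer = ('return=representation; '
+            'omit="http://fedora.info/definitions/v4/repository#Children"')
+    rsp = requests.get(
+            'http://localhost:8000/ldp/big_container',
+            headers={'Prefer': prefer})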
+
+Soft-delete and purge
+~~~~~~~~~~~~~~~~~~~~~
+
+**NOTE**: The implementation of this section is incomplete and debated.
+
+In FCREPO4, deleting a resource removes all traces of it and leaves a
+tombstone in its place.
+
+In LAKEsuperior, a normal DELETE creates a new version snapshot of the
+resource and puts a tombstone in its place. The resource versions are
+still available in the ``fcr:versions`` location. The resource can be
+“resurrected” by issuing a POST to its tombstone. This will result in a
+``201``.
+
+If a tombstone is deleted, the resource and its versions are completely
+deleted (purged).
+
+Moreover, setting the ``Prefer:no-tombstone`` header option on DELETE
+allows a resource and its versions to be deleted directly, without
+leaving a tombstone.
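+
+A sketch of the interactions described above (which, as noted, are partly
+unimplemented), using the ``requests`` library and a hypothetical resource
+path:
+
+::
+
+    import requests
+
+    rsrc = 'http://localhost:8000/ldp/my_resource'
+
+    requests.delete(rsrc)                     # Soft-delete: leaves a tombstone.
+    requests.post(rsrc + '/fcr:tombstone')    # Resurrect: returns 201.
+
+    requests.delete(rsrc)                     # Soft-delete again.
+    requests.delete(rsrc + '/fcr:tombstone')  # Purge the resource and versions.
+
+    # Alternatively, delete and purge in a single request:
+    requests.delete(rsrc, headers={'Prefer': 'no-tombstone'})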

+ 133 - 0
docs/index.rst

@@ -0,0 +1,133 @@
+LAKEsuperior
+============
+
+|build status|
+
+LAKEsuperior is an alternative `Fedora
+Repository <http://fedorarepository.org>`__ implementation.
+
+Fedora is a mature repository software system historically adopted by
+major cultural heritage institutions. It exposes an
+`LDP <https://www.w3.org/TR/ldp-primer/>`__ endpoint to manage
+any type of binary files and their metadata in Linked Data format.
+
+Guiding Principles
+------------------
+
+LAKEsuperior aims at being an uncomplicated, efficient Fedora 4
+implementation.
+
+Its main goals are:
+
+-  **Reliability:** Based on solid technologies with stability in mind.
+-  **Efficiency:** Small memory and CPU footprint, high scalability.
+-  **Ease of management:** Tools to perform monitoring and maintenance
+   included.
+-  **Simplicity of design:** Straight-forward architecture, robustness
+   over features.
+
+Key features
+------------
+
+-  Drop-in replacement for Fedora4 (with some
+   :doc:`caveats <fcrepo4_deltas>`); currently being tested
+   with Hyrax 2
+-  Very stable persistence layer based on
+   `LMDB <https://symas.com/lmdb/>`__ and filesystem. Fully
+   ACID-compliant writes guarantee consistency of data.
+-  Term-based search (*planned*) and SPARQL Query API + UI
+-  No performance penalty for storing many resources under the same
+   container; no
+   `kudzu <https://www.nature.org/ourinitiatives/urgentissues/land-conservation/forests/kudzu.xml>`__
+   pairtree segmentation \ `1 <#f1>`__\ 
+-  Extensible :doc:`provenance metadata <model>` tracking
+-  :doc:`Multi-modal access <architecture>`: HTTP
+   (REST), command line interface and native Python API.
+-  Fits in a pocket: you can carry 50M triples in an 8Gb memory stick.
+
+Implementation of the official `Fedora API
+specs <https://fedora.info/spec/>`__ (Fedora 5.x and beyond) is not
+foreseen in the short term; however, it would be a natural evolution of
+this project if it gains support.
+
+Please make sure you read the :doc:`Delta
+document <fcrepo4_deltas>` for divergences with the
+official Fedora4 implementation.
+
+Target Audience
+---------------
+
+LAKEsuperior is for anybody who cares about preserving data in the long
+term.
+
+Less vaguely, LAKEsuperior is targeted at anyone who needs to store large
+quantities of highly linked metadata and documents.
+
+Its Python/C environment and API make it particularly well suited for
+academic and scientific institutions that would be able to embed it in a
+Python application as a library or extend it via plug-ins.
+
+LAKEsuperior can be exposed to the Web as a `Linked Data
+Platform <https://www.w3.org/TR/ldp-primer/>`__ server. It also acts as
+a SPARQL query (read-only) endpoint; however, it is not meant to be used
+as a full-fledged triplestore at the moment.
+
+In its current status, LAKEsuperior is aimed at developers and hands-on
+managers who are interested in evaluating this project.
+
+Status and development
+----------------------
+
+LAKEsuperior is in **alpha** status. Please see the `project
+issues <https://github.com/scossu/lakesuperior/issues>`__ list for a
+rudimentary road map.
+
+Contributing
+------------
+
+So far this has been a single person’s off-hours project (with much
+input from several sides). In order to turn it into anything close to a
+Beta release and eventually a production-ready implementation, it needs
+some community love.
+
+Contributions are welcome in all forms, including ideas, issue reports,
+or even just spinning up the software and providing some feedback.
+LAKEsuperior is meant to live as a community project.
+
+--------------
+
+1 However, if your client splits pairtrees upstream, as Hyrax does,
+that behavior obviously needs to change to get rid of the path segments.
+`↩ <#a1>`__
+
+.. |build status| image:: http://img.shields.io/travis/scossu/lakesuperior/master.svg?style=flat
+   :target: https://travis-ci.org/scossu/lakesuperior
+
+Indices and tables
+------------------
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents
+
+    Installation and Configuration <setup>
+    Architecture Overview <architecture>
+    Divergences from Fedora 4 <fcrepo4_deltas>
+    Messaging <messaging>
+    Migration Guide <migration>
+    Command Line Reference <cli>
+    Performance Benchmarks <performance>
+
+.. toctree::
+   :maxdepth: 1
+   :caption: In-depth tech & design
+
+    Contributing <contributing>
+    API documentation <api>
+    Indexing Strategy <indexing_strategy>
+    Storage Implementation <storage>
+    Content Model <model>

+ 311 - 0
docs/indexing_strategy.rst

@@ -0,0 +1,311 @@
+LMDB Store design for RDFLib
+============================
+
+This is a log of subsequent strategies employed to store triples in
+LMDB.
+
+Strategy #5a is the one currently used. The rest is kept for historic
+reasons and academic curiosity (and also because it was too much work to
+just wipe out of memory).
+
+Storage approach
+----------------
+
+-  Pickle quad and create MD5 or SHA1 hash.
+-  Store triples in one database paired with key; store indices
+   separately.
+
+Different strategies involve layout and number of databases.
+
+Strategy #1
+-----------
+
+-  kq: key: serialized triple (1:1)
+-  sk: Serialized subject: key (1:m)
+-  pk: Serialized predicate: key (1:m)
+-  ok: Serialized object: key (1:m)
+-  (optional) lok: Serialized literal object: key (1:m)
+-  (optional) tok: Serialized RDF type: key (1:m)
+-  ck: Serialized context: key (1:m)
+
+Retrieval approach
+~~~~~~~~~~~~~~~~~~
+
+To find all matches for a quad:
+
+-  If all terms in the quad are bound, generate the key from the pickled
+   quad and look up the triple in ``kq``
+-  If all terms are unbound, return an iterator of all values in ``kq``.
+-  If some values are bound and some unbound (most common query):
+
+   -  Get a base list of keys associated with the first bound term
+   -  For each subsequent bound term, check if each key associated with
+      the term matches a key in the base list
+   -  Continue through all the bound terms. If a match is not found at
+      any point, continue to the next term
+   -  If a match is found in all the bound term databases, look up the
+      pickled quad matching the key in ``kq`` and yield it
+
+More optimization can be introduced later, e.g. separating literal and
+RDF type objects in separate databases. Literals can have very long
+values and a database with a longer key setting may be useful. RDF types
+can be indexed separately because they are the most common bound term.
+
+Example lookup
+~~~~~~~~~~~~~~
+
+Keys and Triples (should actually be quads but this is a simplified
+version):
+
+-  A: s1 p1 o1
+-  B: s1 p2 o2
+-  C: s2 p3 o1
+-  D: s2 p3 o3
+
+Indices:
+
+-  SK:
+
+   -  s1: A, B
+   -  s2: C, D
+
+-  PK:
+
+   -  p1: A
+   -  p2: B
+   -  p3: C, D
+
+-  OK:
+
+   -  o1: A, C
+   -  o2: B
+   -  o3: D
+
+Queries:
+
+-  s1 ?p ?o → {A, B}
+-  s1 p2 ?o → {A, B} & {B} = {B}
+-  ?s ?p o3 → {D}
+-  s1 p2 o5 → {} (Exit at OK: no term matches ‘o5’)
+-  s2 p3 o2 → {C, D} & {C, D} & {B} = {}
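+
+The same example can be modeled with plain Python sets (a toy illustration
+of the intersection logic above, not the actual store code):
+
+::
+
+    sk = {'s1': {'A', 'B'}, 's2': {'C', 'D'}}
+    pk = {'p1': {'A'}, 'p2': {'B'}, 'p3': {'C', 'D'}}
+    ok = {'o1': {'A', 'C'}, 'o2': {'B'}, 'o3': {'D'}}
+
+    def lookup(s=None, p=None, o=None):
+        """Intersect key sets for each bound term; an empty set means no match."""
+        keys = None
+        for index, term in ((sk, s), (pk, p), (ok, o)):
+            if term is None:
+                continue
+            found = index.get(term, set())
+            keys = found if keys is None else keys & found
+        return keys  # None: all terms unbound, i.e. return everything.
+
+    lookup(s='s1', p='p2')          # {'B'}
+    lookup(o='o3')                  # {'D'}
+    lookup(s='s2', p='p3', o='o2')  # set()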
+
+Strategy #2
+-----------
+
+Separate data and indices in two environments.
+
+Main data store
+~~~~~~~~~~~~~~~
+
+Key to quad; main keyspace; all unique.
+
+Indices
+~~~~~~~
+
+None of these databases is of critical preservation concern. They can be
+rebuilt from the main data store.
+
+All dupsort and dupfixed.
+
+@TODO The first three may not be needed if computing term hash is fast
+enough.
+
+-  t2k (term to term key)
+-  lt2k (literal to term key: longer keys)
+-  k2t (term key to term)
+
+-  s2k (subject key to quad key)
+-  p2k (pred key to quad key)
+-  o2k (object key to quad key)
+-  c2k (context key to quad key)
+
+-  sc2qk (subject + context keys to quad key)
+-  po2qk (predicate + object keys to quad key)
+
+-  sp2qk (subject + predicate keys to quad key)
+-  oc2qk (object + context keys to quad key)
+
+-  so2qk (subject + object keys to quad key)
+-  pc2qk (predicate + context keys to quad key)
+
+Strategy #3
+-----------
+
+Contexts are much fewer (even in graph-per-aspect, 5-10 triples per
+graph).
+
+.. _main-data-store-1:
+
+Main data store
+~~~~~~~~~~~~~~~
+
+Preservation-worthy data
+
+-  tk:t (triple key: triple; dupsort, dupfixed)
+-  tk:c (context key: triple; unique)
+
+.. _indices-1:
+
+Indices
+~~~~~~~
+
+Rebuildable from main data store
+
+-  s2k (subject key: triple key)
+-  p2k (pred key: triple key)
+-  o2k (object key: triple key)
+-  sp2k
+-  so2k
+-  po2k
+-  spo2k
+
+Lookup
+~~~~~~
+
+1. Look up triples by s, p, o, sp, so, po and get keys
+2. If a context is specified, for each key try to seek to (context, key)
+   in ct to verify it exists
+3. Intersect sets
+4. Match triple keys with data using kt
+
+Shortcuts
+^^^^^^^^^
+
+-  Get all contexts: return list of keys from ct
+-  Get all triples for a context: get all values for a context from ct
+   and match triple data with kt
+-  Get one triple match for all contexts: look up in triple indices and
+   match triple data with kt
+
+Strategy #4
+-----------
+
+Terms are entered individually in the main data store. Also, shorter keys
+are used rather than hashes. These two aspects save a great deal of
+space and I/O, but require an additional index to put the terms together
+in a triple.
+
+.. _main-data-store-2:
+
+Main Data Store
+~~~~~~~~~~~~~~~
+
+-  t:st (term key: serialized term; 1:1)
+-  spo:c (joined S, P, O keys: context key; 1:m)
+-  c: (context keys only, values are the empty bytestring)
+
+Storage total: variable
+
+.. _indices-2:
+
+Indices
+~~~~~~~
+
+-  th:t (term hash: term key; 1:1)
+-  c:spo (context key: joined triple keys; 1:m)
+-  s:po (S key: P + O key; 1:m)
+-  p:so (P key: S + O keys; 1:m)
+-  o:sp (O key: S + P keys; 1:m)
+-  sp:o (S + P keys: O key; 1:m)
+-  so:p (S + O keys: P key; 1:m)
+-  po:s (P + O keys: S key; 1:m)
+
+Storage total: 143 bytes per triple
+
+Disadvantages
+~~~~~~~~~~~~~
+
+-  Lots of indices
+-  Terms can get orphaned:
+
+   -  No easy way to know if a term is used anywhere in a quad
+   -  Needs some routine cleanup
+   -  On the other hand, terms are relatively light-weight and can be
+      reused
+   -  Almost surely not reusable are UUIDs, message digests, timestamps
+      etc.
+
+Strategy #5
+-----------
+
+Reduce number of indices and rely on parsing and splitting keys to find
+triples with two bound parameters.
+
+This is especially important for keeping indexing synchronous to achieve
+fully ACID writes.
+
+.. _main-data-store-3:
+
+Main data store
+~~~~~~~~~~~~~~~
+
+Same as Strategy #4:
+
+-  t:st (term key: serialized term; 1:1)
+-  spo:c (joined S, P, O keys: context key; dupsort, dupfixed)
+-  c: (context keys only, values are the empty bytestring; 1:1)
+
+Storage total: variable (same as #4)
+
+.. _indices-3:
+
+Indices
+~~~~~~~
+
+-  th:t (term hash: term key; 1:1)
+-  s:po (S key: joined P, O keys; dupsort, dupfixed)
+-  p:so (P key: joined S, O keys; dupsort, dupfixed)
+-  o:sp (O key: joined S, P keys; dupsort, dupfixed)
+-  c:spo (context → triple association; dupsort, dupfixed)
+
+Storage total: 95 bytes per triple
+
+Lookup strategy
+~~~~~~~~~~~~~~~
+
+-  ? ? ? c: [c:spo] all SPO for C → split key → [t:st] term from term
+   key
+-  s p o c: [c:spo] exact SPO & C match → split key → [t:st] term from
+   term key
+-  s ? ?: [s:po] All PO for S → split key → [t:st] term from term key
+-  s p ?: [s:po] All PO for S → filter result by P in split key → [t:st]
+   term from term key
+
+Advantages
+~~~~~~~~~~
+
+-  Less indices: smaller index size and less I/O
+
+.. _disadvantages-1:
+
+Disadvantages
+~~~~~~~~~~~~~
+
+-  Possibly slower retrieval for queries with 2 bound terms (run
+   metrics)
+
+Further optimization
+~~~~~~~~~~~~~~~~~~~~
+
+In order to minimize traversing and splitting results, the first
+retrieval should be made on the term whose index has the fewest average
+values per key. Search order can be balanced by establishing a lookup
+order for indices.
+
+This can be achieved by calling stats on the index databases and looking
+up the database with *most* keys. Since there is an equal number of
+entries in each of the (s:po, p:so, o:sp) indices, the one with most
+keys will have the least average number of values per key. If that
+lookup is done first, the initial data set to traverse and filter will
+be smaller.
+
+Strategy #5a
+------------
+
+This is a slightly different implementation of #5 that somewhat
+simplifies and perhaps speeds up things a bit. It is the currently
+employed solution.
+
+The indexing and lookup strategy is the same, but instead of using a
+separator byte for splitting compound keys, the logic relies on the fact
+that keys have a fixed length and are sliced instead. This *should*
+result in faster key manipulation, also because in most cases
+``memoryview`` buffers can be used directly instead of being copied from
+memory.
+
+Index storage is 90 bytes per triple.
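+
+A minimal sketch of the slicing approach, assuming (for illustration only)
+5-byte term keys joined into a 15-byte SPO compound key:
+
+::
+
+    KLEN = 5  # Assumed fixed term key length.
+
+    def split_key(compound):
+        """Slice a joined S+P+O key into its fixed-length components."""
+        view = memoryview(compound)
+        return tuple(
+                view[i:i + KLEN] for i in range(0, len(view), KLEN))
+
+    s_key, p_key, o_key = split_key(b'AAAAABBBBBCCCCC')
+    bytes(s_key)  # b'AAAAA'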

+ 30 - 0
docs/messaging.rst

@@ -0,0 +1,30 @@
+LAKEsuperior Messaging
+======================
+
+LAKEsuperior implements a messaging system based on ActivityStreams, as
+indicated by the `Fedora API
+specs <https://fedora.info/2017/06/30/spec/#notifications>`__. The
+metadata set provided is currently quite minimal but can be easily
+enriched by extending the `default formatter
+class <https://github.com/scossu/lakesuperior/blob/master/lakesuperior/messaging/messenger.py>`__.
+
+STOMP is the only supported protocol at the moment. More protocols may
+be made available at a later time.
+
+LAKEsuperior can send messages to any number of destinations: see
+`configuration <https://github.com/scossu/lakesuperior/blob/master/etc.defaults/application.yml#L79>`__.
+By default, CoilMQ is provided for testing purposes and listens to
+``localhost:61613``. The default route sends messages to
+``/topic/fcrepo``.
+
+A small command-line utility, also provided with the Python
+dependencies, allows you to watch incoming messages. To monitor messages,
+enter the following *after activating your virtualenv*:
+
+::
+
+    stomp -H localhost -P 61613 -L /topic/fcrepo
+
+See the `stomp.py library reference
+page <https://github.com/jasonrbriggs/stomp.py/wiki/Command-Line-Access>`__
+for details.
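+
+The same can be done programmatically. A minimal listener sketch using the
+stomp.py 4.x API (signatures differ slightly in later versions), assuming
+the default CoilMQ broker and route described above:
+
+::
+
+    import time
+    import stomp
+
+    class DebugListener(stomp.ConnectionListener):
+        def on_message(self, headers, body):
+            print(body)
+
+    conn = stomp.Connection([('localhost', 61613)])
+    conn.set_listener('debug', DebugListener())
+    conn.start()
+    conn.connect(wait=True)
+    conn.subscribe(destination='/topic/fcrepo', id=1, ack='auto')
+    time.sleep(60)  # Print incoming messages for one minute.
+    conn.disconnect()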

+ 65 - 0
docs/migration.rst

@@ -0,0 +1,65 @@
+Migration, Backup & Restore
+===========================
+
+All LAKEsuperior data is by default fully contained in a folder. This
+means that only the data, configurations and code folders are needed for
+it to run. No Postgres, Redis, or such. Data and configuration folders
+can be moved around as needed.
+
+Migration Tool
+--------------
+
+Migration is the process of importing and converting data from a
+different Fedora or LDP implementation into a new LAKEsuperior instance.
+This process uses the HTTP/LDP API of the original repository. A
+command-line utility is available as part of the ``lsup-admin`` suite to
+assist in such an operation.
+
+A repository can be migrated with a one-line command such as:
+
+::
+
+    ./lsup-admin migrate http://source-repo.edu/rest /local/dest/folder
+
+For more options, enter
+
+::
+
+    ./lsup-admin migrate --help
+
+The script will crawl through the resources and follow outbound links
+within them. In order to do this, resources are added as raw triples
+(i.e. no consistency checks are made).
+
+**Note:** the consistency check tool has not been implemented yet, but
+its release should follow shortly. It will ensure that all links between
+resources are consistent in regard to referential integrity.
+
+This script will create a full dataset in the specified destination
+folder, complete with a default configuration that allows the
+LAKEsuperior server to be started immediately after the migration is
+complete.
+
+Two approaches to migration are possible:
+
+1. By providing a starting point on the source repository. E.g. if the
+   repository you want to migrate is at ``http://repo.edu/rest/prod``
+   you can add the ``-s /prod`` option to the script to avoid migrating
+   irrelevant branches. Note that the script will still reach outside of
+   the starting point if resources are referencing other resources
+   outside of it.
+2. By providing a file containing a list of resources to migrate. This
+   is useful if a source repository cannot produce a full list (e.g. the
+   root node has more children than the server can handle) but a list of
+   individual resources is available via an external index (Solr,
+   triplestore, etc.). The resources can be indicated by their fully
+   qualified URIs or paths relative to the repository root. (*TODO
+   latter option needs testing*)
+
+Backup And Restore
+------------------
+
+A backup of a LAKEsuperior repository consists in copying the RDF and
+non-RDF data folders. These folders are indicated in the application
+configuration. The default commands provided by your OS (``cp``,
+``rsync``, ``tar`` etc. for Unix) are all that is needed.

+ 66 - 0
docs/model.rst

@@ -0,0 +1,66 @@
+LAKEsuperior Content Model Rationale
+====================================
+
+Internal and Public URIs; Identifiers
+-------------------------------------
+
+Resource URIs are stored internally in LAKEsuperior as domain-agnostic
+URIs with the scheme ``info:fcres<resource UID>``. This allows resources
+to be portable across systems. E.g. a resource with an internal URI of
+``info:fcres/a/b/c``, when accessed via the
+``http://localhost:8000/ldp`` endpoint, will be found at
+``http://localhost:8000/ldp/a/b/c``.
+
+The resource UID making up this URI looks like a UNIX filesystem path,
+i.e. it always starts with a forward slash and can be made up of
+multiple segments separated by slashes. E.g. ``/`` is the root node UID,
+``/a`` is a resource UID just below root. Their internal URIs are
+``info:fcres/`` and ``info:fcres/a`` respectively.
+
+In the Python API, the UID and internal URI of an LDP resource can be
+accessed via the ``uid`` and ``uri`` properties respectively:
+
+::
+
+    >>> import lakesuperior.env_setup
+    >>> from lakesuperior.api import resource
+    >>> rsrc = resource.get('/a/b/c')
+    >>> rsrc.uid
+    '/a/b/c'
+    >>> rsrc.uri
+    rdflib.term.URIRef('info:fcres/a/b/c')
+
+Store Layout
+------------
+
+One of the key concepts in LAKEsuperior is the store layout. This is a
+module built with a specific purpose in mind, i.e. allowing fine-grained
+recording of provenance metadata while providing reasonable performance.
+
+Store layout modules could be replaceable (work needs to be done to
+develop an interface to allow that). The default (and only at the
+moment) layout shipped with LAKEsuperior is the `resource-centric
+layout <../../lakesuperior/store/ldp_rs/rsrc_centric_layout>`__. This
+layout implements a so-called `graph-per-aspect
+pattern <http://patterns.dataincubator.org/book/graph-per-aspect.html>`__
+which stores different sets of statements about a resource in separate
+named graphs.
+
+The named graphs used for each resource are:
+
+-  An admin graph (``info:fcsystem/graph/admin<resource UID>``) which
+   stores administrative metadata, mostly server-managed triples such as
+   LDP types, system create/update timestamps and agents, etc.
+-  A structure graph (``info:fcsystem/graph/structure<resource UID>``)
+   reserved for containment triples. The reason for this separation is
+   purely convenience, since it makes it easy to retrieve all the
+   properties of a large container without its child references.
+-  One (and, possibly, in the future, more user-defined) named graph for
+   user-provided data
+   (``info:fcsystem/graph/userdata/_main<resource UID>``).
+
+Each of these graphs can be annotated with provenance metadata. The
+layout decides which triples go in which graph based on the predicate or
+RDF type contained in the triple. Adding logic to support arbitrary
+named graphs based e.g. on user agent, or to add more provenance
+information, should be relatively simple.
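+
+For illustration, the named graph URIs for a resource with UID ``/a/b/c``
+can be derived as follows (a sketch, not the layout module's actual code):
+
+::
+
+    from rdflib import URIRef
+
+    uid = '/a/b/c'
+    admin_gr = URIRef('info:fcsystem/graph/admin' + uid)
+    struct_gr = URIRef('info:fcsystem/graph/structure' + uid)
+    user_gr = URIRef('info:fcsystem/graph/userdata/_main' + uid)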

+ 0 - 0
doc/notes/TODO → docs/notes/TODO.historic


+ 131 - 0
docs/performance.rst

@@ -0,0 +1,131 @@
+Performance Benchmark Report
+============================
+
+Environment
+-----------
+
+Hardware
+~~~~~~~~
+
+‘Rather Snappy’ Laptop
+^^^^^^^^^^^^^^^^^^^^^^
+
+-  Dell Precison M3800 Laptop
+-  4x Intel(R) Core(TM) i7-4712HQ CPU @ 2.30GHz
+-  12Gb RAM
+-  SSD
+
+‘Ole Workhorse’ server
+^^^^^^^^^^^^^^^^^^^^^^
+
+-  8x Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
+-  16Gb RAM
+-  Magnetic drive, XXX RPM
+
+Software
+~~~~~~~~
+
+-  Arch Linux OS
+-  glibc 2.26-11
+-  python 3.5.4
+-  lmdb 0.9.21-1
+
+Benchmark script
+~~~~~~~~~~~~~~~~
+
+`Generator script <../../util/benchmark.py>`__
+
+The script was run with default values: 10,000 children under the same
+parent, PUT requests.
+
+Data Set
+~~~~~~~~
+
+Synthetic graph created by the benchmark script. The graph is unique for
+each request and consists of 200 triples which are partly random data,
+with a consistent size and variation:
+
+-  50 triples have an object that is a URI of an external resource (50
+   unique predicates; 5 unique objects).
+-  50 triples have an object that is a URI of a repository-managed
+   resource (50 unique predicates; 5 unique objects).
+-  100 triples have an object that is a 64-character random Unicode
+   string (50 unique predicates; 100 unique objects).
+
+Results
+-------
+
+.. _rather-snappy-laptop-1:
+
+‘Rather Snappy’ Laptop
+~~~~~~~~~~~~~~~~~~~~~~
+
+FCREPO/Modeshape 4.7.5
+^^^^^^^^^^^^^^^^^^^^^^
+
+15’45" running time
+
+0.094" per resource (100%—reference point)
+
+3.4M triples total in repo at the end of the process
+
+Retrieval of parent resource (~10000 triples), pipe to /dev/null: 3.64"
+(100%)
+
+Peak memory usage: 2.47Gb
+
+Database size: 3.3 Gb
+
+LAKEsuperior Alpha 6, LMDB Back End
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+25’ running time
+
+0.152" per resource (161%)
+
+*Some gaps every ~40-50 requests, probably disk flush*
+
+Retrieval of parent resource (10K triples), pipe to /dev/null: 2.13"
+(58%)
+
+Peak memory usage: ~650 Mb (3 idle workers, 1 active)
+
+Database size: 523 Mb (16%)
+
+.. _ole-workhorse-server-1:
+
+‘Ole Workhorse’ server
+~~~~~~~~~~~~~~~~~~~~~~
+
+FCREPO
+^^^^^^
+
+0:47:38 running time
+
+0.285" per resource (100%)
+
+Retrieval of parent resource: 9.6" (100%)
+
+LAKEsuperior
+^^^^^^^^^^^^
+
+1:14:19 running time
+
+0.446" per resource (156%)
+
+Retrieval of parent resource: 5.58" (58%)
+
+Conclusions
+-----------
+
+LAKEsuperior appears to be markedly slower on writes and markedly faster
+on reads. Both these factors are very likely related to the underlying
+LMDB store which is optimized for read performance.
+
+Comparison of results between the laptop and the server demonstrates
+that both read and write performance gaps are identical in the two
+environments. Disk speed severely affects the numbers.
+
+**Note:** As you can guess, these are only very partial and specific
+results. They should not be taken as a thorough performance assessment.
+Such an assessment may be impossible and pointless to make given the
+very different nature of the storage models, which may behave radically
+differently depending on many variables.

+ 90 - 0
docs/setup.rst

@@ -0,0 +1,90 @@
+Installation & Configuration
+============================
+
+Quick Install: Running in Docker
+--------------------------------
+
+You can run LAKEsuperior in Docker for a hands-off quickstart.
+
+`Docker <http://docker.com/>`__ is a containerization platform that
+allows you to run services in lightweight virtual machine environments
+without having to worry about installing all of the prerequisites on
+your host machine.
+
+1. Install the correct `Docker Community
+   Edition <https://www.docker.com/community-edition>`__ for your
+   operating system.
+2. Clone the LAKEsuperior git repository:
+   ``git clone https://github.com/scossu/lakesuperior.git``
+3. ``cd`` into repo folder
+4. Run ``docker-compose up``
+
+LAKEsuperior should now be available at ``http://localhost:8000/``.
+
+The provided Docker configuration includes persistent storage as a
+self-contained Docker volume, meaning your data will persist between
+runs. If you want to clear the decks, simply run
+``docker-compose down -v``.
+
+Manual Install (a bit less quick, a bit more power)
+---------------------------------------------------
+
+**Note:** These instructions have been tested on Linux. They may work on
+Darwin with little modification, and possibly on Windows with some
+modifications. Feedback is welcome.
+
+Dependencies
+~~~~~~~~~~~~
+
+1. Python 3.5 or greater.
+2. A message broker supporting the STOMP protocol. For testing and
+   evaluation purposes, `CoilMQ <https://github.com/hozn/coilmq>`__ is
+   included with the dependencies and should be automatically installed.
+
+Installation steps
+~~~~~~~~~~~~~~~~~~
+
+1. Create a virtualenv in a project folder:
+   ``virtualenv -p <python 3.5+ exec path> <virtualenv folder>``
+2. Activate the virtualenv: ``source <path_to_virtualenv>/bin/activate``
+3. Clone this repo:
+   ``git clone https://github.com/scossu/lakesuperior.git``
+4. ``cd`` into repo folder
+5. Install dependencies: ``pip install -r requirements.txt``
+6. Start your STOMP broker, e.g.: ``coilmq &``. If you have another
+   queue manager listening to port 61613 you can either configure a
+   different port in the application configuration, or use the existing
+   message queue.
+7. Run ``./lsup-admin bootstrap`` to initialize the binary and graph
+   stores
+8. Run ``./fcrepo``.
+
+Configuration
+-------------
+
+The app should run for testing and evaluation purposes without any
+further configuration. All the application data are stored by default in
+the ``data`` directory.
+
+To change the default configuration you should:
+
+1. Copy the ``etc.skeleton`` folder to a separate location
+2. Set the configuration folder location in the environment:
+   ``export FCREPO_CONFIG_DIR=<your config dir location>`` (you can add
+   this line at the end of your virtualenv ``activate`` script)
+3. Configure the application
+4. Bootstrap the app or copy the original data folders to the new
+   location if any location options changed
+5. (Re)start the server: ``./fcrepo``
+
+The configuration options are documented in the files.
+
+**Note:** ``test.yml`` must specify a different location for the graph
+and for the binary stores than the default one, otherwise running a test
+suite will destroy your main data store. The application will issue an
+error message and refuse to start if these locations overlap.
+
+Production deployment
+---------------------
+
+If you like fried repositories for lunch, deploy before 11AM.

+ 0 - 0
doc/src/lakesuperior_arch.graphml → docs/src/lakesuperior_arch.graphml


+ 0 - 0
doc/src/lakesuperior_content_model.graphml → docs/src/lakesuperior_content_model.graphml


+ 0 - 0
doc/src/lakesuperior_recommendations.md → docs/src/lakesuperior_recommendations.md


+ 0 - 0
doc/src/template.latex → docs/src/template.latex


+ 0 - 0
doc/src/use_cases_transactions.md → docs/src/use_cases_transactions.md


+ 94 - 0
docs/storage.rst

@@ -0,0 +1,94 @@
+Storage Implementation
+======================
+
+LAKEsuperior stores non-RDF (“binary”) data in the filesystem and RDF
+data in an embedded key-value store, `LMDB <https://symas.com/lmdb/>`__.
+
+RDF Storage design
+------------------
+
+LMDB is a very fast, very lightweight C library. It is inspired by
+BerkeleyDB but introduces significant improvements in terms of
+efficiency and stability.
+
+The LAKEsuperior RDF store consists of two files: the main data store
+and the indices (plus two lock files that are generated at runtime). A
+good amount of effort has been put into developing an indexing strategy
+that is balanced between write performance, read performance, and data
+size, with no compromise made on consistency.
+
+The main data store is the one containing the preservation-worthy data.
+While the indices are necessary for LAKEsuperior to function, they can
+be entirely rebuilt from the main data store in case of file corruption
+(recovery tools are on the TODO list).
+
+Detailed notes about the various strategies researched can be found
+:doc:`here <indexing_strategy>`.
+
+Scalability
+-----------
+
+Since LAKEsuperior is focused on design simplicity, efficiency and
+reliability, its RDF store is embedded and not horizontally scalable.
+However, LAKEsuperior is quite frugal with disk space. About 55 million
+triples can be stored in 8Gb of space (mileage can vary depending on how
+heterogeneous the triples are). This makes it easier to use expensive
+SSD drives for the RDF store, in order to improve performance. A single
+LMDB environment can reportedly scale up to 128 terabytes.
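+
+As a rough sanity check using the figures above: 8 Gb divided by 55 million
+triples comes to roughly 150 bytes per triple overall, which is consistent
+with the ~90 bytes per triple of index storage plus the variable-size main
+data store.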
+
+Maintenance
+-----------
+
+LMDB has a very simple configuration, and all options are hardcoded in
+LAKEsuperior in order to exploit its features. A database automatically
+recovers from a crash.
+
+The LAKEsuperior RDF store abstraction maintains a registry of unique
+terms. These terms are not deleted if a triple is deleted, even if no
+triple is using them, because it would be too expensive to look up
+orphaned terms during a delete request. While these terms are relatively
+lightweight, it would be good to run a periodical clean-up job. Tools
+will be developed in the near future to facilitate this maintenance
+task.
+
+Consistency
+-----------
+
+LAKEsuperior wraps each LDP operation in a transaction. The indices are
+updated synchronously within the same transaction in order to guarantee
+consistency. If a system loses power or crashes, only the last
+transaction is lost, and the last successful write will include primary
+and index data.
+
+Concurrency
+-----------
+
+LMDB employs
+`MVCC <https://en.wikipedia.org/wiki/Multiversion_concurrency_control>`__
+to achieve fully ACID transactions. This implies that during a write,
+the whole database is locked. Multiple writes can be initiated
+concurrently, but the performance gain of doing so may be little because
+only one write operation can be performed at a time. Reasonable efforts
+have been put to make write transactions as short as possible (and more
+can be done). Also, this excludes a priori the option to implement
+long-running atomic operations, unless one is willing to block writes on
+the application for an indefinite length of time. On the other hand,
+write operations never block and are never blocked, so an application
+with a high read-to-write ratio may still benefit from multi-threaded
+requests.
+
+Performance
+-----------
+
+The :doc:`Performance Benchmark Report <performance>` contains benchmark
+results.
+
+Write performance is lower than Modeshape/Fedora4; this may be mostly
+due to the fact that indices are written synchronously in a blocking
+transaction; also, the LMDB B+Tree structure is optimized for read
+performance rather than write performance. Some optimizations on the
+application layer could be made.
+
+Reads are faster than Modeshape/Fedora.
+
+All tests so far have been performed in a single thread.

+ 9 - 8
lakesuperior/api/query.py

@@ -14,17 +14,18 @@ rdf_store = env.app_globals.rdf_store
 
 
 def sparql_query(qry_str, fmt):
-    '''
+    """
     Send a SPARQL query to the triplestore.
 
-    @param qry_str (str) SPARQL query string. SPARQL 1.1 Query Language
-    (https://www.w3.org/TR/sparql11-query/) is supported.
-    @param fmt(string) Serialization format. This varies depending on the
-    query type (SELECT, ASK, CONSTRUCT, etc.). [@TODO Add reference to RDFLib
-    serialization formats]
+    :param str qry_str: SPARQL query string. SPARQL 1.1 Query Language
+        (https://www.w3.org/TR/sparql11-query/) is supported.
+    :param str fmt: Serialization format. This varies depending on the
+        query type (SELECT, ASK, CONSTRUCT, etc.). [TODO Add reference to
+        RDFLib serialization formats]
 
-    @return BytesIO
-    '''
+    :rtype: BytesIO
+    :return: Serialized SPARQL results.
+    """
     with TxnManager(rdf_store) as txn:
         qres = rdfly.raw_query(qry_str)
         out_stream = BytesIO(qres.serialize(format=fmt))

+ 65 - 62
lakesuperior/api/resource.py

@@ -21,7 +21,7 @@ from lakesuperior.store.ldp_rs.lmdb_store import TxnManager
 
 logger = logging.getLogger(__name__)
 
-__doc__ = '''
+__doc__ = """
 Primary API for resource manipulation.
 
 Quickstart:
@@ -54,10 +54,10 @@ Quickstart:
  (rdflib.term.URIRef('info:fcres/'),
   rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
   rdflib.term.URIRef('http://www.w3.org/ns/ldp#RDFSource'))}
-'''
+"""
 
 def transaction(write=False):
-    '''
+    """
     Handle atomic operations in a store.
 
     This wrapper ensures that a write operation is performed atomically. It
@@ -66,7 +66,7 @@ def transaction(write=False):
 
     ALL write operations on the LDP-RS and LDP-NR stores go through this
     wrapper.
-    '''
+    """
     def _transaction_deco(fn):
         @wraps(fn)
         def _wrapper(*args, **kwargs):
@@ -87,9 +87,9 @@ def transaction(write=False):
 
 
 def process_queue():
-    '''
+    """
     Process the message queue on a separate thread.
-    '''
+    """
     lock = Lock()
     lock.acquire()
     while len(env.app_globals.changelog):
@@ -98,14 +98,14 @@ def process_queue():
 
 
 def send_event_msg(remove_trp, add_trp, metadata):
-    '''
+    """
     Send messages about a changed LDPR.
 
     A single LDPR message packet can contain multiple resource subjects, e.g.
     if the resource graph contains hash URIs or even other subjects. This
     method groups triples by subject and sends a message for each of the
     subjects found.
-    '''
+    """
     # Group delta triples by subject.
     remove_grp = groupby(remove_trp, lambda x : x[0])
     remove_dict = {k[0]: k[1] for k in remove_grp}
@@ -123,11 +123,11 @@ def send_event_msg(remove_trp, add_trp, metadata):
 
 @transaction()
 def exists(uid):
-    '''
+    """
     Return whether a resource exists (is stored) in the repository.
 
-    @param uid (string) Resource UID.
-    '''
+    :param string uid: Resource UID.
+    """
     try:
         exists = LdpFactory.from_stored(uid).is_stored
     except ResourceNotExistsError:
@@ -137,31 +137,32 @@ def exists(uid):
 
 @transaction()
 def get_metadata(uid):
-    '''
+    """
     Get metadata (admin triples) of an LDPR resource.
 
-    @param uid (string) Resource UID.
-    '''
+    :param string uid: Resource UID.
+    """
     return LdpFactory.from_stored(uid).metadata
 
 
 @transaction()
 def get(uid, repr_options={}):
-    '''
+    """
     Get an LDPR resource.
 
     The resource comes preloaded with user data and metadata as indicated by
     the `repr_options` argument. Any further handling of this resource is done
     outside of a transaction.
 
-    @param uid (string) Resource UID.
-    @param repr_options (dict(bool)) Representation options. This is a dict
-    that is unpacked downstream in the process. The default empty dict results
-    in default values. The accepted dict keys are:
+    :param string uid: Resource UID.
+    :param dict repr_options: Representation options. This is a dict
+        that is unpacked downstream in the process. The default empty dict
+        results in default values. The accepted dict keys are:
+
     - incl_inbound: include inbound references. Default: False.
     - incl_children: include children URIs. Default: True.
     - embed_children: Embed full graph of all child resources. Default: False
-    '''
+    """
     rsrc = LdpFactory.from_stored(uid, repr_options)
     # Load graph before leaving the transaction.
     rsrc.imr
@@ -171,23 +172,23 @@ def get(uid, repr_options={}):
 
 @transaction()
 def get_version_info(uid):
-    '''
+    """
     Get version metadata (fcr:versions).
-    '''
+    """
     return LdpFactory.from_stored(uid).version_info
 
 
 @transaction()
 def get_version(uid, ver_uid):
-    '''
+    """
     Get version metadata (fcr:versions).
-    '''
+    """
     return LdpFactory.from_stored(uid).get_version(ver_uid)
 
 
 @transaction(True)
 def create(parent, slug, **kwargs):
-    '''
+    r"""
     Mint a new UID and create a resource.
 
     The UID is computed from a given parent UID and a "slug", a proposed path
@@ -195,14 +196,14 @@ def create(parent, slug, **kwargs):
     path but it may use a different one if a conflict with an existing resource
     arises.
 
-    @param parent (string) UID of the parent resource.
-    @param slug (string) Tentative path relative to the parent UID.
-    @param **kwargs Other parameters are passed to the
-    LdpFactory.from_provided method. Please see the documentation for that
-    method for explanation of individual parameters.
+    :param str parent: UID of the parent resource.
+    :param str slug: Tentative path relative to the parent UID.
+    :param \*\*kwargs: Other parameters are passed to the
+      :meth:`LdpFactory.from_provided` method.
 
-    @return string UID of the new resource.
-    '''
+    :rtype: str
+    :return: UID of the new resource.
+    """
     uid = LdpFactory.mint_uid(parent, slug)
     logger.debug('Minted UID for new resource: {}'.format(uid))
     rsrc = LdpFactory.from_provided(uid, **kwargs)
@@ -214,7 +215,7 @@ def create(parent, slug, **kwargs):
 
 @transaction(True)
 def create_or_replace(uid, stream=None, **kwargs):
-    '''
+    r"""
     Create or replace a resource with a specified UID.
 
     If the resource already exists, all user-provided properties of the
@@ -222,15 +223,15 @@ def create_or_replace(uid, stream=None, **kwargs):
     content is empty, an exception is raised (not sure why, but that's how
     FCREPO4 handles it).
 
-    @param uid (string) UID of the resource to be created or updated.
-    @param stream (BytesIO) Content stream. If empty, an empty container is
-    created.
-    @param **kwargs Other parameters are passed to the
-    LdpFactory.from_provided method. Please see the documentation for that
-    method for explanation of individual parameters.
+    :param string uid: UID of the resource to be created or updated.
+    :param BytesIO stream: Content stream. If empty, an empty container is
+        created.
+    :param \*\*kwargs: Other parameters are passed to the
+        :meth:`LdpFactory.from_provided` method.
 
-    @return string Event type: whether the resource was created or updated.
-    '''
+    :rtype: str
+    :return: Event type: whether the resource was created or updated.
+    """
     rsrc = LdpFactory.from_provided(uid, stream=stream, **kwargs)
 
     if not stream and rsrc.is_stored:
@@ -242,14 +243,15 @@ def create_or_replace(uid, stream=None, **kwargs):
 
 @transaction(True)
 def update(uid, update_str, is_metadata=False):
-    '''
+    """
     Update a resource with a SPARQL-Update string.
 
-    @param uid (string) Resource UID.
-    @param update_str (string) SPARQL-Update statements.
-    @param is_metadata (bool) Whether the resource metadata is being updated.
-    If False, and the resource being updated is a LDP-NR, an error is raised.
-    '''
+    :param string uid: Resource UID.
+    :param string update_str: SPARQL-Update statements.
+    :param bool is_metadata: Whether the resource metadata is being updated.
+        If False, and the resource being updated is a LDP-NR, an error is
+        raised.
+    """
     rsrc = LdpFactory.from_stored(uid)
     if LDP_NR_TYPE in rsrc.ldp_types and not is_metadata:
         raise InvalidResourceError(uid)
@@ -261,28 +263,29 @@ def update(uid, update_str, is_metadata=False):
 
 @transaction(True)
 def create_version(uid, ver_uid):
-    '''
+    """
     Create a resource version.
 
-    @param uid (string) Resource UID.
-    @param ver_uid (string) Version UID to be appended to the resource URI.
-    NOTE: this is a "slug", i.e. the version URI is not guaranteed to be the
-    one indicated.
+    :param string uid: Resource UID.
+    :param string ver_uid: Version UID to be appended to the resource URI.
+      NOTE: this is a "slug", i.e. the version URI is not guaranteed to be the
+      one indicated.
 
-    @return string Version UID.
-    '''
+    :rtype: str
+    :return: Version UID.
+    """
     return LdpFactory.from_stored(uid).create_version(ver_uid)
 
 
 @transaction(True)
 def delete(uid, soft=True):
-    '''
+    """
     Delete a resource.
 
-    @param uid (string) Resource UID.
-    @param soft (bool) Whether to perform a soft-delete and leave a
-    tombstone resource, or wipe any memory of the resource.
-    '''
+    :param string uid: Resource UID.
+    :param bool soft: Whether to perform a soft-delete and leave a
+      tombstone resource, or wipe any memory of the resource.
+    """
     # If referential integrity is enforced, grab all inbound relationships
     # to break them.
     refint = env.app_globals.rdfly.config['referential_integrity']
@@ -314,9 +317,9 @@ def delete(uid, soft=True):
 
 @transaction(True)
 def resurrect(uid):
-    '''
+    """
     Reinstate a buried (soft-deleted) resource.
 
-    @param uid (string) Resource UID.
-    '''
+    :param str uid: Resource UID.
+    """
     return LdpFactory.from_stored(uid).resurrect_rsrc()

+ 2 - 2
lakesuperior/config_parser.py

@@ -19,8 +19,8 @@ def parse_config(config_dir=None):
     ``etc.defaults``.
 
     :param config_dir: Location on the filesystem of the configuration
-    directory. The default is set by the ``FCREPO_CONFIG_DIR`` environment
-    variable or, if this is not set, the ``etc.defaults`` stock directory.
+        directory. The default is set by the ``FCREPO_CONFIG_DIR`` environment
+        variable or, if this is not set, the ``etc.defaults`` stock directory.
     """
     configs = (
         'application',

+ 8 - 8
lakesuperior/endpoints/admin.py

@@ -13,21 +13,21 @@ admin = Blueprint('admin', __name__)
 
 @admin.route('/stats', methods=['GET'])
 def stats():
-    '''
+    """
     Get repository statistics.
-    '''
+    """
     def fsize_fmt(num, suffix='b'):
-        '''
+        """
         Format an integer into 1024-block file size format.
 
         Adapted from Python 2 code on
         https://stackoverflow.com/a/1094933/3758232
 
-        @param num (int) Size value in bytes.
-        @param suffix (string) Suffix label (defaults to `B`).
+        :param int num: Size value in bytes.
+        :param string suffix: Suffix label (defaults to ``'b'``).
 
         @return string Formatted size to largest fitting unit.
-        '''
+        """
         for unit in ['','K','M','G','T','P','E','Z']:
             if abs(num) < 1024.0:
                 return "{:3.1f} {}{}".format(num, unit, suffix)
@@ -42,9 +42,9 @@ def stats():
 
 @admin.route('/tools', methods=['GET'])
 def admin_tools():
-    '''
+    """
     Admin tools.
 
     @TODO stub.
-    '''
+    """
     return render_template('admin_tools.html')

+ 41 - 41
lakesuperior/endpoints/ldp.py

@@ -62,7 +62,7 @@ std_headers = {
     #'Allow' : ','.join(allow),
 }
 
-'''Predicates excluded by view.'''
+"""Predicates excluded by view."""
 vw_blacklist = {
 }
 
@@ -112,17 +112,17 @@ def log_request_end(rsp):
 @ldp.route('/<path:uid>/fcr:content', defaults={'out_fmt' : 'non_rdf'},
         methods=['GET'])
 def get_resource(uid, out_fmt=None):
-    '''
+    """
     https://www.w3.org/TR/ldp/#ldpr-HTTP_GET
 
     Retrieve RDF or binary content.
 
-    @param uid (string) UID of resource to retrieve. The repository root has
+    :param str uid: UID of resource to retrieve. The repository root has
     an empty string for UID.
-    @param out_fmt (string) Force output to RDF or non-RDF if the resource is
+    :param str out_fmt: Force output to RDF or non-RDF if the resource is
     a LDP-NR. This is not available in the API but is used e.g. by the
     `*/fcr:metadata` and `*/fcr:content` endpoints. The default is False.
-    '''
+    """
     logger.info('UID: {}'.format(uid))
     out_headers = std_headers
     repr_options = defaultdict(dict)
@@ -169,9 +169,9 @@ def get_resource(uid, out_fmt=None):
 
 @ldp.route('/<path:uid>/fcr:versions', methods=['GET'])
 def get_version_info(uid):
-    '''
+    """
     Get version info (`fcr:versions`).
-    '''
+    """
     try:
         gr = rsrc_api.get_version_info(uid)
     except ResourceNotExistsError as e:
@@ -186,12 +186,12 @@ def get_version_info(uid):
 
 @ldp.route('/<path:uid>/fcr:versions/<ver_uid>', methods=['GET'])
 def get_version(uid, ver_uid):
-    '''
+    """
     Get an individual resource version.
 
-    @param uid (string) Resource UID.
-    @param ver_uid (string) Version UID.
-    '''
+    :param str uid: Resource UID.
+    :param str ver_uid: Version UID.
+    """
     try:
         gr = rsrc_api.get_version(uid, ver_uid)
     except ResourceNotExistsError as e:
@@ -208,11 +208,11 @@ def get_version(uid, ver_uid):
 @ldp.route('/', defaults={'parent_uid': '/'}, methods=['POST'],
         strict_slashes=False)
 def post_resource(parent_uid):
-    '''
+    """
     https://www.w3.org/TR/ldp/#ldpr-HTTP_POST
 
     Add a new resource in a new URI.
-    '''
+    """
     out_headers = std_headers
     try:
         slug = request.headers['Slug']
@@ -261,11 +261,11 @@ def post_resource(parent_uid):
 @ldp.route('/<path:uid>/fcr:metadata', defaults={'force_rdf' : True},
         methods=['PUT'])
 def put_resource(uid):
-    '''
+    """
     https://www.w3.org/TR/ldp/#ldpr-HTTP_PUT
 
     Add or replace a new resource at a specified URI.
-    '''
+    """
     # Parse headers.
     logger.debug('Request headers: {}'.format(request.headers))
 
@@ -310,11 +310,11 @@ def put_resource(uid):
 
 @ldp.route('/<path:uid>', methods=['PATCH'], strict_slashes=False)
 def patch_resource(uid, is_metadata=False):
-    '''
+    """
     https://www.w3.org/TR/ldp/#ldpr-HTTP_PATCH
 
     Update an existing resource with a SPARQL-UPDATE payload.
-    '''
+    """
     rsp_headers = {'Content-Type' : 'text/plain; charset=utf-8'}
     if request.mimetype != 'application/sparql-update':
         return 'Provided content type is not a valid parsable format: {}'\
@@ -344,7 +344,7 @@ def patch_resource_metadata(uid):
 
 @ldp.route('/<path:uid>', methods=['DELETE'])
 def delete_resource(uid):
-    '''
+    """
     Delete a resource and optionally leave a tombstone.
 
     This behaves differently from FCREPO. A tombstone indicates that the
@@ -355,7 +355,7 @@ def delete_resource(uid):
     In order to completely wipe out all traces of a resource, the tombstone
     must be deleted as well, or the `Prefer:no-tombstone` header can be used.
     The latter will forget (completely delete) the resource immediately.
-    '''
+    """
     headers = std_headers
 
     if 'prefer' in request.headers:
@@ -377,12 +377,12 @@ def delete_resource(uid):
 @ldp.route('/<path:uid>/fcr:tombstone', methods=['GET', 'POST', 'PUT',
         'PATCH', 'DELETE'])
 def tombstone(uid):
-    '''
+    """
     Handle all tombstone operations.
 
     The only allowed methods are POST and DELETE; any other verb will return a
     405.
-    '''
+    """
     try:
         rsrc = rsrc_api.get(uid)
     except TombstoneError as e:
@@ -409,9 +409,9 @@ def tombstone(uid):
 
 @ldp.route('/<path:uid>/fcr:versions', methods=['POST', 'PUT'])
 def post_version(uid):
-    '''
+    """
     Create a new resource version.
-    '''
+    """
     if request.method == 'PUT':
         return 'Method not allowed.', 405
     ver_uid = request.headers.get('slug', None)
@@ -430,14 +430,14 @@ def post_version(uid):
 
 @ldp.route('/<path:uid>/fcr:versions/<ver_uid>', methods=['PATCH'])
 def patch_version(uid, ver_uid):
-    '''
+    """
     Revert to a previous version.
 
     NOTE: This creates a new version snapshot.
 
-    @param uid (string) Resource UID.
-    @param ver_uid (string) Version UID.
-    '''
+    :param str uid: Resource UID.
+    :param str ver_uid: Version UID.
+    """
     try:
         LdpFactory.from_stored(uid).revert_to_version(ver_uid)
     except ResourceNotExistsError as e:
@@ -453,9 +453,9 @@ def patch_version(uid, ver_uid):
 ## PRIVATE METHODS ##
 
 def _negotiate_content(gr, headers=None, **vw_kwargs):
-    '''
+    """
     Return HTML or serialized RDF depending on accept headers.
-    '''
+    """
     if request.accept_mimetypes.best == 'text/html':
         return render_template(
                 'resource.html', gr=gr, nsc=nsc, nsm=nsm,
@@ -467,9 +467,9 @@ def _negotiate_content(gr, headers=None, **vw_kwargs):
 
 
 def _bistream_from_req():
-    '''
+    """
     Find how a binary file and its MIMEtype were uploaded in the request.
-    '''
+    """
     #logger.debug('Content type: {}'.format(request.mimetype))
     #logger.debug('files: {}'.format(request.files))
     #logger.debug('stream: {}'.format(request.stream))
@@ -508,9 +508,9 @@ def _tombstone_response(e, uid):
 
 
 def set_post_put_params():
-    '''
+    """
     Sets handling and content disposition for POST and PUT by parsing headers.
-    '''
+    """
     handling = 'strict'
     if 'prefer' in request.headers:
         prefer = g.tbox.parse_rfc7240(request.headers['prefer'])
@@ -528,10 +528,10 @@ def set_post_put_params():
 
 
 def is_accept_hdr_rdf_parsable():
-    '''
+    """
     Check if any of the 'Accept' header values provided is a RDF parsable
     format.
-    '''
+    """
     for mimetype in request.accept_mimetypes.values():
         if LdpFactory.is_rdf_parsable(mimetype):
             return True
@@ -539,14 +539,14 @@ def is_accept_hdr_rdf_parsable():
 
 
 def parse_repr_options(retr_opts):
-    '''
+    """
     Set options to retrieve IMR.
 
     Ideally, IMR retrieval is done once per request, so all the options
     are set once in the `imr()` property.
 
-    @param retr_opts (dict): Options parsed from `Prefer` header.
-    '''
+    :param dict retr_opts: Options parsed from ``Prefer`` header.
+    """
     logger.debug('Parsing retrieval options: {}'.format(retr_opts))
     imr_options = {}
 
@@ -591,12 +591,12 @@ def parse_repr_options(retr_opts):
 
 
 def _headers_from_metadata(rsrc):
-    '''
+    """
     Create a dict of headers from a metadata graph.
 
-    @param rsrc (lakesuperior.model.ldpr.Ldpr) Resource to extract metadata
+    :param lakesuperior.model.ldpr.Ldpr rsrc: Resource to extract metadata
     from.
-    '''
+    """
     out_headers = defaultdict(list)
 
     digest = rsrc.metadata.value(nsc['premis'].hasMessageDigest)

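For orientation, here is a short client-side sketch exercising the endpoints documented in this module. The host, port and the ``/ldp`` mount prefix are assumptions about a typical local deployment, not values taken from this changeset.

# Hedged usage sketch; adjust the base URL to your deployment.
import requests

base = 'http://localhost:8000/ldp'

# POST to the root container; the Slug header suggests the new resource UID.
rsp = requests.post(base + '/', headers={'Slug': 'my_container'})
print(rsp.status_code, rsp.headers.get('Location'))

# GET the RDF representation, negotiating Turtle output.
print(requests.get(
        base + '/my_container', headers={'Accept': 'text/turtle'}).text)

# GET version info for the resource (the fcr:versions endpoint above).
print(requests.get(base + '/my_container/fcr:versions').status_code)
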
+ 2 - 6
lakesuperior/endpoints/main.py

@@ -13,17 +13,13 @@ main = Blueprint('main', __name__, template_folder='templates',
 
 @main.route('/', methods=['GET'])
 def index():
-    '''
-    Homepage.
-    '''
+    """Homepage."""
     return render_template('index.html')
 
 
 @main.route('/debug', methods=['GET'])
 def debug():
-    '''
-    Debug page.
-    '''
+    """Debug page."""
     raise RuntimeError()
 
 

+ 5 - 5
lakesuperior/endpoints/query.py

@@ -20,9 +20,9 @@ query = Blueprint('query', __name__)
 
 @query.route('/term_search', methods=['GET'])
 def term_search():
-    '''
+    """
     Search by entering a search term and optional property and comparison term.
-    '''
+    """
     valid_operands = (
         ('=', 'Equals'),
         ('>', 'Greater Than'),
@@ -40,11 +40,11 @@ def term_search():
 
 @query.route('/sparql', methods=['GET', 'POST'])
 def sparql():
-    '''
+    """
     Perform a direct SPARQL query on the underlying triplestore.
 
-    @param qry SPARQL query string.
-    '''
+    :param str qry: SPARQL query string.
+    """
     accept_mimetypes = {
         'text/csv': 'csv',
         'application/sparql-results+json': 'json',

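A similarly hedged sketch of calling the SPARQL endpoint above. The base URL and the way the query is passed (as the raw request body) are assumptions; only the Accept MIME types come from the code in this diff.

import requests

qry = 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'
rsp = requests.post(
    'http://localhost:8000/query/sparql',
    data=qry,
    headers={'Accept': 'application/sparql-results+json'})
print(rsp.status_code)
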
+ 14 - 18
lakesuperior/messaging/formatters.py

@@ -8,12 +8,12 @@ from lakesuperior.globals import RES_CREATED, RES_DELETED, RES_UPDATED
 
 
 class BaseASFormatter(metaclass=ABCMeta):
-    '''
+    """
     Format message as ActivityStreams.
 
     This is not really a `logging.Formatter` subclass, but a plain string
     builder.
-    '''
+    """
     ev_types = {
         RES_CREATED : 'Create',
         RES_DELETED : 'Delete',
@@ -28,7 +28,7 @@ class BaseASFormatter(metaclass=ABCMeta):
 
     def __init__(
             self, rsrc_uri, ev_type, timestamp, rsrc_type, actor, data=None):
-        '''
+        """
         Format output according to granularity level.
 
         NOTE: Granularity level does not refer to the logging levels, i.e.
@@ -36,14 +36,14 @@ class BaseASFormatter(metaclass=ABCMeta):
         are logged under the same level. It is rather about *what* gets
         logged in a message.
 
-        @param rsrc_uri (rdflib.URIRef) URI of the resource.
-        @param ev_type (string) one of `create`, `delete` or `update`
-        @param timestamp (string) Timestamp of the event.
-        @param data (tuple(set)) if messaging is configured with `provenance`
+        :param rdflib.URIRef rsrc_uri: URI of the resource.
+        :param str ev_type: one of `create`, `delete` or `update`
+        :param str timestamp: Timestamp of the event.
+        :param tuple(set) data: If messaging is configured with ``provenance``
         level, this is a 2-tuple with one set (as 3-tuples of
         RDFlib.Identifier instances) for removed triples, and one set for
         added triples.
-        '''
+        """
         self.rsrc_uri = rsrc_uri
         self.ev_type = ev_type
         self.timestamp = timestamp
@@ -59,15 +59,13 @@ class BaseASFormatter(metaclass=ABCMeta):
 
 
 class ASResourceFormatter(BaseASFormatter):
-    '''
+    """
     Sends information about a resource being created, updated or deleted, by
     who and when, with no further information about what changed.
-    '''
+    """
 
     def __str__(self):
-        '''
-        Output structured data as string.
-        '''
+        """Output structured data as string."""
         ret = {
             '@context': 'https://www.w3.org/ns/activitystreams',
             'id' : 'urn:uuid:{}'.format(uuid.uuid4()),
@@ -86,15 +84,13 @@ class ASResourceFormatter(BaseASFormatter):
 
 
 class ASDeltaFormatter(BaseASFormatter):
-    '''
+    """
     Sends the same information as `ASResourceFormatter` with the addition of
     the triples that were added and the ones that were removed in the request.
     This may be used to send rich provenance data to a preservation system.
-    '''
+    """
     def __str__(self):
-        '''
-        Output structured data as string.
-        '''
+        """Output structured data as string."""
         ret = {
             '@context': 'https://www.w3.org/ns/activitystreams',
             'id' : 'urn:uuid:{}'.format(uuid.uuid4()),

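A minimal sketch of building an ActivityStreams message with the formatter documented above. The argument values are made up and the expected forms of ``rsrc_type`` and ``actor`` are assumptions; only the parameter order comes from the constructor signature shown in this diff.

from rdflib import URIRef
from lakesuperior.globals import RES_CREATED
from lakesuperior.messaging.formatters import ASResourceFormatter

msg = ASResourceFormatter(
    URIRef('http://localhost:8000/ldp/my_container'),  # rsrc_uri
    RES_CREATED,                                        # ev_type
    '2018-04-06T12:00:00Z',                             # timestamp
    'ldp:Container',                                    # rsrc_type (form assumed)
    'user:admin')                                       # actor (value assumed)
print(str(msg))  # JSON string with an ActivityStreams '@context'.
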
+ 4 - 8
lakesuperior/messaging/handlers.py

@@ -4,13 +4,13 @@ import stomp
 
 
 class StompHandler(logging.Handler):
-    '''
+    """
     Send messages to a remote queue broker using the STOMP protocol.
 
     This module is named and configured separately from
     standard logging for clarity about its scope: while logging has an
     informational purpose, this module has a functional one.
-    '''
+    """
     def __init__(self, conf):
         self.conf = conf
         if self.conf['protocol'] == '11':
@@ -32,15 +32,11 @@ class StompHandler(logging.Handler):
 
 
     def __del_(self):
-        '''
-        Disconnect the client.
-        '''
+        """Disconnect the client."""
         self.conn.disconnect()
 
     def emit(self, record):
-        '''
-        Send the message to the destination endpoint.
-        '''
+        """Send the message to the destination endpoint."""
         self.conn.send(destination=self.conf['destination'],
                 body=bytes(self.format(record), 'utf-8'))
 

+ 6 - 8
lakesuperior/messaging/messenger.py

@@ -7,15 +7,15 @@ messenger = logging.getLogger('_messenger')
 
 
 class Messenger:
-    '''
+    """
     Very simple message sender using the standard Python logging facility.
-    '''
+    """
     def __init__(self, config):
-        '''
+        """
         Set up the messenger.
 
-        @param config (dict) Messenger configuration.
-        '''
+        :param dict config: Messenger configuration.
+        """
         def msg_routes():
             for route in config['routes']:
                 handler_cls = getattr(handlers, route['handler'])
@@ -31,8 +31,6 @@ class Messenger:
 
 
     def send(self, *args, **kwargs):
-        '''
-        Send one or more external messages.
-        '''
+        """Send one or more external messages."""
         for msg, fn in self.msg_routes:
             msg.info(fn(*args, **kwargs))

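A rough sketch of a messenger configuration as implied by ``Messenger.__init__`` above, where each route names a handler class from ``lakesuperior.messaging.handlers``. Apart from ``routes`` and ``handler``, every key below is a guess for illustration; instantiating this would also try to reach a STOMP broker at the given address.

from lakesuperior.messaging.messenger import Messenger

config = {
    'routes': [{
        'handler': 'StompHandler',   # Class name looked up in the handlers module.
        'protocol': '11',            # Read by StompHandler.__init__ above.
        'host': 'localhost',         # Assumed key.
        'port': 61613,               # Assumed key.
        'destination': '/topic/fcrepo',
    }],
}
messenger = Messenger(config)
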
+ 20 - 18
lakesuperior/migrator.py

@@ -75,22 +75,23 @@ class Migrator:
         Set up base paths and clean up existing directories.
 
         :param rdflib.URIRef src: Webroot of source repository. This must
-        correspond to the LDP root node (for Fedora it can be e.g.
-        ``http://localhost:8080fcrepo/rest/``) and is used to determine if URIs
-        retrieved are managed by this repository.
+            correspond to the LDP root node (for Fedora it can be e.g.
+            ``http://localhost:8080/fcrepo/rest/``) and is used to determine if
+            URIs retrieved are managed by this repository.
         :param str dest: Destination repository path. If the location exists
-        it must be a writable directory. It will be deleted and recreated. If
-        it does not exist, it will be created along with its parents if
-        missing.
+            it must be a writable directory. It will be deleted and recreated.
+            If it does not exist, it will be created along with its parents if
+            missing.
         :param str binary_handling: One of ``include``, ``truncate`` or
-        ``split``.
+            ``split``.
         :param bool compact_uris: NOT IMPLEMENTED. Whether the process should
-        attempt to compact URIs generated with broken up path segments. If the
-        UID matches a pattern such as `/12/34/56/123456...` it is converted to
-        `/123456...`. This would remove a lot of cruft caused by the pairtree
-        segments. Note that this will change the publicly exposed URIs. If
-        durability is a concern, a rewrite directive can be added to the HTTP
-        server that proxies the WSGI endpoint.
+            attempt to compact URIs generated with broken up path segments. If
+            the UID matches a pattern such as ``/12/34/56/123456...`` it is
+            converted to ``/123456...``. This would remove a lot of cruft
+            caused by the pairtree segments. Note that this will change the
+            publicly exposed URIs. If durability is a concern, a rewrite
+            directive can be added to the HTTP server that proxies the WSGI
+            endpoint.
         """
         # Set up repo folder structure and copy default configuration to
         # destination file.
@@ -137,11 +138,12 @@ class Migrator:
         data set contained in a folder from an LDP repository.
 
         :param start_pts: List of starting points to retrieve
-        :type start_pts: tuple or list 
-        resources from. It would typically be the repository root in case of a
-        full dump or one or more resources in the repository for a partial one.
+            resources from. It would typically be the repository root in case
+            of a full dump or one or more resources in the repository for a
+            partial one.
+        :type start_pts: tuple or list
         :param str list_file: path to a local file containing a list of URIs,
-        one per line.
+            one per line.
         """
         from lakesuperior.api import resource as rsrc_api
         self._ct = 0
@@ -179,7 +181,7 @@ class Migrator:
         managed by the repository is encountered.
 
         :param str uid: The path relative to the source server webroot
-        pointing to the resource to crawl, effectively the resource UID.
+            pointing to the resource to crawl, effectively the resource UID.
         """
         ibase = str(nsc['fcres'])
         # Public URI of source repo.

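A rough usage sketch of the Migrator documented above. The name of the dump method and its keyword arguments are assumptions derived from the docstrings in this diff; paths and URLs are placeholders.

from rdflib import URIRef
from lakesuperior.migrator import Migrator

migrator = Migrator(
    src=URIRef('http://localhost:8080/fcrepo/rest/'),
    dest='/tmp/lakesuperior_migration',
    binary_handling='include')
migrator.migrate(start_pts=('/',))  # Full dump, starting from the repo root.
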
+ 28 - 26
lakesuperior/model/ldp_factory.py

@@ -26,10 +26,10 @@ logger = logging.getLogger(__name__)
 
 
 class LdpFactory:
-    '''
+    """
     Generate LDP instances.
     The instance classes are based on provided client data or on stored data.
-    '''
+    """
     @staticmethod
     def new_container(uid):
         if not uid.startswith('/') or uid == '/':
@@ -43,7 +43,7 @@ class LdpFactory:
 
     @staticmethod
     def from_stored(uid, repr_opts={}, **kwargs):
-        '''
+        """
         Create an instance for retrieval purposes.
 
         This factory method creates and returns an instance of an LDPR subclass
@@ -52,8 +52,8 @@ class LdpFactory:
 
         N.B. The resource must exist.
 
-        @param uid UID of the instance.
-        '''
+        :param str uid: UID of the instance.
+        """
         #logger.info('Retrieving stored resource: {}'.format(uid))
         imr_urn = nsc['fcres'][uid]
 
@@ -80,16 +80,17 @@ class LdpFactory:
     @staticmethod
     def from_provided(
             uid, mimetype=None, stream=None, provided_imr=None, **kwargs):
-        '''
+        r"""
         Determine LDP type from request content.
 
-        @param uid (string) UID of the resource to be created or updated.
-        @param mimetype (string) The provided content MIME type.
-        @param stream (IOStream | None) The provided data stream. This can be
-        RDF or non-RDF content, or None. In the latter case, an empty container
-        is created.
-        @param **kwargs Arguments passed to the LDP class constructor.
-        '''
+        :param str uid: UID of the resource to be created or updated.
+        :param str mimetype: The provided content MIME type.
+        :param stream: The provided data stream. This can be
+            RDF or non-RDF content, or None. In the latter case, an empty
+            container is created.
+        :type stream: IOStream or None
+        :param \*\*kwargs: Arguments passed to the LDP class constructor.
+        """
         uri = nsc['fcres'][uid]
 
         if not stream and not mimetype:
@@ -149,11 +150,11 @@ class LdpFactory:
 
     @staticmethod
     def is_rdf_parsable(mimetype):
-        '''
+        """
         Checks whether a MIME type supports RDF parsing by an RDFLib plugin.
 
-        @param mimetype (string) MIME type to check.
-        '''
+        :param str mimetype: MIME type to check.
+        """
         try:
             plugin.get(mimetype, parser.Parser)
         except plugin.PluginException:
@@ -164,11 +165,11 @@ class LdpFactory:
 
     @staticmethod
     def is_rdf_serializable(mimetype):
-        '''
+        """
         Checks whether a MIME type supports RDF serialization by an RDFLib plugin.
 
-        @param mimetype (string) MIME type to check.
-        '''
+        :param str mimetype: MIME type to check.
+        """
         try:
             plugin.get(mimetype, serializer.Serializer)
         except plugin.PluginException:
@@ -179,7 +180,7 @@ class LdpFactory:
 
     @staticmethod
     def mint_uid(parent_uid, path=None):
-        '''
+        """
         Mint a new resource UID based on client directives.
 
         This method takes a parent ID and a tentative path and returns an LDP
@@ -188,13 +189,14 @@ class LdpFactory:
         This may raise an exception resulting in a 404 if the parent is not
         found or a 409 if the parent is not a valid container.
 
-        @param parent_uid (string) UID of the parent resource. It must be an
-        existing LDPC.
-        @param path (string) path to the resource, relative to the parent.
+        :param str parent_uid: UID of the parent resource. It must be an
+            existing LDPC.
+        :param str path: path to the resource, relative to the parent.
 
-        @return string The confirmed resource UID. This may be different from
-        what has been indicated.
-        '''
+        :rtype: str
+        :return: The confirmed resource UID. This may be different from
+            what has been indicated.
+        """
         def split_if_legacy(uid):
             if config['application']['store']['ldp_rs']['legacy_ptree_split']:
                 uid = tbox.split_uuid(uid)

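A quick sketch exercising the MIME type checks documented above; it assumes the lakesuperior package and its configuration are importable in the current environment.

from lakesuperior.model.ldp_factory import LdpFactory

print(LdpFactory.is_rdf_parsable('text/turtle'))      # RDFLib registers a parser.
print(LdpFactory.is_rdf_parsable('image/jpeg'))       # No parser: handled as LDP-NR.
print(LdpFactory.is_rdf_serializable('text/turtle'))  # Serializer also available.
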
+ 14 - 13
lakesuperior/model/ldp_nr.py

@@ -17,10 +17,10 @@ logger = logging.getLogger(__name__)
 
 
 class LdpNr(Ldpr):
-    '''LDP-NR (Non-RDF Source).
+    """LDP-NR (Non-RDF Source).
 
     Definition: https://www.w3.org/TR/ldp/#ldpnr
-    '''
+    """
 
     base_types = {
         nsc['fcrepo'].Binary,
@@ -31,9 +31,9 @@ class LdpNr(Ldpr):
 
     def __init__(self, uuid, stream=None, mimetype=None,
             disposition=None, **kwargs):
-        '''
+        """
         Extends Ldpr.__init__ by adding LDP-NR specific parameters.
-        '''
+        """
         super().__init__(uuid, **kwargs)
 
         self._imr_options = {}
@@ -68,11 +68,12 @@ class LdpNr(Ldpr):
 
 
     def create_or_replace(self, create_only=False):
-        '''
+        """
         Create a new binary resource with a corresponding RDF representation.
 
-        @param file (Stream) A Stream resource representing the uploaded file.
-        '''
+        :param bool create_only: Whether the resource is being created or
+            updated.
+        """
         # Persist the stream.
         self.digest, self.size = nonrdfly.persist(self.stream)
 
@@ -91,14 +92,14 @@ class LdpNr(Ldpr):
     ## PROTECTED METHODS ##
 
     def _add_srv_mgd_triples(self, create=False):
-        '''
+        """
         Add all metadata for the RDF representation of the LDP-NR.
 
-        @param stream (BufferedIO) The uploaded data stream.
-        @param mimetype (string) MIME type of the uploaded file.
-        @param disposition (defaultdict) The `Content-Disposition` header
-        content, parsed through `parse_rfc7240`.
-        '''
+        :param BufferedIO stream: The uploaded data stream.
+        :param str mimetype: MIME type of the uploaded file.
+        :param defaultdict disposition: The ``Content-Disposition`` header
+            content, parsed through ``parse_rfc7240``.
+        """
         super()._add_srv_mgd_triples(create)
 
         # File size.

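A hedged sketch of creating an LDP-NR through the factory and the create_or_replace() method documented above. Whether this can be called directly outside of a request context, and the example UID and payload, are assumptions.

import io
from lakesuperior.model.ldp_factory import LdpFactory

rsrc = LdpFactory.from_provided(
    '/my_binary', mimetype='application/octet-stream',
    stream=io.BytesIO(b'binary payload'))  # Non-RDF MIME type, so an LDP-NR.
rsrc.create_or_replace()
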
+ 15 - 17
lakesuperior/model/ldp_rs.py

@@ -12,20 +12,21 @@ logger = logging.getLogger(__name__)
 
 
 class LdpRs(Ldpr):
-    '''
+    """
     LDP-RS (LDP RDF source).
 
     https://www.w3.org/TR/ldp/#ldprs
-    '''
+    """
     def __init__(self, uuid, repr_opts={}, handling='lenient', **kwargs):
-        '''
-        Extends Ldpr.__init__ by adding LDP-RS specific parameters.
-
-        @param handling (string) One of `strict`, `lenient` (the default) or
-        `none`. `strict` raises an error if a server-managed term is in the
-        graph. `lenient` removes all sever-managed triples encountered. `none`
-        skips all server-managed checks. It is used for internal modifications.
-        '''
+        """
+        Extends :meth:`Ldpr.__init__` by adding LDP-RS specific parameters.
+
+        :param str handling: One of ``strict``, ``lenient`` (the default) or
+            ``none``. ``strict`` raises an error if a server-managed term is in
+            the graph. ``lenient`` removes all server-managed triples
+            encountered. ``none`` skips all server-managed checks. It is used
+            for internal modifications.
+        """
         super().__init__(uuid, **kwargs)
         self.base_types = super().base_types | {
             nsc['fcrepo'].Container,
@@ -44,8 +45,7 @@ class LdpRs(Ldpr):
 
 
 class Ldpc(LdpRs):
-    '''LDPC (LDP Container).'''
-
+    """LDPC (LDP Container)."""
     def __init__(self, uuid, *args, **kwargs):
         super().__init__(uuid, *args, **kwargs)
         self.base_types |= {
@@ -56,7 +56,7 @@ class Ldpc(LdpRs):
 
 
 class LdpBc(Ldpc):
-    '''LDP-BC (LDP Basic Container).'''
+    """LDP-BC (LDP Basic Container)."""
     def __init__(self, uuid, *args, **kwargs):
         super().__init__(uuid, *args, **kwargs)
         self.base_types |= {
@@ -66,8 +66,7 @@ class LdpBc(Ldpc):
 
 
 class LdpDc(Ldpc):
-    '''LDP-DC (LDP Direct Container).'''
-
+    """LDP-DC (LDP Direct Container)."""
     def __init__(self, uuid, *args, **kwargs):
         super().__init__(uuid, *args, **kwargs)
         self.base_types |= {
@@ -77,8 +76,7 @@ class LdpDc(Ldpc):
 
 
 class LdpIc(Ldpc):
-    '''LDP-IC (LDP Indirect Container).'''
-
+    """LDP-IC (LDP Indirect Container)."""
     def __init__(self, uuid, *args, **kwargs):
         super().__init__(uuid, *args, **kwargs)
         self.base_types |= {

+ 44 - 44
lakesuperior/model/ldpr.py

@@ -36,21 +36,16 @@ class Ldpr(metaclass=ABCMeta):
     the vanilla LDP specifications. This is extended by the
     `lakesuperior.fcrepo.Resource` class.
 
-    Inheritance graph: https://www.w3.org/TR/ldp/#fig-ldpc-types
+    See inheritance graph: https://www.w3.org/TR/ldp/#fig-ldpc-types
 
-    Note: Even though LdpNr (which is a subclass of Ldpr) handles binary files,
-    it still has an RDF representation in the triplestore. Hence, some of the
-    RDF-related methods are defined in this class rather than in the LdpRs
-    class.
+    **Note**: Even though LdpNr (which is a subclass of Ldpr) handles binary
+    files, it still has an RDF representation in the triplestore. Hence, some
+    of the RDF-related methods are defined in this class rather than in
+    :class:`~lakesuperior.model.ldp_rs.LdpRs`.
 
-    Convention notes:
-
-    All the methods in this class handle internal UUIDs (URN). Public-facing
-    URIs are converted from URNs and passed by these methods to the methods
-    handling HTTP negotiation.
-
-    The data passed to the store layout for processing should be in a graph.
-    All conversion from request payload strings is done here.
+    **Note:** Only internal facing (``info:fcres``-prefixed) URIs are handled
+    in this class. Public-facing URI conversion is handled in the
+    :mod:`~lakesuperior.endpoints.ldp` module.
     """
 
     EMBED_CHILD_RES_URI = nsc['fcrepo'].EmbedResources
@@ -67,33 +62,35 @@ class Ldpr(metaclass=ABCMeta):
     WRKF_INBOUND = '_workflow:inbound_'
     WRKF_OUTBOUND = '_workflow:outbound_'
 
-    # Default user to be used for the `createdBy` and `lastUpdatedBy` if a user
-    # is not provided.
     DEFAULT_USER = Literal('BypassAdmin')
+    """
+    Default user to be used for the `createdBy` and `lastUpdatedBy` if a user
+    is not provided.
+    """
 
-    # RDF Types that populate a new resource.
     base_types = {
         nsc['fcrepo'].Resource,
         nsc['ldp'].Resource,
         nsc['ldp'].RDFSource,
     }
+    """RDF Types that populate a new resource."""
 
-    # Predicates that do not get removed when a resource is replaced.
     protected_pred = (
         nsc['fcrepo'].created,
         nsc['fcrepo'].createdBy,
         nsc['ldp'].contains,
     )
+    """Predicates that do not get removed when a resource is replaced."""
 
-    # Server-managed RDF types ignored in the RDF payload if the resource is
-    # being created. N.B. These still raise an error if the resource exists.
     smt_allow_on_create = {
         nsc['ldp'].DirectContainer,
         nsc['ldp'].IndirectContainer,
     }
+    """
+    Server-managed RDF types ignored in the RDF payload if the resource is
+    being created. N.B. These still raise an error if the resource exists.
+    """
 
-
-    # Predicates to remove when a resource is replaced.
     delete_preds_on_replace = {
         nsc['ebucore'].hasMimeType,
         nsc['fcrepo'].lastModified,
@@ -101,13 +98,14 @@ class Ldpr(metaclass=ABCMeta):
         nsc['premis'].hasSize,
         nsc['premis'].hasMessageDigest,
     }
+    """Predicates to remove when a resource is replaced."""
 
 
     ## MAGIC METHODS ##
 
     def __init__(self, uid, repr_opts={}, provided_imr=None, **kwargs):
-        """Instantiate an in-memory LDP resource that can be loaded from and
-        persisted to storage.
+        """
+        Instantiate an in-memory LDP resource.
 
         :param str uid: uid of the resource. If None (must be explicitly
         set) it refers to the root node. It can also be the full URI or URN,
@@ -136,7 +134,7 @@ class Ldpr(metaclass=ABCMeta):
         The RDFLib resource representing this LDPR. This is a live
         representation of the stored data if present.
 
-        @return rdflib.resource.Resource
+        :rtype: rdflib.resource.Resource
         """
         if not hasattr(self, '_rsrc'):
             self._rsrc = rdfly.ds.resource(self.uri)
@@ -285,7 +283,7 @@ class Ldpr(metaclass=ABCMeta):
     def types(self):
         """All RDF types.
 
-        @return set(rdflib.term.URIRef)
+        :rtype: set(rdflib.term.URIRef)
         """
         if not hasattr(self, '_types'):
             if len(self.metadata.graph):
@@ -305,7 +303,7 @@ class Ldpr(metaclass=ABCMeta):
     def ldp_types(self):
         """The LDP types.
 
-        @return set(rdflib.term.URIRef)
+        :rtype: set(rdflib.term.URIRef)
         """
         if not hasattr(self, '_ldp_types'):
             self._ldp_types = {t for t in self.types if nsc['ldp'] in t}
@@ -385,9 +383,10 @@ class Ldpr(metaclass=ABCMeta):
         Delete a single resource and create a tombstone.
 
         :param boolean inbound: Whether to delete the inbound relationships.
-        :param URIRef tstone_pointer: If set to a URN, this creates a pointer
-        to the tombstone of the resource that used to contain the deleted
-        resource. Otherwise the deleted resource becomes a tombstone.
+        :param rdflib.URIRef tstone_pointer: If set to a URN, this creates a
+            pointer to the tombstone of the resource that used to contain the
+            deleted resource. Otherwise the deleted resource becomes a
+            tombstone.
         """
         logger.info('Burying resource {}'.format(self.uid))
         # Create a backup snapshot for resurrection purposes.
@@ -486,14 +485,14 @@ class Ldpr(metaclass=ABCMeta):
         """
         tstone_trp = set(rdfly.extract_imr(self.uid, strict=False).graph)
 
-        ver_rsp = self.version_info.graph.query("""
+        ver_rsp = self.version_info.graph.query('''
         SELECT ?uid {
           ?latest fcrepo:hasVersionLabel ?uid ;
             fcrepo:created ?ts .
         }
         ORDER BY DESC(?ts)
         LIMIT 1
-        """)
+        ''')
         ver_uid = str(ver_rsp.bindings[0]['uid'])
         ver_trp = set(rdfly.get_metadata(self.uid, ver_uid).graph)
 
@@ -520,12 +519,12 @@ class Ldpr(metaclass=ABCMeta):
         """
         Create a new version of the resource.
 
-        NOTE: This creates an event only for the resource being updated (due
-        to the added `hasVersion` triple and possibly to the `hasVersions` one)
-        but not for the version being created.
+        **Note:** This creates an event only for the resource being updated
+        (due to the added `hasVersion` triple and possibly to the
+        ``hasVersions`` one) but not for the version being created.
 
-        :param  ver_uid: Version ver_uid. If already existing, an exception is
-        raised.
+        :param str ver_uid: Version UID. If already existing, a new version UID
+            is minted.
         """
         if not ver_uid or ver_uid in self.version_uids:
             ver_uid = str(uuid4())
@@ -539,7 +538,7 @@ class Ldpr(metaclass=ABCMeta):
 
         :param str ver_uid: Version UID.
         :param boolean backup: Whether to create a backup snapshot. Default is
-        true.
+            True.
         """
         # Create a backup snapshot.
         if backup:
@@ -686,7 +685,7 @@ class Ldpr(metaclass=ABCMeta):
         """
         Add server-managed triples to a provided IMR.
 
-        :param  create: Whether the resource is being created.
+        :param create: Whether the resource is being created.
         """
         # Base LDP types.
         for t in self.base_types:
@@ -725,11 +724,11 @@ class Ldpr(metaclass=ABCMeta):
         is found.
 
         E.g. if only fcres:/a exists:
-        - If fcres:/a/b/c/d is being created, a becomes container of
-          fcres:/a/b/c/d. Also, containers are created for fcres:a/b and
-          fcres:/a/b/c.
-        - If fcres:/e is being created, the root node becomes container of
-          fcres:/e.
+        - If ``fcres:/a/b/c/d`` is being created, ``fcres:/a`` becomes the
+          container of ``fcres:/a/b/c/d``. Also, containers are created for
+          ``fcres:/a/b`` and ``fcres:/a/b/c``.
+        - If ``fcres:/e`` is being created, the root node becomes container of
+          ``fcres:/e``.
 
         :param bool create: Whether the resource is being created. If false,
         the parent container is not updated.
@@ -775,7 +774,8 @@ class Ldpr(metaclass=ABCMeta):
         Remove duplicate triples from add and remove delta graphs, which would
         otherwise contain unnecessary statements that annul each other.
 
-        @return tuple 2 "clean" sets of respectively remove statements and
+        :rtype: tuple
+        :return: 2 "clean" sets of respectively remove statements and
         add statements.
         """
         return (

+ 7 - 13
lakesuperior/store/ldp_nr/base_non_rdf_layout.py

@@ -7,18 +7,18 @@ logger = logging.getLogger(__name__)
 
 
 class BaseNonRdfLayout(metaclass=ABCMeta):
-    '''
+    """
     Abstract class for setting the non-RDF (bitstream) store layout.
 
     Differerent layouts can be created by implementing all the abstract methods
     of this class. A non-RDF layout is not necessarily restricted to a
     traditional filesystem—e.g. a layout persisting to HDFS can be written too.
-    '''
+    """
 
     def __init__(self, config):
-        '''
+        """
         Initialize the base non-RDF store layout.
-        '''
+        """
         self.config = config
         self.root = config['path']
 
@@ -27,23 +27,17 @@ class BaseNonRdfLayout(metaclass=ABCMeta):
 
     @abstractmethod
     def persist(self, stream):
-        '''
-        Store the stream in the designated persistence layer for this layout.
-        '''
+        """Store the stream in the designated persistence layer."""
         pass
 
 
     @abstractmethod
     def delete(self, id):
-        '''
-        Delete a stream by its identifier (i.e. checksum).
-        '''
+        """Delete a stream by its identifier (i.e. checksum)."""
         pass
 
 
     @abstractmethod
     def local_path(self, uuid):
-        '''
-        Return the local path of a file.
-        '''
+        """Return the local path of a file."""
         pass

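Since different layouts can be written against this interface, here is an illustrative skeleton of a custom layout. The method signatures mirror the abstract methods shown above; the flat-directory scheme itself is made up for the example.

import os
from lakesuperior.store.ldp_nr.base_non_rdf_layout import BaseNonRdfLayout

class FlatDirLayout(BaseNonRdfLayout):
    """Store every bitstream as a single file named after its checksum."""

    def persist(self, stream):
        # A real implementation would hash the stream while writing it and
        # return the checksum and size.
        raise NotImplementedError

    def delete(self, id):
        os.unlink(self.local_path(id))

    def local_path(self, uuid):
        return os.path.join(self.root, uuid)
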
+ 15 - 18
lakesuperior/store/ldp_nr/default_layout.py

@@ -12,18 +12,21 @@ logger = logging.getLogger(__name__)
 
 
 class DefaultLayout(BaseNonRdfLayout):
-    '''
+    """
     Default file layout.
-    '''
+
+    This is a simple filesystem layout that stores binaries in pairtree folders
+    in a local filesystem. The pairtree branch length and count can be configured.
+    """
     @staticmethod
     def local_path(root, uuid, bl=4, bc=4):
-        '''
+        """
         Generate the resource path splitting the resource checksum according to
         configuration parameters.
 
-        @param uuid (string) The resource UUID. This corresponds to the content
+        :param str uuid: The resource UUID. This corresponds to the content
         checksum.
-        '''
+        """
         logger.debug('Generating path from uuid: {}'.format(uuid))
         term = len(uuid) if bc == 0 else min(bc * bl, len(uuid))
 
@@ -37,9 +40,7 @@ class DefaultLayout(BaseNonRdfLayout):
 
 
     def __init__(self, *args, **kwargs):
-        '''
-        Set up path segmentation parameters.
-        '''
+        """Set up path segmentation parameters."""
         super().__init__(*args, **kwargs)
 
         self.bl = self.config['pairtree_branch_length']
@@ -49,9 +50,7 @@ class DefaultLayout(BaseNonRdfLayout):
     ## INTERFACE METHODS ##
 
     def bootstrap(self):
-        '''
-        Initialize binary file store.
-        '''
+        """Initialize binary file store."""
         try:
             shutil.rmtree(self.root)
         except FileNotFoundError:
@@ -60,7 +59,7 @@ class DefaultLayout(BaseNonRdfLayout):
 
 
     def persist(self, stream, bufsize=8192):
-        '''
+        """
         Store the stream in the file system.
 
         This method handles the file in chunks. for each chunk it writes to a
@@ -68,9 +67,9 @@ class DefaultLayout(BaseNonRdfLayout):
         to disk and hashed, the temp file is moved to its final location which
         is determined by the hash value.
 
-        @param stream (IOstream): file-like object to persist.
-        @param bufsize (int) Chunk size. 2**12 to 2**15 is a good range.
-        '''
+        :param IOstream stream: File-like object to persist.
+        :param int bufsize: Chunk size. 2**12 to 2**15 is a good range.
+        """
         tmp_file = '{}/tmp/{}'.format(self.root, uuid4())
         try:
             with open(tmp_file, 'wb') as f:
@@ -111,7 +110,5 @@ class DefaultLayout(BaseNonRdfLayout):
 
 
     def delete(self, uuid):
-        '''
-        See BaseNonRdfLayout.delete.
-        '''
+        """See BaseNonRdfLayout.delete."""
         os.unlink(__class__.local_path(self.root, uuid, self.bl, self.bc))

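A standalone illustration of the pairtree path splitting described above: the first ``bc`` branches of ``bl`` characters each are carved out of the content checksum. Only the ``term`` computation is taken verbatim from the diff; joining the branches and using the full checksum as the file name is an assumption about the layout, shown here for clarity.

def pairtree_path(root, checksum, bl=4, bc=4):
    term = len(checksum) if bc == 0 else min(bc * bl, len(checksum))
    branches = [checksum[i:i + bl] for i in range(0, term, bl)]
    return '/'.join([root] + branches + [checksum])

print(pairtree_path('/data/ldpnr_store', '8843d7f92416211de9ebb963ff4ce281'))
# e.g. /data/ldpnr_store/8843/d7f9/2416/211d/8843d7f92416211de9ebb963ff4ce281
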
+ 184 - 184
lakesuperior/store/ldp_rs/lmdb_store.py

@@ -19,21 +19,21 @@ logger = logging.getLogger(__name__)
 
 
 def s2b(u, enc='UTF-8'):
-    '''
+    """
     Convert a string into a bytes object.
-    '''
+    """
     return u.encode(enc)
 
 
 def b2s(u, enc='UTF-8'):
-    '''
+    """
     Convert a bytes or memoryview object into a string.
-    '''
+    """
     return bytes(u).decode(enc)
 
 
 class TxnManager(ContextDecorator):
-    '''
+    """
     Handle ACID transactions with an LmdbStore.
 
     Wrap this within a `with` statement:
@@ -43,15 +43,15 @@ class TxnManager(ContextDecorator):
     >>>
 
     The transaction will be opened and handled automatically.
-    '''
+    """
     def __init__(self, store, write=False):
-        '''
+        """
         Begin and close a transaction in a store.
 
-        @param store (LmdbStore) The store to open a transaction on.
-        @param write (bool) Whether the transaction is read-write. Default is
+        :param LmdbStore store: The store to open a transaction on.
+        :param bool write: Whether the transaction is read-write. Default is
         False (read-only transaction).
-        '''
+        """
         self.store = store
         self.write = write
 
@@ -69,34 +69,34 @@ class TxnManager(ContextDecorator):
 
 
 class LexicalSequence:
-    '''
+    """
     Fixed-length lexicographically ordered byte sequence.
 
     Useful to generate optimized sequences of keys in LMDB.
-    '''
+    """
     def __init__(self, start=1, max_len=5):
-        '''
-        @param start (bytes) Starting byte value. Bytes below this value are
+        """
+        Create a new lexical sequence.
+
+        :param bytes start: Starting byte value. Bytes below this value are
         never found in this sequence. This is useful to allot special bytes
         to be used e.g. as separators.
-        @param max_len (int) Maximum number of bytes that a byte string can
+        :param int max_len: Maximum number of bytes that a byte string can
         contain. This should be chosen carefully since the number of all
         possible key combinations is determined by this value and the `start`
         value. The default args provide 255**5 (~1 Tn) unique combinations.
-        '''
+        """
         self.start = start
         self.length = max_len
 
 
     def first(self):
-        '''
-        First possible combination.
-        '''
+        """First possible combination."""
         return bytearray([self.start] * self.length)
 
 
     def next(self, n):
-        '''
+        """
         Calculate the next closest byte sequence in lexicographical order.
 
         This is used to fill the next available slot after the last one in
@@ -107,8 +107,8 @@ class LexicalSequence:
         This function assumes that all the keys are padded with the `start`
         value up to the `max_len` length.
 
-        @param n (bytes) Current byte sequence to add to.
-        '''
+        :param bytes n: Current byte sequence to add to.
+        """
         if not n:
             n = self.first()
         elif isinstance(n, bytes) or isinstance(n, memoryview):
@@ -137,7 +137,7 @@ class LexicalSequence:
 
 
 class LmdbStore(Store):
-    '''
+    """
     LMDB-backed store.
 
     This is an implementation of the RDFLib Store interface:
@@ -172,7 +172,7 @@ class LmdbStore(Store):
     (also in a SPARQL query) will look in the  union graph instead of in the
     default graph. Also, removing triples without specifying a context will
     remove triples from all contexts.
-    '''
+    """
 
     context_aware = True
     # This is a hassle to maintain for no apparent gain. If some use is devised
@@ -181,19 +181,18 @@ class LmdbStore(Store):
     graph_aware = True
     transaction_aware = True
 
-    '''
-    LMDB map size. See http://lmdb.readthedocs.io/en/release/#environment-class
-    '''
     MAP_SIZE = 1024 ** 4 # 1Tb
+    """
+    LMDB map size. See http://lmdb.readthedocs.io/en/release/#environment-class
+    """
 
-    '''
-    Key hashing algorithm. If you are paranoid, use SHA1. Otherwise, MD5 is
-    faster and takes up less space (16 bytes vs. 20 bytes). This may make a
-    visible difference because keys are generated and parsed very often.
-    '''
-    KEY_HASH_ALGO = 'sha1'
+    TERM_HASH_ALGO = 'sha1'
+    """
+    Term hashing algorithm. SHA1 is the default.
+    """
 
-    '''
+    KEY_LENGTH = 5
+    """
     Fixed length for term keys.
 
     4 or 5 is a safe range. 4 allows for ~4 billion (256 ** 4) unique terms
@@ -209,14 +208,13 @@ class LmdbStore(Store):
     could improve performance since keys make up the vast majority of record
     exchange between the store and the application. However it is sensible not
     to expose this value as a configuration option.
-    '''
-    KEY_LENGTH = 5
+    """
 
-    '''
-    Lexical sequence start. `\x01` is fine since no special characters are used,
-    but it's good to leave a spare for potential future use.
-    '''
     KEY_START = 1
+    """
+    Lexical sequence start. ``\\x01`` is fine since no special characters are
+    used, but it's good to leave a spare for potential future use.
+    """
 
     data_keys = (
         # Term key to serialized term content: 1:1
@@ -237,24 +235,24 @@ class LmdbStore(Store):
         's:po', 'p:so', 'o:sp', 'c:spo',
     )
 
-    '''
+    _lookup_rank = ('s', 'o', 'p')
+    """
     Order in which keys are looked up if two terms are bound.
     The indices with the smallest average number of values per key should be
     looked up first.
 
     If we want to get fancy, this can be rebalanced from time to time by
     looking up the number of keys in (s:po, p:so, o:sp).
-    '''
-    _lookup_rank = ('s', 'o', 'p')
+    """
 
-    '''
-    Order of terms in the lookup indices. Used to rebuild a triple from lookup.
-    '''
     _lookup_ordering = {
         's:po': (0, 1, 2),
         'p:so': (1, 0, 2),
         'o:sp': (2, 0, 1),
     }
+    """
+    Order of terms in the lookup indices. Used to rebuild a triple from lookup.
+    """
 
     data_env = None
     idx_env = None
@@ -279,19 +277,17 @@ class LmdbStore(Store):
 
 
     def __del__(self):
-        '''
-        Properly close store for garbage collection.
-        '''
+        """Properly close store for garbage collection."""
         self.close(True)
 
 
     def __len__(self, context=None):
-        '''
+        """
         Return length of the dataset.
 
-        @param context (rdflib.URIRef | rdflib.Graph) Context to restrict count
-        to.
-        '''
+        :param context: Context to restrict count to.
+        :type context: rdflib.URIRef or rdflib.Graph
+        """
         context = self._normalize_context(context)
 
         if context is not None:
@@ -311,7 +307,7 @@ class LmdbStore(Store):
 
 
     def open(self, configuration=None, create=True):
-        '''
+        """
         Open the database.
 
         The database is best left open for the lifespan of the server. Read
@@ -321,7 +317,7 @@ class LmdbStore(Store):
 
         This method is called outside of the main transaction. All cursors
         are created separately within the transaction.
-        '''
+        """
         self._init_db_environments(create)
         if self.data_env == NO_STORE:
             return NO_STORE
@@ -331,9 +327,9 @@ class LmdbStore(Store):
 
 
     def begin(self, write=False):
-        '''
+        """
         Begin the main write transaction and create cursors.
-        '''
+        """
         if not self.is_open:
             raise RuntimeError('Store must be opened first.')
         logger.debug('Beginning a {} transaction.'.format(
@@ -346,9 +342,7 @@ class LmdbStore(Store):
 
 
     def stats(self):
-        '''
-        Gather statistics about the database.
-        '''
+        """Gather statistics about the database."""
         stats = {
             'data_db_stats': {
                 db_label: self.data_txn.stat(self.dbs[db_label])
@@ -368,9 +362,7 @@ class LmdbStore(Store):
 
     @property
     def is_txn_open(self):
-        '''
-        Whether the main transaction is open.
-        '''
+        """Whether the main transaction is open."""
         try:
             self.data_txn.id()
             self.idx_txn.id()
@@ -383,9 +375,7 @@ class LmdbStore(Store):
 
 
     def cur(self, index):
-        '''
-        Return a new cursor by its index.
-        '''
+        """Return a new cursor by its index."""
         if index in self.idx_keys:
             txn = self.idx_txn
             src = self.idx_keys
@@ -399,14 +389,14 @@ class LmdbStore(Store):
 
 
     def get_data_cursors(self, txn):
-        '''
+        """
         Build the main data cursors for a transaction.
 
-        @param txn (lmdb.Transaction) This can be a read or write transaction.
+        :param lmdb.Transaction txn: This can be a read or write transaction.
 
-        @return dict(string, lmdb.Cursor) Keys are index labels, values are
-        index cursors.
-        '''
+        :rtype: dict(string, lmdb.Cursor)
+        :return: Keys are index labels, values are index cursors.
+        """
         return {
             'tk:t': txn.cursor(self.dbs['tk:t']),
             'tk:c': txn.cursor(self.dbs['tk:c']),
@@ -415,25 +405,25 @@ class LmdbStore(Store):
 
 
     def get_idx_cursors(self, txn):
-        '''
+        """
         Build the index cursors for a transaction.
 
-        @param txn (lmdb.Transaction) This can be a read or write transaction.
+        :param lmdb.Transaction txn: This can be a read or write transaction.
 
-        @return dict(string, lmdb.Cursor) Keys are index labels, values are
-        index cursors.
-        '''
+        :rtype: dict(string, lmdb.Cursor)
+        :return: dict of index labels, index cursors.
+        """
         return {
             key: txn.cursor(self.dbs[key])
             for key in self.idx_keys}
 
 
     def close(self, commit_pending_transaction=False):
-        '''
+        """
         Close the database connection.
 
         Do this at server shutdown.
-        '''
+        """
         self.__open = False
         if self.is_txn_open:
             if commit_pending_transaction:
@@ -446,26 +436,27 @@ class LmdbStore(Store):
 
 
     def destroy(self, path):
-        '''
+        """
         Destroy the store.
 
         https://www.youtube.com/watch?v=lIVq7FJnPwg
 
-        @param path (string) Path of the folder containing the database(s).
-        '''
+        :param str path: Path of the folder containing the database(s).
+        """
         if exists(path):
             rmtree(path)
 
 
     def add(self, triple, context=None, quoted=False):
-        '''
+        """
         Add a triple and start indexing.
 
-        @param triple (tuple:rdflib.Identifier) Tuple of three identifiers.
-        @param context (rdflib.Identifier | None) Context identifier.
-        'None' inserts in the default graph.
-        @param quoted (bool) Not used.
-        '''
+        :param tuple(rdflib.Identifier) triple: Tuple of three identifiers.
+        :param context: Context identifier. ``None`` inserts in the default
+        graph.
+        :type context: rdflib.Identifier or None
+        :param bool quoted: Not used.
+        """
         context = self._normalize_context(context)
         if context is None:
             context = RDFLIB_DEFAULT_GRAPH_URI
@@ -512,16 +503,16 @@ class LmdbStore(Store):
 
 
     def remove(self, triple_pattern, context=None):
-        '''
+        """
         Remove triples by a pattern.
 
-        @param triple_pattern (tuple:rdflib.term.Identifier|None) 3-tuple of
+        :param tuple triple_pattern: 3-tuple of
         either RDF terms or None, indicating the triple(s) to be removed.
         None is used as a wildcard.
-        @param context (rdflib.term.Identifier|None) Context to remove the
-        triples from. If None (the default) the matching triples are removed
-        from all contexts.
-        '''
+        :param context: Context to remove the triples from. If None (the
+        default) the matching triples are removed from all contexts.
+        :type context: rdflib.term.Identifier or None
+        """
         #logger.debug('Removing triples by pattern: {} on context: {}'.format(
         #    triple_pattern, context))
         context = self._normalize_context(context)
@@ -562,18 +553,18 @@ class LmdbStore(Store):
 
 
     def triples(self, triple_pattern, context=None):
-        '''
+        """
         Generator over matching triples.
 
-        @param triple_pattern (tuple) 3 RDFLib terms
-        @param context (rdflib.Graph | None) Context graph, if available.
+        :param tuple triple_pattern: 3 RDFLib terms
+        :param rdflib.Graph | None context: Context graph, if available.
 
-        @return Generator over triples and contexts in which each result has
+        :return: Generator over triples and contexts in which each result has
         the following format:
         > (s, p, o), generator(contexts)
         Where the contexts generator lists all contexts that the triple appears
         in.
-        '''
+        """
         #logger.debug('Getting triples for pattern: {} and context: {}'.format(
         #    triple_pattern, context))
         # This sounds strange, RDFLib should be passing None at this point,
@@ -620,12 +611,12 @@ class LmdbStore(Store):
 
 
     def bind(self, prefix, namespace):
-        '''
+        """
         Bind a prefix to a namespace.
 
-        @param prefix (string) Namespace prefix.
-        @param namespace (rdflib.URIRef) Fully qualified URI of namespace.
-        '''
+        :param str prefix: Namespace prefix.
+        :param rdflib.URIRef namespace: Fully qualified URI of namespace.
+        """
         prefix = s2b(prefix)
         namespace = s2b(namespace)
         if self.is_txn_rw:
@@ -643,44 +634,47 @@ class LmdbStore(Store):
 
 
     def namespace(self, prefix):
-        '''
+        """
         Get the namespace for a prefix.
-        @param prefix (string) Namespace prefix.
-        '''
+        :param str prefix: Namespace prefix.
+        """
         with self.cur('pfx:ns') as cur:
             ns = cur.get(s2b(prefix))
             return Namespace(b2s(ns)) if ns is not None else None
 
 
     def prefix(self, namespace):
-        '''
+        """
         Get the prefix associated with a namespace.
 
-        @NOTE A namespace can be only bound to one prefix in this
+        **Note:** A namespace can only be bound to one prefix in this
         implementation.
 
-        @param namespace (rdflib.URIRef) Fully qualified URI of namespace.
-        '''
+        :param rdflib.Namespace namespace: Fully qualified namespace.
+
+        :rtype: str or None
+        """
         with self.cur('ns:pfx') as cur:
             prefix = cur.get(s2b(namespace))
             return b2s(prefix) if prefix is not None else None
 
 
     def namespaces(self):
-        '''
-        Get an iterator of all prefix: namespace bindings.
-        '''
+        """Get an iterator of all prefix: namespace bindings.
+
+        :rtype: Iterator(tuple(str, rdflib.Namespace))
+        """
         with self.cur('pfx:ns') as cur:
             for pfx, ns in iter(cur):
                 yield (b2s(pfx), Namespace(b2s(ns)))
 
 
     def contexts(self, triple=None):
-        '''
+        """
         Get a list of all contexts.
 
-        @return generator(Graph)
-        '''
+        :rtype: Iterator(rdflib.Graph)
+        """
         if triple and any(triple):
             with self.cur('spo:c') as cur:
                 if cur.set_key(self._to_key(triple)):
@@ -695,7 +689,7 @@ class LmdbStore(Store):
 
 
     def add_graph(self, graph):
-        '''
+        """
         Add a graph to the database.
 
         This creates an empty graph by associating the graph URI with the
@@ -707,8 +701,8 @@ class LmdbStore(Store):
         Therefore it needs to open a write transaction. This is not ideal
         but the only way to handle datasets in RDFLib.
 
-        @param graph (URIRef) URI of the named graph to add.
-        '''
+        :param rdflib.URIRef graph: URI of the named graph to add.
+        """
         if isinstance(graph, Graph):
             graph = graph.identifier
         pk_c = self._pickle(graph)
@@ -738,11 +732,11 @@ class LmdbStore(Store):
 
 
     def remove_graph(self, graph):
-        '''
+        """
         Remove all triples from graph and the graph itself.
 
-        @param graph (URIRef) URI of the named graph to remove.
-        '''
+        :param rdflib.URIRef graph: URI of the named graph to remove.
+        """
         if isinstance(graph, Graph):
             graph = graph.identifier
         self.remove((None, None, None), graph)
@@ -753,9 +747,7 @@ class LmdbStore(Store):
 
 
     def commit(self):
-        '''
-        Commit main transaction and push action queue.
-        '''
+        """Commit main transaction."""
         logger.debug('Committing transaction.')
         try:
             self.data_txn.commit()
@@ -769,9 +761,7 @@ class LmdbStore(Store):
 
 
     def rollback(self):
-        '''
-        Roll back main transaction.
-        '''
+        """Roll back main transaction."""
         logger.debug('Rolling back transaction.')
         try:
             self.data_txn.abort()
@@ -787,16 +777,17 @@ class LmdbStore(Store):
     ## PRIVATE METHODS ##
 
     def _triple_keys(self, triple_pattern, context=None):
-        '''
+        """
         Generator over matching triple keys.
 
         This method is used by `triples` which returns native Python tuples,
         as well as by other methods that need to iterate and filter triple
         keys without incurring in the overhead of converting them to triples.
 
-        @param triple_pattern (tuple) 3 RDFLib terms
-        @param context (rdflib.Graph | None) Context graph or URI, or None.
-        '''
+        :param tuple triple_pattern: 3 RDFLib terms
+        :param context: Context graph or URI, or None.
+        :type context: rdflib.term.Identifier or None
+        """
         if context == self:
             context = None
 
@@ -842,16 +833,16 @@ class LmdbStore(Store):
 
 
     def _init_db_environments(self, create=True):
-        '''
+        """
         Initialize the DB environment.
 
         The main database is kept in one file, the indices in a separate one
         (these may be even further split up depending on performance
         considerations).
 
-        @param create (bool) If True, the environment and its databases are
+        :param bool create: If True, the environment and its databases are
         created.
-        '''
+        """
         path = self.path
         if not exists(path):
             if create is True:
@@ -892,14 +883,17 @@ class LmdbStore(Store):
 
 
     def _from_key(self, key):
-        '''
+        """
         Convert a key into one or more terms.
 
-        @param key (bytes | memoryview) The key to be converted. It can be a
+        :type key: bytes or memoryview
+        :param key: The key to be converted. It can be a
         compound one in which case the function will return multiple terms.
 
-        @return tuple
-        '''
+        :rtype: tuple(rdflib.term.Identifier)
+        :return: The term(s) associated with the key(s). The result is always
+        a tuple even for single results.
+        """
         with self.cur('t:st') as cur:
             return tuple(
                    self._unpickle(cur.get(k))
@@ -907,20 +901,21 @@ class LmdbStore(Store):
 
 
     def _to_key(self, obj):
-        '''
+        """
         Convert a triple, quad or term into a key.
 
         The key is the checksum of the pickled object, therefore unique for
-        that object. The hashing algorithm is specified in `KEY_HASH_ALGO`.
+        that object. The hashing algorithm is specified in `TERM_HASH_ALGO`.
 
-        @param obj (Object) Anything that can be reduced to terms stored in the
+        :param Object obj: Anything that can be reduced to terms stored in the
         database. Pairs of terms, as well as triples and quads, are expressed
         as tuples.
 
         If more than one term is provided, the keys are concatenated.
 
-        @return bytes
-        '''
+        :rtype: memoryview
+        :return: Keys stored for the term(s)
+        """
         if not isinstance(obj, list) and not isinstance(obj, tuple):
             obj = (obj,)
         key = []
@@ -936,33 +931,33 @@ class LmdbStore(Store):
 
 
     def _hash(self, s):
-        '''
-        Get the hash value of a serialized object.
-        '''
-        return hashlib.new(self.KEY_HASH_ALGO, s).digest()
+        """Get the hash value of a serialized object."""
+        return hashlib.new(self.TERM_HASH_ALGO, s).digest()
 
 
     def _split_key(self, keys):
-        '''
+        """
         Split a compound key into individual keys.
 
         This method relies on the fixed length of all term keys.
 
-        @param keys (bytes | memoryview) Concatenated keys.
+        :param keys: Concatenated keys.
+        :type keys: bytes or memoryview
 
-        @return tuple: bytes | memoryview
-        '''
+        :rtype: tuple(memoryview)
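+
+        For example, assuming a hypothetical 5-byte term key length, a
+        15-byte compound key would yield three 5-byte keys::
+
+            keys[0:5], keys[5:10], keys[10:15]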
+        """
         return tuple(
                 keys[i:i+self.KEY_LENGTH]
                 for i in range(0, len(keys), self.KEY_LENGTH))
 
 
     def _normalize_context(self, context):
-        '''
+        """
         Normalize a context parameter to conform to the model expectations.
 
-        @param context (URIRef | Graph | None) Context URI or graph.
-        '''
+        :param context: Context URI or graph.
+        :type context: URIRef or Graph or None
+        """
         if isinstance(context, Graph):
             if context == self or isinstance(context.identifier, Variable):
                 context = None
@@ -974,11 +969,12 @@ class LmdbStore(Store):
 
 
     def _lookup(self, triple_pattern):
-        '''
+        """
         Look up triples in the indices based on a triple pattern.
 
-        @return iterator of matching triple keys.
-        '''
+        :rtype: Iterator
+        :return: Matching triple keys.
+        """
         s, p, o = triple_pattern
 
         if s is not None:
@@ -1022,15 +1018,16 @@ class LmdbStore(Store):
 
 
     def _lookup_1bound(self, label, term):
-        '''
+        """
         Lookup triples for a pattern with one bound term.
 
-        @param label (string) Which term is being searched for. One of `s`,
+        :param str label: Which term is being searched for. One of `s`,
         `p`, or `o`.
-        @param term (rdflib.URIRef) Bound term to search for.
+        :param rdflib.URIRef term: Bound term to search for.
 
-        @return iterator(bytes) SPO keys matching the pattern.
-        '''
+        :rtype: iterator(bytes)
+        :return: SPO keys matching the pattern.
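+
+        A hypothetical call, retrieving keys for all triples with a given
+        subject::
+
+            spo_keys = self._lookup_1bound('s', URIRef('urn:s:0'))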
+        """
         k = self._to_key(term)
         if not k:
             return iter(())
@@ -1051,15 +1048,16 @@ class LmdbStore(Store):
 
 
     def _lookup_2bound(self, bound_terms):
-        '''
+        """
         Look up triples for a pattern with two bound terms.
 
-        @param bound terms (dict) Triple labels and terms to search for,
+        :param dict bound_terms: Triple labels and terms to search for,
         in the format of, e.g. {'s': URIRef('urn:s:1'), 'o':
         URIRef('urn:o:1')}
 
-        @return iterator(bytes) SPO keys matching the pattern.
-        '''
+        :rtype: iterator(bytes)
+        :return: SPO keys matching the pattern.
+        """
         if len(bound_terms) != 2:
             raise ValueError(
                     'Exactly 2 terms need to be bound. Got {}'.format(
@@ -1112,14 +1110,15 @@ class LmdbStore(Store):
 
 
     def _append(self, cur, values, **kwargs):
-        '''
+        """
         Append one or more values to the end of a database.
 
-        @param cur (lmdb.Cursor) The write cursor to act on.
-        @param data (list(bytes)) Value(s) to append.
+        :param lmdb.Cursor cur: The write cursor to act on.
+        :param list(bytes) values: Value(s) to append.
 
-        @return list(bytes) Last key(s) inserted.
-        '''
+        :rtype: list(memoryview)
+        :return: Last key(s) inserted.
+        """
         if not isinstance(values, list) and not isinstance(values, tuple):
             raise ValueError('Input must be a list or tuple.')
         data = []
@@ -1134,13 +1133,12 @@ class LmdbStore(Store):
 
 
     def _index_triple(self, action, spok):
-        '''
+        """
         Update index for a triple and context (add or remove).
 
-        @param action (string) 'add' or 'remove'.
-        @param spok (bytes) Triple key.
-        indexed. Context MUST be specified for 'add'.
-        '''
+        :param str action: 'add' or 'remove'.
+        :param bytes spok: Triple key.
+        """
         # Split and rearrange-join keys for association and indices.
         triple = self._split_key(spok)
         sk, pk, ok = triple
@@ -1173,13 +1171,14 @@ class LmdbStore(Store):
     ## debugging.
 
     def _keys_in_ctx(self, pk_ctx):
-        '''
+        """
         Convenience method to list all keys in a context.
 
-        @param pk_ctx (bytes) Pickled context URI.
+        :param bytes pk_ctx: Pickled context URI.
 
-        @return Iterator:tuple Generator of triples.
-        '''
+        :rtype: Iterator(tuple)
+        :return: Generator of triples.
+        """
         with self.cur('c:spo') as cur:
             if cur.set_key(pk_ctx):
                 tkeys = cur.iternext_dup()
@@ -1189,13 +1188,14 @@ class LmdbStore(Store):
 
 
     def _ctx_for_key(self, tkey):
-        '''
+        """
         Convenience method to list all contexts that a key is in.
 
-        @param tkey (bytes) Triple key.
+        :param bytes tkey: Triple key.
 
-        @return Iterator:URIRef Generator of context URIs.
-        '''
+        :rtype: Iterator(rdflib.URIRef)
+        :return: Generator of context URIs.
+        """
         with self.cur('spo:c') as cur:
             if cur.set_key(tkey):
                 ctx = cur.iternext_dup()

+ 90 - 85
lakesuperior/store/ldp_rs/rsrc_centric_layout.py

@@ -33,7 +33,7 @@ logger = logging.getLogger(__name__)
 
 
 class RsrcCentricLayout:
-    '''
+    """
     This class exposes an interface to build graph store layouts. It also
     provides the basics of the triplestore connection.
 
@@ -53,7 +53,7 @@ class RsrcCentricLayout:
     E.g. if the configuration indicates `simple_layout` the application will
     look for
     `lakesuperior.store.rdf.simple_layout.SimpleLayout`.
-    '''
+    """
     _graph_uids = ('fcadmin', 'fcmain', 'fcstruct')
 
     # @TODO Move to a config file?
@@ -116,14 +116,14 @@ class RsrcCentricLayout:
     ## MAGIC METHODS ##
 
     def __init__(self, config):
-        '''Initialize the graph store and a layout.
+        """Initialize the graph store and a layout.
 
         NOTE: `rdflib.Dataset` requires an RDF 1.1 compliant store with support
         for Graph Store HTTP protocol
         (https://www.w3.org/TR/sparql11-http-rdf-update/). Blazegraph supports
         this only in the (currently unreleased) 2.2 branch. It works with Jena,
         which is currently the reference implementation.
-        '''
+        """
         self.config = config
         self.store = plugin.get('Lmdb', Store)(config['location'])
         self.ds = Dataset(self.store, default_union=True)
@@ -132,30 +132,30 @@ class RsrcCentricLayout:
 
     @property
     def attr_routes(self):
-        '''
+        """
         This is a map that allows specific triples to go to certain graphs.
         It is a machine-friendly version of the static attribute `attr_map`
         which is formatted for human readability and to avoid repetition.
         The attributes not mapped here (usually user-provided triples with no
         special meaning to the application) go to the `fcmain:` graph.
 
-        The output of this is a dict with a similar structure:
-
-        {
-            'p': {
-                <Predicate P1>: <destination graph G1>,
-                <Predicate P2>: <destination graph G1>,
-                <Predicate P3>: <destination graph G1>,
-                <Predicate P4>: <destination graph G2>,
-                [...]
-            },
-            't': {
-                <RDF Type T1>: <destination graph G1>,
-                <RDF Type T2>: <destination graph G3>,
-                [...]
+        The output of this is a dict with a similar structure::
+
+            {
+                'p': {
+                    <Predicate P1>: <destination graph G1>,
+                    <Predicate P2>: <destination graph G1>,
+                    <Predicate P3>: <destination graph G1>,
+                    <Predicate P4>: <destination graph G2>,
+                    [...]
+                },
+                't': {
+                    <RDF Type T1>: <destination graph G1>,
+                    <RDF Type T2>: <destination graph G3>,
+                    [...]
+                }
             }
-        }
-        '''
+        """
         if not hasattr(self, '_attr_routes'):
             self._attr_routes = {'p': {}, 't': {}}
             for dest in self.attr_map.keys():
@@ -168,9 +168,9 @@ class RsrcCentricLayout:
 
 
     def bootstrap(self):
-        '''
+        """
         Delete all graphs and insert the basic triples.
-        '''
+        """
         logger.info('Deleting all data from the graph store.')
         store = self.ds.store
         if getattr(store, 'is_txn_open', False):
@@ -186,25 +186,25 @@ class RsrcCentricLayout:
 
 
     def get_raw(self, uri, ctx=None):
-        '''
+        """
         Get a raw graph of a non-LDP resource.
 
         The graph is queried across all contexts or within a specific one.
 
-        @param s(rdflib.term.URIRef) URI of the subject.
-        @param ctx (rdflib.term.URIRef) URI of the optional context. If None,
+        :param rdflib.term.URIRef uri: URI of the subject.
+        :param rdflib.term.URIRef ctx: URI of the optional context. If None,
         all named graphs are queried.
 
-        return rdflib.Graph
-        '''
+        :rtype: rdflib.Graph
+        """
         return self.store.triples((nsc['fcres'][uid], None, None), ctx)
 
 
     def count_rsrc(self):
-        '''
+        """
         Return a count of first-class resources, subdivided into "live" and
         historic snapshots.
-        '''
+        """
         with TxnManager(self.ds.store) as txn:
             main = set(
                     self.ds.graph(META_GR_URI)[ : nsc['foaf'].primaryTopic : ])
@@ -215,18 +215,18 @@ class RsrcCentricLayout:
 
 
     def raw_query(self, qry_str):
-        '''
+        """
         Perform a straight query to the graph store.
-        '''
+        """
         return self.ds.query(qry_str)
 
 
     def extract_imr(
                 self, uid, ver_uid=None, strict=True, incl_inbound=False,
                 incl_children=True, embed_children=False, **kwargs):
-        '''
+        """
         See base_rdf_layout.extract_imr.
-        '''
+        """
         if ver_uid:
             uid = self.snapshot_uid(uid, ver_uid)
 
@@ -260,9 +260,9 @@ class RsrcCentricLayout:
 
 
     def ask_rsrc_exists(self, uid):
-        '''
+        """
         See base_rdf_layout.ask_rsrc_exists.
-        '''
+        """
         logger.debug('Checking if resource exists: {}'.format(uid))
         meta_gr = self.ds.graph(nsc['fcadmin'][uid])
         return bool(
@@ -270,9 +270,9 @@ class RsrcCentricLayout:
 
 
     def get_metadata(self, uid, ver_uid=None, strict=True):
-        '''
+        """
         This is an optimized query to get only the administrative metadata.
-        '''
+        """
         logger.debug('Getting metadata for: {}'.format(uid))
         if ver_uid:
             uid = self.snapshot_uid(uid, ver_uid)
@@ -287,12 +287,12 @@ class RsrcCentricLayout:
 
 
     def get_user_data(self, uid):
-        '''
+        """
         Get all the user-provided data.
 
-        @param uid (string) Resource UID.
-        '''
-        # @TODO This only works as long as there is only one user-provided
+        :param string uid: Resource UID.
+        """
+        # *TODO* This only works as long as there is only one user-provided
         # graph. If multiple user-provided graphs will be supported, this
         # should use another query to get all of them.
         userdata_gr = self.ds.graph(nsc['fcmain'][uid])
@@ -301,18 +301,19 @@ class RsrcCentricLayout:
 
 
     def get_version_info(self, uid, strict=True):
-        '''
+        """
         Get all metadata about a resource's versions.
-        '''
-        # @NOTE This pretty much bends the ontology—it replaces the graph URI
+        """
+        # **Note:** This pretty much bends the ontology—it replaces the graph URI
         # with the subject URI. But the concepts of data and metadata in Fedora
         # are quite fluid anyways...
+
         # WIP—Is it worth to replace SPARQL here?
         #versions = self.ds.graph(nsc['fcadmin'][uid]).triples(
         #        (nsc['fcres'][uid], nsc['fcrepo'].hasVersion, None))
         #for version in versions:
         #    version_meta = self.ds.graph(HIST_GRAPH_URI).triples(
-        qry = '''
+        qry = """
         CONSTRUCT {
           ?s fcrepo:hasVersion ?v .
           ?v ?p ?o .
@@ -325,13 +326,13 @@ class RsrcCentricLayout:
             ?vm  ?p ?o .
             FILTER (?o != ?v)
           }
-        }'''
+        }"""
         gr = self._parse_construct(qry, init_bindings={
             'ag': nsc['fcadmin'][uid],
             'hg': HIST_GR_URI,
             's': nsc['fcres'][uid]})
         rsrc = Resource(gr, nsc['fcres'][uid])
-        # @TODO Should return a graph.
+        # TODO Should return a graph.
         if strict:
             self._check_rsrc_status(rsrc)
 
@@ -339,19 +340,19 @@ class RsrcCentricLayout:
 
 
     def get_inbound_rel(self, subj_uri, full_triple=True):
-        '''
+        """
         Query inbound relationships for a subject.
 
         This can be a list of either complete triples, or of subjects referring
         to the given URI. It excludes historic version snapshots.
 
-        @param subj_uri (rdflib.URIRef) Subject URI.
-        @param full_triple (boolean) Whether to return the full triples found
+        :param rdflib.URIRef subj_uri: Subject URI.
+        :param boolean full_triple: Whether to return the full triples found
         or only the subjects. By default, full triples are returned.
 
-        @return iterator(tuple(rdflib.term.Identifier) | rdflib.URIRef)
-        Inbound triples or subjects.
-        '''
+        :rtype: Iterator(tuple(rdflib.term.Identifier) or rdflib.URIRef)
+        :return: Inbound triples or subjects.
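+
+        Usage sketch (``layout`` and the subject URI are hypothetical)::
+
+            for s, p, o in layout.get_inbound_rel(nsc['fcres']['/my_rsrc']):
+                ...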
+        """
         # Only return non-historic graphs.
         meta_gr = self.ds.graph(META_GR_URI)
         ptopic_uri = nsc['foaf'].primaryTopic
@@ -364,14 +365,15 @@ class RsrcCentricLayout:
 
 
     def get_descendants(self, uid, recurse=True):
-        '''
+        """
         Get descendants (recursive children) of a resource.
 
-        @param uid (string) Resource UID.
+        :param string uid: Resource UID.
+        :param bool recurse: If True, the whole hierarchy is included in the
         result set.
 
-        @return iterator(rdflib.URIRef) Subjects of descendant resources.
-        '''
+        :rtype: iterator(rdflib.URIRef)
+        :return: Subjects of descendant resources.
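+
+        Usage sketch (hypothetical UID)::
+
+            for desc_uri in layout.get_descendants('/my_rsrc'):
+                ...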
+        """
         ds = self.ds
         subj_uri = nsc['fcres'][uid]
         ctx_uri = nsc['fcstruct'][uid]
@@ -391,15 +393,15 @@ class RsrcCentricLayout:
 
 
     def patch_rsrc(self, uid, qry):
-        '''
+        """
         Patch a resource with SPARQL-Update statements.
 
         The statement(s) is/are executed on the user-provided graph only
         to ensure that the scope is limited to the resource.
 
-        @param uid (string) UID of the resource to be patched.
-        @param qry (dict) Parsed and translated query, or query string.
-        '''
+        :param string uid: UID of the resource to be patched.
+        :param dict qry: Parsed and translated query, or query string.
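+
+        A hypothetical invocation, where ``update_str`` is a SPARQL Update
+        string::
+
+            layout.patch_rsrc('/my_rsrc', update_str)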
+        """
         # Add meta graph for user-defined triples. This may not be used but
         # it's simple and harmless to add here.
         self.ds.graph(META_GR_URI).add(
@@ -413,12 +415,12 @@ class RsrcCentricLayout:
 
 
     def forget_rsrc(self, uid, inbound=True, children=True):
-        '''
+        """
         Completely delete a resource and (optionally) its children and inbound
         references.
 
         NOTE: inbound references in historic versions are not affected.
-        '''
+        """
         # Localize variables to be used in loops.
         uri = nsc['fcres'][uid]
         topic_uri = nsc['foaf'].primaryTopic
@@ -447,23 +449,23 @@ class RsrcCentricLayout:
 
 
     def truncate_rsrc(self, uid):
-        '''
+        """
         Remove all user-provided data from a resource and only leave admin and
         structure data.
-        '''
+        """
         userdata = set(self.get_user_data(uid))
 
         return self.modify_rsrc(uid, remove_trp=userdata)
 
 
     def modify_rsrc(self, uid, remove_trp=set(), add_trp=set()):
-        '''
+        """
         Modify triples about a subject.
 
         This method adds and removes triple sets from specific graphs,
         indicated by the term router. It also adds metadata about the changed
         graphs.
-        '''
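+
+        A minimal sketch with hypothetical terms::
+
+            layout.modify_rsrc(
+                '/my_rsrc',
+                remove_trp={(subj, pred, old_obj)},
+                add_trp={(subj, pred, new_obj)})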
+        """
         remove_routes = defaultdict(set)
         add_routes = defaultdict(set)
         historic = VERS_CONT_LABEL in uid
@@ -502,7 +504,7 @@ class RsrcCentricLayout:
                 ver_uid = uid.split(VERS_CONT_LABEL)[1].lstrip('/')
                 meta_gr.set((
                     gr_uri, nsc['fcrepo'].hasVersionLabel, Literal(ver_uid)))
-            # @TODO More provenance metadata can be added here.
+            # *TODO* More provenance metadata can be added here.
 
         # Add graph RDF types.
         for gr_uri, gr_type in graph_types:
@@ -510,12 +512,12 @@ class RsrcCentricLayout:
 
 
     def delete_rsrc(self, uid, historic=False):
-        '''
+        """
         Delete all aspect graphs of an individual resource.
 
-        @param uid Resource UID.
-        @param historic (bool) Whether the UID is of a historic version.
-        '''
+        :param uid: Resource UID.
+        :param bool historic: Whether the UID is of a historic version.
+        """
         meta_gr_uri = HIST_GR_URI if historic else META_GR_URI
         for gr_uri in self.ds.graph(meta_gr_uri)[
                 : nsc['foaf'].primaryTopic : nsc['fcres'][uid]]:
@@ -524,9 +526,9 @@ class RsrcCentricLayout:
 
 
     def snapshot_uid(self, uid, ver_uid):
-        '''
+        """
         Create a versioned UID string from a main UID and a version UID.
-        '''
+        """
         if VERS_CONT_LABEL in uid:
             raise InvalidResourceError(uid,
                     'Resource \'{}\' is already a version.')
@@ -535,9 +537,9 @@ class RsrcCentricLayout:
 
 
     def uri_to_uid(self, uri):
-        '''
+        """
         Convert an internal URI to a UID.
-        '''
+        """
         return str(uri).replace(nsc['fcres'], '')
 
 
@@ -566,9 +568,9 @@ class RsrcCentricLayout:
     ## PROTECTED MEMBERS ##
 
     def _check_rsrc_status(self, rsrc):
-        '''
+        """
         Check if a resource is not existing or if it is a tombstone.
-        '''
+        """
         uid = self.uri_to_uid(rsrc.identifier)
         if not len(rsrc.graph):
             raise ResourceNotExistsError(uid)
@@ -585,9 +587,11 @@ class RsrcCentricLayout:
 
 
     def _parse_construct(self, qry, init_bindings={}):
-        '''
-        Parse a CONSTRUCT query and return a Graph.
-        '''
+        """
+        Parse a CONSTRUCT query.
+
+        :rtype: rdflib.Graph
+        """
         try:
             qres = self.ds.query(qry, initBindings=init_bindings)
         except ResultException:
@@ -598,11 +602,12 @@ class RsrcCentricLayout:
 
 
     def _map_graph_uri(self, t, uid):
-        '''
+        """
         Map a triple to a namespace prefix corresponding to a graph.
 
-        @return Tuple with a graph URI and an associated RDF type.
-        '''
+        :rtype: tuple
+        :return: 2-tuple with a graph URI and an associated RDF type.
+        """
         if t[1] in self.attr_routes['p'].keys():
             pfx = self.attr_routes['p'][t[1]]
         elif t[1] == RDF.type and t[2] in self.attr_routes['t'].keys():

+ 32 - 27
lakesuperior/toolbox.py

@@ -24,11 +24,11 @@ class Toolbox:
         '''
         Replace the domain of a term.
 
-        @param term (URIRef) The term (URI) to change.
-        @param search (string) Domain string to replace.
-        @param replace (string) Domain string to use for replacement.
+        :param rdflib.URIRef term: The term (URI) to change.
+        :param str search: Domain string to replace.
+        :param str replace: Domain string to use for replacement.
 
-        @return URIRef
+        :rtype: rdflib.URIRef
         '''
         s = str(term)
         if s.startswith(search):
@@ -40,7 +40,7 @@ class Toolbox:
     def uid_to_uri(self, uid):
         '''Convert a UID to a URI.
 
-        @return URIRef
+        :rtype: rdflib.URIRef
         '''
         return URIRef(g.webroot + uid)
 
@@ -48,7 +48,7 @@ class Toolbox:
     def uri_to_uid(self, uri):
         '''Convert an absolute URI (internal or external) to a UID.
 
-        @return string
+        :rtype: str
         '''
         if uri.startswith(nsc['fcres']):
             return str(uri).replace(nsc['fcres'], '')
@@ -59,9 +59,9 @@ class Toolbox:
     def localize_uri_string(self, s):
         '''Convert URIs into URNs in a string using the application base URI.
 
-        @param string s Input string.
+        :param str s: Input string.
 
-        @return string
+        :rtype: str
         '''
         if s.strip('/') == g.webroot:
             return str(ROOT_RSRC_URI)
@@ -74,9 +74,9 @@ class Toolbox:
         '''
         Localize an individual term.
 
-        @param rdflib.term.URIRef urn Input URI.
+        :param rdflib.URIRef uri: Input URI.
 
-        @return rdflib.term.URIRef
+        :rtype: rdflib.URIRef
         '''
         return URIRef(self.localize_uri_string(str(uri)))
 
@@ -85,9 +85,9 @@ class Toolbox:
         '''
         Localize terms in a triple.
 
-        @param trp (tuple(rdflib.term.URIRef)) The triple to be converted
+        :param tuple(rdflib.URIRef) trp: The triple to be converted
 
-        @return tuple(rdflib.term.URIRef)
+        :rtype: tuple(rdflib.URIRef)
         '''
         s, p, o = trp
         if s.startswith(g.webroot):
@@ -114,9 +114,9 @@ class Toolbox:
         '''
         Localize an RDF stream with domain-specific URIs.
 
-        @param data (bytes) Binary RDF data.
+        :param bytes data: Binary RDF data.
 
-        @return bytes
+        :rtype: bytes
         '''
         return data.replace(
             (g.webroot + '/').encode('utf-8'),
@@ -159,9 +159,9 @@ class Toolbox:
     def globalize_string(self, s):
         '''Convert URNs into URIs in a string using the application base URI.
 
-        @param string s Input string.
+        :param string s: Input string.
 
-        @return string
+        :rtype: string
         '''
         return s.replace(str(nsc['fcres']), g.webroot)
 
@@ -170,9 +170,9 @@ class Toolbox:
         '''
         Convert an URN into an URI using the application base URI.
 
-        @param rdflib.term.URIRef urn Input URN.
+        :param rdflib.URIRef urn: Input URN.
 
-        @return rdflib.term.URIRef
+        :rtype: rdflib.URIRef
         '''
         return URIRef(self.globalize_string(str(urn)))
 
@@ -181,9 +181,9 @@ class Toolbox:
         '''
         Globalize terms in a triple.
 
-        @param trp (tuple(rdflib.term.URIRef)) The triple to be converted
+        :param tuple(rdflib.URIRef) trp: The triple to be converted
 
-        @return tuple(rdflib.term.URIRef)
+        :rtype: tuple(rdflib.URIRef)
         '''
         s, p, o = trp
         if s.startswith(nsc['fcres']):
@@ -221,13 +221,13 @@ class Toolbox:
 
     def parse_rfc7240(self, h_str):
         '''
-        Parse `Prefer` header as per https://tools.ietf.org/html/rfc7240
+        Parse ``Prefer`` header as per https://tools.ietf.org/html/rfc7240
 
-        The `cgi.parse_header` standard method does not work with all possible
-        use cases for this header.
+        The ``cgi.parse_header`` standard method does not work with all
+        possible use cases for this header.
 
-        @param h_str (string) The header(s) as a comma-separated list of Prefer
-        statements, excluding the `Prefer: ` token.
+        :param str h_str: The header(s) as a comma-separated list of Prefer
+            statements, excluding the ``Prefer:`` token.
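+
+        An example header value (illustrative only)::
+
+            return=representation; include="http://www.w3.org/ns/ldp#PreferMinimalContainer"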
         '''
         parsed_hdr = defaultdict(dict)
 
@@ -267,9 +267,10 @@ class Toolbox:
 
         @TODO This can be later reworked to use a custom hashing algorithm.
 
-        @param rdflib.Graph gr The graph to be hashed.
+        :param rdflib.Graph gr: The graph to be hashed.
 
-        @return string SHA1 checksum.
+        :rtype: str
+        :return: SHA1 checksum.
         '''
         # Remove the messageDigest property, which very likely reflects the
         # previous state of the resource.
@@ -283,6 +284,10 @@ class Toolbox:
     def split_uuid(self, uuid):
         '''
         Split a UID into pairtree segments. This mimics FCREPO4 behavior.
+
+        :param str uuid: UUID to split.
+
+        :rtype: str
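+
+        Usage sketch (``tb`` is an assumed ``Toolbox`` instance; the UUID
+        value is illustrative)::
+
+            path = tb.split_uuid('94056425c1eb48f9a49ec973ff2d2dd3')
+            # expected: '94/05/64/25/94056425c1eb48f9a49ec973ff2d2dd3'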
         '''
         path = '{}/{}/{}/{}/{}'.format(uuid[:2], uuid[2:4],
                 uuid[4:6], uuid[6:8], uuid)