C and Python data repository library. ALPHA
lsup_repo
This software is work in progress.
lsup_repo is a C and Python library providing embedded (server-less) data repository services. It builds upon a foundational library, lsup_rdf, which handles RDF and graph data.
Lakesuperior was initially built upon the Fedora repository software. This new version is a complete reengineering and repurposing of the previous software, seeking to provide similar repository services with a simplified set of concepts and constraints, focused on usability and simplicity of design. Unlike Fedora, it does not aim to adhere to, alter, or set any API standard. Long-term sustainability of the handled data is provided by transparent exports into commonly parsable data formats (RDF) and tools to rebuild a repository from data files.
That said, nothing would prevent someone from adding support for LDP, OCFL, Memento, etc., or even creating a complete Fedora implementation that uses this library as the foundation for the basic repository functionality.
lsup_repo can be included in a C or Python program to manage the life cycle of RDF and non-RDF data. It can store and manage documents of any format and size, and catalog them via RDF metadata.
lsup_repo does not need to run a server for its core functionality. The interaction with the library is done via a C or Python API. That said, a REST API or any other type of server can be built with relative ease on top of this library. A separate project, based on this library, may in the future provide a REST API for generic resource management, likely based on some existing standard.
The current goal of this development is to build a minimum-viable product (MVP) to replace the essential functionality of a previous project, Lakesuperior.
Pre-alpha. Currently at the beginning of the implementation phase. The structure of the code may change radically. Features mentioned below are to be understood as goals.
lsup_rdf development)
At the center of Lakesuperior is the goal of storing and organizing arbitrary files that can be found on a hard drive, remote server, etc. These files are called Data Resources (DATA-R). Their contents are entirely opaque to Lakesuperior, therefore any type of document can be handled.
Each data resource is accompanied by a Descriptive Resource (DESC-R). In the first iteration of Lakesuperior this is an RDF named graph which at a minimum contains a pointer to the data location and basic technical metadata. The URI of the named graph is globally unique. This resource stands for the non-RDF resource in a Linked Data context. User-defined metadata can also be added to it.
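The relationship between a DATA-R and its DESC-R can be pictured with a small, library-independent sketch. It does not use the lsup_repo API: the `DescResource` class, the `describe_data` helper, the `ex:` predicates and the example URIs are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical predicate names, for illustration only.
DATA_LOCATION = "ex:dataLocation"
BYTE_SIZE = "ex:byteSize"
MEDIA_TYPE = "ex:mediaType"


@dataclass
class DescResource:
    """A DESC-R modeled as a named graph: a URI plus a set of triples."""
    uri: str
    triples: set = field(default_factory=set)

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def objects(self, s, p):
        return {o for (s_, p_, o) in self.triples if s_ == s and p_ == p}


def describe_data(uri, location, size, media_type):
    """Build a minimal DESC-R for an opaque data resource (DATA-R)."""
    desc = DescResource(uri)
    desc.add(uri, DATA_LOCATION, location)  # pointer to the data
    desc.add(uri, BYTE_SIZE, size)          # basic technical metadata
    desc.add(uri, MEDIA_TYPE, media_type)
    return desc


desc = describe_data(
    "urn:ex:res:1", "file:///data/report.pdf", 52341, "application/pdf"
)
# User-defined metadata sits alongside the technical metadata.
desc.add("urn:ex:res:1", "dc:title", "Annual report")
```

The file contents never enter the graph; only the pointer and the metadata do, which is why any document type can be handled.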
Descriptive resources may also exist independently of data resources for cataloging and organizational purposes. They have a few characteristics in common:
Partitioning a DESC-R into multiple graphs allows individual data sets to be annotated, e.g. to establish provenance or versioning information about the asserted facts. Future developments of lsup_repo or software built upon it may take advantage of this structure.
Triples in a DESC-R can have any subject; however it is recommended to maintain some consistency about which subjects are treated in each resource. Specifically, the use of a resource as an aggregation or container of triples about multiple independent entities is discouraged in favor of the use of dedicated data structures, as described below.
Descriptive resources can be organized in various aggregation forms. The aggregating resources are normal descriptive resources, with specific predicates pointing to other descriptive resources.
It is important to note that, unlike in Fedora or other LDP implementations, the life cycle of resource aggregations is entirely independent of the aggregated resources. In LDP, deleting a container would remove its contained resources. Also, in LDP a resource can only be contained by a single container (except in the case of indirect containers, to some extent). In Lakesuperior an aggregation only "contains" pointers to other entirely independent resources, which can be pointed to by an arbitrary number of other aggregations, which can be removed at any time without changing the state of the aggregated resources.
On the other hand, deleting a resource that is part of some structure causes a scan of all inbound links (see "Referential Integrity" below) and the removal of all links to it present in other structures; therefore, the deletion of an aggregated resource changes the state of its aggregations.
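The asymmetry between deleting an aggregation and deleting an aggregated resource can be sketched with a toy in-memory model. The `Repo` class and the URIs are hypothetical and unrelated to the actual lsup_repo API; each resource is reduced to the set of URIs it links to.

```python
class Repo:
    """Toy model: each resource URI maps to the set of URIs it links to."""

    def __init__(self):
        self.links = {}

    def create(self, uri, members=()):
        self.links[uri] = set(members)

    def delete(self, uri):
        # Deleting a resource never cascades to the resources it points to...
        del self.links[uri]
        # ...but every inbound link to it is removed from other resources.
        for targets in self.links.values():
            targets.discard(uri)


repo = Repo()
repo.create("urn:ex:doc1")
repo.create("urn:ex:doc2")
repo.create("urn:ex:coll_a", ["urn:ex:doc1", "urn:ex:doc2"])
repo.create("urn:ex:coll_b", ["urn:ex:doc1"])

repo.delete("urn:ex:coll_b")  # doc1 survives and is still in coll_a
repo.delete("urn:ex:doc2")    # coll_a loses its link to doc2
```

A real implementation would need an index of inbound links to avoid scanning every resource on each deletion; the linear scan here is only for brevity.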
The types of structures foreseen for the first implementation of lsup_repo are:
A Set is simply a descriptive resource containing an unordered collection of unique links to other descriptive resources. Any descriptive resource, including other structures, can be used as a member. Shorthand functions for counting and iterating over Set members, as well as performing boolean operations on them, shall be made available. As it is a descriptive resource, a Set may have descriptive metadata added to it, such as taxonomy, descriptions, labels, etc.
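As a rough illustration of what counting, iteration and boolean operations on Sets could look like, here is a sketch built on Python's built-in `frozenset`. The `ResourceSet` class and its method names are hypothetical; the shorthand functions that lsup_repo will eventually expose are not defined yet.

```python
class ResourceSet:
    """Hypothetical stand-in for a Set: a URI, member links and metadata."""

    def __init__(self, uri, members=(), **metadata):
        self.uri = uri
        self.members = frozenset(members)  # unordered, duplicates collapse
        self.metadata = metadata

    def __len__(self):
        return len(self.members)

    def __iter__(self):
        # Sets are unordered; sorting only makes iteration reproducible.
        return iter(sorted(self.members))

    def union(self, other, uri):
        return ResourceSet(uri, self.members | other.members)

    def intersection(self, other, uri):
        return ResourceSet(uri, self.members & other.members)

    def difference(self, other, uri):
        return ResourceSet(uri, self.members - other.members)


maps = ResourceSet(
    "urn:ex:set:maps", ["urn:ex:r1", "urn:ex:r2", "urn:ex:r1"], label="Maps"
)
atlas = ResourceSet("urn:ex:set:atlas", ["urn:ex:r2", "urn:ex:r3"])
shared = maps.intersection(atlas, "urn:ex:set:shared")
```

Each boolean operation returns a new Set with its own URI, reflecting the fact that the result is itself a descriptive resource that can carry metadata.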
A Lakesuperior List is the implementation of a "Linked List" data structure. It contains a link to a single descriptive resource. This resource, called a List Item, represents the first item in the list. Each list item, except for the last one, contains a single link to the next list item.
In addition, every list item has either:
A link to the resource it stands for: the List Item is a proxy for an existing resource, which makes it possible to make the same resource part of multiple lists; or
A link to another list, which results in a nested list.
Shorthand functions to perform common list operations shall be made available. As with other descriptive resources, Lists and List Items can have any type of user-defined metadata and relationships added.
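The List structure described above, including nesting, can be sketched as a plain singly linked list. The `ListItem` and `LsList` classes and the `build` and `walk` helpers are hypothetical names for illustration and are not part of any published lsup_repo API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListItem:
    """Either a proxy for a resource or a pointer to a nested list."""
    proxy_for: Optional[str] = None
    nested: Optional["LsList"] = None
    next_item: Optional["ListItem"] = None


@dataclass
class LsList:
    uri: str
    first: Optional[ListItem] = None  # link to the first List Item


def build(uri, entries):
    """Build a list from resource URIs (str) and nested LsList objects."""
    head = None
    for entry in reversed(entries):
        if isinstance(entry, LsList):
            head = ListItem(nested=entry, next_item=head)
        else:
            head = ListItem(proxy_for=entry, next_item=head)
    return LsList(uri, head)


def walk(lst):
    """Yield resource URIs in order, descending into nested lists."""
    item = lst.first
    while item is not None:
        if item.nested is not None:
            yield from walk(item.nested)
        else:
            yield item.proxy_for
        item = item.next_item


inner = build("urn:ex:list:inner", ["urn:ex:b", "urn:ex:c"])
outer = build("urn:ex:list:outer", ["urn:ex:a", inner, "urn:ex:b"])
flattened = list(walk(outer))
```

Because items are proxies, `urn:ex:b` appears both in the inner list and directly in the outer one without conflict. This sketch does not guard against a list that nests itself, which a real implementation would have to detect.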
A List Item is a special case of a Proxy, which is a descriptive resource standing for another descriptive resource. This indirection is useful for adding a specific context to a resource, e.g. additional information on a document in the context of a curated collection that is only valid or relevant to that collection. Proxies can be aggregated in sets or other structures as well, as one sees fit.
Proxy definitions follow the OAI ORE ontology.
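To make the role of a Proxy concrete, the sketch below expresses two proxies for the same document as plain triples. `proxyFor` and `proxyIn` are terms of the OAI ORE vocabulary; whether lsup_repo uses exactly these terms is an assumption, and the `make_proxy` helper, the `ex:curatorNote` predicate and the URIs are hypothetical.

```python
ORE = "http://www.openarchives.org/ore/terms/"


def make_proxy(proxy_uri, resource_uri, aggregation_uri, context=None):
    """Return the triples describing a proxy for a resource in an aggregation."""
    triples = {
        (proxy_uri, ORE + "proxyFor", resource_uri),
        (proxy_uri, ORE + "proxyIn", aggregation_uri),
    }
    # Context-specific statements attach to the proxy, not to the resource.
    for pred, val in (context or {}).items():
        triples.add((proxy_uri, pred, val))
    return triples


doc = "urn:ex:doc1"
triples = make_proxy(
    "urn:ex:proxy1", doc, "urn:ex:coll_a", {"ex:curatorNote": "Opening exhibit"}
)
triples |= make_proxy(
    "urn:ex:proxy2", doc, "urn:ex:coll_b", {"ex:curatorNote": "Appendix"}
)
```

Neither note is asserted about the document itself, so each remains valid only within its own collection.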
The concept of Linked Data, which lsup_repo is partly built upon, does not require that a link pointing to a resource resolve to an actual resource, since it is often impossible to determine which system is responsible for managing that resource, let alone to have any agency over it. Therefore, "broken links" are not excluded.
lsup_repo, however, relies on the assumption that a specific set of resources is under its full control, and therefore guarantees that all references to internally managed resources are maintained at all times. This means that when a resource is deleted, all links pointing to it are identified and removed. This is called referential integrity.
Tools shall be developed to perform periodic referential integrity checks, report dangling links, and/or repair them.
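A periodic integrity check could work along the lines of the following sketch. It is not the planned tool: `find_dangling`, `repair`, the dictionary-based link model and the `urn:ex:` prefix used to recognize managed resources are all assumptions made for illustration.

```python
def find_dangling(links, is_managed):
    """Return (source, target) pairs whose managed target does not exist.

    links: dict mapping each existing resource URI to the URIs it links to.
    is_managed: function telling whether a URI is under repository control.
    """
    return sorted(
        (src, tgt)
        for src, targets in links.items()
        for tgt in targets
        if is_managed(tgt) and tgt not in links
    )


def repair(links, dangling):
    """Drop each dangling link in place."""
    for src, tgt in dangling:
        links[src].discard(tgt)


links = {
    "urn:ex:a": {"urn:ex:b", "http://example.org/external"},
    "urn:ex:b": set(),
    "urn:ex:c": {"urn:ex:gone"},
}


def managed(uri):
    return uri.startswith("urn:ex:")


dangling = find_dangling(links, managed)
repair(links, dangling)
```

The link to `http://example.org/external` is left alone even though it cannot be resolved locally: only links to internally managed resources fall under the integrity guarantee.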
Some Lakesuperior resources (DESC-R) and RDF terms may be managed by the repository and are handled in a special way under most circumstances.
Examples of such resources can be:
Some managed resources may be handled by the user in different ways depending on the state of a resource. For example, an RDF type of lsup:List can be specified by the user on creation, but after that it may not be modified manually.
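The state-dependent handling of managed terms could be enforced along these lines. This is a sketch under assumptions: the `Resource` class, the `CREATE_ONLY` set and the `ManagedTermError` exception are hypothetical, and treating every `rdf:type` statement as create-only is a simplification of the lsup:List example above.

```python
RDF_TYPE = "rdf:type"
# Assumption: predicates the user may set on creation but not change later.
CREATE_ONLY = {RDF_TYPE}


class ManagedTermError(Exception):
    """Raised when a user update touches a managed term."""


class Resource:
    def __init__(self, uri, triples):
        # On creation, managed terms supplied by the user are accepted.
        self.uri = uri
        self.triples = set(triples)

    def update(self, remove=(), add=()):
        """Apply a user update, rejecting changes to create-only predicates."""
        remove, add = set(remove), set(add)
        for _s, pred, _o in remove | add:
            if pred in CREATE_ONLY:
                raise ManagedTermError(f"{pred} cannot be modified after creation")
        self.triples = (self.triples - remove) | add


uri = "urn:ex:list1"
res = Resource(uri, [(uri, RDF_TYPE, "lsup:List")])
res.update(add=[(uri, "rdfs:label", "Reading list")])  # allowed

try:
    res.update(remove=[(uri, RDF_TYPE, "lsup:List")])   # rejected
    rejected = False
except ManagedTermError:
    rejected = True
```

The check runs before any change is applied, so a rejected update leaves the resource exactly as it was.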
TODO A more detailed list of these managed resources and their behaviour will be included in an expanded version of this documentation.