@@ -37,21 +37,26 @@ perspective.

 "From a standpoint of preserving human cultural heritage at large, does it make
 more sense to design very large repositories for very rich institutions, with a
-lot of layers of safety but also a lot of bureaucracy and redundancy, or
+lot of layers of safety but also a lot of bureaucracy and redundancy, or rather
 contribute to many decentralized projects that are highly efficient, small,
 representing peripheral cultures, and most importantly, that are at much higher
-risk of loss than the large institutions"?
+risk of loss than large institutions"?

 Both: this software has been conceived with the experience of large-scale
 repositories as the background to decide what works and what doesn't, what is
 necessary and what is superfluous, and what catalogers and archivists need to
 do their job.

+It is not inconceivable that if many Pocket Archives were to sprout all over
+the place one day, they could be periodically harvested, linked together, and
+presented in one large, central archive (it's Linked Data, after all), without
+any detriment to the independence of the individual archives.
+
 ## Basic concepts

 Until some proper reference is written, this should serve as a high-level
-documentation to help evaluate the functionality and for the author to stay on
-track. Some of these ideas have been ripped right off my day job, so there is
+documentation to help evaluate the functionality and to help me to stay on
+track. Many of these ideas have been ripped right off my day job, so there is
 a good chance they work.

 ### General philosophy
@@ -61,9 +66,9 @@ both a user's and a maintainer's perspectives. These two properties are usually
 seen as conflicting, but within reason, they can coexist.

 Pocket Archive is built upon a minimalistic framework: C and Lua, with very few
-dependencies. As with these foundational elements, it strives to offer few
-tools that can be combined in a multitude of ways to achieve many goals, rather
-than many tools each doing a specific thing.
+dependencies. Similarly to these foundational elements, Pocket Archive strives
+to offer few tools that can be combined in a multitude of ways to achieve many
+goals, rather than many tools each doing a specific thing.

 ### Resource

@@ -72,10 +77,10 @@ too much by taking the concept to the Linked Data extremes, the term *resource*
 is used in this project to describe individual, self-contained units of
 information such as:

-- Files;
+- Digital files;
 - Intellectual or physical artifacts (artworks, documents, books, etc.);
 - Structural elements inside or around an entity, such as the order of pages in
- a book, the two sides of a postcard, a collection of artifacts, etc.
+ a book, the two sides of a postcard, a collection of other resources, etc.

 Files are called *opaque resources*. They are viewed by Pocket Archive as
 "opaque" in that the system doesn't care about their contents. It only ensures
@@ -97,133 +102,28 @@ A submission is directed by a *laundry list*, which is a spreadsheet listing
 all the resources (both opaque and descriptive) to be created, and the metadata
 assigned to them. The laundry list, formatted as a CSV (comma-separated value)
 file, can be edited by several free and open source applications, such as
-LibreOffice. For repetitive, high- volume submissions, templates can be set to
+LibreOffice. For repetitive, high-volume submissions, templates can be set to
 facilitate filling in metadata fields. An [example submission
 ](test/sample_submission/postcard-bag/data/), which includes a laundry list, is
 available.

-Detailed instructions on how to write a laundry list shall be added later. For
-now, the following are the basic guidelines to build a submission package:
-
-- Resources are arranged in files and folders on a local filesystem that Pocket
- Archive can access.
-- File and folder arrangement is important. A folder represents a descriptive
- resource, and can have metadata attached to it. A file of folder under a
- parent folder is automatically added as a *child* of the parent resource.
- This relationship is intended to present the parent as a container of other
- sub-resources (descriptive and/or opaque). With this method, hierarchies of
- any complexity can be built.
-- File and folder order in the submission folder is *not* important. No need to
- rename files and folders to force a specific ordering. This is specified via
- laundry list instead. See below.
-- The laundry list file is placed under the submission package folder and must
- be named `pkar_submission.csv`.
-
-A laundry list is thus formatted:
-
-- The first row is reserved for the headers, which indicate the field names.
-- Each subsequent row represents a resource (except in a multi-value case,
- described below). The `pas:sourcePath` and `pas:contentType` fields are
- mandatory for each resource. All other fields are optional for the
- submission, however, some type definitions may have constraints in this
- regard.
-- All field names, except for `id`, have a namespace prefix among the ones
- defined in the configuration. See dedicated section for details about
- namespaces.
-- Fields with a special meaning:
- - `id`: optional and single-valued. If provided, it becomes the primary
- identifier for the resource, which is used anywhere information about the
- resource is retrieved. The depositor is responmsible for ensuring that
- the provided ID is unique across the system. If left blank, the system
- generates an identifier that is guaranteed to be unique.
- - `pas:sourcePath`: mandatory and single-valued. It refers to the file or
- folder path relative to the package.
- - `pas:contentType`: mandatory and single-valued. It defines the content
- type assigned to the resource. For files, it should be `pas:File` or a
- sub-type thereof. For folders it must not be a `pas:File` or sub-type.
-- To provide multiple values for one or more fields, additional values are
- added to rows below the previous. For these additional rows, the `sourcePath`
- field **must not** be filled, and additional values for single-valued fields
- are ignored.
-- The ordering of the rows determines the ordering of the resources in their
- container. The system automatically assigns an order to the resources, using
- their source path and their position in the laundry list. Resources at the
- top are not assigned an order, as they are considered self-standing. If an
- order is needed for those, the `pas:next` field can be set to the desired
- resource (see point below about relationships), or they can be put in an
- enclosing folder that acts as a collection.
-- Relationships can be established between resources. These are stored as
- persistent links and appear as hyperlinks in the discovery interface. A
- relationship can only be set for a field that is configured as "resource"
- type. To set a relationship with a resource in the
- same laundry list that doesn't have an explicit ID set, insert the source
- path of the resource. For a resource that has already an ID, either by being
- assigned one manually or by being already deposited, insert the full ID
- including the `par:` namespace (e.g. for ID `12345`, insert `par:12345`).
-
-### Update
-
-A submission is also used to update existing resources. Each resource update is
-a full replacement of all the resource's metadata, so a submission must include
-a full representation of each of the resources updated.
-
-To facilitate this task while avoiding the need to hold on to all of the
-archive's laundry lists, Pocket Archive can generate a laundry list for one or
-more selected resources. This list, which represents the current state of the
-resources requested, can be edited and submitted for an update. This method is
-much faster and intuitive than clicking around an alien user interface filled
-with icons and terms that one has never seen before.
+Using spreadsheets is for most users much faster and more intuitive than
+clicking around an alien user interface filled with icons and terms that one
+has never seen before.

-### Metadata & content model
+Detailed instructions on how to write a laundry list are under the
+[submission documentation](doc/submission.md).

-**Note:** The scope of this functional area is currently under review. Things
-may change.
+### Metadata & content model

 Metadata are (yes, it's a *plural* noun) controlled by a *content model*, which
 in this project is intended as the entirety of definitions of content types
-recognized by the system, and how they relate to one another. Each *type
-definition* is encoded in a configuration file defining a single content
-category type. This configuration is specific to each individual Pocket Archive
-installation, which can use the baseline one provided by default, or extend it
-via additional configurations. Please look at the [default model
-configuration](config/model/typedef) files that come with Pocket Archive.
-
-One doesn't have to define all possible types in detail. Pocket Archive
-provides some basic types, e.g.: `Anything` (the super-class of them all),
-`Artifact`, `File`, `Part`, which can be used in a very basic installation and
-should not be radically altered, because some basic functionality of the system
-relies on them. To add more specific definitions, *subtypes* can be defined. A
-subtype inherits all the property definitions of its broader model, and adds
-more specific behavior. An example classification could be: Anything -> File ->
-Image File -> Scientific Image. Each of the sub-types would only define the
-special properties of that definition, which add to, or replace, the properties
-of its broader definitions.
-
-All resources in Pocket Archive must be assigned a content type. If someone has
-to deal with a resource that doesn't fit in any of the predefined content
-models, they can asign it the most specific type that they can. At worst, they
-can put it under Anything. Of course, if one starts dealing with many
-unclassifiable resources that look similar, it's probably best to define a
-model for them; but that is not mandatory.
-
-Each metadata field can be specified by constraints. These constraints can be
-on:
-
-- Type: the data type for the field, e.g. string, number, resource
- (relationship), etc.
-- Cardinality: how many values can be set for a field, for each resource. These
- values can be adjusted to set mandatory fields, single-valued fields, etc.
-- Range: the range of values allowed. How this is interpreted depends on the
- data type: for a number can be a min/max range, for a string a regular
- expression pattern, for a resource the type(s) of the resources pointed to,
- etc.
-
-All of these constraints are optionals. Fields that are not defined may accept
-any number of values, and are optional. So it's up to the repository manager
-to decide how specific or how free-form their archive should be.
-
-Note that fields that are not defined at least by a label, may be hard to
-understand by users browsing the discovery interface.
+recognized by the system, and how they relate to one another. Each individual
+Pocket Archive installation can use the baseline model provided by default, or
+extend it via additional configuration.
+
+See the [content model configuration manual](doc/content_model.md) for details
+on how to set up a custom content model.

 ### Site generation

@@ -251,7 +151,7 @@ Simple road map for a rough prototype:
 - ⚒ Content model
 - ⎊ Validation rules
 - ⎊ Relationship inference rules
- - Local overrides
+ - ⎊ Local overrides
 - ⚒ Submission module
 - ✓ SIP building
 - ✓ Metadata from LL