Split off verbose sections of README.

scossu 1 week ago
parent
commit
7a8a914221
4 changed files with 149 additions and 128 deletions
  1. README.md (+28, -128)
  2. doc/content_model.md (+50, -0)
  3. doc/submission.md (+70, -0)
  4. src/core.lua (+1, -0)

+ 28 - 128
README.md

@@ -37,21 +37,26 @@ perspective.
 
 "From a standpoint of preserving human cultural heritage at large, does it make
 more sense to design very large repositories for very rich institutions, with a
-lot of layers of safety but also a lot of bureaucracy and redundancy, or
+lot of layers of safety but also a lot of bureaucracy and redundancy, or rather
 contribute to many decentralized projects that are highly efficient, small,
 representing peripheral cultures, and most importantly, that are at much higher
-risk of loss than the large institutions"?
+risk of loss than large institutions?"
 
 Both: this software has been conceived with the experience of large-scale
 repositories as the background to decide what works and what doesn't, what is
 necessary and what is superfluous, and what catalogers and archivists need to
 do their job.
 
+It is not inconceivable that if many Pocket Archives were to sprout all over
+the place one day, they could be periodically harvested, linked together, and
+presented in one large, central archive (it's Linked Data, after all), without
+any detriment to the independence of the individual archives.
+
 ## Basic concepts
 
 Until some proper reference is written, this should serve as a high-level
-documentation to help evaluate the functionality and for the author to stay on
-track. Some of these ideas have been ripped right off my day job, so there is
+documentation to help evaluate the functionality and to help me to stay on
+track. Many of these ideas have been ripped right off my day job, so there is
 a good chance they work.
 
 ### General philosophy
@@ -61,9 +66,9 @@ both a user's and a maintainer's perspectives. These two properties are usually
 seen as conflicting, but within reason, they can coexist.
 
 Pocket Archive is built upon a minimalistic framework: C and Lua, with very few
-dependencies. As with these foundational elements, it strives to offer few
-tools that can be combined in a multitude of ways to achieve many goals, rather
-than many tools each doing a specific thing.
+dependencies. Like these foundational elements, Pocket Archive strives to
+offer a few tools that can be combined in a multitude of ways to achieve many
+goals, rather than many tools each doing a specific thing.
 
 ### Resource
 
@@ -72,10 +77,10 @@ too much by taking the concept to the Linked Data extremes, the term *resource*
 is used in this project to describe individual, self-contained units of
 information such as:
 
-- Files;
+- Digital files;
 - Intellectual or physical artifacts (artworks, documents, books, etc.);
 - Structural elements inside or around an entity, such as the order of pages in
-  a book, the two sides of a postcard, a collection of artifacts, etc.
+  a book, the two sides of a postcard, a collection of other resources, etc.
 
 Files are called *opaque resources*. They are viewed by Pocket Archive as
 "opaque" in that the system doesn't care about their contents. It only ensures
@@ -97,133 +102,28 @@ A submission is directed by a *laundry list*, which is a spreadsheet listing
 all the resources (both opaque and descriptive) to be created, and the metadata
 assigned to them. The laundry list, formatted as a CSV (comma-separated value)
 file, can be edited by several free and open source applications, such as
-LibreOffice. For repetitive, high- volume submissions, templates can be set to
+LibreOffice. For repetitive, high-volume submissions, templates can be set to
 facilitate filling in metadata fields. An [example submission
 ](test/sample_submission/postcard-bag/data/), which includes a laundry list, is
 available.
 
-Detailed instructions on how to write a laundry list shall be added later. For
-now, the following are the basic guidelines to build a submission package:
-
-- Resources are arranged in files and folders on a local filesystem that Pocket
-  Archive can access.
-- File and folder arrangement is important. A folder represents a descriptive
-  resource, and can have metadata attached to it. A file or folder under a
-  parent folder is automatically added as a *child* of the parent resource.
-  This relationship is intended to present the parent as a container of other
-  sub-resources (descriptive and/or opaque). With this method, hierarchies of
-  any complexity can be built.
-- File and folder order in the submission folder is *not* important. No need to
-  rename files and folders to force a specific ordering. This is specified via
-  laundry list instead. See below.
-- The laundry list file is placed under the submission package folder and must
-  be named `pkar_submission.csv`.
-
-A laundry list is thus formatted:
-
-- The first row is reserved for the headers, which indicate the field names.
-- Each subsequent row represents a resource (except in a multi-value case,
-  described below). The `pas:sourcePath` and `pas:contentType` fields are
-  mandatory for each resource. All other fields are optional for the
-  submission, however, some type definitions may have constraints in this
-  regard.
-- All field names, except for `id`, have a namespace prefix among the ones
-  defined in the configuration. See dedicated section for details about
-  namespaces.
-- Fields with a special meaning:
-    - `id`: optional and single-valued. If provided, it becomes the primary
-      identifier for the resource, which is used anywhere information about the
-      resource is retrieved. The depositor is responsible for ensuring that
-      the provided ID is unique across the system. If left blank, the system
-      generates an identifier that is guaranteed to be unique.
-    - `pas:sourcePath`: mandatory and single-valued. It refers to the file or
-      folder path relative to the package.
-    - `pas:contentType`: mandatory and single-valued. It defines the content
-      type assigned to the resource. For files, it should be `pas:File` or a
-      sub-type thereof. For folders it must not be a `pas:File` or sub-type.
-- To provide multiple values for one or more fields, additional values are
-  added to rows below the previous. For these additional rows, the `sourcePath`
-  field **must not** be filled, and additional values for single-valued fields
-  are ignored.
-- The ordering of the rows determines the ordering of the resources in their
-  container. The system automatically assigns an order to the resources, using
-  their source path and their position in the laundry list. Resources at the
-  top are not assigned an order, as they are considered self-standing. If an
-  order is needed for those, the `pas:next` field can be set to the desired
-  resource (see point below about relationships), or they can be put in an
-  enclosing folder that acts as a collection.
-- Relationships can be established between resources. These are stored as
-  persistent links and appear as hyperlinks in the discovery interface. A
-  relationship can only be set for a field that is configured as "resource"
-  type. To set a relationship with a resource in the
-  same laundry list that doesn't have an explicit ID set, insert the source
-  path of the resource. For a resource that has already an ID, either by being
-  assigned one manually or by being already deposited, insert the full ID
-  including the `par:` namespace (e.g. for ID `12345`, insert `par:12345`).
-
-### Update
-
-A submission is also used to update existing resources. Each resource update is
-a full replacement of all the resource's metadata, so a submission must include
-a full representation of each of the resources updated.
-
-To facilitate this task while avoiding the need to hold on to all of the
-archive's laundry lists, Pocket Archive can generate a laundry list for one or
-more selected resources. This list, which represents the current state of the
-resources requested, can be edited and submitted for an update. This method is
-much faster and intuitive than clicking around an alien user interface filled
-with icons and terms that one has never seen before.
+Using spreadsheets is, for most users, much faster and more intuitive than
+clicking around an alien user interface filled with icons and terms that one
+has never seen before.
 
-### Metadata & content model
+Detailed instructions on how to write a laundry list are in the
+[submission documentation](doc/submission.md).
 
-**Note:** The scope of this functional area is currently under review. Things
-may change.
+### Metadata & content model
 
 Metadata are (yes, it's a *plural* noun) controlled by a *content model*, which
 in this project is intended as the entirety of definitions of content types
-recognized by the system, and how they relate to one another. Each *type
-definition* is encoded in a configuration file defining a single content
-category type. This configuration is specific to each individual Pocket Archive
-installation, which can use the baseline one provided by default, or extend it
-via additional configurations. Please look at the [default model
-configuration](config/model/typedef) files that come with Pocket Archive.
-
-One doesn't have to define all possible types in detail. Pocket Archive
-provides some basic types, e.g.: `Anything` (the super-class of them all),
-`Artifact`, `File`, `Part`, which can be used in a very basic installation and
-should not be radically altered, because some basic functionality of the system
-relies on them. To add more specific definitions, *subtypes* can be defined. A
-subtype inherits all the property definitions of its broader model, and adds
-more specific behavior. An example classification could be: Anything -> File ->
-Image File -> Scientific Image. Each of the sub-types would only define the
-special properties of that definition, which add to, or replace, the properties
-of its broader definitions.
-
-All resources in Pocket Archive must be assigned a content type. If someone has
-to deal with a resource that doesn't fit in any of the predefined content
-models, they can assign it the most specific type that they can. At worst, they
-can put it under Anything. Of course, if one starts dealing with many
-unclassifiable resources that look similar, it's probably best to define a
-model for them; but that is not mandatory.
-
-Each metadata field can be specified by constraints. These constraints can be
-on:
-
-- Type: the data type for the field, e.g. string, number, resource
-  (relationship), etc.
-- Cardinality: how many values can be set for a field, for each resource. These
-  values can be adjusted to set mandatory fields, single-valued fields, etc.
-- Range: the range of values allowed. How this is interpreted depends on the
-  data type: for a number can be a min/max range, for a string a regular
-  expression pattern, for a resource the type(s) of the resources pointed to,
-  etc.
-
-All of these constraints are optional. Fields that are not defined may accept
-any number of values, and are optional. So it's up to the repository manager
-to decide how specific or how free-form their archive should be.
-
-Note that fields that are not defined at least by a label, may be hard to
-understand by users browsing the discovery interface.
+recognized by the system, and how they relate to one another. Each individual
+Pocket Archive installation can use the baseline one provided by default, or
+extend it via additional configuration.
+
+See the [content model configuration manual](doc/content_model.md) for details
+on how to set up a custom content model.
 
 ### Site generation
 
@@ -251,7 +151,7 @@ Simple road map for a rough prototype:
  - ⚒ Content model
    - ⎊ Validation rules
    - ⎊ Relationship inference rules
-  - Local overrides
+  - Local overrides
- ⚒ Submission module
  - ✓ SIP building
  - ✓ Metadata from LL

+ 50 - 0
doc/content_model.md

@@ -0,0 +1,50 @@
+# Content model configuration
+
+**Note:** The scope of this functional area is currently under review. Things
+may change.
+
+Pocket Archive ships with some predefined content types. For some very simple
+archives, this may be enough to get started with little or no customization.
+For a setup which needs to define more numerous or complex content types in a
+more articulated way, additional types can be defined. Please look at the
+[default model configuration](../config/model/typedef) files that come with
+Pocket Archive. 
+
+Each *type definition* is encoded in a configuration file defining a single
+content category type. One doesn't have to define all possible types in detail.
+Pocket Archive provides some basic types, e.g.: `Anything` (the super-class of
+them all), `Artifact`, `File`, `Part`, which should not be radically altered,
+because some basic functionality of the system relies on them. To add more
+specific definitions, *subtypes* can be defined. A subtype inherits all the
+property definitions of its broader model, and adds more specific behavior. An
+example classification could be: Anything -> File -> Image File -> Scientific
+Image. Each of the subtypes would only define the special properties of that
+definition, which add to, or replace, the properties of its broader
+definitions.
+
+All resources in Pocket Archive must be assigned a content type. If someone has
+to deal with a resource that doesn't fit in any of the predefined content
+models, they can assign it the most specific type that they can. At worst, they
+can put it under Anything. Of course, if one starts dealing with many
+unclassifiable resources that look similar, it's probably best to define a
+model for them; but that is not mandatory.
+
+Each metadata field can be restricted by constraints. These constraints can be
+on:
+
+- Type: the data type for the field, e.g. string, number, resource
+  (relationship), etc.
+- Cardinality: how many values can be set for a field, for each resource. These
+  values can be adjusted to set mandatory fields, single-valued fields, etc.
+- Range: the range of values allowed. How this is interpreted depends on the
+  data type: for a number it can be a min/max range, for a string a regular
+  expression pattern, for a resource the type(s) of the resources pointed to,
+  etc.
+
+All of these constraints are optional. Fields that are not defined may accept
+any number of values, and are optional. So it's up to the repository manager
+to decide how specific or how free-form their archive should be.
+
+Note that fields that are not defined, at least by a label, may be hard for
+users browsing the discovery interface to understand.
+
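
To illustrate the subtype inheritance and the type, cardinality, and range constraints described in `doc/content_model.md`, here is a minimal Lua sketch. It is not Pocket Archive's implementation: the `typedefs` table layout, the `min`/`max`/`range` keys, the `ImageFile` type name, and the field names `dc:title`, `pas:size`, and `ex:note` are all assumptions made for the example, and the real typedef configuration format may differ. Only string and number types are checked; the "resource" type and string patterns are left out.

```lua
-- Hypothetical type definitions: the table layout and the field names are
-- assumptions for this sketch, not Pocket Archive's typedef format.
local typedefs = {
    Anything = {
        fields = { ["dc:title"] = { type = "string", min = 1 } },
    },
    File = {
        parent = "Anything",
        fields = {
            ["pas:size"] = { type = "number", min = 1, max = 1, range = { 0, math.huge } },
        },
    },
    ImageFile = {
        parent = "File",
        -- Replaces the broader definition: an image takes exactly one title.
        fields = { ["dc:title"] = { type = "string", min = 1, max = 1 } },
    },
}

-- Collect field rules from the broadest type down, so that a subtype's rules
-- add to, or replace, those of its broader types.
local function resolve_fields(type_name)
    local def = assert(typedefs[type_name], "unknown type: " .. type_name)
    local merged = def.parent and resolve_fields(def.parent) or {}
    for field, rule in pairs(def.fields or {}) do
        merged[field] = rule
    end
    return merged
end

-- Check one field's values against its constraints, all of which are optional.
local function check_field(rule, values)
    if rule.min and #values < rule.min then return false, "too few values" end
    if rule.max and #values > rule.max then return false, "too many values" end
    for _, value in ipairs(values) do
        -- Covers "string" and "number" only, whose names match Lua's type().
        if rule.type and type(value) ~= rule.type then
            return false, "wrong data type"
        end
        if rule.range and type(value) == "number"
            and (value < rule.range[1] or value > rule.range[2]) then
            return false, "out of range"
        end
    end
    return true
end

-- Return a list of error messages; fields without a definition are accepted.
local function validate(type_name, resource)
    local errors = {}
    for field, rule in pairs(resolve_fields(type_name)) do
        local ok, err = check_field(rule, resource[field] or {})
        if not ok then
            errors[#errors + 1] = field .. ": " .. err
        end
    end
    return errors
end

local errors = validate("ImageFile", {
    ["dc:title"] = { "Nebula", "Nebula (detail)" }, -- violates max = 1
    ["pas:size"] = { 2048 },
    ["ex:note"] = { "undefined field", "any number of values" },
})
for _, message in ipairs(errors) do print(message) end
```

Merging rules from the broadest type downward keeps each subtype definition short, since it only lists what it adds or overrides.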

+ 70 - 0
doc/submission.md

@@ -0,0 +1,70 @@
+# Submission process
+
+The following are basic guidelines to build a submission package:
+
+- Resources are arranged in files and folders on a local filesystem that Pocket
+  Archive can access.
+- File and folder arrangement is important. A folder represents a descriptive
+  resource, and can have metadata attached to it. A file or folder under a
+  parent folder is automatically added as a *child* of the parent resource.
+  This relationship is intended to present the parent as a container of other
+  sub-resources (descriptive and/or opaque). With this method, hierarchies of
+  any complexity can be built.
+- File and folder order in the submission folder is *not* important. There is
+  no need to rename files and folders to force a specific ordering: ordering
+  is specified via the laundry list instead (see below).
+- The laundry list file is placed under the submission package folder and must
+  be named `pkar_submission.csv`.
+
+A laundry list is formatted as follows:
+
+- The first row is reserved for the headers, which indicate the field names.
+- Each subsequent row represents a resource (except in a multi-value case,
+  described below). The `pas:sourcePath` and `pas:contentType` fields are
+  mandatory for each resource. All other fields are optional for the
+  submission; however, some type definitions may have constraints in this
+  regard.
+- All field names, except for `id`, have a namespace prefix among the ones
+  defined in the configuration. See dedicated section for details about
+  namespaces.
+- Fields with a special meaning:
+    - `id`: optional and single-valued. If provided, it becomes the primary
+      identifier for the resource, which is used anywhere information about the
+      resource is retrieved. The depositor is responsible for ensuring that
+      the provided ID is unique across the system. If left blank, the system
+      generates an identifier that is guaranteed to be unique.
+    - `pas:sourcePath`: mandatory and single-valued. It refers to the file or
+      folder path relative to the package.
+    - `pas:contentType`: mandatory and single-valued. It defines the content
+      type assigned to the resource. For files, it should be `pas:File` or a
+      sub-type thereof. For folders it must not be a `pas:File` or sub-type.
+- To provide multiple values for one or more fields, additional values are
+  added in rows below the first one. For these additional rows, the
+  `pas:sourcePath` field **must not** be filled, and additional values for
+  single-valued fields are ignored.
+- The ordering of the rows determines the ordering of the resources in their
+  container. The system automatically assigns an order to the resources, using
+  their source path and their position in the laundry list. Resources at the
+  top are not assigned an order, as they are considered self-standing. If an
+  order is needed for those, the `pas:next` field can be set to the desired
+  resource (see point below about relationships), or they can be put in an
+  enclosing folder that acts as a collection.
+- Relationships can be established between resources. These are stored as
+  persistent links and appear as hyperlinks in the discovery interface. A
+  relationship can only be set for a field that is configured as "resource"
+  type. To set a relationship with a resource in the same laundry list that
+  doesn't have an explicit ID set, insert the source path of the resource. For
+  a resource that already has an ID, either by being assigned one manually or
+  by being already deposited, insert the full ID including the `par:`
+  namespace (e.g. for ID `12345`, insert `par:12345`).
+
+### Update
+
+A submission is also used to update existing resources. Each resource update is
+a full replacement of all the resource's metadata, so a submission must include
+a full representation of each of the resources updated.
+
+To facilitate this task while avoiding the need to hold on to all of the
+archive's laundry lists, Pocket Archive can generate a laundry list for one or
+more selected resources. This list, which represents the current state of the
+resources requested, can be edited and submitted for an update. 
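
The laundry list rules in `doc/submission.md` (a header row, one row per resource, continuation rows with an empty `pas:sourcePath` for extra values) can be illustrated with a small Lua sketch. This is not Pocket Archive's parser: the function names are hypothetical, the CSV handling is simplified and does not support quoted or escaped commas, and the `dc:title` and `dc:subject` fields and the `pas:Artifact` type in the sample are assumptions made for the example.

```lua
-- Fields that the submission guidelines describe as single-valued.
local single_valued = {
    ["id"] = true, ["pas:sourcePath"] = true, ["pas:contentType"] = true,
}

-- Split one CSV line into cells. Simplification: no quoted or escaped commas.
local function split_csv_line(line)
    local cells = {}
    -- The appended comma lets every cell, including empty ones, be matched.
    for cell in (line .. ","):gmatch("([^,]*),") do
        cells[#cells + 1] = cell
    end
    return cells
end

-- Group laundry list rows into resources, each mapping field name -> values.
-- The order of the returned list follows the order of the rows.
local function parse_laundry_list(text)
    local headers, current
    local resources = {}
    for line in text:gmatch("[^\r\n]+") do
        local cells = split_csv_line(line)
        if not headers then
            headers = cells -- the first row holds the field names
        else
            -- A filled pas:sourcePath starts a new resource; an empty one
            -- marks a continuation row carrying additional values.
            local starts_resource = false
            for i, name in ipairs(headers) do
                if name == "pas:sourcePath" and (cells[i] or "") ~= "" then
                    starts_resource = true
                end
            end
            if starts_resource then
                current = {}
                resources[#resources + 1] = current
            end
            for i, name in ipairs(headers) do
                local value = cells[i]
                -- Extra values for single-valued fields are ignored.
                local skip = single_valued[name] and not starts_resource
                if current and value and value ~= "" and not skip then
                    current[name] = current[name] or {}
                    table.insert(current[name], value)
                end
            end
        end
    end
    return resources
end

local sample = table.concat({
    "pas:sourcePath,pas:contentType,dc:title,dc:subject",
    "postcard,pas:Artifact,Greetings from Rome,travel",
    ",,,architecture",
    "postcard/front.jpg,pas:File,Front,",
}, "\n")

local resources = parse_laundry_list(sample)
for position, resource in ipairs(resources) do
    print(position, resource["pas:sourcePath"][1],
        table.concat(resource["dc:subject"] or {}, "; "))
end
```

In the sample, the third row has no source path, so its `architecture` value is appended to the `dc:subject` values of the `postcard` resource above it.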

+ 1 - 0
src/core.lua

@@ -11,6 +11,7 @@ local config_path = os.getenv("PA_CONFIG_DIR") or (root_path .. "/config")
 
 
 local M = {
+    -- Project root path.
     root = root_path,
     config = dofile(config_path .. "/app.lua"),
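
The hunk header above shows the configuration directory being taken from the `PA_CONFIG_DIR` environment variable, with `config` under the project root as the fallback. A minimal sketch of that lookup follows; the helper name `resolve_config_path`, the injectable `getenv` parameter, and the example path are hypothetical and only serve to make the behavior easy to try out.

```lua
-- Hypothetical helper mirroring the lookup in src/core.lua. `getenv` is
-- injectable only so the fallback can be exercised without touching the
-- real environment.
local function resolve_config_path(root_path, getenv)
    getenv = getenv or os.getenv
    return getenv("PA_CONFIG_DIR") or (root_path .. "/config")
end

-- Prints PA_CONFIG_DIR if it is set, else "/opt/pocket_archive/config".
print(resolve_config_path("/opt/pocket_archive"))
```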