
Words words words.

scossu 1 week ago
parent
commit
f5b8a53cb4
3 changed files with 283 additions and 0 deletions
  1. 3 0
      CODE_OF_CONDUCT
  2. 11 0
      LICENSE
  3. 269 0
      README.md

+ 3 - 0
CODE_OF_CONDUCT

@@ -0,0 +1,3 @@
+Be curious.
+Be persistent.
+Be generous.

+ 11 - 0
LICENSE

@@ -0,0 +1,11 @@
+----------------------------------------------------------------------------
+"REROUTED BEER-WARE LICENSE" (Revision 42a):
+Stefano Cossu wrote this project. As long as you retain this notice you
+can do whatever you want with this stuff. If we meet some day, and you think
+this stuff is worth it, you can buy Poul-Henning Kamp <phk@FreeBSD.ORG> a beer
+in return.
+----------------------------------------------------------------------------
+
+DISCLAIMER: Most of this code was written on the 234 and 734 Los Angeles Metro
+bus lines. The author is not to be held responsible for potholes on Sepulveda
+Boulevard that may have caused typing errors and malfunction of the software. 

+ 269 - 0
README.md

@@ -0,0 +1,269 @@
+# Pocket Archive
+
+## The idea
+
+Stick it in your pocket and carry it around. Install it on a cloud server.
+Install it on a Raspberry Pi. Browse it offline. Browse it online. Duplicate
+it, share it, harvest it and aggregate it. Feed it non-GMO spreadsheets
+regularly and it will thrive.
+
+## A more sensible description
+
+Pocket Archive is a digital archival system and static site generator for
+small- to medium-(?) sized archives. It is designed to function in environments
+with unreliable connectivity and requires very low technical and human
+resources to set up, run, and use.
+
+Pocket Archive fulfills the following functions:
+
+- Storage and management of files and metadata-only resources
+- Management of descriptive, administrative, and technical metadata
+- Dynamic relationships between resources
+- Static site generation (discovery interface)
+
+In spite of its design simplicity, Pocket Archive strives to be highly
+flexible. It is based on [Volksdata
+](https://git.knowledgetx.com/scossu/volksdata), a very compact Linked Data
+store written in C. There is no restriction on the types and schemas of
+metadata allowed, or on the file types supported. A file-based configuration
+lets you set up content types and validation rules, or have (almost) no rules
+at all.
+
+## Why
+
+Several years ago, the author of this project believed that he should work in
+larger and larger institutions, with larger and larger data sets. One day, he
+came across a [project](https://zenodo.org/records/8111569) that changed his
+perspective.
+
+"From a standpoint of preserving human cultural heritage at large, does it make
+more sense to design very large repositories for very rich institutions, with a
+lot of layers of safety but also a lot of bureaucracy and redundancy, or
+contribute to many decentralized projects that are highly efficient, small,
+representing peripheral cultures, and most importantly, that are at much higher
+risk of loss than the large institutions?"
+
+Both: this software has been conceived with the experience of large-scale
+repositories as the background to decide what works and what doesn't, what is
+necessary and what is superfluous, and what catalogers and archivists need to
+do their job.
+
+## Basic concepts
+
+Until some proper reference is written, this should serve as high-level
+documentation to help evaluate the functionality and for the author to stay on
+track. Some of these ideas have been ripped right off my day job, so there is
+a good chance they work.
+
+### General philosophy
+
+The functional goals of Pocket Archive are simplicity and flexibility, from
+both a user's and a maintainer's perspective. These two properties are usually
+seen as conflicting, but within reason, they can coexist.
+
+Pocket Archive is built upon a minimalistic framework: C and Lua, with very few
+dependencies. As with these foundational elements, it strives to offer few
+tools that can be combined in a multitude of ways to achieve many goals, rather
+than many tools each doing a specific thing.
+
+### Resource
+
+The Linked Data adage goes, "everything is a Resource". Without confusing users
+too much by taking the concept to the Linked Data extremes, the term *resource*
+is used in this project to describe individual, self-contained units of
+information such as:
+
+- Files;
+- Intellectual or physical artifacts (artworks, documents, books, etc.);
+- Structural elements inside or around an entity, such as the order of pages in
+  a book, the two sides of a postcard, a collection of artifacts, etc.
+
+Files are called *opaque resources*. They are viewed by Pocket Archive as
+"opaque" in that the system doesn't care about their contents. It only ensures
+that files are stored as they were submitted, and keeps checksums to guard
+against data corruption.
+
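The checksum guard described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Pocket Archive's own code (which is built on C and Lua); the choice of SHA-256 and the function names `file_checksum` and `is_intact` are assumptions made for this example.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, recorded_checksum, algorithm="sha256"):
    """Compare the file's current digest with the one recorded at submission time."""
    return file_checksum(path, algorithm) == recorded_checksum
```

A digest recorded when the file is first stored can later be passed to `is_intact` to detect silent corruption.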
+All other entities are called *descriptive resources*. These are effectively
+Linked Data, which can be queried and searched for. Each file also has its own
+descriptive resource, so that it can be classified, discovered, and described.
+
+### Submission
+
+A Pocket Archive repository is populated via *submissions*. A submission is
+performed by telling the archive to pick up some files from a folder it can
+access, push them into storage, add metadata to them, and index them so that
+they can be found later.
+
+A submission is directed by a *laundry list*, which is a spreadsheet listing
+all the resources (both opaque and descriptive) to be created, and the metadata
+assigned to them. The laundry list, formatted as a CSV (comma-separated values)
+file, can be edited with several free and open source applications, such as
+LibreOffice. For repetitive, high-volume submissions, templates can be set up
+to facilitate filling in metadata fields. An [example submission
+](test/sample_submission/postcard-bag/data/), which includes a laundry list, is
+available.
+
+Detailed instructions on how to write a laundry list shall be added later. For
+now, the following are the basic guidelines to build a submission package:
+
+- Resources are arranged in files and folders on a local filesystem that Pocket
+  Archive can access.
+- File and folder arrangement is important. A folder represents a descriptive
+  resource, and can have metadata attached to it. A file or folder under a
+  parent folder is automatically added as a *child* of the parent resource.
+  This relationship is intended to present the parent as a container of other
+  sub-resources (descriptive and/or opaque). With this method, hierarchies of
+  any complexity can be built.
+- File and folder order in the submission folder is *not* important. There is
+  no need to rename files and folders to force a specific ordering: the order
+  is specified in the laundry list instead (see below).
+- The laundry list file is placed under the submission package folder and must
+  be named `pkar_submission.csv`.
+
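The folder-to-container rule above can be illustrated with a short Python sketch. It is not Pocket Archive's implementation; the function name `infer_children` and the sample paths are hypothetical. Given source paths relative to the package, it groups each path under its parent folder and keeps the input order.

```python
from pathlib import PurePosixPath


def infer_children(source_paths):
    """Map each known folder path to its children; top-level paths go under None."""
    known = set(source_paths)
    children = {}
    for path in source_paths:
        parent = str(PurePosixPath(path).parent)
        # A path whose parent folder is not itself listed is treated as top-level.
        key = parent if parent in known else None
        children.setdefault(key, []).append(path)
    return children


# Hypothetical package layout: one folder with two files, plus a loose file.
paths = ["postcard", "postcard/front.jpg", "postcard/back.jpg", "letter.pdf"]
tree = infer_children(paths)
```

Here `tree["postcard"]` lists the two image files, and `tree[None]` lists the self-standing top-level resources.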
+A laundry list is thus formatted:
+
+- The first row is reserved for the headers, which indicate the field names.
+- Each subsequent row represents a resource (except in a multi-value case,
+  described below). The `pas:sourcePath` and `pas:contentType` fields are
+  mandatory for each resource. All other fields are optional for the
+  submission; however, some type definitions may have constraints in this
+  regard.
+- All field names, except for `id`, have a namespace prefix among the ones
+  defined in the configuration. See the dedicated section for details about
+  namespaces.
+- Fields with a special meaning:
+    - `id`: optional and single-valued. If provided, it becomes the primary
+      identifier for the resource, which is used anywhere information about the
+      resource is retrieved. The depositor is responsible for ensuring that
+      the provided ID is unique across the system. If left blank, the system
+      generates an identifier that is guaranteed to be unique.
+    - `pas:sourcePath`: mandatory and single-valued. It refers to the file or
+      folder path relative to the package.
+    - `pas:contentType`: mandatory and single-valued. It defines the content
+      type assigned to the resource. For files, it should be `pas:File` or a
+      sub-type thereof. For folders it must not be a `pas:File` or sub-type.
+- To provide multiple values for one or more fields, the additional values are
+  added in rows directly below the resource's first row. For these additional
+  rows, the `pas:sourcePath` field **must not** be filled, and additional
+  values for single-valued fields are ignored.
+- The ordering of the rows determines the ordering of the resources in their
+  container. The system automatically assigns an order to the resources, using
+  their source path and their position in the laundry list. Resources at the
+  top are not assigned an order, as they are considered self-standing. If an
+  order is needed for those, the `pas:next` field can be set to the desired
+  resource (see point below about relationships), or they can be put in an
+  enclosing folder that acts as a collection.
+- Relationships can be established between resources. These are stored as
+  persistent links and appear as hyperlinks in the discovery interface. A
+  relationship can only be set for a field that is configured as "resource"
+  type. To set a relationship with a resource in the
+  same laundry list that doesn't have an explicit ID set, insert the source
+  path of the resource. For a resource that already has an ID, either by being
+  assigned one manually or by being already deposited, insert the full ID
+  including the `par:` namespace (e.g. for ID `12345`, insert `par:12345`).
+
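To make the row rules above concrete, here is a minimal Python sketch of how a laundry list could be read. It is an illustration under stated assumptions, not the project's actual parser: the function name `parse_laundry_list`, the `dc:title` and `dc:subject` fields, the `pas:Artifact` type name, and the sample values are all hypothetical. Only `id`, `pas:sourcePath`, `pas:contentType`, and `pas:File` come from the description above.

```python
import csv
import io

# Fields described above as single-valued.
SINGLE_VALUED = {"id", "pas:sourcePath", "pas:contentType"}

SAMPLE = """\
id,pas:sourcePath,pas:contentType,dc:title,dc:subject
,postcard,pas:Artifact,Greetings from Venice,Canals
,,,,Gondolas
front,postcard/front.jpg,pas:File,Front side,
,postcard/back.jpg,pas:File,Back side,
"""


def parse_laundry_list(text):
    """Group laundry list rows into resources, preserving laundry list order."""
    resources = []
    for row in csv.DictReader(io.StringIO(text)):
        cells = {name: (value or "").strip()
                 for name, value in row.items() if name is not None}
        starts_resource = bool(cells.get("pas:sourcePath"))
        if starts_resource:
            if not cells.get("pas:contentType"):
                raise ValueError(
                    f"{cells['pas:sourcePath']}: pas:contentType is mandatory")
            resources.append({})
        elif not resources:
            raise ValueError("the first data row must set pas:sourcePath")
        current = resources[-1]
        for name, value in cells.items():
            if not value:
                continue
            if name in SINGLE_VALUED:
                # Extra values for single-valued fields on continuation rows are ignored.
                if starts_resource:
                    current[name] = value
            else:
                current.setdefault(name, []).append(value)
    return resources


resources = parse_laundry_list(SAMPLE)
```

In this sample, the row containing only `Gondolas` adds a second `dc:subject` value to the postcard resource instead of creating a new one.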
+### Update
+
+A submission is also used to update existing resources. Each resource update is
+a full replacement of all the resource's metadata, so a submission must include
+a full representation of each of the resources updated.
+
+To facilitate this task while avoiding the need to hold on to all of the
+archive's laundry lists, Pocket Archive can generate a laundry list for one or
+more selected resources. This list, which represents the current state of the
+resources requested, can be edited and submitted for an update. This method is
+much faster and more intuitive than clicking around an alien user interface filled
+with icons and terms that one has never seen before.
+
+### Metadata & content model
+
+**Note:** The scope of this functional area is currently under review. Things
+may change.
+
+Metadata are (yes, it's a *plural* noun) controlled by a *content model*, which
+in this project is intended as the entirety of definitions of content types
+recognized by the system, and how they relate to one another. Each *type
+definition* is encoded in a configuration file defining a single content type.
+This configuration is specific to each individual Pocket Archive
+installation, which can use the baseline one provided by default, or extend it
+via additional configurations. Please look at the [default model
+configuration](config/model/typedef) files that come with Pocket Archive.
+
+One doesn't have to define all possible types in detail. Pocket Archive
+provides some basic types, e.g.: `Anything` (the super-class of them all),
+`Artifact`, `File`, `Part`, which can be used in a very basic installation and
+should not be radically altered, because some basic functionality of the system
+relies on them. To add more specific definitions, *subtypes* can be defined. A
+subtype inherits all the property definitions of its broader model, and adds
+more specific behavior. An example classification could be: Anything -> File ->
+Image File -> Scientific Image. Each of the sub-types would only define the
+special properties of that definition, which add to, or replace, the properties
+of its broader definitions.
+
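The way a subtype adds to, or replaces, the properties of its broader definitions can be sketched as a merge along the type chain. The dictionary layout, the `ImageFile` name, the `ex:` property names, and the function name `resolve_properties` are assumptions made for this illustration; the real type definitions live in the configuration files and may be structured differently.

```python
# Hypothetical type definitions: each names its broader type and its own properties.
TYPEDEFS = {
    "Anything": {"parent": None,
                 "properties": {"dc:title": {"max_count": 1}}},
    "File": {"parent": "Anything",
             "properties": {"ex:size": {"type": "number"}}},
    "ImageFile": {"parent": "File",
                  "properties": {"ex:width": {"type": "number"},
                                 "dc:title": {"min_count": 1, "max_count": 1}}},
}


def resolve_properties(type_name, typedefs):
    """Merge property definitions from the broadest type down to the given one."""
    chain, seen = [], set()
    while type_name is not None:
        if type_name in seen:
            raise ValueError(f"cyclic type hierarchy at {type_name}")
        seen.add(type_name)
        chain.append(typedefs[type_name])
        type_name = typedefs[type_name]["parent"]
    merged = {}
    for typedef in reversed(chain):  # broadest first, so subtypes override
        merged.update(typedef["properties"])
    return merged


image_properties = resolve_properties("ImageFile", TYPEDEFS)
```

Here `ImageFile` inherits `ex:size` from `File`, adds `ex:width`, and replaces the `dc:title` definition it would otherwise inherit from `Anything`.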
+All resources in Pocket Archive must be assigned a content type. If someone has
+to deal with a resource that doesn't fit in any of the predefined content
+models, they can assign it the most specific type that they can. At worst, they
+can put it under Anything. Of course, if one starts dealing with many
+unclassifiable resources that look similar, it's probably best to define a
+model for them; but that is not mandatory.
+
+Each metadata field can be restricted by constraints. These constraints can be
+on:
+
+- Type: the data type for the field, e.g. string, number, resource
+  (relationship), etc.
+- Cardinality: how many values can be set for a field, for each resource. These
+  values can be adjusted to set mandatory fields, single-valued fields, etc.
+- Range: the range of values allowed. How this is interpreted depends on the
+  data type: for a number it can be a min/max range, for a string a regular
+  expression pattern, for a resource the type(s) of the resources pointed to,
+  etc.
+
+All of these constraints are optional. Fields that are not defined may accept
+any number of values, and are optional. So it's up to the repository manager
+to decide how specific or how free-form their archive should be.
+
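As a rough sketch of how type, cardinality, and range constraints could be checked together, consider the following Python example. The `FieldRule` structure, its attribute names, and the `validate_field` function are hypothetical and only cover strings and numbers; they are not taken from Pocket Archive's configuration format.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRule:
    data_type: str = "string"          # "string" or "number" in this sketch
    min_count: int = 0                 # 1 or more makes the field mandatory
    max_count: Optional[int] = None    # 1 makes the field single-valued
    pattern: Optional[str] = None      # range for strings: a regular expression
    minimum: Optional[float] = None    # range for numbers: lower bound
    maximum: Optional[float] = None    # range for numbers: upper bound


def validate_field(name, values, rule=None):
    """Return a list of problems found; an empty list means the values are valid."""
    if rule is None:
        return []  # undefined fields accept any number of values
    problems = []
    if len(values) < rule.min_count:
        problems.append(f"{name}: expected at least {rule.min_count} value(s)")
    if rule.max_count is not None and len(values) > rule.max_count:
        problems.append(f"{name}: expected at most {rule.max_count} value(s)")
    for value in values:
        if rule.data_type == "number":
            try:
                number = float(value)
            except ValueError:
                problems.append(f"{name}: {value!r} is not a number")
                continue
            if rule.minimum is not None and number < rule.minimum:
                problems.append(f"{name}: {value} is below {rule.minimum}")
            if rule.maximum is not None and number > rule.maximum:
                problems.append(f"{name}: {value} is above {rule.maximum}")
        elif rule.pattern is not None and not re.fullmatch(rule.pattern, value):
            problems.append(f"{name}: {value!r} does not match {rule.pattern!r}")
    return problems
```

For instance, `FieldRule(data_type="number", min_count=1, max_count=1, minimum=1800, maximum=2100)` would describe a mandatory, single-valued year field.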
+Note that fields that are not defined at least by a label may be hard for users
+browsing the discovery interface to understand.
+
+### Site generation
+
+Pocket Archive can generate HTML pages and all the related assets to
+run a complete static website. The advantages of a static website over a
+dynamic one are that it's much simpler and more economical to set up and run,
+and that, with no server-side application code to exploit, it is far less
+exposed to malicious attacks.
+
+The entire site must be generated every time resources are created or updated.
+This is usually very fast, but on large archives it can take a while. This is
+the downside of static websites: they are static.
+
+## Status
+
+**ALPHA**. Pocket Archive is a very recent project, under rapid development. Its
+foundational library, Volksdata, has been developed as a spare-time project for
+6 years and has just entered beta status.
+
+### Road map
+
+Simple road map for a rough prototype:
+
+- ⚒ Configuration + config parser
+  - ✓ Application
+  - ⚒ Content model
+    - ⎊ Validation rules
+    - ⎊ Relationship inference rules
+  - Local overrides
+- ⚒ Submission module
+  - ✓ SIP building
+  - ✓ Metadata from LL
+  - ✓ Structure inference
+  - Relationship inference
+- ⚒ HTML generator
+  - ✓ Index
+  - ✓ Resource
+  - ✓ Static assets
+  - ⚒ Transformers
+- ⎊ Non-HTML generators
+  - LL
+  - RDF (turtle)
+- ⎊ Front end
+  - JS search engine