Compact, fast, low-maintenance digital repository.

Pocket Archive

The idea

Stick it in your pocket and carry it around. Install it on a cloud server. Install it on a Raspberry Pi. Browse it offline. Browse it online. Duplicate it, share it, harvest it and aggregate it. Feed it non-GMO spreadsheets regularly and it will thrive.

A more sensical description

Pocket Archive is a digital archival system and static site generator for small- to medium-(?) sized archives. It is designed to function in environments with unreliable connectivity and requires very low technical and human resources to set up, run, and use.

Pocket Archive fulfills the following functions:

Storage and management of files and metadata-only resources
Management of descriptive, administrative, and technical metadata
Dynamic relationships between resources
Static site generation (discovery interface)

In spite of its design simplicity, Pocket Archive strives to be highly flexible. It is based on Volksdata, a very compact Linked Data store written in C. There is no restriction on the types and schema of metadata allowed, or on the file types supported. File-based configuration makes it possible to set up content types and validation rules, or to have (almost) no rules at all.

Why

Several years ago, the author of this project believed that he should work in larger and larger institutions, with larger and larger data sets. One day, he came across a project that changed his perspective.

"From the standpoint of preserving human cultural heritage at large, does it make more sense to design very large repositories for very rich institutions, with a lot of layers of safety but also a lot of bureaucracy and redundancy, or rather to contribute to many decentralized projects that are highly efficient, small, representative of peripheral cultures and, most importantly, at much higher risk of loss than large institutions?"

The answer was: both. This software has been conceived with the experience of large-scale repositories as the background to decide what works and what doesn't, what is necessary and what is superfluous, and what catalogers and archivists need to do their job.

It is not inconceivable that if many Pocket Archives were to sprout all over the place one day, they could be periodically harvested, linked together, and presented in one large, central archive (it's Linked Data, after all), without any detriment to the independence of the individual archives.

Basic concepts

Until a proper reference is written, this section should serve as high-level documentation to help evaluate the functionality and to help me stay on track. Many of these ideas have been ripped right off my day job, so there is a good chance they work.

General philosophy

The functional goals of Pocket Archive are simplicity and flexibility, from both a user's and a maintainer's perspective. These two properties are usually seen as conflicting, but within reason, they can coexist.

Pocket Archive is built upon a minimalistic framework: C and Lua, with very few dependencies. Similarly to these foundational elements, Pocket Archive strives to offer few tools that can be combined in a multitude of ways to achieve many goals, rather than many tools each doing a specific thing.

Resource

The Linked Data adage goes, "everything is a Resource". Without confusing users too much by taking the concept to the Linked Data extremes, the term resource is used in this project to describe individual, self-contained units of information such as:

Digital files;
Intellectual or physical artifacts (artworks, documents, books, etc.);
Structural elements inside or around an entity, such as the order of pages in a book, the two sides of a postcard, a collection of other resources, etc.

Files are called opaque resources. They are viewed by Pocket Archive as "opaque" in that the system doesn't care about their contents. It only ensures that files are stored as they were submitted, and keeps checksums to guard against data corruption.

All other entities are called descriptive resources. These are effectively Linked Data, which can be queried and searched for. Each file also has its own descriptive resource, so that it can be classified, discovered, and described.
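The fixity guarantee for opaque resources can be pictured with a short shell sketch. This is not Pocket Archive's own code: this README does not say which checksum algorithm or storage layout is used, so SHA-256 (via the coreutils sha256sum tool) and the file names below are assumptions made purely for illustration.

```shell
#!/bin/sh
# Illustrative fixity check; NOT Pocket Archive's implementation.
# Assumptions: SHA-256 via coreutils `sha256sum`; file names are made up.
set -eu

workdir=$(mktemp -d)
printf 'scanned postcard, front side\n' > "$workdir/postcard_front.txt"

# At submission time: record the checksum of the file as it was submitted.
stored_sum=$(sha256sum "$workdir/postcard_front.txt" | cut -d ' ' -f 1)

# Later: recompute the checksum and compare it with the recorded one.
# Succeeds (exit status 0) only if the file is byte-for-byte unchanged.
verify() {
    current_sum=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ "$current_sum" = "$2" ]
}

if verify "$workdir/postcard_front.txt" "$stored_sum"; then
    echo "intact"
fi

# Simulate corruption: appending a single byte changes the checksum.
printf 'x' >> "$workdir/postcard_front.txt"
if ! verify "$workdir/postcard_front.txt" "$stored_sum"; then
    echo "corrupted"
fi
```

The point of the sketch is only that the system never needs to understand a file's contents to detect that it has changed.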

Submission

A Pocket Archive repository is populated via submissions. A submission is performed by telling the archive to pick up some files from a folder it can access, push them into storage, add metadata to them, and index them so that they can be found later.

A submission is directed by a laundry list, which is a spreadsheet listing all the resources (both opaque and descriptive) to be created, and the metadata assigned to them. The laundry list, formatted as a CSV (comma-separated values) file, can be edited with widely available free spreadsheet applications, such as LibreOffice (open source) or Google Sheets. For repetitive, high-volume submissions, templates can be set up to make filling in metadata fields easier. An example submission, which includes a laundry list, is available.

For most users, using a spreadsheet is much faster and more intuitive than clicking around an alien user interface filled with icons and terms that one has never seen before.

Detailed instructions on how to write a laundry list are under the submission documentation.
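As a rough picture of what a submission folder might look like, here is a shell sketch that builds one and checks it. Only the file name pkar_submission.csv and the rule that file paths are relative to the folder come from this README (see the deposit command in the CLI section). The column names, content types, and files are hypothetical placeholders invented for this illustration; the real ones are defined in the submission documentation.

```shell
#!/bin/sh
# Sketch of a submission folder. The column names (path, type, title) and
# all values are HYPOTHETICAL placeholders, not Pocket Archive's real schema.
set -eu

submission=$(mktemp -d)
mkdir -p "$submission/scans"
printf 'front\n' > "$submission/scans/postcard_front.txt"
printf 'back\n'  > "$submission/scans/postcard_back.txt"

# The laundry list: one row per resource. The first data row stands for a
# metadata-only resource, so its (hypothetical) path column is left empty.
cat > "$submission/pkar_submission.csv" <<'EOF'
path,type,title
,postcard,Greetings from the seaside
scans/postcard_front.txt,file,Front side
scans/postcard_back.txt,file,Back side
EOF

# Sanity check before depositing: every non-empty path must exist,
# relative to the submission folder. The comma split is naive and would
# not cope with quoted fields; it is enough for this sketch.
missing=0
{
    read -r _header   # skip the header row
    while IFS=, read -r path _rest; do
        [ -z "$path" ] && continue
        [ -e "$submission/$path" ] || { echo "missing: $path"; missing=$((missing + 1)); }
    done
} < "$submission/pkar_submission.csv"
echo "missing files: $missing"
```

A check like this is cheap to run before a large deposit, since a typo in a spreadsheet cell is the most likely way for a submission to go wrong.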

Metadata & content model

Metadata are (yes, it's a plural noun) controlled by a content model, which in this project means the full set of definitions of the content types recognized by the system and how they relate to one another. Each individual Pocket Archive installation can use the baseline model provided by default, or extend it via additional configuration.

See the content model configuration manual for details on how to set up a custom content model.

Site generation

Pocket Archive can generate HTML pages and all the related assets needed to run a complete static website. Compared with a dynamic website, a static one is much simpler and cheaper to set up and run, and since it runs no server-side application code, it offers a far smaller attack surface.

The entire site must be regenerated every time resources are created or updated. This is usually very fast, but on large archives it can take a while. This is the downside of static websites: they are static.

Functionality

CLI

Pocket Archive can be managed via a command line interface (CLI) when installed locally (e.g. via LuaRocks).

The pkar script contains several useful commands, e.g.

pkar init [--wipe]

Initialize the Pocket Archive store and database at the location indicated by the $PKAR_DRES and $PKAR_ORES environment variables. The --wipe option deletes all preexisting data found in those directories.

pkar deposit <path>

Deposit resources in the <path> folder. This folder must contain a laundry list named pkar_submission.csv with file paths relative to that folder.

pkar gen_site

Generate the static site at out/html. This includes all HTML pages, derivative media, thumbnails, ancillary assets, and RDF representations of all resources. A static HTTP server can be pointed at the html folder for local testing, or the folder can be copied to a remote HTTP server. For local testing, I have been using darkhttpd, which does the job in only 55 KB. TODO: serverless deployment option.

pkar gen_rdf <id> [-f, --format <format>]

Generate the RDF representation of one resource. Useful for debugging and inspecting.

More detailed information can be obtained with pkar --help.
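Put together, the commands above suggest a simple round trip: initialize, deposit, generate. The sketch below uses only the commands listed in this section. It assumes that pkar is on the PATH and that ./my_submission is a hypothetical folder containing pkar_submission.csv and the files it references; when pkar is not installed it just prints a hint.

```shell
#!/bin/sh
# Sketch of a full round trip using only the commands documented above.
# Assumptions: `pkar` is on the PATH; ./my_submission is a hypothetical
# folder holding pkar_submission.csv and the files it lists.
set -eu

run_round_trip() {
    pkar init            # create the store and database
    pkar deposit "$1"    # ingest the submission folder given as argument
    pkar gen_site        # write the static site to out/html
}

if command -v pkar >/dev/null 2>&1; then
    run_round_trip ./my_submission
else
    echo "pkar not found; install it first (e.g. via LuaRocks)"
fi
```

After a successful run, the generated folder can be served for local testing, for example with darkhttpd out/html.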

Environment variables

The following environment variables should be set before using Pocket Archive:

PKAR_ROOT: Root of Pocket Archive data. It defaults to . (the current directory).
PKAR_ORES: Directory of opaque resources (content files). It defaults to ${PKAR_ROOT}/data/ores.
PKAR_DRES: Directory of descriptive resources (metadata). It defaults to ${PKAR_ROOT}/data/dres.
PKAR_CONFIG_DIR: Configuration directory. This should be a directory containing the model directory with the content model configuration, and app.lua with the general application configuration. It defaults to ./config.
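For a wrapper script or a shell profile, the defaults listed above can be spelled out explicitly. The following is a minimal sketch, not something shipped with Pocket Archive: the function name resolve_pkar_env is made up, and the sketch assumes that the default for PKAR_ROOT is the current directory.

```shell
#!/bin/sh
# Sketch: resolve the four variables to the defaults listed above.
# Assumptions: PKAR_ROOT defaults to "." (the current directory);
# `resolve_pkar_env` is a hypothetical helper name.
set -eu

resolve_pkar_env() {
    # ${VAR:=default} assigns the default only if VAR is unset or empty,
    # so values already set by the user are left untouched.
    : "${PKAR_ROOT:=.}"
    : "${PKAR_ORES:=${PKAR_ROOT}/data/ores}"
    : "${PKAR_DRES:=${PKAR_ROOT}/data/dres}"
    : "${PKAR_CONFIG_DIR:=./config}"
    export PKAR_ROOT PKAR_ORES PKAR_DRES PKAR_CONFIG_DIR
}

resolve_pkar_env

# Show the effective values, one NAME=value pair per line.
printf '%s=%s\n' \
    PKAR_ROOT "$PKAR_ROOT" \
    PKAR_ORES "$PKAR_ORES" \
    PKAR_DRES "$PKAR_DRES" \
    PKAR_CONFIG_DIR "$PKAR_CONFIG_DIR"
```

Because PKAR_ORES and PKAR_DRES are derived from PKAR_ROOT, setting only PKAR_ROOT is enough to relocate both data directories, while PKAR_CONFIG_DIR stays relative to the working directory unless it is set explicitly.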

Status

ALPHA. Pocket Archive is a very recent project under rapid development. Its foundational library, Volksdata, has been developed as a spare-time project for 6 years and has just entered beta status.

Road map

The first goal is to build a working prototype, with all the basic functional components, even if not entirely developed or only usable in a specific development environment, to demonstrate the overall workflows and functionality.

The second step is to produce a minimum viable product, which is fully functional and available for use by the intended audience.

Prototype

✓ Configuration + config parser
- ✓ Application
- ✓ Content model
- ✓ Validation rules
✓ Submission module
- ✓ SIP building
- ✓ Metadata from LL
- ✓ Brick structures
- ✓ Structure inference
✓ HTML generator
- ✓ Index
- ✓ Resource
- ✓ Static assets
✓ Non-HTML generators
- ✓ RDF (turtle)
- ✓ Transformers
- ✓ JS search engine index
✓ CLI
- ✓ Init archive
- ✓ Deposit
- ✓ Generate site
- ✓ Generate LL (single resource)
- ✓ Generate RDF (single resource)
⚒ Front end
- ✓ JS search engine
- ✓ Add collections to index page
- ⚒ Basic styling
  - ✓ Default type icons
⎊ QA
- ⎊ ~50 resource data set

MVP

Multilingual support
Management UI & API
- Deposit via S3 source pointer
- Deposit via single tar or zip file submission
- Dump & restore (whole archive & individual resources)
Content model
- Local overrides
- Relationship inference
htmlgen option for local file or webserver URL generation
Improve search indexing
Category browsing
CLI
- Generate LL (multi)
- Generate RDF (multi)
Front end
- Enhanced styling and access
Testing

Post-MVP

Incremental build
