@@ -24,308 +24,14 @@ Pocket Archive fulfills the following functions:
In spite of its design simplicity, Pocket Archive strives to be highly
flexible. It is based on [Volksdata
](https://git.knowledgetx.com/scossu/volksdata), a very compact Linked Data
-store written in C. There is no restriction to the types and schema of metadata
-allowed, or the file types supported. A file-based configuration allows to set
-up content types and validation rules, or to have (almost) no rules at all.
+store and manipulation library. There is no restriction to the types and schema
+of metadata allowed, or the file types supported. A file-based configuration
+allows to set up content types and validation rules, or to have (almost) no
+rules at all.

-## Why
+## Documentation

-Several years ago, the author of this project believed that he should work in
-larger and larger institutions, with larger and larger data sets. One day, he
-came across a [project](https://zenodo.org/records/8111569) that changed his
-perspective.
+The full user documentation sources are in the [user
+guide](./doc/user_guide/docs) folder and published at https://pkar-doc.bmll.cc

-"From a standpoint of preserving human cultural heritage at large, does it make
-more sense to design very large repositories for very rich institutions, with a
-lot of layers of safety but also a lot of bureaucracy and redundancy, or rather
-contribute to many decentralized projects that are highly efficient, small,
-representing periferal cultures, and most importantly, that are at much higher
-risk of loss than large institutions'"?
-
-The answer was: both. This software has been conceived with the experience of
-large-scale repositories as the background to decide what works and what
-doesn't, what is necessary and what is superfluous, and what catalogers and
-archivists need to do their job.
-
-It is not inconceivable that if many Pocket Archives were to sprout all over
-the place one day, they could be periodically harvested, linked together, and
-presented in one large, central archive (it's Linked Data, after all), without
-any detriment to the indepencence of the individual archives.
-
-## Quickstart
-
-This has been tested on Linux only. It's not guaranteed to work on other
-systems at the moment.
-
-### System prerequisites
-
-- A build environment (at least Git, libc, a C compiler, and Make)
-- UUID library (`uuid/uuid.h` - util-linux or linux-headers in most distros)
-- xxhash development package
-- lmdb development package
-- libvips development package
-- Lua 5.4 development package (lua-dev in some distros)
-- Luarocks 5.4
-
-If using Arch Linux:
-
-```
-pacman -Syu
-pacman -S base-devel util-linux-libs git xxhash lmdb libvips lua luarocks
-```
-
-### Install Volksdata & Pocket Archive
-
-Pocket Archive and Volksdata are still alpha and not in the Luarocks artifact
-repo yet, so the rocks must be installed manually for the time being.
-
-Installing locally or in a dedicated container is strongly recommended at this
-stage.
-
-When Pocket Archive will get into beta status and be published on Luarocks,
-the commands below will be replaced by a one-line command. Until then...
-
-```
-# Note: tested on Archlinux. Other distros (especially Alpine) may need tweaks.
-eval $(luarocks path)
-luarocks install --local debugger # Not in dependencies file but temporarily required
-git clone --recurse-submodules https://git.knowledgetx.com/scossu/volksdata_lua.git
-cd volksdata_lua
-luarocks build --local
-lua test.lua # optional
-cd ../
-git clone --recurse-submodules https://git.knowledgetx.com/scossu/pocket_archive.git
-cd pocket_archive
-luarocks build --local
-# Add Luarocks paths to your login script
-luarocks path >> ~/.bashrc
-# Penlight.clonetree is not working properly. A pull request is in progress:
-# https://github.com/lunarmodules/Penlight/pull/496
-# In the meantime, clone the fork and install from the local repo:
-cd ../
-git clone https://github.com/scossu/Penlight.git
-cd Penlight
-git checkout clonetree
-luarocks build --local
-cd ../pocket_archive
-```
-
-### Run demo submission
-
-Initialize the archive first:
-
-```
-pkar init
-```
-
-This, after user confirmation, will create the required folders and database
-file in the archive root (a temporary folder by default). Then:
-
-```
-pkar submission test/sample_submission/pkar_submission.csv
-```
-
-### Generate static site from archive
-
-```
-pkar gen-site
-```
-
-Will generate the static site in `out/http/`. Note that this is static HTML but
-it needs a web server to resolve links (completely server-less version is in
-the works).
-
-If you don't have a configured web server yet, the provided `darkhttpd` will
-work in a pinch:
-
-```
-cd ext/darkhttpd/ # Archlinux: sudo pacman -S darkhttpd
-make
-cd -
-```
-
-Serve the site:
-
-```
-./ext/darkhttpd/darkhttpd out/html
-```
-
-(see more options with `darkhttpd --help`)
-
-Point your browser to `localhost:8080` and enjoy.
-
-## Basic concepts
-
-Until some proper reference is written, this should serve as a high-level
-documentation to help evaluate the functionality and to help me to stay on
-track. Many of these ideas have been ripped right off my day job, so there is
-a good chance they work.
-
-### General philosophy
-
-The functional goals of Pocket Archive are simplicity and flexibility, from
-both a user's and a maintainer's perspectives. These two properties are usually
-seen as conflicting, but within reason, they can coexist.
-
-Pocket Archive is built upon a minimalistic framework: C and Lua, with very few
-dependencies. Similarly to these foundational elements, Pocket Archive strives
-to offer few tools that can be combined in a multitude of ways to achieve many
-goals, rather than many tools each doing a specific thing.
-
-### Resource
-
-The Linked Data adage goes, "everything is a Resource". Without confusing users
-too much by taking the concept to the Linked Data extremes, the term *resource*
-is used in this project to describe individual, self-contained units of
-information such as:
-
-- Digital files;
-- Intellectual or physical artifacts (artworks, documents, books, etc.);
-- Structural elements inside or around an entity, such as the order of pages in
- a book, the two sides of a postcard, a collection of oher resources, etc.
-
-Files are called *opaque resources*. They are viewed by Pocket Archive as
-"opaque" in that the system doesn't care about their contents. It only ensures
-that files are stored as they were submitted, and keeps checksums to guard
-against data corruption.
-
-All other entities are called *descriptive resources*. These are effectively
-Linked Data, which can be queried and searched for. Each file also has its own
-descriptive resource, so that it can be classified, discovered, and described.
-
-### Submission
-
-A Pocket Archive repository is populated via *submissions*. A submission is
-performed by telling the archive to pick up some files from a folder it can
-access, push them into storage, add metadata to them, and index them so that
-they can be found later.
-
-A submission is directed by a *laundry list*, which is a spreadsheet listing
-all the resources (both opaque and descriptive) to be created, and the metadata
-assigned to them. The laundry list, formatted as a CSV (comma-separated values)
-file, can be edited by several free and open source applications, such as
-LibreOffice or Google Sheets. For repetitive, high-volume submissions,
-templates can be set to facilitate filling in metadata fields. An [example
-submission ](test/sample_submission/postcard-bag/data/), which includes a
-laundry list, is available.
-
-Using spreadsheets is for most users much faster and intuitive than clicking
-around an alien user interface filled with icons and terms that one has never
-seen before.
-
-Detailed instructions on how to write a laundry list are under the
-[submission documentation](doc/submission.md).
-
-### Metadata & content model
-
-Metadata are (yes, it's a *plural* noun) controlled by a *content model*, which
-in this project is intended as the entirety of definitions of content types
-recognized by the system, and how they relate to one another. Each individual
-Pocket Archive installation can use the baseline one provided by default, or
-extend it via additional configuration.
-
-See the [content model configuration manual](doc/content_model.md) for details
-on how to set up a custom content model.
-
-### Site generation
-
-Pocket Archive can generate HTML pages and all the related assets to
-run a complete static website. The advantages of a static website over a
-dynamic one are that it's much simpler and economical to set up and run, and
-it's impervious to malicious attacks.
-
-The entire site must be generated every time resources are created or updated.
-This is usually very fast, but on large archives it can take a while. This is
-the downside of static website: they are static.
-
-## Functionality
-
-### CLI
-
-Pocket Archive can be managed via a command line interface (CLI) when
-installed locally (e.g. via Luarocks).
-
-The `pkar` script contains several useful commands, e.g.
-
-```
-pkar init
-```
-
-Initialize the Pocket Archive store and database at the location indicated by
-the `$PKAR_DRES` and `$PKAR_ORES` environment variables. This operation deletes
-all preexisting data found in those directories.
-
-```
-pkar deposit <path>
-```
-
-Deposit resources in the `<path>` folder. This folder must contain a
-laundry list named `pkar_submission.csv` with file paths relative to that
-folder.
-
-```
-pkar gen-site
-```
-
-Generate static site at `$PKAR_ROOT/out/html`. This includes all HTML pages,
-derivative media, thumbnails, ancillary assets, and RDF representations of all
-resources. The `html` folder can be pointed to by a static HTTP server for
-local testing, or copied to a remote HTTP server. For local testing, I have
-been using `darkhttpd` which does the job in only 55Kb. *TODO serverless
-deployment option*
-
-```
-pkar gen-rdf <id> [-f, --format <format>]
-```
-
-Generate the RDF representation of one resource. Useful for debugging and
-inspecting.
-
-More detailed information can be obtain with `pkar --help`.
-
-### Remote submissions
-
-Obviously, the CLI only works if one has command line access (e.g. via SSH) to
-the machine hosting Pocket Archive. It is often the case that a contributor has
-neither shell access nor expertise to run the Pocket Archive CLI. Remote
-submissions provide a more user-friendly way to submit contents to any Pocket
-Archive instance on the WWW.
-
-Read the [remote submission guide](doc/remote_submission.md) for more details.
-
-### Environment variables
-
-The following environment variables are available to modify the application
-behavior:
-
-- `PKAR_ROOT`: Root of Pocket Archive data. It defaults to `.`.
-- `PKAR_ORES`: Directory of opaque resources (content files). It defaults to
- `${PKAR_ROOT}/data/ores`.
-- `PKAR_DRES`: Directory of descriptive resources (metadata). It defaults to
- `${PKAR_ROOT}/data/dres`.
-- `PKAR_CONFIG_DIR`: configuration directory. This should be a directory
- containing the `model` directory with the content mode configuration and
- `app.lua` with general application configuration. It defaults to the `config`
- directory installed by Luarocks.
-
-## Other documentation
-
-See the [doc](doc) folder for complete documentation (currently under
-construction) in different sections aimed at archivists, system administrators,
-and/or developers.
-
-## Compatibility
-
-Linux only. The submission watchdog relies on `inotify` which is not portable.
-Adopters not using the watchdog or willing to re-implement it may have success
-with other POSIX environments, but these have not been tested.
-
-## Status
-
-**ALPHA**. Pocket Archive is a very recent project, in fast development. Its
-foundational library, Volksdata, has been developed as a spare-time project for
-6 years and it just entered in beta status.
-
-### Road map
-
-See [Road map doc](doc/roadmap.md).
+API documentation is in progress.