Mode doc updates.

scossu 1 week ago
parent
commit
c600e0d636
3 changed files with 195 additions and 10 deletions
  1. 17 0
      doc/bricks_coll.graphml
  2. 9 7
      doc/content_model_primer.md
  3. 169 3
      doc/glossary.md

+ 17 - 0
doc/bricks_coll.graphml

@@ -134,6 +134,23 @@
         </y:GenericNode>
       </data>
     </node>
+    <node id="n7">
+      <data key="d6">
+        <y:GenericNode configuration="com.yworks.flowchart.predefinedProcess">
+          <y:Geometry height="49.0" width="90.0" x="245.0" y="61.161720275879816"/>
+          <y:Fill color="#CAEC64" color2="#99CC00" transparent="false"/>
+          <y:BorderStyle color="#3F5400" type="line" width="1.0"/>
+          <y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="36.688228607177734" horizontalTextPosition="center" iconTextGap="4" modelName="custom" textColor="#000000" verticalTextPosition="bottom" visible="true" width="69.52046203613281" x="10.239768981933594" y="6.155885696411133">collection_x
+[Collection]<y:LabelModel>
+              <y:SmartNodeLabelModel distance="4.0"/>
+            </y:LabelModel>
+            <y:ModelParameter>
+              <y:SmartNodeLabelModelParameter labelRatioX="0.0" labelRatioY="0.0" nodeRatioX="0.0" nodeRatioY="0.0" offsetX="0.0" offsetY="0.0" upX="0.0" upY="-1.0"/>
+            </y:ModelParameter>
+          </y:NodeLabel>
+        </y:GenericNode>
+      </data>
+    </node>
     <edge id="e0" source="n2" target="n1">
       <data key="d10">
         <y:PolyLineEdge>

+ 9 - 7
doc/content_model_primer.md

@@ -4,7 +4,7 @@
 
 **WORK IN PROGRESS**
 
-Terms in **bold** are referenced in the [glossary](./glossary.md).
+Terms showing in **bold** are referenced in the [glossary](./glossary.md).
 
 This document is a general-purpose introduction to content modeling concepts in
 Pocket Archive. For detailed technical information on how to set up a content
@@ -50,7 +50,7 @@ that the resource being viewed has a "Still Image" primary type (the most
 specific type), which is a specialization of "Artifact", which is in its turn a
 specialization of "Anything". Searching for all Still Images will find this
 resource. Also searching for Artifacts, or for Anything, will find this
-resource. [WIP note: the links in the classification are not yet working.
+resource. [**WIP note:** the links in the classification are not yet working.
 Eventually they will resolve to listings of all resources in a given content
 type.]
 
@@ -216,9 +216,9 @@ with usage over time.
 ### Setting properties
 
 Only properties that are defined in the resource **schema** can be added to a
-resource. Tools [WIP] shall be made available to write out the complete schema
-of a given instance of Pocket Archive to a file, that can be used as a
-reference.
+resource. Tools [**WIP note:** not yet implemented] shall be made available to
+write out the complete schema of a given instance of Pocket Archive to a file,
+that can be used as a reference.
 
 The only system-mandated properties for all resources are `content_type` and,
 for files, `source_path`. `content_type` determines the schema to be
@@ -295,9 +295,9 @@ scenarios: a minimum cardinality of 1, for example, means that at least one
 value must be provided, which means, that property is *mandatory*. A maximum
 cardinality of 1 means that the property is *single-valued*; etc.
 
-### Range
+#### Range
 
-[NOTE: not yet implemented]
+[**WIP note:** not yet implemented]
 
 The range of a property depends on its data type: for a number or a date, it
 can be a minimum and/or maximum value range; for a string, a specific pattern
@@ -312,3 +312,5 @@ If validation passes, the submission process continues as expected; if it
 fails, the whole submission fails and the process stops. In both cases, a
 report is generated, so that in case of failure, the depositor can inspect the
 validation results and adjust the metadata before re-submitting the SIP.
+
+[**WIP note:** report generation and delivery are not yet implemented.]
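
The cardinality rules described in the primer hunk above (a minimum of 1 makes a property mandatory, a maximum of 1 makes it single-valued) can be illustrated with a minimal sketch. The function name, its parameters, and the error-message format are hypothetical and are not taken from the Pocket Archive code base.

```python
from typing import List, Optional, Sequence


def check_cardinality(
    name: str,
    values: Sequence[object],
    min_card: int = 0,
    max_card: Optional[int] = None,
) -> List[str]:
    """Return a list of validation errors for one property (empty if valid)."""
    errors = []
    count = len(values)
    # A minimum cardinality of 1 makes the property mandatory.
    if count < min_card:
        errors.append(f"{name}: expected at least {min_card} value(s), got {count}")
    # A maximum cardinality of 1 makes the property single-valued.
    if max_card is not None and count > max_card:
        errors.append(f"{name}: expected at most {max_card} value(s), got {count}")
    return errors


# Hypothetical property names, used only for illustration.
missing = check_cardinality("title", [], min_card=1)
too_many = check_cardinality("title", ["A", "B"], min_card=1, max_card=1)
valid = check_cardinality("creator", ["A", "B"])
```

Collecting errors in a list rather than raising on the first one fits the report-based workflow the primer describes, where the depositor inspects all validation results before re-submitting.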

+ 169 - 3
doc/glossary.md

@@ -20,6 +20,13 @@ of the US Federal Agency Digital Guidelines Inititative (FADGI).
 
 ## \*Artifact
 
+A human-made object with a cultural value. It can be a physical object, such as
+a book, a sculpture, or a document, or also digital data (e.g. a born-digital
+photograph or video clip, a software application, etc.). It roughly corresponds
+to the Intellectual Entity concept in the
+[PREMIS](https://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf) data
+dictionary for preservation metadata.
+
 ## Atomic, atomicity
 An atomic operation is an operation on data that either succeeds or fails
 completely. Complex data structures can be handled via atomic operations, if
@@ -28,6 +35,20 @@ parts of the data structure are intact.
 
 ## Checksum
 
+A sequence of bytes, usually visualized as an alphanumeric sequence (e.g.,
+`blake2:e974d0e881f151ee293519e[…]`), that represents the "fingerprint" of a
+digital file. Many algorithms are available for generating a checksum for a
+file, but for each algorithm, a file has only one checksum. If even one bit
+changes in the file, the checksum changes completely. It is a fundamental tool
+for digital preservation, as it can easily indicate if a file has changed on
+the storage medium (due to storage corruption) or in transit (due to network
+glitches), or if it may have been forged.
+
+Pocket Archive calculates and stores checksums using the
+[BLAKE2b](https://www.blake2.net/) algorithm, which is less popular but very
+fast and secure. In future releases, it may support multiple
+algorithms.
+
 ## \*Brick
 
 ## \*Collection
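
The Checksum entry added in the hunk above can be illustrated with a short sketch that uses BLAKE2b from Python's standard `hashlib` module. The `blake2:` prefix mirrors the example in the glossary entry. The helper names, and the assumption that the full 64-byte digest is stored, are illustrative only and do not describe how Pocket Archive records checksums.

```python
import hashlib


def blake2b_checksum(data: bytes) -> str:
    """Return the BLAKE2b digest of data as a prefixed hexadecimal string."""
    # Assumption: the default 64-byte digest, rendered as 128 hex characters.
    return "blake2:" + hashlib.blake2b(data).hexdigest()


def verify_fixity(data: bytes, expected: str) -> bool:
    """Check that data still matches a previously recorded checksum."""
    return blake2b_checksum(data) == expected


original = b"Pocket Archive"
# Flip the lowest bit of the first byte to simulate storage corruption.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

recorded = blake2b_checksum(original)
print(verify_fixity(original, recorded))   # True
print(verify_fixity(corrupted, recorded))  # False
```

A single flipped bit is enough to make the comparison fail, which is the behavior the Fixity entry relies on.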
@@ -46,11 +67,23 @@ indicate implicit memberships (e.g. a file inside a folder), which are added
 automatically by the submission process, and explicit ones, which are defined
 via the `has_member` **property**.
 
+## \*Codename
+
+The name used to reference a **field** in a **laundry list** by content
+managers. It is by convention made of lowercase letters, numbers, and
+underscores, e.g. `path_name`, `submission_date`, `creator`…
+
 ## \*Content model
 
+In Pocket Archive, a content model is the complete set of definitions of
+all the **content types** in a Pocket Archive instance, the **properties** that
+define them, and how they interact with one another via **relationships**.
+Otherwise known as an *ontology*.
+
 ## \*Content type
 
-## \*Content type definition
+Classification of a resource according to a **content model**.  Each resource
+in Pocket Archive is assigned one and only one content type.
 
 ## CSV
 
@@ -68,30 +101,125 @@ application file (usually employed by the "Save" command).
 
 ## \*Descriptive resource
 
+In Pocket Archive, this is a **resource** that the system may parse and
+understand as meaningful data, i.e., as **RDF** data. The **Artifact** and
+**Brick** content types are descriptive resources. Files, which are **opaque
+resources**, are paired with an implicit descriptive resource that presents the
+file's metadata so that the file can be described and found in searches.
+
 ## \*Drop box
 
+A folder, on a local or remote filesystem, that is being watched by a running
+Pocket Archive (`pkar_watch`) instance. Any **laundry list** that is put into
+this folder will trigger a **submission** process.
+
 ## \*Field
 
+The description of a **resource** **property** as a column in a
+**laundry list** CSV. A field has a name (in the laundry list, the
+**codename**) and one or multiple values.
+
 ## Fixity
 
+The assurance that a digital file is intact and bit-by-bit identical to how
+it was submitted. Fixity is checked by verifying a **checksum**.
+
 ## \*Laundry list
 
+A CSV file with tabular data listing all the resources included in a
+**submission** and their metadata. The laundry list is produced by the
+depositor of a **SIP** and triggers an automatic submission process.
+
 ## Linked Data, Linked Open Data
 
+Data (in the case of Linked Open Data, published and freely accessible on the
+Web) in the **RDF** format. Linked (Open) Data is a popular publishing format
+among cultural heritage, humanitarian, and scientific institutions, and other
+organizations that value interoperability and the free exchange of data sets.
+Linked Data facilitates the aggregation and reconciliation of heterogeneous
+data sets produced by different sources, by relying on controlled vocabularies
+and unambiguous, globally unique identifiers.
+
 ## Markdown
 
+Plain-text [writing format](https://daringfireball.net/projects/markdown/) that
+can be converted to HTML or other formatted text by using conventional marks
+and embedded HTML. Markdown is very popular among technical documentation
+writers because it doesn't need a specialized application to write. This
+glossary and the other Pocket Archive documentation are written in Markdown.
+
+Pocket Archive supports writing Markdown documents for its "long description"
+**property** that can be used to create content-rich introduction pages for
+**Collections**.
+
 ## Metadata
 
+Literally, data about data. Metadata are administrative and technical
+information about a physical or digital object that do not constitute the
+object itself, but are helpful to classify, inventory, find, and relate it.
+
+## Namespace
+
+The prefix of a group of **UIDs** or **URIs** that is constant for a whole
+organization or business unit. It is a convention used to separate identifiers
+into broad categories for administrative purposes. Namespaces are used
+extensively in **RDF** and in Pocket Archive; however, they are a more
+technical aspect of archiving that is not easily visible to occasional users.
+
+Namespaces in RDF can be shortened within a contained system, as they can be
+lengthy, and the mapping between the short prefix and the full-length namespace
+is maintained by that system. URIs published on the Web must be either in their
+fully-qualified form, or accompanied by the namespace mapping in the same
+document.
+
+E.g.: the URI `http://purl.org/dc/terms/contributor` can be represented
+internally in Pocket Archive as `dc:contributor`, as long as the relation
+between `dc:` and `http://purl.org/dc/terms/` is registered.
+
+Pocket Archive supports user-defined namespaces and mappings, which can be
+configured by the archive administrator.
+
 ## Ontology
 
 See **Content model**.
 
 ## \*Opaque resource
 
+In Pocket Archive, a digital file preserved in the archive. It is "opaque" in
+the sense that Pocket Archive is only aware of its presence and **fixity**, but
+it doesn't know about its contents. Each opaque resource is accompanied by a
+**descriptive resource** that contains its **metadata** and points to it.
+
+## \*Presentation
+
+In Pocket Archive, this is the whole package of Web pages, **presentation
+files**, and ancillary digital assets that make up a **static site**
+generated by Pocket Archive.
+
+Presentation data are disposable and can be regenerated on demand. Pocket
+Archive does not decide whether or how a presentation should be published, or
+who has access to it. That is a decision left to the archive owners and system
+administrators.
+
 ## \*Presentation file
 
+Also known as *Derivative file* and other names by
+[FADGI](https://www.digitizationguidelines.gov/term.php?term=derivativefile).
+This is a file derived from a **production master** file that is fit for
+**presentation**. It often has a lower quality and lossy compression than its
+source, and it does not need to be preserved, as it can be regenerated without
+manual intervention.
+
+Pocket Archive automatically generates presentation files and thumbnails during
+its static site generation process.
+
 ## \*Property
 
+A **metadata** element that can be attributed to a **resource**. Properties are
+more or less strictly defined in the **content model** by the archive
+administrator and they may have a data type, a cardinality, and a range. See
+the [content model primer](./content_model_primer.md) for more information.
+
 ## \*Production master
 
 A file fit for generating **presentation files**. This is usually a file
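
The prefix mapping described in the Namespace entry of the hunk above can be sketched as a simple lookup table. The `dc` mapping is the one given in that entry. The function names and the table structure are hypothetical and do not reflect Pocket Archive's actual configuration format.

```python
from typing import Mapping

# Prefix table; only the "dc" mapping comes from the glossary entry.
NAMESPACES = {"dc": "http://purl.org/dc/terms/"}


def expand(short_uri: str, namespaces: Mapping[str, str] = NAMESPACES) -> str:
    """Turn a prefixed name such as dc:contributor into a fully qualified URI."""
    prefix, sep, local = short_uri.partition(":")
    if not sep or prefix not in namespaces:
        raise KeyError(f"No namespace registered for {short_uri!r}")
    return namespaces[prefix] + local


def shorten(full_uri: str, namespaces: Mapping[str, str] = NAMESPACES) -> str:
    """Replace a registered namespace with its short prefix, if one matches."""
    # Try longer namespaces first so that the most specific one wins.
    for prefix, ns in sorted(
        namespaces.items(), key=lambda item: len(item[1]), reverse=True
    ):
        if full_uri.startswith(ns):
            return f"{prefix}:{full_uri[len(ns):]}"
    return full_uri  # No registered namespace: keep the fully qualified form.


print(expand("dc:contributor"))  # http://purl.org/dc/terms/contributor
print(shorten("http://purl.org/dc/terms/contributor"))  # dc:contributor
```

Returning unregistered URIs unchanged matches the rule in the entry that a URI must be published either fully qualified or together with its namespace mapping.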
@@ -119,7 +247,14 @@ to a **resource** managed by Pocket Archive itself. Unlike hyperlinks in the
 WWW, which do not always own the resource pointed to and do not guarantee its
 existence, Pocket Archive guarantees the consistency of relationship links.
 
-## \*Resource
+## Resource
+
+In **RDF** parlance, "everything is a resource", which means, every unit of
+information can be represented by a globally unique document on the Web.
+
+In Pocket Archive, the definition of resource is more specific, and it
+refers to any record individually retrievable in the archive. Every resource is
+assigned a **content type**.
 
 ## RDF
 
@@ -179,8 +314,39 @@ Pocket Archive generates static sites for presentation. It also has the option
 user's local computer with a web browser, without any web server or even any
 Internet connection.
 
-## Submission
+## \*Submission
+
+The act of assembling and sending a curated data set to Pocket Archive for
+archival.
+
+A submission is made up of files, often arranged in folder hierarchies (the
+data), and an accompanying inventory, or **laundry list**, that contains the
+**metadata**. A submission has a unique identifier that gets assigned to all
+the **resources** included in it.
 
 ## UID
 
+Unique Identifier. Usually, this identifier is intended to be unique only in
+the system it is working in. By default, Pocket Archive resources are assigned
+16-character random strings, prefixed by a namespace to denote a resource. This
+is sufficient to keep millions of records in the archive without collision
+(i.e. duplicate IDs).
+
+## URI
+
+Uniform Resource Identifier. It is a globally unique identifier that is able
+to pinpoint a specific **resource** on the WWW. A URI may or may not resolve
+to an actual location on the Web. URIs are a key component of **RDF**.
+
+Pocket Archive uses URIs to identify individual resources, metadata properties,
+content types, and other entities. These are usually hidden from the end user
+but viewable in the resources' raw data representation.
+
 ## UUID
+
+Universally Unique Identifier. Similar to a **UID**, but with a reasonable
+guarantee of uniqueness in the global space (WWW). Uniqueness is usually
+guaranteed by a **namespace** prefix that is a Web domain name owned by the
+UUID publisher, and/or by a long string of random characters that makes the
+chance of collision (overlap) small enough to be negligible, or by a
+progressive sequence controlled by the publishing system.
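
The UID entry above states that 16-character random strings are enough to hold millions of records without collision. A rough sketch of how such an identifier could be generated, and of how small the collision probability is, follows. The 62-symbol alphabet, the `ex:` namespace prefix, and the function names are assumptions for illustration; the source does not specify the alphabet Pocket Archive uses.

```python
import math
import secrets
import string

# Assumption: letters and digits, i.e. 62 possible symbols per character.
ALPHABET = string.ascii_letters + string.digits
UID_LENGTH = 16


def make_uid(namespace: str = "ex:") -> str:
    """Return a random 16-character identifier with a namespace prefix."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(UID_LENGTH))
    return namespace + body


def collision_probability(n_records: int) -> float:
    """Approximate the chance of at least one duplicate among n random UIDs."""
    space = len(ALPHABET) ** UID_LENGTH
    # Birthday approximation: p is about 1 - exp(-n(n-1) / (2N)).
    return -math.expm1(-n_records * (n_records - 1) / (2 * space))


uid = make_uid()
# Under these assumptions, ten million records give a probability
# on the order of 1e-15.
p = collision_probability(10_000_000)
```

With a smaller alphabet (for example, hexadecimal digits only) the identifier space shrinks, so the estimate should be recomputed for the actual character set.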