.cfg files
The .cfg format seems to follow an INI-like syntax that is parsed ad hoc by
the Transliterator.
So far only top-level section names have been encountered.
Key-value pairs may express either a transliteration operation, e.g.
U+182CU+1820=qU+0307a
Or a passthrough (verbatim copy) operation, such as
At head of title=At head of title
Or a configuration directive, e.g.
SubfieldsAlwaysExcluded=uvxy0123456789
The last kind may appear in any section; the first two appear only in the transliteration sections.
It is unclear how configuration directives can be distinguished from transliteration rules, except by naming all the possible verbatim copy options. A more readable and efficient format would have discrete subsections for configuration and transliteration; if possible, verbatim copy should be implicit, which would make maintenance easier.
It is unclear at the moment whether spaces around the = sign are ignored.
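As a rough illustration of the three kinds of key-value pairs above, the sketch below splits a line on the first = and expands U+XXXX tokens. It assumes that code points are always written with exactly four hexadecimal digits, so that U+0307a reads as U+0307 followed by the letter a; this has not been confirmed against the original code. Spaces around = are left untouched because their significance is still unknown. The names decode_codepoints and split_rule are made up for this example.

```python
import re

# Assumption: a code point is always "U+" followed by exactly four hex digits.
_CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4})")


def decode_codepoints(text: str) -> str:
    """Replace every U+XXXX token with the character it names."""
    return _CODEPOINT.sub(lambda m: chr(int(m.group(1), 16)), text)


def split_rule(line: str) -> tuple[str, str]:
    """Split a key=value line on the first '=' and decode both sides."""
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"No '=' sign in line: {line!r}")
    # Whitespace is deliberately preserved on both sides of '='.
    return decode_codepoints(key), decode_codepoints(value)
```

A passthrough line such as "At head of title=At head of title" comes back unchanged on both sides, which is one reason why an implicit verbatim copy would be attractive.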
ReRomanizeRecord.bas
Much of the code deals with MARC records. That part need not concern us, since the new Transliterator is meant to convert text strings to text strings.
Load the cfg file (the original reads it line by line; we can read the whole file at once) and parse the table metadata.
Skip lines starting with # (comments).
Lines starting with [ are section definitions. The supported sections are General, ScriptToRoman, and RomanToScript.
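The comment and section handling described above could look roughly like the following. This is a sketch, not a port of the original code: skipping blank lines and rejecting unknown section names are assumptions, and read_sections is a hypothetical name.

```python
SUPPORTED_SECTIONS = ("General", "ScriptToRoman", "RomanToScript")


def read_sections(text: str) -> dict[str, list[str]]:
    """Group the non-comment lines of a .cfg file by section name."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        # Assumption: blank lines carry no meaning and can be skipped.
        if not line.strip() or line.startswith("#"):
            continue
        if line.startswith("["):
            name = line.strip().strip("[]")
            # Assumption: anything outside the three known sections is an error.
            if name not in SUPPORTED_SECTIONS:
                raise ValueError(f"Unsupported section: {name}")
            current = name
            sections.setdefault(current, [])
            continue
        if current is None:
            raise ValueError("Content found before any section header")
        sections[current].append(line)
    return sections
```

The lines of each section are kept raw here, so that the section-specific logic can decide later whether a line is a directive, a rule, or part of a multi-line entry.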
General section
Potentially relevant variables:
Likely irrelevant:
RomanToScript section
If a line has no = sign, it is assumed to be part of a multi-line directive, and the
next line should be loaded and merged with the previous content.
Otherwise, a keyword indicating a configuration directive is looked up.
Currently supported, and potentially relevant, keywords are:
Likely irrelevant keywords:
If no keyword is detected, proceed to transliteration. [TODO transliteration logic details still to be looked at]
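The per-line flow described for this section might be sketched as below. Two things are assumed and should be checked against the original code: merging is done by plain concatenation with no separator, and a line counts as a configuration directive only if its key is in a known keyword set. The set here contains only SubfieldsAlwaysExcluded, the one directive quoted in these notes; the function names are hypothetical.

```python
# Only directive named in these notes; the real list is still to be filled in.
DIRECTIVE_KEYWORDS = {"SubfieldsAlwaysExcluded"}


def merge_continuations(lines: list[str]) -> list[str]:
    """Join each line lacking '=' with the line(s) that follow it."""
    merged, pending = [], ""
    for line in lines:
        pending += line  # assumption: no separator is inserted when merging
        if "=" in pending:
            merged.append(pending)
            pending = ""
    if pending:
        raise ValueError(f"Incomplete entry at end of section: {pending!r}")
    return merged


def classify(entry: str) -> tuple[str, str, str]:
    """Label an entry as a configuration directive or a transliteration rule."""
    key, _, value = entry.partition("=")
    kind = "directive" if key in DIRECTIVE_KEYWORDS else "rule"
    return kind, key, value
```

Entries classified as "rule" would then be handed to the transliteration logic, which is not covered here.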
ScriptToRoman section
The logic is the same as in the RomanToScript section, but the configuration
keywords are different.
Currently supported, potentially relevant:
Likely irrelevant:
[TODO Complete other function analysis]
GET /tables
Returns all available transliterations, and which directions are available for each.
200 OK; body: K/V pairs of script name and a list containing r2s (Roman to script), s2r (Script to Roman), or both.
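To make the shape of the response body concrete, here is a small sketch that builds it from the table metadata. It assumes the body is serialized as JSON and that the metadata maps each script name to the set of section names found in its file; neither is settled, and tables_body is a hypothetical helper.

```python
import json


def tables_body(metadata: dict[str, set[str]]) -> str:
    """Build the GET /tables body: script name -> available directions."""
    body = {}
    for script, sections in metadata.items():
        directions = []
        if "RomanToScript" in sections:
            directions.append("r2s")
        if "ScriptToRoman" in sections:
            directions.append("s2r")
        body[script] = directions
    # ensure_ascii=False keeps any non-Latin script names readable.
    return json.dumps(body, ensure_ascii=False, sort_keys=True)
```

A script whose file defines both sections would be listed with both r2s and s2r, and one with a single section with just that direction.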
POST /trans/<script>/<direction>
<script>: script name as obtained by the /tables endpoint.
<direction>: transliteration direction as obtained by the /tables endpoint.
data: input text (UTF-8) to transliterate.
200 OK if transliteration was successful; response body: transliterated string (UTF-8).
400 Bad Request if a script name is not available in the requested direction; response body: details of failure.
500 Internal Server Error if an internal error occurred; response body: generic error message (no details about the error).
Reload the tables if they have been modified. This is done internally at server start. This should be auth-protected.
POST /reload_config
API token (a hard-coded value in a .env file should probably suffice)
204 No Content if the tables were reloaded successfully; no response body.
500 Internal Server Error on internal error.
Load all translation table metadata on server startup. This is equivalent
to invoking reload_config via the REST API (see above) and is done by
scanning a designated directory containing only the translation tables,
finding the metadata in the General section, discovering the
ScriptToRoman and RomanToScript sections, and storing these metadata in
a variable available to all requests.
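The startup scan could be sketched as follows. Several details are assumptions made for the example: the script name is taken from the file name, the files are read as UTF-8, and only lines containing = are kept from the General section. TABLE_METADATA and load_table_metadata are hypothetical names.

```python
from pathlib import Path

# Module-level store shared by all requests (hypothetical name).
TABLE_METADATA: dict[str, dict] = {}

_DIRECTION_BY_SECTION = {"ScriptToRoman": "s2r", "RomanToScript": "r2s"}


def load_table_metadata(table_dir: str) -> dict[str, dict]:
    """Scan table_dir for .cfg files and record their metadata."""
    found = {}
    for path in sorted(Path(table_dir).glob("*.cfg")):
        general, directions, section = {}, [], None
        # Assumption: the tables are UTF-8 encoded.
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip() or line.startswith("#"):
                continue
            if line.startswith("["):
                section = line.strip().strip("[]")
                if section in _DIRECTION_BY_SECTION:
                    directions.append(_DIRECTION_BY_SECTION[section])
            elif section == "General" and "=" in line:
                key, _, value = line.partition("=")
                general[key] = value
        # Assumption: the file name (without extension) is the script name.
        found[path.stem] = {"general": general, "directions": directions}
    # Update in place so existing references see the new metadata.
    TABLE_METADATA.clear()
    TABLE_METADATA.update(found)
    return TABLE_METADATA
```

Only the General section is parsed at this stage; the transliteration sections are merely detected, which keeps startup cheap.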
Upon invocation of the trans method: load the relevant configuration file
(this operation will be cached in order to avoid repeated expensive parsing) and apply
the relevant ScriptToRoman or RomanToScript transliteration to the
provided string.
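A minimal sketch of this flow, using functools.lru_cache for the caching and mapping outcomes to the status codes listed for the trans endpoint. The matching strategy (longest source string first, unmatched characters copied through) is only a placeholder, since the original transliteration logic has not been analysed yet. TABLES stands in for the parsed files, and example_script is a made-up script name holding the single rule quoted earlier.

```python
from functools import lru_cache

# Hypothetical in-memory stand-in for the parsed .cfg sections.
TABLES = {("example_script", "s2r"): {"\u182c\u1820": "q\u0307a"}}


class UnknownTable(Exception):
    """The script is not available in the requested direction."""


@lru_cache(maxsize=None)
def load_rules(script: str, direction: str) -> tuple:
    """Stand-in for the expensive parse; results are cached per table."""
    if (script, direction) not in TABLES:
        raise UnknownTable(f"{script} is not available for {direction}")
    rules = [(s, d) for s, d in TABLES[(script, direction)].items() if s]
    # Placeholder strategy: try longer source strings first.
    return tuple(sorted(rules, key=lambda rule: -len(rule[0])))


def transliterate(script: str, direction: str, data: str) -> str:
    rules = load_rules(script, direction)
    out, i = [], 0
    while i < len(data):
        for src, dest in rules:
            if data.startswith(src, i):
                out.append(dest)
                i += len(src)
                break
        else:
            out.append(data[i])  # assumption: unmatched text passes through
            i += 1
    return "".join(out)


def handle_trans(script: str, direction: str, data: str) -> tuple[int, str]:
    """Return (status code, response body)."""
    try:
        return 200, transliterate(script, direction, data)
    except UnknownTable as exc:
        return 400, str(exc)
    except Exception:
        return 500, "Internal server error"
```

Because lru_cache does not store raised exceptions, a failed lookup is retried on the next request rather than being cached as a failure.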
Upon invocation of the /reload_config method: reload the table metadata
as on startup; invalidate the cache for all the configurations.