4 Commits 7eb916f8c2 ... 9512e42b36

Author SHA1 Message Date
  scossu 9512e42b36 Merge branch 'main' into chinese 6 months ago
  Stefano Cossu fe42e40f4e Korean (#48) 7 months ago
  rrroche 77081df37f Update azerbaijani.yml (#44) 8 months ago
  rrroche 839a6836d7 Update armenian.yml (#45) 8 months ago
5 changed files with 73 additions and 42 deletions
  1. doc/config.md (+16, -0)
  2. doc/hooks.md (+14, -32)
  3. doc/rest_api.md (+28, -9)
  4. scriptshifter/tables/data/armenian.yml (+13, -1)
  5. scriptshifter/tables/data/azerbaijani.yml (+2, -0)

+ 16 - 0
doc/config.md

@@ -127,6 +127,22 @@ Type: list
 A list of parents that the configuration inherits from. See "Inheritance"
 above.
 
+### `options`
+
+A list of additional options that may be passed in a request. This is
+useful when developing custom hooks. Any values are ignored by the core
+transliteration process.
+
+Each list member is an object expected to contain the following keys, which
+are used in the built-in API:
+
+- `id`: the option ID used as an HTML tag ID and as a variable name.
+- `label`: human-readable label usable in a UI.
+- `description`: description usable in a UI. Optional.
+- `type`: unused at the moment.
+- `default`: The default value that should be set for the option in a UI. Note
+  that this does not set a default value in an API call [TODO].
+
 ### `roman_to_script`
 
 Roman-to-script transliteration section. If absent, the application will raise
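
Note on the `options` hunk above: the key list can be checked mechanically once a table has been loaded into Python objects. The following is a minimal sketch, not part of the change set. The function name `check_option` and the sample entry are hypothetical, and treating every key except `description` as required is an assumption drawn from the fact that only `description` is marked optional.

```python
# Keys named in the `options` documentation added by this change.
KNOWN_KEYS = {"id", "label", "description", "type", "default"}
# Assumption: only `description` is marked optional, so the rest are required.
REQUIRED_KEYS = KNOWN_KEYS - {"description"}


def check_option(opt):
    """Return a list of problems found in one `options` list member."""
    if not isinstance(opt, dict):
        return ["entry is not a mapping"]
    present = set(opt)
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - present)]
    problems += [f"unknown key: {k}" for k in sorted(present - KNOWN_KEYS)]
    return problems


# Hypothetical entry, shaped like a parsed YAML list member.
example = {
    "id": "marc_field",
    "label": "MARC field",
    "type": "string",
    "default": "245",
}

print(check_option(example))
print(check_option({"id": "x"}))
```

An empty list means the entry carries every expected key and nothing else.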

+ 14 - 32
doc/hooks.md

@@ -114,9 +114,18 @@ after the hook function is executed. Possible return values are defined below
 for each hook. Some special return values, such as `BREAK` and `CONT`, are
 registered as constants under `scriptshifter.exceptions`.
 
-**[TODO]** These hooks are being implemented in a vacuum, without much of a
-real-world use case. Modifications to these capabilities may change as actual
-challenges arise.
+### Always available context members
+
+The following members of the context object are available in all the hooks:
+
+- `ctx.src`: Source text. It should not be reassigned.
+- `ctx.general`: Configuration general options.
+- `ctx.langsec`: language section (S2R or R2S) of configuration.
+- `ctx.options`: language-specific options defined in configuration and set
+    at the beginning of the request.
+
+Other members are available in different hooks. See the individual hooks
+reference below.
 
 ### `post_config`
 
@@ -129,16 +138,13 @@ or REST API.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position. It should be 0 at this point.
 - `ctx.dest_ls`: destination token list. It should be empty at this point.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 
 #### Return
 
 `None` or `BREAK`. In the former case the application proceeds to the usual
-translteration process; in the latter case, it returns the value of
+transliteration process; in the latter case, it returns the value of
 `ctx.dest`, which the hook function should have set.
 
 ### `begin_input_token`
@@ -152,13 +158,10 @@ of multiple symbols based on logical rules rather than a dictionary.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.cur_flags`: flags associated with the current position. They are reset
   at every character iteration. See "Cursor Flags" below.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 
 #### Return
 
@@ -178,13 +181,10 @@ ignore term and when or when not to trigger a match.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.cur_flags`: flags associated with the current position. They are reset
   at every character iteration. See "Cursor Flags" below.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 
 #### Output
 
@@ -204,13 +204,10 @@ scanning for more ignore tokens past the match.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.cur_flags`: flags associated with the current position. They are reset
   at every character iteration. See "Cursor Flags" below.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 - `ctx.tk`: matching ignore token.
 - `ctx.ignoring`: whether an ignore token matched. If set to `False`, the rest
   of the workflow will assume a non-match.
@@ -231,15 +228,12 @@ may take a broader context into consideration. They may also take over the
 substitution step for the current position, skip the scanning for an arbitrary
 number of characters, and/or exit the text scanning loop altogether.
 
-#### Available context member
+#### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.cur_flags`: flags associated with the current position. They are reset
   at every character iteration. See "Cursor Flags" below.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 - `ctx.src_tk`: the input token being looked up.
 - `ctx.dest_tk`: the transliterated string associated with the current token.
 
@@ -260,14 +254,11 @@ also inject additional conditions and logic for the match, and revoke the
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.cur_flags`: flags associated with the current position. They are reset
   at every character iteration. See "Cursor Flags" below.
 - `ctx.dest_ls`: destination token list. The matching token will be added to it
   after this hook is run.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 - `ctx.src_tk`: the matching input token.
 - `ctx.dest_tk`: the transliterated string to be added to the output.
 - `ctx.match`: whether there was a match. If set to `False`, the rest of the
@@ -291,13 +282,10 @@ cursor position to the destination list, verbatim.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.cur_flags`: flags associated with the current position. They are reset
   at every character iteration. See "Cursor Flags" below.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 
 #### Output
 
@@ -316,10 +304,7 @@ bypass any further output handling.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 
 #### Output
 
@@ -337,11 +322,8 @@ and return it before any further default processing is done.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 - `ctx.dest`: output string.
 
 #### Output
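
Note on the hooks.md hunks above: with `ctx.src`, `ctx.general`, `ctx.langsec` and `ctx.options` now documented as always available, a hook can branch on a request option without checking which hook it is attached to. The sketch below is hypothetical. It assumes a hook is a function receiving the context object, that `ctx.options` behaves like a dict, and it uses a local stand-in for the `BREAK` constant, which the documentation places under `scriptshifter.exceptions`. The option name `shout` is invented for illustration.

```python
from types import SimpleNamespace

# Local stand-in only; the real constant is registered under
# scriptshifter.exceptions according to the hooks documentation.
BREAK = "__break"


def uppercase_if_requested(ctx):
    """Hypothetical `post_config` hook that short-circuits on an option."""
    if ctx.options.get("shout"):
        # Returning BREAK requires the hook to have set ctx.dest.
        ctx.dest = ctx.src.upper()
        return BREAK
    # None lets the usual transliteration process continue.
    return None


# Stand-in context carrying the always-available members plus the
# post_config-specific ones (cur, dest_ls).
ctx = SimpleNamespace(
    src="abc", general={}, langsec={}, options={"shout": True},
    cur=0, dest_ls=[],
)
result = uppercase_if_requested(ctx)
```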

+ 28 - 9
doc/rest_api.md

@@ -45,21 +45,38 @@ Content: language configuration as a JSON object with all the transliteration
 rules as they are read by the application. If the table inherits from a parent,
 the computed values from the merged tables are shown.
 
-## `POST /transliterate/<lang>[/r2s]`
+## `GET /options/<lang>`
 
-Transliterate an input string in a given language.
+Get options available for a script.
 
 ### URI parameters
 
 - `<lang>`: Language code as given by the `/languages` endpoint. 
-- `r2s`: if appended to the URI, the transliteration is intended to be
-  Roman-to-script, and the input string should be Latin text. If not, the
-  default behavior is followed, which is interpreting the input as a script
-  in the given language, and returning the Romanized text.
+
+### Response code
+
+`200 OK`
+
+### Response body
+
+MIME type: `application/json`
+
+Content: list of options as a JSON object.
+
+## `POST /trans`
+
+Transliterate an input string into a given language.
 
 ### POST body
 
+- `lang`: Language code as given by the `/languages` endpoint. 
 - `text`: Input text to be transliterated.
+- `capitalize`: One of `first` (capitalize the first letter of the input),
+  `all` (capitalize all words separated by spaces), or null (default: apply no
+  additional capitalization). All options leave any existing capitalization
+  unchanged.
+- `t_dir`: Direction of the transliteration or transcription: either `s2r`
+  (default: script to Roman) or `r2s` (Roman to script).
 
 ### Response code
 
@@ -69,7 +86,9 @@ Transliterate an input string in a given language.
 
 ### Response body
 
-MIME Type: `text/plain`
+MIME Type: `application/json`
 
-Content: transliterated string. Characters not found in the mapping are copied
-verbatim (see "Configuration files" section for more information).
+Content: JSON object containing two keys: `output` containing the transliterated
+string; and `warnings` containing a list of warnings. Characters not found in
+the mapping are copied verbatim in the transliterated string (see
+"Configuration files" section for more information).
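
Note on the rest_api.md hunks above: the language code and direction move from the URI into the POST body. A client-side request could be assembled roughly as follows. This is a sketch under assumptions: the documentation does not state the body encoding, so JSON is assumed here, and the base URL, the helper name `build_trans_request` and the language code `armenian` are placeholders.

```python
import json
from urllib.request import Request


def build_trans_request(base_url, lang, text, t_dir="s2r", capitalize=None):
    """Build (but do not send) a POST /trans request.

    Assumption: the body is sent as JSON; the docs only list the field names.
    """
    body = {
        "lang": lang,
        "text": text,
        "t_dir": t_dir,            # "s2r" (default) or "r2s"
        "capitalize": capitalize,  # "first", "all", or None
    }
    return Request(
        f"{base_url}/trans",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and language code, for illustration only.
req = build_trans_request("http://localhost:8000", "armenian", "abc", t_dir="r2s")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a server.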

+ 13 - 1
scriptshifter/tables/data/armenian.yml

@@ -146,6 +146,12 @@ roman_to_script:
     "F": "\u0556"
     "f": "\u0586"
     "\u02B9": ""
+    #Punctuation
+    ",": "\u055D"
+    ".": "\u0589"
+    "!": "\u055C"
+    "?": "\u055E"
+    "-": "\u058A"
 
 script_to_roman:
   map:
@@ -201,6 +207,7 @@ script_to_roman:
     "\u0565\u057E": "ev"
     "\u0535": "E"
     "\u0565": "e"
+    "\u0587": "ev"
     # T uppercase with ayn
     "\u0539": "T\u02BB"
     # t lowercase with ayn
@@ -287,4 +294,9 @@ script_to_roman:
     "\u0582": "w"
     "\u0556": "F"
     "\u0586": "f"
-
+    #Punctuation
+    "\u055D": ","
+    "\u0589": "."
+    "\u055C": "!"
+    "\u055E": "?"
+    "\u058A": "-"
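
Note on the armenian.yml hunks above: the punctuation entries added to `roman_to_script` and `script_to_roman` are meant to mirror each other. A quick consistency check, copying the code points from the two hunks into plain dicts, could look like this. The dict and function names are hypothetical, and the check covers only the punctuation pairs, not the rest of the table.

```python
# Punctuation pairs as added to roman_to_script in this diff.
R2S_PUNCT = {
    ",": "\u055D",
    ".": "\u0589",
    "!": "\u055C",
    "?": "\u055E",
    "-": "\u058A",
}

# Punctuation pairs as added to script_to_roman in this diff.
S2R_PUNCT = {
    "\u055D": ",",
    "\u0589": ".",
    "\u055C": "!",
    "\u055E": "?",
    "\u058A": "-",
}


def is_mirror(forward, backward):
    """True if `backward` is exactly the inverse mapping of `forward`."""
    return {v: k for k, v in forward.items()} == backward


print(is_mirror(R2S_PUNCT, S2R_PUNCT))
```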

+ 2 - 0
scriptshifter/tables/data/azerbaijani.yml

@@ -61,3 +61,5 @@ script_to_roman:
     "\u04B9": "j"
     "\u042B": "Y"
     "\u044B": "y"
+    "\u0259": "a\u0306"
+    "\u018F": "A\u0306"
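
Note on the azerbaijani.yml hunk above: both new values are a base letter followed by U+0306 (combining breve), so the romanized output is two code points per character. Consumers that compare strings may want to normalize first. The sketch below copies only the two added entries; the names `NEW_ENTRIES` and `nfc` are hypothetical, and whether the application normalizes its output is not stated in this diff.

```python
import unicodedata

# The two script_to_roman entries added in this diff.
NEW_ENTRIES = {
    "\u0259": "a\u0306",
    "\u018F": "A\u0306",
}


def nfc(text):
    """Compose base letter + combining mark into a single code point."""
    return unicodedata.normalize("NFC", text)


for src, dest in NEW_ENTRIES.items():
    # Print the code point count before and after NFC composition.
    print(f"U+{ord(src):04X}", len(dest), len(nfc(dest)))
```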