Merge branch 'main' into chinese

scossu 6 months ago
parent
commit
9512e42b36
5 changed files with 73 additions and 42 deletions
  1. doc/config.md (+16 -0)
  2. doc/hooks.md (+14 -32)
  3. doc/rest_api.md (+28 -9)
  4. scriptshifter/tables/data/armenian.yml (+13 -1)
  5. scriptshifter/tables/data/azerbaijani.yml (+2 -0)

+ 16 - 0
doc/config.md

@@ -127,6 +127,22 @@ Type: list
 A list of parents that the configuration inherits from. See "Inheritance"
 above.
 
+### `options`
+
+A list of additional options that may be passed in a request. This is
+useful when developing custom hooks. Any values are ignored by the core
+transliteration process.
+
+Each list member is an object that is expected to contain the following keys,
+which are used in the built-in API:
+
+- `id`: the option ID, used as an HTML tag ID and as a variable name.
+- `label`: human-readable label usable in a UI.
+- `description`: description usable in a UI. Optional.
+- `type`: unused at the moment.
+- `default`: The default value that should be set for the option in a UI. Note
+  that this does not set a default value in an API call [TODO].
+
 ### `roman_to_script`
 
 Roman-to-script transliteration section. If absent, the application will raise
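
Note on the `options` keys added above: the following is a minimal Python sketch of how one list member could be checked before use in a UI. The helper name `missing_option_keys`, the sample option, and the choice to treat only `id` and `label` as mandatory are assumptions made for illustration; the documentation only states that `description` is optional.

```python
# Hypothetical helper, not part of ScriptShifter.
# Assumption: only `id` and `label` are treated as mandatory here.
REQUIRED_KEYS = ("id", "label")
OPTIONAL_KEYS = ("description", "type", "default")


def missing_option_keys(option):
    """Return the required keys that are absent from one `options` member."""
    return [key for key in REQUIRED_KEYS if key not in option]


# A made-up option entry, shaped like the keys documented above.
sample_option = {
    "id": "sample_flag",
    "label": "Sample flag",
    "description": "Illustrative option used only in this sketch.",
    "type": "boolean",
    "default": False,
}

print(missing_option_keys(sample_option))
print(missing_option_keys({"label": "No ID given"}))
```
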

+ 14 - 32
doc/hooks.md

@@ -114,9 +114,18 @@ after the hook function is executed. Possible return values are defined below
 for each hook. Some special return values, such as `BREAK` and `CONT`, are
 registered as constants under `scriptshifter.exceptions`.
 
-**[TODO]** These hooks are being implemented in a vacuum, without much of a
-real-world use case. Modifications to these capabilities may change as actual
-challenges arise.
+### Always available context members
+
+The following members of the context object are available in all the hooks:
+
+- `ctx.src`: Source text. It should not be reassigned.
+- `ctx.general`: Configuration general options.
+- `ctx.langsec`: language section (S2R or R2S) of configuration.
+- `ctx.options`: language-specific options defined in configuration and set
+    at the beginning of the request.
+
+Other members are available in different hooks. See the individual hooks
+reference below.
 
 ### `post_config`
 
@@ -129,16 +138,13 @@ or REST API.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position. It should be 0 at this point.
 - `ctx.dest_ls`: destination token list. It should be empty at this point.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 
 #### Return
 
 `None` or `BREAK`. In the former case the application proceeds to the usual
-translteration process; in the latter case, it returns the value of
+transliteration process; in the latter case, it returns the value of
 `ctx.dest`, which the hook function should have set.
 
 ### `begin_input_token`
@@ -152,13 +158,10 @@ of multiple symbols based on logical rules rather than a dictionary.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.cur_flags`: flags associated with the current position. They are reset
   at every character iteration. See "Cursor Flags" below.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 
 #### Return
 
@@ -178,13 +181,10 @@ ignore term and when or when not to trigger a match.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.cur_flags`: flags associated with the current position. They are reset
   at every character iteration. See "Cursor Flags" below.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 
 #### Output
 
@@ -204,13 +204,10 @@ scanning for more ignore tokens past the match.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.cur_flags`: flags associated with the current position. They are reset
   at every character iteration. See "Cursor Flags" below.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 - `ctx.tk`: matching ignore token.
 - `ctx.ignoring`: whether an ignore token matched. If set to `False`, the rest
   of the workflow will assume a non-match.
@@ -231,15 +228,12 @@ may take a broader context into consideration. They may also take over the
 substitution step for the current position, skip the scanning for an arbitrary
 number of characters, and/or exit the text scanning loop altogether.
 
-#### Available context member
+#### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.cur_flags`: flags associated with the current position. They are reset
   at every character iteration. See "Cursor Flags" below.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 - `ctx.src_tk`: the input token being looked up.
 - `ctx.dest_tk`: the transliterated string associated with the current token.
 
@@ -260,14 +254,11 @@ also inject additional conditions and logic for the match, and revoke the
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.cur_flags`: flags associated with the current position. They are reset
   at every character iteration. See "Cursor Flags" below.
 - `ctx.dest_ls`: destination token list. The matching token will be added to it
   after this hook is run.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 - `ctx.src_tk`: the matching input token.
 - `ctx.dest_tk`: the transliterated string to be added to the output.
 - `ctx.match`: whether there was a match. If set to `False`, the rest of the
@@ -291,13 +282,10 @@ cursor position to the destination list, verbatim.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.cur_flags`: flags associated with the current position. They are reset
   at every character iteration. See "Cursor Flags" below.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 
 #### Output
 
@@ -316,10 +304,7 @@ bypass any further output handling.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 
 #### Output
 
@@ -337,11 +322,8 @@ and return it before any further default processing is done.
 
 #### Available context members
 
-- `ctx.src`: Source text. It should not be reassigned.
 - `ctx.cur`: cursor position.
 - `ctx.dest_ls`: destination token list.
-- `ctx.general`: Configuration general options.
-- `ctx.langsec`: language section (S2R or R2S) of configuration.
 - `ctx.dest`: output string.
 
 #### Output
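
Note on the hooks changes above: with `ctx.src`, `ctx.general`, `ctx.langsec` and `ctx.options` now documented as always available, a `post_config`-style hook could look roughly like the sketch below. It assumes that a hook receives the context object as its first argument and that `ctx.options` behaves like a dictionary; the `passthrough` option name and the `SimpleNamespace` stand-in for the context are invented for this example, and `BREAK` is a local placeholder for the constant registered under `scriptshifter.exceptions`.

```python
from types import SimpleNamespace

# Local placeholder; the real constant lives in scriptshifter.exceptions.
BREAK = "__break"


def passthrough_if_requested(ctx):
    """Hypothetical post_config hook.

    If the (made-up) `passthrough` option is set, copy the source text to
    `ctx.dest` and return BREAK so that the caller returns `ctx.dest` as is.
    Otherwise return None and let the usual transliteration proceed.
    """
    if ctx.options.get("passthrough"):
        ctx.dest = ctx.src  # ctx.src is read, never reassigned
        return BREAK
    return None


# Stand-in context objects carrying the always-available members.
ctx_on = SimpleNamespace(
    src="abc", general={}, langsec={}, options={"passthrough": True}, dest=None
)
ctx_off = SimpleNamespace(
    src="abc", general={}, langsec={}, options={}, dest=None
)

result_on = passthrough_if_requested(ctx_on)
result_off = passthrough_if_requested(ctx_off)
```
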

+ 28 - 9
doc/rest_api.md

@@ -45,21 +45,38 @@ Content: language configuration as a JSON object with all the transliteration
 rules as they are read by the application. If the table inherits from a parent,
 the computed values from the merged tables are shown.
 
-## `POST /transliterate/<lang>[/r2s]`
+## `GET /options/<lang>`
 
-Transliterate an input string in a given language.
+Get options available for a script.
 
 ### URI parameters
 
 - `<lang>`: Language code as given by the `/languages` endpoint. 
-- `r2s`: if appended to the URI, the transliteration is intended to be
-  Roman-to-script, and the input string should be Latin text. If not, the
-  default behavior is followed, which is interpreting the input as a script
-  in the given language, and returning the Romanized text.
+
+### Response code
+
+`200 OK`
+
+### Response body
+
+MIME type: `application/json`
+
+Content: list of options as a JSON object.
+
+## `POST /trans`
+
+Transliterate an input string into a given language.
 
 ### POST body
 
+- `lang`: Language code as given by the `/languages` endpoint. 
 - `text`: Input text to be transliterated.
+- `capitalize`: One of `first` (capitalize the first letter of the input),
+  `all` (capitalize all words separated by spaces), or null (default: apply no
+  additional capitalization). All options leave any existing capitalization
+  unchanged.
+- `t_dir`: Direction of the transliteration or transcription: either `s2r`
+  (default: script to Roman) or `r2s` (Roman to script).
 
 ### Response code
 
@@ -69,7 +86,9 @@ Transliterate an input string in a given language.
 
 ### Response body
 
-MIME Type: `text/plain`
+MIME Type: `application/json`
 
-Content: transliterated string. Characters not found in the mapping are copied
-verbatim (see "Configuration files" section for more information).
+Content: JSON object containing two keys: `output` containing the transliterated
+string; and `warnings` containing a list of warnings. Characters not found in
+the mapping are copied verbatim in the transliterated string (see
+"Configuration files" section for more information).
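
Note on the new `POST /trans` body: the sketch below builds a request body from the documented keys (`lang`, `text`, `capitalize`, `t_dir`) and rejects values outside the documented choices. It assumes the body is sent as JSON, which the documentation above does not state explicitly; the function name `build_trans_body` and the sample input are illustrative only, and no request is actually sent.

```python
import json

# Values taken from the POST body description above.
ALLOWED_CAPITALIZE = {"first", "all", None}
ALLOWED_T_DIR = {"s2r", "r2s"}


def build_trans_body(lang, text, capitalize=None, t_dir="s2r"):
    """Return a JSON string for a POST /trans request (hypothetical helper)."""
    if capitalize not in ALLOWED_CAPITALIZE:
        raise ValueError(f"Unsupported capitalize value: {capitalize!r}")
    if t_dir not in ALLOWED_T_DIR:
        raise ValueError(f"Unsupported t_dir value: {t_dir!r}")
    return json.dumps(
        {"lang": lang, "text": text, "capitalize": capitalize, "t_dir": t_dir},
        ensure_ascii=False,
    )


# Script-to-Roman is the default direction; "armenian" is a sample code only.
body = build_trans_body("armenian", "sample text", capitalize="first")
print(body)
```
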

+ 13 - 1
scriptshifter/tables/data/armenian.yml

@@ -146,6 +146,12 @@ roman_to_script:
     "F": "\u0556"
     "f": "\u0586"
     "\u02B9": ""
+    #Punctuation
+    ",": "\u055D"
+    ".": "\u0589"
+    "!": "\u055C"
+    "?": "\u055E"
+    "-": "\u058A"
 
 script_to_roman:
   map:
@@ -201,6 +207,7 @@ script_to_roman:
     "\u0565\u057E": "ev"
     "\u0535": "E"
     "\u0565": "e"
+    "\u0587": "ev"
     # T uppercase with ayn
     "\u0539": "T\u02BB"
     # t lowercase with ayn
@@ -287,4 +294,9 @@ script_to_roman:
     "\u0582": "w"
     "\u0556": "F"
     "\u0586": "f"
-
+    #Punctuation
+    "\u055D": ","
+    "\u0589": "."
+    "\u055C": "!"
+    "\u055E": "?"
+    "\u058A": "-"

+ 2 - 0
scriptshifter/tables/data/azerbaijani.yml

@@ -61,3 +61,5 @@ script_to_roman:
     "\u04B9": "j"
     "\u042B": "Y"
     "\u044B": "y"
+    "\u0259": "a\u0306"
+    "\u018F": "A\u0306"