
Add explanations to hook functions.

Stefano Cossu 2 years ago
commit 94ee9ff1b1
1 changed files with 65 additions and 25 deletions

+ 65 - 25
doc/hooks.md

@@ -111,7 +111,8 @@ defined in the configuration.
 Each function must also return an output that the process is able to handle as
 expected. The output may instruct the application to make a specific decision
 after the hook function is executed. Possible return values are defined below
-for each hook.
+for each hook. Some special return values, such as `BREAK` and `CONT`, are
+registered as constants under `transliterator.exceptions`.
 
 **[TODO]** These hooks are being implemented in a vacuum, without much of a
 real-world use case. Modifications to these capabilities may change as actual
@@ -122,6 +123,10 @@ challenges arise.
 This hook is run after the whole configuration is parsed and possibly merged
 with a parent configuration.
 
+This hook can be used to completely override the transliteration process by
+devising an entirely different logic and/or calling a third party library
+or REST API.
+
 #### Available context members
 
 - `ctx.src`: Source text. It should not be reassigned.
@@ -132,12 +137,19 @@ with a parent configuration.
 
 #### Return
 
-`None`
+`None` or `BREAK`. In the former case the application proceeds to the usual
+transliteration process; in the latter case, it returns the value of
+`ctx.dest`, which the hook function should have set.
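
The `None`/`BREAK` contract described above can be sketched as follows. This is a minimal illustration, not the project's code: `BREAK` is a local stand-in for the constant said to live under `transliterator.exceptions` (its real value is not shown in this diff), and `shout` and `run_with_hook` are hypothetical names.

```python
from types import SimpleNamespace

# Stand-in for the constant registered under `transliterator.exceptions`;
# the actual value is an assumption made for this sketch.
BREAK = "__break"


def shout(ctx):
    """Hypothetical `post_config` hook that replaces the whole process."""
    # Any custom logic, third-party library, or REST call could go here.
    ctx.dest = ctx.src.upper()
    return BREAK


def run_with_hook(src, hook):
    """Toy driver that mimics only the dispatch described for `post_config`."""
    ctx = SimpleNamespace(src=src, dest=None)
    if hook(ctx) == BREAK:
        # The hook took over: return what it stored in ctx.dest.
        return ctx.dest
    # Placeholder for the usual transliteration process.
    return "<default transliteration>"
```

Returning `None` from the hook would let the toy driver fall through to its placeholder for the default process.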
 
 ### `begin_input_token`
 
 This hook is run at the beginning of each iteration of the input parsing loop.
 
+Functions implemented here can be used to override the default behavior for
+each iteration of the input text scan, e.g. when special conditions must be
+applied to detect word boundaries or punctuation, or when the interaction of
+multiple symbols must be handled by logical rules rather than a dictionary.
+
 #### Available context members
 
 - `ctx.src`: Source text. It should not be reassigned.
@@ -148,17 +160,20 @@ This hook is run at the beginning of each iteration of the input parsing loop.
 
 #### Return
 
-Possible values are `"continue"`, `"break"`, or `None`. If `None` is returned,
-the parsing proceeds as normal. `"continue"` causes the application to skip the
-parsing of the current token. `"break"` interrupts the text scanning and
+Possible values are `CONT`, `BREAK`, or `None`. If `None` is returned,
+the parsing proceeds as normal. `CONT` causes the application to skip the
+parsing of the current token. `BREAK` interrupts the text scanning and
 proceeds directly to handling the result list for output. **CAUTION**: when
-returning "continue", it is the responsibility of the function to advance
+returning `CONT`, it is the responsibility of the function to advance
 `ctx.cur` so that the loop doesn't become an infinite one.
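
The cautionary note can be illustrated with a small sketch. Everything except `ctx.src` and `ctx.cur` is an assumption here: `CONT` is a local stand-in for the real constant, `ctx.dest_ls` is a hypothetical name for the result list, and `scan` is a toy loop rather than the application's parser.

```python
from types import SimpleNamespace

# Stand-in for the constant registered under `transliterator.exceptions`.
CONT = "__continue"


def copy_digits(ctx):
    """Hypothetical `begin_input_token` hook: pass digits through verbatim."""
    char = ctx.src[ctx.cur]
    if char.isdigit():
        ctx.dest_ls.append(char)
        # The hook must advance the cursor itself before returning CONT,
        # otherwise the loop below would revisit this position forever.
        ctx.cur += 1
        return CONT
    return None


def scan(src, hook):
    """Toy scanning loop showing how CONT skips the default step."""
    ctx = SimpleNamespace(src=src, cur=0, dest_ls=[])
    while ctx.cur < len(ctx.src):
        if hook(ctx) == CONT:
            continue
        # Placeholder for the default per-token handling.
        ctx.dest_ls.append("[" + ctx.src[ctx.cur] + "]")
        ctx.cur += 1
    return "".join(ctx.dest_ls)
```

Removing the `ctx.cur += 1` line from the hook would make `scan` loop forever on the first digit, which is the failure mode the note warns about.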
 
 ### `pre_ignore_token`
 
 Run before each ignore token is compared with the input.
 
+Functions implementing this hook can change how an ignore term is detected and
+whether a match is triggered.
+
 #### Available context members
 
 - `ctx.src`: Source text. It should not be reassigned.
@@ -169,15 +184,20 @@ Run before each ignore token is compared with the input.
 
 #### Output
 
-`"continue"`, `"break"`, or `None`. `"continue"` skips the checks on the
-current ignore token. `"break"` stops looking up ignore tokens for the current
-position. This function can return `"continue"` without advancing the cursor and
+`CONT`, `BREAK`, or `None`. `CONT` skips the checks on the
+current ignore token. `BREAK` stops looking up ignore tokens for the current
+position. This function can return `CONT` without advancing the cursor and
 without causing an infinite loop.
 
 ### `on_ignore_match`
 
 Run when an ignore token matches.
 
+Functions implementing this hook can change the behavior of the process after
+an ignore token has matched. Actions may include skipping or redefining the
+ignore step, which by default copies the matching token verbatim and keeps
+scanning for more ignore tokens past the match.
+
 #### Available context members
 
 - `ctx.src`: Source text. It should not be reassigned.
@@ -191,14 +211,20 @@ Run when an ignore token matches.
 
 #### Output
 
-`"continue"`, `"break"`, or `None`. `"continue"` voids the match and keeps
-on looking up the ignore list. `"break"` stops looking up ignore tokens for the
+`CONT`, `BREAK`, or `None`. `CONT` voids the match and keeps
+on looking up the ignore list. `BREAK` stops looking up ignore tokens for the
 current position. See cautionary note on `begin_input_token`.
 
 ### `pre_tx_token`
 
 Run before comparing each transliteration token with the current text.
 
+Functions implementing this hook can change the behavior of how a character is
+matched, e.g. by injecting additional conditions based on logical rules, which
+may take a broader context into consideration. They may also take over the
+substitution step for the current position, skip the scanning for an arbitrary
+number of characters, and/or exit the text scanning loop altogether.
+
 #### Available context members
 
 - `ctx.src`: Source text. It should not be reassigned.
@@ -211,14 +237,19 @@ Run before comparing each transliteration token with the current text.
 
 #### Output
 
-`"continue"`, `"break"`, or `None`. `"continue"` skips the checks on the
-current token. `"break"` stops looking up all tokens for the current
+`CONT`, `BREAK`, or `None`. `CONT` skips the checks on the
+current token. `BREAK` stops looking up all tokens for the current
 position. See cautionary note on `begin_input_token`.
 
 ### `on_tx_token_match`
 
 Run when a transliteration token matches the input.
 
+Functions implementing this hook can override how the transliterated
+character(s) are added to the result token list once a match is found. They can
+also inject additional conditions and logic for the match, and revoke the
+"match" status, which would prevent the transliteration step from running.
+
 #### Available context members
 
 - `ctx.src`: Source text. It should not be reassigned.
@@ -234,8 +265,8 @@ Run when a transliteration token matches the input.
 
 #### Output
 
-`"continue"`, `"break"`, or `None`. `"continue"` voids the match and keeps
-on looking up the token list. `"break"` stops looking up tokens for the
+`CONT`, `BREAK`, or `None`. `CONT` voids the match and keeps
+on looking up the token list. `BREAK` stops looking up tokens for the
 current position and effectively reports a non-match.
 
 ### `on_no_tx_token_match`
@@ -243,6 +274,11 @@ current position and effectively reports a non-match.
 Run after all tokens for the current position have been tried and no match has
 been found.
 
+Functions implementing this hook can perform additional actions after the
+current position has not been matched by any of the available tokens. They can
+also override the default logic, which adds the single character at the cursor
+position to the destination list verbatim.
+
 #### Available context members
 
 - `ctx.src`: Source text. It should not be reassigned.
@@ -253,17 +289,18 @@ been found.
 
 #### Output
 
-`"continue"`, `"break"`, or `None`. `"continue"` skips to the next
-position in the input text. Int his case, the function **must** advance the
-cursor. `"break"` stops all text parsing and proceeds to the assembly of the
-output.
+`CONT`, `BREAK`, or `None`. `CONT` skips to the next position in the input
+text. In this case, the function **must** advance the cursor. `BREAK` stops all
+text parsing and proceeds to the assembly of the output.
 
 ### `pre_assembly`
 
 Run after the whole text has been scanned, before the output list is
-capitalized and assembled into a string. This function may manipulate the token
-list and/or handle the assembly itself, in which case it can return the
-assembled string and bypass any further output handling.
+capitalized and assembled into a string.
+
+Functions implementing this hook can manipulate the token list and/or handle
+the assembly itself, in which case they can return the assembled string and
+bypass any further output handling.
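
A rough sketch of a hook that handles the assembly itself follows. The return values for `pre_assembly` are not visible in this diff, so the sketch assumes that returning a string bypasses the default output handling; `ctx.dest_ls`, `join_nonblank`, and `assemble` are hypothetical names.

```python
from types import SimpleNamespace


def join_nonblank(ctx):
    """Hypothetical `pre_assembly` hook that assembles the output itself."""
    # Drop whitespace-only tokens, then join the rest with hyphens.
    tokens = [tok for tok in ctx.dest_ls if tok.strip()]
    return "-".join(tokens)


def assemble(dest_ls, hook=None):
    """Toy driver: a string returned by the hook bypasses default assembly."""
    ctx = SimpleNamespace(dest_ls=list(dest_ls))
    if hook is not None:
        result = hook(ctx)
        if result is not None:
            return result
    # Placeholder for the default capitalization and assembly step.
    return "".join(ctx.dest_ls).strip()
```

With no hook, or with a hook returning `None`, the toy driver falls back to its placeholder assembly.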
 
 #### Available context members
 
@@ -280,6 +317,12 @@ adjustments and assembly of the output list.
 
 ### `post_assembly`
 
+Run after the output has been assembled into a string, before whitespace is
+stripped off.
+
+Functions implementing this hook can manipulate and reassign the output string,
+and return it before any further default processing is done.
+
 #### Available context members
 
 - `ctx.src`: Source text. It should not be reassigned.
@@ -289,9 +332,6 @@ adjustments and assembly of the output list.
 - `ctx.langsec`: language section (S2R or R2S) of configuration.
 - `ctx.dest`: output string.
 
-Run after the output has been assembled into a string, before whitespace is
-stripped off.
-
 #### Output
 
 `"ret"` or `None`. If `"ret"`, the transliteration function returns `ctx.dest`