
Multiple recursive inheritance.

Stefano Cossu 1 year ago
commit 703d2eccdc

+ 4 - 1
TODO.md

@@ -5,7 +5,10 @@ discussion, etc.); *X* = not implementing.
 
 - *D* Basic table loading & parsing
 - *D* Table inheritance
-- *P* Multiple inheritance (not recursive)
+- *W* Multiple recursive inheritance
+  - *D* Inherit map
+  - *D* Inherit ignore
+  - *P* Inherit hooks
 - *D* Ignore list (R2S)
 - *D* Basic transliteration in both directions
 - *D* Basic REST API

+ 21 - 5
doc/config.md

@@ -30,13 +30,10 @@ testing purposes. See below for more details about inhritance.
 
 ## Inheritance
 
-A configuration file may inherit rules from another file. Currently, only one
-level of inheritance is allowed (i.e. a table can only inherit from another
-table, and no further lookup is done if the other table inherits from yet
-another one).
+A configuration file may inherit rules from one or more other files.
 
 Inheritance means that, for each section (`script_to_roman` and
-`roman_to_script`) in the parent table, the child table uses all the rule
+`roman_to_script`) in the parent table, the child table uses all the rules
 found in that section, and may add to or replace them.  This is used for
 Cyrillic languages for example, which share a broad base of common characters,
 but each language has its own variations on certain characters, or adds
@@ -44,6 +41,12 @@ characters that are not present in other languages.
 
 This has the obvious advantage of avoiding repetition and copying entire tables
 for just slight variations of each language.
+
+The `parents` key in the `general` section lists the tables that the current
+table inherits from. Inheritance is recursive, i.e. if table A inherits from B
+and B from C, table A contains the combined rules of all three tables. If
+multiple parents are specified, the ones listed later override the earlier
+ones. The child table's own values override those of all its parents.
 
 Overriding of transliteration rules is applied on the left-hand side of
 the mapping. I.e., if a parent table has the following rules: 
@@ -76,6 +79,12 @@ to most languages, and the few exceptions can be overridden in the relevant
 specific tables. It is up to the language table maintainer to decide how to
 organize these rules.
 
+Elements that are inherited in a configuration are:
+
+- Transliteration maps (both S2R and R2S)
+- Ignore lists
+- Hooks [TODO]
+
 
 ## Configuration file structure
 
@@ -111,6 +120,13 @@ Informational field containing notes, mostly aimed at maintainers. The
 application doesn't use this field. For information meant for the end  user,
 use the `description` field in the index file.
 
+#### `general.parents`
+
+Type: list
+
+A list of table names (without the `.yml` extension) that the configuration
+inherits from, in increasing order of precedence. See "Inheritance" above.
+
 ### `roman_to_script`
 
 Roman-to-script transliteration section. If absent, the application will raise

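One way to picture the precedence rules from the "Inheritance" section is as a flattened merge order: each parent is expanded recursively in the order it is listed, and the table itself comes last. The sketch below is an illustration only; `merge_order` and the `PARENTS` mapping are hypothetical names, not part of the transliterator package, and the tables follow the A/B/C example from the documentation with an extra parent D added.

```python
# Hypothetical parent graph: A inherits from B and D, B inherits from C.
PARENTS: dict[str, list[str]] = {
    "A": ["B", "D"],
    "B": ["C"],
    "C": [],
    "D": [],
}


def merge_order(name: str, parents: dict[str, list[str]]) -> list[str]:
    """Return the tables whose rules are merged into `name`, lowest precedence first.

    Parents are expanded recursively in the order they are listed, and the
    table itself comes last, so its own rules override everything before it.
    A table reachable through several parents appears once per path.
    """
    order: list[str] = []
    for parent in parents.get(name, []):
        order.extend(merge_order(parent, parents))
    order.append(name)
    return order


print(merge_order("A", PARENTS))  # ['C', 'B', 'D', 'A']
```

Merging the rule maps in that order, left to right, with `dict.update` (or `|=`) gives the documented outcome: B overrides C, D overrides B, and A overrides all of them.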
+ 31 - 26
transliterator/tables/__init__.py

@@ -98,7 +98,7 @@ def list_tables():
 @cache
 def load_table(tname):
     """
-    Load one transliteration table and possible parent.
+    Load one transliteration table and possible parents.
 
     The table file is parsed into an in-memory configuration that contains
     the language & script metadata and parsing rules.
@@ -112,20 +112,21 @@ def load_table(tname):
         tdata = load(fh, Loader=Loader)
 
     # NOTE Only one level of inheritance. No need for recursion for now.
-    parent = tdata.get("general", {}).get("inherits", None)
-    if parent:
-        parent_tdata = load_table(parent)
+    parents = tdata.get("general", {}).get("parents", [])
 
     if "script_to_roman" in tdata:
-        tokens = {
-                Token(k): v
-                for k, v in tdata["script_to_roman"].get("map", {}).items()}
-        if parent:
-            # Merge (and override) parent values.
-            tokens = {
+        tokens = {}
+        for parent in parents:
+            parent_tdata = load_table(parent)
+            # Merge parent tokens. Child overrides parents, and a parent listed
+            # later overrides ones listed earlier.
+            tokens |= {
                 Token(k): v for k, v in parent_tdata.get(
                         "script_to_roman", {}).get("map", {})
-            } | tokens
+            }
+        tokens |= {
+                Token(k): v
+                for k, v in tdata["script_to_roman"].get("map", {}).items()}
         tdata["script_to_roman"]["map"] = tuple(
                 (k.content, tokens[k]) for k in sorted(tokens))
 
@@ -134,29 +135,33 @@ def load_table(tname):
                     tname, tdata["script_to_roman"])
 
     if "roman_to_script" in tdata:
-        tokens = {
-                Token(k): v
-                for k, v in tdata["roman_to_script"].get("map", {}).items()}
-        if parent:
-            # Merge (and override) parent values.
-            tokens = {
+        tokens = {}
+        for parent in parents:
+            parent_tdata = load_table(parent)
+            # Merge parent tokens. Child overrides parents, and a parent listed
+            # later overrides ones listed earlier.
+            tokens |= {
                 Token(k): v for k, v in parent_tdata.get(
                         "roman_to_script", {}).get("map", {})
-            } | tokens
+            }
+        tokens |= {
+            Token(k): v
+            for k, v in tdata["roman_to_script"].get("map", {}).items()
+        }
         tdata["roman_to_script"]["map"] = tuple(
                 (k.content, tokens[k]) for k in sorted(tokens))
 
-        if parent:
-            p_ignore = {
-                    Token(t) for t in parent_tdata.get(
-                            "roman_to_script", {}).get("ignore", [])}
-        else:
-            p_ignore = set()
-
         ignore = {
             Token(t)
             for t in tdata["roman_to_script"].get("ignore", [])
-        } | p_ignore
+        }
+        for parent in parents:
+            parent_tdata = load_table(parent)
+            # No overriding occurs with the ignore list, only de-duplication.
+            ignore |= {
+                Token(t) for t in parent_tdata.get(
+                        "roman_to_script", {}).get("ignore", [])
+            }
 
         tdata["roman_to_script"]["ignore"] = [
                 t.content for t in sorted(ignore)]
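The parent loops above iterate over `parent_tdata.get(...).get("map", {})` directly, without calling `.items()`. This works because the recursive `load_table(parent)` call has already replaced the parent's `map` with a tuple of `(key, value)` pairs, and a missing section falls back to an empty dict that yields nothing. Iterating over a raw YAML mapping would instead yield only its keys. The sketch below illustrates the difference, with plain strings standing in for `Token` objects (an assumption made to keep it self-contained).

```python
# Parent map as load_table() leaves it after processing: a tuple of pairs.
resolved_parent_map = (("a", "1"), ("b", "2"))

# Iterating the tuple yields (key, value) pairs, which unpack as intended.
merged = {k: v for k, v in resolved_parent_map}

# A raw YAML mapping iterates over its keys only. A two-character key is
# silently split into (k, v); a key of any other length raises ValueError.
silently_wrong = {k: v for k, v in {"ab": "x"}}

try:
    {k: v for k, v in {"abc": "x"}}
    unpack_failed = False
except ValueError:
    unpack_failed = True

# With a raw mapping, .items() is required to get (key, value) pairs.
correct = {k: v for k, v in {"ab": "x"}.items()}

print(merged)          # {'a': '1', 'b': '2'}
print(silently_wrong)  # {'a': 'b'}
print(unpack_failed)   # True
print(correct)         # {'ab': 'x'}
```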

+ 46 - 0
transliterator/tables/data/_test_base1.yml

@@ -0,0 +1,46 @@
+# Test file for inheritance.
+
+general:
+  name: Test inheritance base 1
+
+roman_to_script:
+  ignore:
+    - "Ho"
+
+  map:
+    "A": "a"
+    "B": "b"
+    "C": "c"
+    "D": "d"
+    "E": "e"
+    "F": "f"
+
+script_to_roman:
+  map:
+    "a": "A"
+    "b": "B"
+    "c": "C"
+    "d": "D"
+    "e": "E"
+    "f": "F"
+    "g": "G"
+    "h": "H"
+    "i": "I"
+    "j": "J"
+    "k": "K"
+    "l": "L"
+    "m": "M"
+    "n": "N"
+    "o": "O"
+    "p": "P"
+    "q": "Q"
+    "r": "R"
+    "s": "S"
+    "t": "T"
+    "u": "U"
+    "v": "V"
+    "w": "W"
+    "x": "X"
+    "y": "Y"
+    "z": "Z"
+

+ 25 - 0
transliterator/tables/data/_test_base2.yml

@@ -0,0 +1,25 @@
+# Test file for inheritance.
+
+general:
+  name: Test inheritance base 2
+  parents:
+    - _test_base1
+
+roman_to_script:
+  ignore:
+    - "Hi"
+
+  map:
+    "G": "g"
+    "H": "h"
+    "I": "i"
+    "J": "j"
+    "K": "k"
+    "L": "l"
+
+script_to_roman:
+  map:
+    "a": "1"
+    "b": "2"
+    "c": "3"
+    "d": "4"

+ 24 - 0
transliterator/tables/data/_test_base3.yml

@@ -0,0 +1,24 @@
+# Test file for inheritance.
+
+general:
+  name: Test inheritance base 3
+
+roman_to_script:
+  ignore:
+    - "Fritter my wig"
+
+  map:
+    "M": "m"
+    "N": "n"
+    "O": "o"
+    "P": "p"
+    "Q": "q"
+    "R": "r"
+    "S": "s"
+
+script_to_roman:
+  map:
+    "a": "5"
+    "b": "6"
+    "c": "7"
+    "d": "8"

+ 25 - 0
transliterator/tables/data/_test_inherited.yml

@@ -0,0 +1,25 @@
+# Test file for inheritance.
+
+general:
+  name: Test inheritance leaf file
+  parents:
+    - _test_base2
+    - _test_base3
+
+roman_to_script:
+  ignore:
+    - "Thing-um-a-jig"
+
+  map:
+    "T": "t"
+    "U": "u"
+    "V": "v"
+    "W": "w"
+    "X": "x"
+    "Y": "y"
+    "Z": "z"
+
+script_to_roman:
+  map:
+    "a": "9"
+    "b": "0"

+ 2 - 1
transliterator/tables/data/belarusian.yml

@@ -1,6 +1,7 @@
 general:
   name: Belorusian
-  inherits: _cyrillic_base
+  parents:
+    - _cyrillic_base
 
 roman_to_script:
   map:

+ 2 - 1
transliterator/tables/data/bulgarian.yml

@@ -1,6 +1,7 @@
 general:
   name: Bulgarian
-  inherits: _cyrillic_base
+  parents:
+    - _cyrillic_base
 
 roman_to_script:
   map:

+ 2 - 1
transliterator/tables/data/church_slavonic.yml

@@ -1,6 +1,7 @@
 general:
   name: Church Slavonic
-  inherits: _cyrillic_base
+  parents:
+    - _cyrillic_base
 
 roman_to_script:
   map:

+ 2 - 1
transliterator/tables/data/russian.yml

@@ -1,6 +1,7 @@
 general:
   name: Russian
-  inherits: _cyrillic_base
+  parents:
+    - _cyrillic_base
 
 roman_to_script:
   map:

+ 2 - 1
transliterator/tables/data/serbian_macedonian.yml

@@ -1,6 +1,7 @@
 general:
   name: Serbian and Macedonian
-  inherits: _cyrillic_base
+  parents:
+    - _cyrillic_base
 
 roman_to_script:
   map:

+ 2 - 1
transliterator/tables/data/ukrainian.yml

@@ -1,6 +1,7 @@
 general:
   name: Ukrainian
-  inherits: _cyrillic_base
+  parents:
+    - _cyrillic_base
 
 roman_to_script:
   map: