
Set up tests.

Stefano Cossu, 1 year ago
commit 356d3b5c23

+ 6 - 2
TODO.md

@@ -21,11 +21,15 @@ discussion, etc.); *X* = not implementing.
 - *B* Optimize token lookup
   - *D* Break loop early based on alphabetical order
   - *B* Ignore word break characters
+  - *P* Capitalization
+    - *P* Separate capitalization function
+    - *P* Capitalize ligated letters (e.g. Cyrillic T͡S)
+    - *P* Option for capitalizing first word, all words, none, unchanged
 - *D* API documentation
 - *D* Config file documentation
 - *D* Hooks documentation
-- *P* Tests
-  - *P* Config parsing
+- *W* Tests
+  - *W* Config parsing
   - *P* Transliteration
   - *P* REST API
 - *W* Complete conversion of existing tables to YAML
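The capitalization TODO items above could be approached along these lines. This is only a sketch under stated assumptions: `capitalize()` and its mode names (`"first"`, `"all"`, `"none"`, `"unchanged"`) are hypothetical, `"none"` is assumed to mean lowercasing everything, and a ligature is assumed to be a letter followed by either U+0361 (combining double inverted breve, as in T͡S) or U+FE20 (combining ligature left half, as used in the ALA-LC romanizations of the sample CSV, e.g. `t︠s︡`).

```python
# Combining marks that tie a letter to the one after it (assumed set).
LIGATURE_MARKS = {"\u0361", "\ufe20"}


def _cap_word(word: str) -> str:
    """Uppercase the first letter, plus its ligature partner if there is one."""
    if not word:
        return word
    chars = list(word)
    chars[0] = chars[0].upper()
    # "t" + tie mark + "s" -> "T" + tie mark + "S"
    if len(chars) > 2 and chars[1] in LIGATURE_MARKS:
        chars[2] = chars[2].upper()
    return "".join(chars)


def capitalize(text: str, mode: str = "unchanged") -> str:
    """Hypothetical capitalization helper covering the four TODO options."""
    if mode == "unchanged":
        return text
    if mode == "none":
        return text.lower()  # assumption: "none" means no capital letters
    words = text.split(" ")  # split on single spaces so spacing is preserved
    if mode == "first":
        words[0] = _cap_word(words[0])
    elif mode == "all":
        words = [_cap_word(w) for w in words]
    else:
        raise ValueError(f"Unknown capitalization mode: {mode}")
    return " ".join(words)


print(capitalize("t\ufe20s\ufe21entr moskvy", "first"))
```

Keeping this in a separate function, as the first sub-item suggests, would let the same ligature handling serve every mode.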

+ 4 - 0
tests/__init__.py

@@ -0,0 +1,4 @@
+from os import path
+
+TEST_DIR = path.dirname(path.realpath(__file__))
+TEST_CONFIG_DIR = path.join(TEST_DIR, "data")

+ 0 - 0
transliterator/tables/data/_test_base1.yml → tests/data/_test_base1.yml


+ 0 - 0
transliterator/tables/data/_test_base2.yml → tests/data/_test_base2.yml


+ 0 - 0
transliterator/tables/data/_test_base3.yml → tests/data/_test_base3.yml


+ 14 - 0
tests/data/ordering.yml

@@ -0,0 +1,14 @@
+# Test configuration for token ordering.
+general:
+  name: Token ordering test
+
+roman_to_script:
+  map:
+    "B": ""
+    "BCD": ""
+    "BCDE": ""
+    "BEFGH": ""
+    "A": ""
+    "AB": ""
+    "ABCD": ""
+
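`tests/data/ordering.yml` lists its tokens out of order so that `test_ordering` in `tests/test01_cfg.py` can check that `load_table()` sorts them into `ABCD, AB, A, BCDE, BCD, BEFGH, B`: code-point order, except that a token sorts before any shorter token that is its prefix, so the longest candidate is tried first. The `Token` comparison that produces this order is not part of this diff; the sort key below is an assumed equivalent that reproduces the expected order, not the project's code.

```python
import sys

# Sentinel that ranks after every real code point, so "end of token"
# sorts last and a longer token precedes any of its own prefixes.
_END = sys.maxunicode + 1


def token_sort_key(token: str) -> list:
    return [ord(c) for c in token] + [_END]


tokens = ["B", "BCD", "BCDE", "BEFGH", "A", "AB", "ABCD"]
print(sorted(tokens, key=token_sort_key))
# ['ABCD', 'AB', 'A', 'BCDE', 'BCD', 'BEFGH', 'B']
```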

+ 0 - 0
transliterator/tables/data/rot3.yml → tests/data/rot3.yml


+ 0 - 0
transliterator/tables/data/_test_inherited.yml → tests/data/test_inherited.yml


+ 105 - 0
tests/data/transliterator_sample_strings.csv

@@ -0,0 +1,105 @@
+Language,Script,Table key (if implemented),Original,Romanized,Reading dir.,Comments
+Arabic,Arabic,,نظام الحكم في عمان : من إمامة الإنتخاب الى السلطنة الوراثية,Niẓām al-ḥukm fī ʻUmān : min imāmat al-intikhāb ilá al-salṭanah al-wirāthīyah ,R-L,Hans Wehr's Dictionary for modern written Arabic is the current reference used for proper vocalization
+Arabic,Arabic,,ندوة علاقات مصر بدول حوض النيل في ظل رئاسة مصر للاتحاد الإفريقي‏,Nadwat ʻAlāqāt Miṣr bi-Duwal Ḥawḍ al-Nīl fī ẓill Riʼāsat Miṣr lil-Ittiḥād al-Ifrīqī,R-L ,
+Arabic,Arabic,,تهذيب البيان والجمع في الفرق بين التكليف والوضع,Tahdhīb al-bayān wa-al-jamʻ fī al-farq bayna al-taklīf wa-al-waḍʻ,R-L ,
+ABAZIN,Cyrillic,,,,L-R ,
+ABKHAZ,Cyrillic,,,,L-R ,
+ADYGEI,Cyrillic,,,,L-R ,
+ALTAY,Cyrillic,,,,L-R ,
+Armenian,Armenian,armenian,Մեդիա իրավունք : (ուսումնական ձեռնարկ),Media iravunkʻ : (usumnakan dzeṛnark),L-R ,
+Assamese,Assamese,,আগবাৰীত  ফুলিলে  সোনে  মোৰ  চম্পা,Āgabārīta phulile soṇe mora campā,L-R ,
+AVARIC,Cyrillic,,,,L-R ,
+Azerbaijani (North),Latin,,Milli dövlətçilik hərəkatının yüksəlişi və Xalq Cümhuriyyəti dövründə Azərbaycançılıq ideyası,Milli dövlätçilik häräkatının yüksälişi vä Xalq Cümhuriyyäti dövründä azärbaycançılıq ideyası ,L-R ,
+Azerbaijani (South),Arabic,,مجنون مجنون دوشون منى  شعر توپلوسو ,Macnūn macnūn düşün manī : şiʻr toplūsū,R-L ,
+Azerbaijani ,Cyrillic,,Ҝениш коммунизм гуруҹулуғу дөврүндә Азәрбајҹан тарихинин бәьзи мәсәләләринә даир С. Ағамалы Оғлу адына Азәрбајҹан Кәнд Тәсәррүфаты Институтунун Низами адына Кировабад Дөвләт Тарих-Өлкәшунаслыг Музеји илә бирҝә кечирәҹәји елми конфрансын материаллары,Ġenish kommunizm gurujulughu dȯvru̇ndă Azărbai̐jan tarikhinin băʹzi măsălălărină dair S. Aghamaly Oghlu adyna Azărbai̐jan Kănd Tăsărru̇faty Institutunun Nizami adyna Kirovabad Dȯvlăt Tarikh-Ȯlkăshunaslyg Muzei̐i ilă birġă kechirăjăi̐i elmi konfransyn materiallary,L-R ,
+BALKAR,Cyrillic,,,,L-R ,
+Baluchi,Arabic,,درداں گریتگ زار جتک,Dardān̲ grītag zār jatak,R-L ,
+BASHKIR,Cyrillic,,,,L-R ,
+Belarusian,Cyrillic,belarusian,Пётр Клімук : жыццё і подзвіг касманаўта,Pi︠o︡tr Klimuk : z︠h︡ytstsi︠o︡ i podzvih kasmanaŭta,L-R ,
+Bengali,Bengali,,উনিশ-বিশ শতকে  পুরোনো  ঢাকার  সমাজ  ও  সংষ্কৃতি  ,Uniśa-Biśa śatake purono Ḍhākāra samāja o saṃskr̥ti ,L-R ,
+Brahui,Arabic,,پام کروسن,Pām karosan,R-L ,
+Bulgarian,Cyrillic,bulgarian,Нова книга за руската емиграция в България,Nova kniga za ruskata emigrat︠s︡ii︠a︡ v Bŭlgarii︠a︡,L-R ,
+Buryat,Cyrillic,,"Хоёр үндэрэй хормойдо : очеркнууд, публицистическе статьянууд = У подножия двух ундуров / Бата-Мүнхэ Жигжитов.","Khoër u̇ndėrėĭ khormoĭdo : ocherknuud, publit︠s︡isticheske statʹi︠a︡nuud  = U podnozhii︠a︡ dvukh undurov / Bata-Mu̇nkhė Zhigzhitov.",L-R ,
+Burmese,Burmese,,ရခိုင်မဟာရာဇဝင်တော်ကြီး,Rakhuiṅʻ mahā rājavaṅʻ toʻ krīʺ,L-R ,
+Central Asian languages,Cyrillic,,,,L-R ,
+CHECHEN,Cyrillic,,,,L-R ,
+Chinese,Hanzi,chinese,撞倒須彌 : 漢傳佛教青年學者論壇論文集,Zhuang dao Xumi : Han chuan Fo jiao qing nian xue zhe lun tan lun wen ji ,L-R ,
+CHUVASH,Cyrillic,,,,L-R ,
+Church Slavonic,Cyrillic,,,,L-R ,[SC] Placeholder: need samples for testing 
+CIRCASSIAN,Cyrillic,,,,L-R ,
+DAGESTANI,Cyrillic,,,,L-R ,
+DARGWA,Cyrillic,,,,L-R ,
+Amharic,Ethiopic,,,,L-R ,[SC] Placeholder: need samples for testing
+GAGAUZ,Cyrillic,,,,L-R ,
+Georgian,"Asomtavruli, Nuskhuri, Mkhedruli",georgian,ადგილობრივი თვითმმართველობის კოდექსი : საქართველოს ორგანული კანონი; 2018 წლის 7 სექტემბრის მდგომარეობით.,Adgilobrivi tʻvitʻmmartʻvelobis kodekʻsi : Sakʻartʻvelos organuli kanoni; 2018 clis 7 sekʻtembris mdgomareobitʻ.,L-R ,Modern Georgian is written only in the Mkhedruli script; the other two scripts are its historical predecessors.
+Greek (Ancient),Greek,greek,καὶ ἀπεγαλάκτισεν τὴν Οὐκ-ἠλεημένην καὶ συνέλαβεν ἔτι καὶ ἔτεκεν υἱόν,kai apegalaktisen tēn ouk ēleēmenēn kai synelaben eti kai eteken huion ,L-R ,
+Greek (Modern),Greek,,"Η ΑΕΚ θα καλύψει όλο το συμβόλαιο του Μεξικανού παίχτη, πολλά χρήματα δηλαδή","Hē AEK tha kalypsei holo to symvolaio tou Mexikanou paichtē, polla chrēmata dēladē",L-R ,
+Gujarati,Gujarati,,વીરપસલી અને અન્ય વાર્તાઓ,Vīrapasalī ane anya vārtāo,L-R ,
+Hebrew,Hebrew,,אבות לבנים,Avot le-vanim,R-L ,
+Hebrew,Hebrew with the diacritics in Roman,,בנוסח עדות המזרח ונוסח אשכנז,be-nusaḥ ʻadot ha-Mizraḥ ṿe-nusaḥ Ashkenaz,R-L ,
+Hindi,Devanagari,,परमहंस की पीड़ा : महान क्रांतिकारी रामप्रसाद बिस्मिल के जीवन पर आधारित उपन्यास,Paramahaṃsa kī pīṛā : mahāna krāntikārī Rāmaprasāda Bismila ke jīvana para ādhārita upanyāsa ,L-R ,"Several other dialects of Hindi, as well as Rajasthani and its dialects, are also written in Devanagari script."
+INGUSH,Cyrillic,,,,L-R ,
+Japanese,"Hiragana, Katakana, Kanji (Chinese characters)",,小学校における包括的自己成長プログラムの開発,Shōgakkō ni okeru hōkatsuteki jiko seichō puroguramu no kaihatsu ,L-R ,
+KABARDIAN,Cyrillic,,,,L-R ,
+KALMYK,Cyrillic,,,,L-R ,
+Kannada,Kannada,,ಹರಪನಹಳ್ಳಿ  ಭೀಮವ್ವನವರ  ಕೀರ್ತನೆಗಳು  ,Harapanahaḷḷi Bhīmavvanavara kīrtanegaḷu,L-R ,
+KARACHAY-BALKAR,Cyrillic,,,,L-R ,
+KARAKALPAK,Cyrillic,,,,L-R ,
+Kazakh,Cyrillic/moving to Latin,,"Дәуірдің жарық жұлдызы : ‡b халқымыздың көрнекті саяси қайраткері М. Тынышбаевқа арналады / ‡c [бас редакторлары, Қ.С. Алдажұманов, Д.М. Тынышбаев (Шейх-Али)].
+","Dăuīrdīn︠g︡ zharyq zhūldyzy : khalqymyzdyn︠g︡ kȯrnektī sai︠a︡si qaĭratkerī M. Tynyshbaevqa arnalady / [bas redaktorlary, Q.S. Aldazhūmanov, D.M. Tynyshbaev (Sheĭkh-Ali)].",L-R ,
+KHAKAS,Cyrillic,,,,L-R ,
+KOMI/KOMI-PERMYAK,Cyrillic,,,,L-R ,
+Konkani,Devanagari,,श्रीज्ञानेश्वर : अलोकीक व्यक्तीमत्व ,Śrījñāneśvara : alokīka vyaktīmatva ,L-R ,
+Konkani,Kannada,,ಚಂದ್ರ ಅನಿ ತಾರಾಂ,Candr ani tārāṃ,L-R ,
+Korean,Hangul,,민주화 이후 국정 운영,Minjuhwa ihu kukchŏng unyŏng,L-R ,
+Korean,Hancha only,,曉城 趙 明基 博士 追慕 佛教 史學 論文集,Hyosŏng Cho Myŏng-gi Paksa ch'umo Pulgyo sahak nonmunjip,,Not Chinese
+Korean,Hangul + Hancha,,民法 과 法學 의 重要 問題,Minpŏp kwa pŏphak ŭi chungyo munje,,Not Chinese
+KUMYK,Cyrillic,,,,L-R ,
+Kurdish (Kurmanji),Cyrillic,,Ә'франдинед нвиск'аред к'öрдед Әрмәнистанейә Советие,E'frandinêd nvîsk'arêd k'urdêd Ermenîstanêye Sovêtiê,L-R ,
+Kurdish (Sorani),Arabic,,کەس خۆى بۆ تەرک ناکرێ,Kes xoy bo terk nakrê,R-L ,
+Kyrgyz,Cyrillic,,"Uchkul sȯzdȯr, chechen sȯzdȯr, tamsilder, myskyldar ",Kyrgyzstandyn tarykhy : baĭyrky mezgilden bukungu kungȯ cheĭin : u̇ch tomduk / bashky red. A. Dzhumanaliev [and nine others].,L-R ,
+LAK,Cyrillic,,,,L-R ,
+Lao,Lao,,ປະຫວັດສາດປະເທດລາວແລະວັດທະນະທຳ,Pavatsāt Pathēt Lāo læ vatthanatham,L-R ,
+LEZGIAN,Cyrillic,,,,L-R ,
+Macedonian,Cyrillic,serbian_macedonian,Облици на моќ : вистината за Македонија / Георги (Џорџ) Бранов,Oblici na moḱ : vistinata za Makedonija / Georgi (D︠ž︡ord︠ž︡) Branov,L-R ,[SC] Same table as Serbian.
+Malayalam,Malayalam,,കേരളപാണിനീയം,Kēralapāninīyam,L-R ,
+Marathi,Devanagari,,निवडक शाहीर अमरशेख ,Nivaḍaka Śāhīra Amaraśekha,L-R ,
+MARI,Cyrillic,,,,L-R ,
+?,Gurmukhi,,ਪੰਜਾਬੀ ਲੋਕ-ਸਾਹਿਤ ਵਿਚ ਸੈਨਿਕ,Pañjābī loka-sāhita wica sainika,L-R ,
+Mongolian,Cyrillic,,Дайчин гүрний үеийн олон хэлний үсэг хавсарсан сурвалж бичгийн судлал.Тываның төөгүзү / Салчак Тока. Лодон багшын дэбтэрһээ.,Daĭchin gu̇rniĭ u̇eiĭn olon khėlniĭ u̇sėg khavsarsan survalzh bichgiĭn sudlal. Tyvanyn︠g︡ tȯȯgu̇zu̇ / Salchak Toka. Lodon bagshyn dėbtėrḣėė.,L-R ,
+Mongolian,Mongolian,,ᠳᠠᠶᠢᠴᠢᠩ ᠭᠦᠷᠦᠨ ᠦ ᠦᠶ ᠡ ᠶᠢᠨ ᠥᠯᠠᠨ ᠺᠡᠯᠡᠨ ᠦ ᠦᠰᠦᠭ ᠬᠠᠪᠰᠸᠷᠸᠭᠰᠠᠨ ᠰᠸᠷᠪᠸᠯᠵᠢ ᠪᠢᠴᠢᠭ ᠦᠨ ᠰᠸᠳᠸᠯᠸᠯ,Dayicing gu̇ru̇n-u̇ u̇y-e-yin olan kelen-u̇ u̇su̇g qabsuruġsan surbulji bicig-u̇n sudulul,L-R ,Originally T-D Script but displayed as L-R
+MORDVIN,Cyrillic,,,,L-R ,
+Nepali,Devanagari,,थोपै थोपा : उपन्यास,Thopai thopa : upanayāsa,L-R ,
+Newari,Devanagari,,बुनाः त्याः पि : नियात्रा ,Bunāḥ tyāḥ pi : niyātrā,L-R ,
+NOGAI,Cyrillic,,,,L-R ,
+OSSETIC,Cyrillic,,,,L-R ,
+Panjabi,Gurmukhi,,ਪੰਜਾਬੀ ਲੋਕ-ਸਾਹਿਤ ਵਿਚ ਸੈਨਿਕ,Pañjābī loka-sāhita wica sainika,L-R ,
+Panjabi,Arabic,,پنجابی وچ 20 ہندی کہانیاں,Panjābī vic 20 Hindī kahāniyān̲,R-L ,
+Persian,Arabic,,‏جامعه ايران در دوران رضا شاه,Jāmiʻah-i Īrān dar dawrān-i Riz̤ā Shāh,R-L ,
+Persian,Arabic,,بچه‌هاى بد,Bachchahʹhā-yi bad,R-L ,
+Pushto,Arabic,,چې لاس دې نه راکاوه,Che lās de nah rākāwah,R-L ,
+Russian,Cyrillic,russian,"Священный мусор : поднимаясь по лестнице Якова : [рассказы, эссе, интервью]","Svi︠a︡shchennyĭ musor : podnimai︠a︡sʹ po lestnit︠s︡e I︠A︡kova : [rasskazy, ėsse, intervʹi︠u︡]",L-R ,
+Sanskrit,Devanagari,,संस्कृतानिबन्धञ्जलिः,Saṃskr̥tanibandhāñjaliḥ ,L-R ,
+Serbian,Cyrillic,,,,L-R ,[SC] Placeholder: need samples for testing 
+Sindhi,Arabic,,انسائيڪلوپيڊيا سنڌيانا,Insāʼiklopīḍiyā Sindhiyānā,R-L ,
+Sinhalese,Sinhalese,,රාවන හිනාව,Rāvaṇa hināva,L-R ,
+Syriac,Syriac,,ܠܫܢܝ ܒܐܘܪܚܐ ܚܕܬܐ,Lešāní b-ʼúrḥā ḥadtā,R-L ,
+TABASARAN,Cyrillic,,,,L-R ,
+Tajik,Cyrillic,,Фарҳанги забони тоҷикӣ ва дурнамои фарҳангнигорӣ дар Тоҷикистон,Farḣangi zaboni tojikī va durnamoi farḣangnigorī dar Tojikiston,L-R ,
+Tamil,Tamil,,திருக்குறள் தெளிவுரை,Tirukkur̲aḷ teḷivurai /,L-R ,
+Tatar,Cyrillic,,"Татар халкы 1552 елдан соң : ‡b югалтулар һәм табышлар : фәнни-гамәли конференция материаллары : Казан шәһәре, 2002 елның 4 октябре / ‡c [жаваплы мөхәррир Р.Р. Хайретдинов]","Tatar khalky 1552 eldan son︠g︡ : i︠u︡galtular ḣăm tabyshlar : fănni-gamăli konferent︠s︡ii︠a︡ materiallary : Kazan shăḣăre, 2002 elnyn︠g︡ 4 okti︠a︡bre / [zhavaply mȯkhărrir R.R. Khaĭretdinov].",L-R ,
+Telugu,Telugu,,తెలంగాణ ఉద్యమపాట ప్రాదేశిక విమర్శ,Telaṅgāṇa udyamapāṭa prādēśika vimarśa,L-R ,
+Thai,Thai,,แนวคิด รูปแบบ และกระบวนการสร้างสรรค์,Nǣokhit rūpbǣp læ krabūankān sāngsan,L-R ,
+Tibetan,Tibetan,,དབུས་འགྱུར་གྱི་གདན་ས་ཆེན་པོ་སེ་ར་ཐེག་ཆེན་གླིང་གི་གདན་རབས་ངོ་མཚར་ནོར་བུའི་ཕྲེང་བ།,Dbus 'gyur gyi gdan sa chen po se ra theg chen gling gi gdan rabs ngo mtshar nor bu'i phreng ba,L-R ,
+Turkmen,Modified variant of Latin,,Türkmenistanyň Prezidenti Gurbanguly Berdimuhamedowyň ýurdy täzeden galkyndyrmak baradaky syýasaty,Türkmenistanyň Prezidenti Gurbanguly Berdimuhamedowyň ýurdy täzeden galkyndyrmak baradaky syýasaty,L-R ,
+Turkmen,Cyrillic,,"Түркмен халкының гелип чыкышының дүнйә яйрайшының ве онуң дөвлетиниң тарыхының проблемалары : халкара ылмы конференцияның докладларының ве хабарларының тезислери, Ашгабат, 1993 й. 25-26 октябрь / редакторлар, Б.О. Шыхмырадов ... [et al.].","Tu̇rkmen khalkynyn︠g︡ gelip chykyshynyn︠g︡ du̇nĭă i︠a︡ĭraĭshynyn︠g︡ ve onun︠g︡ dȯvletinin︠g︡ tarykhynyn︠g︡ problemalary : khalkara ylmy konferent︠s︡ii︠a︡nyn︠g︡ dokladlarynyn︠g︡ ve khabarlarynyn︠g︡ tezisleri, Ashgabat, 1993 ĭ. 25-26 okti︠a︡brʹ / redaktorlar, B.O. Shykhmyradov ... [et al.].",,
+TUVAN,Cyrillic,,,,L-R ,
+UDMURT,Cyrillic,,,,L-R ,
+Ukrainian,Cyrillic,ukrainian,Децентралізація в Україні та її вплив на соціально-економічний розвиток територій,Det︠s︡entralizat︠s︡ii︠a︡ v Ukraïni ta ïï vplyv na sot︠s︡ialʹno-ekonomichnyĭ rozvytok terytoriĭ,L-R ,
+Urdu,Arabic,,گلگت سے هندور تک,Gilgit se Hundūr tak,R-L ,
+Urdu,Arabic,,قصّه ميرے سفر کا,Qiṣṣah mere safar kā,R-L ,
+Urdu,Arabic,,نور جهاں، دليپ اور دوسرے فلمى ستارے,"Nūr Jihān̲, Dalīp aur dūsare filmī sitāre",R-L ,
+UYGUR,Cyrillic,,,,L-R ,
+Uzbek,Cyrillic,,Темур ва Улуғбек : даври тарихи / [бош муһаррир Аһмадали Асқаров ; масъул муһаррир Оқилхон Одилхон]. Тошкент : Қомуслар бош таһририяти,"Temur va Ulughbek : davri tarikhi / [bosh muḣarrir Aḣmadali Asqarov ; masʺul muḣarrir Oqilkhon Odilkhon]. Toshkent : Qomuslar bosh taḣririi︠a︡ti, [1996].",L-R ,
+YAKUTIAN,Cyrillic,,,,L-R ,
+Yiddish,Hebrew,,מעשיות אויף שבת,Mayśes̀ af Shabes̀,R-L ,

+ 23 - 0
tests/test01_cfg.py

@@ -0,0 +1,23 @@
+from unittest import TestCase
+
+from importlib import reload
+from os import environ
+
+from tests import TEST_CONFIG_DIR
+import transliterator.tables
+
+
+class TestConfig(TestCase):
+    """ Test configuration parsing. """
+
+    def test_ordering(self):
+        environ["TXL_CONFIG_TABLE_DIR"] = TEST_CONFIG_DIR
+        reload(transliterator.tables)  # Reload new config dir.
+        from transliterator import tables
+        tables.list_tables.cache_clear()
+        tables.load_table.cache_clear()
+
+        tbl = tables.load_table("ordering")
+        exp_order = ["ABCD", "AB", "A", "BCDE", "BCD", "BEFGH", "B"]
+
+        assert [s[0] for s in tbl["roman_to_script"]["map"]] == exp_order
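`test_ordering` sets `TXL_CONFIG_TABLE_DIR` and never restores it, so the override leaks into whatever test runs next (`test02`'s `setUp()` deletes it and reloads `transliterator.tables` to compensate). Because `TABLE_DIR` is read once at import time, a reload is needed both when the override starts and when it ends. One way to scope it is a context manager like the hypothetical `env_override()` below; it is a sketch that only touches `os.environ` and does not import the project.

```python
import os
from contextlib import contextmanager


@contextmanager
def env_override(name: str, value: str):
    """Temporarily set an environment variable, restoring its prior state on exit."""
    old = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if old is None:
            os.environ.pop(name, None)  # it was unset before: unset it again
        else:
            os.environ[name] = old


# Hypothetical use inside test_ordering():
# with env_override("TXL_CONFIG_TABLE_DIR", TEST_CONFIG_DIR):
#     reload(transliterator.tables)  # pick up the test table dir
#     ...
# reload(transliterator.tables)  # back to DEFAULT_TABLE_DIR
```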

+ 30 - 0
tests/test02_transliteration.py

@@ -0,0 +1,30 @@
+from unittest import TestCase
+
+from importlib import reload
+from os import environ
+
+from transliterator.trans import transliterate
+import transliterator.tables
+
+
+class TestScriptToRoman(TestCase):
+    """
+    Test S2R transliteration.
+
+    TODO use a comprehensive sample table and report errors for unsupported
+    languages.
+    """
+    def setUp(self):
+        if "TXL_CONFIG_TABLE_DIR" in environ:
+            del environ["TXL_CONFIG_TABLE_DIR"]
+            reload(transliterator.tables)
+            # import transliterator.tables
+
+    def test_basic_chinese(self):
+        src = "撞倒須彌 : 漢傳佛教青年學者論壇論文集"
+        dest = (
+                "Zhuang dao Xumi : han zhuan Fo jiao qing nian xue zhe lun "
+                "tan lun wen ji")
+
+        trans = transliterate(src, "chinese")
+        assert trans == dest
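The TODO in the docstring points toward driving S2R tests from `tests/data/transliterator_sample_strings.csv`. A minimal sketch of the loading half, assuming that only rows with a non-empty "Table key (if implemented)" and a non-empty original should be tested; `SampleCase` and `iter_sample_cases()` are hypothetical names. Header and cell whitespace is stripped because several cells in the sample file carry stray spaces or newlines.

```python
import csv
from typing import Iterator, NamedTuple, TextIO


class SampleCase(NamedTuple):
    language: str
    table: str
    original: str
    romanized: str


def iter_sample_cases(fh: TextIO) -> Iterator[SampleCase]:
    """Yield one case per CSV row that names an implemented table."""
    for row in csv.DictReader(fh):
        # Normalize header names and cell values; skip overflow columns.
        row = {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        table = row.get("Table key (if implemented)", "")
        if not table or not row.get("Original"):
            continue
        yield SampleCase(row["Language"], table, row["Original"], row["Romanized"])


# Hypothetical use inside TestScriptToRoman:
# fname = path.join(TEST_CONFIG_DIR, "transliterator_sample_strings.csv")
# with open(fname, newline="", encoding="utf-8") as fh:
#     for case in iter_sample_cases(fh):
#         with self.subTest(language=case.language, table=case.table):
#             self.assertEqual(
#                     transliterate(case.original, case.table), case.romanized)
```

Using `subTest()` would report every failing language in one run instead of stopping at the first mismatch, which matches the "report errors for unsupported languages" goal.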

+ 0 - 0
transliterator/tests/test03_rest_api.py → tests/test03_rest_api.py


+ 5 - 3
transliterator/hooks/test.py

@@ -1,5 +1,7 @@
 import logging
 
+from transliterator.exceptions import CONT
+
 
 __doc__ = """ Test hook functions. """
 
@@ -19,7 +21,7 @@ def rotate(ctx, n):
     """
     uc = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
     lc = uc.lower()
-    logger.debug(f"cursor: {ctx.cur}")
+    logger.debug(f"cursor before ROT: {ctx.cur}")
 
     ch = ctx.src[ctx.cur]
     if ch in uc:
@@ -30,9 +32,9 @@ def rotate(ctx, n):
         dest_ch = lc[(idx + n) % len(lc)]
     else:
         dest_ch = ch
-    logger.debug(f"ROT {n}: {ch} -> {dest_ch}")
+    logger.debug(f"ROT{n}: {ch} -> {dest_ch}")
 
     ctx.dest_ls.append(dest_ch)
     ctx.cur += 1
 
-    return "continue"
+    return CONT
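The hunks above show only part of the `rotate()` hook; the lines computing `idx` are elided. A self-contained sketch of the same ROT-n step, assuming `idx` is the letter's position in its alphabet. `Ctx` is a hypothetical stand-in exposing only the three attributes the hook touches (`src`, `cur`, `dest_ls`), and `CONT` is stubbed with the `"continue"` value the hook returned before this commit, since `transliterator.exceptions` is not shown.

```python
from dataclasses import dataclass, field

CONT = "continue"  # stand-in for transliterator.exceptions.CONT (assumed value)


@dataclass
class Ctx:
    """Hypothetical minimal context: only what rotate() reads and writes."""
    src: str
    cur: int = 0
    dest_ls: list = field(default_factory=list)


def rotate(ctx, n):
    """Rotate the ASCII letter at the cursor by n places; copy anything else."""
    uc = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lc = uc.lower()
    ch = ctx.src[ctx.cur]
    if ch in uc:
        dest_ch = uc[(uc.index(ch) + n) % len(uc)]
    elif ch in lc:
        dest_ch = lc[(lc.index(ch) + n) % len(lc)]
    else:
        dest_ch = ch
    ctx.dest_ls.append(dest_ch)
    ctx.cur += 1  # consume exactly one source character
    return CONT


ctx = Ctx("Abz!")
while ctx.cur < len(ctx.src):
    rotate(ctx, 3)
print("".join(ctx.dest_ls))  # Dec!
```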

+ 22 - 3
transliterator/tables/__init__.py

@@ -1,8 +1,9 @@
 import logging
+import re
 
 from functools import cache
 from importlib import import_module
-from os import path, access, R_OK
+from os import environ, path, access, R_OK
 
 from yaml import load
 try:
@@ -21,7 +22,9 @@ language (or language and script? TBD)
 """
 
 
-TABLE_DIR = path.join(path.dirname(path.realpath(__file__)), "data")
+DEFAULT_TABLE_DIR = path.join(path.dirname(path.realpath(__file__)), "data")
+# Can be overridden for tests.
+TABLE_DIR = environ.get("TXL_CONFIG_TABLE_DIR", DEFAULT_TABLE_DIR)
 
 # Available hook names.
 HOOKS = (
@@ -151,6 +154,22 @@ def load_table(tname):
         tdata["roman_to_script"]["map"] = tuple(
                 (k.content, tokens[k]) for k in sorted(tokens))
 
+        # Ignore regular expression patterns.
+        # Patterns are evaluated in the order they are listed in the config.
+        ignore_ptn = [
+                re.compile(ptn)
+                for ptn in tdata["roman_to_script"].get("ignore_ptn", [])]
+        for parent in parents:
+            parent_tdata = load_table(parent)
+            # NOTE: duplicates are not removed.
+            ignore_ptn = [
+                re.compile(ptn)
+                for ptn in parent_tdata.get(
+                        "roman_to_script", {}).get("ignore_ptn", [])
+            ] + ignore_ptn
+        tdata["roman_to_script"]["ignore_ptn"] = ignore_ptn
+
+        # Ignore plain strings.
         ignore = {
             Token(t)
             for t in tdata["roman_to_script"].get("ignore", [])
@@ -162,10 +181,10 @@ def load_table(tname):
                 Token(t) for t in parent_tdata.get(
                         "roman_to_script", {}).get("ignore", [])
             }
-
         tdata["roman_to_script"]["ignore"] = [
                 t.content for t in sorted(ignore)]
 
+        # Hooks.
         if "hooks" in tdata["roman_to_script"]:
             tdata["roman_to_script"]["hooks"] = load_hook_fn(
                     tname, tdata["script_to_roman"])
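The new `ignore_ptn` block compiles a table's own patterns and then prepends each parent's, so inherited patterns are tried before the table's own (and the last parent iterated ends up first). The diff does not show where the patterns are applied; the sketch below mirrors the merge and adds a hypothetical `match_ignored()` that tests the patterns, in list order, at the cursor position. Both helper names are illustrative, not part of the project.

```python
import re
from typing import Iterable, List, Sequence


def merge_ignore_ptn(own: Iterable, parents: Sequence[Iterable]) -> List:
    """Compile own patterns, then prepend each parent's, as in load_table()."""
    merged = [re.compile(p) for p in own]
    for parent_ptns in parents:
        # NOTE: like the original, duplicates are kept.
        merged = [re.compile(p) for p in parent_ptns] + merged
    return merged


def match_ignored(patterns: List, src: str, cur: int) -> int:
    """Return the length of the first non-empty match at src[cur], or 0."""
    for ptn in patterns:
        m = ptn.match(src, cur)  # anchored at position cur
        if m and m.end() > cur:
            return m.end() - cur
    return 0


pats = merge_ignore_ptn([r"\d+"], [[r"\[.*?\]"]])
print([p.pattern for p in pats])
print(match_ignored(pats, "ab[x]c", 2))
```

Passing an already compiled pattern to `re.compile()` (with no flags) returns it unchanged, so recompiling parent patterns that `load_table()` has already compiled is harmless.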

+ 0 - 0
transliterator/tests/test01_cfg.py


+ 0 - 20
transliterator/tests/test02_transliteration.py

@@ -1,20 +0,0 @@
-import unittest
-
-from transliterator.trans import transliterate
-
-
-class TestScriptToRoman(unittest.TestCase):
-    """
-    Test S2R transliteration.
-
-    TODO use a comprehensive sample table and report errors for unsupported
-    languages.
-    """
-
-    def test_basic_chinese(self):
-        src = "撞倒須彌 : 漢傳佛教青年學者論壇論文集"
-        dest = (
-                "Zhuang dao Xumi : Han chuan Fo jiao qing nian xue zhe lun "
-                "tan lun wen ji ")
-
-        assert transliterate(src, "chinese") == dest