Basic REST API.

Stefano Cossu 1 year ago
parent
commit
ec33242346

+ 100 - 100
data/transliterator_sample_strings.csv

@@ -1,100 +1,100 @@
-Language,Script,Original ,Romanized,Reading dir.,Comments
-Arabic,Arabic,نظام الحكم في عمان : من إمامة الإنتخاب الى السلطنة الوراثية,Niẓām al-ḥukm fī ʻUmān : min imāmat al-intikhāb ilá al-salṭanah al-wirāthīyah ,R-L,Hans Wehr's Dictionary for modern written Arabic is the current reference used for proper vocalization
-Arabic,Arabic,ندوة علاقات مصر بدول حوض النيل في ظل رئاسة مصر للاتحاد الإفريقي‏,Nadwat ʻAlāqāt Miṣr bi-Duwal Ḥawḍ al-Nīl fī ẓill Riʼāsat Miṣr lil-Ittiḥād al-Ifrīqī,R-L ,
-Arabic,Arabic,تهذيب البيان والجمع في الفرق بين التكليف والوضع,Tahdhīb al-bayān wa-al-jamʻ fī al-farq bayna al-taklīf wa-al-waḍʻ,R-L ,
-ABAZIN,Cyrillic,,,L-R ,
-ABKHAZ,Cyrillic,,,L-R ,
-ADYGEI,Cyrillic,,,L-R ,
-ALTAY,Cyrilllic,,,L-R ,
-Armenian,Armenian,Մեդիա իրավունք : (ուսումնական ձեռնարկ),Media iravunkʻ : (usumnakan dzeṛnark),L-R ,
-Assamese,Assamese,আগবাৰীত  ফুলিলে  সোনে  মোৰ  চম্পা,Āgabārīta phulile soṇe mora campā,R-L ,
-AVARIC,Cyrillic,,,L-R ,
-Azerbaijani (North),Latin,Milli dövlətçilik hərəkatının yüksəlişi və Xalq Cümhuriyyəti dövründə Azərbaycançılıq ideyası,Milli dövlätçilik häräkatının yüksälişi vä Xalq Cümhuriyyäti dövründä azärbaycançılıq ideyası ,L-R ,
-Azerbaijani (South),Arabic,مجنون مجنون دوشون منى  شعر توپلوسو ,Macnūn macnūn düşün manī : şiʻr toplūsū,L-R ,
-Azerbaijani ,Cyrillic,Ҝениш коммунизм гуруҹулуғу дөврүндә Азәрбајҹан тарихинин бәьзи мәсәләләринә даир,Ġenish kommunizm gurujulughu dȯvru̇ndă Azărbai̐jan tarikhinin băʹzi măsălălărină dair,L-R ,
-BALKAR,Cyrillic,,,L-R ,
-Baluchi,Arabic,درداں گریتگ زار جتک,Dardān̲ grītag zār jatak,R-L ,
-BASHKIR,Cyrillic,,,L-R ,
-Belarusian,Cyrillic,Пётр Клімук : жыццё і подзвіг касманаўта,Pi︠o︡tr Klimuk : z︠h︡ytstsi︠o︡ i podzvih kasmanaŭta,L-R ,
-Bengali,Bengali,উনিশ-বিশ শতকে  পুরোনো  ঢাকার  সমাজ  ও  সংষ্কৃতি  ,Uniśa-Biśa śatake purono Ḍhākāra samāja o saṃskr̥ti ,R-L ,
-Brahui,Arabic,پام کروسن,Pām karosan,R-L ,
-Bulgarian,Cyrillic,Нова книга за руската емиграция в България,Nova kniga za ruskata emigrat︠s︡ii︠a︡ v Bŭlgarii︠a︡,L-R ,
-Buryat,Cyrillic,"Хоёр үндэрэй хормойдо : очеркнууд, публицистическе статьянууд = У подножия двух ундуров / Бата-Мүнхэ Жигжитов.","Khoër u̇ndėrėĭ khormoĭdo : ocherknuud, publit︠s︡isticheske statʹi︠a︡nuud  = U podnozhii︠a︡ dvukh undurov / Bata-Mu̇nkhė Zhigzhitov.",L-R ,
-Burmese,Burmese,ရခိုင်မဟာရာဇဝင်တော်ကြီး,Rakhuiṅʻ mahā rājavaṅʻ toʻ krīʺ,L-R ,
-Central Asian languages,Cyrillic,,,L-R ,
-CHECHEN,Cyrillic,,,L-R ,
-Chinese,Hanzi,撞倒須彌 : 漢傳佛教青年學者論壇論文集,Zhuang dao Xumi : Han chuan Fo jiao qing nian xue zhe lun tan lun wen ji ,L-R ,
-CHUVASH,Cyrillic,,,L-R ,
-CIRCASSIAN,Cyrillic,,,L-R ,
-DAGESTANI,Cyrillic,,,L-R ,
-DARGWA,Cyrillic,,,L-R ,
-GAGAUZ,Cyrillic,,,L-R ,
-Georgian,"Asomtavruli, Nuskhuri, Mkhedruli",ადგილობრივი თვითმმართველობის კოდექსი : საქართველოს ორგანული კანონი; 2018 წლის 7 სექტებრის მდგომარეობით.,Adgilobrivi tʻvitʻmmartʻvelobis kodekʻsi : Sakʻartʻvelos organuli kanoni; 2018 clis 7 sekʻtembris mdgomareobitʻ.,,Modern Georgian is really only written in the mkhedruli script. The other two scripts are its historical predecessors. 
-Greek (Ancient),Greek,καὶ ἀπεγαλάκτισεν τὴν Οὐκ-ἠλεημένην καὶ συνέλαβεν ἔτι καὶ ἔτεκεν υἱόν,kai apegalaktisen tēn ouk ēleēmenēn kai synelaben eti kai eteken huion ,L-R ,
-Greek (Modern),Greek,"Η ΑΕΚ θα καλύψει όλο το συμβόλαιο του Μεξικανού παίχτη, πολλά χρήματα δηλαδή","Hē AEK tha kalypsei holo to symvolaio tou Mexikanou paichtē, polla chrēmata dēladē",L-R ,
-Gujarati,Gujarati,વીરપસલી અને અન્ય વાર્તાઓ,Vīrapasalī ane anya vārtāo,L-R ,
-Hebrew,Hebrew,אבות לבנים,Avot le-vanim,R-L ,
-Hebrew,Hebrew with the diacritics in Roman,בנוסח עדות המזרח ונוסח אשכנז,be-nusaḥ ʻadot ha-Mizraḥ ṿe-nusaḥ Ashkenaz,R-L ,
-Hindi,Devanagari,परमहंस की पीड़ा : महान क्रांतिकारी रामप्रसाद बिस्मिल के जीवन पर आधारित उपन्यास,Paramahaṃsa kī pīṛā : mahāna krāntikārī Rāmaprasāda Bismila ke jīvana para ādhārita upanyāsa ,,"There are several other dialects of Hindi language as well as Rajasthani language and its dialects, all are written in Devanagari script."
-INGUSH,Cyrillic,,,L-R ,
-Japanese,"Hiragana, Katakana, Kanji(Chinese character)",小学校における包括的自己成長プログラムの開発,Shōgakkō ni okeru hōkatsuteki jiko seichō puroguramu no kaihatsu ,L-R ,
-KABARDIAN,Cyrillic,,,L-R ,
-KALMYK,Cyrillic,,,L-R ,
-Kannada,Kannada,ಹರಪನಹಳ್ಳಿ  ಭೀಮವ್ವನವರ  ಕೀರ್ತನೆಗಳು  ,Harapanahaḷḷi Bhīmavvanavara kīrtanegaḷu,L-R ,
-KARACAY-BALKAR,Cyrillic,,,L-R ,
-KARAKALPAK,Cyrillic,,,L-R ,
-Kazakh,Cyrillic/moving to Latin,Қазақстан Республикасы Ұлттық ғылым академиясының хабарлары,Qazaqstan Respublikasy ūlttyq ghylym akademii︠a︡synyn︠g︡ khabarlary,L-R ,
-KHAKAS,Cyrillic,,,L-R ,
-KOMI/KIMI-PERMYAK,Cyrillic,,,L-R ,
-Konkani,Devanagari,श्रीज्ञानेश्वर : अलोकीक व्यक्तीमत्व ,Śrījñāneśvara : alokīka vyaktīmatva ,L-R ,
-Konkani,Kannada,ಚಂದ್ರ ಅನಿ ತಾರಾಂ,Candr ani tārāṃ,L-R ,
-Korean,Hangul,민주화 이후 국정 운영,Minjuhwa ihu kukchŏng unyŏng,L-R ,
-Korean,Hancha only,曉城 趙 明基 博士 追慕 佛教 史學 論文集,Hyosŏng Cho Myŏng-gi Paksa ch'umo Pulgyo sahak nonmunjip,,Not Chinese
-Korean ,Hangul +Hancha,民法 과 法學 의 重要 問題,Minpŏp kwa pŏphak ŭi chungyo munje,,Not Chinese
-KUMYK,Cyrillic,,,,
-Kurdish (Kurmanji),Cyrillic,Ә'франдинед нвиск'аред к'öрдед Әрмәнистанейә Советие,E'frandinêd nvîsk'arêd k'urdêd Ermenîstanêye Sovêtiê,L-R ,
-Kurdish (Sorani),Arabic,کەس خۆى بۆ تەرک ناکرێ,Kes xoy bo terk nakrê,R-L ,
-Kyrgyz,Cyrillic,"Uchkul sȯzdȯr, chechen sȯzdȯr, tamsilder, myskyldar ","Учкул сөздөр, чечен сөздөр, тамсилдер, мыскылдар ",L-R ,
-LAK,Cyrillic,,,L-R ,
-Lao,Lao,ປະຫວັດສາດປະເທດລາວແລະວັດທະນະທຳ,Pavatsāt Pathēt Lāo læ vatthanatham,L-R ,
-LEZGIAN,Cyrillic,,,L-R ,
-Macedonian,Cyrillic,Облици на моќ : вистината за Македонија / Георги (Џорџ) Бранов,Oblici na moḱ : vistinata za Makedonija / Georgi (D︠ž︡ord︠ž︡) Branov,L-R ,
-Malayalam,Malayalam,കേരളപാണിനീയം,Kēralapāninīyam,L-R ,
-Marathi,Devanagari,निवडक शाहीर अमरशेख ,Nivaḍaka Śāhīra Amaraśekha,L-R ,
-MARI,Cyrillic,,,L-R ,
-?,Gurmukhi,ਪੰਜਾਬੀ ਲੋਕ-ਸਾਹਿਤ ਵਿਚ ਸੈਨਿਕ,Pañjābī loka-sāhita wica sainika,R-L ,
-Mongolian,Cyrillic,Дайчин гүрний үеийн олон хэлний үсэг хавсарсан сурвалж бичгийн судлал,Daĭchin gu̇rniĭ u̇eiĭn olon khėlniĭ u̇sėg khavsarsan survalzh bichgiĭn sudlal,L-R ,
-Mongolian,Mongolian,ᠳᠠᠶᠢᠴᠢᠩ ᠭᠦᠷᠦᠨ ᠦ ᠦᠶ ᠡ ᠶᠢᠨ ᠥᠯᠠᠨ ᠺᠡᠯᠡᠨ ᠦ ᠦᠰᠦᠭ ᠬᠠᠪᠰᠸᠷᠸᠭᠰᠠᠨ ᠰᠸᠷᠪᠸᠯᠵᠢ ᠪᠢᠴᠢᠭ ᠦᠨ ᠰᠸᠳᠸᠯᠸᠯ,Dayicing gu̇ru̇n-u̇ u̇y-e-yin olan kelen-u̇ u̇su̇g qabsuruġsan surbulji bicig-u̇n sudulul,L-R ,Originally T-D Script but displayed as L-R
-MORDVIN,Cyrillic,,,L-R ,
-Nepali,Devanagari,थोपै थोपा : उपन्यास,Thopai thopa : upanayāsa,L-R ,
-Newari,Devanagari,बुनाः त्याः पि : नियात्रा ,Bunāḥ tyāḥ pi : niyātrā,L-R ,
-NOGAI,Cyrillic,,,L-R ,
-OSSETIC,Cyrillic,,,L-R ,
-Panjabi,Gurmukhi,ਪੰਜਾਬੀ ਲੋਕ-ਸਾਹਿਤ ਵਿਚ ਸੈਨਿਕ,Pañjābī loka-sāhita wica sainika,R-L ,
-Panjabi,Arabic,پنجابی وچ 20 ہندی کہانیاں,Panjābiī vic 20 Hindī kahāniyān̲,R-L ,
-Persian,Arabic,‏جامعه ايران در دوران رضا شاه,Jāmiʻah-i Īrān dar dawrān-i Riz̤ā Shāh,R-L ,
-Persian,Arabic,بچه‌هاى بد,Bachchahʹhā-yi bad,R-L ,
-Pushto,Arabic,چې لاس دې نه راکاوه,Che lās de nah rākāwah,R-L ,
-Russian,Cyrillic,"Священный мусор : поднимаясь по лестнице Якова : [рассказы, эссе, интервью]","Svi︠a︡shchennyĭ musor : podnimai︠a︡sʹ po lestnit︠s︡e I︠A︡kova : [rasskazy, ėsse, intervʹi︠u︡]",L-R ,
-Sanskrit,Devanagari,संस्कृतानिबन्धञ्जलिः,Saṃskr̥tanibandhāñjaliḥ ,L-R ,
-Sindhi,Arabic,انسائيڪلوپيڊيا سنڌيانا,Insāʼiklopīḍiyā Sindhiyānā,R-L ,
-Sinhalese,Sinhalese,රාවන හිනාව,Rāvaṇa hināva,L-R ,
-Syriac,Syriac,ܠܫܢܝ ܒܐܘܪܚܐ ܚܕܬܐ,Lešāní b-ʼúrḥā ḥadtā,R-L ,
-TABASARAN,Cyrillic,,,L-R ,
-Tajik,Cyrillic,Farḣangi zaboni tojikī va durnamoi farḣangnigorī dar Tojikiston,Фарҳанги забони тоҷикӣ ва дурнамои фарҳангнигорӣ дар Тоҷикистон ,L-R ,
-Tamil,Tamil,திருக்குறள் தெளிவுரை,Tirukkur̲aḷ teḷivurai /,R-L ,
-Tatar,Cyrillic,"Равил Әмирхан : биобиблиографик күрсәткеч = Равиль Усманович Амирханов : биобиблиографический указатель, 1969-2003 / [төзучеләр, Алмаз Вәлиев, Зөбәрҗәт Гарипова].","Ravil Ămirkhan : biobibliografik ku̇rsătkech = Ravilʹ Usmanovich Amirkhanov : biobibliograficheskiĭ ukazatelʹ, 1969-2003 /  [tȯzuchelăr, Almaz Văliev, Zȯbărjăt Garipova].",L-R ,
-Telugu,Telugu,తెలంగాణ ఉద్యమపాట ప్రాదేశిక విమర్శ,Telaṅgāṇa udyamapāṭa prādēśika vimarśa,L-R ,
-Thai,Thai,แนวคิด รูปแบบ และกระบวนการสร้างสรรค์,Nǣokhit rūpbǣp læ krabūankān sāngsan,L-R ,
-Tibetan,Tibetan,དབུས་འགྱུར་གྱི་གདན་ས་ཆེན་པོ་སེ་ར་ཐེག་ཆེན་གླིང་གི་གདན་རབས་ངོ་མཚར་ནོར་བུའི་ཕྲེང་བ།,Dbus 'gyur gyi gdan sa chen po se ra theg chen gling gi gdan rabs ngo mtshar nor bu'i phreng ba,L-R ,
-Turkmen,Modified variant of Latin																							,Türkmenistanyň Prezidenti Gurbanguly Berdimuhamedowyň ýrdy täzeden galkyndyrmak baradaky syýasaty ,Türkmenistanyň Prezidenti Gurbanguly Berdimuhamedowyň ýurdy täzeden galkyndyrmak baradaky syýasaty,L-R ,
-TUVAN,Cyrillic,,,L-R ,
-UDMURT,Cyrillic,,,L-R ,
-Ukrainian,Cyrillic,Децентралізація в Україні та її вплив на соціально-економічний розвиток територій,Det︠s︡entralizat︠s︡ii︠a︡ v Ukraïni ta ïï vplyv na sot︠s︡ialʹno-ekonomichnyĭ rozvytok terytoriĭ,L-R ,
-Urdu,Arabic,   گلگت سے هندور تک,Gilgit se Hundūr tak,R-L ,
-Urdu,Arabic,قصّه ميرے سفر کا,Qiṣṣah mere safar kā,R-L ,
-Urdu,Arabic,نور جهاں، دليپ اور دوسرے فلمى ستارے,"Nūr Jihān̲, Dalīp aur dūsare filmī sitāre",R-L ,
-UYGUR,Cyrillic,,,L-R ,
-Uzbek,Cyrillic,"Amir Temur davrida Movarounnakhr : arkheologii︠a︡, tarikh, madanii︠a︡t ","Амир Темур даврида Мовароуннахр : археология, тарих, маданият",L-R ,
-YAKUTIAN,Cyrillic,,,L-R ,
-Yiddish,Hebrew,מעשיות אויף שבת,Mayśes̀ af Shabes̀,R-L ,
+Language,Script,Original ,Romanized,Reading dir.,Comments,,,,,,,,,,,,,,,,,,,,,,
+Arabic,Arabic,نظام الحكم في عمان : من إمامة الإنتخاب الى السلطنة الوراثية,Niẓām al-ḥukm fī ʻUmān : min imāmat al-intikhāb ilá al-salṭanah al-wirāthīyah ,R-L,Hans Wehr's Dictionary for modern written Arabic is the current reference used for proper vocalization,,,,,,,,,,,,,,,,,,,,,,
+Arabic,Arabic,ندوة علاقات مصر بدول حوض النيل في ظل رئاسة مصر للاتحاد الإفريقي‏,Nadwat ʻAlāqāt Miṣr bi-Duwal Ḥawḍ al-Nīl fī ẓill Riʼāsat Miṣr lil-Ittiḥād al-Ifrīqī,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Arabic,Arabic,تهذيب البيان والجمع في الفرق بين التكليف والوضع,Tahdhīb al-bayān wa-al-jamʻ fī al-farq bayna al-taklīf wa-al-waḍʻ,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+ABAZIN,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+ABKHAZ,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+ADYGEI,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+ALTAY,Cyrilllic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Armenian,Armenian,Մեդիա իրավունք : (ուսումնական ձեռնարկ),Media iravunkʻ : (usumnakan dzeṛnark),L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Assamese,Assamese,আগবাৰীত  ফুলিলে  সোনে  মোৰ  চম্পা,Āgabārīta phulile soṇe mora campā,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+AVARIC,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Azerbaijani (North),Latin,Milli dövlətçilik hərəkatının yüksəlişi və Xalq Cümhuriyyəti dövründə Azərbaycançılıq ideyası,Milli dövlätçilik häräkatının yüksälişi vä Xalq Cümhuriyyäti dövründä azärbaycançılıq ideyası ,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Azerbaijani (South),Arabic,مجنون مجنون دوشون منى  شعر توپلوسو ,Macnūn macnūn düşün manī : şiʻr toplūsū,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Azerbaijani ,Cyrillic,Ҝениш коммунизм гуруҹулуғу дөврүндә Азәрбајҹан тарихинин бәьзи мәсәләләринә даир,Ġenish kommunizm gurujulughu dȯvru̇ndă Azărbai̐jan tarikhinin băʹzi măsălălărină dair,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+BALKAR,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Baluchi,Arabic,درداں گریتگ زار جتک,Dardān̲ grītag zār jatak,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+BASHKIR,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Belarusian,Cyrillic,Пётр Клімук : жыццё і подзвіг касманаўта,Pi︠o︡tr Klimuk : z︠h︡ytstsi︠o︡ i podzvih kasmanaŭta,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Bengali,Bengali,উনিশ-বিশ শতকে  পুরোনো  ঢাকার  সমাজ  ও  সংষ্কৃতি  ,Uniśa-Biśa śatake purono Ḍhākāra samāja o saṃskr̥ti ,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Brahui,Arabic,پام کروسن,Pām karosan,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Bulgarian,Cyrillic,Нова книга за руската емиграция в България,Nova kniga za ruskata emigrat︠s︡ii︠a︡ v Bŭlgarii︠a︡,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Buryat,Cyrillic,"Хоёр үндэрэй хормойдо : очеркнууд, публицистическе статьянууд = У подножия двух ундуров / Бата-Мүнхэ Жигжитов.","Khoër u̇ndėrėĭ khormoĭdo : ocherknuud, publit︠s︡isticheske statʹi︠a︡nuud  = U podnozhii︠a︡ dvukh undurov / Bata-Mu̇nkhė Zhigzhitov.",L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Burmese,Burmese,ရခိုင်မဟာရာဇဝင်တော်ကြီး,Rakhuiṅʻ mahā rājavaṅʻ toʻ krīʺ,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Central Asian languages,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+CHECHEN,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Chinese,Hanzi,撞倒須彌 : 漢傳佛教青年學者論壇論文集,Zhuang dao Xumi : Han chuan Fo jiao qing nian xue zhe lun tan lun wen ji ,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+CHUVASH,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+CIRCASSIAN,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+DAGESTANI,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+DARGWA,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+GAGAUZ,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Georgian,"Asomtavruli, Nuskhuri, Mkhedruli",ადგილობრივი თვითმმართველობის კოდექსი : საქართველოს ორგანული კანონი, 2018 წლის 7 სექტებრის მდგომარეობით.,Adgilobrivi tʻvitʻmmartʻvelobis kodekʻsi : Sakʻartʻvelos organuli kanoni, 2018 clis 7 sekʻtembris mdgomareobitʻ.,,Modern Georgian is really only written in the mkhedruli script. The other two scripts are its historical predecessors. ,,,,,,,,,,,,,,,,,,,,
+Greek (Ancient),Greek,καὶ ἀπεγαλάκτισεν τὴν Οὐκ-ἠλεημένην καὶ συνέλαβεν ἔτι καὶ ἔτεκεν υἱόν,kai apegalaktisen tēn ouk ēleēmenēn kai synelaben eti kai eteken huion ,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Greek (Modern),Greek,"Η ΑΕΚ θα καλύψει όλο το συμβόλαιο του Μεξικανού παίχτη, πολλά χρήματα δηλαδή","Hē AEK tha kalypsei holo to symvolaio tou Mexikanou paichtē, polla chrēmata dēladē",L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Gujarati,Gujarati,વીરપસલી અને અન્ય વાર્તાઓ,Vīrapasalī ane anya vārtāo,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Hebrew,Hebrew,אבות לבנים,Avot le-vanim,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Hebrew,Hebrew with the diacritics in Roman,בנוסח עדות המזרח ונוסח אשכנז,be-nusaḥ ʻadot ha-Mizraḥ ṿe-nusaḥ Ashkenaz,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Hindi,Devanagari,परमहंस की पीड़ा : महान क्रांतिकारी रामप्रसाद बिस्मिल के जीवन पर आधारित उपन्यास,Paramahaṃsa kī pīṛā : mahāna krāntikārī Rāmaprasāda Bismila ke jīvana para ādhārita upanyāsa ,,"There are several other dialects of Hindi language as well as Rajasthani language and its dialects, all are written in Devanagari script.",,,,,,,,,,,,,,,,,,,,,,
+INGUSH,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Japanese,"Hiragana, Katakana, Kanji(Chinese character)",小学校における包括的自己成長プログラムの開発,Shōgakkō ni okeru hōkatsuteki jiko seichō puroguramu no kaihatsu ,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+KABARDIAN,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+KALMYK,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Kannada,Kannada,ಹರಪನಹಳ್ಳಿ  ಭೀಮವ್ವನವರ  ಕೀರ್ತನೆಗಳು  ,Harapanahaḷḷi Bhīmavvanavara kīrtanegaḷu,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+KARACAY-BALKAR,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+KARAKALPAK,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Kazakh,Cyrillic/moving to Latin,Қазақстан Республикасы Ұлттық ғылым академиясының хабарлары,Qazaqstan Respublikasy ūlttyq ghylym akademii︠a︡synyn︠g︡ khabarlary,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+KHAKAS,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+KOMI/KIMI-PERMYAK,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Konkani,Devanagari,श्रीज्ञानेश्वर : अलोकीक व्यक्तीमत्व ,Śrījñāneśvara : alokīka vyaktīmatva ,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Konkani,Kannada,ಚಂದ್ರ ಅನಿ ತಾರಾಂ,Candr ani tārāṃ,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Korean,Hangul,민주화 이후 국정 운영,Minjuhwa ihu kukchŏng unyŏng,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Korean,Hancha only,曉城 趙 明基 博士 追慕 佛教 史學 論文集,Hyosŏng Cho Myŏng-gi Paksa ch'umo Pulgyo sahak nonmunjip,,Not Chinese,,,,,,,,,,,,,,,,,,,,,,
+Korean ,Hangul +Hancha,民法 과 法學 의 重要 問題,Minpŏp kwa pŏphak ŭi chungyo munje,,Not Chinese,,,,,,,,,,,,,,,,,,,,,,
+KUMYK,Cyrillic,,,,,,,,,,,,,,,,,,,,,,,,,,
+Kurdish (Kurmanji),Cyrillic,Ә'франдинед нвиск'аред к'öрдед Әрмәнистанейә Советие,E'frandinêd nvîsk'arêd k'urdêd Ermenîstanêye Sovêtiê,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Kurdish (Sorani),Arabic,کەس خۆى بۆ تەرک ناکرێ,Kes xoy bo terk nakrê,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Kyrgyz,Cyrillic,"Uchkul sȯzdȯr, chechen sȯzdȯr, tamsilder, myskyldar ","Учкул сөздөр, чечен сөздөр, тамсилдер, мыскылдар ",L-R ,,,,,,,,,,,,,,,,,,,,,,,
+LAK,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Lao,Lao,ປະຫວັດສາດປະເທດລາວແລະວັດທະນະທຳ,Pavatsāt Pathēt Lāo læ vatthanatham,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+LEZGIAN,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Macedonian,Cyrillic,Облици на моќ : вистината за Македонија / Георги (Џорџ) Бранов,Oblici na moḱ : vistinata za Makedonija / Georgi (D︠ž︡ord︠ž︡) Branov,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Malayalam,Malayalam,കേരളപാണിനീയം,Kēralapāninīyam,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Marathi,Devanagari,निवडक शाहीर अमरशेख ,Nivaḍaka Śāhīra Amaraśekha,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+MARI,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+?,Gurmukhi,ਪੰਜਾਬੀ ਲੋਕ-ਸਾਹਿਤ ਵਿਚ ਸੈਨਿਕ,Pañjābī loka-sāhita wica sainika,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Mongolian,Cyrillic,Дайчин гүрний үеийн олон хэлний үсэг хавсарсан сурвалж бичгийн судлал,Daĭchin gu̇rniĭ u̇eiĭn olon khėlniĭ u̇sėg khavsarsan survalzh bichgiĭn sudlal,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Mongolian,Mongolian,ᠳᠠᠶᠢᠴᠢᠩ ᠭᠦᠷᠦᠨ ᠦ ᠦᠶ ᠡ ᠶᠢᠨ ᠥᠯᠠᠨ ᠺᠡᠯᠡᠨ ᠦ ᠦᠰᠦᠭ ᠬᠠᠪᠰᠸᠷᠸᠭᠰᠠᠨ ᠰᠸᠷᠪᠸᠯᠵᠢ ᠪᠢᠴᠢᠭ ᠦᠨ ᠰᠸᠳᠸᠯᠸᠯ,Dayicing gu̇ru̇n-u̇ u̇y-e-yin olan kelen-u̇ u̇su̇g qabsuruġsan surbulji bicig-u̇n sudulul,L-R ,Originally T-D Script but displayed as L-R,,,,,,,,,,,,,,,,,,,,,,
+MORDVIN,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Nepali,Devanagari,थोपै थोपा : उपन्यास,Thopai thopa : upanayāsa,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Newari,Devanagari,बुनाः त्याः पि : नियात्रा ,Bunāḥ tyāḥ pi : niyātrā,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+NOGAI,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+OSSETIC,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Panjabi,Gurmukhi,ਪੰਜਾਬੀ ਲੋਕ-ਸਾਹਿਤ ਵਿਚ ਸੈਨਿਕ,Pañjābī loka-sāhita wica sainika,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Panjabi,Arabic,پنجابی وچ 20 ہندی کہانیاں,Panjābiī vic 20 Hindī kahāniyān̲,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Persian,Arabic,‏جامعه ايران در دوران رضا شاه,Jāmiʻah-i Īrān dar dawrān-i Riz̤ā Shāh,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Persian,Arabic,بچه‌هاى بد,Bachchahʹhā-yi bad,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Pushto,Arabic,چې لاس دې نه راکاوه,Che lās de nah rākāwah,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Russian,Cyrillic,"Священный мусор : поднимаясь по лестнице Якова : [рассказы, эссе, интервью]","Svi︠a︡shchennyĭ musor : podnimai︠a︡sʹ po lestnit︠s︡e I︠A︡kova : [rasskazy, ėsse, intervʹi︠u︡]",L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Sanskrit,Devanagari,संस्कृतानिबन्धञ्जलिः,Saṃskr̥tanibandhāñjaliḥ ,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Sindhi,Arabic,انسائيڪلوپيڊيا سنڌيانا,Insāʼiklopīḍiyā Sindhiyānā,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Sinhalese,Sinhalese,රාවන හිනාව,Rāvaṇa hināva,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Syriac,Syriac,ܠܫܢܝ ܒܐܘܪܚܐ ܚܕܬܐ,Lešāní b-ʼúrḥā ḥadtā,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+TABASARAN,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Tajik,Cyrillic,Farḣangi zaboni tojikī va durnamoi farḣangnigorī dar Tojikiston,Фарҳанги забони тоҷикӣ ва дурнамои фарҳангнигорӣ дар Тоҷикистон ,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Tamil,Tamil,திருக்குறள் தெளிவுரை,Tirukkur̲aḷ teḷivurai /,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Tatar,Cyrillic,"Равил Әмирхан : биобиблиографик күрсәткеч = Равиль Усманович Амирханов : биобиблиографический указатель, 1969-2003 / [төзучеләр, Алмаз Вәлиев, Зөбәрҗәт Гарипова].","Ravil Ămirkhan : biobibliografik ku̇rsătkech = Ravilʹ Usmanovich Amirkhanov : biobibliograficheskiĭ ukazatelʹ, 1969-2003 /  [tȯzuchelăr, Almaz Văliev, Zȯbărjăt Garipova].",L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Telugu,Telugu,తెలంగాణ ఉద్యమపాట ప్రాదేశిక విమర్శ,Telaṅgāṇa udyamapāṭa prādēśika vimarśa,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Thai,Thai,แนวคิด รูปแบบ และกระบวนการสร้างสรรค์,Nǣokhit rūpbǣp læ krabūankān sāngsan,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Tibetan,Tibetan,དབུས་འགྱུར་གྱི་གདན་ས་ཆེན་པོ་སེ་ར་ཐེག་ཆེན་གླིང་གི་གདན་རབས་ངོ་མཚར་ནོར་བུའི་ཕྲེང་བ།,Dbus 'gyur gyi gdan sa chen po se ra theg chen gling gi gdan rabs ngo mtshar nor bu'i phreng ba,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Turkmen,Modified variant of Latin,,,,,,,,,,,,,,,,,,,,,,,,Türkmenistanyň Prezidenti Gurbanguly Berdimuhamedowyň ýrdy täzeden galkyndyrmak baradaky syýasaty ,Türkmenistanyň Prezidenti Gurbanguly Berdimuhamedowyň ýurdy täzeden galkyndyrmak baradaky syýasaty,L-R 
+TUVAN,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+UDMURT,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Ukrainian,Cyrillic,Децентралізація в Україні та її вплив на соціально-економічний розвиток територій,Det︠s︡entralizat︠s︡ii︠a︡ v Ukraïni ta ïï vplyv na sot︠s︡ialʹno-ekonomichnyĭ rozvytok terytoriĭ,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Urdu,Arabic,   گلگت سے هندور تک,Gilgit se Hundūr tak,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Urdu,Arabic,قصّه ميرے سفر کا,Qiṣṣah mere safar kā,R-L ,,,,,,,,,,,,,,,,,,,,,,,
+Urdu,Arabic,نور جهاں، دليپ اور دوسرے فلمى ستارے,"Nūr Jihān̲, Dalīp aur dūsare filmī sitāre",R-L ,,,,,,,,,,,,,,,,,,,,,,,
+UYGUR,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Uzbek,Cyrillic,"Amir Temur davrida Movarounnakhr : arkheologii︠a︡, tarikh, madanii︠a︡t ","Амир Темур даврида Мовароуннахр : археология, тарих, маданият",L-R ,,,,,,,,,,,,,,,,,,,,,,,
+YAKUTIAN,Cyrillic,,,L-R ,,,,,,,,,,,,,,,,,,,,,,,
+Yiddish,Hebrew,מעשיות אויף שבת,Mayśes̀ af Shabes̀,R-L ,,,,,,,,,,,,,,,,,,,,,,,

+ 20 - 12
transliterator/rest_api.py

@@ -1,6 +1,9 @@
 from os import environ
 
-from flask import Flask, request
+from flask import Flask, Response, jsonify, request
+
+from transliterator.tables import list_tables, load_table
+from transliterator.trans import transliterate
 
 
 def create_app():
@@ -26,19 +29,24 @@ def health_check():
 
 @app.route("/languages", methods=["GET"])
 def list_languages():
-    return "TODO list of supported languages goes here."
+    return jsonify(list_tables())
 
 
-@app.route("/scripts")
-@app.route("/scripts/<lang>")
-def list_scripts(lang=None):
-    lang_str = f"for {lang}" if lang else "for all languages"
-    return f"TODO list of supported scripts {lang_str} go here."
+@app.route("/table/<lang>")
+def dump_table(lang):
+    """
+    Dump parsed transliteration table for a language.
+    """
+    return jsonify(load_table(lang))
 
 
-@app.route("/trans/<script>/<lang>/<dir>", methods=["POST"])
-def transliterate(script, lang, dir):
+@app.route("/trans/<lang>/r2s", methods=["POST"], defaults={"s2r": False})
+@app.route("/trans/<lang>", methods=["POST"])
+def transliterate_req(lang, s2r=True):
     in_txt = request.form["text"]
-    return (
-            f"TODO transliterate text {in_txt}, language {lang}, "
-            f"script {script}, direction {dir}")
+    if not in_txt:
+        return ("No input text provided!", 400)
+
+    return Response(
+            transliterate(in_txt, lang, s2r),
+            content_type="text/plain")
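
A client for the two `/trans` routes above could build its requests roughly as follows. This is a minimal sketch using only the standard library. The base URL `http://localhost:5000`, the language name `russian`, and the helper `build_trans_request` are assumptions for illustration and are not part of the commit; only the route shapes and the `text` form field come from the diff.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_trans_request(base_url, lang, text, s2r=True):
    """Build (but do not send) a POST request for the /trans routes.

    Hypothetical helper, not part of the project.
    s2r=True targets /trans/<lang> (script to Roman);
    s2r=False targets /trans/<lang>/r2s (Roman to script).
    """
    url = f"{base_url.rstrip('/')}/trans/{quote(lang, safe='')}"
    if not s2r:
        url += "/r2s"
    # The handler reads request.form["text"], so send a form-encoded body.
    body = urlencode({"text": text}).encode("utf-8")
    return Request(url, data=body, method="POST")


# Assumed host, port and language name.
req = build_trans_request("http://localhost:5000", "russian", "Привет")
r2s_req = build_trans_request(
    "http://localhost:5000", "russian", "Privet", s2r=False)
```

Passing `req` to `urllib.request.urlopen` would send it. According to the handler, an empty `text` value should come back as a 400 response, and a successful call returns the result as `text/plain`.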

+ 44 - 2
transliterator/tables/__init__.py

@@ -1,7 +1,6 @@
 import logging
 
 from functools import cache
-# from glob import glob
 from os import path, access, R_OK
 
 from yaml import load
@@ -65,10 +64,21 @@ class Token:
         return self.content < other.content
 
 
+@cache
+def list_tables():
+    """
+    List all the available tables.
+    """
+    with open(path.join(TABLE_DIR, "index.yml")) as fh:
+        tdata = load(fh, Loader=Loader)
+
+    return tdata
+
+
 @cache
 def load_table(tname):
     """
-    Load one transliteration table.
+    Load one transliteration table and its parent, if any.
 
     The table file is parsed into an in-memory configuration that contains
     the language & script metadata and parsing rules.
@@ -81,10 +91,21 @@ def load_table(tname):
     with open(fname) as fh:
         tdata = load(fh, Loader=Loader)
 
+    # NOTE Only one level of inheritance. No need for recursion for now.
+    parent = tdata.get("general", {}).get("inherits", None)
+    if parent:
+        parent_tdata = load_table(parent)
+
     if "script_to_roman" in tdata:
         tokens = {
                 Token(k): v
                 for k, v in tdata["script_to_roman"].get("map", {}).items()}
+        if parent:
+            # Merge (and override) parent values.
+            tokens = {
+                Token(k): v for k, v in parent_tdata.get(
+                        "script_to_roman", {}).get("map", {})
+            } | tokens
         tdata["script_to_roman"]["map"] = tuple(
                 (k.content, tokens[k]) for k in sorted(tokens))
 
@@ -92,7 +113,28 @@ def load_table(tname):
         tokens = {
                 Token(k): v
                 for k, v in tdata["roman_to_script"].get("map", {}).items()}
+        if parent:
+            # Merge (and override) parent values.
+            tokens = {
+                Token(k): v for k, v in parent_tdata.get(
+                        "roman_to_script", {}).get("map", {})
+            } | tokens
         tdata["roman_to_script"]["map"] = tuple(
                 (k.content, tokens[k]) for k in sorted(tokens))
 
+        if parent:
+            p_ignore = {
+                    Token(t) for t in parent_tdata.get(
+                            "roman_to_script", {}).get("ignore", [])}
+        else:
+            p_ignore = set()
+
+        ignore = {
+            Token(t)
+            for t in tdata["roman_to_script"].get("ignore", [])
+        } | p_ignore
+
+        tdata["roman_to_script"]["ignore"] = [
+                t.content for t in sorted(ignore)]
+
     return tdata
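
The parent/child merge in `load_table` can be illustrated in isolation. The sketch below uses plain strings in place of the `Token` class, whose full ordering is not visible in this diff. The longest-first sort key and the helper names `merge_maps` and `merge_ignore` are assumptions made for the example, not the project's code.

```python
def merge_maps(parent_map, child_map):
    """Merge a parent token map with a child's; child entries win."""
    # Dict union (Python 3.9+): the right-hand operand overrides the left.
    merged = dict(parent_map) | dict(child_map)
    # Assumed ordering: longer tokens first, then alphabetical, so a token
    # is always scanned before any of its prefixes.
    return tuple(sorted(merged.items(), key=lambda kv: (-len(kv[0]), kv[0])))


def merge_ignore(parent_ignore, child_ignore):
    """Union of both ignore lists, using the same assumed ordering."""
    merged = set(parent_ignore) | set(child_ignore)
    return sorted(merged, key=lambda tk: (-len(tk), tk))


# Illustrative data only; not taken from the real tables.
parent = {"A": "\u0410", "Sh": "\u0428"}
child = {"A": "\u0404", "Shch": "\u0429"}
merged = merge_maps(parent, child)
ignored = merge_ignore(["et al.", "colophon"], ["colophon", "II"])
```

Here the child's value for `A` replaces the parent's, while `Sh` is inherited unchanged and `Shch` sorts ahead of it.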

+ 29 - 25
transliterator/tables/data/_cyrillic_base.yml

@@ -1,11 +1,36 @@
 general:
   name: Cyrillic base
-  notes: copied from Ukrainian .cfg file.
+  notes: >
+    Copied from the Russian .cfg file and stripped
+    of language-specific tokens. The Russian ignore list
+    has been left here on purpose, assuming it's valid
+    for all child languages.
 
 roman_to_script:
   ignore:
-    - "At head of title"
-    - "Colophon"
+    - "at head of title"
+    - "colophon"
+    - "date of publication not identified"
+    - "place of publication not identified"
+    - "publisher not identified"
+    # NOTE There is ambiguity about ignoring these
+    # words. Note that the single-character Roman
+    # numerals are not included on purpose.
+    # Ideally the source editors should use the
+    # dedicated U+2160÷U+216F (uppercase Roman
+    # numerals) and/or U+2170÷U+217F (lower case Roman
+    # numerals) ranges to avoid this ambiguity.
+    #- re: "I{2,3}"
+    #- re: "I(V|X)"
+    #- re: "LI{,3}"
+    #- re: "LI?(V|X)"
+    #- re: "L(V|X{1,3})I{,3}"
+    #- re: "LX{1,3}I?V"
+    #- re: "LX{1,3}VI{,3}"
+    #- re: "(V|X{1,3})I{,3}"
+    #- re: "X{1,3}I{,3}"
+    #- re: "X{1,3}I(V|X)"
+    #- re: "X{1,3}VI{,3}"
     - "II"
     - "III"
     - "IV"
@@ -92,29 +117,8 @@ roman_to_script:
     - "XXXVII"
     - "XXXVIII"
     - "and one other"
-    - "and two others"
-    - "and three others"
-    - "and four others"
-    - "and five others"
-    - "and six others"
-    - "and seven others"
-    - "and eight others"
-    - "and nine others"
-    - "and ten others"
-    - "and eleven others"
-    - "and twelve others"
-    - "and thirteen others"
-    - "and fourteen others"
-    - "and fifteen others"
-    - "and sixteen others"
-    - "and seventeen others"
-    - "and eighteen others"
-    - "and nineteen others"
-    - "and others"
+    #- re: "and ([a-z]+ )?others"
     - "et al."
-    - "date of publication not identified"
-    - "Place of publication not identified"
-    - "publisher not identified"
 
   map:
     "A": "\u0410"

+ 27 - 7
transliterator/trans.py

@@ -11,7 +11,7 @@ MULTI_WS_RE = re.compile(r"\s{2,}")
 logger = logging.getLogger(__name__)
 
 
-def transliterate(src, script, lang, s2r=True):
+def transliterate(src, lang, s2r=True):
     """
     Transliterate a single string.
 
@@ -20,8 +20,6 @@ def transliterate(src, script, lang, s2r=True):
 
         lang (str): Language name.
 
-        script (str): Name of the script that the language is encoded in.
-
     Keyword args:
         s2r (bool): If True (the default), the source is considered to be a
         non-latin script in the language and script specified, and the output
@@ -31,12 +29,16 @@ def transliterate(src, script, lang, s2r=True):
     Return:
         str: The transliterated string.
     """
-    # TODO script is ignored at the moment.
+    source_str = lang if s2r else "Latin"
+    target_str = "Latin" if s2r else lang
+    logger.info(f"Transliteration is from {source_str} to {target_str}.")
+
     cfg = load_table(lang)
+    logger.info(f"Loaded table for {lang}.")
+
     # General directives.
     # general_dir = cfg.get("directives", {})
 
-    # We could be clever here but let's give the users a precise message.
     if s2r and "script_to_roman" not in cfg:
         raise NotImplementedError(
             f"Script-to-Roman transliteration not yet supported for {lang}."
@@ -53,7 +55,24 @@ def transliterate(src, script, lang, s2r=True):
     dest_ls = []
     # Loop through source characters. The increment of each loop depends on the
     # length of the token that eventually matches.
+    ignore_list = langsec.get("ignore", [])  # Only present in R2S
     while i < len(src):
+        # Check ignore list first. Find as many subsequent ignore tokens
+        # as possible before moving on to looking for match tokens.
+        while True:
+            ignoring = False
+            for tk in ignore_list:
+                step = len(tk)
+                if tk == src[i:i + step]:
+                    logger.info(f"Ignored token: {tk}")
+                    dest_ls.append(tk)
+                    i += step
+                    ignoring = True
+                    break
+            # None of the ignore tokens matched at this position. Move on.
+            if not ignoring:
+                break
+
         match = False
         for src_tk, dest_tk in langsec["map"]:
             # Longer tokens should be guaranteed to be scanned before their
@@ -66,8 +85,9 @@ def transliterate(src, script, lang, s2r=True):
                 match = True
                 i += step
                 break
+
         if not match:
-            # Copy non-mapped character (one at a time).
+            # No match found. Copy non-mapped character (one at a time).
             logger.info(f"Token {src[i]} at position {i} is not mapped.")
             dest_ls.append(src[i])
             i += 1
@@ -75,7 +95,7 @@ def transliterate(src, script, lang, s2r=True):
     if langsec_dir.get("capitalize", False):
         dest_ls[0] = dest_ls[0].capitalize()
 
-    logger.info(f"Output list: {dest_ls}")
+    logger.debug(f"Output list: {dest_ls}")
     dest = "".join(dest_ls)
 
     dest = re.sub(MULTI_WS_RE, ' ', dest.strip())
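
The scanning loop with the new ignore list can be sketched on its own, without the table loader. The function name `scan`, the three-entry token map and the sample input are assumptions for illustration. The sketch also stops when an ignored token consumes the end of the input; that bounds check is an addition here and does not appear in the diff.

```python
def scan(src, token_map, ignore_list=()):
    """Replace mapped tokens in src, copying ignore tokens verbatim."""
    out = []
    i = 0
    while i < len(src):
        # Consume consecutive ignore tokens before looking at the map.
        ignoring = True
        while ignoring:
            ignoring = False
            for tk in ignore_list:
                if tk and src.startswith(tk, i):
                    out.append(tk)
                    i += len(tk)
                    ignoring = True
                    break
        # Guard added in this sketch: an ignore token may have consumed
        # the rest of the input.
        if i >= len(src):
            break
        for src_tk, dest_tk in token_map:
            if src_tk and src.startswith(src_tk, i):
                out.append(dest_tk)
                i += len(src_tk)
                break
        else:
            # No mapping matched: copy one character unchanged.
            out.append(src[i])
            i += 1
    return "".join(out)


# Illustrative map, longest token first (Roman to Cyrillic).
TOKEN_MAP = (("sh", "\u0448"), ("a", "\u0430"), ("s", "\u0441"))

with_ignore = scan("sasha et al.", TOKEN_MAP, ("et al.",))
without_ignore = scan("sasha et al.", TOKEN_MAP)
```

With the ignore list, `et al.` passes through untouched. Without it, the `a` inside `et al.` is converted to its Cyrillic counterpart, which is the kind of corruption the ignore list is meant to prevent.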