Module:Sa-Java-translit

Wiktionary, the free multilingual dictionary

Interfacing

This module works on text in the Javanese script. It transliterates Sanskrit text into the Latin script.

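It is recommended that this module not be invoked directly from templates or other modules. To use it from a template, use {{xlit}} instead; to use it from a module, use Module:languages#Language:transliterate instead.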

For test cases, see Module:Sa-Java-translit/testcases.

Functions

tr(text, lang, sc)
Transliterates a given piece of text written in the script specified by the code sc and in the language specified by the code lang. Returns nil when the transliteration fails.

It transliterates Sanskrit text in accordance with the IAST convention.

Method

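The core of the transliteration is the conversion of CV? sequences, where C is a consonant and V is an optional vowel sign or a mark of the vowel's absence (the virama, called pangkon in Javanese). When V is absent, the inherent vowel a is supplied. The Javanese script is more complicated than the Devanagari script (for example, it has subscript consonant signs), so the process needs more steps.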
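The CV? rule can be illustrated with a minimal sketch in plain Lua 5.3 or later. It is not the module's code: it uses the standard utf8 library instead of Scribunto's mw.ustring, a tiny hypothetical subset of the tables (three consonants, two vowel signs and the pangkon), and a hypothetical function name, tr_cv.

```lua
-- Sketch of the CV? rule, assuming a tiny hypothetical subset of the tables.
local consonants = { ['ꦏ'] = 'k', ['ꦠ'] = 't', ['ꦩ'] = 'm' }
-- Vowel signs; the pangkon (virama) maps to '' because it removes the vowel.
local signs = { ['ꦶ'] = 'i', ['ꦸ'] = 'u', ['꧀'] = '' }

-- Hypothetical helper: transliterate each consonant together with an
-- optional following sign, supplying the inherent vowel 'a' when none follows.
local function tr_cv(text)
	local cs = {}
	for ch in text:gmatch(utf8.charpattern) do
		cs[#cs + 1] = ch
	end
	local out, i = {}, 1
	while i <= #cs do
		local c = consonants[cs[i]]
		local v = signs[cs[i + 1]] -- nil if no sign follows (or end of text)
		if c and v then
			out[#out + 1] = c .. v -- explicit vowel sign or pangkon
			i = i + 2
		elseif c then
			out[#out + 1] = c .. 'a' -- implicit vowel
			i = i + 1
		else
			out[#out + 1] = cs[i] -- anything else is passed through
			i = i + 1
		end
	end
	return table.concat(out)
end
```

With these assumptions, tr_cv('ꦏꦶꦠ') returns 'kita', and tr_cv('ꦩꦸꦠ꧀') returns 'mut', since the pangkon removes the inherent vowel of ꦠ.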

The transliterable characters of the script are consonants (both base and subscript), dependent vowel signs, and other symbols. The subscript consonants are listed in the variable S, and the variable C lists the base consonants followed by S, so a character class built from C also matches the subscripts. The transliterations of all these consonants are stored in the table consonants. The transliterations of the dependent vowel signs are stored in the table diacritics. All other transliterations are stored in the table tt; these include the independent vowels, the anusvara, the visarga, numerals and punctuation.
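The relationship between S and C can be sketched in plain Lua 5.3+ with the standard utf8 library. Only a few consonants are included, and the helper names set_of and classify are hypothetical; the point is that because C contains S, code that must tell base consonants from subscripts has to test S first.

```lua
-- Sketch mirroring the module's S and C variables (hypothetical subset).
local S = 'ꦾꦿ'             -- subscript y and r
local C = 'ꦏꦒꦤꦩꦫ' .. S   -- base consonants followed by S, as in the module

-- Build a lookup set from the characters of a UTF-8 string.
local function set_of(s)
	local set = {}
	for ch in s:gmatch(utf8.charpattern) do
		set[ch] = true
	end
	return set
end

local in_C, in_S = set_of(C), set_of(S)
local vowel_signs = set_of('ꦴꦶꦷꦸꦹꦽꦺꦻꦵ꧀') -- dependent vowel signs and pangkon

-- Because C contains S, S is tested first to tell the two kinds apart.
local function classify(ch)
	if in_S[ch] then
		return 'subscript'
	elseif in_C[ch] then
		return 'base'
	elseif vowel_signs[ch] then
		return 'vowel sign'
	end
	return 'other'
end
```

A set built from C alone reports the subscript ꦿ as a consonant too, which is exactly what lets the CV? substitution attach a vowel to a subscript consonant.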

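The first step is to partially transliterate each sequence CS, since there is no implicit vowel between the two parts: the C part is transliterated, and the S part is left for the next step. This step is performed twice, so that a consonant carrying two subscripts (CSS) is also handled; because C includes S, the second pass transliterates the first subscript. A sequence with three subscripts (CSSS) would not be fully handled, though there should not be any.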
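A sketch of this step in plain Lua 5.3+, assuming a hypothetical subset of the consonants table and hypothetical names (cs_pass, step1), with the utf8 library in place of mw.ustring. cs_pass imitates how gsub scans: matches do not overlap and scanning resumes after each match, so a single pass over ꦏꦿꦾ consumes ꦏꦿ as one match and ꦿ is not transliterated until the second pass.

```lua
-- Hypothetical subset of the consonants table; as in the module,
-- it also covers the subscript signs.
local consonants = { ['ꦏ'] = 'k', ['ꦥ'] = 'p', ['ꦾ'] = 'y', ['ꦿ'] = 'r' }
local is_sub = { ['ꦾ'] = true, ['ꦿ'] = true }

-- One left-to-right pass over non-overlapping (consonant, subscript) pairs,
-- in the manner of gsub: the consonant is transliterated, the subscript kept.
local function cs_pass(cs)
	local out, i = {}, 1
	while i <= #cs do
		local c, s = cs[i], cs[i + 1]
		if consonants[c] and is_sub[s] then
			out[#out + 1] = consonants[c]
			out[#out + 1] = s
			i = i + 2
		else
			out[#out + 1] = c
			i = i + 1
		end
	end
	return out
end

local function step1(text)
	local cs = {}
	for ch in text:gmatch(utf8.charpattern) do
		cs[#cs + 1] = ch
	end
	cs = cs_pass(cs)
	cs = cs_pass(cs) -- second pass reaches the first subscript of CSS
	return table.concat(cs)
end
```

For example, step1('ꦏꦿꦾ') returns 'krꦾ'; the remaining ꦾ is left for the CV? step, which supplies its vowel.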

The next step is to transliterate the CV? combinations. Some vowels are encoded as up to three characters (virama, liquid vowel letter, and the length mark TARUNG); for example, vocalic ḷ after a consonant is written as the virama followed by ꦊ. (TODO: Trap undefined sequences.) The structure of the vowels is simple enough to be captured inline in the substitution pattern. Note that if there were any CSSS sequences, the first letters of the transliterations of the subscript consonants would have to be treated as vowels.
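The substitution in this step can be sketched in plain Lua 5.3+. The tables are hypothetical subsets of consonants and diacritics, and match_vowel and step2 are hypothetical names. match_vowel follows the module's pattern ꧀?[ꦴꦶꦷꦸꦹꦽꦊꦋꦺꦻꦵ꧀]?ꦴ?, taking each optional part greedily. As one possible way of trapping undefined sequences, an assumption made for this sketch, step2 returns nil when the matched sequence has no entry in diacritics, in line with the documented contract that tr returns nil on failure.

```lua
-- Hypothetical subsets of the module's tables.
local consonants = { ['ꦏ'] = 'k', ['ꦩ'] = 'm', ['ꦿ'] = 'r' }
local diacritics = {
	['ꦴ'] = 'ā', ['ꦶ'] = 'i', ['ꦽ'] = 'ṛ', ['ꦽꦴ'] = 'ṝ',
	['꧀ꦊ'] = 'ḷ', ['ꦺ'] = 'e', ['ꦺꦴ'] = 'o', ['꧀'] = '',
}
-- Characters allowed in the middle slot of the pattern.
local middle = {}
for ch in ('ꦴꦶꦷꦸꦹꦽꦊꦋꦺꦻꦵ꧀'):gmatch(utf8.charpattern) do
	middle[ch] = true
end

-- Greedily match ꧀?[middle]?ꦴ? at position i; return the match and next index.
local function match_vowel(cs, i)
	local d = ''
	if cs[i] == '꧀' then d, i = d .. cs[i], i + 1 end
	if middle[cs[i]] then d, i = d .. cs[i], i + 1 end
	if cs[i] == 'ꦴ' then d, i = d .. cs[i], i + 1 end
	return d, i
end

-- Transliterate CV? sequences; return nil on an undefined vowel sequence.
local function step2(text)
	local cs = {}
	for ch in text:gmatch(utf8.charpattern) do
		cs[#cs + 1] = ch
	end
	local out, i = {}, 1
	while i <= #cs do
		local c = consonants[cs[i]]
		if c then
			local d, j = match_vowel(cs, i + 1)
			local v
			if d == '' then
				v = 'a' -- implicit vowel
			else
				v = diacritics[d] -- nil if the sequence is undefined
			end
			if v == nil then
				return nil
			end
			out[#out + 1] = c .. v
			i = j
		else
			out[#out + 1] = cs[i] -- e.g. Latin output of the first step
			i = i + 1
		end
	end
	return table.concat(out)
end
```

Fed the output of the first step, step2('kꦿꦶ') returns 'kri', while step2('ꦏꦊ') returns nil because ꦊ without a preceding pangkon has no entry in diacritics.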

The final step is to transliterate the other symbols. Some symbols (some of the independent vowels) have a second character, which is always TARUNG. These two-character symbols are transliterated first, and then the single-character symbols are transliterated.
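The order of the two passes can be sketched in plain Lua 5.3+. The tt table is a hypothetical subset, and split, pairs_pass, singles_pass and step3 are hypothetical names. pairs_pass imitates gsub(text, '.ꦴ', tt): it scans left to right over non-overlapping (character, TARUNG) pairs and keeps a pair unchanged when tt has no entry for it.

```lua
-- Hypothetical subset of the module's tt table.
local tt = {
	['ꦄ'] = 'a', ['ꦄꦴ'] = 'ā', ['ꦈ'] = 'u', ['ꦈꦴ'] = 'ū',
	['ꦎ'] = 'o', ['ꦎꦴ'] = 'au', ['ꦁ'] = 'ṃ', ['ꦃ'] = 'ḥ',
}
local TARUNG = 'ꦴ'

-- Split a UTF-8 string into an array of characters.
local function split(s)
	local cs = {}
	for ch in s:gmatch(utf8.charpattern) do
		cs[#cs + 1] = ch
	end
	return cs
end

-- Pass 1: replace (character, TARUNG) pairs from tt.
local function pairs_pass(text)
	local cs, out, i = split(text), {}, 1
	while i <= #cs do
		if cs[i + 1] == TARUNG then
			local key = cs[i] .. TARUNG
			out[#out + 1] = tt[key] or key -- pairs without an entry are kept
			i = i + 2
		else
			out[#out + 1] = cs[i]
			i = i + 1
		end
	end
	return table.concat(out)
end

-- Pass 2, like gsub(text, '.', tt): replace single characters from tt.
local function singles_pass(text)
	local out = {}
	for _, ch in ipairs(split(text)) do
		out[#out + 1] = tt[ch] or ch
	end
	return table.concat(out)
end

local function step3(text)
	return singles_pass(pairs_pass(text))
end
```

Running only the single-character pass on ꦄꦴ would give 'aꦴ' instead of 'ā', which is why the two-character symbols are handled first.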


local export = {}
local gsub = mw.ustring.gsub

local consonants = {
	['ꦏ']='k', ['ꦑ']='kh', ['ꦒ']='g', ['ꦓ']='gh', ['ꦔ']='ṅ',
	['ꦕ']='c', ['ꦖ']='ch', ['ꦗ']='j', ['ꦙ']='jh', ['ꦚ']='ñ', 
	['ꦛ']='ṭ', ['ꦜ']='ṭh', ['ꦝ']='ḍ', ['ꦞ']='ḍh', ['ꦟ']='ṇ', 
	['ꦠ']='t', ['ꦡ']='th', ['ꦢ']='d', ['ꦣ']='dh', ['ꦤ']='n', 
	['ꦥ']='p', ['ꦦ']='ph', ['ꦧ']='b', ['ꦨ']='bh', ['ꦩ']='m',
	['ꦪ']='y', ['ꦫ']='r', ['ꦭ']='l', ['ꦮ']='v', -- ['ળ']='ḷ',
	['ꦯ']='ś', ['ꦰ']='ṣ', ['ꦱ']='s', ['ꦲ']='h',
	-- Include subscript ('medial') consonants for transliteration only.
	['ꦿ']='r', ['ꦾ']='y',
}

local diacritics = {
	['ꦴ']='ā', ['ꦶ']='i', ['ꦷ']='ī', ['ꦸ']='u', ['ꦹ']='ū', ['ꦽ']='ṛ', ['ꦽꦴ']='ṝ', 
	['꧀ꦊ']='ḷ', ['꧀ꦋ']='ḹ', ['ꦺ']='e', ['ꦻ']='ai', ['ꦺꦴ']='o', ['ꦵ']='o', ['ꦻꦴ']='au', ['꧀']='',
-- In general, include the results of second-level diacritics. Probably not needed for Javanese.
--	['y']='y', ['r']='r',
}

local tt = {
	-- vowels
	['ꦄ']='a', ['ꦄꦴ']='ā', ['ꦆ']='i', ['ꦇ']='ī', ['ꦈ']='u', ['ꦈꦴ']='ū', ['ꦉ']='ṛ', ['ꦉꦴ']='ṝ',
	['ꦊ']='ḷ', ['ꦋ']='ḹ', ['ꦌ']='e', ['ꦍ']='ai', ['ꦎ']='o', ['ꦎꦴ']='au', 
	-- chandrabindu    
	['ꦀ']='m̐', --until a better method is found
	-- anusvara    
	['ꦁ']='ṃ', --until a better method is found
	-- visarga    
	['ꦃ']='ḥ',
	-- avagraha
	-- ['ઽ']='’',
	-- others
	['ꦂ']='r',
	--numerals
	['꧐']='0', ['꧑']='1', ['꧒']='2', ['꧓']='3', ['꧔']='4', ['꧕']='5', ['꧖']='6', ['꧗']='7', ['꧘']='8', ['꧙']='9', ['꧇']='',
	--punctuation
	['꧉']='.', --double danda
	['꧈']='.', --danda
	--Vedic extensions
	-- ['ᳵ']='x', ['ᳶ']='f',
	--Om
	['ꦎꦴꦀ']='oṃ',
	--reconstructed
	['*'] = '',
}
-- List the consonants
local S = 'ꦾꦿ' -- Subscript y and r.
local C = 'ꦏꦑꦒꦓꦔꦕꦖꦗꦙꦚꦛꦜꦝꦞꦟꦠꦡꦢꦣꦤꦥꦦꦧꦨꦩꦪꦫꦭꦮꦯꦰꦱꦲ'..S

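function export.tr(text, lang, sc)
	-- Step 1: partially transliterate CS (base + subscript) sequences. There is
	-- no implicit vowel between the two parts, so only C is transliterated.
	local fn = function(c, d) return consonants[c] .. d end
	local search = '([' .. C .. '])([' .. S .. '])'
	text = gsub(text, search, fn)
	text = gsub(text, search, fn) -- second pass, for a second subscript (CSS)

	-- Step 2: transliterate CV? sequences. C already includes S, so vowels
	-- attached to subscript consonants are handled too.
	local failed = false
	text = gsub(
		text,
		'([' .. C .. '])' ..
		'(꧀?[ꦴꦶꦷꦸꦹꦽꦊꦋꦺꦻꦵ꧀]?ꦴ?)',
		function(c, d)
			if d == '' then
				return consonants[c] .. 'a' -- implicit vowel
			end
			local v = diacritics[d]
			if v == nil then
				-- Undefined vowel sequence: record the failure instead of
				-- raising a "concatenate a nil value" error.
				failed = true
				return nil -- keep the original match
			end
			return consonants[c] .. v
		end)
	if failed then
		return nil -- as documented, nil signals a failed transliteration
	end

	-- Step 3: transliterate the remaining symbols, longest sequences first.
	text = gsub(text, 'ꦎꦴꦀ', tt) -- Om; would otherwise become 'au' .. 'm̐'
	text = gsub(text, '.ꦴ', tt) -- Two-part independent vowels.
	text = gsub(text, '.', tt)

	return text
end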
 
return export