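Module:Data consistency check/doc

This is the documentation page for Module:Data consistency check.

This module checks the validity and internal consistency of the language, language family, and script data modules used on Wiktionary, including the modules in Category:语言资料模块 and Module:scripts/data.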

Output

Discrepancies detected:

Template:langname-lite

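Code: aek. Current name: Haeke. Expected name: 哈克语.
Code: aey. Current name: Amele. Expected name: 阿梅勒语.
Code: anw. Current name: Anaang. Expected name: 阿纳昂语.
Code: apl. Current name: Lipan. Expected name: 利攀语.
Code: aqd. Current name: Ampari Dogon. Expected name: 安帕里-多贡语.
Code: aqg. Current name: Arigidi. Expected name: 阿里吉蒂语.
Code: arx. Current name: Aruá. Expected name: 阿鲁阿什语.
Code: bbr. Current name: Girawa. Expected name: 吉拉瓦语.
Code: bmi. Current name: Bagirmi. Expected name: 巴吉尔米语.
Code: bnq. Current name: Bantik. Expected name: 班第语.
Code: byt. Current name: Berti. Expected name: 扎加瓦语.
Code: cav. Current name: Cavineña. Expected name: 卡维内纳语.
Code: clc. Current name: Chilcotin. Expected name: 奇尔科廷语.
Code: cro. Current name: Crow. Expected name: 克劳语.
Code: dcr. Current name: Negerhollands. Expected name: 维京群岛克里奥荷兰语.
Code: dis. Current name: Dimasa. Expected name: 迪马萨语.
Code: djk. Current name: Aukan. Expected name: 奥坎语.
Code: eee. Current name: E. Expected name: 诶话.
Code: emb. Current name: Embaloh. Expected name: 恩巴洛语.
Code: fad. Current name: Wagi. Expected name: 瓦吉语.
Code: foi. Current name: Foi. Expected name: 福伊语.
Code: frd. Current name: Fordata. Expected name: 福尔达塔语.
Code: guw. Current name: Gun. Expected name: 奥古语.
Code: hdy. Current name: Hadiyya. Expected name: 哈迪亚语.
Code: hro. Current name: Haroi. Expected name: 赫雷语.
Code: huu. Current name: 穆瑞胡图图语. Expected name: 穆鲁伊维托托语.
Code: jaz. Current name: Jawe. Expected name: 贾韦语.
Code: lew. Current name: Ledo Kaili. Expected name: 列多-凯利语.
Code: mbj. Current name: Nadëb. Expected name: 纳德布语.
Code: mee. Current name: Mengen. Expected name: 门根语.
Code: moa. Current name: Mwan. Expected name: 姆宛语.
Code: mps. Current name: Dadibi. Expected name: 达迪比语.
Code: muz. Current name: Mursi. Expected name: 穆尔西语.
Code: mvd. Current name: Mamboru. Expected name: 曼博鲁语.
Code: nag. Current name: Naga Pidgin. Expected name: 那加克里奥尔语.
Code: oma. Current name: Omaha-Ponca. Expected name: 奥马哈-庞卡语.
Code: plg. Current name: Pilagá. Expected name: 皮拉加语.
Code: pln. Current name: Palenquero. Expected name: 帕伦奎罗语.
Code: pml. Current name: Sabir. Expected name: 沙比尔语.
Code: poo. Current name: Central Pomo. Expected name: 中波莫语.
Code: qsb-ibe. Current name: a pre-Roman substrate of Iberia. Expected name: 罗马占领前一种伊比利亚底层语言.
Code: rel. Current name: Rendille. Expected name: 伦迪勒语.
Code: slu. Current name: Selaru. Expected name: 塞拉鲁语.
Code: snp. Current name: Siane. Expected name: 西亚内语.
Code: tad. Current name: Tause. Expected name: 陶塞语.
Code: tmu. Current name: Iau. Expected name: 雅乌语.
Code: ugo. Current name: 贡语. Expected name: 贡语 (泰国).
Code: umo. Current name: Umotína. Expected name: 乌莫蒂纳语.
Code: und-phi. Current name: Philistine. Expected name: 非利士语.
Code: xrn. Current name: Arin. Expected name: 阿林语.
Code: yuq. Current name: Yuqui. Expected name: 尤奇语.
Code: zne. Current name: Zande. Expected name: 赞德语.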

Module:etymology languages/data

早期现代西班牙语 (es-ear) has the 中世纪西班牙语 (osp) listed in its ancestor field, which is redundant, since it is calculated to be ancestral automatically.
The data key preprocess_links for ??? (th-new) is invalid.
赫尔尼基语 (xum-her) has a canonical name that is not unique; it is also used by the code xhr.

Module:families/data

Phla–Pherá is the name of alv-pph, but it is repeated in aliases.
古印度-雅利安语支 (inc-old) has no child families or languages.
普拉克里特诸语言 (pra) has no child families or languages.

Module:languages/data/2

爪哇语 (jv) is in the 巽他-苏拉威西语群 (poz-sus) and has 古爪哇语 (kaw) set as an ancestor, but it is not possible to form an ancestral chain between them.
书面挪威语 (nb) has 丹麦语 (da) set as an ancestor, but is not in the 东斯堪地那维亚语支 (gmq-eas).
书面挪威语 (nb) has 中古挪威语 (gmq-mno) set as an ancestor, but is not in the 西斯堪地那维亚语支 (gmq-wes).
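
Messages of the form "X has Y set as an ancestor, but is not in the family Z" suggest that a language is expected to belong to the family of each ancestor it lists. The Python sketch below shows one way such a membership test could work, by walking up the family tree. It is an assumption about the check, not the module's Lua code; `family_chain`, `is_in_family`, and the sample family codes are hypothetical.

```python
def family_chain(family_code, families):
    """Return family_code followed by each of its parent families, innermost first."""
    chain = []
    # Stop at the top of the tree, or if the data loops back on itself.
    while family_code is not None and family_code not in chain:
        chain.append(family_code)
        family_code = families.get(family_code, {}).get("family")
    return chain


def is_in_family(language_family, target_family, families):
    """True if language_family is target_family or one of its descendants."""
    return target_family in family_chain(language_family, families)


# Hypothetical family tree: "fam" has two branches, "fam-a" and "fam-b".
families = {
    "fam": {"canonicalName": "Example"},
    "fam-a": {"canonicalName": "Example A", "family": "fam"},
    "fam-b": {"canonicalName": "Example B", "family": "fam"},
}
```

With this data, a language in `fam-b` whose ancestor is in `fam-a` would be flagged, while a language in `fam-a` whose ancestor is in `fam` would pass.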

Module:languages/data/3/a

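??? (aeu) has a canonical name that is not unique; it is also used by the code aik.
??? (aic) has a canonical name that is not unique; it is also used by the code amk.
??? (alh) has a canonical name that is not unique; it is also used by the code aru.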

Module:languages/data/3/b

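??? (bfy) has a canonical name that is not unique; it is also used by the code bgq.
??? (bgw) has a canonical name that is not unique; it is also used by the code btv.
??? (bhb) has a canonical name that is not unique; it is also used by the code bzr.
??? (boa) has a canonical name that is not unique; it is also used by the code bxd.
??? (bsq) has a canonical name that is not unique; it is also used by the code bas.
??? (bzw) has a canonical name that is not unique; it is also used by the code bas.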

Module:languages/data/3/c

Maa is the name of cma, but it is repeated in aliases.
Island Carib is the name of crb, but it is repeated in otherNames.

Module:languages/data/3/d

??? (dax) has a canonical name that is not unique; it is also used by the code dij.

Module:languages/data/3/h

加勒比印度斯坦语 (hns) has 博杰普尔语 (bho) set as an ancestor, but is not in the 东印度-雅利安语支 (inc-eas).
加勒比印度斯坦语 (hns) has 阿瓦德语 (awa) set as an ancestor, but is not in the 东印地语支 (inc-hie).

Module:languages/data/3/j

新喀里多尼亚爪哇语 (jas) has 爪哇语 (jv) set as an ancestor, but is not in the 巽他-苏拉威西语群 (poz-sus).
加勒比爪哇语 (jvn) has 爪哇语 (jv) set as an ancestor, but is not in the 巽他-苏拉威西语群 (poz-sus).

Module:languages/data/3/k

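??? (keh) has a canonical name that is not unique; it is also used by the code kzq.
??? (klm) has a canonical name that is not unique; it is also used by the code kyo.
??? (kmg) has a canonical name that is not unique; it is also used by the code ket.
??? (kmx) has a canonical name that is not unique; it is also used by the code aup.
??? (koz) has a canonical name that is not unique; it is also used by the code hhr.
??? (kvc) has a canonical name that is not unique; it is also used by the code kqb.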

Module:languages/data/3/l

Looma is the name of lom, but it is repeated in otherNames.

Module:languages/data/3/m

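??? (mmq) has a canonical name that is not unique; it is also used by the code ahs.
??? (mmr) has a canonical name that is not unique; it is also used by the code muq.
??? (mnk) has a canonical name that is not unique; it is also used by the code man.
??? (mui) has a canonical name that is not unique; it is also used by the code mse.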

Module:languages/data/3/n

??? (nev) has a canonical name that is not unique; it is also used by the code hnu.

Module:languages/data/3/o

??? (omi) has a canonical name that is not unique; it is also used by the code aom.

Module:languages/data/3/p

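??? (pbv) has a canonical name that is not unique; it is also used by the code bvn.
??? (pht) has a canonical name that is not unique; it is also used by the code mfl.
??? (pmm) has a canonical name that is not unique; it is also used by the code blf.
??? (ppt) has a canonical name that is not unique; it is also used by the code pai.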

Module:languages/data/3/r

??? (rwo) has a canonical name that is not unique; it is also used by the code luf.

Module:languages/data/3/s

??? (snx) has a canonical name that is not unique; it is also used by the code raq.

Module:languages/data/3/t

??? (tiv) has a canonical name that is not unique; it is also used by the code ter.

Module:languages/data/3/v

??? (vmw) has a canonical name that is not unique; it is also used by the code lva.

Module:languages/data/3/w

??? (wkd) has a canonical name that is not unique; it is also used by the code mkg.
Wè Northern is the name of wob, but it is repeated in otherNames.

Module:languages/data/3/x

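??? (xcr) has a canonical name that is not unique; it is also used by the code khr.
??? (xib) lists an invalid script code Ibrn.
??? (xns) has a canonical name that is not unique; it is also used by the code soq.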

Module:languages/data/3/y

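Yaroamë is the name of yro, but it is repeated in otherNames.
??? (yun) has a canonical name that is not unique; it is also used by the code bez.
??? (yuq) has a canonical name that is not unique; it is also used by the code yuc.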

Module:languages/data/exceptional

??? (tuw-bal) has a canonical name that is not unique; it is also used by the code bao.

Module:languages/data/exceptional/extra

??? (map-kxv) has data in Module:languages/data/exceptional, but does not have corresponding data in Module:languages/data/exceptional/extra.
??? (map-trv) has data in Module:languages/data/exceptional, but does not have corresponding data in Module:languages/data/exceptional/extra.

Module:scripts/data

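布列斯符号 (Blis) is not used by any language and has no characters listed for auto-detection.
塞普勒斯-米诺斯文字 (Cpmn) is not used by any language.
平假名 (Hira) is not used by any language.
假名 (Hrkt) is not used by any language.
东北伊比利亚文字 (Ibrnn) is not used by any language and has no characters listed for auto-detection.
东南伊比利亚文字 (Ibrns) is not used by any language and has no characters listed for auto-detection.
图像渲染 (Imag) is not used by any language and has no characters listed for auto-detection.
国际音标 (Ipach) is not used by any language and has no characters listed for auto-detection.
Moon (Moon) is not used by any language and has no characters listed for auto-detection.
摩斯电码 (Morse) is not used by any language and has no characters listed for auto-detection.
音乐记号 (Music) is not used by any language.
未指定文字 (None) is not used by any language and has no characters listed for auto-detection.
Ol Onal (Onao) is not used by any language and has no characters listed for auto-detection.
朗格朗格 (Roro) is not used by any language and has no characters listed for auto-detection.
卢米文数字 (Rumin) is not used by any language.
旗语 (Semap) is not used by any language and has no characters listed for auto-detection.
Visible Speech (Visp) is not used by any language and has no characters listed for auto-detection.
数学记号 (Zmth) is not used by any language.
符号 (Zsym) is not used by any language.
未定文字 (Zyyy) is not used by any language and has no characters listed for auto-detection.
未编码文字 (Zzzz) is not used by any language and has no characters listed for auto-detection.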
The data key ietf_subtag for 东南伊比利亚文字 (Ibrns) is invalid.
The data key ietf_subtag for 旗语 (Semap) is invalid.
The data key ietf_subtag for 音乐记号 (Music) is invalid.
The data key ietf_subtag for 未指定文字 (None) is invalid.
The data key ietf_subtag for 国际音标 (Ipach) is invalid.
The data key ietf_subtag for 希腊文 (Polyt) is invalid.
The data key ietf_subtag for 图像渲染 (Imag) is invalid.
The data key ietf_subtag for 卢米文数字 (Rumin) is invalid.
The data key ietf_subtag for 东北伊比利亚文字 (Ibrnn) is invalid.
The data key sort_by_scraping for 日文 (Jpan) is invalid.

Checks performed

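For multiple data modules:

Codes for languages, language families, and etymology languages must be unique and must not conflict with one another.
The canonical name of a language, language family, or etymology language must not appear in its list of other names.
Each name in a list of other names must appear only once.
otherNames, if present, must be an array.
A Wikidata item ID must be a positive integer, or a string that starts with Q and ends with decimal digits.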
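
The Wikidata ID rule and the name-list rules above can be sketched as follows. This is an illustrative Python sketch, not the module's Lua code. The function names `is_valid_wikidata_id` and `check_names` are hypothetical, and reading the string form of the ID rule as "Q followed only by decimal digits" is an assumption.

```python
import re


def is_valid_wikidata_id(value):
    """Accept a positive integer, or a string such as "Q1860"."""
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return False
    if isinstance(value, int):
        return value > 0
    if isinstance(value, str):
        # Assumed reading of the rule: "Q" followed only by decimal digits.
        return re.fullmatch(r"Q[0-9]+", value) is not None
    return False


def check_names(canonical_name, other_names):
    """Return a list of problems found in a list of other names."""
    if not isinstance(other_names, list):
        return ["otherNames must be an array"]
    problems = []
    seen = set()
    for name in other_names:
        if name == canonical_name:
            problems.append(f"{name} is the canonical name but is repeated in otherNames")
        if name in seen:
            problems.append(f"{name} appears more than once in otherNames")
        seen.add(name)
    return problems
```

For example, `check_names("Maa", ["Maa", "Mah"])` reports that the canonical name is repeated, which is the kind of discrepancy listed in the output section.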

The data used by Module:languages must meet the following conditions:

Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
If field 2 is not nil, it must be a valid Wikidata item ID.
If field 3 or family is given and not nil, it must be a valid family code.
If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
If family is given, it must be a valid family code.
If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
If sort_key is given, it may be either a string or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
If entry_name or sort_key is given, the from array must be at least as long as the to array.
If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
If link_tr is present, it must be true.
Have no data keys besides these: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
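
A few of the per-language conditions above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the module's Lua code: `check_language`, `ALLOWED_KEYS`, `RECOGNISED_TYPES`, and the message wording are hypothetical, a language's data table is modelled as a Python dict, and only the key, canonical name, type, from/to length, override_translit, and link_tr rules are covered.

```python
# Keys copied from the list above; everything else is reported as invalid.
ALLOWED_KEYS = {
    1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases",
    "varieties", "type", "scripts", "ancestors", "wikimedia_codes",
    "wikipedia_article", "standardChars", "translit", "override_translit",
    "link_tr",
}
RECOGNISED_TYPES = {"regular", "reconstructed", "appendix-constructed"}


def check_language(code, data):
    """Return a list of problems for one language's data table."""
    problems = []
    for key in data:
        if key not in ALLOWED_KEYS:
            problems.append(f"The data key {key} for {code} is invalid.")
    if not data.get(1):
        problems.append(f"{code} has no canonical name.")
    if "type" in data and data["type"] not in RECOGNISED_TYPES:
        problems.append(f"{code} has an unrecognised type {data['type']}.")
    for field in ("entry_name", "sort_key"):
        value = data.get(field)
        # Only the table form has from/to arrays; a string sort_key is skipped.
        if isinstance(value, dict):
            if len(value.get("from", [])) < len(value.get("to", [])):
                problems.append(f"{code}: the from array of {field} is shorter than the to array.")
    if data.get("override_translit") and not data.get("translit"):
        problems.append(f"{code} sets override_translit without translit.")
    if "link_tr" in data and data["link_tr"] is not True:
        problems.append(f"{code}: link_tr must be true.")
    return problems
```

Collecting problems into a list, rather than stopping at the first one, matches the way the report in the output section lists every discrepancy per data module.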

Checks not performed:

If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).

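This module does not perform these checks because, if the conditions above are not met, module errors will quickly appear in entries (for example, when Module:utilities tries to generate a sort key for a category related to the language, or when full_link tries to use the transliteration module).

Module:languages/code to canonical name and Module:languages/canonical names must contain all of the codes and canonical names found in the data submodules of Module:languages, and nothing else.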
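
The "all and only" requirement amounts to comparing two mappings in both directions. The Python sketch below illustrates the idea under stated assumptions; it is not the module's Lua code. `compare_name_tables` is a hypothetical name, the lookup module is modelled as a plain dict from code to name, and the message wording is invented for the example.

```python
def compare_name_tables(language_data, code_to_name):
    """Compare a code-to-name lookup table against the language data."""
    # Field 1 of each language's data table holds the canonical name.
    expected = {code: data[1] for code, data in language_data.items()}
    problems = []
    for code in sorted(expected.keys() - code_to_name.keys()):
        problems.append(f"{code} is missing from the lookup table.")
    for code in sorted(code_to_name.keys() - expected.keys()):
        problems.append(f"{code} is in the lookup table but not in the language data.")
    for code in sorted(expected.keys() & code_to_name.keys()):
        if expected[code] != code_to_name[code]:
            problems.append(
                f"{code}: the lookup table has {code_to_name[code]}, expected {expected[code]}."
            )
    return problems
```

The same comparison, with keys and values swapped, would apply to a name-to-code table.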

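The data used by Module:etymology languages must meet the following conditions:

canonicalName must be given.
parent must be given, and it must be a valid language, language family, or etymology language code.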
If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
Have no data keys besides these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".
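
The parent and ancestors rules for etymology languages can be sketched as a lookup against the three code tables. This Python sketch is an illustration under stated assumptions, not the module's Lua code; `check_etymology_language` and the message wording are hypothetical, and each table is modelled as a dict keyed by code.

```python
def check_etymology_language(code, data, languages, families, etymology_languages):
    """Return a list of problems for one etymology language's data table."""
    problems = []
    if not data.get("canonicalName"):
        problems.append(f"{code} has no canonicalName.")
    parent = data.get("parent")
    if parent is None:
        problems.append(f"{code} has no parent.")
    elif not (parent in languages or parent in families or parent in etymology_languages):
        problems.append(f"{code} has an invalid parent code {parent}.")
    ancestors = data.get("ancestors")
    if ancestors is not None:
        if not isinstance(ancestors, list):
            problems.append(f"{code}: ancestors must be an array.")
        else:
            for ancestor in ancestors:
                # Ancestors may be languages or etymology languages, not families.
                if ancestor not in languages and ancestor not in etymology_languages:
                    problems.append(f"{code} lists an invalid ancestor code {ancestor}.")
    return problems
```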

The data for each code in Module:families must:

Have canonicalName, which must not be the same as the canonical name of another family.
If family is given, it must be a valid family code.
Have at least one language or subfamily belonging to it.
Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
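
The "at least one language or subfamily" rule can be checked by collecting every family code that something points to and reporting the rest. The Python sketch below is an illustration under stated assumptions, not the module's Lua code; `childless_families` and the sample codes are hypothetical, and a language's family is read from field 3 or, failing that, from `family`.

```python
def childless_families(families, languages):
    """Return the codes of families that no language or subfamily belongs to."""
    has_children = set()
    for data in families.values():
        if data.get("family"):
            has_children.add(data["family"])
    for data in languages.values():
        family = data.get(3) or data.get("family")
        if family:
            has_children.add(family)
    return sorted(code for code in families if code not in has_children)


# Hypothetical data: "fam-b" has neither a language nor a subfamily.
families = {
    "fam": {"canonicalName": "Example"},
    "fam-a": {"canonicalName": "Example A", "family": "fam"},
    "fam-b": {"canonicalName": "Example B", "family": "fam"},
}
languages = {"aaa": {1: "Sample", 3: "fam-a"}}
```

Here `childless_families(families, languages)` returns `["fam-b"]`, the case that the report words as "has no child families or languages".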

The data for each code in Module:scripts must:

Have canonicalName.
Have at least one language that lists it as one of its scripts.
Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".
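
The usage and characters rules for scripts can be sketched as below. This Python sketch is an illustration under stated assumptions, not the module's Lua code: `check_scripts`, the sample codes, and the message wording are hypothetical, and it tests only whether a characters pattern is present, not whether it is a valid Lua pattern.

```python
def check_scripts(scripts, languages):
    """Report scripts that no language uses or that lack a characters pattern."""
    used = set()
    for data in languages.values():
        used.update(data.get("scripts") or [])
    problems = []
    for code in sorted(scripts):
        data = scripts[code]
        label = f"{data.get('canonicalName', '???')} ({code})"
        unused = code not in used
        no_characters = not data.get("characters")
        if unused and no_characters:
            problems.append(
                f"{label} is not used by any language and has no characters listed for auto-detection."
            )
        elif unused:
            problems.append(f"{label} is not used by any language.")
        elif no_characters:
            problems.append(f"{label} has no characters listed for auto-detection.")
    return problems


# Hypothetical data: only "Xaaa" is used by a language.
scripts = {
    "Xaaa": {"canonicalName": "Example A", "characters": "A-Za-z"},
    "Xbbb": {"canonicalName": "Example B"},
    "Xccc": {"canonicalName": "Example C", "characters": "0-9"},
}
languages = {"aaa": {1: "Sample", "scripts": ["Xaaa"]}}
```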