Language Codes
Code
HTML code
xml:lang="fr"
xml:lang="en-ca"
etc.
<i xml:lang="fr">Bonjour</i>
<span xml:lang="de">Guten Tag</span>
Script shortcut
The script replaces shortcuts written as classes, like the ones below, with the properly formatted xml:lang attribute:
lang:fr
lang:en-ca
<i class="lang:fr">Bonjour</i>
<span class="lang:de">Guten Tag</span>
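The script itself is not shown here, so the following is only a minimal sketch of how such a replacement could work. The names SHORTCUT and expand_lang_shortcuts are hypothetical, and the sketch assumes the shortcut is the only value in the class attribute.

```python
import re

# Hypothetical pattern: matches class="lang:xx" or class="lang:xx-yy" when the
# shortcut is the only value in the class attribute (an assumption).
SHORTCUT = re.compile(r'class="lang:([A-Za-z]{2,3}(?:-[A-Za-z0-9]{2,8})*)"')


def expand_lang_shortcuts(markup: str) -> str:
    """Replace class="lang:xx" shortcuts with xml:lang="xx" attributes."""
    return SHORTCUT.sub(r'xml:lang="\1"', markup)


print(expand_lang_shortcuts('<i class="lang:fr">Bonjour</i>'))
print(expand_lang_shortcuts('<span class="lang:de">Guten Tag</span>'))
```

An element that carries other classes alongside the shortcut would need an HTML parser rather than a regular expression.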
Codes and Subtags
Language codes are referred to as subtags.
- The official list: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
- Wikipedia table: https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes
- A lookup tool: https://r12a.github.io/app-subtags/
- The official ONIX list: https://ns.editeur.org/onix/en/74
Tag explanations
A great article: https://www.w3.org/International/articles/language-tags/
Language Tags can have several components:
Major languages are represented by a two- or three-letter code (the primary language subtag), e.g. English: en; French: fr; Cree: cr; Albanian: sq; Latin: la (yes, some Latin should be tagged).
A region subtag can also be attached, e.g. Canadian French: fr-CA vs Luxembourg French: fr-LU. Some region tags: https://www.andiamo.co.uk/resources/iso-language-codes/
Alternatively, there can be a script subtag, e.g. Serbian in Cyrillic script: sr-Cyrl vs Serbian in Latin script: sr-Latn. (Note: you would not normally use the Latn subtag in a book that is mainly in Latin script.)
Examples of language tags including extlang subtags are:
zh-yue (Cantonese Chinese)
ar-afb (Gulf Arabic)
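To make the components concrete, here is a rough sketch that splits a tag into the pieces described above. It is a simplification, not a full BCP 47 parser: it handles only language, extlang, script, and region, and it does not check subtags against the registry. The names TAG and split_language_tag are made up for this illustration.

```python
import re

# Simplified pattern covering only language[-extlang][-script][-region].
# Variants, extensions, and private-use tags are deliberately out of scope.
TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<extlang>[a-z]{3}))?"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?$",
    re.IGNORECASE,
)


def split_language_tag(tag: str) -> dict:
    """Return the subtags present in `tag`, in conventional letter case."""
    match = TAG.match(tag)
    if match is None:
        raise ValueError(f"not a simple language tag: {tag!r}")
    parts = {k: v for k, v in match.groupdict().items() if v is not None}
    # Conventional casing: language and extlang lowercase,
    # script in title case, region uppercase.
    for key in ("language", "extlang"):
        if key in parts:
            parts[key] = parts[key].lower()
    if "script" in parts:
        parts["script"] = parts["script"].title()
    if "region" in parts:
        parts["region"] = parts["region"].upper()
    return parts


for example in ("fr-CA", "sr-Cyrl", "zh-yue"):
    print(example, split_language_tag(example))
```

Subtags are case-insensitive, so en-ca and en-CA identify the same language; the casing applied here is only the customary way of writing them.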
Made up languages
Can be tagged:
Klingon: tlh;
Esperanto: eo;
If it is a one-off made-up language, it is preceded by x-, e.g. x-arcturan
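A quick way to check that a one-off tag is well formed is sketched below. It assumes the BCP 47 rule that a private-use tag is x followed by one or more subtags of one to eight letters or digits each; the names PRIVATE_USE and is_private_use are hypothetical.

```python
import re

# Private-use tags: "x" followed by one or more subtags,
# each made of 1 to 8 letters or digits.
PRIVATE_USE = re.compile(r"^x(?:-[a-z0-9]{1,8})+$", re.IGNORECASE)


def is_private_use(tag: str) -> bool:
    """True if `tag` is a whole private-use tag such as x-arcturan."""
    return PRIVATE_USE.match(tag) is not None


print(is_private_use("x-arcturan"))
print(is_private_use("tlh"))
```

Note the eight-character limit per subtag: a longer invented name has to be shortened or split across several subtags.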
Non-linguistic
Use the subtag zxx when the text is known to be not in any language.
xml:lang="zxx"
<p>Here is a list of part numbers: <span xml:lang="zxx">9RUI34 8XOS12 3TYY85</span>.</p>
Undetermined
xml:lang="und"
However, you should only tag text as undetermined if you can't just leave it as is. In practice, this means you should only use this markup if the undetermined text is embedded in content that has already been labeled for language in some way, e.g.
<i xml:lang="fr">Mon frère est tout <span xml:lang="und">kazplumed</span>.</i>
Common languages
- French: fr
- German: de
- Italian: it
- Spanish: es
- Japanese: ja
- Yiddish: yi
- Russian: ru
Indigenous languages
A listing of some North American indigenous languages.
Be sure to use the lookup tool (https://r12a.github.io/app-subtags/) to check whether there are any sublanguages, e.g. sal (Salishan) has both slh (Southern Puget Sound Salish) and str (Straits Salish).
West Coast Languages
https://en.wikipedia.org/wiki/Salishan_languages
- Chinook jargon (Chinuk Wawa): chn
- Gwichya Gwich’in: gwi
- Haida: hai
- Hul’q’umi’num’: hur
- Kutenai (Kootenai): kut
- Kwakʼwala: kwk
- Salishan languages (Salish): sal
  - Southern Puget Sound Salish: slh
  - Straits Salish: str
- Squamish: squ
- Tlingit: tli
- Tsimshian: tsi
  - includes Sm’algya̱x
- Wakashan languages: wak
Other North American languages
- Athapascan languages: ath
- Blackfoot/Siksika: bla
- Chipewyan: chp
- Cree: cr
  - Plains Cree: crk
- Dakota (Sioux): dak
- Delaware (Munsee): del
- Dogrib (Tli Cho): dgr
- Inuktitut: iu
- Inupiaq: ik
- Lakota: lkt
- Michif: crg
- Micmac; Mi'kmaq: mic
- Mohawk: moh
- Nēhiyawēwin: crk
- Ojibwa (Anishinaabemowin): oj
- Paiute: pao
- Slave (Athapascan): den
- Shoshone (Eastern): shh
— source: https://www.nationsonline.org/oneworld/language_code.htm
Spellings
- You can represent Kwaka̱ka̱wakw as Kwaka̱ka̱wakw or maybe Kwakwa̱ka̱ʼwakw