S In Different Languages

 Posted by admin

Mother in Different Languages. Find out how the sweetest word, ‘Mother’, is said in different languages around the world! If you know of a way of saying mother in a language we missed, please help us enhance this page by writing to us.

Apps include resources that can be specific to a particular culture. For example, an app can include culture-specific strings that are translated to the language of the current locale. It's a good practice to keep culture-specific resources separated from the rest of your app. Android resolves language- and culture-specific resources based on the system locale setting. You can provide support for different locales by using the resources directory in your Android project.

You can specify resources tailored to the culture of the people who use your app. You can provide any resource type that is appropriate for the language and culture of your users. For example, the following screenshot shows an app displaying string and drawable resources in the device's default (en_US) locale and the Spanish (es_ES) locale.

Figure 1. App using different resources depending on the current locale

If you created your project using the Android SDK Tools (read Creating an Android Project), the tools create a res/ directory in the top level of the project. Within this res/ directory are subdirectories for various resource types. There are also a few default files such as res/values/strings.xml, which holds your string values.

Supporting different languages goes beyond using locale-specific resources. Some users choose a language that uses right-to-left (RTL) scripts, such as Arabic or Hebrew, for their UI locale. Other users view or generate content in a language that uses RTL scripts, even though they've set a language that uses LTR scripts, such as English, as their UI locale. To support both types of users, your app needs to do the following:

  • Employ an RTL UI layout for RTL locales.
  • Detect and declare the direction of text data that's displayed inside formatted messages. Usually, you can just call a method that determines the direction of text data for you.

Create locale directories and resource files

To add support for more locales, create additional directories inside res/. Each directory's name should adhere to the following format:

<resource type>-b+<language code>[+<country code>]

For example, values-b+es/ contains string resources for locales with the language code es. Similarly, mipmap-b+es+ES/ contains icons for locales with the es language code and the ES country code. Android loads the appropriate resources according to the locale settings of the device at runtime. For more information, see Providing Alternative Resources.
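
As an illustration of the directory naming pattern, the following plain-Java sketch builds a directory name from a resource type and a locale. The ResourceDirs class and its directoryFor() method are hypothetical helpers written for this example; they are not part of the Android SDK, and Android resolves these directories for you at runtime.

```java
import java.util.Locale;

// Hypothetical helper, not an Android API: builds a resource directory
// name such as "values-b+es" or "mipmap-b+es+ES" from a locale.
class ResourceDirs {
    static String directoryFor(String resourceType, Locale locale) {
        String language = locale.getLanguage();
        if (language.isEmpty()) {
            // No language code: fall back to the default resource directory.
            return resourceType;
        }
        StringBuilder name = new StringBuilder(resourceType);
        name.append("-b+").append(language);
        String country = locale.getCountry();
        if (!country.isEmpty()) {
            name.append('+').append(country);
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(directoryFor("values", Locale.forLanguageTag("es")));    // values-b+es
        System.out.println(directoryFor("mipmap", Locale.forLanguageTag("es-ES"))); // mipmap-b+es+ES
    }
}
```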

After you’ve decided on the locales to support, create the resource subdirectories and files. For example:

For example, the following are some different resource files for different languages:

English strings (default locale), /values/strings.xml:

Spanish strings (es locale), /values-es/strings.xml:

United States' flag icon (default locale), /mipmap/country_flag.png:

Figure 2. Icon used for the default (en_US) locale

Spain's flag icon (es_ES locale), /mipmap-b+es+ES/country_flag.png:

Figure 3. Icon used for the es_ES locale

Note: You can use the locale qualifier (or any configuration qualifier) on any resource type, such as if you want to provide localized versions of your bitmap drawable. For more information, see Localization.

Use the resources in your app

You can reference the resources in your source code and other XML files using each resource's name attribute.

In your source code, you can refer to a resource using the syntax R.<resource type>.<resource name>. There are a variety of methods that accept a resource this way.

For example:

In other XML files, you can refer to a resource with the syntax @<resource type>/<resource name> whenever the XML attribute accepts a compatible value.

For example:

Format text in messages

One of the most common tasks in an app is formatting text. Localized messages get formatted by inserting text and numeric data into the appropriate positions. Unfortunately, when dealing with an RTL UI or RTL data, simple formatting can display incorrect or even unreadable text output.

Languages such as Arabic, Hebrew, Persian, and Urdu are written in an RTL direction overall. Some of their elements, however, such as numbers and embedded LTR text, are written in the LTR direction within the otherwise RTL text. Languages that use LTR scripts, including English, are also bidirectional because they can contain embedded RTL scripts that need to be displayed in an RTL direction.

Most of the time, it's the apps themselves that generate such instances of embedded opposite-direction text. They insert text data of an arbitrary language (and an arbitrary text direction) into localized messages. This mixing of directions often doesn't include a clear indication of where opposite-direction text starts and ends. These characteristics of app-generated text cause most of the problems.

Although the system's default handling of bidirectional text usually renders text as expected, it's possible that text won't render properly when your app inserts it into a localized message. The following situations present examples of cases where it's more likely that text won't appear correctly:

  • Inserted at the very start of the message:

    PERSON_NAME is calling you

  • Starts with a number, such as in addresses or telephone numbers:

    987 654-3210

  • Starts with punctuation, such as in a phone number:

    +19876543210

  • Ends with punctuation:

    Are you sure?

  • Contains both directions already:

    The word בננה is Hebrew for banana.

Example

For example, assume that an app sometimes needs to display the message 'Did you mean %s?', with an address inserted in place of the %s at runtime. Because the app supports different UI locales, the message comes from a locale-specific resource and uses the RTL direction when an RTL locale is in use. For a Hebrew UI, it should appear as follows:

האם התכוונת ל %s?

The suggestion, however, might come from a database that doesn't include text in the locale's language. For example, if the address in question is for a place in California, it appears in the database using English text. If you insert the address '15 Bay Street, Laurel, CA' into the RTL message without providing any hints regarding text direction, the result isn't expected or correct:

האם התכוונת ל 15 Bay Street, Laurel, CA?

Note that the house number appears to the right of the address, not to the left as intended, and makes the house number look more like a strange postal code. The same problem may occur if you include RTL text within a message that uses the LTR text direction.

Explanation and solution

The problem in the previous example occurs because the text formatter doesn't specify that '15' is part of the address, so the system cannot determine whether the '15' is part of the RTL text that comes before it or the LTR text that comes after it.

To solve this problem, use the unicodeWrap() method, found in the BidiFormatter class, on every piece of text that you insert into a localized message. The only times when you shouldn't use unicodeWrap() include the following:

  • The text is being inserted into a machine-readable string, such as a URI or a SQL query.
  • You already know that the piece of text is properly wrapped.

The unicodeWrap() method detects the direction of a string and wraps it in Unicode formatting characters that declare that direction. Because the '15' now appears inside text that is declared as LTR, it's displayed in the correct position:

האם התכוונת ל 15 Bay Street, Laurel, CA?
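
To make the idea of direction wrapping concrete, here is a simplified plain-Java sketch. It is not the BidiFormatter implementation: it assumes a first-strong-character heuristic and always wraps the text, whereas the real class also takes the surrounding context into account. The BidiWrapSketch class and its methods are hypothetical names used only for this illustration.

```java
// Simplified illustration of direction wrapping; not Android's BidiFormatter.
class BidiWrapSketch {
    private static final char LRE = '\u202A'; // LEFT-TO-RIGHT EMBEDDING
    private static final char RLE = '\u202B'; // RIGHT-TO-LEFT EMBEDDING
    private static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING

    // Returns true if the first strongly directional character is RTL.
    // Digits, spaces, and punctuation are skipped because they are not strong.
    static boolean startsRtl(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            byte dir = Character.getDirectionality(cp);
            if (dir == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return false;
            }
            if (dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false; // No strong character found: assume LTR.
    }

    // Wraps the text in embedding characters that declare its direction.
    static String wrap(String text) {
        char open = startsRtl(text) ? RLE : LRE;
        return open + text + PDF;
    }

    public static void main(String[] args) {
        String address = "15 Bay Street, Laurel, CA";
        // The leading "15" is now declared as part of an LTR run.
        String message = String.format("Did you mean %s?", wrap(address));
        System.out.println(message.length()); // two characters longer than the unwrapped message
    }
}
```

In a real app, prefer the framework's unicodeWrap() over a hand-rolled helper like this one.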

The following code snippet demonstrates how to use unicodeWrap():

Note: If your app targets Android 4.3 (API level 18) or higher, use the version of BidiFormatter found in the Android Framework. Otherwise, use the version of BidiFormatter found in the Support Library.

Format numbers

Use format strings, not method calls, to convert numbers to strings in your app's logic:

This formats the numbers appropriately for your locale, which may include using a different set of digits.
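
A minimal plain-Java sketch of locale-aware number formatting follows. The NumberDisplay class is a hypothetical name, and the locale is passed explicitly here only so the effect is visible; in an app you would typically rely on the user's current locale.

```java
import java.util.Locale;

class NumberDisplay {
    // Formats a count for display using the conventions of the given locale.
    static String formatCount(Locale locale, int count) {
        return String.format(locale, "%,d", count);
    }

    public static void main(String[] args) {
        int count = 1234567;
        System.out.println(formatCount(Locale.US, count));      // 1,234,567
        System.out.println(formatCount(Locale.GERMANY, count)); // 1.234.567
        // Locales with their own digits may print non-ASCII digits here,
        // depending on the locale data available at runtime.
        System.out.println(formatCount(Locale.forLanguageTag("fa"), count));
    }
}
```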

When you use String.format() to create a SQL query on a device whose locale uses its own set of digits, such as Persian and most Arabic locales, problems occur if any of the parameters to the query are numbers. This is because the number is formatted in the locale's digits, and these digits are invalid in SQL.

To preserve ASCII-formatted numbers and keep the SQL query valid, you should instead use the overloaded version of String.format() that includes a locale as the first parameter. The locale argument should be Locale.US.
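
The following sketch shows the Locale.US overload in use. The SqlNumbers class and the items table are hypothetical, chosen only for the example; in production code, bound query parameters are usually a safer choice than string formatting.

```java
import java.util.Locale;

class SqlNumbers {
    // Builds a query whose numeric parameter is always written in ASCII
    // digits, regardless of the device's default locale.
    static String selectById(int id) {
        return String.format(Locale.US, "SELECT * FROM items WHERE id = %d", id);
    }

    public static void main(String[] args) {
        System.out.println(selectById(42)); // SELECT * FROM items WHERE id = 42
    }
}
```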

Support layout mirroring


People who use RTL scripts prefer an RTL user interface, which includes right-aligned menus, right-aligned text, and forward arrows pointing to the left.

Figure 4 shows the contrast between the LTR version of a screen within the Settings app and its RTL counterpart:

When adding RTL support to your app, it's particularly important to keep the following points in mind:

  • RTL text mirroring is only supported in apps when used on devices running Android 4.2 (API level 17) or higher. To learn how to support text mirroring on older devices, see Provide support for legacy apps.
  • To test whether your app supports an RTL text direction, test using developer options and invite people who use RTL scripts to use your app.

Note: To view additional design guidelines related to layout mirroring, including a list of elements that you should and shouldn't mirror, see the Bidirectionality material design guidelines.

To mirror the UI layout in your app so that it appears RTL in an RTL locale, complete the steps in the following sections.

Modify the build and manifest files

Modify your app module's build.gradle file and app manifest file as follows:

build.gradle (Module: app)

AndroidManifest.xml

Note: If your app targets Android 4.1.1 (API level 16) or lower, the android:supportsRtl attribute is ignored, along with any start and end attribute values that appear in your app's layout files. In this case, RTL layout mirroring doesn't happen automatically in your app.

Update existing resources

Convert left and right to start and end, respectively, in each of your existing layout resource files. By doing this, you allow the framework to align your app's UI elements based on the user's language settings.

Note: Before updating your resources, learn how to provide support for legacy apps, or apps that target Android 4.1.1 (API level 16) and lower.

To use the framework's RTL alignment capabilities, change the attributes in your layout files that appear in Table 1.

Table 1. Attributes to use when your app supports multiple text directions

Attribute supporting LTR only | Attribute supporting LTR and RTL
android:gravity='left' | android:gravity='start'
android:gravity='right' | android:gravity='end'
android:layout_gravity='left' | android:layout_gravity='start'
android:layout_gravity='right' | android:layout_gravity='end'
android:paddingLeft | android:paddingStart
android:paddingRight | android:paddingEnd
android:drawableLeft | android:drawableStart
android:drawableRight | android:drawableEnd
android:layout_alignLeft | android:layout_alignStart
android:layout_alignRight | android:layout_alignEnd
android:layout_marginLeft | android:layout_marginStart
android:layout_marginRight | android:layout_marginEnd
android:layout_alignParentLeft | android:layout_alignParentStart
android:layout_alignParentRight | android:layout_alignParentEnd
android:layout_toLeftOf | android:layout_toStartOf
android:layout_toRightOf | android:layout_toEndOf

Table 2 shows how the system handles UI alignment attributes based on the target SDK version, whether left and right attributes are defined, and whether start and end attributes are defined.

Table 2. UI element alignment behavior based on the target SDK version and defined attributes

Targeting Android 4.2 (API level 17) or higher? | Left and right defined? | Start and end defined? | Result
Yes | Yes | Yes | start and end resolved, and override left and right
Yes | Yes | No | Only left and right are used
Yes | No | Yes | Only start and end are used
No | Yes | Yes | left and right are used (start and end are ignored)
No | Yes | No | Only left and right are used
No | No | Yes | start and end resolved to left and right
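
The rows of Table 2 can be read as a small decision procedure. The plain-Java sketch below is one interpretation of the table, not framework code: the AlignmentResolver class is hypothetical, it assumes that attributes are defined in pairs, and it assumes that an app targeting API level 17 or higher is running on a device that supports RTL mirroring.

```java
// Hypothetical model of Table 2; not an Android API.
class AlignmentResolver {
    // leftRight and startEnd are {first, second} value pairs, or null when
    // that pair of attributes is not defined. Returns the effective
    // {left, right} values, or null when nothing is defined.
    static int[] resolve(boolean targetsApi17, boolean rtl,
                         int[] leftRight, int[] startEnd) {
        boolean useStartEnd = startEnd != null && (targetsApi17 || leftRight == null);
        if (!useStartEnd) {
            // Only left and right are used (start and end absent or ignored).
            return leftRight;
        }
        if (targetsApi17 && rtl) {
            // In an RTL layout, start maps to right and end maps to left.
            return new int[] {startEnd[1], startEnd[0]};
        }
        // LTR layout, or a legacy target: start maps to left, end to right.
        return new int[] {startEnd[0], startEnd[1]};
    }

    public static void main(String[] args) {
        int[] resolved = resolve(true, true, new int[] {1, 2}, new int[] {8, 16});
        System.out.println(resolved[0] + ", " + resolved[1]); // 16, 8
    }
}
```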

Add direction- and language-specific resources

This step involves adding specific versions of your layout, drawables, and values resource files that contain customized values for different languages and text directions.

In Android 4.2 (API level 17) and higher, you can use the -ldrtl (layout-direction-right-to-left) and -ldltr (layout-direction-left-to-right) resource qualifiers. To maintain backward compatibility with loading existing resources, older versions of Android use a resource's language qualifiers to infer the correct text direction.

Suppose that you want to add a specific layout file to support RTL scripts, such as the Hebrew, Arabic, and Persian languages. To do this, you add a layout-ldrtl/ directory in your res/ directory, as shown in the following example:

If you want to add a specific version of the layout that is designed for only Arabic text, your directory structure becomes the following:

Note: Language-specific resources take precedence over layout-direction-specific resources, which take precedence over the default resources.

Use supported widgets

As of Android 4.2 (API level 17), most framework UI elements support the RTL text direction automatically. However, several framework elements, such as ViewPager, don't support the RTL text direction.

Home-screen widgets support the RTL text direction as long as their corresponding manifest files include the attribute assignment android:supportsRtl='true'.

Provide support for legacy apps

If your app targets Android 4.1.1 (API level 16) or lower, also include left and right attributes, in addition to start and end.

To check whether your layout should use the RTL text direction, use the following logic:
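
As a framework-free illustration of such a check, the sketch below decides the direction from a locale's language code. This is an assumption-laden stand-in rather than the Android logic: the RtlCheck class is hypothetical, and its hard-coded set of language codes is deliberately small and not exhaustive. On a device, the view's own layout direction is the authoritative source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class RtlCheck {
    // Assumed, non-exhaustive set of language codes written in RTL scripts.
    // "iw" is the legacy code that older Java versions report for Hebrew.
    private static final Set<String> RTL_LANGUAGES =
            new HashSet<>(Arrays.asList("ar", "fa", "he", "iw", "ur"));

    static boolean shouldUseRtl(Locale locale) {
        return RTL_LANGUAGES.contains(locale.getLanguage());
    }

    public static void main(String[] args) {
        System.out.println(shouldUseRtl(Locale.forLanguageTag("ar"))); // true
        System.out.println(shouldUseRtl(Locale.ENGLISH));              // false
    }
}
```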

Note: To avoid compatibility issues, use version 23.0.1 or higher of the Android SDK Build Tools.

Test using developer options

On devices running Android 4.4 (API level 19) or higher, you can enable Force RTL layout direction in the on-device developer options. This setting allows you to see text that uses LTR scripts, such as English text, in RTL mode.

Update app logic

This section describes specific places in your app's logic that you should update when adapting your app for handling multiple text directions.

Property changes

To handle a change in any RTL-related property (such as layout direction, layout parameters, padding, text direction, text alignment, or drawable positioning), you can use the onRtlPropertiesChanged() callback. This callback allows you to get the current layout direction and update an activity's View objects accordingly.

Views

If you are creating a UI widget that is not directly part of an activity's view hierarchy, such as a dialog or a toast-like UI element, set the correct layout direction depending on the context. The following code snippet demonstrates how to complete this process:

Several methods of the View class require additional consideration:

onMeasure()
View measurements might vary depending on text direction.
onLayout()
If you create your own layout implementation, then you'll need to call super.onLayout() in your version of onLayout() and adapt your custom logic to support RTL scripts.
onDraw()
If you're implementing a custom view or adding advanced functionality to a drawing, you'll need to update your code to support RTL scripts. Use the following code to determine whether your widget is in RTL mode:

Drawables

If you have a drawable that needs to be mirrored for an RTL layout, complete one of these steps based on the version of Android running on the device:

  • On devices running Android 4.3 (API level 18) and lower, you need to add and define the -ldrtl resource files.
  • On Android 4.4 (API level 19) and higher, you can use android:autoMirrored='true' when defining your drawable, which allows the system to handle RTL layout mirroring for you.

    Note: The android:autoMirrored attribute only works for simple drawables whose bidirectional mirroring is simply a graphical mirroring of the entire drawable. If your drawable contains multiple elements, or if reflecting your drawable would change its interpretation, you should perform the mirroring yourself. Whenever possible, check with a bidirectional expert to determine whether your mirrored drawables make sense to users.

Gravity

If your app's code is using Gravity.LEFT or Gravity.RIGHT, you will need to change these values to Gravity.START and Gravity.END, respectively.

For example, if you're using the following code:

...you need to change it to the following:

This means that you can keep your existing code that handles left-aligned and right-aligned values, even if you are using start and end for your gravity values.

Note: When applying your gravity settings, use an overloaded version of Gravity.apply() that includes a layoutDirection argument.

Margin and padding

To support RTL scripts in your app, follow these best practices related to margin and padding values:

  • Use getMarginStart() and getMarginEnd() instead of the direction-specific attribute equivalents, leftMargin and rightMargin.
  • When using setMargins(), swap the values of the left and right arguments if your app detects RTL scripts.
  • If your app includes custom padding logic, override setPadding() and setPaddingRelative().



This list is a Language recognition chart. It describes a variety of simple clues one can use to determine what language a document is written in with high accuracy.


Characters

The language of a foreign text can often be identified by looking up characters specific to that language.

  • ABCDEFGHIJKLMNOPQRSTUVWXYZ (Latin alphabet)
    • and no other – English, Indonesian, Latin, Malay, Swahili, Zulu
    • àéëïij – Dutch (Except for the ligature ij, these letters are very rare in Dutch. Even fairly long Dutch texts often have no diacritics.)
    • áêéèëïíîôóúû – Afrikaans
    • êôúû – West Frisian
    • ÆØÅæøå – Danish, Norwegian
    • single diacritics, mostly umlauts
      • ÄÖäö – Finnish (BCDFGQWXZÅbcfgqwxzå only found in names and loanwords, occasionally also ŠšŽž)
      • ÅÄÖåäö – Swedish (occasionally é)
      • ÄÖÕÜäöõü – Estonian
      • ÄÖÜäöüß – German
    • Circumflexes
      • ÇÊÎŞÛçêîşû – Kurdish
      • ĂÎÂŞŢăîâşţ – Romanian
      • ÂÊÎÔÛŴŶÁÉÍÏâêîôûŵŷáéíï – Welsh; (ÓÚẂÝÀÈÌÒÙẀỲÄËÖÜẄŸóúẃýàèìòùẁỳäëöüẅÿ used also but much less commonly)
      • ĈĜĤĴŜŬĉĝĥĵŝŭ – Esperanto
    • Three or more types of diacritics
      • ÇĞİÖŞÜğçıöşü – Turkish
      • ÁÐÉÍÓÚÝÞÆÖáðéíóúýþæö – Icelandic
      • ÁÉÍÓÖŐÚÜŰáéíóöőúüű – Hungarian
      • ÀÇÉÈÍÓÒÚÜÏàçéèíóòúüï· – Catalan
      • ÀÂÇÉÈÊËÎÏÔŒÙÛÜŸàâçéèêëîïôœùûüÿ – French; (diacritics on uppercase characters are often optional; Ÿ and ÿ are found only in certain proper names)
      • ÁÀÇÉÈÍÓÒÚËÜÏáàçéèíóòúëüï (· only in Gascon dialect) – Occitan
      • ÁÉÍÓÚÂÊÔÀãõçáéíóúâêôà (ü Brazilian and k, w and y not in native words) – Portuguese
    • ÁÉÍÑÓÚÜáéíñóúü ¡¿ – Spanish
    • ÀÉÈÌÒÙàéèìòù – Italian
    • ÁÉÍÓÚÝÃẼĨÕŨỸÑG̃áéíóúýãẽĩõũỹñg̃ - Guarani (the only language to use g̃)
    • ÁĄĄ́ÉĘĘ́ÍĮĮ́ŁŃ áąą́éęę́íįį́łń (FQRVfqrv not in native words) – Southern Athabaskan languages
      • ’ÓǪǪ́ āą̄ēę̄īį̄óōǫǫ́ǭúū – Western Apache
      • 'ÓǪǪ́ óǫǫ́ – Navajo
      • ’ÚŲŲ́ úųų́ – Chiricahua/Mescalero
    • ąłńóż – Lechitic languages
      • ćęśź – Polish
      • ćśůź – Silesian
      • ãéëòôù – Kashubian
    • A, Ą, Ã, B, C, D, E, É, Ë, F, G, H, I, J, K, L, Ł, M, N, Ń, O, Ò, Ó, Ô, P, R, S, T, U, Ù, W, Y, Z, Ż – Kashubian
    • ČŠŽ
      • and no other – Slovene
      • ĆĐ – Bosnian, Croatian, Serbian Latin
      • ÁĎÉĚŇÓŘŤÚŮÝáďéěňóřťúůý – Czech
      • ÁÄĎÉÍĽĹŇÓÔŔŤÚÝáäďéíľĺňóôŕťúý – Slovak
      • ĀĒĢĪĶĻŅŌŖŪāēģīķļņōŗū – Latvian; (ŌŖ and ōŗ no longer used in most modern day Latvian)
      • ĄĘĖĮŲŪąęėįųū – Lithuanian
    • ĐÀẢÃÁẠĂẰẲẴẮẶÂẦẨẪẤẬÈẺẼÉẸÊỀỂỄẾỆÌỈĨÍỊÒỎÕÓỌÔỒỔỖỐỘƠỜỞỠỚỢÙỦŨÚỤƯỪỬỮỨỰỲỶỸÝỴ đàảãáạăằẳẵắặâầẩẫấậèẻẽéẹêềểễếệìỉĩíịòỏõóọồổỗốơờởỡớợùủũúụưừửữứựỳỷỹýỵ – Vietnamese
      • ꞗĕŏŭo᷄ơ᷄u᷄ – Middle Vietnamese
    • ā ē ī ō ū – May be seen in some Japanese texts in Rōmaji or transcriptions (see below) or Hawaiian and Māori texts.
    • é – Sundanese
    • ñ - Basque
  • ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي (Arabic script)
    • Arabic, Malay (Jawi), Kurdish (Soranî), Panjabi / Punjabi, Pashto, Sindhi, Urdu, others.
    • پ چ ژ گ – Persian (Farsi)
  • Brahmic family of scripts
    • Bengali script
      • অ আ কা কি কী উ কু ঊ কূ ঋ কৃ এ কে ঐ কৈ ও কো ঔ কৌ ক্ কত্‍ কং কঃ কঁ ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য র ৰ ল ৱ শ ষ স হ য় ড় ঢ় ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯
      • used to write Bengali and Assamese.
    • Devanāgarī
      • अ प आ पा इ पि ई पी उ पु ऊ पू ऋ पृ ॠ पॄ ऌ पॢ ॡ पॣ ऍ पॅ ऎ पॆ ए पे ऐ पै ऑ पॉ ऒ पॊ ओ पो औ पौ क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ल ळ व श ष स ह ० १ २ ३ ४ ५ ६ ७ ८ ९ प् पँ पं पः प़ पऽ
      • used to write, either along with other scripts or exclusively, several Indian languages including Sanskrit, Hindi, Maithili, Magahi, Marathi, Kashmiri, Sindhi, Bhili, Konkani, Bhojpuri and Nepali from Nepal.
    • Gurmukhi
      • ਅਆਇਈਉਊਏਐਓਔਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਲ਼ਵਸ਼ਸਹ
      • primarily used to write Punjabi as well as Braj Bhasha, Khariboli (and other Hindustani dialects), Sanskrit and Sindhi.
    • Gujarati script
      • અ આ ઇ ઈ ઉ ઊ ઋ ઌ ઍ એ ઐ ઑ ઓ ઔ ક ખ ગ ઘ ઙ ચ છ જ ઝ ઞ ટ ઠ ડ ઢ ણ ત થ દ ધ ન પ ફ બ ભ મ ય ર લ ળ વ શ ષ સ હ ૠ ૡૢૣ
      • used to write Gujarati and Kachchi
    • Tibetan script
      • ཀ ཁ ག ང ཅ ཆ ཇ ཉ ཏ ཐ ད ན པ ཕ བ མ ཙ ཚ ཛ ཝ ཞ ཟ འ ཡ ར ལ ཤ ས ཧ ཨ
      • used to write Standard Tibetan, Dzongkha (Bhutanese), and Sikkimese
  • АБВГДЕЖЗИКЛМНОПРСТУФХЦЧШ (Cyrillic alphabet)
    • ЙЩЬЮЯ
      • Ъ – Bulgarian
      • ЁЫЭ
        • Ў, no Щ, І instead of И (Ґ in some variants) – Belarusian
        • rarely Ъ – Russian
      • ҐЄІЇ – Ukrainian
    • ЉЊЏ, Ј instead of Й (Vuk Karadžić's reform)
      • ЃЌЅ – Macedonian
      • ЋЂ – Serbian
    • ЄꙂꙀЗІЇꙈОуꙊѠЩЪꙐЬѢЮꙖѤѦѨѪѬѮѰѲѴҀ – Old Church Slavonic, Church Slavonic
    • Ӂ – Romanian in Transnistria (elsewhere in Latin)
  • ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ αβγδεζηθικλμνξοπρσςτυφχψω (Greek Alphabet) – Greek
  • אבגדהוזחטיכלמנסעפצקרשת (Hebrew alphabet)
    • and maybe some odd dots and lines above, below, or inside characters – Hebrew
    • פֿ; dots/lines below letters appearing only with א,י, and ו – Yiddish
    • no dots or lines around the letters, and more than a few words end with א (i.e., they have it at the leftmost position) – Aramaic
  • 漢字文化圈 – Some East Asian Languages
    • and no other – Chinese
    • with あいうえおの Hiragana and/or アイウエオノ Katakana – Japanese
  • 위키백과에 (note commonplace ellipses and circles) – Korean
  • ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏ etc. -- ㄓㄨˋㄧㄣㄈㄨˊㄏㄠˋ (Bopomofo)
    • ㄪㄫㄬ -- not Mandarin
  • កខគឃងចឆជឈញដឋឌឍណតថទធនបផពភមសហយរលឡអវអ្កអ្ខអ្គអ្ឃអ្ងអ្ចអ្ឆអ្ឈអ្ញអ្ឌអ្ឋអ្ឌអ្ឃអ្ណអ្តអ្ថអ្ទអ្ធអ្នអ្បអ្ផអ្ពអ្ភអ្មអ្សអ្ហអ្យអ្រអ្យអ្លអ្អអ្វ អក្សរខ្មែរ (Khmer alphabet) - Khmer
  • Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք Օ Ֆ (Armenian alphabet) – Armenian
  • ა ბ გდ ევ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ (Georgian alphabet) – Georgian
  • AEIOUHKLMNPW' Hawaiian alphabet - Hawaiian
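
The script-level and distinctive-letter clues in the list above can be approximated in code. The plain-Java sketch below is a rough illustration under stated assumptions, not a reliable language detector: the ScriptClues class is hypothetical, dominantScript() only reports the most frequent Unicode script among the letters, and latinHint() checks just a handful of the distinctive characters listed above (written as Unicode escapes so the source stays ASCII).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ScriptClues {
    // Counts letters per Unicode script and returns the most frequent one.
    static Character.UnicodeScript dominantScript(String text) {
        Map<Character.UnicodeScript, Integer> counts = new LinkedHashMap<>();
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum));
        Character.UnicodeScript best = Character.UnicodeScript.UNKNOWN;
        int bestCount = 0;
        for (Map.Entry<Character.UnicodeScript, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    // Checks a few distinctive characters for Latin-script text.
    static String latinHint(String text) {
        if (text.indexOf('\u00DF') >= 0) return "German";   // sharp s
        if (text.indexOf('\u00BF') >= 0 || text.indexOf('\u00A1') >= 0) {
            return "Spanish";                                // inverted ? and !
        }
        if (text.indexOf('\u0151') >= 0 || text.indexOf('\u0171') >= 0) {
            return "Hungarian";                              // o and u with double acute
        }
        if (text.indexOf('\u011F') >= 0 || text.indexOf('\u0131') >= 0) {
            return "Turkish";                                // g with breve, dotless i
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(dominantScript("\u05D1\u05E0\u05E0\u05D4")); // HEBREW
        System.out.println(latinHint("Stra\u00DFe"));                   // German
    }
}
```

A single distinctive character is weak evidence on its own (names and loanwords carry foreign letters), so a hint like this is best combined with the word-level clues described in the following sections.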

Latin alphabet (possibly extended)

Romance languages

Lots of Latin roots.

French (le français)

  • Accented letters: â ç è é ê î ô û, rarely ë ï; ù only in the word où, à only in the word à and at the end of words; never á í ì ó ò ú
  • Angle quotation marks: « » (though 'curly-Q' quotation marks are also used); dialogue traditionally indicated by means of dashes
  • Many apostrophised contractions, i.e. words beginning with l' or d', less often c', j', m', n', s', t' — only before vowels and h
  • Common words: de, la, le, du, des, il, et;
  • Letter w is rare and used only in loanwords (e.g. whisky).
  • Ligatures œ and æ are conventional
  • Words ending in -ux, especially -aux or -eux;

Jersey Norman / Jèrriais (Jèrriais)

  • Common words: tchi, ès, i', ch'
  • Tch, dg, th and în are common character combinations. ou is frequently followed by another vowel.
  • Many apostrophised short forms, e.g. words beginning with l', d' or r'. é frequently alternates with an apostrophe e.g. c'mîn/quémîn.

Spanish (Español)

  • Characters: ¿ ¡ (inverted question and exclamation marks), ñ
  • All vowels (á, é, í, ó, ú) may take an acute accent
  • The letter u can take a diaeresis (ü), but only after the letter g
  • Some words frequently used: de, el, del, los, la(s), uno(s), una(s), y
  • No apostrophised contractions
  • Word beginnings: ll- (check not Welsh)
  • Word endings: -o, -a, -ción, -miento, -dad
  • Angle quotation marks: « » (though 'curly-Q' quotation marks are also used); dialogue often indicated by means of dashes

Italian (Italiano)

  • Almost every word ends in a vowel. Exceptions include non, il, per, con, del.
  • Common one-letter word: è.
  • Common word: perché.
  • Letter sequences: gli, gn, sci.
  • Letters j, k, w, x and y are rare and used only in loanwords (e.g. whisky).
  • Word endings: -o, -a, -zione, -mento, -tà, -aggio.
  • Grave accent (e.g., on à) almost always occurs in the last letter of words.
  • Double consonants (tt, zz, cc, ss, bb, pp, ll, etc.) are frequent.

Catalan (Català)

  • Character combination l·l and tz
  • Letter sequences: tx (check not Basque) and tg
  • Letters k and w are rare and only used in loanwords (e.g. walkman)
  • Word endings: -o, -a, -es, ció, -tat
  • Word beginning: ll-

Romanian (Română)

  • Characters: ă â î ș ț
  • Common words: și, de, la, a, ai, ale, alor, cu
  • Word endings: -a, -ă, -u, -ul, -ului, -ţie (or -ţiune), -ment, -tate; names ending in -escu
  • Double and triple i: copii, copiii
  • Note that Romanian is sometimes written online with no diacritics, making it harder to identify. A cedilla is sometimes used on S (ş) and on T (ţ) instead of the correct diacritic, the comma below (ș, ț).

Portuguese (Português)

  • Characters: ã, õ, â, ê, ô, á, é, í, ó, ú, à, ç
  • Common one-letter words: a, à, e, é, o
  • Common two-letter words: ao, as, às, da, de, do, em, os, ou, um
  • Common three-letter words: aos, com, das, dos, ele, ela, mas, não, por, que, são, uma
  • Common endings: -ção, -dade, -ismo, -mente
  • Common digraphs: ch, nh, lh; examples: chave, galinha, baralho.
  • The letters k, w and y are rare. They are found mostly in loanwords, e.g.: keynesianismo, walkie-talkie, nylon.
  • Most singular words end in a vowel, l, m, r, or z.
  • Plural words end in -s.

Walloon (Walon)

  • Characters: å, é, è, ê, î, ô, û
  • Common digraphs and trigraphs: ai, ae, én, -jh-, tch, oe, -nn-, -nnm-, xh, ou
  • Common one-letter words: a, å, e, i, t', l', s', k'
  • Common two-letter words: al, ås, li, el, vs, ki, si, pô, pa, po, ni, èn, dj'
  • Common three-letter words: dji, nén, rén, bén, pol, mel
  • Common endings: -aedje, -mint, -xhmint, -ès, -ou, -owe, -yî, -åcion
  • Apostrophes are followed by a space (preferably a non-breaking one), e.g. l' ome instead of l'ome.

Galician (Galego)

  • Similar to Portuguese; the indefinite article 'unha' (fem. sing.), the suffix -çom and a heavier usage of the letter 'x' usually signal Galician.
  • Definite articles o or ó (masc. sing.), os (masc. plural), a (fem. sing.), as (fem. plural)
  • Common digraphs: nh (ningunha)
  • The letters j, k, w and y are not in the alphabet, and appear only in loanwords
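
The common-word clues listed for the Romance languages above lend themselves to a simple scoring approach: count how many tokens of a text appear in each language's list and pick the highest score. The sketch below is a minimal plain-Java illustration, assuming only a subset of the word lists given above; the CommonWordGuess class is hypothetical, and short or mixed texts will often be misclassified.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class CommonWordGuess {
    // Subsets of the common-word lists above; "perch\u00E9" is "perche" with an acute e.
    private static final Map<String, List<String>> COMMON_WORDS = new LinkedHashMap<>();
    static {
        COMMON_WORDS.put("French",
                Arrays.asList("de", "la", "le", "du", "des", "il", "et"));
        COMMON_WORDS.put("Spanish",
                Arrays.asList("de", "el", "del", "los", "la", "las", "uno", "una", "y"));
        COMMON_WORDS.put("Italian",
                Arrays.asList("non", "il", "per", "con", "del", "perch\u00E9"));
        COMMON_WORDS.put("Portuguese",
                Arrays.asList("ao", "as", "da", "de", "do", "em", "os", "ou", "um",
                        "com", "que", "uma"));
    }

    // Returns the language whose common words match the most tokens,
    // or "unknown" when no token matches any list.
    static String guess(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown";
        int bestScore = 0;
        for (Map.Entry<String, List<String>> entry : COMMON_WORDS.entrySet()) {
            int score = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    score++;
                }
            }
            if (score > bestScore) {
                best = entry.getKey();
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(guess("le chat et le chien du voisin"));  // French
        System.out.println(guess("el perro y los gatos de la casa")); // Spanish
    }
}
```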

Germanic languages

English

  • words: a, an, and, in, of, on, the, that, to, is, I (should always be a capital)
  • letter sequences: th, ch, sh, ough, augh
  • word endings: -ing, -tion, -ed, -age, -s, -’s, -’ve, -n’t, -’d
  • diacritics or accents only in loanwords (piñata)

Dutch (Nederlands)

  • letter sequences ij (capitalized as IJ, and also found as a ligature, IJ or ij), ei, doubled vowels (but not ii), kw, sch, oei, ooi, and uw (especially eeuw, ieuw, auw, and ouw).
  • words: het, op, en, een, voor (and compounds of voor).
  • word endings: -tje, -sje, -ing, -en, -lijk
  • at the start of words: z-, v-, ge-
  • t/m occasionally occurs between two points in time or between numbers (e.g. house numbers).

West Frisian (Frysk)

  • letter sequences: ij, ei, oa
  • words: yn

Afrikaans (Afrikaans)

  • Words: 'n, as, vir, nie.
  • Similar to Dutch, but:
    • the common Dutch letters c and z are rare and used only in loanwords (e.g. chalet);
    • the common Dutch vowel ij is not used; instead, i and y are used (e.g. -lik, sy);
    • the common Dutch word ending -en is rare, being replaced by -e.

German (Deutsch)

  • umlauts (ä, ö, ü), ess-zett (ß)
  • letter sequences: ch, sch, tsch, tz, ss
  • common words: der, die, das, den, dem, des, er, sie, es, ist, ich, du, aber
  • common endings: -en, -er, -ern, -st, -ung, -chen, -tät
  • rare letters: x, y (except in loanwords)
  • letter c rarely used except in the sequences listed above and in loanwords
  • long compound words
  • a period (.) after ordinal numbers, e.g. 3. Oktober
  • many capitalised words in the middle of sentences.

Swedish (Svenska)

  • letters å, ä, ö, rarely é
  • common words: och, i, att, det, en, som, är, av, den
  • long compound words
  • letter sequences: stj, sj, skj, tj, ck, än, and occasionally surnames ending in -qvist
  • no use of characters w, z except for foreign proper nouns and some loanwords, but x is used, unlike Danish and Norwegian, which replace it with ks

Danish (Dansk)

  • letters æ, ø, å
  • common words: af, og, til, er, på, med, det, den;
  • common endings: -tion, -ing, -else, -hed;
  • long compound words;
  • no use of characters q, w, x and z except for foreign proper nouns and some loanwords;
  • to distinguish from Norwegian: uses letter combination øj; frequent use of æ; spellings of borrowed foreign words are retained (in particular use of c), such as centralstation.

Norwegian (Norsk)

  • letters æ, ø, å
  • common words: av, ble, er, og, en, et, men, i, å, for, eller;
  • common endings: -sjon, -ing, -else, -het;
  • long compound words;
  • no use of characters c, w, z and x except for foreign proper nouns and some loanwords;
  • two versions of the language: Bokmål (much closer to Danish) and Nynorsk – for example ikke, lørdag, Norge (Bokmål) vs. ikkje, laurdag, Noreg (Nynorsk); Nynorsk uses the word òg; printed materials almost always published in Bokmål only;
  • to distinguish from Danish: uses letter combination øy; less frequent use of æ; spellings of borrowed foreign words are ‘Norsified’ (in particular removing use of c), such as sentralstasjon.

Icelandic (Íslenska)

  • letters á, ð, é, í, ó, ú, ý, þ, æ, ö
  • common beginnings: fj-, gj-, hj-, hl-, hr-, hv-, kj-, and sj-
  • common endings: -ar (especially -nar), -ir (especially -nir), -ur, -nn (especially -inn)
  • no use of characters c, q, w, or z except for foreign proper nouns, some loanwords, and, in the case of z, older texts.

Faroese (Føroyskt)

  • letters á, ð, í, ó, ú, ý, æ, ø
  • letter combinations: ggj, oy, skt
  • to distinguish from Icelandic: does not use é or þ, uses ø instead of ö (occasionally rendered as ö on road signs, or even ő).

Baltic languages

Latvian (Latviešu)

  • uses diacritics: ā, č, ē, ģ, ī, ķ, ļ, ņ, ō, ŗ, š, ū, ž
  • does not have letters: q, w, x, y
  • no longer uses ō or ŗ in modern language
  • extremely rare doubling of vowels
  • rare doubling of consonants
  • a period (.) after ordinal numbers, e.g. 2005. gads
  • common words: ir, bija, tika, es, viņš

Lithuanian (Lietuvių)

  • visual abundance of letters ą, č, ę, ė, į, š, ų, ū, ž
  • does not have letters q, w, x
  • extremely rare doubling of vowels and consonants
  • many varying forms (usually endings) of the same word, e.g. namas, namo, namus, namams, etc.
  • generally long words (absence of articles and fewer prepositions in comparison to Germanic languages)
  • common words: ir, yra, kad, bet.

Slavic languages

Polish (Polski)

  • consonant clusters rz, sz, cz, prz, trz
  • includes: ą, ę, ć, ś, ł, ó, ż, ź
  • words w, z, k, we, i, na (several one-letter words)
  • words jest, się
  • words beginning with był, będzie, jest (forms of the copula być, 'to be').

Czech (Čeština)

  • visual abundance of letters ž š ů ě ř
  • words je, v
  • to distinguish from Slovak: does not use ä, ľ, ĺ, ŕ or ô; ú only appears at the beginning of words.

Slovak (Slovenčina)

  • visual abundance of letters ž š č;
  • uses: ä, ľ, and ô and (very rarely) ĺ and ŕ;
  • typical suffixes: -cia;
  • to distinguish from Czech: does not use ě, ř or ů.
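
The "to distinguish from" rules for Czech and Slovak reduce to two small sets of letters. A minimal sketch, assuming the function name czech_or_slovak (hypothetical) and treating any text that contains letters from both sets, or from neither, as undecided:

```python
CZECH_ONLY = set("ěřů")      # letters Slovak does not use
SLOVAK_ONLY = set("äľĺŕô")   # letters Czech does not use

def czech_or_slovak(text):
    """Return 'Czech', 'Slovak', or None when the letters do not decide."""
    letters = set(text.lower())
    has_czech = bool(letters & CZECH_ONLY)
    has_slovak = bool(letters & SLOVAK_ONLY)
    if has_czech and not has_slovak:
        return "Czech"
    if has_slovak and not has_czech:
        return "Slovak"
    return None
```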

Croatian (Hrvatski)

  • similar to Serbian
  • letters-digraphs dž, lj, nj
  • does not have q, w, x, y
  • typical suffixes: -ti, -ći
  • special letters: č, ć, š, ž, đ
  • common words: a, i, u, je
  • to distinguish from Serbian: infixes -ije- and -je- are common, verbs ending in -irati, -iran

Serbian (Srpski/Српски)

Serbian Latin
  • similar to Croatian
  • letters-digraphs dž, lj, nj (lj and nj are somewhat more common than dž, although not by much)
  • no q, w, x, y
  • typical verb suffixes -ti, -ći (infinitive is much less used than in Croatian)
  • foreign words might end in -tija, -ovan, -ovati, -uje
  • special letters: đ (rare), č, š (common), ć, ž (less common)
  • common words: a, i, u, je, jeste
  • future tense suffix -iće, -ićeš, -ićemo, -ićete (not found in Croatian)
  • infixes -ije- and -je- are very common in the Serbian spoken in Bosnia and Herzegovina, Montenegro and Croatia (ijekavica), but do not appear in the Serbian of Serbia, where each of them is replaced with -e- (ekavica).
Serbian Cyrillic
  • uses Џ, Ј, Љ, Њ, Ђ, Ћ
  • does not use Щ, Ъ, Ы, Ь, Э, Ю, Я, Ё, Є, Ґ, Ї, І, Ў
  • to distinguish from Macedonian: does not use Ѕ, Ѓ, Ќ

Celtic languages

Welsh (Cymraeg)

  • letters Ŵ, ŵ used in Welsh
  • words y, yr, yn, a, ac, i, o
  • letter sequences wy, ch, dd, ff, ll, mh, ngh, nh, ph, rh, th, si
  • letters not used: k, q, v, x, z
  • letter only used rarely, in loanwords: j
  • commonly accented letters: â, ê, î, ô, û, ŵ, ŷ, although acute (´), grave (`), and dieresis (¨) accents can hypothetically occur on all vowels
  • word endings: -ion, -au, -wr, -wyr
  • y is the most common letter in the language
  • w between consonants (w in fact represents a vowel in the Welsh language)
  • circumflex accent (^) is by far the commonest diacritical mark, although diacritics are often omitted altogether

Irish (Gaeilge)

  • vowels with acute accents: á é í ó ú
  • words beginning with letter sequences bp dt gc bhf
  • letter sequences sc cht
  • no use of the letters J, K, Q, V, W
  • frequent bh, ch, dh, fh, gh, mh, th, sh
  • to distinguish from (Scottish) Gaelic: there may be words or names with the second (or even third) letter capitalized instead of the first: hÉireann.

Scottish Gaelic (Gàidhlig)

  • vowels with grave accents: à è ì ò ù (é and ó still occasionally seen but usage is now discouraged)
  • letter sequences sg chd
  • frequent bh, ch, dh, fh, gh, mh, th, sh
  • to distinguish from Irish: prefixes are hyphenated, so capitals in the middle of words generally do not occur: an t-Oban.
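
Since Irish marks vowels with acute accents and Scottish Gaelic with grave accents, counting the two kinds of accented vowel gives a quick hint. This is a sketch only: the function name irish_or_scottish_gaelic is hypothetical, and a text with no accented vowels (or equal counts) is left undecided.

```python
import unicodedata

ACUTE_VOWELS = set("áéíóúÁÉÍÓÚ")   # typical of Irish
GRAVE_VOWELS = set("àèìòùÀÈÌÒÙ")   # typical of Scottish Gaelic

def irish_or_scottish_gaelic(text):
    """Compare counts of acute- and grave-accented vowels."""
    # Normalise so that letter + combining accent becomes one character.
    text = unicodedata.normalize("NFC", text)
    acute = sum(ch in ACUTE_VOWELS for ch in text)
    grave = sum(ch in GRAVE_VOWELS for ch in text)
    if acute > grave:
        return "Irish"
    if grave > acute:
        return "Scottish Gaelic"
    return None
```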

Albanian (Shqip)

  • unique letters: ë, ç.
  • ë is the most common letter in the language.
  • the letter w is not used except in loanwords.
  • dh, gj, ll, nj, rr, sh, th, xh, and zh are considered one letter instead of two.
  • common words: po, jo, dhe, i, të, me

Maltese (Malti)

  • unique letters: ċ, ġ, għ, ħ, ż
  • Semitic origin, fairly intelligible with Arabic
  • uses il-xxx for the definite article

Iranian languages

Kurdish (Kurdî / كوردی)

  • The word xwe (oneself, myself, yourself etc.) is highly specific (xw combination) and frequent.
  • The most frequent letter is i (I, i), which is equivalent to schwa.
  • Uses the circumflex ( ^ ): ê, î, û.
  • Uses the cedilla ( ¸ ): ç, ş.
  • Has eight vowels (a, e, ê, i, î, o, u, û); no word is without a vowel.
  • Has many compound words.

Finno-Ugric languages

Finnish (Suomi)

  • distinct letters ä and ö; but never õ or ü (y takes the place of ü)
  • b, f, z, š and ž appear in loanwords and proper names only; the last two are substituted with sh or zh in some texts
  • c, q, w, x appear in (typically foreign) proper names only
  • outside of loanwords, d appears only between vowels or in hd
  • outside of loanwords, g only appears in ng
  • outside of loanwords, words do not begin with two consonants; this is reflected in the general syllable structure, where consonant clusters only occur across syllable boundaries, except in some loanwords
  • common words: sinä, on
  • common endings: -nen, -ka/-kä, -in, -t (plural suffix)
  • common vowel combinations: ai, uo, ei, ie, oi, äi
  • unusually high degree of letter duplication, both vowels and consonants will be geminated, for example aa, ee, ii, kk, ll, ss, yy, ää
  • frequent long words

Estonian (Eesti)

  • distinct letters: õ, ä, ö and ü; but never ß or å
  • similar to Finnish, except:
    • letter y is not used, except in loanwords (ü is the corresponding vowel)
    • letters b and g (without preceding n) are found outside of loanwords
    • occasional use of š and ž, mainly in loanwords
    • loanwords more common generally than in Finnish, mainly loaned from German
    • words end in consonants more frequently than in Finnish, word-final b, d, v being particularly typical
    • letter d is much more common in Estonian than in Finnish, and in Estonian it is often the last letter of the word (plural suffix), which it never is in Finnish
    • double öö more common than in Finnish; other doubles can include õõ, üü, rarely hh (for German ch) and even šš
  • common words: ja, on, ei, ta, see, või.

Hungarian (Magyar)

  • letters ő and ű (double acute accent) unique to Hungarian
  • accented letters á and é frequent
  • letter combinations: cs, gy, ly, ny, sz, ty, zs (all classed as separate letters), leg‐, ‐obb (note: sz also common in Polish)
  • common words: a, az, ez, egy, és, van, hogy
  • letter k very frequent (plural suffix)

Eskimo–Aleut languages

Greenlandic

  • long polysynthetic words (a single word can number 30+ letters)
  • relatively abundant n, q (not necessarily followed by u), u
  • ubiquitous double consonants and vowels (aa, ii, qq, uu, more rarely ee, oo)
  • vowels a, i, u conspicuously more frequent than e, o (which are only found before q and r)
  • no diphthongs except occasional word-final ai, only consonant combinations besides double consonants and (n)ng consist of r + consonant
  • old spellings (now abolished in spelling reform) sometimes included acute accent, circumflex and/or tilde: Qânâq vs. Qaanaaq.

Southern Athabaskan languages

  • vowels with acute accent, ogonek (nasal hook), or both: á, ą, ą́
  • doubled vowels: aa, áá, ąą, ą́ą́
  • slashed l: ł (check not Polish!)
  • n with acute accent: ń
  • quotation mark: ' or ’
  • sequences: dl, tł, tł’, dz, ts’, ií, áa, aá
  • may have rather long words

Western Apache (Nnee biyáti’/Ndee biyáti’)

In addition to the above,

  • may use: u or ú
  • may use vowels with macron: ā ą̄
  • does not use ų

Navajo (Diné bizaad)

In addition to the above,

  • does not use u, ú, or ų

Mescalero / Chiricahua (Mashgaléń / Chidikáágo)

In addition to the above,

  • uses: u, ú, ų
  • does not use o, ó, or ǫ

Guaraní

  • lots of tildes over vowels (including y) and n
  • tilde over g: g̃—it's the only language in the world to use it. Example words: hagũa and g̃uahẽ.
  • b, d, and g usually do not occur without m or n before (mb, nd, ng) unless they're Spanish loan words.
  • f, l, q, w, x, z extremely rare outside loan words
  • does not use c without h: ch

Japanese in Romaji (Nihongo/日本語)

  • words: desu, aru, suru, esp. at end of sentences;
  • word endings: -masu, -masen, -shita;
  • letters: Japanese almost always alternates between a consonant and a vowel. Exceptions are the digraphs shi and chi, the fricative tsu, gemination (two of the same consonant in a row) and palatalization (a consonant followed by the letter y).
  • a macron or circumflex may be used to indicate doubled vowels, e.g. Tōkyō
  • common words: no, o, wa, de, ni

(Note: Romaji is rarely used in ordinary Japanese writing. It is most often used to help foreign learners with the pronunciation of Japanese.)

Hmong (Hmoob) written in Romanized Popular Alphabet

  • Almost all written words are quite short (one syllable).
  • Syllables (unless they are pronounced with mid tone) end in a tone letter: one of b s j v m g d, leading to apparent 'consonant clusters' such as -wj
  • w can be the main vowel of a syllable (e.g. tswv)
  • Syllables can begin with sequences such as hm-, ntxh-, nq-.
  • Syllables ending in double vowels (especially -oo, -ee), possibly followed by a tone letter (as in Hmoob 'Hmong').

Vietnamese (tiếng Việt)

  • Roman characters with more than one diacritical mark on the same vowel. See above.
  • Almost all written words are quite short (one syllable, mostly less than six characters long).
  • Words beginning with ng or ngh
  • Words ending with nh
  • common words: cái, không, có, ở, của, và, tại, với, để, đã, sẽ, đang, tôi, bạn, chúng, là

Vietnamese Quoted-Readable (VIQR)

  • The following characters (often in combination) after vowels: ^ ( + ' ` ? ~ .
  • DD, Dd, or dd
  • The following character before punctuation:

Vietnamese VNI encoding

  • The digits 1-8 after vowels
  • The digit 9 after a D or d
  • The following character before numbers:

Vietnamese Telex

  • The following characters after vowels: s f r x j
  • The following vowels, doubled up: a e o
  • The letter w after the following characters: a o u
  • DD, Dd, or dd

Chinese, Romanized

Standard Mandarin (現代標準漢語)

  • In general, Mandarin syllables end only in vowels or n, ng, r; never in p, t, k, m
Pinyin
  • Words beginning with x, q, zh
  • Tone marks on vowels, such as ā, á, ǎ, à
    • For convenience while using a computer, these are sometimes substituted with numbers, e.g. a1, a2, a3, a4
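
The numeric substitution for tone marks can be reversed mechanically in the simple case shown above, where the digit directly follows the vowel. The sketch below assumes exactly that layout; the function name numbered_to_marked is hypothetical, and full syllables written like hao3 would need extra placement rules that are not covered here.

```python
import re
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}

def numbered_to_marked(text):
    """Turn vowel+digit pairs such as a3 into tone-marked vowels such as ǎ."""
    def replace(match):
        vowel, tone = match.group(1), match.group(2)
        # NFC merges the vowel and the combining mark into one character.
        return unicodedata.normalize("NFC", vowel + TONE_MARKS[tone])
    return re.sub(r"([aeiouüAEIOUÜ])([1-4])", replace, text)
```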
Wade–Giles
  • Words do not begin with b, d, g, z, q, x, r
  • Words beginning with hs
  • Many hyphenated words
  • Apostrophes after initial letters or digraphs, e.g. t'a, ch'i
Gwoyeu Romatzyh
  • Many unusual vowel combinations such as ae, eei, ii, iee, oou, yy, etc.
  • Insertion of r, e.g. arn, erng, etc.
  • Words ending in nn, nq

Standard Cantonese (粵語)

  • In general, Cantonese syllables can end in p, t, k, m, n, ng; never r
  • Double aa is common but double ee/ii/oo/uu is rare

Southern Min / Min-Nan (Bân-lâm-gí/Bân-lâm-gú) in Pe̍h-ōe-jī

  • Many hyphenated words.
  • Words can end in p, t, k, m, n, ng, h; never r
  • Roman characters with many diacritical marks on vowels. Unlike Vietnamese, each character has at most one such mark.
  • Unusual combining characters, namely · (middle dot, always after o) and (vertical bar). ¯ (macron) is also common.

Austronesian languages

Malay (bahasa Melayu) and Indonesian (bahasa Indonesia)

May contain the following:

  • prefixes: me-, mem-, memper-, pe-, per-, di-, ke-
  • suffixes: -kan, -an, -i
  • other common words (almost always written in lowercase): yang, dan, di, ke, oleh, itu

Malay and Indonesian are mutually intelligible to proficient speakers, although translators and interpreters will generally be specialists in one or other language. See Comparison of Standard Malay and Indonesian.

Frequent use of the letter 'a' (comparable to the frequency of the English 'e').

Turkic languages

Note that some Turkic languages, such as Azeri and Turkmen, use a similar Latin alphabet (often Jaŋalif) and similar words, and might be confused with Turkish. Azeri has the letters Əə, Xx and Qq, which are not present in the Turkish alphabet, and Turkmen has Ää, Žž, Ňň, Ýý and Ww. Latin characters uniquely (or nearly uniquely) used for Turkic languages: Əə, Ŋŋ, Ɵɵ, Ьь, Ƣƣ, Ğğ, İ, and ı. All Turkic languages can form long words by adding multiple suffixes.

Turkish (Türkçe/Türkiye Türkçesi)

Turkish Alphabet

Lowercase: a b c ç d e f g ğ h ı i j k l m n o ö p r s ş t u ü v y z

Uppercase: A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z

Common words
  • bir — one, a
  • bu — this
  • ancak — but
  • oldu — was
  • şu — that
Misc.
  • Look for word endings. Tense changes in Turkish verbs are created by adding suffixes to the end of the verb. Plurals are formed by adding -lar or -ler.
    • Common tense suffixes: -yor, -mış, -muş, -sun
    • Possession/person: -im, -un, -ın, -in, -iz, -dur, -tır
    • Example: Yapmıştır, '[He] did it'; Yap is the verb stem meaning 'to do', -mış indicates the perfect tense, -tır indicates the third person (he/she/it).
    • Example: Adalar, 'Islands'; Ada is a noun meaning 'island', -lar makes it plural.
    • Example: Evimiz, 'Our house'; Ev is a noun meaning 'house', -im indicates the first-person possessor, which -iz then makes plural.
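
The choice between the plural suffixes -lar and -ler follows Turkish vowel harmony: the suffix matches the last vowel of the word. That rule is not spelled out in the list above, so treat the sketch below as an assumption-laden illustration; the names plural_suffix and pluralize are hypothetical, and some loanwords are known to break the pattern.

```python
# Upper- and lowercase are listed explicitly because Python's str.lower()
# does not apply the Turkish dotted/dotless i rules.
BACK_VOWELS = set("aıouAIOU")
FRONT_VOWELS = set("eiöüEİÖÜ")

def plural_suffix(noun):
    """Pick -lar after a back vowel and -ler after a front vowel."""
    for ch in reversed(noun):
        if ch in BACK_VOWELS:
            return "lar"
        if ch in FRONT_VOWELS:
            return "ler"
    raise ValueError("no vowel found in %r" % noun)

def pluralize(noun):
    return noun + plural_suffix(noun)
```

For the examples in the list, pluralize("Ada") gives Adalar, while a front-vowel noun such as Ev takes -ler instead.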

Azeri (Azərbaycanca)

Azeri can be easily recognized by the frequent use of ə. This letter is not used in any other officially recognized modern Latin alphabet. In addition, it uses the letters x and q, which are not used in Turkish.

  • Common words: ki, ilə, bu, o, isə, görə, da
  • Frequent use of diacritics: ç, ə, ğ, ı, İ, ö, ş, ü
  • Words ending in -lar, -lər, -ın, -in, -da, -də, -dan, -dən
  • Words never beginning with ğ or ı
  • Words rarely beginning with two or more consonants
  • Transliteration of foreign words and names, e.g. Audrey Hepburn = Odri Hepbern

Chinese (中文)

  • No spaces, except between punctuation marks and (sometimes) foreign words.
  • Arabic numerals (0-9) sometimes used
  • Punctuation:
    • Period 。(not .)
    • Serial comma 、(distinguished from the regular comma ,)
    • Ellipsis …… (six dots)
  • No hiragana, katakana, or hangul
  • May be written vertically
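
The rule "no hiragana, katakana, or hangul" can be checked directly against Unicode ranges. The sketch below is a simplification: the function name guess_cjk is hypothetical, only the main Hiragana, Katakana, Hangul and CJK Unified Ideographs blocks are inspected, and a Japanese text written entirely in kanji would be reported as Chinese.

```python
def guess_cjk(text):
    """Guess Chinese, Japanese or Korean from the scripts present."""
    # Hiragana U+3040-309F and Katakana U+30A0-30FF.
    has_kana = any("\u3040" <= ch <= "\u30ff" for ch in text)
    # Hangul syllables U+AC00-D7A3 and compatibility jamo U+3131-318E.
    has_hangul = any(
        "\uac00" <= ch <= "\ud7a3" or "\u3131" <= ch <= "\u318e" for ch in text
    )
    # CJK Unified Ideographs U+4E00-9FFF.
    has_han = any("\u4e00" <= ch <= "\u9fff" for ch in text)
    if has_kana:
        return "Japanese"
    if has_hangul:
        return "Korean"
    if has_han:
        return "Chinese"
    return None
```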

Simplified Chinese (简体) vs Traditional Chinese (繁體)

Note: Many characters were not simplified. As a result, it is common for a short word or phrase to be identical between Simplified and Traditional, but it is rare for an entire sentence to be identical as well.

Common radicals different between Traditional and Simplified:

  • Simplified: 讠钅饣纟门(e.g. 语 银 饭 纪 问)
  • Traditional: 訁釒飠糹門(e.g. 語 銀 飯 紀 問)

Common characters different between Traditional and Simplified:

  • Simplified: 国 会 这 来 对 开 关 门 时 个 书 长 万 边 东 车 爱 儿
  • Traditional: 國 會 這 來 對 開 關 門 時 個 書 長 萬 邊 東 車 愛 兒
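
The two character lists above can be used as a small lookup to vote on whether a text is Simplified or Traditional. This is only a sketch: the function name simplified_or_traditional is hypothetical, the sets contain just the characters listed here, and a short text may contain none of them.

```python
SIMPLIFIED = set("国会这来对开关门时个书长万边东车爱儿")
TRADITIONAL = set("國會這來對開關門時個書長萬邊東車愛兒")

def simplified_or_traditional(text):
    """Vote using the characters that differ between the two systems."""
    simplified_hits = sum(ch in SIMPLIFIED for ch in text)
    traditional_hits = sum(ch in TRADITIONAL for ch in text)
    if simplified_hits > traditional_hits:
        return "Simplified"
    if traditional_hits > simplified_hits:
        return "Traditional"
    return None  # no distinguishing characters, or a tie
```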

Standard written Chinese (based on Mandarin) vs written Vernacular Cantonese

Note: Cantonese speakers live in Mainland China, Hong Kong, Taiwan and Macau, so written Cantonese can be written in either Simplified or Traditional characters.

Common characters in Vernacular Cantonese that do not occur in Mandarin (only characters that are the same between Traditional and Simplified are chosen here):

  • 嘅 咗 咁 嚟 啲 唔 佢 乜 嘢

Some of the above characters are not supported in all character encodings, so sometimes the 口 radical on the left is substituted with a 0 or o, e.g.

  • o既 0既

Japanese (日本語)

  • Katakana (カタカナ) and hiragana (ひらがな) characters mixed with kanji (漢字)
  • Few or no spaces
  • Arabic numerals (0-9) sometimes used
  • Punctuation:
    • Period 。
    • Comma 、 (, is also used)
    • Quotation marks 「」
  • Occasional small characters beside large ones, e.g. しゃ りゅ しょ って シャ リュ ショ ッテ
  • Double tick marks (dakuten) appearing at the upper right of characters, e.g. で が ず デ ガ ズ
  • Empty circles (handakuten, or maru) appearing at the upper right of characters, e.g. ぱ ぴ パ ピ
  • Frequent characters: の を は が
  • May be written vertically

Korean (한국어/조선말)

  • Western-style punctuation marks
  • Western-style spacing
  • Hangul letters, e.g. ㅎ h, ㅇ ng, ㅂ b, etc.
  • Hangul letters used to form syllable blocks; e.g. ㅅ s + ㅓ eo + ㅇ ng = 성 seong
  • Circles and ellipses are commonplace in Hangul but exceedingly rare in Chinese.
  • General appearance has relatively-uniform complexity, as contrasted with Chinese or Japanese.
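
The way Hangul letters combine into syllable blocks is mirrored in Unicode, where each precomposed syllable sits at 0xAC00 + (initial × 21 + medial) × 28 + final. The sketch below reproduces the ㅅ + ㅓ + ㅇ = 성 example; the function name compose_syllable is hypothetical, and the letter tables follow the standard Unicode ordering of initials, medials and finals.

```python
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
          "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
          "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]      # 28 finals, "" = none

def compose_syllable(initial, medial, final=""):
    """Combine Hangul letters into one precomposed syllable block."""
    l_index = INITIALS.index(initial)
    v_index = MEDIALS.index(medial)
    t_index = FINALS.index(final)
    return chr(0xAC00 + (l_index * 21 + v_index) * 28 + t_index)
```

Calling compose_syllable("ㅅ", "ㅓ", "ㅇ") yields the block 성 (U+C131) from the example above.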

Khmer

Khmer is written using the distinctive Khmer alphabet.

  • rarely uses spaces
  • Letters have a distinctively 'taller' shape than other Brahmic scripts.
  • Uses Khmer numerals in writing ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩.
  • Has 'clusters' of letters stuffed upon each other.
  • has 24 diacritics denoting syllable rhymes - ា ិ ី ឹ ឺ ុ ូ ួ ើ ឿ ៀ េ ែ ៃ េា ៅ ុំ ំ ាំ ះ ុះ េះ ោះ
  • uses ។ as a full stop

Greek (Ελληνικά)

Modern Greek is written in the Greek alphabet using monotonic, polytonic or atonic orthography, following either Demotic grammar (as codified by Triantafyllidis) or Katharevousa grammar. Some people write in Greeklish (Greek in Latin script), which may be visual-based, orthographic, phonetic, or mixed. The only official orthographies of the Greek language are monotonic and polytonic.

Normal Modern Greek (Greek Monotonic)

  • words και, είναι;
  • Each multi-syllable word has one accent/tone mark (oxia): ά έ ή ί ό ύ ώ
  • The only other diacritic ever used is the tréma: ϊ/ΐ, ϋ/ΰ, etc.

Pre-1980s Greek (Greek Polytonic)

Katharevousa, Dimotiki (Triantafyllidis' grammar)

  • Diacritics: ά, ᾶ, ἀ, ἁ, and combinations, also with other vowels.
  • Some texts, especially in Katharevousa, also have ὰ, ᾳ, in combination with other diacritics.

Ancient Greek

  • Diacritics: ά, ὰ, ᾶ, ἀ, ἁ, ᾳ, and combinations, also with other vowels; ῥ; tilde (ᾶ) often appears more like a rounded circumflex
  • some texts feature lunate sigma (looks like c) instead of σ/ς

Greek Atonic

  • Was common in some Greek media (television);
  • You will see Greek characters without accents/tones;
  • words: και, ειναι, αυτο.
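
The monotonic, polytonic and atonic forms can be told apart by looking at which accented characters appear. A rough sketch, assuming precomposed Unicode text and a hypothetical function name greek_accent_system: any character from the Greek Extended block (U+1F00 to U+1FFF) is taken as a sign of polytonic writing, any other accented Greek letter as monotonic, and unaccented Greek as atonic.

```python
import unicodedata

def greek_accent_system(text):
    """Classify Greek text as 'polytonic', 'monotonic', 'atonic' or None."""
    text = unicodedata.normalize("NFC", text)
    # Breathings, circumflexes and iota subscripts live in Greek Extended.
    if any("\u1f00" <= ch <= "\u1fff" for ch in text):
        return "polytonic"
    greek_letters = [ch for ch in text if "\u0370" <= ch <= "\u03ff" and ch.isalpha()]
    if not greek_letters:
        return None
    for ch in greek_letters:
        # An accented letter decomposes into a base letter plus combining marks.
        if any(unicodedata.combining(part) for part in unicodedata.normalize("NFD", ch)):
            return "monotonic"
    return "atonic"
```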

Greek in Greeklish

  • Automated conversion software for Greeklish->Greek conversion exists. If you notice a Greeklish text it may be useful for the Greek el.wikipedia (after conversion).
  • Keep in mind: in Greeklish more than one character may be used for one letter. (example: th for Θ (theta)).

Orthographic Greeklish

  • words kai, einai.

Phonetic Greeklish

  • words ke, ine;
  • omega appears as o;
  • ei, oi appear as i;
  • ai appears as e.

Visual-based Greeklish

  • omega (Ω or ω) may appear as W or w;
  • epsilon (E) may appear as 3;
  • alpha (A) may appear as 4;
  • theta (Θ) may appear as 8;
  • upsilon (Y) may appear as /;
  • gamma (γ) may appear as y
  • More than one character may be used for one letter.

Messed-up (Mixed) Greeklish

  • words kai, eine;
  • combines principles of phonetic, visual-based and orthographic Greeklish according to writer's idiosyncrasy;
  • The most commonly used form of Greeklish.

Armenian (Հայերեն)

Armenian can be recognized by its unique 39-letter alphabet:

Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք ԵՎ(և) Օ Ֆ

Georgian (ქართული)

Georgian can be recognised by its unique alphabet (note some characters have fallen out of use).

ა ბ გ დ ე ვ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ

Slavic languages using the Cyrillic alphabet

Bolding denotes letters unique to the language

Belarusian (беларуская)

  • uses: ё, і, й, ў, ы, э, ’
  • features: шч used instead of щ
  • the only Cyrillic language not to feature и.

Bulgarian (български)

  • uses: ъ, щ, я, ю, й
  • words: със, в
  • features: ъ is used as a vowel; many words end in definite article –ът, –ят, –та, –то, –те

Macedonian (македонски)

  • uses: ј, љ, њ, џ, ѓ, ќ, ѕ
  • words: во, со
  • features: р is usually found between consonants, for example првин

Russian (русский)

  • uses: ё, й, ъ, ы, э, щ

Serbian (српски)

  • uses: ј, љ, њ, џ, ђ, ћ
  • does not use: ъ, щ, я, ю, й
  • words: је, у
  • features: large consonant clusters, for example српски

Ukrainian (українська)

  • uses: є, и, і, ї, й, ґ, щ
  • does not use: ъ, ё, ы, э
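
The "uses" and "does not use" lists for the Cyrillic-script Slavic languages can be arranged as an ordered series of letter checks. The sketch below is a heuristic only: the function name guess_cyrillic_slavic is hypothetical, the order of the checks is a design choice rather than something stated above, and short texts without any distinctive letter return None. A Russian text that happens to contain ъ but none of ы, э, ё would be misreported as Bulgarian.

```python
def guess_cyrillic_slavic(text):
    """Guess a Slavic language from distinctive Cyrillic letters."""
    letters = set(text.lower())
    if "ў" in letters:
        return "Belarusian"
    if letters & set("єїґ"):
        return "Ukrainian"
    if letters & set("ѓќѕ"):
        return "Macedonian"
    if letters & set("ђћ"):
        return "Serbian"
    if letters & set("ыэё"):
        return "Russian"
    if letters & set("ъщ"):
        return "Bulgarian"
    return None
```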

Arabic alphabet

  • All languages using the Arabic alphabet are written right-to-left.
  • A number of other languages have been written in the Arabic alphabet in the past, but now are more commonly written in Latin characters; examples include Turkish, Somali and Swahili.

Arabic (العربية)

  • short vowels are not written so many words are written with no vowel at all
  • common prefix: -ال
  • common suffix: ة-
  • words: إلى, من, على


Persian (فارسی)

  • uses: پ, چ, ژ, گ
  • words: که, به

Urdu (اردو)

  • uses: ٹ, ڈ, ڑ, ں, ے
  • many words ending in ے
  • words: اور, ہے
  • to distinguish from Arabic: in many texts, Urdu is written stylistically with words ‘slanting’ downwards from top-right to bottom-left (unlike the ‘linear’ style of Arabic, Persian etc).
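
The extra letters listed for Persian and Urdu give a simple way to separate the three languages of this section. The sketch below is an illustration under stated assumptions: the function name guess_arabic_script_language is hypothetical, letters are written as Unicode escapes to avoid bidirectional display problems, and Urdu is tested first on the assumption that Urdu text may also contain the Persian letters. Anything else in the Arabic block is reported as Arabic, which ignores other languages written in this script.

```python
# Urdu: TTEH, DDAL, RREH, NOON GHUNNA, YEH BARREE
URDU_LETTERS = set("\u0679\u0688\u0691\u06ba\u06d2")
# Persian: PEH, TCHEH, JEH, GAF
PERSIAN_LETTERS = set("\u067e\u0686\u0698\u06af")

def guess_arabic_script_language(text):
    """Guess Urdu, Persian or Arabic from distinctive letters."""
    chars = set(text)
    if chars & URDU_LETTERS:
        return "Urdu"
    if chars & PERSIAN_LETTERS:
        return "Persian"
    # Fallback: any character from the Arabic block U+0600-06FF.
    if any("\u0600" <= ch <= "\u06ff" for ch in chars):
        return "Arabic"
    return None
```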

Syriac alphabet

Syriac (ܐܬܘܪܝܐ)

  • short vowels are not usually written so many words are written with no vowel at all
  • three styles of writing (Estrangela, Serto, Madnhaya) and two different ways of representing vowels
  • basic alphabet in Estrangela style is: ܐ ܒ ܓ ܕ ܗ ܘ ܙ ܚ ܛ ܝ ܟ ܠ ܡ ܢ ܣ ܥ ܦ ܨ ܩ ܪ ܫ ܬ
  • basic alphabet in Serto style is: ܬ, ܫ, ܪ, ܩ, ܨ, ܦ, ܥ, ܣ, ܢ, ܡ, ܠ, ܟ, ܝ, ܛ, ܚ, ܙ, ܘ, ܗ, ܕ, ܓ, ܒ, ܐ
  • basic alphabet in Madnhaya style is: ܬ,ܫ,ܪ,ܩ,ܨ,ܦ,ܥ,ܣ,ܢ,ܡ,ܠ,ܟ,ܝ,ܛ,ܚ,ܙ,ܘ,ܗ, ܕ,ܓ,ܒ,ܐ

Dravidian languages

  • All Dravidian languages are written from left to right.
  • Each Dravidian language has its own script, but similarities can be found in their orthography.

Tamil

  • common word endings: ள்ளது, கிறது, கின்றன, ம்
  • common words: தமிழ், அவர், உள்ள, சில
  • Tamil has a unique 30-letter alphabet. With the help of diacritics, as many as 247 letters can be written.

அ ஆ இ ஈ உ ஊ எ ஏ ஐ ஒ ஓ ஔ க ங ச ஞ ட ண த ந ப ம ய

Bengali

The Bengali alphabet or Bangla alphabet (Bengali: বাংলা বর্ণমালা, bangla bôrnômala) or Bengali script (Bengali: বাংলা লিপি, bangla lipi) is the writing system, originating in the Indian subcontinent, for the Bengali language and is the fifth most widely used writing system in the world. The script is used for other languages like Assamese, Maithili, Meithei and Bishnupriya Manipuri, and has historically been used to write Sanskrit within Bengal.

Bengali

Bengali has a unique 50-letter alphabet.

  • The Bengali script has a total of 11 vowel graphemes, each of which is called a স্বরবর্ণ swôrôbôrnô 'vowel letter'. The swôrôbôrnôs represent six of the seven main vowel sounds of Bengali, along with two vowel diphthongs. All of them are used in both the Bengali and Assamese languages.

অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ

  • The Bengali script has a total of 39 consonants. Consonant letters are called ব্যঞ্জনবর্ণ bænjônbôrnô 'consonant letter' in Bengali. The names of the letters are typically just the consonant sound plus the inherent vowel অ ô. Since the inherent vowel is assumed and not written, most letters' names look identical to the letter itself (the name of the letter ঘ is itself ghô, not gh).

ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য র ল শ ষ স হ ড় ঢ় য় ৎ ঃ ং ঁ

  • has 10 diacritics denoting syllable rhymes -

া ি ী ু ূ ৃ ে ৈ ো ৌ

Assamese

  • The Assamese script also has a total of 11 vowel graphemes, each of which is called a স্বরবর্ণ swôrôbôrnô 'vowel letter'.

অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ

  • has a total of 39 consonants. Consonant letters are called ব্যঞ্জনবর্ণ bænjônbôrnô 'consonant letter' in Bengali.

ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য ৰ ল শ ষ স হ ড় ঢ় য় ৎ ঃ ং ঁ

  • has 10 diacritics denoting syllable rhymes -

া ি ী ু ূ ৃ ে ৈ ো ৌ

Canadian Aboriginal syllabics

In modern writing, Canadian Aboriginal syllabics are indicative of Cree languages, Inuktitut, or Ojibwe, though the latter two are also written in alternative scripts. The basic glyph set is ᐁ ᐱ ᑌ ᑫ ᒉ ᒣ ᓀ ᓭ ᔦ, each of which may appear in any of four orientations, boldfaced, superscripted, and with diacritics including ᑊ ᐟ ᐠ ᐨ ᒼ ᐣ ᐢ ᐧ ᐤ ᐦ ᕽ ᓫ ᕑ. This abugida has also been used for Blackfoot.

Cree language

Inuktitut

Other North American syllabics

Blackfoot

Cherokee

Artificial languages

Esperanto (Esperanto)

  • words: de, la, al, kaj
  • Six accented letters: ĉ Ĉ ĝ Ĝ ĥ Ĥ ĵ Ĵ ŝ Ŝ ŭ Ŭ, their corresponding H-system representation ch Ch gh Gh hh Hh jh Jh sh Sh u U or their corresponding X-system representation cx Cx gx Gx hx Hx jx Jx sx Sx ux Ux
  • words ending in o, a, oj, aj, on, an, ojn, ajn, as, os, is, us, u, i
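
Text written in the X-system can be converted back to the accented letters with a simple substitution. The sketch below assumes every c, g, h, j, s or u followed by x is an X-system digraph; the function name x_system_to_unicode is hypothetical. The H-system is deliberately not handled, because h also occurs as an ordinary letter and u has no marker at all.

```python
import re

X_SYSTEM = {
    "c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ",
    "C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ",
}

def x_system_to_unicode(text):
    """Replace X-system digraphs (cx, gx, ...) with accented letters."""
    return re.sub(
        r"([cghjsuCGHJSU])[xX]",
        lambda match: X_SYSTEM[match.group(1)],
        text,
    )
```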

Klingon (tlhIngan Hol)

  • When written in the Latin alphabet Klingon has the unusual property of a distinction in case; q and Q are different letters, and other letters are either always (e.g. D, I, S) or never (e.g. ch, tlh, v) written in upper case. This causes a large number of words that look quite strange to people who aren't used to it, for example: yIDoghQo', tlhIngan Hol (with mixed case).
  • The apostrophe is fairly frequent, especially at the end of a word or syllable.
  • Common suffixes: -be', -'a'
  • Common words: 'oH, Qapla'
  • May use one or more apostrophes in the middle of a word: SuvwI''a'

Lojban (lojban.)

  • (almost) all lowercase;
  • common words lo, mi, cu, la, nu, do, na, se;
  • paragraphs delimited with ni'o and sentences delimited with .i (or i);
  • many five-letter words in consonant-vowel shape CCVCV or CVCCV;
  • many short words with apostrophes between vowels, like ko'a, pi'o, etc.;
  • usually no punctuation except for dots;
  • may use commas in the middle of words (typically proper nouns).

External links

  • Language Identification Web Service, language detection API, 100+ languages supported
  • Translated, an online language identifier, 102 languages supported
  • Language Detector, Online language identification from text or URLs.
  • Google Translate, Google's translation service.
  • Xerox, an online language identifier, 47 languages supported
  • Language Guesser, a statistical language identifier, 74 languages recognized
  • NTextCat - free Language Identification API for .NET (C#): 280+ languages available out of the box. Recognizes language and encoding (UTF-8, Windows-1252, Big5, etc.) of text. Mono compatible.
Retrieved from 'https://en.wikipedia.org/w/index.php?title=Wikipedia:Language_recognition_chart&oldid=897089076'