Projects

Input tools

Indic Keyboard

Indic Keyboard is a MOSS Award-winning, privacy-aware, versatile keyboard for Android users who want to type messages, compose emails, and generally use Indic and Indian languages alongside English on their phone. You can use this application to type anywhere on your phone that you would normally type in English. It currently supports 23 languages and 57 layouts.

Source Details

Poorna

Poorna is an extended Inscript/Remington keyboard layout available for Windows, Linux and Mac. It includes all Malayalam Unicode characters, including those not found in other layouts. One important use case is typesetting an old book (the Malayalam Bible, for example) that requires characters not present in the other layouts. This layout also makes it easy to type punctuation and other special characters without switching layouts.

Source Details

Swanalekha

Swanalekha is a well-known transliteration-based input tool available on all operating systems for desktops and mobile phones. It is easy to learn because it uses familiar Manglish-based key mappings.

Source Details

Varnam

Varnam is a cross-platform predictive transliterator for Indic languages. Varnam aims to provide a consistent input experience across different platforms in almost all Indic languages.

Source Details

Fonts & Script

Learn to write Malayalam

A simple application to observe and learn how to write Malayalam letters.

Use Source

Optical Character recognition

Scan and OCR documents in Malayalam. OCR recognizes the text in a document and produces a plain-text version. This tool is based on tesseract-ocr.

Use Source

SMC Fonts

SMC has released and maintains some of the most popular fonts for the Malayalam language, including Manjari, Gayathri, Chilanka, Rachana, AnjaliOldLipi, Keraleeyam, Uroob, Suruma, Meera and so on.

Use Source

Machine translation & Spelling checker

Machine translation

Machine translation between English and Malayalam. This machine translation system uses Hugging Face Transformers with OPUS-MT language models for translation.

Use

Spellchecker

LibIndic’s spellchecker module may be used to detect spelling mistakes in a word. If a spelling mistake is found, it generates valid root words as suggestions, ranked by how likely each is to be the word the user actually intended to use.

Use Source

Malayalam Linguistics

Malayalam phonetic analyser and generator

Malayalam phonetic analyser and generator, with an IPA converter.

Source Details

Malayalam text prediction system

A Malayalam text prediction system based on Markov chains.
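To illustrate the idea behind Markov chain text prediction, here is a minimal first-order sketch: it counts which word follows which in a training text and suggests the most frequent followers. The function names `train` and `predict` and the toy English corpus are assumptions made for illustration; they are not taken from the project itself.

```python
from collections import Counter, defaultdict


def train(tokens):
    """Count, for every word, how often each other word follows it."""
    model = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        model[current][following] += 1
    return model


def predict(model, word, k=3):
    """Return up to k most frequent followers of `word` (empty if unseen)."""
    followers = model.get(word, Counter())
    return [candidate for candidate, _ in followers.most_common(k)]


corpus = "the cat sat on the mat the cat ran".split()
model = train(corpus)
print(predict(model, "the"))  # ['cat', 'mat']
```

A real system would train on a large Malayalam corpus and would likely condition on more than one preceding word.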

Details

Morphology analyser and generator

Morphology analyser and generator, a foundational component for Malayalam computational linguistics.

Source Details

Corpus and data collections

SMC Text corpus

A collection of Malayalam text content for various research purposes, curated and released under a Creative Commons license.

Source

Speech corpus

A collection of Malayalam speech samples for various research purposes, curated and released under a Creative Commons license.

Source Details

Accessibility

Dhvani

Dhvani is a text-to-speech system designed for Indian languages. It is helpful for visually challenged users as a screen reader in their mother tongue. Currently, Dhvani is capable of generating intelligible speech for Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Panjabi, Tamil, Telugu and Pashto (experimental).

Details

ibus-sharada-braille

ibus-sharada-braille (ISB) is an ibus input engine based on the six-key approach of braille. It currently supports seven languages: English, Malayalam, Hindi, Kannada, Tamil, French, and Spanish.

Details

Web

Grandham

Grandham.in is an effort to build a crowdsourced bibliography data pool. It supports all Indian languages and has interfaces and APIs for libraries, publishers and the general public. The Malayalagrandham dataset is the biggest bibliography dataset in Malayalam and may be the largest Malayalam open data archive.

Details

Libindic

Libindic is a web platform to host Free Software language processing applications. It consists of a web framework and a set of applications for processing Indian languages. It is a platform for porting existing and upcoming language processing applications to the web.

Details

ml2en

A simple online tool to convert Malayalam text to Manglish. It is meant for people who understand Malayalam but can’t read the script.

Tools Source

Web Extensions

Indic-En

This add-on converts Malayalam, Hindi and Kannada webpages to Manglish, Hinglish and Kanglish respectively. It is meant for people who understand the language but can’t read the script.

Firefox Chrome Details

Malayalam fonts

This add-on helps you choose Unicode Malayalam fonts for the websites you visit, without installing anything on your phone or computer.

Firefox Chrome

Mobile

Akshara Mazha

An open source live wallpaper with Malayalam letters streaming down the screen, as seen in the popular movie The Matrix.

Source Download

Magisk Malayalam Fonts

Set the default Malayalam font for rooted Android devices.

Source How Download

Manglish

A simple app to convert Malayalam text to Manglish. It is meant for people who understand Malayalam but can’t read the script.

Details fdroid PlayStore

Programming Libraries

Hyphenation

Hyphenate Text

Hyphenation is the process of inserting hyphens between the syllables of a word so that, when the text is justified, the available space is used as fully as possible. Supported languages: English, Hindi, Malayalam, Tamil, Telugu, Kannada, Oriya, Bengali, Gujarati, Panjabi, Marathi, Sanskrit, Assamese, Kashmiri, Afrikaans, German, French, Croatian, Hungarian, Italian and Zulu.

Source

indic-trans

Cross-transliteration among all Indian languages

The project aims to add a state-of-the-art transliteration module for cross-transliteration among all Indian languages, including English and Urdu. The module currently supports the following languages: Hindi, Bengali, Gujarati, Punjabi, Malayalam, Kannada, Tamil, Telugu, Oriya, Marathi, Assamese, Konkani, Bodo, Nepali, Urdu and English.

Source Details Documentation

indicfortune

Fortune Cookies

Random quotes from Chanakya, the Thirukkural and Malayalam proverbs.

Source

indicngram

An n-gram model is a type of probabilistic model for predicting the next item in a sequence. n-grams are used in various areas of statistical natural language processing and genetic sequence analysis. An n-gram is a subsequence of n items from a given sequence. The items in question can be phonemes, syllables, letters, words or base pairs according to the application. An n-gram of size 1 is referred to as a “unigram”; size 2 is a “bigram” (or, less commonly, a “digram”); size 3 is a “trigram”; and size 4 or more is simply called an “n-gram”.
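The definition above can be made concrete with a short sketch that extracts all n-grams from a sequence. The helper name `ngrams` is chosen here for illustration and is not necessarily what the indicngram module exposes.

```python
def ngrams(items, n):
    """Return every contiguous subsequence of length n, in order."""
    items = list(items)
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


words = "to be or not to be".split()
print(ngrams(words, 2))
# [('to', 'be'), ('be', 'or'), ('or', 'not'), ('not', 'to'), ('to', 'be')]
print(ngrams("abc", 1))
# [('a',), ('b',), ('c',)]
```

The same function works for letters, syllables or words, since it only assumes that the input is a sequence.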

Source

indicstemmer

Experimental Malayalam stemmer

LibIndic’s stemmer module may be used to extract the stems of the words in a sentence. It uses a rule-based model and follows iterative suffix stripping to handle multiple levels of inflection. It currently supports Malayalam only.
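Iterative suffix stripping can be sketched as follows. The rule table below is a toy English one, invented purely for illustration; the actual module relies on Malayalam-specific rules, and the names `RULES`, `MIN_STEM` and `stem` are assumptions rather than its API.

```python
# Hypothetical rule table: suffix -> replacement (toy English rules).
RULES = {"ness": "", "less": "", "ing": "", "ly": "", "s": ""}
MIN_STEM = 3  # never strip a word down to fewer than this many characters


def stem(word):
    """Repeatedly strip the longest matching suffix until no rule applies."""
    suffixes = sorted(RULES, key=len, reverse=True)  # try longest first
    changed = True
    while changed:
        changed = False
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
                word = word[:len(word) - len(suffix)] + RULES[suffix]
                changed = True
                break  # restart so further inflection levels are handled
    return word


print(stem("hopelessly"))    # hope  (strips "ly", then "less")
print(stem("carelessness"))  # care  (strips "ness", then "less")
```

Looping until no rule fires is what lets a single word shed several layers of inflection.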

Source

Inexact Search

Approximate String Search

By combining a “written like” measure (edit distance) with a “sounds like” measure (soundex), we achieve efficient approximate string searching. This application is also capable of cross-language string search: you can, for example, search for Hindi words in Malayalam text. Any Malayalam word that is an approximate transliteration of the Hindi word, or that sounds like it, is returned as an approximate match. The “written like” algorithm used here is a bigram average algorithm.
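The entry does not spell out the bigram average algorithm, so the following is only one plausible reading of it: the number of character bigrams two words share, divided by the average number of bigrams in the two words. The function names and the threshold are assumptions for illustration, and the “sounds like” half of the search is left out.

```python
from collections import Counter


def bigrams(word):
    """Character bigrams of a word, e.g. 'abc' -> ['ab', 'bc']."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def bigram_average(a, b):
    """Shared bigrams divided by the average bigram count of the two words."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:  # words too short to have any bigram
        return 1.0 if a == b else 0.0
    shared = sum((Counter(ba) & Counter(bb)).values())
    return shared / ((len(ba) + len(bb)) / 2)


def search(query, words, threshold=0.6):
    """Return the words whose 'written like' score reaches the threshold."""
    return [w for w in words if bigram_average(query, w) >= threshold]


print(search("color", "the colour of colors".split()))
# ['colour', 'colors']
```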

Source

Katapayadi

Katapayadi writing method

Katapayadi sankhya is a simplification of Aryabhata’s Sanskrit numerals, probably due to Haridatta of Kerala. In Malayalam it is also known as ‘Paralperu’. For example, ചണ്ഡാംശുചന്ദ്രാധമകുംഭിപാല represents 31415926536, which is π × 10^10 rounded to the nearest integer. More examples in Malayalam are given on this page. LibIndic’s katapayadi module may be used to find the katapayadi number of a given string, as well as to get the Swarasthanas of a Melakartha number.
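As a rough sketch of how such a string is decoded, the code below applies the commonly cited Katapayadi table to Malayalam text: each consonant carries a digit, only the last consonant of a conjunct counts, and the digits are read right to left. This is an illustrative reimplementation under those assumptions, not LibIndic’s code; the name `katapayadi_number` is invented here, and edge cases such as word-final chandrakkala are not handled.

```python
VIRAMA = "\u0d4d"  # chandrakkala: the consonant before it is part of a conjunct


def _run(first_cp, count):
    """Map `count` consecutive code points to the digits 1, 2, ..., 9, 0."""
    return {chr(first_cp + i): (i + 1) % 10 for i in range(count)}


DIGITS = {}
DIGITS.update(_run(0x0D15, 10))  # ക ഖ ഗ ഘ ങ ച ഛ ജ ഝ ഞ -> 1..9, 0
DIGITS.update(_run(0x0D1F, 10))  # ട ഠ ഡ ഢ ണ ത ഥ ദ ധ ന -> 1..9, 0
DIGITS.update(_run(0x0D2A, 5))   # പ ഫ ബ ഭ മ -> 1..5
# യ ര ല വ ശ ഷ സ ഹ ള -> 1..9 (not contiguous in Unicode, so listed one by one)
YA_GROUP = [0x0D2F, 0x0D30, 0x0D32, 0x0D35, 0x0D36, 0x0D37, 0x0D38, 0x0D39, 0x0D33]
for digit, cp in enumerate(YA_GROUP, start=1):
    DIGITS[chr(cp)] = digit
DIGITS.update({chr(0x0D34): 0, chr(0x0D31): 0})              # ഴ and റ -> 0
DIGITS.update({chr(cp): 0 for cp in range(0x0D05, 0x0D15)})  # standalone vowels -> 0


def katapayadi_number(text):
    """Decode a Malayalam string into its Katapayadi number."""
    digits = []
    for i, ch in enumerate(text):
        if ch not in DIGITS:
            continue  # vowel signs, anusvara, chillus, spaces and so on
        if i + 1 < len(text) and text[i + 1] == VIRAMA:
            continue  # only the last consonant of a conjunct counts
        digits.append(str(DIGITS[ch]))
    # Digits are read in reverse order of the letters.
    return int("".join(reversed(digits))) if digits else 0


print(katapayadi_number("ചണ്ഡാംശുചന്ദ്രാധമകുംഭിപാല"))  # 31415926536
```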

Source

Payyans

ASCII - Unicode converter

LibIndic’s Payyans module may be used to convert text encoded in ASCII font formats to Unicode and vice versa. Support for more fonts can be added easily by supplying their mapping files.

Source

Shingling

Indic W-shingling Library

A w-shingling is a set of unique “shingles”—contiguous subsequences of tokens in a document—that can be used to gauge the similarity of two documents. The w denotes the number of tokens in each shingle in the set. This library supports English, Hindi, Malayalam, Kannada and Bengali.
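The following sketch shows how a w-shingling can be built and compared. It uses whitespace tokenization and the Jaccard coefficient as the similarity measure; both choices, and the function names, are assumptions for illustration rather than the library’s actual interface.

```python
def shingles(text, w):
    """Set of unique w-token shingles in a whitespace-tokenized text."""
    tokens = text.split()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard coefficient of the two texts' w-shinglings."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two texts too short to shingle are treated as equal
    return len(sa & sb) / len(sa | sb)


doc1 = "a rose is a rose is a rose"
doc2 = "a rose is a flower which is a rose"
print(len(shingles(doc1, 4)))   # 3
print(resemblance(doc1, doc2))  # 0.125
```

Because a shingling is a set, the repeated phrases in the first document collapse into just three shingles.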

Source

Soundex

Soundex Phonetic Code Algorithm Demo for Indian Languages.

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. LibIndic’s soundex module implements the Soundex algorithm for English as well as a modified version of the algorithm for Indian languages.
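For reference, the classic English Soundex can be sketched in a few lines. This is the standard textbook algorithm, not LibIndic’s implementation, and it does not cover the modified variant for Indian languages.

```python
import string

# Standard Soundex consonant classes; vowels, y, h and w carry no code.
CODES = {}
for letters, code in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                      ("l", "4"), ("mn", "5"), ("r", "6")]:
    for letter in letters:
        CODES[letter] = code


def soundex(name):
    """Return the four-character Soundex code of an English name."""
    letters = [c for c in name.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Ashcraft"))                   # A261
```

Names that sound alike, such as Robert and Rupert, receive the same code, which is what makes Soundex useful for indexing.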

Source Details

Syllabifier

LibIndic’s syllabifier module may be used to split words into their constituent syllables. It currently works for Malayalam, Kannada, Bengali, Tamil, Hindi and English.

Source

Text similarity

This module compares two texts and returns a similarity score between 0 and 1. A score of 1 means the texts are similar, and 0 means they are completely different. A value between 0 and 1 indicates how similar they are. The algorithm uses an n-gram model and cosine similarity.
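A minimal sketch of n-gram cosine similarity follows. It assumes character bigrams as the n-gram unit; the module’s actual n-gram size and tokenization are not stated in this entry, and the function names are invented for illustration.

```python
import math
from collections import Counter


def ngram_counts(text, n=2):
    """Frequency of each character n-gram in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b, n=2):
    """Cosine of the angle between the two texts' n-gram count vectors."""
    va, vb = ngram_counts(a, n), ngram_counts(b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values()) *
                     sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


print(cosine_similarity("malayalam", "malayalam"))  # close to 1
print(cosine_similarity("abcd", "wxyz"))            # 0.0
```

Since n-gram counts are never negative, the score always falls between 0 and 1.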

Source

ucasort

Module to sort words based on linguistics

Unicode Collation Algorithm (UCA) based sorting for all languages defined in Unicode. The collation weights used in this application are a modified version of the Default Unicode Collation Element Table (DUCET). It uses the default weights defined by Unicode. Malayalam and Tamil sorting is compatible with the GNU C library collation definition.

Source