Swathanthra Malayalam Computing

Projects


Input tools

Indic Keyboard


Indic Keyboard is a MOSS Award-winning, privacy-aware, versatile keyboard for Android users who want to type messages, compose emails and otherwise use Indic and Indian languages alongside English on their phones. You can use it anywhere on your phone that you would normally type in English. It currently supports 23 languages and 57 layouts.

Poorna


Poorna is an extended Inscript/Remington keyboard layout available for Windows, Linux and Mac. It includes all Malayalam Unicode characters, including those not found in other layouts. One important use case is typesetting an old book (a Malayalam Bible, for example) that requires characters missing from other layouts. The layout also makes it easy to type punctuation and other special characters without switching layouts.

Swanalekha


Swanalekha is a well-known transliteration-based input tool available on all operating systems, on both desktops and mobile phones. It is very easy to learn because it uses familiar Manglish-based key mappings.

Varnam


Varnam is a cross-platform predictive transliterator for Indic languages. It aims to provide a consistent input experience across platforms in almost all Indic languages.


Fonts & Script

Learn to write Malayalam


A simple application to observe and learn how to write Malayalam letters.

Optical character recognition


Scan and OCR documents in Malayalam. The OCR step recognizes the text in a document and produces a plain-text version. The tool is based on tesseract-ocr.

SMC Fonts


SMC has released and maintains some of the most popular Malayalam fonts, including Manjari, Gayathri, Chilanka, Rachana, AnjaliOldLipi, Keraleeyam, Uroob, Suruma and Meera, among others.


Machine translation & Spelling checker

Machine translation


Machine translation between English and Malayalam. The system uses Hugging Face Transformers with OPUS-MT language models for translation.

Spellchecker


LibIndic’s spellchecker module may be used to detect spelling mistakes in a word. If a spelling mistake is found, it generates valid root words as suggestions that have a higher probability of being the word the user actually intended to use.


Malayalam Linguistics

Malayalam phonetic analyser and generator


Malayalam phonetic analyser and generator, and IPA converter.

Malayalam text prediction system


Markov chain based Malayalam text prediction system.
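
As a rough illustration of the idea (not the project's actual code), a first-order Markov chain predictor can be sketched in a few lines of Python. The function names `build_model` and `predict_next` are hypothetical, and the toy corpus is English only to keep the example easy to read; a real system would be trained on Malayalam text.

```python
import random
from collections import defaultdict


def build_model(words):
    """Map each word to the list of words observed right after it."""
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model


def predict_next(model, word, rng=random):
    """Pick a follower of `word`, weighted by how often it was observed."""
    candidates = model.get(word)
    if not candidates:
        return None  # the word was never seen with a follower
    return rng.choice(candidates)


# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the cat slept".split()
model = build_model(corpus)
```

Because followers are stored with repetition, `rng.choice` samples in proportion to observed frequency: after "the", this toy model suggests "cat" twice as often as "mat".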

Morphology analyser and generator


Morphology analyser and generator, a foundational component for Malayalam computational linguistics.


Corpus and data collections

SMC Text corpus


A collection of Malayalam text content for various research purposes, curated and released under a Creative Commons license.

Speech corpus


A collection of Malayalam speech samples for various research purposes, curated and released under a Creative Commons license.


Accessibility

Dhvani


Dhvani is a text-to-speech system designed for Indian languages. It helps visually challenged users by serving as a screen reader in their mother tongue. Currently Dhvani can generate intelligible speech for Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Panjabi, Tamil, Telugu and Pashto (experimental).

ibus-sharada-braille


ibus-sharada-braille (ISB) is an ibus input engine based on the six-key approach of braille. It currently supports seven languages: English, Malayalam, Hindi, Kannada, Tamil, French and Spanish.


Web

Grandham


Grandham.in is an effort to build a crowdsourced bibliography data pool. It supports all Indian languages and has interfaces and APIs for libraries, publishers and the general public. The Malayalagrandham dataset is the biggest bibliography dataset in Malayalam and may be the largest Malayalam open data archive.

Libindic


Libindic is a web platform to host Free Software language processing applications. It consists of a web framework and a set of applications for processing Indian languages. It is a platform for porting existing and upcoming language processing applications to the web.

ml2en


A simple online tool to convert Malayalam text to Manglish, for people who understand Malayalam but can’t read the script.


Web Extensions

Indic-En


This addon converts Malayalam, Hindi and Kannada webpages to Manglish, Hinglish and Kanglish respectively, for people who understand the language but can’t read the script.

Malayalam fonts


This addon will help you to choose Unicode Malayalam fonts for the websites you visit, without installing anything on your phone or computer.


Mobile

Akshara Mazha


Open source live wallpaper with Malayalam letters streaming down the screen, as in the popular movie The Matrix.

Magisk Malayalam Fonts


Set the default Malayalam font for rooted Android devices.

Manglish


A simple app to convert Malayalam text to Manglish, for people who understand Malayalam but can’t read the script.


Programming Libraries

Hyphenation

Hyphenate Text


Hyphenation is the process of inserting hyphens between the syllables of a word so that, when the text is justified, the available space is used as fully as possible. Supported languages: English, Hindi, Malayalam, Tamil, Telugu, Kannada, Oriya, Bengali, Gujarati, Panjabi, Marathi, Sanskrit, Assamese, Kashmiri, Afrikaans, German, French, Croatian, Hungarian, Italian and Zulu.

indic-trans

Cross-transliteration among all Indian languages


The project aims to add a state-of-the-art transliteration module for cross-transliteration among all Indian languages, including English and Urdu. The module currently supports the following languages: Hindi, Bengali, Gujarati, Punjabi, Malayalam, Kannada, Tamil, Telugu, Oriya, Marathi, Assamese, Konkani, Bodo, Nepali, Urdu and English.

indicfortune

Fortune Cookies


Random quotes from Chanakya, the Thirukkural and Malayalam proverbs.

indicngram


An n-gram model is a type of probabilistic model for predicting the next item in a sequence. n-grams are used in various areas of statistical natural language processing and genetic sequence analysis. An n-gram is a subsequence of n items from a given sequence. The items in question can be phonemes, syllables, letters, words or base pairs according to the application. An n-gram of size 1 is referred to as a “unigram”; size 2 is a “bigram” (or, less commonly, a “digram”); size 3 is a “trigram”; and size 4 or more is simply called an “n-gram”.
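
The definition above translates almost directly into code. The following is a minimal sketch, assuming a hypothetical helper named `ngrams` (it is not the indicngram module's API); it works on any sequence, so the items can be words, letters or syllables.

```python
def ngrams(items, n):
    """Return every contiguous subsequence of length n, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty list.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


words = "to be or not".split()
unigrams = ngrams(words, 1)
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)
```

Counting how often each tuple occurs in a corpus is the usual next step toward a probabilistic model that predicts the next item.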

indicstemmer

Experimental Malayalam stemmer


LibIndic’s stemmer module may be used to extract the stems of the words in a sentence. It uses a rule-based model and applies iterative suffix stripping to handle multiple levels of inflection. At present it supports only Malayalam.
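
Iterative suffix stripping can be sketched as follows. This is an illustration of the general technique only: the suffix list, the minimum stem length and the function name `stem` are all hypothetical, and English suffixes are used purely for readability. The actual module relies on its own Malayalam rule set.

```python
# Hypothetical toy rules; a real stemmer loads language-specific rules.
SUFFIXES = ["ness", "less", "ing", "ly", "s"]
MIN_STEM_LENGTH = 3


def stem(word):
    """Strip known suffixes repeatedly until no rule applies."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            stem_length = len(word) - len(suffix)
            if word.endswith(suffix) and stem_length >= MIN_STEM_LENGTH:
                word = word[:stem_length]
                changed = True
                break  # restart so stacked suffixes are peeled one by one
    return word
```

The outer loop is what handles multiple levels of inflection: "carelessly" first loses "ly", and the result "careless" is then checked against the rules again and loses "less".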

Inexact Search

Approximate String Search


By combining a “written like” measure (edit distance) with a “sounds like” measure (soundex), we achieve efficient approximate string searching. The application is also capable of cross-language string search: you can, for example, search for Hindi words in Malayalam text. Any Malayalam word that is an approximate transliteration of the Hindi word, or that sounds like it, is returned as an approximate match. The “written like” algorithm used here is a bigram average algorithm.

Katapayadi

Katapayadi writing method


Katapayadi sankhya is a simplification of Aryabhata’s Sanskrit numerals, probably due to Haridatta from Kerala. In Malayalam it is also known as ‘Paralperu’. For example, ചണ്ഡാംശുചന്ദ്രാധമകുംഭിപാല represents 31415926536, which is approximately π × 10^10. More examples in Malayalam are given on this page. LibIndic’s katapayadi module may be used to find the katapayadi number of a given string, as well as to get the Swarasthanas of a Melakartha number.
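
The decoding of the example string above can be reproduced with a short sketch. It assumes the commonly cited Katapayadi table for Malayalam (ka, ṭa, pa, ya = 1 and so on, with ള = 9 and ഴ, റ = 0), counts only the last consonant of a conjunct, treats independent vowels as 0, and reads the digits from right to left. The function name `katapayadi_number` is hypothetical and this is not LibIndic's API.

```python
VIRAMA = "\u0d4d"  # Malayalam sign virama (chandrakkala)


def _row(first, last, digits):
    """Assign digits to a run of consecutive code points."""
    return {chr(cp): d for cp, d in zip(range(first, last + 1), digits)}


DIGITS = {}
DIGITS.update(_row(0x0D15, 0x0D1E, [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]))  # ka .. nya
DIGITS.update(_row(0x0D1F, 0x0D28, [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]))  # tta .. na
DIGITS.update(_row(0x0D2A, 0x0D2E, [1, 2, 3, 4, 5]))                 # pa .. ma
for code_point, digit in [
    (0x0D2F, 1), (0x0D30, 2), (0x0D32, 3), (0x0D35, 4),  # ya ra la va
    (0x0D36, 5), (0x0D37, 6), (0x0D38, 7), (0x0D39, 8),  # sha ssa sa ha
    (0x0D33, 9), (0x0D34, 0), (0x0D31, 0),               # lla, llla, rra
]:
    DIGITS[chr(code_point)] = digit


def katapayadi_number(text):
    """Return the Katapayadi value of `text` as a string of digits."""
    digits = []
    for i, ch in enumerate(text):
        following = text[i + 1] if i + 1 < len(text) else ""
        if ch in DIGITS:
            if following == VIRAMA:
                continue  # not the last consonant of its conjunct
            digits.append(DIGITS[ch])
        elif 0x0D05 <= ord(ch) <= 0x0D14:
            digits.append(0)  # independent vowel
        # vowel signs, anusvara, chillus and other characters are ignored
    return "".join(str(d) for d in reversed(digits))
```

Returning a string instead of an integer keeps any leading zeros that appear after the digits are reversed.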

Payyans

ASCII - Unicode converter


LibIndic’s Payyans module may be used to convert text encoded in ASCII font formats to Unicode and vice versa. Support for more fonts can be added easily by adding their maps.

Shingling

Indic W-shingling Library


A w-shingling is a set of unique “shingles”—contiguous subsequences of tokens in a document—that can be used to gauge the similarity of two documents. The w denotes the number of tokens in each shingle in the set. This library supports English, Hindi, Malayalam, Kannada and Bengali.
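
A minimal sketch of the idea, assuming whitespace tokenization and the Jaccard coefficient as the similarity measure (both are assumptions made for illustration; the helper names `shingles` and `resemblance` are hypothetical and not the library's API):

```python
def shingles(text, w):
    """Return the set of unique w-token shingles in `text`."""
    tokens = text.split()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b, w=2):
    """Jaccard coefficient of the two shingle sets, between 0 and 1."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # neither text is long enough to produce a shingle
    return len(sa & sb) / len(sa | sb)


rose = shingles("a rose is a rose is a rose", 4)
score = resemblance("the quick brown fox", "the quick brown dog")
```

Because shingles are stored in a set, the repeated phrases in "a rose is a rose is a rose" collapse to three unique 4-shingles.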

Soundex

Soundex Phonetic Code Algorithm Demo for Indian Languages.


Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. LibIndic’s soundex module implements the Soundex algorithm for English as well as a modified version of it for Indian languages.
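
For reference, the classic English (American) Soundex can be sketched as below. This covers only the English algorithm; the modified variant for Indian languages is not shown, and the function name `soundex` is chosen for illustration rather than taken from the module.

```python
CODES = {}
for letters, digit in [
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
]:
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex code of an English name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]  # pad with zeros, keep four characters
```

Names that sound alike map to the same code, which is what makes the code usable as an index key: "Robert" and "Rupert" both yield R163.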

Syllabifier


LibIndic’s syllabifier module may be used to split words into their constituent syllables. It currently works for Malayalam, Kannada, Bengali, Tamil, Hindi and English.

Text similarity


This module compares two texts and returns a similarity score between 0 and 1. A score of 1 means the texts match, 0 means they are completely different, and a value in between indicates how similar they are. The algorithm uses an n-gram model and cosine similarity.
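
One way to combine n-grams with cosine similarity is sketched below. It assumes character bigrams as the features, which is a choice made for this illustration; the module's actual tokenization and n-gram size may differ, and the names `profile` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def profile(text, n=2):
    """Count the character n-grams that occur in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b, n=2):
    """Cosine of the angle between the two n-gram count vectors."""
    pa, pb = profile(a, n), profile(b, n)
    dot = sum(pa[gram] * pb[gram] for gram in pa.keys() & pb.keys())
    norm_a = math.sqrt(sum(count * count for count in pa.values()))
    norm_b = math.sqrt(sum(count * count for count in pb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text is too short to yield an n-gram
    return dot / (norm_a * norm_b)
```

Since all counts are non-negative, the result always falls between 0 and 1, matching the score range described above.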

ucasort

Module to sort words based on linguistics


Unicode Collation Algorithm (UCA) based sorting for all languages defined in Unicode. The collation weights used in this application are a modified version of the Default Unicode Collation Element Table (DUCET). It uses the default weights defined by Unicode. Malayalam and Tamil sorting is compatible with the GNU C library collation definition.