Efforts are on to find Indian-language solutions for GNU/Linux in
the subcontinent. In a country believed to have
1,652 mother tongues, this will prove quite a challenge.
Yet, going by the sheer number of initiatives in this space
from different corners of the country, Linux enthusiasts seem
undaunted, says Frederick Noronha
Dr Nagarjuna G feels that although governments have invested
a lot of money in developing technologies and fonts,
consistent standards are not followed, nor are the products
freely available
Some
call Indic and other South Asian scripts the final challenge
to computer vendors for full i18n support. It has taken its
time in coming, the challenges are tough, and successes have
been few and far between. Yet optimism is high now. Can this
frontier be conquered early?
South Asia, home to nearly one-sixth of humanity, is struggling
to achieve regional-language solutions that would make computing
accessible to the common man. Despite widespread poverty and low
purchasing power, such solutions could open the floodgates to
greater computing power and much-needed efficiency in a critical
region of the globe.
For sure, GNU/Linux is making its own headway. And even if
all this has so far largely failed to get the attention it
deserves, expect some interesting surprises from this area.
In mid-September, key proponents of Indianisation met in
Bangalore. Their goal: to bring together energetic young developers
working on local-language development tools, applications
and content.
The aims included encouraging free discussion to spur
creative and passionate thinking about the future
of local-language computing technologies.
Some Indian regional languages are among the most spoken
in the world. Take Hindi, whose 366 million speakers place it
second worldwide only to Mandarin Chinese; Telugu, with
69 million; Marathi, with 68 million; and Tamil, with
66 million. Another 13 Indian languages figure among the
world's top 70, each with over 10 million speakers.
Moreover, some languages are spoken both within and beyond
India's borders, such as Bengali (207 million speakers in India
and Bangladesh) and Urdu (60 million in Pakistan and India).
Naturally, this linguistic space needs to be closely watched.
Range of initiatives
Varied initiatives are under way in different parts of the
country. One exciting project is the Simputer, a simple and
relatively inexpensive computing device meant to benefit
groups of villagers.
GNU/Linux enthusiasts are optimistic about its potential,
especially because this planned device runs on their
favourite OS (operating system). Moreover, the Simputer is
being designed in an open format, an innovative idea from
India that gives new meaning to open technology in the
hardware world.
This device is seen to have a clear edge over any palmtop.
"Palmtops can't compute in Indian languages and don't have
text-to-speech interfaces for Indian languages. They are also
not aimed at the mass market that the Simputer is looking at,
and still have a more elitist user community," says Abhas
Abhinav of Deep Root Linux in Bangalore.
Dhvani is a text-to-speech system for Indian languages developed
by the Simputer Trust developers and others. Its developers
promise a better phonetic engine, a Java port and a
language-independent framework soon.
Meanwhile, IMLI is a browser created by the Simputer Trust
that uses the IML markup language. It is designed for easy
creation of Indian language content and is integrated with
the text-to-speech engine. IMLI can be independently installed
on any Simputer.
In the national capital, New Delhi, and the western state
of Goa, campaigners are struggling to take GNU/Linux into the
classroom. Indian-language solutions could obviously take
such a project much further than software restricted to English.
In Kerala, another southern state with an impressive 90 percent
literacy rate, and where Malayalam is spoken by 35 million
people, another venture is underway.
Senior local government official Ajay Kumar is leading an
initiative to introduce GNU/Linux in Malayalam. He says: "We
propose to develop a renderer for our language. Specifically,
we are looking for a renderer for Pango (the generic
text-rendering engine used with the GTK toolkit)."
The state government is looking for people who have worked
on Malayalam and Unicode to contribute some of their work to
this project, especially fonts. Ajay Kumar hopes that in nine
months' time they can create an atmosphere in which language
computing in Malayalam improves. "We are confident that once we
deliver the basic framework, others will start localising
more applications in Malayalam," he says.
Other initiatives have also come up, like the GNU India Translation
Project (GTP) by gnu_India, which aims to localise
GNU/Linux programs into the native languages of India.
Rahul Jindal had earlier announced Deepti, a Hindi-speaking chat
robot along the lines of Alice (www.alicebot.org).
"We shall use or develop a Hindi TTS for the output and
add more frills as time permits," he says.
ITRANS, by Avinash Chopde, is a package for printing texts in
Indian languages. It takes input written in English (Roman)
letters and supports the Devanagari script (used for writing Hindi
and some other Indian languages), Gujarati, Telugu, Kannada,
Bengali, Tamil, Punjabi and Romanised Sanskrit. Input files
can be in TeX, LaTeX, HTML or PostScript format, and Unicode
output is supported.
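Here is a minimal Python sketch of ITRANS-style input: Roman letters converted into Unicode Devanagari. It is not ITRANS itself. The table covers only a handful of letters, and its spellings (for example "aa" or "A" for the long a) are assumptions modelled loosely on ITRANS conventions. The rule it illustrates is that every consonant carries an inherent "a". A vowel that follows a consonant therefore becomes a dependent sign, and a consonant followed directly by another consonant receives a virama (halant) so that the two form a cluster.

```python
# Hypothetical subset of an ITRANS-like Roman scheme (not the full ITRANS table).
CONSONANTS = {
    "k": "\u0915", "kh": "\u0916", "g": "\u0917", "gh": "\u0918",
    "ch": "\u091A", "j": "\u091C", "t": "\u0924", "th": "\u0925",
    "d": "\u0926", "dh": "\u0927", "n": "\u0928", "p": "\u092A",
    "b": "\u092C", "bh": "\u092D", "m": "\u092E", "y": "\u092F",
    "r": "\u0930", "l": "\u0932", "v": "\u0935", "sh": "\u0936",
    "s": "\u0938", "h": "\u0939",
}
# Roman vowel -> (independent vowel letter, dependent sign after a consonant)
VOWELS = {
    "a": ("\u0905", ""),            # inherent vowel: no sign is written
    "aa": ("\u0906", "\u093E"), "A": ("\u0906", "\u093E"),
    "i": ("\u0907", "\u093F"), "I": ("\u0908", "\u0940"),
    "u": ("\u0909", "\u0941"), "U": ("\u090A", "\u0942"),
    "e": ("\u090F", "\u0947"), "ai": ("\u0910", "\u0948"),
    "o": ("\u0913", "\u094B"), "au": ("\u0914", "\u094C"),
}
VIRAMA = "\u094D"  # halant: suppresses a consonant's inherent vowel
# Try longer Roman keys first so "kh" wins over "k" and "aa" over "a".
KEYS = sorted(set(CONSONANTS) | set(VOWELS), key=len, reverse=True)


def tokenize(text: str):
    """Greedy longest-match split of Roman input into scheme tokens."""
    i = 0
    while i < len(text):
        for key in KEYS:
            if text.startswith(key, i):
                yield key
                i += len(key)
                break
        else:
            yield text[i]  # anything outside the table passes through
            i += 1


def to_devanagari(text: str) -> str:
    out = []
    pending = False  # True right after a consonant that has no vowel yet
    for tok in tokenize(text):
        if tok in VOWELS:
            independent, sign = VOWELS[tok]
            out.append(sign if pending else independent)
            pending = False
            continue
        if pending:
            out.append(VIRAMA)  # consonant followed by a non-vowel
        if tok in CONSONANTS:
            out.append(CONSONANTS[tok])
            pending = True
        else:
            out.append(tok)
            pending = False
    if pending:
        out.append(VIRAMA)
    return "".join(out)


print(to_devanagari("hindI"))    # हिन्दी
print(to_devanagari("namaste"))  # नमस्ते
```

In this sketch a word-final consonant keeps its virama, so the inherent vowel has to be typed explicitly ("raama", not "raam"). Real transliteration schemes each handle such details in their own way.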
More importantly, international efforts are also helping India.
With its recent 2.5.4 release, Yudit announced a few weeks ago
that it now supports three South Indian languages: Malayalam,
Kannada and Telugu. Delhi-based GNU/Linux veteran Raj Mathur
comments: "The current version of Yudit has complete
support for Malayalam and other Indic languages. It can also
use OpenType layout tables of Malayalam fonts. I think Yudit
is the first application that can use OpenType tables for
Malayalam."
K Ratheesh was a student at the Indian Institute of Technology, Chennai,
when he worked on enabling the GNU/Linux console for local
languages a couple of years ago. "As the (then) current
PSF format didn't support variable-width fonts, I have
made a patch in the console driver so that it will load a
user-defined multi-glyph mapping table, which can be displayed
for a single character code. All editing operations will also
be taken care of," he explains.
Further, as Ratheesh points out, Indian languages have
various consonant/vowel modifiers that result in complex
character clusters. "So I have extended the patch to
load user-defined context-sensitive parse rules for glyphs
and character codes as well. Again, all editing operations
will behave according to the parse rule specifications,"
he says.
"Even though the patch has been developed keeping Indian
languages in mind, I feel it will be applicable to many other
languages (for instance Chinese) which require wider fonts
on the console or user-defined parsing at the I/O level," adds
Ratheesh. The package, containing the patch, some documentation,
utilities and sample files, weighed around 100 KB at the time.
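Ratheesh's parse rules group character codes into clusters so that editing acts on whole syllables. The idea can be sketched in a few lines of Python. The rules below are assumptions made for illustration, not his patch, and they operate on Unicode Devanagari rather than on the console's own character codes. A cluster here is a consonant, optionally chained to further consonants by a virama, followed by any vowel signs or marks. The hypothetical `backspace` helper shows a cluster-aware editor deleting one whole cluster at a time.

```python
VIRAMA = "\u094D"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    # Devanagari consonants KA..HA plus the precomposed nukta letters QA..YYA
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def is_mark(ch: str) -> bool:
    # Dependent vowel signs AA..AU, nukta, candrabindu, anusvara, visarga
    return "\u093E" <= ch <= "\u094C" or ch in "\u0901\u0902\u0903" + NUKTA


def clusters(text: str) -> list[str]:
    """Split text into editing units using simple context-sensitive rules."""
    out, i, n = [], 0, len(text)
    while i < n:
        j = i + 1
        if is_consonant(text[i]):
            while j < n:
                if text[j] == NUKTA:
                    j += 1
                elif text[j] == VIRAMA and j + 1 < n and is_consonant(text[j + 1]):
                    j += 2  # virama joins the next consonant into a conjunct
                elif text[j] == VIRAMA:
                    j += 1  # a trailing virama stays with its consonant
                    break
                else:
                    break
        while j < n and is_mark(text[j]):
            j += 1  # vowel signs and marks belong to the cluster
        out.append(text[i:j])
        i = j
    return out


def backspace(text: str) -> str:
    """Hypothetical helper: delete the last whole cluster."""
    return "".join(clusters(text)[:-1])


word = "\u0939\u093F\u0928\u094D\u0926\u0940"  # हिन्दी
print(clusters(word))   # ['हि', 'न्दी']
print(backspace(word))  # हि
```

A naive editor deletes one code point at a time, so pressing backspace on this word would first remove only the final vowel sign. With the cluster rules, the conjunct and its sign go together.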
There are even projects aimed at helping to explore Indian
holy books, written in the ancient religious language of Sanskrit.
Support for Indian languages in open-source OSs today is confined
to a series of hacks and ad-hoc 'solutions', says Joseph
Koshy
Strategy suggested
One Indic-computing strategy document, prepared in May 2002,
notes that, among other factors, India faces a unique problem of
standardisation and capacity-building in local-language computing.
This is simply due to the wide variety of regional and local languages
in use. Then there are the organisational and regional
obstacles inherent in any effort to standardise this rich
variety of languages.
It moots a strategy of creating a hierarchy of participatory
consortia, to facilitate broad regional and local participation
in standardisation and development from a range of stakeholders
with differing areas of expertise.
"It is important that these consortia be participatory
and inclusive, to properly represent the viewpoint of local
developers, users and other stakeholders. We recommend the
formation of state-level (regional) consortia for each regional
language, which should include participants from the following
key member groups: developers, technologists, users/practitioners,
linguistic groups..." the document reads.
Tapan S Parikh, a 27-year-old US-educated Indo-American who
has set his heart on finding language computing solutions
for his homeland, says he and his colleagues are trying to
pull together some linguistic information for Indian languages,
document it, and post it on the Web.
Says he: "Basically, the idea now is to put these guidelines
out there and solicit a lot of feedback on this info from
the general community for each language. From that we can
collate the best results and publish a handbook."
By the end of September's Bangalore meeting, organisers
hope to have assembled a community of technically informed
and motivated people to organise and lead the Indic-computing
development effort into the future.
"The leadership of this community should be individual-driven,
technically motivated, and imbued with youth, vitality
and a progressive vision," says Parikh, who is also one of the
organisers.
"We also hope that this broad coalition will play a
facilitating role in helping local-language groups interact
more effectively with international standards processes and
forums, such as the Unicode Consortium and W3C," say
the organisers.
Which ones first?
One issue that needs to be considered is which languages
should be tackled first.
HP's Bangalore-based technical consultant Joseph Koshy
argues that the North Indian Hindi family promises
the greatest reach population-wise. However, he feels the
southern languages (Kannada, Telugu, Tamil and Malayalam) offer
the greatest promise of real-world deployability, as they enjoy
the better support infrastructure needed to deploy an effective
IT solution.
Outside his work at HP, Koshy is a volunteer developer
of the FreeBSD operating system and one of the founders of
the Indic-computing project on SourceForge. Says he: "What
I am interested in is to help make standards-based, interoperable
computing for Indian languages a reality. This dream is bigger
than any one operating system or any one computing platform.
I want to see pagers, telephones, PDAs and other devices that
have not been invented yet interacting with our people in
our native languages."
But others have different views. Says C V Radhakrishnan, a
TeX programmer who runs River Valley Technologies out of
Thiruvananthapuram in South India: "I think most of the
South Indian languages would pose more problems because of
their non-linear nature. For example, to create conjunct glyphs
one has to go back and forth, while North Indian languages
do not have this problem. Malayalam has peculiar characters
called half consonants (chillu), and there is no equivalent
for these in other languages. This raises severe computing/programming
challenges."
While the debate goes on (and the proof will lie in the
actual solutions that emerge), it's clear that some of these
could be difficult languages.
Others point out that many smaller languages have traditionally
not been written at all, or are written in non-standard variants
of standard scripts. Radhakrishnan points to SIL (sil.org) as a
group working on related issues.
FreeBSD developer Koshy notes that the official Census of
India lists 114 major languages in the subcontinent.
"Linguists, who discriminate more finely than the Census
officials, peg the number of living languages in India at
850+," he says.
"Out of the 18 more important 'scheduled' national
languages, all except those based on Devanagari (which use
the same script as Hindi) have serious issues when it comes
to representing and processing them on a computer," says Koshy.
"Each language needs its differences to be taken care of. Solutions
which treat all languages as equivalent have got only limited
acceptability," argues G Karunakar, another young developer
taking a keen interest in this field.
Ravikant says efforts should be made to make existing packages
more user-friendly
Wish lists
What applications and solutions would be required for
a good start? Radhakrishnan's wish-list begins with X
Window support for local languages (a promising project in
this direction is IndiX). A good editor that supports Unicode
is a prime requirement: even though Yudit supports Unicode,
it is highly inadequate as an editor. The list goes on to
include a multilingual typesetting system (Omega, a 16-bit
extension of TeX, is a good candidate for this), a simple mail
client like pine or mutt, and a browser extended to support
local languages, with local-language menus.
Says Koshy: "The usual paper-consumption
uses (i.e. word processing, printing, etc) are always there.
But I think that the greatest demand would be for what I call
'relevant information', for lack of a better name."
Content is also critical.
Requirements vary widely. "It all depends on where computers
are used," argues Koshy. For instance, the Garhwal region could
need a matrimonial service uniting its people scattered around
the world. Those in the eastern town of Asansol might need
information about the tobacco or tea markets, the town's most
important local produce.
Some stress the need for the basics: enabling users
to type, save and print documents in their language(s),
to share files with others, to read and send e-mail,
and to browse and search the Net in their mother tongue.
Other wants come up fast too: Indian GNOME, KDE, Mozilla,
Galeon, and Konqueror; an office suite; and instant messaging
solutions.
Karunakar points out that a team at Sun Microsystems is working
on the X extension approach (e.g. XOM, the X Output Method).
At the toolkit level, GTK and Qt are the most widely used
toolkits, which helps. GTK already has a good framework through
the Pango project, and basic support for Indian languages. Qt
now also has Unicode-level support for all languages, but its
rendering is not yet ready.
At the font level, there is no font-encoding standard. ISFOC
aimed to be one, but it has become synonymous with C-DAC's encoding,
and because there is no document describing it, it has been
ignored in GNU/Linux solutions.
But what are the priority applications?
"Everything," says Edward Cherlin, who creates multilingual
websites and is active in internationalisation standards
and implementation.
On GNU/Linux, Cherlin, who is based in Cupertino, CA, points
out, "You can volunteer to Indicise any application.
In the future, when font management and rendering are standardised,
all applications will run in Indian languages for input and
output without further ado, and anyone will be able to create
a localisation file to customise the user interface. Volunteers
are also needed to translate documentation."
Prakash Advani says though Unicode brought in standardisation,
certain issues still remain unresolved
Other OSs
Experts in the field are also studying the progress of other
OSs. Some argue that today only Microsoft's WinXP has
any Indian-language support worth speaking of.
But this is based on the current Unicode version (3.x) and
hence suffers from all the problems of Unicode-based solutions:
the inability to represent all the characters of some Indian
languages, and awkwardness in text processing.
Microsoft faces other problems too. "When Microsoft came
up with the South Asian edition of MS Word, the fonts had
a lot of problems. Mostly, words were rendered as separate
letters with spaces in between, and not combined together as
is the case with most Indian languages," says Kalika
Bali, a PicoPeta language technology specialist. PicoPeta
is one of the firms working to create the Simputer.
"Support for Indian languages in the open-source OSs is today
confined to a series of hacks and ad-hoc solutions,"
argues Koshy. "Unicode support in the open-source OSs is itself
still coming in (and slowly too)."
Dr U B Pavanaja, a former scientist now widely known for
his determined work to push computing in the influential South
Indian language of Kannada, however, finds the progress quite
remarkable compared with the scene about two years ago.
Says Pavanaja: "Current pricing and product activation
of XP may become a boon for GNU/Linux" (since software piracy
would be more difficult).
Cherlin too is optimistic. According to him, "By next
year, the Pango project should support all nine official Indic
scripts. So the answer (to which languages should be tackled
at this stage) is, all of them."
As Cherlin argues, Indic and other South Asian scripts are
the final challenge to computer vendors for full I18n support.
"Progress is slow at Microsoft and Apple. They are not
willing to simply support typing, display and printing. They
will not release language and writing-system support until
they have complete locales built, preferably including a dictionary
and spelling checker. Linux is under no such constraints."
He points out that the Free Standards Group together with
Li18nux.org are proposing to rationalise and simplify I18n
support under X, including a common rendering engine, shared
font paths, and other standards that will greatly simplify
the business of supporting all writing systems and all languages.
Cherlin feels that Yudit and Emacs both support several Indic
scripts, and could be extended with only moderate effort on
the part of a few experts.
"Mandrake Linux includes Bengali, Gujarati, Gurmukhi, Hindi
Devanagari and Tamil out of the box. That leaves Oriya, Malayalam,
Telugu and Kannada still to be done, along with the Indic-derived
Lao, Sinhala, Myanmar and Khmer. Tibetan and Thai are moderately
well supported," Cherlin contends.
"Recently, localisation efforts have been picking up,"
agrees Dr Nagarjuna G, a scientist and free software
advocate in Mumbai.
"Other operating systems have their own funds for R&D.
GNU/Linux depends on volunteers and external financial support.
If the government or other funding agencies can spare even
some amount for bodies like the Free Software Foundation of India,
and others who are active in the localisation initiative,
developers would be motivated and make this happen very fast.
FSF India is presently working with the Kerala government to
produce Malayalam support for the GNOME desktop," notes
Nagarjuna.
Incidentally, the Indian TeX Users Group now has a project
to fund font designers in all the Indian languages who are
ready to create fonts and donate them under the GPL to TUGIndia.
They have thus secured Keli, a Malayalam font
family in various weights and shapes created by Hashim and
released under the GPL. "We do hope to get more fonts in
other languages to fill the gaps. We hope to use the savings
generated from TUG2002 (to be held in India in September 2002)
exclusively for this purpose," says Radhakrishnan in
Thiruvananthapuram.
Finding an Indian tongue for the Penguin
Support
for Indian languages is coming in slowly. There are
several efforts towards this end:
IndLinux project: http://www.indlinux.org, http://www.sourceforge.net/projects/indlinux
A volunteer group working at the desktop level (KDE/GNOME),
using Unicode. ISCII (Indian Script Code for Information
Interchange), the Indian standard equivalent of ASCII, will
also be supported through converter tools. Current
focus is on OpenType font development and translations
for GNOME 2.0.
This group aims to play the integrating role, by putting
all the pieces together to make it usable. Now, a distributed
approach is being taken to encourage people to take
up localisation for their language. There are now volunteers
from more remote areas like Bhopal, Jabalpur, Nainital,
etc, apart from regular centres like Mumbai, Pune, Hyderabad
and Bangalore.
The group is presently working on GNOME 2 translations,
making it simple to use so that all the user needs
to do is change his language or keyboard layout.
Also in progress is a Hindi-enabled version of the upcoming
Red Hat 7.3.93 (Limbo), which will probably become Red Hat
8.0, giving users the option of installing in
Hindi.
IndiX: http://rohini.ncst.ernet.in/indix/
A modified X server that supports Indian languages using
OpenType fonts. Uses Unicode. Seeks to bring Indic support
at the OS level on GNU/Linux. Others too agree that
NCST's localisation work is promising, both for
IndiX and OpenOffice in Hindi.
IITM indlinux: http://www.tenet.res.in/-Donlab/Indlinux/
From IIT-Madras in the south Indian city of Chennai.
They have modified X and the kernel console to support
Indian languages. Uses ISCII encoding only.
Linux Localisation Initiative (LLI): lli.linux-bangalore.net
A volunteer group working on translating LDP documentation
(starting with HOWTOs) to Indian languages.
Indic-computing project: indic-computing.sourceforge.net
Aims to create a resource centre for all Indian-language
issues in computing. It is aggregating all language
information in one place, so that it's a lot easier for
developers in the future.
Language Technology Resource Centre (LTRC), IIIT Hyderabad:
http://www.iiit.net/ltrc/index.html
They have developed language dictionaries, a plug-in for
viewing ISCII, and font converters, and are also building a
machine-based translation tool (Anusaaraka). Most of
their work is released under the GNU GPL. IndiX, IITM and
IIIT-Hyderabad's work is supported by the Government
of India; the rest are volunteer-based and looking
for funds. Some interesting GNU/Linux and other
OS work is also happening in South India, in the Tamil-language
heartland and nearby: http://www.chennaikavigal.com
and http://www.tamillinux.org
Some other projects earning notice:
- A team doing good work is the IITM team [http://acharya.iitm.ernet.in/];
the algorithms and approach are interesting.
- Mithi Technologies, the Pune-based firm, has done
a good job on the Web server front. This is quite
a well-thought-out effort, as the majority of Web
servers run on Linux-Apache.
- There are also international projects that could benefit
Indian computer users: Pango, Graphite, Li18nux and Free
Standards; Mandrake Linux, which emphasises multilingual
support and welcomes any offer to help; and of course India's
own attempt at building a people-friendly, low-cost
computing device, the Simputer (www.simputer.org).
- http://www.parabaas.com/Parabaas_Axar/index.html
(Bangla editor for Linux, Java-based, runs on all
platforms)
There's also the Indian-language work by a team
at the International Institute of Information Technology
(IIIT) Hyderabad. They have been doing good work in the
areas of machine translation, linguistics, dictionaries,
etc, and much of their work is available under the GNU GPL.
There are two international projects to create a complete
rendering engine: Pango (Pango.org, Li18nux.org) and
Graphite (sil.org). India could gain from these. They
also have plans for complete sets of Unicode fonts (including
not just the Unicode characters, but also all of the
non-character glyphs for rendering Indic scripts).
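The IndLinux entry above mentions converter tools for ISCII. The following is a minimal Python sketch of such a converter, covering only Devanagari consonants, some vowel signs, halant and nukta. The byte values are our reading of the ISCII-91 layout and should be checked against IS 13194:1991 before anyone relies on them. That layout mostly follows Unicode's Devanagari order, but it places YYA (0xCE) inside the consonant run, so one fixed offset is not enough. Vowel letters, ISCII's special nukta combinations and its ATR/EXT attribute codes are deliberately left out.

```python
def iscii_to_unicode(data: bytes) -> str:
    """Partial ISCII-91 (Devanagari) to Unicode converter; a sketch only.

    Byte ranges reflect an assumed reading of the ISCII-91 layout.
    Unsupported bytes become U+FFFD (replacement character).
    """
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))          # lower half is plain ASCII
        elif 0xB3 <= b <= 0xCD:
            out.append(chr(b + 0x862))  # KA..YA -> U+0915..U+092F
        elif b == 0xCE:
            out.append("\u095F")        # YYA sits outside Unicode's run
        elif 0xCF <= b <= 0xD8:
            out.append(chr(b + 0x861))  # RA..HA -> U+0930..U+0939
        elif 0xDA <= b <= 0xDF:
            out.append(chr(b + 0x864))  # vowel signs AA..vocalic R -> U+093E..U+0943
        elif b == 0xE8:
            out.append("\u094D")        # halant (virama)
        elif b == 0xE9:
            out.append("\u093C")        # nukta
        else:
            out.append("\uFFFD")        # not covered by this sketch
    return "".join(out)


# HA, I sign, NA, halant, DA, II sign
print(iscii_to_unicode(bytes([0xD8, 0xDB, 0xC6, 0xE8, 0xC4, 0xDC])))  # हिन्दी
```

ISCII stores characters in logical order, as Unicode does. A byte-by-byte mapping like this one is therefore workable for simple text, and the harder cases are the special codes this sketch skips.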
Technical challenges
The technical challenges are many: the X rendering model is
too simple for Indic scripts (an upcoming tutorial on the
Indic Computing site will cover the nitty-gritty). Input for
Indian languages is an open issue. Most keyboard solutions
available today for X are fragile, and are really workarounds
rather than solutions.
In Cherlin's view, the principal problem is rendering
conjuncts without proper rendering engines and properly encoded
fonts. Users want to type a sequence of characters and not
concern themselves with the details of rendering. This requires
fonts with tables listing the possible character
sequences and the glyphs for rendering each, and an engine
that knows how to read the tables.
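Cherlin describes font tables that list character sequences and their glyphs, plus an engine that reads those tables. The Python sketch below is a loose illustration of that arrangement. The glyph names and table entries are invented for the example, and the whole thing is far simpler than real OpenType lookups. It shows two jobs such an engine performs. The first is replacing a sequence like NA + virama + DA with a single conjunct glyph. The second is moving the short-i sign, which is stored after its consonant, to the left of that consonant for display; this is the "back and forth" Radhakrishnan mentions.

```python
# Hypothetical font tables: glyph names and entries are invented for illustration.
# Each key is a sequence of Unicode characters; each value is a glyph name.
SUBSTITUTIONS = {
    ("\u0915", "\u094D", "\u0937"): "KSSA",   # KA + virama + SSA -> conjunct glyph
    ("\u0928", "\u094D", "\u0926"): "NA_DA",  # NA + virama + DA -> conjunct glyph
    ("\u0915",): "KA", ("\u0937",): "SSA", ("\u0928",): "NA",
    ("\u0926",): "DA", ("\u0939",): "HA",
    ("\u093F",): "I_SIGN", ("\u0940",): "II_SIGN",
}
PREBASE = {"I_SIGN"}  # glyphs drawn to the left of the base they follow in memory
MAX_LEN = max(len(key) for key in SUBSTITUTIONS)


def shape(text: str) -> list[str]:
    """Map characters to glyphs (longest match first), then reorder for display."""
    glyphs = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            key = tuple(text[i:i + size])
            if key in SUBSTITUTIONS:
                glyphs.append(SUBSTITUTIONS[key])
                i += size
                break
        else:
            glyphs.append(".notdef")  # no glyph for this character in the table
            i += 1
    # Reordering pass: a pre-base sign swaps places with the glyph before it.
    for k in range(1, len(glyphs)):
        if glyphs[k] in PREBASE:
            glyphs[k - 1], glyphs[k] = glyphs[k], glyphs[k - 1]
    return glyphs


print(shape("\u0939\u093F\u0928\u094D\u0926\u0940"))  # input: हिन्दी
# ['I_SIGN', 'HA', 'NA_DA', 'II_SIGN']
```

In this toy table every conjunct is a single glyph, so one swap is enough. With real fonts the sign must move in front of the whole cluster, which is one reason a proper engine and properly built tables are needed.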
Recently, at the user-interface level, the GNOME/GTK teams tried
rendering Unicode-encoded Devanagari (Hindi's script).
"But this is specific to GTK and doesn't extend to the
other X toolkits," adds Koshy.
"I don't know of any non-X user interface toolkits that
support Indian languages. Neither am I aware of any general
text-processing toolkits (a toolkit or library that helps
in manipulating Indian-language text for sorting, searching,
storage and retrieval). We don't even have the necessary
technical information about 90 percent of our languages that
we can use to get started on such a toolkit," says Koshy.
For desktop class machines, current font technology (TTF,
OpenType, Type 1, etc) is capable of handling Indic scripts.
Availability of good-quality fonts is another matter; but,
as Koshy puts it, this is not really a show stopper. Display
technology for embedded devices (pagers, small devices) for
Indian languages is not well developed.
"Languages like Urdu and Sindhi have right-to-left scripts
which look similar to Arabic but are, in fact, different,"
argues Prakash Advani, who some years back launched the FreeOS.com
initiative.
"I have found a great problem in typesetting technical
documents and school/college textbooks, particularly in the
disciplines of maths, physics and chemistry. The reason is
the lack of local-language support for TeX, the world's
best maths typesetting system. When an operating system does
not support education in a local language, the purpose
of using computers is greatly diluted," says Radhakrishnan.
Satish Babu, a Free Software enthusiast and vice president of InApp,
an Indo-US software company dealing with free and Open Source
solutions, points to another problem: "Confusion over collation
(sorting) order: often there is no unique natural collation
order, and one has to be adopted through standardisation."
Then there's the non-availability of dictionaries
and thesauri in Indian languages, and issues arising out of
multiple correct spellings for words; the need for encoding
standardisation (Unicode) that will, inter alia, facilitate
transliteration between Indian languages; program support
(database, spreadsheet) for sorting and searching two-byte
strings; and the lack of support for some languages (e.g. Tulu,
Konkani, Haryanvi, Bhojpuri).
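Satish Babu's point about collation can be shown with a small Python example. Python sorts strings by raw Unicode code point. Because the precomposed nukta letter QA (U+0958) is encoded after all the plain consonants, a word beginning with it sorts after KHA-words rather than beside KA-words. One simple rule a standard might adopt is to decompose the text to NFD before comparing. That choice is an assumption made here for illustration. It is not a full collation standard such as the Unicode Collation Algorithm, and it does not settle the wider ordering questions he raises.

```python
import unicodedata

words = [
    "\u0916\u093E\u0928\u093E",  # begins with KHA
    "\u0958\u0932\u092E",        # begins with precomposed QA (U+0958)
    "\u0915\u092E\u0932",        # begins with KA
]


def codepoint_key(word: str) -> str:
    return word  # Python's default: compare raw code points


def decomposed_key(word: str) -> str:
    # NFD splits U+0958 into KA (U+0915) + nukta (U+093C), so QA sorts with KA.
    return unicodedata.normalize("NFD", word)


print(sorted(words, key=codepoint_key))   # KA-word, KHA-word, QA-word
print(sorted(words, key=decomposed_key))  # KA-word, QA-word, KHA-word
```

The sort key changes only how words are compared, so the stored text stays exactly as the user typed it. A full standard would also need to settle questions like these: whether nukta letters interfile with their base letters, and where anusvara and vowel signs fall.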
Ravikant, who taught History at Delhi University before moving
to the Language and New Media project of sarai.net, says:
"The long-term solution is of course Unicode, and the
package Yudit already works on both Linux and Windows. Using
the package you can write e-mail (by cutting and pasting into
any of the browsers, such as the new Mozilla and IE) and host
Web pages; in short, write HTML."
As a short-term measure, he suggests developing the existing
packages so that people can use them independently of particular
OSs and fonts. ITRANS and WRITE32, written by Indians settled
abroad, are transliteration packages which already do so. The
LaTeX-Devnag package is being used and promoted by the Mahatma
Gandhi International University, Delhi.
Then there are packages that, according to Ravikant, do not
offer this OS freedom, since they are Windows-only: Baraha
(www.baraha.com), I-Leap, and IndiaPage (mithi.com).
Says Advani: "There is definitely a market for Indian-language
computing today, but there is a huge untapped market too.
95 percent of the population does not read or write English.
If we can provide them with a low-cost Indian-language computer,
it will be a hit."
According to him, the biggest challenge is the lack of standards.
Until Unicode came along, there was no consistent standard;
everyone followed their own conventions for the input, storage
and output of data.
Unicode brought in standardisation. But all is not hunky-dory.
Certain issues remain unresolved. For instance, not
everyone agrees with Unicode even though it is an international
standard; not all applications are Unicode-enabled, though
things are getting there; and most Indian-language websites don't
support Unicode, and neither do all OSs.
"Also, there is a lack of free Indian-language fonts. There
are over 5,000 commercial Indian-language fonts, but there
are probably only 10 free (GPL or royalty-free) ones.
This is a serious issue, and more effort should be made to
release free fonts," says Advani.
Another view is that the GNU/Linux GUI is a soup of various
protocols and toolkits, with no single point where
Indian languages can be incorporated. GTK and Qt have separate
projects for i18n, but neither is sufficient. IndiX takes
a different route and works at the X level. Overall, the
whole process is awkward.
Besides, others point out, fonts are another mess altogether.
Most current implementations rely on glyph locations
to display and store information. For instance, to represent
the letter 'a', what is stored is the position of
'a' in the particular font used by that package.
This is different from English, where the ASCII standard
specifies that the number 97 must be used to represent 'a'.
No such standard exists for Indian languages, and
thus a document written in an Indian language with one
application often cannot be opened in another. This is also
why Indian Web pages require readers to use the particular
fonts specified by the author.
Vendors often use such a situation to lock their customers in
to a particular product. It also restricts the exchange of
Indian-language e-mail to situations where both parties have
the same Web interface or program.
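A small Python sketch makes the difference between storing glyph positions and storing characters concrete. FONT_X and FONT_Y are hypothetical legacy font layouts, not real products. The same stored bytes decode to different text depending on which font table is assumed. A character encoding such as ASCII or Unicode, by contrast, fixes what each number means.

```python
# Two hypothetical legacy font layouts: each maps a glyph slot (byte value)
# to the Devanagari letter that particular font happens to draw there.
FONT_X = {0x41: "\u0915", 0x42: "\u0916"}  # slot 0x41 = KA, 0x42 = KHA
FONT_Y = {0x41: "\u0916", 0x42: "\u0915"}  # same slots, letters swapped


def decode_glyph_bytes(data: bytes, font_table: dict[int, str]) -> str:
    """Recover text from glyph-slot bytes; only correct for the right font."""
    return "".join(font_table.get(b, "\uFFFD") for b in data)


stored = bytes([0x41, 0x42])
print(decode_glyph_bytes(stored, FONT_X))  # कख
print(decode_glyph_bytes(stored, FONT_Y))  # खक

# With a character encoding, the number itself identifies the character.
print(ord("a"))                  # 97 in both ASCII and Unicode
print("\u0915".encode("utf-8"))  # b'\xe0\xa4\x95'
```

In this hypothetical case the byte 0x41 is also ASCII 'A'. A glyph-position document opened without the intended font would therefore show Latin letters instead of Devanagari, which is exactly the interchange problem described above.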
TUGIndia, which Mathur represents, has procured a Malayalam
font (Keli) from font designer Hashim and will convert it
to OpenType and distribute it under the GNU GPL. The project is
expected to be completed by September 2002. Mathur works as
an engineer at Linuxense Information Systems, and leads the
Indian TeX Users Group's localisation project.
Says Karunakar: "There are very few people in India who
understand font technology completely, so most of the fonts
that are available are buggy. Due to the lack of a font standard,
our fonts are not tagged as Indian-language fonts."
Right now a general consensus seems to be building around OpenType
as the suitable technology for Indian-language fonts.
There is already a free Devanagari font, Raghu, by Dr R K Joshi
of NCST (the Government of India's Mumbai-based National Centre
for Software Technology), which is used in IndiX. There is also
a Kannada OpenType font from KGP, and similar fonts exist for
Malayalam, Telugu and Bengali.
"There is a lot of know-how in books that are rare and
difficult to come by. A lot of research work done by scholars,
linguists, typographers, etc is going untapped," adds
Karunakar.
G Karunakar feels that solutions which treat all languages
as equivalent have got only limited acceptability
Lack of information
Koshy says: "The biggest problem I see today
is the lack of information in a form useful to a software
developer. Most of the developers on open-source projects
(and this holds true for closed-source companies too) are
not Indians."
"Though we Indians claim to be a software superpower,
we apparently aren't very good at producing working code.
For example, the core work in bringing Devanagari support
into GTK has been done by a few Europeans; the Indian
contribution has been in providing translations of application
messages," Koshy says.
Given this situation, campaigners at the ground level say
it is imperative that the information needed to implement
language support be made widely available, so that whoever
is interested (be it a Czech, a Scandinavian or a Bengali) can
add Indian-language support to the code base they maintain.
Indian languages also face challenges in voice synthesis
and recognition. Bali points to the lack of easily available
annotated speech corpora to train the language/statistical models
needed for state-of-the-art text-to-speech (TTS) and automatic
speech recognition (ASR) engines.
"This is especially the case for ASR, as one would need
to train the models for dialectal variation if they were to
be deployed in a semi-urban environment. For example, how
many people actually use the standard Sanskrit-influenced
Doordarshan version of Hindi in their daily interaction?"
asks Bali.
Dr Nagarjuna G lists the problems bluntly: "Lack of
standards, and lack of good-quality fonts available in the public
domain. Governments are spending lots of taxpayers'
money on the development of technologies and fonts, which
either do not follow standards or are not
freely available."
Shrinath, a senior staff scientist at Mumbai's NCST, which
has done some interesting work on this subject, says: "We
want Indian-language programming to be as simple as programming
in English is today. Almost every company has to reinvent
the wheel or buy costly solutions from others. In English,
the OS supports it. It's a chicken-and-egg problem. If
there are apps in Indic, the OS vendors will build the fundamental
capabilities into the OS, and if the capabilities are built
in, there will be more apps."
There are other needs too: dictionaries and spelling checkers,
of course. Word-breaking doesn't operate the same way
in Indic scripts as in the Latin alphabet. And then there is
fine typography, which you don't find in consumer or office
applications in any language.
One major challenge is the sheer numbers. India is believed
to have 1,652 mother tongues, of which 33 are each spoken by
more than a hundred thousand people.
Girish S, an electronics engineer from Jabalpur, Madhya Pradesh,
who set up apnajabalpur.com, sums it up best: "English has
been the de facto language for software development as well as
usage. So there is a long way to go. As it appears, China
is working fast on that front, and so can we."