Issue dated - 15th September 2003


The of Indic computing

While on the statistical front the demand for local language applications is significant, estimated at $64 million by 2005, on the market front it still remains minuscule. Computing in a vernacular language has never been easy: if the content was there then the fonts were missing, and if the fonts and content were there, then the coding was missing. In its ten years of existence the Indian language applications market has had more hits than misses and is only now gearing up to come into its own, says Chris Ann Fichardo

Keying in this article is so easy. Hit "Y" and there are no second guesses as to what will appear on the screen. But when computing in any Indian language, this is not the case. Though the personal computer has been around for nearly 20 years in India, its usage is still largely limited to the English-speaking population of the country. And as these users constitute just 5 percent of the country’s population, what this effectively means is that the other 95 percent have not been able to benefit from the information age.

However, there are a host of bodies across the country working to rectify this oversight. Corporate and government research labs like IBM Research Labs, Media Lab Asia, NCST (National Centre for Software Technology), TDIL (Technology Development for Indian Languages), C-DAC (Centre for Development of Advanced Computing), the IITs (Indian Institutes of Technology) and the NIIT-sponsored Centre for Research in Cognitive Systems are doing some great work in this area.

The scope of Indian language applications, or indic language applications as they are popularly known, is immense. According to a study conducted by hardware industry association MAIT and research firm Frost & Sullivan, revenues from local language applications are likely to touch $64 million by 2005. The study pegged the total market at $11 million in 2002 and estimated the growth rate at around 79 percent a year for the forecast period 2002-2005.
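As a rough check on how the study's figures fit together, the sketch below compounds the 2002 base forward at the quoted growth rate. It assumes the 79 percent figure is a compound annual rate applied over three years; the function name is ours, not the study's.

```python
def project_revenue(base_musd, annual_growth, years):
    """Compound a base-year revenue forward at a constant annual growth rate."""
    return base_musd * (1.0 + annual_growth) ** years


BASE_2002_MUSD = 11.0  # market size in 2002, $ million (as quoted from the study)
GROWTH_RATE = 0.79     # assumed to be a compound annual rate

for year in range(2002, 2006):
    value = project_revenue(BASE_2002_MUSD, GROWTH_RATE, year - 2002)
    print(f"{year}: ${value:.1f} million")
```

On these inputs the 2005 figure comes to roughly $63 million, close to the $64 million quoted; the gap is presumably down to rounding in the published base and growth rate.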

"The IT industry in India is realising that the next spurt of growth will stem from IT services reaching the masses. To make this happen, support for local Indian languages is imperative," says Dr Ponani Gopalakrishnan, director of IBM India Research Lab (IRL). Set up in 1998, IRL has research initiatives in electronic commerce, e-governance, bioinformatics, unstructured information management and technologies for human-computer interaction.


Code uncode

Lucrative as the market is, it’s not so easy to cash in on this goldmine. The Indian Constitution officially recognises 18 languages and 10 different scripts used in various states and regions across the country. Developing a font to support each of these languages will not only be time-consuming, but also extremely expensive.

Till date, the government has spent around $300 million in more than 1,000 pilot projects aimed at spreading IT among the masses and has achieved a success rate of just 40 percent.

Making the job even tougher is the lack of uniform standards to develop the various fonts. (See box: ISCII v/s Unicode) Not enforcing a standard in the industry gave rise to private, non-standard solutions. As a result, there was repetition of work and a host of proprietary codes. "Each company encoded the fonts instead of the data, and they also kept changing the font encoding, mainly to market their own solutions. Technically, there is no reason to do so," says Nagarjuna G, Mumbai director of the Free Software Foundation of India.

As there were no standards to adhere to, most indic codes remained unique to their owners and could not be used to develop applications for broader areas of computing such as banking solutions and legal documents, which restricted the use of local language applications. Realising the need for a standard, C-DAC and the then Ministry of IT (now Ministry of Information and Communications) worked on evolving the national standard which is now called ISCII (Indian Script Code for Information Interchange), recalls C-DAC executive director R K Arora.

Though this mistake has been rectified, the damage has been done and the repercussions are still affecting the industry. Dr Gopalakrishnan, who heads IRL, combats this problem on a daily basis: "When building a local language application, lack of a popular coding standard produces a big challenge. The existing applications have their own fonts and a corresponding keyboard mapping. The statistical techniques used in building the recognition systems require a large amount of training data. Since each such source of data uses its own coding standard, we had to go through an additional step of converting all the different coding standards to a single format. Also, since not all coding standards are supported with complete documentation, the conversion takes greater time and effort."
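The conversion step described here can be pictured as a table lookup per source. The sketch below is an illustration only: the two vendor tables and their byte values are invented, and real font encodings often store glyph fragments in visual order, which needs reordering logic that this sketch omits.

```python
# Hypothetical glyph-code tables for two imaginary vendor fonts.
# Both store the same three Devanagari letters (KA, MA, LA) under
# different byte values, so their files are mutually unreadable.
VENDOR_A_TO_UNICODE = {0x41: "\u0915", 0x42: "\u092E", 0x43: "\u0932"}
VENDOR_B_TO_UNICODE = {0xC1: "\u0915", 0xD7: "\u092E", 0xE2: "\u0932"}


def to_unicode(data, table):
    """Convert font-encoded bytes to Unicode text, failing loudly on gaps."""
    out = []
    for byte in data:
        if byte not in table:
            # Undocumented codes are exactly the problem described above.
            raise ValueError(f"no mapping documented for byte 0x{byte:02X}")
        out.append(table[byte])
    return "".join(out)


text_a = to_unicode(bytes([0x41, 0x42, 0x43]), VENDOR_A_TO_UNICODE)
text_b = to_unicode(bytes([0xC1, 0xD7, 0xE2]), VENDOR_B_TO_UNICODE)
print(text_a == text_b)  # the same word once both are in a single format
```

Every additional proprietary encoding needs its own table, and a table can only be as complete as the vendor's documentation, which is where the extra time and effort go.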

Damage control

Some draft standards have been prepared and presented, such as the eight-bit ISCII and 16-bit Unicode for script standardisation, ISFOC for fonts and the INSCRIPT phonetic keyboard layout. However, the final standards are yet to be recommended.

Another significant step is the formation of Technology Development for Indian Languages (TDIL), which has the onus of developing IT tools in local languages in India. Set up in 1991, TDIL has sponsored research in developing Indian language computing resources, processing systems, tools and translation support systems and localisation of software for Indian languages. It operates through collaborations with 13 resource centres across India. Some interesting applications have been the local language word processor LEAP, C-DAC’s GIST (graphics and intelligence-based script technology), desktop publishing applications and television subtitles in various vernacular languages.

MAIT also supports a consortium of IT companies engaged in the development of local language applications and solutions. Vinnie Mehta, president of the association, says that it has done significant work in the area of modification of Unicode to suit the requirements of Indian language computing and also in development of font standards for language computing. Next on the agenda are standards for products, e.g. keyboards.

"On one hand there are issues such as lack of standards in local language computing that have hampered mass adoption of local language solutions; on the other, somehow the user perception has been that local language applications are for free... This is also due to the fact that there is rampant piracy in this area and the industry is highly fragmented. It is our endeavour through the consortium that the industry will come to a common platform and focus on collaborative research," elaborates Mehta.

The demand

As the Internet spreads its influence across the length and breadth of India, it has become imperative to have a medium that will provide information in a manner that the local user can understand. To take the benefits of the IT revolution to the non-English speaking population, India needs applications and solutions that support vernacular transactions. "The front-end of the computer being English is a limitation in taking IT to the masses that are conversant in a vernacular language. So from a marketer’s perspective, even if we are able to target 5 percent of the missing 95 percent, the target market will double overnight; this will open doors to huge opportunities," says Mehta.

Corroborating this view, C-DAC’s Arora says, "This is an area that has large potential and there is a big need. If we have to proliferate the usage of computers we have to be able to provide both the hardware and software devices needed. Around 10 years ago people did not even believe that we could create Indic fonts, but now we are even developing right-to-left fonts like those used to read Urdu, Persian, Arabic, Kashmiri, etc." C-DAC is currently developing an application called Urdu Nashir, which is a word processor targeted at the publishing industry.


Overcoming the hurdles

Explaining why indic language applications did not receive much emphasis earlier, Prof Dr Pushpak Bhattacharyya, who currently heads the Universal Networking Language (UNL) project at IIT, Bombay (see the box on UNL), says, "India is very English-savvy and we have evolved a tri-lingual formula of Hindi, English and the local language [e.g. Marathi in Maharashtra], and at the central level we have Hindi and English. So a local language font is not yet a crucial necessity, as it is in Europe or Japan, which is why in spite of spending so much we have not produced any striking products applicable to Indian languages. But efforts are on. A lot of the younger generation are actually machine-savvy and they would like to use computers, get educated and access information on a more regular basis."

"Most of the computer-literate population in India is more comfortable with the English language. As the industry matures, and more global companies invest in local language solutions for the masses, more choices will become available to the consumer and we will see a higher rate of adoption of local languages in computing," adds Dr Ponani Gopalakrishnan.

Other reasons that slowed the progress of indic language applications have been the lack of development resources and proper guidance. The MAIT-Frost & Sullivan study states that in some instances projects initiated by the government failed due to lack of commercialisation of technology and lax timelines for projects. The basic lexical resources needed to build local language applications were also not in place. Researchers say that India has not invested in its lexical resources and has yet to realise the need for good dictionaries in electronic format. It’s only now that institutions like IIT, Bombay are for the first time building wordnets in Hindi, Marathi, Tamil, Gujarati and Oriya.

Funding is also an area of concern. Most of the players in this sector are mid-sized companies or educational institutes with limited financial muscle, hence they often tend to be restrained in terms of their research and development spending on new technologies. And while the corporate sector is investing, the commitment of providing funds is still at a token stage. "So far it’s mainly been the government that has invested in indic projects. The focus from the industry is still not very prominent because they still don’t see a market," says Arora.

"Funding is a big problem in language resource creation, and in the language processing area all the activities are extremely manpower-intensive. This manpower is really quality manpower: they have to be trained in computers and linguistics, and have to be paid very well, but there are government restrictions as to how much we can pay our researchers. All over the world language technologists are paid very well; we cannot afford that kind of salaries in India and that is a big deterrent," says Dr Pushpak Bhattacharyya.

Demand analysis

The local language IT market consists of about 12-14 vendors. Most of the domestic players are regional and have limited access to the market. They offer both off-the-shelf products and custom-made applications in all major languages. The other set of key players in this market are international players. However, international vendors are yet to take off in a big way in India in terms of application offerings across different languages. IBM offers a Hindi version of Lotus Notes in India, and Microsoft has also taken some steps in this regard. The MAIT-Frost & Sullivan study predicts that the participation of international vendors will increase in the next three years.

According to the study, while in the last three years the market has been driven by off-the-shelf applications for end users such as publications and the government sector, the future growth for indic languages will come from e-governance applications developed for various government departments. E-governance applications will also drive demand in the consulting area. By 2005, sales from consulting will account for 67 percent of total revenues in this segment.

The local language market per se is also expected to come into its own. The law of supply and demand will spur growth. As applications increase, the demand for products will also increase. IT for the masses is the new mantra that everybody is chanting. And if the computer is at the doorstep of every village in the country, then the software to run it cannot be very far behind.


[Charts not reproduced. Three Frost & Sullivan charts accompanied this article: one with figures in percent, one with no unit stated, and one with figures in $ million. All figures are rounded; the base year is 2002.]
Source: Frost & Sullivan


UNL—A language for information exchange and transfer over the Internet

Imagine a scenario where a tourist from Tamil Nadu downloads information from the Goa tourism website in Tamil, while a German tourist simultaneously downloads the same information from the same site in German.

The ease of accessing information in a language one is familiar with is a luxury the Internet still does not provide all its users. And this is one lacuna that the Universal Networking Language (UNL) hopes to rectify. The UNL project is a large-scale international co-operation with the goal of providing information on the Internet in all national languages of the members of the United Nations.

The rationale

The UNL project is an initiative of the United Nations and began in 1996 under the umbrella of ‘Peace and harmony through communication.’ The rationale behind this initiative was that if the language barrier on the Internet continued, i.e. if the Internet continued to be in English, then there would be a huge information gap within humanity itself. Therefore, the UNL endeavour was started. When the project began there were groups all over the world, and the UN University in Tokyo co-ordinated the project.

How it works

Each language links to a common representation called the UNL representation, or the language server (see diagram). Each language group had the task of developing translators from its language into UNL form and vice-versa. The translator from the local language to UNL is called the en-converter, and the translator from UNL to the local language is called the de-converter. Dr Pushpak Bhattacharyya and his team at IIT, Bombay were entrusted with building the Hindi en-converters and de-converters.
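The hub arrangement described above can be sketched with a toy pivot. Everything in this example is invented for illustration: the concept labels, the word lists and the function names are ours, and real UNL expressions also carry relations and attributes between concepts, which a word-for-word lookup ignores.

```python
# Toy pivot: each word maps to a language-neutral concept label.
LEXICONS = {
    "english": {"water": "WATER", "house": "HOUSE"},
    "hindi": {"पानी": "WATER", "घर": "HOUSE"},
    "german": {"Wasser": "WATER", "Haus": "HOUSE"},
}


def enconvert(words, language):
    """Local language -> pivot concepts (the en-converter direction)."""
    return [LEXICONS[language][word] for word in words]


def deconvert(concepts, language):
    """Pivot concepts -> local language (the de-converter direction)."""
    reverse = {concept: word for word, concept in LEXICONS[language].items()}
    return [reverse[concept] for concept in concepts]


def translate(words, source, target):
    """Translate by going through the pivot, never directly between languages."""
    return deconvert(enconvert(words, source), target)


print(translate(["Wasser", "Haus"], "german", "hindi"))

n = len(LEXICONS)
print("converters needed with a pivot:", 2 * n)
print("direct translators needed without one:", n * (n - 1))
```

The attraction of the design is the count at the end: each language group builds two converters, whereas translating directly between every pair of languages would need a separate translator for each ordered pair.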

"The scenario would be where somebody sitting in Tokyo sends an e-mail to a person in India in Japanese and transparent to the receiver/sender this mail gets translated into UNL and from UNL into the target language. The main usage of UNL is the ability to communicate over the Net in a language-independent manner," says Dr Bhattacharyya.

The implications

The first implication is the ability to automatically translate documents from one language to another. Earlier machine translation systems attempted this, but accurate translation was a problem because a natural language has its own little quirks and is often imprecise, ambiguous and ill-specified, explains Dr Bhattacharyya. "The uniqueness of this effort is that it uses an extremely rich word dictionary and the UNL coders," he adds.

The other application is intelligent information processing on the Web. With UNL documents one can do case matching and word matching at the disambiguated level, so the matching process becomes much richer and unwanted documents are eliminated.

Project status

The UN seeded the efforts, but once the infancy stage was over all the language groups had to generate their own resources from their own countries. The IIT, Bombay effort has received quite a lot of support from TCS, the Indian government and the World Bank, which has commissioned projects that would support these applications. "We are looking at lots of UNL applications and not just the de-converter and en-converter. We have projects where the summary of the documents is created in UNL form, Q&A is in the UNL form, retrieving of the document is done in a better way by filtering out irrelevant things. It’s at a small scale right now but it is an important application," says Dr Bhattacharyya.

"We are hoping that in two years’ time we should have small tools that are marketable and usable by other agencies. Already, the UNL’s translatory system and the lexicon system which was developed are used by other language processing groups. UNL is still IIT property and negotiations are on with the funding agency to see if the application can be made free; if not the source code, then at least the system," says Dr Bhattacharyya.

Developing a font

Languages are written using scripts. Many languages can be written using a single script. For example, Hindi, Marathi, Nepali and Konkani are all written in the Devanagari script. To be able to use scripts and therefore languages with computers, the characters of a script must be represented by an encoding.

Basic text processes such as display, editing, searching and sorting need a consistent way of encoding characters that enables the exchange of text data and creates the basis for software development.

Plain text is a pure sequence of character codes. It is public, standardised and universally readable, and requires a rendering process to make it visible. Glyphs are images that represent the shapes that characters can have when they are displayed. The character to glyph rendering mechanism is the domain of software processes. Although the encoding encapsulates only the basic alphabetic characters, the number of glyphs and their combinations required for the exhaustive rendering of Indic scripts can be quite large.
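The gap between stored characters and displayed glyphs is easy to see with a short Unicode example. The sketch below uses Python's standard `unicodedata` module; the choice of syllable is ours and is only an illustration.

```python
import unicodedata

# Four encoded characters that a Devanagari font typically draws as a
# single conjunct cluster, with the vowel sign placed to the left.
word = "\u0915\u094D\u0937\u093F"

for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print("characters stored:", len(word))
print("bytes in UTF-8:", len(word.encode("utf-8")))
```

The plain text holds only the letters, the virama and the vowel sign in logical order. Choosing the conjunct glyph and moving the vowel sign in front of it is left to the rendering software and the font, which is why the glyph count of an Indic font runs well beyond the number of encoded characters.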

A set of glyphs for a script used to display text constitutes a font. The number of glyphs can range from a couple of hundred to a couple of thousand or more depending on the complexity of the script and the font design.

Source: mithi.com

ISCII v/s Unicode

The ISCII (Indian Script Code for Information Interchange) code standard specifies a seven-bit code table which can be used in a seven or eight-bit ISO compatible environment. It allows English and Indian script alphabets to be used simultaneously. ISCII caters to 10 Indian scripts.

Unicode, originally designed as a 16-bit encoding, is a universal character encoding standard for multilingual text. It covers all the major scripts used for writing Indian languages. The Unicode Standard for Indic scripts is based on the ISCII-1988 revision and is a superset of the ISCII-1991 character encoding. Texts encoded in ISCII-1991 may be automatically converted to Unicode values and back to their original encoding without loss of information.
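One visible consequence of the ISCII heritage is that the Unicode blocks for the major Indic scripts are laid out in parallel, so corresponding letters sit at the same offset within each block. The sketch below relies on that layout; the block start values are from the Unicode code charts, while the function name and the error handling are our own assumptions.

```python
import unicodedata

# Start of the Unicode block for each script (each block is 128 code points).
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gujarati": 0x0A80,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
}


def shift_script(ch, source, target):
    """Map a character to the same block offset in another Indic script."""
    offset = ord(ch) - BLOCK_START[source]
    if not 0 <= offset < 0x80:
        raise ValueError(f"{ch!r} is not in the {source} block")
    candidate = chr(BLOCK_START[target] + offset)
    # Not every position is assigned in every script.
    if unicodedata.name(candidate, "") == "":
        raise ValueError(f"no {target} character at offset 0x{offset:02X}")
    return candidate


for script in ("bengali", "gujarati", "tamil", "telugu"):
    shifted = shift_script("\u0915", "devanagari", script)
    print(f"U+{ord(shifted):04X}  {unicodedata.name(shifted)}")
```

This is a property of the code charts, not a transliteration method: scripts differ in which letters they have, so some offsets are empty in a given block, and correct conversion between real texts needs more than a shift.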

Drivers and restraints

The top drivers of the local language application market

  • Introduction and promotion of new technology solutions and applications to cater to the growing needs of end-users.
  • Increasing content creation in Indian languages for the Web.
  • Initiatives in local language projects being undertaken by vendors, central and state governments.
  • Initiatives revolving around the commercialisation of products and applications being developed in the numerous research labs in India.

The top three restraints for the local language software market

  • Lack of formal IT-based language training.
  • Lack of awareness regarding e-governance computing applications at the grassroots level and low PC penetration across the country.
  • Insufficient or delayed implementation of the initiatives taken by different government bodies.

Impact of industry challenges (2003-2009)

Challenge                                  1-2 years   3-4 years   5-7 years
Lack of standards                          High        Medium      Low
Limited availability of software, fonts    High        High        Medium
Lack of local language content             High        Medium      Low
Slow technology progress                   Medium      Medium      Low

Source: Frost & Sullivan


© Copyright 2003: Indian Express Group (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in
Mumbai by The Business Publications Division of the Indian Express Group of Newspapers.