www.expresscomputeronline.com WEEKLY INSIGHT FOR TECHNOLOGY PROFESSIONALS
29 December 2008  

Semantic Web search

The advent of the Semantic Web

Semantic Web search is a technology that uses sentence logic and semantic analysis to understand the context of keywords and return appropriate answers. By Malabika Sarkar

As the amount of information in enterprise databases and online data stores grows exponentially every year, enterprises face the problem of sifting through it all and sharing it among disparate systems and end users. As the volume of information and the number of systems increase, traditional index-based search methods become largely ineffective. Enter semantic Web search technology, a non-proprietary way of categorizing and connecting data with contextual information so that it is easier to organize and search.

Most search engines, particularly Google, judge how relevant a page is to a topic as much from the links between sites as from the text on any single page. The semantic Web promises to change this because it captures the meaning of the data on a page, giving machines that classify or search the Web the ability to work out how relevant a page’s contents are to a particular topic.
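Link-based ranking of this kind can be illustrated with the published PageRank idea: a page matters if pages that matter link to it. The following is a minimal sketch over a made-up three-page graph (the domain names are hypothetical); Google's production ranking combines many more signals and is not fully public.

```python
# Minimal PageRank-style link analysis (illustrative only).
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

links = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
scores = pagerank(links)
best = max(scores, key=scores.get)
```

Here `c.example` ranks highest because two pages point to it, even though nothing about its text was examined; this is exactly the kind of signal the semantic Web aims to complement with meaning.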

A semantic search looks at sentence logic (how words in a sentence relate to one another) and conducts semantic analysis (it attempts to understand the context of keywords) to produce appropriate results.
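The gap between keyword matching and sentence logic shows up with as few as three words. In this toy sketch, the fixed "subject verb object" pattern is an assumption made for brevity and is not how any particular engine parses text. A keyword view cannot tell the two sentences apart, while a view that records each word's role can.

```python
import re

def bag_of_words(sentence):
    """Keyword view: the set of words, ignoring how they relate."""
    return frozenset(re.findall(r"[a-z]+", sentence.lower()))

def subject_verb_object(sentence):
    """Toy 'sentence logic': read a three-word sentence as (subject, verb, object)."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if len(words) != 3:
        raise ValueError("this toy parser only handles 'subject verb object'")
    return tuple(words)

s1, s2 = "Dog bites man", "Man bites dog"
same_keywords = bag_of_words(s1) == bag_of_words(s2)
same_meaning = subject_verb_object(s1) == subject_verb_object(s2)
```

`same_keywords` is true and `same_meaning` is false: the words are identical, but who does what to whom is reversed.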

The vision of semantic search is to enhance and evolve today’s Web by allowing machines to perform tasks by exploiting better metadata. With semantic technology, adding, changing and implementing new relationships, or interconnecting programs in a different way, can be as simple as changing the external model that these programs share.

Ravi Datanwala, Group Manager Live Search and Acting Windows Live Lead, Consumer & Online International, Microsoft Corporation India, said, “The application of Powerset technology (Semantic Web technology) to Live Search will enable Live Search to more quickly surface the most relevant information for our customers, to help them complete their desired tasks faster.”

He continued, “At least a third of queries still go unanswered, so we continue to believe there is lots of room for improvement here, especially with complex queries. Natural language search will enable us to extract the user’s query intent or meaning when returning Web pages or documents following a search, ultimately delivering great improvements in relevance for the user.”

Semantic technologies are meaning-centric. They include tools for auto-recognition of topics and concepts, information and meaning extraction, and categorization.

Given a question, semantic technologies can directly search topics, concepts and associations that span a vast number of sources. This provides an abstraction layer above existing IT systems that helps bridge and interconnect data, content and processes. From the portal perspective, semantic technologies can be treated as a new level of depth that provides far more intelligent, capable, relevant and responsive interaction than conventional methods.
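The abstraction-layer idea can be sketched by mapping two differently structured sources onto one set of subject-predicate-object statements (the data model RDF uses) and querying that layer instead of each source. The record layouts, field names, predicate names and customer data below are all hypothetical.

```python
# Two hypothetical systems describe the same customer with different schemas.
crm_rows = [{"cust_id": "C1", "full_name": "Asha Rao", "city": "Pune"}]
billing_rows = [{"account": "C1", "balance_inr": 1200}]

def crm_to_triples(rows):
    """Map CRM records onto shared (subject, predicate, object) statements."""
    for r in rows:
        yield (r["cust_id"], "name", r["full_name"])
        yield (r["cust_id"], "livesIn", r["city"])

def billing_to_triples(rows):
    """Map billing records onto the same shared vocabulary."""
    for r in rows:
        yield (r["account"], "owes", r["balance_inr"])

# The shared layer: one set of statements drawn from both systems.
graph = set(crm_to_triples(crm_rows)) | set(billing_to_triples(billing_rows))

def query(graph, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    matches = [
        (s, p, o) for (s, p, o) in graph
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]
    # Sort on string forms so mixed object types never get compared directly.
    return sorted(matches, key=lambda t: (t[0], t[1], str(t[2])))

everything_about_c1 = query(graph, subject="C1")
```

A single query now returns the customer's name, city and balance, even though no one system held all three.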

Developers have to implement various tools and components to build a Semantic Web system. The first part is a set of standards (e.g. RDF, XML) for describing information and resources in a uniform way, so that data resources worldwide can talk to each other about their data. The second part comprises the technologies and software that developers must implement according to these standards. This includes components for converting existing data into a form such as RDF, security tools, presentation tools, a query language, a taxonomy, a schema for rule representation (rules represent the relationships between different objects), and a deductive logic or data mining system.
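The taxonomy, rule and deductive-logic components listed above fit together roughly as follows. The predicate names mirror the RDF Schema terms `rdf:type` and `rdfs:subClassOf`, but the plain-string triples and the naive fixed-point loop are simplifications, not a real RDF toolkit. Adding a new class relationship is a change to the data, not to the code, which is the point about changing a shared model.

```python
# Toy forward-chaining reasoner over (subject, predicate, object) triples.
facts = {
    ("Sedan", "subClassOf", "Car"),
    ("Car", "subClassOf", "Vehicle"),
    ("myCar", "type", "Sedan"),
}

def infer(triples):
    """Apply two RDFS-style rules until no new triples appear:
    1. A subClassOf B, B subClassOf C  =>  A subClassOf C
    2. x type A, A subClassOf B        =>  x type B
    """
    known = set(triples)
    while True:
        new = set()
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                # Both rules need the second triple to be 'b subClassOf d'.
                if b != c or p2 != "subClassOf":
                    continue
                if p1 == "subClassOf":
                    new.add((a, "subClassOf", d))
                elif p1 == "type":
                    new.add((a, "type", d))
        new -= known
        if not new:
            return known
        known |= new

closure = infer(facts)
```

From three stated facts the reasoner deduces three more, including that `myCar` is a `Vehicle`, without that fact ever being written down.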

How semantic technology works

The single largest benefit of the Semantic Web is that users have to do a lot less work to get the information or service they want. Today, finding specific information, as in research, is a time-consuming task that requires some skill. Sometimes you have to look through hundreds of pages to find what you want, because it may not be ‘docketed’ or ‘indexed’ the way you expect in the search engine’s ontology (e.g. Google or the Yahoo Directory). To cover all such cases and still provide meaningful results, a search engine such as Google needs an extremely large data set; Google uses thousands of servers because all this extra information is needed to guide a user’s search. If content is machine-readable, machines can communicate directly, and Web content includes semantic information that they can exchange, you can do away with much of the ambiguity and inefficiency in information retrieval. HTML is merely a presentation format; ‘Semantic Web’ pages additionally contain ‘information on the information’ in the pages.
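The contrast between presentation-only HTML and pages that carry ‘information on the information’ can be sketched with attributes in the style of RDFa, a W3C specification for embedding such metadata in XHTML. The page fragment and property names are invented for illustration, and this extractor only handles the flat case shown.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: a browser shows the same text either way,
# but the property attributes tell a machine what each value means.
PAGE = """
<div about="#flight-101">
  <span property="origin">Mumbai</span> to
  <span property="destination">Delhi</span>,
  fare <span property="fareINR">4500</span>
</div>
"""

class PropertyExtractor(HTMLParser):
    """Collects property -> text pairs from elements carrying a property attribute."""
    def __init__(self):
        super().__init__()
        self.current = None
        self.found = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" in attrs:
            self.current = attrs["property"]

    def handle_data(self, data):
        # Record the first non-blank text inside a labelled element.
        if self.current and data.strip():
            self.found[self.current] = data.strip()
            self.current = None

    def handle_endtag(self, tag):
        self.current = None

parser = PropertyExtractor()
parser.feed(PAGE)
parser.close()
metadata = parser.found
```

A program reading plain HTML would see only the words "Mumbai to Delhi, fare 4500"; with the markup it can tell which city is the origin and which number is the fare.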

Additionally, the Semantic Web will make available many services that are not yet easy to obtain, and make them interoperable and more user-friendly by supporting queries in natural language such as English as well as higher-level logical queries. Finding a suitable air fare is one example: which is the best flight, or the best hotel, for a given trip? The benefits are many, but the main point is that the technology minimizes the user’s burden and makes the user experience more pleasant.
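Once offers are published in machine-readable form, a higher-level query such as finding the best flight reduces to filtering and ranking. The flight codes and fares are hypothetical, as is the "cheapest flight departing by a given hour" criterion. A real agent would also have to gather and reconcile data from many providers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flight:
    code: str
    origin: str
    destination: str
    depart_hour: int  # 24-hour clock
    fare_inr: int

# Hypothetical machine-readable offers gathered from several providers.
FLIGHTS = [
    Flight("XX101", "Mumbai", "Delhi", 6, 5200),
    Flight("XX205", "Mumbai", "Delhi", 9, 4300),
    Flight("YY310", "Mumbai", "Delhi", 14, 3900),
    Flight("YY412", "Mumbai", "Chennai", 8, 3100),
]

def best_flight(flights, origin, destination, latest_departure):
    """Cheapest flight on the route leaving no later than latest_departure, or None."""
    candidates = [
        f for f in flights
        if f.origin == origin and f.destination == destination
        and f.depart_hour <= latest_departure
    ]
    return min(candidates, key=lambda f: f.fare_inr, default=None)

choice = best_flight(FLIGHTS, "Mumbai", "Delhi", latest_departure=12)
```

`YY310` is cheaper but leaves too late, so the sketch picks `XX205`. The user states the goal and the machine applies the constraints.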

Jawahar Malhotra, CTO, Yahoo! India R & D, said, “The intent is to enhance the usefulness of the Web through servers which expose existing data through standards such as XML, RDF and OWL. Documents will now be additionally marked-up with automatically generated semantic information about the content of the document in a way that machines can understand this content. Automated agents will now use this additional information to perform tasks for users.”
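To show what an automated agent actually consumes, the sketch below uses Python's standard ElementTree module to read a small RDF/XML document into triples. The RDF and Dublin Core namespace URIs are the real ones; the document itself is invented. Only the simplest RDF/XML form is handled (`rdf:Description` with literal-valued properties); a production system would use a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical document describing one resource.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/report-2008">
    <dc:title>Annual Report</dc:title>
    <dc:creator>Example Corp</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

def triples_from_rdfxml(text):
    """Read simple rdf:Description blocks (literal values only) into triples."""
    root = ET.fromstring(text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree names elements '{namespace}local'; rejoin them as one URI.
            ns, local = prop.tag[1:].split("}")
            yield (subject, ns + local, prop.text)

triples = list(triples_from_rdfxml(DOC))
```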

Similarly, Powerset technology strives to understand the intent behind the customer’s search query and relate that intent (what they are trying to find) to relevant information in Web pages and documents. “Unlike traditional search engines, which just look at words, Powerset reads and extracts meaning from every sentence in the Wikipedia index. When you type a query into Powerset, we try to match the meaning of the query to the meaning of a sentence in Wikipedia, instead of just returning Web pages,” said Datanwala.
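A deliberately tiny sketch can illustrate the idea of matching the meaning of a query to the meaning of a sentence. The company names are fictional, and only two question shapes are understood. The sentences are assumed to be reduced to triples already; that is the hard part, which a real natural-language parser must do and this sketch skips. This is not Powerset's method, only the general idea compared with a keyword baseline.

```python
import re

# Fictional sentences, each assumed to be already reduced to (subject, verb, object).
SENTENCES = {
    "Globex acquired Acme in 2005.": ("globex", "acquired", "acme"),
    "Acme acquired Initech in 2007.": ("acme", "acquired", "initech"),
}
QUESTION_WORDS = {"who", "what", "did"}

def keyword_answer(question):
    """Baseline: sentences containing every content word of the question."""
    wanted = set(re.findall(r"[a-z]+", question.lower())) - QUESTION_WORDS
    return [text for text in SENTENCES
            if wanted <= set(re.findall(r"[a-z]+", text.lower()))]

def parse_question(question):
    """Map two toy question shapes onto a triple pattern (None = unknown slot)."""
    q = question.lower().rstrip("?")
    m = re.fullmatch(r"who acquired (\w+)", q)
    if m:
        return (None, "acquired", m.group(1))
    m = re.fullmatch(r"what did (\w+) acquire", q)
    if m:
        return (m.group(1), "acquired", None)
    raise ValueError("unsupported question shape")

def meaning_answer(question):
    """Sentences whose triple fills every known slot of the question's pattern."""
    pattern = parse_question(question)
    return [text for text, triple in SENTENCES.items()
            if all(p is None or p == t for p, t in zip(pattern, triple))]
```

For "Who acquired Acme?" the keyword baseline returns both sentences, because both contain "acquired" and "Acme". The meaning-based match returns only the one in which Acme is the thing acquired.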

Deeper and more meaningful searches

The benefits of a comprehensive semantic Web include interoperability, discovery of new services, intelligent querying in natural language, increased security and the capacity to provide new services. On Live Search, semantic technology already simplifies customer tasks through features such as the opinion index and the recent integration of Powerset technology, which powers Freebase answers and improved captions for Wikipedia results.

Datanwala added, “With our opinion index, a customer doing a product search with Live Search is presented with a summary of the sentiment on the Web about the desired product, which can save consumers both the time and hassle of combing through multiple Web sites for the information.” As for Freebase answers, queries such as ‘San Francisco weather’, ‘MSFT’ and ‘Banff national park’ already produce answers, but many typical queries, such as those for musicians, albums and films, do not. To address this, the Live Search team selected some of these categories to return a topic summary with links next to the results. Additionally, as Wikipedia articles show up in a large percentage of Live Search queries, it is important that their captions are top-notch.
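The article does not describe how the opinion index works internally. The following is therefore only a generic illustration of summarizing sentiment across many pages: a small word list scores each review and the results are tallied. The lexicon and reviews are made up, and word lists like this miss negation and sarcasm.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon; real sentiment systems are far richer.
POSITIVE = {"great", "excellent", "sharp", "love", "fast"}
NEGATIVE = {"poor", "slow", "broke", "disappointing", "noisy"}

def review_polarity(text):
    """+1, -1 or 0 depending on whether positive or negative words dominate."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

def opinion_summary(reviews):
    """Count positive, negative and neutral reviews for one product."""
    labels = {1: "positive", -1: "negative", 0: "neutral"}
    return Counter(labels[review_polarity(r)] for r in reviews)

reviews = [
    "Great camera, sharp pictures and fast autofocus.",
    "Battery life is poor and the zoom is noisy.",
    "Arrived on Tuesday.",
    "I love the screen.",
]
summary = opinion_summary(reviews)
```

The shopper sees one line (two positive, one negative, one neutral) instead of reading four pages.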

Powerset technology allows Live Search to generate improved captions based on the query. These changes are transparent to end users and let the team compare Powerset captions with Live Search captions to see which performs better.
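The article does not say how the two caption styles were compared. One common approach is to show each style for comparable queries and test whether the difference in click-through rate exceeds what chance alone would produce. The counts below are invented. The 1.96 threshold corresponds to roughly 95% confidence under a normal approximation.

```python
import math

def two_proportion_z(clicks_a, shown_a, clicks_b, shown_b):
    """z-statistic for the difference between two click-through rates."""
    p_a, p_b = clicks_a / shown_a, clicks_b / shown_b
    # Pooled rate under the assumption that both captions perform the same.
    pooled = (clicks_a + clicks_b) / (shown_a + shown_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    return (p_a - p_b) / se

# Hypothetical counts: caption A shown 10,000 times, caption B shown 10,000 times.
z = two_proportion_z(clicks_a=540, shown_a=10_000, clicks_b=480, shown_b=10_000)
if z > 1.96:
    better = "caption A"
elif z < -1.96:
    better = "caption B"
else:
    better = "no clear winner"
```

Although caption A draws 5.4% clicks against 4.8%, z comes out at about 1.93, just short of the threshold. More traffic would be needed before declaring a winner.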

Benefits galore

The term semantic Web typically refers to marking up objects with some kind of special code to identify them, for example, marking a phone number. This helps search engines surface relevant content for users, but it requires the owners of Web content to do some work. Text is often considered ‘unstructured’, but there is a lot of meaning behind words. Semantic search technology tries to unlock that meaning without forcing publishers to do any work. The whole idea is to make search engines think more like humans, not make humans think more like search engines.
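The two approaches in this paragraph can be contrasted directly. With publisher-side markup, the author labels the phone number explicitly, here using `class="tel"` as in the hCard microformat. The reader-side alternative infers the number from plain text, here with a regular expression. The sample strings and number are invented, and the pattern only recognises international numbers written with a leading '+'. Real text-understanding systems go far beyond regular expressions.

```python
import re
from html.parser import HTMLParser

MARKED_UP = '<p>Call us on <span class="tel">+91 22 5555 0100</span>.</p>'
UNSTRUCTURED = "Call us on +91 22 5555 0100, or visit on weekdays 9 to 5."

class TelFinder(HTMLParser):
    """Publisher-side markup: read the value the author explicitly labelled 'tel'."""
    def __init__(self):
        super().__init__()
        self.in_tel = False
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.in_tel = "tel" in classes

    def handle_endtag(self, tag):
        self.in_tel = False

    def handle_data(self, data):
        if self.in_tel:
            self.numbers.append(data.strip())

finder = TelFinder()
finder.feed(MARKED_UP)

# Reader-side inference: guess phone numbers from plain text with a pattern.
PHONE = re.compile(r"\+\d{1,3}(?:[ -]?\d{2,4}){2,4}")
guessed = PHONE.findall(UNSTRUCTURED)
```

Both routes find the same number here. The markup route is only as complete as the publisher's effort, while the inference route is only as reliable as the pattern, which is the trade-off the paragraph describes.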

“Natural language technology is still in its infancy, but we believe now is the right time to start the journey. Combined with our proprietary technology, Powerset has licensed technology from PARC (formerly Xerox PARC), that has been in the labs for about 25 years now. Only recently have computers become powerful enough to do all of the processing necessary for understanding natural language,” said Datanwala.

Prabhu Ram Raghunathan, a research engineer, software developer and roboticist from Carnegie Mellon University, said, “HTML is ‘flat’ or two-dimensional and is only a presentation description language. Search engines that only dig for HTML find it difficult to extract contextual information or to provide deductive answers. They work mainly on heuristics, not exact information. Therefore, there really is no comparison. Older technologies are like 2D and Semantic Web is like 3D and with this extra dimension come extra features for the user, at the cost of extra complexity of the system. However, the Semantic Web has not fully happened yet.”

Security concerns

The semantic Web attempts to bring together services from various platforms to a greater extent than before, so it will also inherit the problems of all those services. For instance, if someone steals a credit card number from a Web site today, all the thief can do is go on a shopping spree. In a Semantic Web system, much more information is linked back to your credit card so that you can access various services. If someone steals that identity, they can take over and ruin your life: no longer confined to a mere shopping spree, they can dip into bank accounts, tamper with the electricity bill, snoop on your family and so on. End users who already suffer from spam e-mails and junk phone calls blasted to thousands of people will find themselves in the cross hairs of targeted, personalized spam on the semantic Web.

Raghunathan added, “When so many services are interoperable, a distributed denial of service attack will no longer be a mere disruption of a Web site or your e-mail service, but could disrupt multiple services in multiple areas in one go. Countering such potential security issues is by the old saying that you keep one step ahead and minimize potential threats.”

Security here focuses on concepts such as trust and trusted sources, digital signatures to verify metadata, and non-repudiation. However, there is as yet no unified view or reference implementation. The biggest challenges are the slow adoption of standards and the conversion of existing databases into machine-readable form, which is why the semantic Web has not yet been fully realized. The other large challenge is acquiring and processing the semantic information itself.
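Verifying metadata, one of the security concepts listed above, can be sketched with a keyed hash. Python's standard library has no public-key signatures, so this example substitutes HMAC-SHA256. It detects tampering by anyone who lacks the shared key. Because both parties hold the same key, however, it does not provide the non-repudiation that a true digital signature (e.g. RSA or DSA) would. The key and record are placeholders.

```python
import hashlib
import hmac
import json

def sign_metadata(metadata, key):
    """HMAC-SHA256 tag over a canonical JSON encoding (sorted keys, no spaces)."""
    payload = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_metadata(metadata, tag, key):
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_metadata(metadata, key), tag)

KEY = b"shared-secret-for-illustration-only"  # placeholder, never hard-code real keys
record = {"subject": "http://example.org/report-2008", "creator": "Example Corp"}
tag = sign_metadata(record, KEY)

# Changing any field invalidates the tag.
tampered = dict(record, creator="Someone Else")
ok_original = verify_metadata(record, tag, KEY)
ok_tampered = verify_metadata(tampered, tag, KEY)
```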

Guruduth Banavar, Director, IBM-India Research Laboratory, said, “People know a lot about how the world works and every document is written in this context. A computer does not understand many basics, and thus has a lot of trouble with documents that take advantage of this context.”

What we can expect

Several semantic Web-type functions are already available. In 2009, we could see progress in multimedia retrieval, smarter cyber-security applications and so on. Raghunathan explained, “Numerous applications have now started counting on the metadata or ‘information on information’. I would not expect the ‘full blown’ Semantic Web to become big next year. However, I would expect a lot of incremental advances. It is a continuum. The average search engine, be it something general like Google or something Web site-specific like Amazon’s A9, will become much smarter.” In the financial domain especially, analytics-driven information retrieval will enable better oversight.

Semantic Web technologies can have a great impact on the intranet. The more an organization knows about its data resources, the more efficient its operations. Moreover, targeted marketing and selling become easier. Semantic Web technology also enables firms to provide services they could not provide earlier, such as intelligent interfaces and natural-language, context-driven searching.

As information on the Web becomes easier for machines to understand, the overall search experience will improve, but the semantic Web has implications well beyond search. However, it is still unclear whether the technology has reached a stage where it will have tangible business impact in the coming years. Some speculate that semantic Web technology will take longer to evolve, and that its full benefits will take longer to realize. Nevertheless, it will certainly deliver better information retrieval results and help in decision-making.

malabika.sarkar@expressindia.com

© Copyright 2001: Indian Express Newspapers (Mumbai) Limited (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in Mumbai by the Business Publications Division (BPD) of the Indian Express Newspapers (Mumbai) Limited. Site managed by BPD.