Semantic Web search
The advent of the Semantic Web
The Semantic Web is a search technology that uses sentence logic,
conducts semantic analysis and, upon understanding the context of keywords, gives
appropriate answers.
By Malabika Sarkar
As the amount of information in enterprise databases and online data stores expands
exponentially every year, enterprises face the problem of sifting through it
all and sharing it among disparate systems and end users. The problem is that
as the amount of information and the number of systems increase, traditional index
search methods become largely ineffective. Enter semantic Web search technology,
a non-proprietary way of categorizing and connecting data with contextual information
to make it easier to organize and search.
Most search engines, particularly Google, identify the relevance of a particular
topic using the interconnections between sites as much as they do the text on
any single page. The semantic Web promises to change this because it helps capture
the meaning of data on a page and so gives machines classifying or searching
the Web the capability to work out the relevance of a page's contents to
a particular topic.
A semantic search looks at sentence logic (how words in a sentence relate to
one another) and conducts semantic analysis (it attempts to understand the context
of keywords) to produce appropriate results.
The vision of the semantic search is to enhance and evolve today's Web
by allowing machines to perform tasks through the exploitation of better metadata.
With semantic technology, adding, changing and implementing new relationships
or interconnecting programs in a different way can be just as simple as changing
the external model that these programs share.
Ravi Datanwala, Group Manager, Live Search and Acting Windows Live Lead, Consumer
& Online International, Microsoft Corporation India, said, "The application
of Powerset technology (Semantic Web technology) to Live Search will enable
Live Search to more quickly surface the most relevant information for our customers,
to help them complete their desired tasks faster."
He continued, "At least a third of queries still go unanswered, so we continue
to believe there is lots of room for improvement here, especially with complex
queries. Natural language search will enable us to extract the user's query
intent or meaning when returning Web pages or documents following a search,
ultimately delivering great improvements in relevance for the user."
Semantic technologies are meaning-centric. They include tools for auto-recognition
of topics and concepts, information and meaning extraction, and categorization.
Given a question, semantic technologies can directly search topics, concepts and
associations that span a vast number of sources. First, this provides an abstraction
layer above existing IT technologies that helps bridge and interconnect data,
content, and processes. Secondly, from the portal perspective, you can treat
semantic technologies as a new level of depth that provides far more intelligent,
capable, relevant, and responsive interaction than conventional methods.
Developers have to implement various tools or components in a Semantic Web system
to get results faster. The first part consists of a set of standards
(e.g. RDF, XML) that describe information and resources in a uniform way, so that
data resources worldwide can talk to each other about their data. The second part
includes the various technologies and software that developers must implement according
to these standards. This covers components for processing existing data into something like
RDF, security tools, presentation tools, a query language, a taxonomy, a schema
for rule representation (rules represent the relationships between different
objects), and a deductive logic or data mining system.
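To make the components listed above more concrete, here is a minimal sketch in plain Python of three of them: data described as RDF-style (subject, predicate, object) triples, a tiny pattern-matching query, and one deductive rule. It is an illustration only, not how any real RDF toolkit is implemented. The `ex:` names, the flight facts, and the functions `match` and `infer_types` are all hypothetical and do not come from the article.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts written as (subject, predicate, object) triples.
# The "ex:" identifiers are invented for this sketch.
TRIPLES: Set[Triple] = {
    ("ex:Flight101", "rdf:type", "ex:Flight"),
    ("ex:Flight101", "ex:departsFrom", "ex:Mumbai"),
    ("ex:Flight101", "ex:arrivesAt", "ex:Delhi"),
    ("ex:Flight202", "rdf:type", "ex:Flight"),
    ("ex:Flight202", "ex:departsFrom", "ex:Mumbai"),
    ("ex:Flight202", "ex:arrivesAt", "ex:Chennai"),
    ("ex:Flight", "rdfs:subClassOf", "ex:TransportService"),
}


def match(triples: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return, in sorted order, the triples that fit a pattern.

    A value of None acts as a wildcard for that position.
    """
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def infer_types(triples: Set[Triple]) -> Set[Triple]:
    """Apply one rule repeatedly until no new triples appear.

    Rule: if X has type C and C is a subclass of D, then X has type D.
    """
    closed = set(triples)
    while True:
        new = {
            (x, "rdf:type", d)
            for (x, p1, c) in closed if p1 == "rdf:type"
            for (c2, p2, d) in closed
            if p2 == "rdfs:subClassOf" and c2 == c
        }
        if new <= closed:
            return closed
        closed |= new


# Query: which resources depart from Mumbai?
for subject, _, _ in match(TRIPLES, p="ex:departsFrom", o="ex:Mumbai"):
    print(subject)
```

Because every fact has the same three-part shape, a new relationship can be added by inserting a triple rather than changing the program, which is the kind of shared external model the article describes. A production system would more likely use an RDF library and a standard query language than hand-written matching.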
How semantic technology works
The single largest benefit of the Semantic Web is that the
user has to do a lot less work to get the information or service that they
want. Today, finding specific information, for research for example, is a time-consuming
task that requires some amount of skill. Sometimes you have to look through
hundreds of pages to find what you want. The information may not be filed the way
you expect, because the search engine has docketed or indexed it differently
in its ontology (e.g. Google or Yahoo Directory). To include all such possible
cases and still provide meaningful results, a search engine such as Google has
to have an extremely large data set. Google uses thousands and thousands of
servers because all this extra information is necessary to guide a user's
search. If content is machine-readable, machines can communicate directly,
and Web content includes semantic information that they can
exchange, you can do away with ambiguity and inefficiency in information
retrieval. HTML is merely a presentation format. Semantic Web
pages additionally contain information about the information
in the pages.
Additionally, the Semantic Web will make available many services that are not
yet easily available, and make them interoperable and more user-friendly, by allowing
queries in natural language such as English and higher-level logical queries
(such as finding a suitable air fare). For example, which is the best flight
or the best hotel for a given trip? The benefits are endless, but
the main thing is that the technology minimizes the user's burden and makes the
user experience more pleasant.
Jawahar Malhotra, CTO, Yahoo! India R&D, said, "The intent is
to enhance the usefulness of the Web through servers which expose existing data
through standards such as XML, RDF and OWL. Documents will now be additionally
marked up with automatically generated semantic information about the content
of the document in a way that machines can understand this content. Automated
agents will now use this additional information to perform tasks for users."
Similarly, Powerset technology strives to understand the intent in the customer's
search query and relate that intent (what they are trying to find) to relevant
information in Web pages and documents. Unlike traditional search engines,
which just look at words, Powerset reads and extracts meaning from every sentence
in its Wikipedia index. "When you type a query into Powerset, we try to match
the meaning of the query to the meaning of a sentence in Wikipedia, instead of just
returning Web pages," said Datanwala.
Deeper and more meaningful searches
The benefits of a comprehensive semantic Web are endless: interoperability, discovery
of new services, intelligent querying in natural language, increased security
and the capacity to provide new services. Natural language search will help
to extract the user's query intent or meaning when returning Web pages
or documents following a search, ultimately delivering great improvements in
relevance. Semantic technology simplifies customer tasks on Live Search, including
the opinion index and the recent integration of Powerset technology to power Freebase
answers and improved captions for Wikipedia results.
Datanwala added, "With our opinion index, a customer doing a product search
with Live Search is presented with a summary of the sentiment on the Web about
the desired product, which can save consumers both the time and hassle of combing
through multiple Web sites for the information. About Freebase answers,
search queries like San Francisco weather, MSFT, and Banff national park already
produce answers. However, many typical queries, such as musicians, albums and films,
do not show answers today. To simplify this we selected some
of these categories to return a topic summary with links next to results. Additionally,
as Wikipedia articles show up in a large percentage of Live Search queries,
it is important that the captions are top notch."
Powerset technology allows Live Search to generate improved captions based on
the query. These changes are transparent to the end user and allow
the Powerset captions to be compared with the Live Search captions to see which perform
better.
Benefits galore
The term semantic Web typically refers to marking up objects with some kind
of special code to identify them, for example, marking a phone number. This
will help search engines to surface relevant content for users, but it requires
the owners of Web content to do some work. Text is often considered unstructured,
but there is a lot of meaning behind words. Semantic search tries to unlock that
meaning without forcing publishers to do any work. The whole idea is to
make the search engine think more like a human, not make humans think more like
a search engine.
"Natural language technology is still in its infancy, but we believe now
is the right time to start the journey. Combined with our proprietary technology,
Powerset has licensed technology from PARC (formerly Xerox PARC) that has been
in the labs for about 25 years now. Only recently have computers become powerful
enough to do all of the processing necessary for understanding natural language,"
expressed Datanwala.
Prabhu Ram Raghunathan, a research engineer, software developer and roboticist
from Carnegie Mellon University, said, "HTML is flat or two-dimensional
and is only a presentation description language. Search engines that only dig
through HTML find it difficult to extract contextual information or to provide deductive
answers. They work mainly on heuristics, not exact information. Therefore,
there really is no comparison. Older technologies are like 2D and the Semantic Web
is like 3D, and with this extra dimension come extra features for the user, at
the cost of extra complexity of the system. However, the Semantic Web has not
fully happened yet."
Security concerns
The semantic Web attempts to bring together services from various platforms,
more than before. Therefore, it will inherit the problems of all such services. For
instance, if someone steals a credit card number from some Web site today, all they
can do is go on a shopping spree. In a Semantic Web system, there is much more
information linked back to your credit card, so that you can access various
services. If someone then steals your ID, they can totally take over and ruin
your life. It is no longer confined to a mere shopping spree; they can dip into
bank accounts, mess with the electricity bill, snoop on your family and
so on. End users who already suffer from spam e-mails and junk phone calls blasted to
thousands of people will find themselves in the cross hairs of targeted and
personalized spam with the semantic Web.
Speaking further on this, Raghunathan averred, "When so many services are
interoperable, a distributed denial of service attack will no longer be
a mere disruption of a Web site or your e-mail service, but could disrupt multiple
services in multiple areas in one go." Countering such potential security issues
follows the old saying: keep one step ahead and minimize potential threats.
Security here focuses on the concepts of trust and trusted sources, digital
signatures to verify the metadata, non-repudiation and so on. As yet, however, there
is no unified view or reference implementation. The biggest challenges are the
slow adoption of standards and the conversion of existing databases into machine-readable
form. This is why the semantic Web has not been fully explored. The other large challenge
is in acquiring and processing semantic information itself.
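As a rough illustration of what verifying metadata can look like, the sketch below attaches an integrity tag to a small metadata record and checks it later. It is only a stand-in: it uses an HMAC from the Python standard library, which relies on a shared secret key and therefore does not give the non-repudiation mentioned above. A real deployment would more likely use public-key digital signatures. The field names, the sample record and the functions `sign_metadata` and `verify_metadata` are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(metadata: dict) -> bytes:
    """Serialize metadata deterministically so equal records give equal bytes."""
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_metadata(metadata: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the canonical form of the metadata."""
    return hmac.new(key, canonical_bytes(metadata), hashlib.sha256).hexdigest()


def verify_metadata(metadata: dict, tag: str, key: bytes) -> bool:
    """Check the tag using a constant-time comparison."""
    expected = sign_metadata(metadata, key)
    return hmac.compare_digest(expected, tag)


# Hypothetical shared key and metadata record, for illustration only.
demo_key = b"demo-key-not-for-production"
record = {"subject": "ex:Flight101", "price": "4500 INR"}
record_tag = sign_metadata(record, demo_key)

print(verify_metadata(record, record_tag, demo_key))
print(verify_metadata(dict(record, price="1 INR"), record_tag, demo_key))
```

Serializing with sorted keys means two records with the same fields in a different order produce the same tag, while any change to a value makes verification fail.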
Guruduth Banavar, Director, IBM-India Research Laboratory, said, "People
know a lot about how the world works and every document is written in this context.
A computer does not understand many basics, and thus has a lot of trouble
with documents that take advantage of this context."
What can we expect?
Several semantic Web-type functions are already available. In 2009, we could
see progress in multimedia retrieval, cyber-security applications becoming
smarter and so on. Raghunathan explained, "Numerous applications have now
started counting on the metadata or information on information.
I would not expect the full-blown Semantic Web to become big next
year. However, I would expect a lot of incremental advances. It is a continuum.
The average search engine, be it something general like Google or something
Web site-specific like Amazon's A9, will become much smarter. Especially
in the financial domain, analytics-driven information retrieval will enable
better oversight."
Semantic Web technologies on the intranet can have a great impact. The more
an organization knows about its data resources, the more efficient its operations
are. Moreover, targeted marketing and selling become easier. Semantic Web
technology also enables firms to provide services they could never provide earlier,
such as intelligent interfacing, natural language querying and context-driven
searching.
As the information on the Web becomes more easily understood by machines, it will improve
the overall search experience, but the semantic Web has implications well beyond
search. However, it is still unclear whether this technology is at a stage
where it will have tangible business impact in the years to come. There is
speculation that semantic Web technology will take longer to evolve and for
us to realize its full benefit. Nevertheless, it is expected to
deliver better information retrieval results and help in decision-making.
malabika.sarkar@expressindia.com