Semantic Web - Towards a Science of the World Wide Web

I attended two talks on the semantic web by Professor James Hendler a few years ago. Lecture slides are available on the UoE distinguished lectures webpage. Here are my notes.

The important information for the general public: the corporate Semantic Web (making good money from data integration and linkable-concept technologies) is starting to flourish.

For educators like you: once you have the slides, check out the end of the talk, which gives an example about a scout troop and how to grant access to different pictures based on simple, expert-system-style rules. If the person who requests to view a photo is a member of the troop, then grant access. If the person is not part of the troop, then grant access only if the parents have authorized the photos to be shown to the public, etc.
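The scout troop rules above can be sketched in a few lines of Python. This is only an illustration of the two rules as I noted them, not the policy language used in the talk; the `Photo` class, the `parents_allow_public` flag and the `can_view` function are names I made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    title: str
    # Hypothetical flag: have the parents authorized public display?
    parents_allow_public: bool


def can_view(requester, troop_members, photo):
    """Return True if the requester may see the photo."""
    # Rule 1: members of the troop are always granted access.
    if requester in troop_members:
        return True
    # Rule 2: anyone else only if the parents authorized public display.
    return photo.parents_allow_public


troop = {"alice", "bob"}
camp = Photo("Summer camp", parents_allow_public=False)
parade = Photo("Town parade", parents_allow_public=True)

print(can_view("alice", troop, camp))    # member: True
print(can_view("carol", troop, camp))    # outsider, not authorized: False
print(can_view("carol", troop, parade))  # outsider, authorized: True
```

A real policy-aware system would express such rules declaratively rather than hard-coding them, but the decision logic is the same.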

Part 1. Towards a Science of the World Wide Web

Abstract provided by the author:

Computer Science research in the area of the World Wide Web has largely focused on improved search for individual web pages or on the modeling of Web connectivity (using the tools of networking). However, given the huge impact of the Web on our world, this seems to be an impoverished view. What are the principles of engineering that have made the Web flourish? How can we engineer new technologies, that will extend the capabilities of the Web? What are the social impacts of Web use, and how can Web technologies both allow greater freedoms while preserving the ones we have?

In this talk, I will use some examples from Semantic Web and "policy aware" information access to demonstrate new Web technologies and how we might explore some of the trade-offs between making it easier to integrate information on the Web with protecting that information from abuse. I will explore some of the emerging trends on the Web including social networking, blogging, and beyond page search, and discuss some of the research and technology challenges that they pose to continued Web growth and access, and some new technologies being explored to address these challenges.

Part 2. Dark Side of the semantic web

The "dark side" in the title doesn't refer to a negative force but rather to the hidden face of the moon: when people talk about the semantic web, the concept that predominantly comes across is "semantic", while the "web" part is just as important.

A notion Prof. Hendler returned to often during his talks, using different allegories, is that power comes from linking together a variety of pre-existing islands of knowledge. This is really what the semantic web is about: data integration and concept linking. To quote Prof. Hendler: "Linking is power". Resources gain power by exploiting links to web resources, to data resources, to each other, to Web 2.0-like annotations on web resources, or to annotations in official resources.

An interesting characteristic of semantic web technologies is that, as in many other domains, a high end and a low end coexist (high end = big corporations with huge amounts of data to manipulate and huge budgets to buy or fund the design of specialized tools; low end = hackers, hobbyists, small or even one-person teams). However, here it seems that opportunities for commercial success (making money) are more apparent at the low end than at the high end. What we mostly see is commercialisation coming from Ajax, 3-tier web architecture, etc. Commercial success doesn't come so much from the ability to design a tool that implements complex processes to manipulate huge amounts of data. Rather, it comes from using a bit of help from knowledge to make an application more intelligent and more interesting to the end user.

Then followed a few slides with a number of acronyms:

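  • RDF (Resource Description Framework: describes resources as subject-predicate-object triples), OWL (Web Ontology Language: defines classes, properties and the relations between them), SPARQL (query language for RDF)
  • RIF: Rule Interchange Format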
  • (Open)Cyc : world's largest and most complete general knowledge base and commonsense reasoning engine.
  • SKOS (Simple Knowledge Organization System)
  • LSID (Life Sciences Identifier)
  • MMORPG (massively multiplayer online role-playing game)

Most of these appear in some form in the layer cake diagram that Berners-Lee often shows in his presentations. The commonly seen version is rather old; the speaker showed an expanded version where, in place of "ontology vocabulary", you had SPARQL, OWL and RDF.

[Figure: Semantic Web layer cake diagram]

Basically, that diagram is supposed to capture all the different layers that the semantic web is made of.

An excellent point Prof. Hendler made was that most of these technologies are fairly complex and require a significant time investment to master and to understand what can and cannot be done with them. Even though a good collection of tools is starting to exist, access to these technologies is still not a piece of cake. To quote him: "If you believe in a vision, then you should make it easy for developers to follow that vision."

Ontologies can be used to capture any knowledge expressed in a semi-structured way. An interesting new trend is therefore to propose subsets of these RDF/OWL technologies. Initially, it was thought that people would ask for more expressivity in these markup languages. The trend now is rather to ask for more specific, limited expressivity: use only the little bit of expressivity that helps you a lot, and leave aside the more extended expressivity that you only need very occasionally.

He mentioned various initiatives in that direction:

  • GRDDL (Gleaning Resource Descriptions from Dialects of Languages): embedding RDF into traditional content (like microformats, I assume). Apparently a way of extracting RDF from XML docs (using XSLT). There is some W3C info on this.
  • OWL mini (or OWL Fast, or OWL Prime, or OWLET, or whatever it might end up being called). That's kind of new and you don't really find much more than mailing list posts on the topic.
  • RDF++. Again, something that appears very much under development. Some blog posts and internal presentations

A few more points:

1. Changes in Data Structure

According to James Hendler, the data sources that application code queries have started to shift from relational databases to RDF triple stores. In short, "RDFStore implements a generic hashed data storage that allows to serialise RDF models, resources, properties and property values either to disk or in-memory data structures."
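To give a feel for what a triple store holds, here is a minimal in-memory sketch in Python. It is not how RDFStore or any real triple store is implemented; the `TripleStore` class, its `match` method and the `ex:` identifiers are all made up for illustration. The idea is simply that every fact is a (subject, predicate, object) triple and a query is a pattern with wildcards.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
# Hypothetical data: no table schema is declared up front, and
# different subjects are free to carry different properties.
store.add("ex:alice", "ex:memberOf", "ex:troop42")
store.add("ex:bob", "ex:memberOf", "ex:troop42")
store.add("ex:photo1", "ex:depicts", "ex:alice")
store.add("ex:photo1", "ex:publicAllowed", "false")

# Who is a member of troop 42?
for s, p, o in store.match(predicate="ex:memberOf", obj="ex:troop42"):
    print(s)

# Everything known about photo1.
print(store.match(subject="ex:photo1"))
```

Unlike a relational table, adding a new kind of fact needs no schema change: you just add another triple.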

2. What's so special about RDF/OWL?

A. To start with, they don't require an a priori, fully structured representation of the knowledge, as a relational database does. They are designed to cope with semi-structured knowledge.

B. RDF and OWL are designed to live in an open and distributed environment.

C. OWL, for instance, is grounded in URIs (Uniform Resource Identifiers). This means that anywhere within your knowledge tree, you can put a URI as the content of a node. Rather than give a definition of a term or clone the definition from elsewhere, as you would be forced to do in a relational database, you can simply refer to knowledge that is held elsewhere (and possibly kept up to date on that other site). The big distinction to make is then between a URL, which corresponds to the usual web page model and can link to a page like that of the Guardian (a newspaper in the UK), where the information is liable to change every day, and a URI used in a knowledge tree. In the second context, the URI doesn't let you retrieve information content per se, but information about where that content is to be found. An important implication is of course that the URI for this second type of resource (a pointer to the information content) has to remain persistent for linking to it to be useful.

D. Ontologies allow for web-like relationships between data, which is not easily done in a typical relational database. This corresponds to a point he made three times during the talk, the general idea being that links on the web have some parallel with links in the real world. He made a connection with this: Connected Services Framework 3.0 Developers Guide.
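Points C and D can be illustrated with a small Python sketch. Everything in it is an assumption for the sake of the example: the `http://example.org/` URIs, the `vocab/...` property names and the `describe` and `follow` helpers are invented, and nothing is fetched over the network. It only shows that when two independent sources use the same URI for a concept, integrating them is a plain set union, after which links can be followed across sources.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Triples published by one site: an article that points at a concept by URI
# instead of copying the concept's definition.
site_a = {
    (EX + "a/article1", EX + "vocab/about", EX + "b/concept/SemanticWeb"),
}

# Triples published by another site, which describes the concept itself.
site_b = {
    (EX + "b/concept/SemanticWeb", EX + "vocab/label", "Semantic Web"),
    (EX + "b/concept/SemanticWeb", EX + "vocab/relatedTo", EX + "b/concept/RDF"),
}

# Because both sides use the same URI, integration is just a set union.
merged = site_a | site_b


def describe(graph, subject):
    """Collect every (predicate, object) pair known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


def follow(graph, subject, predicate):
    """Describe each resource that `subject` points to via `predicate`."""
    return {
        o: describe(graph, o)
        for s, p, o in graph
        if s == subject and p == predicate
    }


linked = follow(merged, EX + "a/article1", EX + "vocab/about")
for uri, facts in linked.items():
    print(uri)
    for p, o in facts:
        print("   ", p, "->", o)
```

Note that this only works as long as the concept's URI stays stable, which is exactly the persistence requirement mentioned in point C.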

3. Information being pulled and pushed

The current model of the web is one where information is pulled. Though Ajax can make the pull happen a lot quicker and give an illusion of push technology, it still remains a pull model. The contribution of software agents would be to act as pushing agents (probably of more specific interest for mobile web technologies).
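The difference between the two models can be sketched in Python as follows. This is a toy, single-process illustration and not how Ajax or any agent framework works; the `Feed` class and its `subscribe`, `publish` and `fetch_since` methods are hypothetical names. In the pull case the client has to ask for what is new; in the push case the source calls the client as soon as something happens.

```python
class Feed:
    """Toy information source supporting both pull and push."""

    def __init__(self):
        self.items = []
        self.subscribers = []

    def subscribe(self, callback):
        # Push model: the client registers once and is then notified.
        self.subscribers.append(callback)

    def publish(self, item):
        self.items.append(item)
        for callback in self.subscribers:
            callback(item)

    def fetch_since(self, index):
        # Pull model: the client asks for everything after a position.
        return self.items[index:]


feed = Feed()

# Push: items arrive in this list without the client asking again.
pushed = []
feed.subscribe(pushed.append)

feed.publish("new photo")
feed.publish("new comment")

# Pull: nothing reaches the client until it polls.
seen = 0
pulled = feed.fetch_since(seen)
seen += len(pulled)

print(pushed)
print(pulled)
```

Polling more often makes pull feel like push, which is the illusion mentioned above, but the initiative still lies with the client.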

Further Links

  • Swoogle - Semantic Web Search. Swoogle is a crawler-based indexing and retrieval system for the Semantic Web. That is, it indexes RDF and OWL documents, rather than plain HTML documents.
  • ebiquity - Building intelligent systems in open, heterogeneous, dynamic, distributed environments. Check out the blog, in particular.
  • Ontology Resources on the widged wiki

If you are interested in these issues, you may also like last year's presentation by Tim Berners-Lee: you get everything, slides plus video of the talk.

Is there any relatively specific problem that you hope to solve with semantic web (data integration and concept linking) types of technologies?
