Science for SEO: "I won't adopt the semantic web!"

I've heard variants of this for quite a long time, in fact ever since the semantic web became mainstream. It's never easy to introduce something new, as designers of novel applications will know: users don't want to learn something new, however easy it is to pick up. The semantic web is suffering from this too. Many webmasters don't want to adopt it, saying it's too much work or that they can't see why it's important.

Well, lo and behold (you know me by now), I found a paper which addresses this issue in a very intelligent way and answers quite a few questions. It's called "What is an analogue for the semantic web and why is having one important?" by mc schraefel from the University of Southampton (which regularly produces cool papers).

First off, an "analogue" in this sense is something that has a direct resemblance to something else. For example, print (books and things) is analogous to the web because a web page is like a page from a book, newspaper, manual and so on. The central topic of this paper is finding something analogous to help people understand the semantic web, so that it's not so foreign and scary.

The author says that the web has been represented by Pages + Links and proposes that the semantic web be represented by Notebook + Memex.

The Memex was proposed by Vannevar Bush in 1945. It's the concept of a personal library in which documents are linked to one another, an interconnected knowledge base.

The author points out that the semantic web offers a very powerful way to interact with information, and to build new interactions for that information. The author also believes, as I do, that the whole problem with the semantic web's acceptance comes down to the research community not communicating it properly.

"It is important to note that the motivation for this question of analogue is not a marketing/packaging question to help sell the Semantic Web, but is simply a matter of fundamental importance in any research space: it is critical to have both a shared and sharable understanding of a (potentially new) paradigm. If we do not have such a shared understanding, we cannot interrogate the paradigm for either its technical or, perhaps especially, its social goals."

The author notes that all the web 2.0 stuff has been based on highly familiar models: RSS, for example, is still text, and tags and tag clouds are still displayed just like a catalogue. The understanding of the web when it was 5 years old is very different to the understanding of the semantic web, which has just turned 5. The author also very nicely points out that the Wikipedia entry for the semantic web is a bit rubbish: "All that description tells anyone about the semantic Web is that it is for Machines." As far as semantic web researchers are concerned, the emphasis is on the end user. It is all about creating powerful links in information so that question-answering and a whole host of other knowledge discovery methods become possible.

"But how do we describe this potential? For a community steeped in rich link models, Hypertext is an obvious conceptualization. But beyond this community, Hypertext equals “a page with links” – it equals the current Web, not the rich possibility of what we might call Real Hypertext, which was modeled in Note Cards and Microcosm."

The hyperlink in the semantic web can be thought of as "meaning", which is one of the hard-to-grasp concepts. "That is, the way meaning is communicated that is not via the explicit prose page or catalogue page, but is via the exposure of the ways in which data is associated, and can be discovered, by direct semantic association, for the reader/interactor/explorer to make meaning."

The author says that the notepad is a good way of introducing the semantic web because it is a page but is unstructured. Calendars and things can be shared, and things like Twitter allow users to post snippets. "One item can act as a way of redefining another". Google Docs and various snippet keepers on the web (like the now defunct Google Notebook) mean that data is generated, shared and linked in too. The Memex is good too because it is designed to retrieve data and not "denature it", by which we mean that the data isn't taken out of context.

Semantic web languages like RDF mean that everything can be connected and retrieved intelligently and effectively. This is "automatic structure extraction", which means that you can look at all of your information in the context it was created or saved in. The data can also be associated with other relevant information. For example, you could find other people talking about the same topic or working on the same kind of project, or events you might be interested in.
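
To make the "everything can be connected" idea a bit more concrete, here is a minimal sketch in plain Python. It is not real RDF tooling and it is not from the paper: it only borrows the subject/predicate/object shape of an RDF statement, and every identifier in it (`note:1`, `mentions`, `createdBy` and so on) is made up for illustration.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple,
# which is the basic shape of an RDF statement.
# All identifiers below are hypothetical, invented for this example.
triples = {
    ("note:1", "mentions", "topic:semantic-web"),
    ("note:1", "createdBy", "person:alice"),
    ("post:42", "mentions", "topic:semantic-web"),
    ("post:42", "createdBy", "person:bob"),
    ("event:7", "about", "topic:semantic-web"),
    ("note:2", "mentions", "topic:gardening"),
    ("note:2", "createdBy", "person:alice"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def others_on_same_topics(person):
    """Find other people whose items mention a topic this person's items mention."""
    my_items = {s for s, _, _ in match(p="createdBy", o=person)}
    my_topics = {o for s, _, o in match(p="mentions") if s in my_items}
    found = set()
    for topic in my_topics:
        for item, _, _ in match(p="mentions", o=topic):
            for _, _, author in match(s=item, p="createdBy"):
                if author != person:
                    found.add(author)
    return sorted(found)
```

Calling `others_on_same_topics("person:alice")` follows the links from Alice's notes to their topics and back out to whoever else wrote about them, which is the "find other people talking about the same topic" case. Nobody tagged Alice and Bob as related; the association falls out of the data.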

The author observes that the notepad model has limitations, such as its structure, and that "viewing page 6 next to page 36" isn't easy. The suggested alternative is the note card model: a stack of cards with ideas, people and data written on them, interlinked with external data. "The relevance of the note card model to the concept of the Semantic Web as personal work space with associated public data is in the integration of personal ideas with external sources: the idea cards are backed up with/informed by the quotations from external sources."
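
As a rough illustration of the note card idea (my own sketch, not anything from the paper), a card can be modelled as a personal idea plus the external quotations that back it up, with links to other cards. The `Card` class and the helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """One note card: a personal idea, backed up by quotes from external sources."""
    title: str
    idea: str
    quotes: list = field(default_factory=list)  # (source, text) pairs
    links: set = field(default_factory=set)     # titles of related cards


def link(a, b):
    """Link two cards in both directions."""
    a.links.add(b.title)
    b.links.add(a.title)


def side_by_side(stack, first, second):
    """Pull any two cards out of the stack, wherever they sit in it."""
    by_title = {card.title: card for card in stack}
    return by_title[first], by_title[second]
```

Unlike pages in a notepad, cards have no fixed order, so putting "page 6 next to page 36" is just a lookup by title.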

The author concludes by asking whether we are ready for a system to support such creativity, one where computers are no longer used simply for productivity. The author believes that we are, and so do I. More importantly, I believe that the semantic web is a foundation for far more intelligent systems. We need to experiment with this model and develop it properly in order to advance.

Obviously this kind of system can't be built entirely by hand and needs to be automated, which to a large extent it already is. There are all sorts of programs out there that will create the semantic data for you.

Why should you care?

If you don't embrace the semantic web, it is likely that you will be left behind, because it is a necessary step in the evolution of the web. It will become widespread and it is important to be prepped for it. It should have your support because it is an ingenious and very powerful way of dealing with the ever-growing mass of data available.

4 comments:

Anonymous said...: I also think a factor is the lack of tutorials on how to implement the semantic web correctly.

There are copious amounts of information to go through, but very few resources for the new kids to take advantage of.

but completely agree with all other points noted.; 29 January 2009 at 09:11
CJ said...: Hey Ben,

actually I'm preparing a tutorial on how to prep your site for the semantic web - should be available soon!

cj; 29 January 2009 at 09:16
Anonymous said...: It's obviously not automated enough yet if it requires a tutorial! :)

I think this is the biggest obstacle to wider adoption: if it cannot be completely automated (and cost-effectively, for the big service providers to adopt it), then it has to become a natural part of content creation, something I've just not really seen yet.

Despite that I look forward to reading the tutorial and trying some of it out when you publish :); 29 January 2009 at 12:44
CJ said...: Hey Chris,

to be fair, webmasters had/have tutorials for HTML, PHP, CSS and all sorts of things like that which make the current web possible. Web 3.0 requires new techniques too, and this is no different from the web 1.0 stuff and even the web 2.0 stuff that they had to embrace, accept and learn.

Of course automation is necessary, mainly because, as del.icio.us, blogs and such things showed us, users are a bit useless at tagging things up! Their tags are unusable. Automated tagging (which is just a part of it) needs to happen and is of course on its way.

I'm launching a new blog today or tomorrow and it will eventually be semanticised as well as is possible for now, as a bit of an example of what you can do. The tutorial just shows how to do some of that and shows you all the tools that are available right now.

The semantic web isn't finished yet, but then neither was any of the web 1.0 stuff for a long time. The semantic web, as you already know, is just an extension of web 3.0. To be part of web 3.0 it's necessary to implement those extensions.

Your comment is very valid and exactly what Nicholas Belkin and others have said. The researchers in this area have not worked hard enough.

cj; 29 January 2009 at 23:39