Semantic Conference in March in San Francisco 


It does appear that the momentum is picking up. The brochure says the market for "semantic technologies" will be $63 billion by 2010. I have no idea what that means; I wish they could tell me where exactly the market is so that I can cash out.

"Semantics" has always been hot, so there isn't much new there. There will certainly be some progress in both explicit representation of semantics (using RDF/OWL/Rules) and derivation of semantics (from raw data) over the next decade, but there won't be miracles. VCs, use caution and tread carefully!

So, is it 2005? or 2050? 

Year 2004 generated considerable buzz/hype about the Semantic Web, but not much changed really. So, when is the liftoff? 2005 or 2050? Or is it going to be a slow toiling towards much more modest goals?

Well, maybe the web is already "semantic" enough? What do we call what Google does? Or possibly futuristic web search engines like clusty.com? Except that they don't work on explicit semantic representations, which is their strength.

I don't expect much exciting news from the Semantic Web world in 2005 either, but I would be ecstatic to welcome any.

Semantic Web: The Next Wave? (contd) 

[previous post] I came across an interesting debate on the Semantic Web, sparked by Clay Shirky's article, The Semantic Web, Syllogism, and Worldview, which generated many responses (Bb, Ayers, Bray, Ford). I like the last one, from Paul Ford, not only because he writes well, but also because he has done some very interesting things with Semantic Web ideas.

Paul has several interesting articles that are worth reading on his web site, especially the 'Google takes it all' article. He designed and built the Harper's Magazine website using Semantic Web technologies with "3,000 facts, 6,000 events, 12,000 links, 500 topics, and over 939 separate HTML pages. 300,000 words." That's the first of its kind I have seen so far, and if nothing else, it gives us a glimpse of what is possible.

Shirky makes some important points, but misses many. I don't think the majority of the Semantic Web community believes in a single worldview (a global ontology) as he suggests, and it is not necessarily a requirement for the success of the Semantic Web. As I opined earlier, many interesting applications will evolve in specific contexts (mini / micro worldviews) rather than in the global context (one worldview). (more on the subject)

Topic Maps - What are they good for? 

Topic maps were originally designed for back-of-the-book indexes and have been extended to encompass other kinds of finding aids such as glossaries, thesauri and cross references. But they are too general to be limited to their initial intended purposes. They can be used to encode arbitrarily complex knowledge structures and link them to information assets, which brings up the debate: which of the two, topic maps or RDF/OWL, is better for a given task?

There are many informative articles (1, 2, 3) that discuss the differences between these two standards and how they can interoperate or even integrate. Most of the demonstrated applications of topic maps fall in the back-of-the-book indexing world of information navigation (e.g., IRS Tax Map), and topic maps are a more natural choice for such applications. I wonder if one can build a good enough ontology in OWL to provide similar semantics for IRS Tax Map-style applications - I haven't tried. Topic maps have no formal theory and don't guarantee computational completeness and decidability, which means one can easily shoot oneself in the foot by overusing topic maps for general knowledge representation.
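To make the topic map idea above a bit more concrete, here is a minimal sketch in Python of topics, associations between topics, and occurrences (pointers to information assets). It is only an illustration of the general shape, not the Topic Maps standard or any real tool's API: the class names, the "reported-on" relation, and the file paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    # Occurrences: pointers to information assets (hypothetical paths here).
    occurrences: list = field(default_factory=list)


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    # Each association is a (topic_a, relation, topic_b) tuple of topic names.
    associations: list = field(default_factory=list)

    def add_topic(self, name, *occurrences):
        """Create the topic if needed and attach any occurrences to it."""
        topic = self.topics.setdefault(name, Topic(name))
        topic.occurrences.extend(occurrences)
        return topic

    def associate(self, a, relation, b):
        """Link two topics, creating them if they do not exist yet."""
        self.add_topic(a)
        self.add_topic(b)
        self.associations.append((a, relation, b))

    def related(self, name):
        """Return (relation, other_topic) pairs for a topic, like 'see also'."""
        found = []
        for a, relation, b in self.associations:
            if a == name:
                found.append((relation, b))
            elif b == name:
                found.append((relation, a))
        return found


# A tiny, made-up tax-navigation example.
tax_map = TopicMap()
tax_map.add_topic("Capital gains", "publication-a.html#gains")
tax_map.add_topic("Form X", "forms/form-x.pdf")
tax_map.associate("Capital gains", "reported-on", "Form X")
```

Looking up tax_map.related("Capital gains") walks the associations the way a reader follows a "see also" entry in an index. Nothing in this structure constrains what an association means, which is exactly the freedom (and the risk) discussed above.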

I certainly wish there were only one set of standards instead of many, especially if the same purposes can be achieved with a minimal set of standards. Oh well, there is no ideal world.

Mine and Navigate Unstructured Information  

I think the stage is set for technologies that help navigate and manage unstructured information in corporations. Many vendors now sell such products, including Autonomy, ClearForest, InXight, Stratify, SAS, Entrieva, Verity, and Vivisimo. Typical capabilities offered by these products include automatic classification, summarization, taxonomy generation, clustering and concept-based information retrieval. Eventually I think the core technologies themselves will get commoditized (even free and open source), though there will always be premier products.

The real challenge will be to roll out solutions based on these technologies, often combined with other systems such as business process management and collaboration systems, that address specific problems. For example, large engineering organizations can insert these capabilities into their PLM systems and benefit significantly. Or PLM vendors can integrate them into their products. Wouldn't hiring managers be happy to see a neat classification of resumes, preferably ranked against job openings, instead of having to go through and classify them manually? Vivisimo already does a decent job of categorizing web search results obtained from multiple search engines - not good enough to be my default search engine, but I do find it useful often.
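As a rough illustration of the resume example, here is a minimal sketch in Python that ranks resumes against a job opening by bag-of-words cosine similarity. This is an assumed stand-in technique for the sake of the example, not how any of the products named above work, and the sample resumes, names, and job text are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resumes(job_text, resumes):
    """Return (name, score) pairs sorted from best to worst match."""
    job_vec = Counter(tokenize(job_text))
    scored = [
        (name, cosine(job_vec, Counter(tokenize(text))))
        for name, text in resumes.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data, for illustration only.
resumes = {
    "alice": "mechanical engineer with PLM and CAD experience",
    "bob": "pastry chef and restaurant manager",
    "carol": "software engineer, search and text classification",
}
job = "engineer for PLM text classification"
ranking = rank_resumes(job, resumes)
```

Here carol ranks first (three words shared with the opening), alice second, and bob last with a score of zero. A real product would add stemming, term weighting and a trained classifier, but the rollout problem stays the same: wiring this kind of scoring into the hiring workflow.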

The next level of technologies, using metadata, topic maps, ontologies and such, will take a bit longer, considering the need for better planning and a higher level of effort. Tools should help us adopt them faster (e.g., automatic metadata generation, automatic topic creation for topic maps, etc.).