14 August 2007

Wikipedia, Freebase and the Semantic Web

There is a lot of discussion about how to organize information on the Web. For that matter, there is, and always has been, a lot of discussion about how to organize information generally. I have been on Freebase for about the last month and have found the differences between its approach and those of Wikipedia, at one extreme, and the Semantic Web, at the other, very enlightening. It is not that I think Freebase is the ultimate answer; I do not. However, I do think that it offers a very interesting middle ground between these two extremes of information organization. It offers the ability to add as much information as possible, but makes only one requirement -- that each 'topic' has only one instance. In recognition of the Semantic Web, a series of high-level 'types' is being created but, unlike the Semantic Web, anyone can extend these and create new types as they wish. This may not seem like much of a change, but it is, in fact, quite profound.

The key differences between Freebase and both the Semantic Web and Wikipedia are very telling. If we first look at the difference with the Semantic Web, we see that Freebase has abandoned the central tenet of the Semantic Web (SM): that there is a universal logical structure to all knowledge, and that this can be defined (by a select few at the top of the W3C). The Types of Freebase may seem to be very similar to the high-level types of the SM, but they are much more like a hybrid of RDFs and OWLs. Rather than creating a pyramid of truth, as the SM is trying to do, Freebase's Types are a more traditional, and more pragmatic, categorisation of things. The high-level Types of Freebase do not, yet, claim to be higher-order concepts; they are generalised conventions such as 'people', 'places', 'times', etc. I will not go into the discussion as to why these Kantianesque categories are problematic, as it doesn't really matter here. We can happily use these categories within Freebase even though in most contexts they are problematic and uncertain. The key point here is that the pragmatic categorisations -- Types -- of Freebase are infinitely extendible where those of the SM are ultimately reducible. Freebase may seem very Semantic Webish, but it is not, as it inverts the logical structure of its categorisation. Whereas the SM starts from the messy diversity of the information world and, it hopes, progressively refines it to basic principles, Freebase starts from some pragmatic general categories and allows us all to extend them.

The differences with Wikipedia are even more interesting. Whereas Types are a kind of inversion of the SM's hierarchy, Freebase takes on Wikipedia's uncontrolled extension through its definition of 'topics'. By insisting that each "thing" in the world has only one instance -- one topic entry -- Freebase hopes to overcome the multiple accounts that proliferate on Wikipedia. The hope is that it will be the categorisations that proliferate, not the instances.
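
To make the contrast concrete, here is a minimal sketch in Python of the two rules as I have described them: Types that anyone may extend, and topics that exist only once. It is not Freebase's actual data model or API. The names TopicStore, add_type, add_topic and classify are invented for illustration, and identifying a topic by its name alone is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    """One entry per 'thing'; it may carry any number of Types."""
    name: str
    types: set[str] = field(default_factory=set)


class TopicStore:
    """Hypothetical store: Types are open-ended, topics are unique by name."""

    def __init__(self) -> None:
        # A few pragmatic starting categories; anyone may add more.
        self.types: set[str] = {"people", "places", "times"}
        self.topics: dict[str, Topic] = {}

    def add_type(self, type_name: str) -> None:
        # Extending the categorisation is always allowed.
        self.types.add(type_name)

    def add_topic(self, name: str) -> Topic:
        # A second entry for the same thing is refused, not created.
        if name in self.topics:
            raise ValueError(f"topic already exists: {name}")
        topic = Topic(name)
        self.topics[name] = topic
        return topic

    def classify(self, name: str, type_name: str) -> None:
        # Attach an existing Type to an existing topic.
        if type_name not in self.types:
            raise KeyError(f"unknown type: {type_name}")
        self.topics[name].types.add(type_name)
```

In this sketch the categorisations can multiply freely around a topic, while a second account of the same topic has nowhere to go.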

This is not a bad idea, though it is fraught with its own problems. Robert Cook and I had a few discussions about this problem here and on Freebase, though I don't think I expressed my concerns very well. Perhaps I can clarify a bit here.

What is very interesting about Wikipedia, and all Wikis, is that when a topic is begun it takes a bit of time to stabilise. The process of stabilisation usually occurs as a certain group of wiki-editors appropriates the topic and keeps others from complicating their version. As a result, others, who may disagree with the now 'authoritative' account, create other entries with different accounts. We might call this process "budding". Other accounts "bud" off of the original stable account to create a constellation of accounts around any topic. It is this budding that Freebase is attempting to avoid.

As I have stated below, I do not think that this is a problem, as I see it as a sensible pragmatic decision. Not least because this problem, how to link up all the different accounts which surround a topic, is one of the most difficult in the history of philosophy. Thomas Kuhn, for one, demonstrated in the 1960s that this kind of budding around a stable topic is the key mechanism of paradigm shifts in science. Others have argued since that it is a key mechanism in all knowledge production. As such, to legislate against this budding around topics could have serious implications for the future of Freebase.

A problem, though, is that going down the route of Wikipedia won't work either. There is no way of accounting for the discursive connections between the stable topic and the buds. By keeping one topic instance, and one topic instance only, Freebase overcomes the problem of multiple instances, but at the cost of denying any mechanism for accounting for the new and diverse opinions that create paradigm shifts in knowledge. What I am arguing here is not very different from what Marvin Minsky proposed in his Society of Mind theory or, for that matter, from what Danny Hillis, a founder of Metaweb, has proposed.

I am afraid that I too have no real solution to this problem, but I do ask the people at Metaweb to not ignore this problem by claiming that the single instance topic is philosophically real.

02 August 2007

Freebase revisited

I was very pleased to see that Robert Cook responded to my comments below, and felt that he was right on one point, that I needed to clarify my final point. I agree that I did finish off a bit abruptly.
It is not that I think that Freebase will fail. In fact, I think quite the opposite. I do think, though, that my point is germane not only to Freebase, but to such knowledge accumulations generally. The problem is that we think we know things by knowing how to name them -- and knowing what the name means. We are, in the West, constantly taught that this is how we know. We are bombarded with training manuals that classify the subject and explain to us this classification. We are constantly exposed to a media that is classifying and naming the events around us. We are constantly trying to understand what is going on around us by finding the appropriate names and the appropriate meanings for those names.
We are told that we are this sort of an employee; that we are that sort of a resident; that we are this class of a taxpayer; that we are male or female or gay; that we have this type of body; or, hopefully not, this kind of disease with these characteristics. We are classified, ordered and named constantly. We are told that the advances of science and medicine and society are this or that sort of thing. We are told that the problems of the world are due to this type of person, or this type of belief or, worse, this type of religion.
We are also told that types of things have definitive characteristics. When these definitive characteristics are correct, the thing is right; when they are incorrect, the thing is wrong. We see this in the gay debate, or the debate about terrorism. It is not that different people have different characteristics, or that they interpret their characteristics differently -- or even that the social context of these interpretations is very complex -- but that there are bad characteristics, ones that do not fit the norm. But, of course, what is the norm, and how is it defined? I'm not going to go into that, as there is a huge literature on this subject. If you are interested, I would just point you to the work of Michel Foucault and the hundreds of works about the social construction of the norm.
We could also ask the question, which is more pertinent to the Freebase discussion: what is a thing anyway? Is it a unique entity that simply has names and characteristics defined onto it, or is it something more fluid, dynamic and constructed? Now, I'm not an idealist; I do not believe that everything in the world happens in my head, and that there is no reality outside of my mind. But there is a big difference between the physical object and what we say that physical object means.
Naming and classifying an object is certainly a kind of meaning; it would be absurd to say it wasn't. However, it is but a 'kind' of meaning, if I can use classification to explain classification. We use classifications because they are useful, very useful, but usefulness implies use. We use classifications, or whatever method, to understand things because the actions performed, social and practical, allow us, with others, to construct an account of the world that supports other meaningful actions. In this sense, we do not have understanding, as we have a car or a house; rather, understanding is something we do. It is a skilled activity.
We could say the same of things. We do not know things because they have innate characteristics, that we know more or less well, but because we are able to perform particular meaningful actions with them. Classification is one such meaningful activity that we do with things.
In this way, Freebase offers a very useful approach to the accumulation of accounts of things. Not because it is realistically or definitively defining the world of things, but because it will allow us all, through a dynamic classification, to define our many different domains of understanding. More importantly, Freebase offers the possibility of ensuring that these different domains are communally defined and maintained. It should ensure that these various orders of the world, the various domains, are the emergent result of communities of knowledge, not the singular assertions of a single community.
In my next post, I plan to discuss why Freebase is a much better approach to this problem of knowledge order than the simple Wiki.