A Survey of Tagging Trends to Answer the Question "What's all of the fuss about?"
When one digs (or is it diggs?) into tagging, what started out as a really simple way to identify resources quickly gets complicated (anyone up for "deduplication" or "morphological analyses"?). With hundreds of millions of resources being tagged, and the purpose of tagging being the ability to find them quickly and easily, tagging has become an enormous area of focus for entrepreneurs, academics, and researchers, in the same way that Internet search grew from the basic site cataloging and indexing of a decade ago to today's mind-boggling search and ranking algorithms. But we are getting ahead of ourselves; let's get back to what tagging is.
I. Tagging Defined
"Tagging" is a lightweight and flexible approach to classifying information that allows users to apply whatever terms they think are appropriate to describe or recall an asset, without the burden of selecting a category from a known taxonomy. It is an extremely important aspect of Web 2.0 thinking, since it puts the power of classification into users' hands. In short, it is a tool set that is both created by the community and beneficial to the community.
II. Tagging Vs. Traditional Bookmarking
Computer users have long stored the URLs of useful web resources locally in a browser client (so-called "bookmarking"). Using bookmarks means scanning hierarchical lists of folders. These bookmarks are accessible only through the browser on the computer originally used to store them. There are only limited methods for sharing bookmarks (and even moving your own bookmarks to a new computer can be a hassle).
Tagging differs from this traditional bookmarking in several critical ways. First, bookmarks can be annotated with identifying tags, or keywords, that the individual bookmarking the resource finds meaningful. Tagging does not impose the mutually exclusive categorization schemes that hierarchical structures or faceted metadata do. People can retrieve bookmarks by tag (or title or comment) without having to search down long folder paths or even remember which folder they put them in. Moreover, since tagged bookmarks are typically stored in a central repository, social bookmark collections are accessible from any browser and any machine.
III. Social Tagging
Social tagging systems (also called "folksonomies") allow users to publicly tag and share content. On sites with social tagging, users can both categorize information for themselves and browse (and often add to) the information categorized by others. There are therefore at once both personal and public aspects to collaborative tagging systems. Furthermore, social tagging is inherently open-ended and can respond almost immediately to changes and innovations in the way users think about content. Think of it as 'open source keywords.'
Social tagging also allows users to follow tags that interest them to find other users with interests or viewpoints similar to theirs (another social aspect). The front page of Del.icio.us shows the most recently added bookmarks (including the tags given to them, who created them, and how many other people have that bookmark in common). There is also a “popular” page, which shows the same information for the URLs that are currently the most popular. One can also see any other user’s personal page and even tag it. By looking at other users’ personal pages as well as the “popular” tags page, users can get a sense of what other people find interesting and connect with those who have similar interests.
Anyone who uses a service like Del.icio.us knows that some tags will be useful for many people (e.g., tagging a picture of a cat as "cat") but others will only be applicable to the individual who applied them (tagging a web page as "remembertoprint" or "sendtobob"). If a service has enough users, the popular tags will generally overwhelm the individual tags. Interestingly, even personal tags can benefit other users. For example, "if many users find something funny, there is a reasonable likelihood someone else would also find it to be so, and may want to explore it."
IV. Clouds on the Horizon
The good news about tagging is that it's rather simple, and it makes it easy to find tagged pictures, videos, and other resources with a couple of keywords. However, once resources are tagged, going back and looking for things in the 'tag space' runs into a number of hard limitations.
First, different people will apply different tags to the same resource (and not just the 'individual' vs. 'many' tags described above). This is okay if you are only searching for things that you filed yourself, but that is often not the case. While it's true that this variability can be compensated for when a large enough number of users have applied tags, this isn't always possible. Second, from the users' perspective, coming up with tags from scratch might be more work than they want to do.
Despite these difficulties, some in the tagging community adamantly resist any form of taxonomy being imposed on users, along with the inherent biases that would result. After all, Web 2.0 is about listening to users and letting them guide their experiences as much as possible. (You don't want to say, "You can tag this any way you want as long as you do it the way I want you to.") That said, even tagging pioneers are rethinking their tagging practices to address this issue. For example, Del.icio.us has recently introduced the concept of "bundles", acknowledging the organization problems of scaling a tagging model.
What are bundles?
Bundles are a way to group together common tags. For instance, if you have the tags "design", "painting", and "moma", you may want to group these together into a bundle called "art".
Similarly, Flickr has its "clusters" (again grouping related tags). While these can in fact improve the tagging experience, many consider them to be only a partial solution.
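To make the bundle idea concrete, here is a minimal sketch that treats a bundle as nothing more than a named set of tags. The data structures, URLs, and the function name bookmarks_in_bundle are all made up for illustration; this is not how Del.icio.us or Flickr actually implement the feature.

```python
# Hypothetical sketch: a bundle is just a named group of tags.
bundles = {"art": {"design", "painting", "moma"}}

# Hypothetical bookmarks: URL -> set of tags applied by one user.
bookmarks = {
    "https://example.com/moma-exhibit": {"moma", "nyc"},
    "https://example.com/color-theory": {"design"},
    "https://example.com/pasta-recipe": {"cooking"},
}


def bookmarks_in_bundle(bundle_name):
    """Return URLs carrying at least one tag that belongs to the bundle."""
    bundle_tags = bundles.get(bundle_name, set())
    # Set intersection is non-empty when the bookmark shares a tag with the bundle.
    return sorted(url for url, tags in bookmarks.items() if tags & bundle_tags)


print(bookmarks_in_bundle("art"))
```

Looking up the "art" bundle returns the two bookmarks tagged "moma" or "design" and skips the recipe, even though nobody ever typed the tag "art".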
One compromise approach is the use of collaborative tagging techniques, which suggest tags for a resource based on what other users have applied to it, addressing both the vocabulary divergence problem and the task of having to come up with tags from scratch. In addition, "virtual users" could be employed to automatically generate content-based tags and at least address the cold start problem, especially for content without broad appeal. (We all know that it's easier to edit something than to start with a blank piece of paper.)
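As a rough illustration of the suggestion idea, the sketch below recommends the tags that other users have applied most often to the same resource. The (user, resource, tag) tuple format, the sample data, and the function name suggest_tags are assumptions made for this example; real systems use considerably more elaborate ranking.

```python
from collections import Counter


def suggest_tags(resource, taggings, limit=3):
    """Suggest the tags most often applied to `resource` by other users.

    `taggings` is a list of (user, resource, tag) tuples (a made-up format).
    """
    counts = Counter(tag for _user, res, tag in taggings if res == resource)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _count in ranked[:limit]]


taggings = [
    ("ann", "cat.jpg", "cat"),
    ("bob", "cat.jpg", "cat"),
    ("cid", "cat.jpg", "kitten"),
    ("bob", "cat.jpg", "sendtobob"),
    ("dee", "cat.jpg", "kitten"),
    ("eve", "cat.jpg", "cat"),
]
print(suggest_tags("cat.jpg", taggings, limit=2))
```

Note how the personal tag "sendtobob" drops out of the suggestions once enough people have tagged the picture, which is the "popular tags overwhelm individual tags" effect in miniature. A resource nobody has tagged yet produces no suggestions, which is exactly the cold start problem.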
V. Tag Spam?
Unfortunately, as with many things on the web, spamming is a problem and a threat to the integrity of tagging systems, and people are already implementing ways to combat it. One method looks at the users who apply the tag:
In order to combat tag spam, we introduce an authority score (or reputation score) for each user. The authority score measures how well each user has tagged in the past. This can be modeled as a voting problem. Each time, a user votes correctly (consistent with the majority of other users), the user gets a higher authority score; the user gets a lower score with more bad votes.
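The voting idea in that passage can be sketched in a few lines. This is only one possible reading of it: the +1 / -1 step size, the rule that every tag tied for the top count is treated as the majority view, and the name authority_scores are all assumptions of this example rather than details taken from the paper.

```python
from collections import Counter, defaultdict


def authority_scores(votes):
    """Score users by how often their tag matches the majority tag.

    `votes` is a list of (user, resource, tag) tuples, one tag per user
    per resource. The +1 / -1 step size is an assumption of this sketch.
    """
    by_resource = defaultdict(list)
    for user, resource, tag in votes:
        by_resource[resource].append((user, tag))

    scores = defaultdict(int)
    for resource_votes in by_resource.values():
        counts = Counter(tag for _user, tag in resource_votes)
        top = max(counts.values())
        # Every tag tied for the highest count is treated as the majority view.
        majority = {tag for tag, n in counts.items() if n == top}
        for user, tag in resource_votes:
            scores[user] += 1 if tag in majority else -1
    return dict(scores)


votes = [
    ("ann", "cat.jpg", "cat"),
    ("bob", "cat.jpg", "cat"),
    ("spam", "cat.jpg", "cheap-pills"),
    ("ann", "moma.html", "art"),
    ("bob", "moma.html", "art"),
    ("spam", "moma.html", "cheap-pills"),
]
print(authority_scores(votes))
```

In this toy data the two users who agree with each other end up with a score of 2 each, while the account that tags everything "cheap-pills" ends at -2. A service could then give less weight to tags coming from low-scoring accounts.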
VI. The Future of Tagging
The ultimate answer, at least according to researchers at UC Berkeley and Yahoo, could be "a revised, probabilistic model using seed ontologies to induce faceted ontology," which I believe is a fancy way of saying that users shouldn't have to choose between pure tagging and completely closed taxonomic models.
In fact, I learned today that there is already a word for a theoretical compromise between collaboration and vocabulary: collabulary. Think of it as a mashup of the two. Wikipedia proposed this example: "If two users define an object as being 'white' and one user defines an object as being 'cream' then a relevance can be defined as 'more white than cream'."
Note: the majority of the papers consulted for this post came from the Tagging Workshop at WWW 2006. For those interested in delving deeper into tags, this is an excellent place to start, and all of the papers are currently available in PDF.
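The "more white than cream" example can be worked through with simple counting. The sketch below assumes that relevance is just each label's share of all labels applied to the object; that weighting, and the function name tag_relevance, are assumptions for illustration and not a definition taken from Wikipedia.

```python
from collections import Counter
from fractions import Fraction


def tag_relevance(labels):
    """Return each label's share of all labels applied to one object."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {tag: Fraction(n, total) for tag, n in counts.items()}


# Two users say "white", one says "cream".
relevance = tag_relevance(["white", "white", "cream"])
print(relevance["white"] > relevance["cream"])
```

With two votes for "white" and one for "cream", the shares come out to 2/3 and 1/3, so the object is "more white than cream" without either label being thrown away.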