Archive for the ‘Tagging’ Category

Deconstructing Flickr’s “Interestingness!”

May 12, 2006


Since Flickr is one of the best-known Web 2.0 sites, it is worth taking a look at what they do, if for no other reason than to use it as a predictor of what the legions of Flickr clones will soon try to copy. Today we are looking at Flickr's method for selecting what it terms "interesting" photographs, purportedly without the intervention of human editors. The results are generally pretty impressive, which raises the question of what variables they use to distinguish compelling visuals.

Thomas Hawk:

"More than just "interesting," "interestingness" could potentially be a way that Yahoo! reclaims a little piece of search from Google. Today image search at both Google and Yahoo! is largely broken. Do a search for "San Francisco" at both Google and Yahoo! Image Search and you will find a hodge podge of mostly mediocre images.

Says BuzzMachine:

"What’s great about this is that it exposes not the wisdom of the crowd but the taste of the crowd"

Now for the algorithm. Interestingness is described on Flickr as:

"…an amazing new Flickr Feature.

There are lots of things that make a photo 'interesting' (or not) in the Flickr. Where the clickthroughs are coming from; who comments on it and when; who marks it as a favorite; its tags and many more things which are constantly changing. Interestingness changes over time, as more and more fantastic photos and stories are added to Flickr."

Emphasis has been added.

By hinting at the existence of secret sauce, Flickr enters the "we're more than a pretty face" and "trust us, we have amazing algorithms underneath" competitions along with people like Digg. In fact, it's getting to the point that if you can't work the word algorithm into your "about us" you risk being called "web 1.0"!

Flickr's description of interestingness above gives a hint of what gets a photograph selected for this distinction. Here is what we (and others) have been able to determine:

  • Views, internal and external to Flickr, of the photo
  • Number of comments on the photo, and also who comments on the photo
  • Tags applied to the photo
  • Flickr discussion groups in which the photo appears
  • Favorites, a.k.a. Flickr bookmarking, of the photo
  • Time-varying behavior of the above factors
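
To make the list above concrete, here is a toy sketch of how such factors could be folded into a single number. Flickr has not published its formula, so everything here is an assumption made purely for illustration: the field names, the weights, the logarithmic damping, and the one-week half-life are all invented, and the "who" factor is not modeled at all.

```python
import math
from dataclasses import dataclass


@dataclass
class PhotoActivity:
    # Hypothetical activity counters for one photo.
    views: int
    comments: int
    favorites: int
    tags: int
    groups: int
    age_days: float


def toy_interestingness(p: PhotoActivity, half_life_days: float = 7.0) -> float:
    """Illustrative score only; NOT Flickr's actual algorithm."""
    # Log-damp each count so a huge view total cannot drown out the rest.
    raw = (
        1.0 * math.log1p(p.views)
        + 2.0 * math.log1p(p.comments)
        + 3.0 * math.log1p(p.favorites)
        + 0.5 * math.log1p(p.tags)
        + 0.5 * math.log1p(p.groups)
    )
    # Time-varying behavior: the score halves every `half_life_days`.
    decay = 0.5 ** (p.age_days / half_life_days)
    return raw * decay


fresh = PhotoActivity(views=500, comments=12, favorites=30, tags=5, groups=2, age_days=0)
week_old = PhotoActivity(views=500, comments=12, favorites=30, tags=5, groups=2, age_days=7)
print(toy_interestingness(fresh) > toy_interestingness(week_old))
```

With identical activity, the week-old photo scores exactly half of the fresh one, which is one simple way to express "interestingness changes over time."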

Flickr mentions "who" twice in their one-paragraph description of their process, but that is a much more difficult aspect to deconstruct. However, one blogger noted an absence of the amateurish photos that seem so prevalent and wondered out loud how this might be so:

"One conclusion to draw might be that the professional and semi-professional photographers who make up a minority of Flickr's users are having a disproportionate influence on the metrics that go into Interestingness because they are more active. They make more comments, mark more photos as favorites, look at more pictures not by their current contacts, and therefore their activity has a greater weight in the algorithms that choose the Interesting photos."

Flickr appears to have tinkered with their algorithm (in Feb?), introducing a penalty for those who appear to try to game the system by uploading to numerous Flickr groups. According to one Flickr user:

"…some recent changes to the algorithm devalued the interestingness of photos submitted to too many groups. This had sparked controversy with a specific kind of Flickr user affectionately referred to as a 'group whore'."…"Group whores are users who send their photos to tons of different groups in a desperate attempt to garner attention (read: views, favs and comments) which in turn would hypothetically lead to a higher level of the coveted interestingness."

Here is a debate between flummoxed Flickr users and the Flickr founder over the issue.

Flickr is right: it is an amazing new feature.

Update: Thomas Hawk adds this hypothesis to Flickr's Interesting algorithm:

One major change that has also occured with regards to interestingness (in my guestimation of course) is that averaging has been introduced for more popular photographers to prevent them from overly dominating interestingness.


The “Del.icio.us Lesson,” Now don’t forget it!

May 11, 2006

The "Del.icio.us Lesson" is very simple. So much so that it is far too often forgotten by Web 2.0 technologists. The lesson was defined by Joshua Porter, who states that "personal value precedes network value: that selfish use comes before shared use."

Even though we’re definitely benefiting from the value of networked software, we’re still not doing so unless the software is valuable to us on a personal level first.

In a later post, Porter continues:

What this means is that if we are to build networks of value, then each person on the network needs to find value for themselves before they can contribute value to the network. In the case of del.icio.us, people find value saving their personal bookmarks first and foremost. All other usage is secondary.

As people use del.icio.us more, and in order to gain more personal value, they use tags to be able to find their bookmarks later. Tagging isn’t even the primary function of del.icio.us. Most of the tagging done on del.icio.us is done secondarily, and for personal use.

The social value of tags on del.icio.us is only a happy side-effect. Even though most of the ink spilled about del.icio.us is about the social value, it’s really not the reason why people use it.

For all of the folks designing or (or perhaps Killer Clones?) it is extremely important for them to remember this lesson before they blog endlessly in their founder's blogs in the "about us" sections of their closed beta web sites about the social aspects of their software or the incredible network effect that's going to change their users' worlds.

Porter continues, further explaining how tags are different from meta keywords. Tags provide personal benefit; keywords provide only social benefit. Unfortunately, people don't really enjoy tagging for the sake of tagging. Sure, bloggers would add keywords to their posts because they're hoping it will help drive traffic (in fact bloggers would pretty much do anything to drive traffic). However, for the rank-and-file user the social benefit of tags pales in comparison to the immediate personal benefit of easily finding sites and information that the user had selected for follow-up. As Porter points out, the social benefits and the network effect are just a nice bonus. A bonus, of course, which has garnered most of the attention from a Web 2.0-hungry world.

Here is a link to a graphic of stats. As you can imagine it goes to the right and up and up and up.


Tagging 2.0: Would a ‘rose’ tagged by any other name smell as sweet?

May 10, 2006

A Survey of Tagging Trends to Answer the Question "What's all of the fuss about?"

When one digs (or is it diggs?) into tagging, what started out as a really simple way to identify resources quickly gets complicated (anyone up for "deduplication" or "morphological analysis"?). With hundreds of millions of resources being tagged, and the purpose of tagging being the ability to quickly and easily find them, tagging has become an enormous area of focus for entrepreneurs, academics, and researchers, the same way that Internet search grew from the basic site cataloging and indexing of a decade ago to today's mind-boggling search/ranking algorithms. But we are getting ahead of ourselves; let's get back to what tagging is.

I. Tagging Defined

"Tagging" is a lightweight and flexible approach to classifying information that allows users to apply whatever terms they think are appropriate to describe or recall an asset without the burden of selecting a category from a known taxonomy. It is an extremely important aspect of 2.0 thinking since it puts the power of classification into users' hands. In short it is a tool set that is both created by the community and beneficial to the community.

II. Tagging Vs. Traditional Bookmarking

Computer users have long stored the URLs of useful web resources locally in a browser client (so-called "bookmarking"). Using bookmarks involves scanning the hierarchical lists. These bookmarks are accessible only through the browser of the computer originally used to store them. There are only limited methods for sharing bookmarks (and even moving them to a new computer of the same user can be a hassle).

Tagging differs from this traditional bookmarking in several critical ways. First, bookmarks can be annotated with identifying tags, or keywords, selected by the individual bookmarking the resource as meaningful. Tagging does not impose the mutually exclusive categorization schemes that hierarchical structures or faceted metadata do. People can retrieve bookmarks by tag (or title or comment) without having to search down long folder paths or even remember which folder they put them in. Moreover, since bookmarks are typically stored in a central repository, social bookmark collections are accessible from any browser and any machine.

III. Social Tagging

Social tags (also called "folksonomies") allow users to publicly tag and share content. On sites with social tagging, users can categorize information for themselves as well as browse (and often add to) the information categorized by others. There are therefore at once both personal and public aspects to collaborative tagging systems. Furthermore, social tagging is inherently open-ended and can respond almost immediately to changes and innovations in the way users think about content. Think of it as 'open source keywords.'

Social tagging also allows users to follow tags that interest them to find other users with interests or viewpoints similar to theirs (another social aspect). The front page of del.icio.us shows the most recently added bookmarks (including the tags given to them, who created them, and how many other people have that bookmark in common). There is also a “popular” page, which shows the same information for the URLs that are currently the most popular. One can also see any other user’s personal page and even tag it. By looking at other users’ personal pages as well as the “popular” tags page, users can get a sense of what other people find interesting and hook up with those with similar interests.

Anyone who uses a service like del.icio.us knows that some tags will be useful for many people (e.g., tagging a picture of a cat as "cat") but others will only be applicable to that individual (tagging a web page as "remembertoprint" or "sendtobob"). If a service has enough people, the popular tags will generally overwhelm the individual tags. Interestingly, even personal tags can benefit other users. For example, "if many users find something funny, there is a reasonable likelihood someone else would also find it to be so, and may want to explore it."
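
As a rough illustration of how popular tags can be told apart from personal ones, the sketch below counts how many distinct users applied each tag to a single resource. The function name `split_tags`, the `(user, tag)` input shape, and the two-user threshold are assumptions made for the example, not how any particular service does it.

```python
from collections import Counter


def split_tags(taggings, min_users=2):
    """Split the tags on one resource into (popular, personal).

    taggings: iterable of (user, tag) pairs.
    A tag is "popular" here if at least `min_users` distinct users applied it.
    """
    # set() drops duplicate (user, tag) pairs so each user counts once per tag.
    users_per_tag = Counter(tag for _user, tag in set(taggings))
    popular = sorted(t for t, n in users_per_tag.items() if n >= min_users)
    personal = sorted(t for t, n in users_per_tag.items() if n < min_users)
    return popular, personal


taggings = [
    ("ann", "cat"),
    ("bob", "cat"),
    ("bob", "sendtobob"),
    ("cy", "cat"),
    ("cy", "remembertoprint"),
]
popular, personal = split_tags(taggings)
print(popular)   # ['cat']
print(personal)  # ['remembertoprint', 'sendtobob']
```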

IV. Clouds on the Horizon

The good news about tagging is that it's rather simple and it is very useful to find tagged pictures, videos, and other resources with a couple of keywords. However, once tagged, going back and looking for things in the 'tag space' has a number of hard limitations.

First, different people will apply different tags to the same resource (and not just the 'individual' vs. 'many' tags described above). This is okay if you are only searching for things that you filed yourself, but that is often not the case. While it's true that this variability can be compensated for when a large enough number of users have applied tags, this isn't always possible. Second, from a user perspective, coming up with tags from scratch might be more work than they want to do.

Despite these difficulties, some in the tagging community adamantly resist any form of taxonomy being imposed on users and the inherent biases that would result. After all, Web 2.0 is about listening to users and letting them guide their experiences as much as possible. (You don't want to say, "You can tag this any way you want as long as you do it the way I want you to.") That said, even tagging pioneers are rethinking their tagging practices to address this issue. For example, del.icio.us has recently introduced the concept of "bundles," acknowledging the organization problems of scaling a tagging model.


What are bundles?

Bundles are a way to group together common tags. For instance, if you have the tags "design", "painting", and "moma", you may want to group these together into a bundle called "art".

Similarly, Flickr has its "clusters" (again grouping related tags). While these can in fact improve the tagging experience, many consider them to be only a partial solution.
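
The bundle idea described above can be sketched as a simple mapping from a bundle name to a set of tags. This is only an illustration of the concept: the data layout, the function name `bookmarks_in_bundle`, and the example.com URLs are all made up and do not reflect how any real service stores bundles.

```python
# A bundle groups several tags under one name (the "art" example from above).
bundles = {"art": {"design", "painting", "moma"}}

# Hypothetical bookmarks: URL -> tags the user applied.
bookmarks = {
    "https://example.com/bauhaus": ["design", "history"],
    "https://example.com/recipes": ["cooking"],
    "https://example.com/exhibit": ["moma", "painting"],
}


def bookmarks_in_bundle(bookmarks, bundles, bundle_name):
    """Return URLs carrying at least one tag that belongs to the bundle."""
    wanted = bundles[bundle_name]
    return [url for url, tags in bookmarks.items() if wanted & set(tags)]


print(bookmarks_in_bundle(bookmarks, bundles, "art"))
```

The user still tags freely; the bundle is a thin layer of organization added on top, which is why it is often seen as only a partial answer.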

One compromise approach is the use of collaborative tagging techniques, which suggest tags for a resource based on what other users have used for it, addressing both the vocabulary divergence problem and the task of having to come up with tags from scratch. In addition, "virtual users" could be employed to automatically generate content-based tags and at least address the cold start problem, especially for content without broad appeal. (We all know that it's easier to edit something than to start with a blank piece of paper.)
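
A minimal sketch of the suggestion idea, assuming the simplest possible ranking: count how often other users applied each tag to the same resource and offer the most frequent ones the current user has not already used. The function name, the lowercase normalization, and the alphabetical tie-break are choices made for this example; the research systems mentioned here are considerably more sophisticated.

```python
from collections import Counter


def suggest_tags(other_users_tags, already_applied=(), k=3):
    """Suggest up to k tags for a resource.

    other_users_tags: list of tag lists, one per other user.
    already_applied: tags the current user has already chosen.
    """
    # Lowercasing folds "Cat" and "cat" together (a crude form of deduplication).
    counts = Counter(tag.lower() for tags in other_users_tags for tag in tags)
    for tag in already_applied:
        counts.pop(tag.lower(), None)
    # Most frequent first; ties broken alphabetically so results are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _count in ranked[:k]]


others = [["cat", "kitten"], ["Cat", "pets"], ["cat", "kitten", "funny"]]
print(suggest_tags(others, k=2))                           # ['cat', 'kitten']
print(suggest_tags(others, already_applied=["cat"], k=2))  # ['kitten', 'funny']
```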

V. Tag Spam?

Unfortunately, as with many things on the web, spamming is a problem and a threat to the integrity of tagging systems, and people are already implementing ways to combat it. One method looks at the users who apply the tag:

In order to combat tag spam, we introduce an authority score (or reputation score) for each user. The authority score measures how well each user has tagged in the past. This can be modeled as a voting problem. Each time, a user votes correctly (consistent with the majority of other users), the user gets a higher authority score; the user gets a lower score with more bad votes.
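
The voting idea in the passage above might look something like the following sketch. It is a simplified reading, not the authors' implementation: it assumes each user applies one tag per resource, treats a vote as "correct" when more than half of the other users on that resource chose the same tag, and moves the score up or down by one each time. The names `authority_scores` and `votes` are invented for the example.

```python
from collections import defaultdict


def authority_scores(votes):
    """votes: list of (user, resource, tag). Returns {user: score}."""
    by_resource = defaultdict(dict)  # resource -> {user: tag}
    for user, resource, tag in votes:
        by_resource[resource][user] = tag

    scores = defaultdict(int)
    for user_tags in by_resource.values():
        for user, tag in user_tags.items():
            others = [t for u, t in user_tags.items() if u != user]
            if not others:
                continue  # nobody to compare against, so no vote is counted
            agree = sum(1 for t in others if t == tag)
            # +1 if a strict majority of the other users agree, otherwise -1.
            scores[user] += 1 if agree * 2 > len(others) else -1
    return dict(scores)


votes = [
    ("ann", "r1", "cat"), ("bob", "r1", "cat"), ("cy", "r1", "cat"),
    ("spam", "r1", "cheap-pills"),
    ("ann", "r2", "dog"), ("bob", "r2", "dog"), ("cy", "r2", "dog"),
    ("spam", "r2", "cheap-pills"),
]
print(authority_scores(votes))
```

In this toy data the three consistent users end up with a score of 2 each and the spammer with -2; a real system would then discount tags from low-authority users.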

VI. The Future of Tagging

The ultimate answer, at least according to researchers at UC Berkeley and Yahoo, could be "a revised, probabilistic model using seed ontologies to induce faceted ontology," which I believe is a fancy way of saying that users shouldn't have to choose between pure tagging and completely closed taxonomic models.

In fact I learned today that there is already a word for a theoretical compromise between collaboration and vocabulary: "collabulary." Think of it as a mashup of the two. Wikipedia proposed this example: "If two users define an object as being 'white' and one user defines an object as being 'cream' then a relevance can be defined as 'more white than cream'."
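
The white-versus-cream example can be worked through with a simple count. This is just one assumed way to turn user descriptions into a relevance weight (each term's share of all votes); the function name `term_relevance` is hypothetical.

```python
from collections import Counter


def term_relevance(descriptions):
    """Map each term to its share of all the descriptions given."""
    counts = Counter(descriptions)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


# Two users say "white", one says "cream".
weights = term_relevance(["white", "white", "cream"])
print(weights["white"] > weights["cream"])  # True: "more white than cream"
```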

Note: the majority of the papers consulted for this post came from the Tagging Workshop at WWW 2006. For those interested in delving deeper into tags, this is an excellent place to start, and all of the papers are currently available in PDF.

Additional Recommended Reading: Why People Tag,

Let your users tag, tag, tag

May 5, 2006

When "tagging" became all the rage a while back, many website operators couldn't understand all of the fuss, particularly if they had put time into setting up a content classification system they felt was working. But since 'tag clouds' looked so cool, some of them were convinced enough to add the keywords to complement their taxonomies.

However, in a Web 2.0 world, that is simply not enough. Why? Because publishers will never be able to anticipate all of the ways that users see and seek out their material. Only by giving users the ability to tag content can publishers be assured that they are not missing key methods for discovering their work.

Fred Wilson notes:

User tagging is vastly superior to self tagging because it is the consumers who are navigating and trying to find the stuff. The way they describe it is the same way they will try to find it. And it's really hard for publishers to figure out all the keywords up front.

What's more, users like to tag, giving yet another reason to add this functionality to a web site. But with this tendency comes a wise word of caution:

Many apps focus on being the new social killer-app when, in general, people don’t have time to worry about what other people are doing, and will only use software that benefits them personally at every step. You could call this selfishness or laziness, but I would call it optimization. For example, we simply don’t have time to tag things for tagging sake. Instead, we might tag things if we think that it will help us in the future, but adding tags to an app does not a solution make.