People often ask what I think about using computer automation for content analysis and aggregation. It is a topic I have thought about a great deal as founder of CustomScoop, a media intelligence company. When we started out in 2000, our clients’ primary demand was to know what was being said about their company, products, and competitors. Today, a growing number treat that data as the foundation and want tools and services layered on top of it to produce analysis that leads to meaningful intelligence.

So what’s my answer? In short, I believe computers have a role to play, but in assisting humans rather than replacing them. The nuances of language, and the understanding of communities and influence that humans develop over time, simply cannot be replicated by computers with the same level of accuracy. And there’s the rub: accuracy. If you are willing to accept considerably less of it, computers can often do the job. But show me one customer who would accept a Wall Street Journal story categorized as positive based solely on its vocabulary when a human would correctly discern its negative impact. Many would accept some level of inaccuracy, but only if they could control where it occurs.

Of course, judging the sentiment of media articles isn’t the only role computers can play in automated analysis and aggregation. For instance, some companies use computer algorithms and word analysis to try to:

  • determine the demographic profile of a blogger
  • assess the influence of a blog or group of blogs
  • predict patterns of message dissemination
  • eliminate duplicate articles
  • cultivate recommended reading lists

These are all admirable goals, and some have seen more success than others.

Let’s take a look at a few examples.

TechMeme and Digg. I enjoy both of these sites for their ability to help bubble up interesting content. TechMeme acts as a sort of CliffsNotes version of the day’s tech news. Digg serves a much broader range of purposes. In both cases, a computer algorithm assesses the activity of humans to make automated editorial decisions. TechMeme gauges bloggers’ linking behavior (along with other undisclosed variables) to reach its conclusions; Digg takes user votes and applies mathematical analysis to judge velocity and other factors in creating its rankings.
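To make the idea of vote "velocity" concrete, here is a minimal sketch in Python. It is not Digg's formula, which is not public. The `Story` class, the two-hour window, and the sample vote times are all assumptions made up for illustration. The sketch simply ranks stories by how many votes arrived recently rather than by total votes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Story:
    title: str
    # Hypothetical data: hours after some shared start time at which each vote arrived.
    vote_times: List[float]


def velocity(story: Story, now: float, window: float = 2.0) -> float:
    """Votes per hour received during the last `window` hours (assumed metric)."""
    recent = [t for t in story.vote_times if now - window <= t <= now]
    return len(recent) / window


def rank(stories: List[Story], now: float, window: float = 2.0) -> List[Story]:
    """Order stories by recent vote velocity, fastest first."""
    return sorted(stories, key=lambda s: velocity(s, now, window), reverse=True)


# An older story with many early votes versus a newer one that is gaining votes now.
old_story = Story("Old favorite", [0.5, 0.6, 0.7, 0.8, 1.0, 1.2, 1.5, 2.0, 2.5, 3.0])
new_story = Story("Breaking item", [8.5, 9.0, 9.5, 9.9])

ranked = rank([old_story, new_story], now=10.0)
print([s.title for s in ranked])
```

In this sketch the newer story ranks first despite having fewer total votes. The ranking still depends on an arbitrary choice (the window length), which is the kind of hidden editorial decision a formula can conceal.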

But after looking at both sites regularly, I find it hard to argue that a human editor could not polish the results a bit and make them that much more useful. Some might say that would introduce bias into the equation, but at least it would be overt bias, whereas secret formulas can be hard to assess. With news outlets that have an overt slant, everybody knows what to expect. Those that attempt to achieve neutrality rarely do so, thus misleading audiences.

Media Analysis. It is not for me to name names here, as some of the companies involved could be considered competitors of CustomScoop, which could taint my specific comments. Instead, let me address the key problem with complete automation.

As I noted above, the accuracy of automated analysis leaves much to be desired. KD Paine, a widely respected leader in the field of public relations measurement, often makes the point that computers cannot effectively identify sarcasm, for instance. That’s but one of the challenges of linguistic analysis. Computers also have a much harder time evaluating the overall impact of an article. A somewhat extreme example that illustrates the challenge would be an obituary of a civic leader that fairly identifies many of his accomplishments over the years, but leads with a paragraph noting that he was accused of being a child molester, though never formally charged. A human would know that most people would walk away from the article with a negative impression, whereas a computer might well judge that since 90% of the facts were positive, so too was the article.
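The obituary example can be sketched in a few lines of Python. Everything here is a hypothetical illustration, not a description of any vendor's method. The polarity values (+1 for a positive statement, -1 for a negative one) are hand-assigned, and the `lead_weight` of 10 is an arbitrary assumption. The first function counts facts the way the imagined computer does; the second shows how much the verdict depends on someone deciding how heavily the lead paragraph should count.

```python
def majority_label(polarities):
    """Label an article by the share of positive statements (naive fact counting)."""
    positive = sum(1 for p in polarities if p > 0)
    share = positive / len(polarities)
    label = "positive" if share > 0.5 else "negative"
    return label, share


def lead_weighted_label(polarities, lead_weight=10.0):
    """Same data, but the first statement (the lead) counts `lead_weight` times.

    The weight is an assumed, hand-picked value; a score of zero or below
    is labeled negative.
    """
    weights = [lead_weight] + [1.0] * (len(polarities) - 1)
    score = sum(w * p for w, p in zip(weights, polarities))
    return "positive" if score > 0 else "negative"


# Hypothetical obituary: a damaging lead followed by nine accomplishments.
obituary = [-1] + [+1] * 9

print(majority_label(obituary))       # naive count: 90% positive
print(lead_weighted_label(obituary))  # lead-weighted: the accusation dominates
```

The naive count calls the article positive, while the weighted version calls it negative. The weighted version only gets there because a person chose the weight, which is the human judgment the argument here says cannot be automated away.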

Bottom Line. Fundamentally, I believe the most successful approaches will be those that seek to facilitate the role of humans in aggregation and analysis. A role for experts continues to exist, even with the best computer algorithms currently available.