For the longest time, Google has expressed disdain for black-hat search engine optimization methods. These include keyword stuffing, redirects and all sorts of underhanded tactics. Recently, though, the spotlight has shifted toward so-called content farms, which are essentially websites with large collections of articles that are supposedly of low quality and designed specifically to monetize on-page advertisements.

Crowdsourcing Quality Control

Google has traditionally been against search engine spam, and has regularly fine-tuned its algorithm to prevent spammers from ranking high in SERPs. The mushrooming of content farms has probably resulted in headaches for Google engineers, given community clamor against low-quality, mass-produced content littering the top search results. The massive amount of fresh content and traffic on content farms makes them a formidable force, though. For instance, Demand Studios supposedly generates about 5,700 new articles per day. AOL produces 1,700 daily, and Yahoo! Associated Content, 1,500.

"As 'pure webspam' has decreased over time, attention has shifted instead to 'content farms,' which are sites with shallow or low-quality content. In 2010, we launched two major algorithmic changes focused on low-quality sites. Nonetheless, we hear the feedback from the web loud and clear: people are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content."

The search giant seems to have handed some of the responsibility to users by releasing a Chrome browser extension called Personal Blocklist. The extension lets a user identify domains to exclude from his or her search results. This is essentially a crowdsourced initiative to curb content farming: the lists of blocked sites are also sent back to Google, which says it will study them as a potential signal for fine-tuning the search algorithm.

"We've been exploring different algorithms to detect content farms, which are sites with shallow or low-quality content. One of the signals we're exploring is explicit feedback from users. To that end, today we're launching an early, experimental Chrome extension so people can block sites from their web search results. If installed, the extension also sends blocked site information to Google, and we will study the resulting feedback and explore using it as a potential ranking signal for our search results."
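To make the mechanism concrete, here is a minimal Python sketch of the two steps described above: hiding blocked domains from one user's results, and tallying blocks across users as a candidate signal. It is an illustration only, not Google's implementation. The function names (`domain_of`, `filter_results`, `aggregate_blocks`) and the example domains are hypothetical, and nothing in Google's statement says how the feedback would actually be weighed.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def filter_results(result_urls, blocked_domains):
    """Drop results whose domain is on this user's personal blocklist."""
    blocked = {d.lower() for d in blocked_domains}
    return [u for u in result_urls if domain_of(u) not in blocked]


def aggregate_blocks(user_blocklists):
    """Count how many users blocked each domain (hypothetical signal)."""
    counts = Counter()
    for blocklist in user_blocklists:
        # A set ensures each user counts at most once per domain.
        counts.update({d.lower() for d in blocklist})
    return counts
```

A caller would pass each user's result URLs and blocklist to `filter_results`, then feed all collected blocklists to `aggregate_blocks` and inspect the most frequently blocked domains with `counts.most_common()`.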

What Constitutes Content Farming, Anyway?

Now, the questions that everyone seems to be asking are how to define a content farm, and why content farms are bad for the Web. Online communities don't seem to agree on either question. Proponents of Google's move to minimize content farms' role in search results argue that these sites are spammy because of the low quality and lack of originality in the articles.

For instance, many articles on eHow, AnswerBag and Associated Content -- which have been identified as examples of content farms -- are made by writers paid a few dollars apiece, and are likely to be rewritten from other websites' content. Stories about the automated and mass-produced nature of content farms add to the explanation of why quality on these sites can be difficult to ensure.

On the other side of the coin are those who argue that there is no difference between a supposed content farm and a search-optimized website. Will this mean, then, that legitimate web development businesses that focus on search- and user-friendly webpages are being targeted for content farming? Perhaps the difference, in this case, is scale. Content farms optimize pages for search results on a massive scale, and it's not just one or two keywords that are being optimized for.

Demand-Driven vs. User-Generated

And then there's the argument that, if quality and originality are the basis of being classified as a content farm, websites that feature user-generated content will also be guilty of spammy tactics. Will this mean, then, that Wikipedia and even Google's own are perpetrators?

One good measure by which to define content farming is the motivation for generating content. It can be argued that content farms generate content (articles, videos, photos) based on demand. Most content farms use an algorithm for determining what people are searching for, and will come up with a list of titles to this effect. This way, the content being generated will be targeted toward addressing the demand. This is in contrast to an author creating content that he deems to be important and relevant.

Who Wins and Who Loses?

Content farming is big business. Demand Media, for example, has recently gone public, and is currently valued at US$1.5 billion. However, advertisements are a big part of a content farm's business model, and Google itself is a big stakeholder in that model. Many content farms run different advertising schemes, which include Google's own contextual AdSense. Some even feature premium AdSense listings, which might indicate a preferred status.

In a sense, then, Google is a big stakeholder in the content farming game. It stands to both gain and lose in its quest to clean up the search results of so-called fastfood-ized content. Its granting of premium AdSense status to some of these content farms could also mean that it considered the sites to be legitimate and important. Is this why Google is taking a hands-off approach, and instead asking the community for input on what constitutes spam? One might also wonder about the timing of it all.

In the end, it's all about the user experience. Some will attest to the usefulness of farmed-out content, particularly for very focused, niche content that no one else has the time and resources to produce. However, some would rather see clean search results, without spammy links and ad-riddled sites whose raison d'etre is to earn from clickthroughs. We can only suppose that, either way, Google wins.