Google Changes How it Handles Paid Content

December 2, 2009

Google has made a change to the way it treats its "first click free" option for publishers. The option was designed for legitimate publishers to get around Google's cloaking policies, which discourage the showing of one web page to a crawler while the user sees something different.

With the policy, Google users have been able to access one article from a publication that has a paywall in place, but are then unable to access other content via links on the site without registering. However, users have been able to get around this in the past, simply by searching for the desired piece of content and starting over from Google.

Now Google has implemented a change that will only allow users to view five pages of content from such a source in a 24-hour period. In a post today on the Google News Blog, Senior Business Product Manager Josh Cohen explains, "If you're a Google user, this means that you may start to see a registration page after you've clicked through to more than five articles on the website of a publisher using First Click Free in a day. We think this approach still protects the typical user from cloaking, while allowing publishers to focus on potential subscribers who are accessing a lot of their content on a regular basis."
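As a rough illustration of the limit Cohen describes, here is a minimal sketch of how a publisher might count Google-referred article views per visitor. The class name, the visitor ID, and the in-memory storage are all assumptions made for the example; neither Google nor the article specifies how publishers should implement the counter.

```python
import time
from collections import defaultdict, deque

FREE_CLICKS_PER_DAY = 5          # the limit quoted in the article
WINDOW_SECONDS = 24 * 60 * 60    # the 24-hour period


class FirstClickFreeMeter:
    """Hypothetical helper: counts free article views per visitor."""

    def __init__(self, limit=FREE_CLICKS_PER_DAY, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self._views = defaultdict(deque)  # visitor_id -> timestamps of free views

    def allow(self, visitor_id, now=None):
        """Return True to show the full article, False to show registration."""
        now = time.time() if now is None else now
        views = self._views[visitor_id]
        # Forget views that have aged out of the 24-hour window.
        while views and now - views[0] >= self.window:
            views.popleft()
        if len(views) < self.limit:
            views.append(now)
            return True
        return False
```

With this sketch, the first five calls to allow() for a visitor return True and the sixth returns False until the oldest view is more than 24 hours old.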


"In addition to First Click Free, we offer another solution: We will crawl, index and treat as 'free' any preview pages - generally the headline and first few paragraphs of a story - that they make available to us," Cohen notes. "This means that our crawlers see the exact same content that will be shown for free to a user. Because the preview page is identical for both users and the crawlers, it's not cloaking."

Google would label stories like this as "subscription" when indexed in Google News. According to Cohen, they would rank based on the same criteria as other sites (paid or free).

He points out that paid content may not rank as well, simply because of the popularity of the content. Fewer people are likely to link to content that requires a subscription to read, particularly if a similar piece of content is available for free. Google has always favored links, and it would be no different in this case.



Ensuring Your Site is Indexed in Google’s Mobile Search

November 24, 2009

In this day and age, you pretty much can't ignore mobile users. The rate at which consumers are accessing the web via mobile devices is growing rapidly, largely thanks to the increasing popularity and production of smartphones.

Just having a mobile site isn't enough. Sure, it's a great start, but you have to start thinking about a mobile site just as you would a regular site. Can people find it? Just because you have a good ranking in Google does not mean that your mobile site has a good ranking in Google's mobile search engine, or is even indexed at all.

Google recently shared a few important tips for making sure your mobile site is being indexed in Google's Mobile Search.

1. Create a mobile sitemap and submit it to Google so Google knows it exists. This can be done using Google Webmaster Tools, just like with a regular sitemap.

2. To make sure Googlebot-Mobile can access your site, allow any User-agent to access it.

"You should also be aware that Google may change its User-agent information at any time without notice, so it is not recommended that you check if the User-agent exactly matches 'Googlebot-Mobile' (which is the string used at present)," says Jun Mukai, a software engineer on Google's mobile search team. "Instead, check whether the User-agent header contains the string 'Googlebot-Mobile'. You can also use DNS Lookups to verify Googlebot."

3. Check that your mobile-friendly URLs' DTD (Doc Type Definition) declaration is in an appropriate mobile format such as XHTML Mobile or Compact HTML.
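Mukai's advice in tip 2 can be sketched roughly as follows: match on a substring rather than the exact User-agent, and confirm the crawler with DNS lookups. The function names are made up for the example, and the accepted host suffixes (googlebot.com and google.com) are an assumption based on Google's general crawler-verification guidance, not something stated in the quote.

```python
import socket


def looks_like_googlebot_mobile(user_agent):
    """Substring check, so extra version details in the header don't break it."""
    return "Googlebot-Mobile" in (user_agent or "")


def verify_googlebot_ip(ip, resolver=socket):
    """Reverse lookup, domain check, then forward lookup to confirm the IP.

    `resolver` only needs gethostbyaddr and gethostbyname_ex; it defaults to
    the standard socket module and can be swapped out for testing.
    """
    try:
        host = resolver.gethostbyaddr(ip)[0]
    except OSError:
        return False
    # Assumed suffixes; adjust if Google's published guidance differs.
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in resolver.gethostbyname_ex(host)[2]
    except OSError:
        return False
```

The User-agent check alone is easy to spoof, which is why the DNS step is worth adding before you treat a request as coming from Google.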

If you run both a regular site and a mobile version of it, there is a possibility that the wrong version will show up in the wrong search results. There are ways you can prevent this.


"When a mobile user or crawler (like Googlebot-Mobile) accesses the desktop version of a URL, you can redirect them to the corresponding mobile version of the same page," explains Mukai. "Google notices the relationship between the two versions of the URL and displays the standard version for searches from desktops and the mobile version for mobile searches."

If you do use a redirect, make sure the content on the corresponding mobile URL matches the desktop page as closely as possible. Some sites abuse redirects to try to boost their rankings. Google says this should be avoided at all costs, so you can probably expect to be penalized for it.
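A redirect like the one Mukai describes might look something like this. It is only a sketch: the host names (www.example.com and m.example.com) and the list of User-agent hints are placeholders, and a real site would plug the result into its web server or framework.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative hints only; a real list would be longer and maintained.
MOBILE_UA_HINTS = ("Googlebot-Mobile", "iPhone", "Android", "BlackBerry")
DESKTOP_HOST = "www.example.com"   # placeholder desktop host
MOBILE_HOST = "m.example.com"      # placeholder mobile host


def is_mobile(user_agent):
    """True if the User-agent contains any of the mobile hints."""
    ua = user_agent or ""
    return any(hint in ua for hint in MOBILE_UA_HINTS)


def mobile_redirect_target(url, user_agent):
    """Return the matching mobile URL, or None if no redirect applies.

    Path and query string are kept unchanged so the mobile page
    corresponds to the same content as the desktop page.
    """
    parts = urlsplit(url)
    if parts.netloc != DESKTOP_HOST or not is_mobile(user_agent):
        return None
    return urlunsplit(parts._replace(netloc=MOBILE_HOST))
```

Keeping the path and query identical is the simplest way to make sure each desktop URL redirects to its own mobile counterpart rather than to a generic mobile home page.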

Another way you can make sure a user is pointed to the right version of your site is simply to provide a link. In fact, that is what Google itself does. If you access the mobile version of Google, you will find a link to the desktop version.

Yet another way is to switch content based on the User-agent, so mobile users automatically see the mobile version and desktop users see the desktop version, even though both are accessing the same URL.

Google warns, however, that if you use this method and fail to configure your site correctly, it could be mistaken for cloaking, for which you can be penalized.

"To remain within our guidelines, you should serve the same content to Googlebot as a typical desktop user would see, and the same content to Googlebot-Mobile as you would to the browser on a typical mobile device," says Mukai. "It's fine if the contents for Googlebot are different from the one for Googlebot-Mobile."
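One way to picture the rule Mukai lays out is a single function that picks the page variant for both crawlers and people, so that each crawler gets exactly what the matching human visitor would get. The function name and the list of device hints below are assumptions for the sake of the example.

```python
# Illustrative device hints; not an exhaustive list.
MOBILE_DEVICE_HINTS = ("iPhone", "Android", "BlackBerry")


def select_variant(user_agent):
    """Return "mobile" or "desktop" for a request's User-agent header."""
    ua = user_agent or ""
    # Test the mobile crawler first: "Googlebot" is a substring of
    # "Googlebot-Mobile", so the order of these checks matters.
    if "Googlebot-Mobile" in ua:
        return "mobile"
    if "Googlebot" in ua:
        return "desktop"
    if any(hint in ua for hint in MOBILE_DEVICE_HINTS):
        return "mobile"
    return "desktop"
```

Because crawlers and users go through the same code path, there is no separate "crawler-only" version of a page, which is the situation that tends to get mistaken for cloaking.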

Have you taken the necessary steps to ensure you are being indexed in Google's mobile search engine? Have you been left out due to cloaking-related confusion?



Tips for Getting Crawled Faster by Google

August 11, 2009

Probably the most important step in getting your site found in a search engine is the one in which the search engine crawls it. There are things that can be done and things that can be avoided to make this process as painless as possible for the search engine, which will, in turn, make it as painless as possible for the webmaster.

Since Google dominates the search market by such a large margin, it is always a good idea to listen to what they have to say about such matters. So when they post a presentation with tips on optimizing crawling and indexing, you'll probably want to pay attention.

Google has done just that, highlighting things to stay away from and things you can do to enhance your site's crawlability, with specific examples of URLs.

"The Internet is a big place; new content is being created all the time," says Google Webmaster Trends Analyst Susan Moskwa. "Google has a finite number of resources, so when faced with the nearly-infinite quantity of content that's available online, Googlebot is only able to find and crawl a percentage of that content. Then, of the content we've crawled, we're only able to index a portion."

"URLs are like the bridges between your website and a search engine's crawler: crawlers need to be able to find and cross those bridges (i.e., find and crawl your URLs) in order to get to your site's content," continues Moskwa. "If your URLs are complicated or redundant, crawlers are going to spend time tracing and retracing their steps; if your URLs are organized and lead directly to distinct content, crawlers can spend their time accessing your content rather than crawling through empty pages, or crawling the same content over and over via different URLs."

If you want to get crawled faster by Google, you should remove user-specific details from URLs. Google's presentation covers the specifics. Basically, URL parameters that don't change the content of the page should be removed and put into a cookie. This will reduce the number of URLs that point to the same content, and speed up crawling.
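Here is a small sketch of what stripping user-specific parameters could look like. The parameter names (sessionid, sid, ref) are hypothetical; which parameters are safe to drop depends entirely on your own site, and the values you remove would need to be stored in a cookie instead.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that do not change the page content.
USER_SPECIFIC_PARAMS = {"sessionid", "sid", "ref"}


def crawl_friendly_url(url):
    """Drop user-specific query parameters and keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in USER_SPECIFIC_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, http://www.example.com/product?id=42&sessionid=abc123 becomes http://www.example.com/product?id=42, so every visitor's session points at the same URL for the same content.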

Google says infinite spaces are a waste of time and bandwidth for all, which is why you should consider taking action when you have calendars that link to infinite numbers of past/future dates with unique URLs, or other paginated data.

Tell Google to ignore pages it can't crawl. This includes things like log-in pages, contact forms, shopping carts, and other pages that require users to perform actions that crawlers can't perform themselves. You can do this with the robots.txt file.
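To show what such robots.txt rules might look like, and to check them before publishing, here is a sketch using Python's standard urllib.robotparser module. The paths (/login, /cart, /contact) are placeholders for whatever pages on your site require user actions.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /cart
Disallow: /contact
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="Googlebot"):
    """True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)
```

Running a few of your real URLs through a check like this is a cheap way to make sure you have not blocked pages you actually want crawled.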

Finally, avoid duplicate content when possible. Google likes to have one URL for each piece of content. They do recognize that this is not always possible though (because of content management systems and what have you), which is why the canonical link element exists to let you specify the preferred URL for a particular piece of content.
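The canonical link element is a single tag placed in the head of each duplicate page. As a minimal sketch, a template helper could generate it like this; the function name and the example URL are made up for illustration.

```python
from html import escape


def canonical_link(preferred_url):
    """Build the canonical link element for a page's <head> section."""
    return '<link rel="canonical" href="%s" />' % escape(preferred_url, quote=True)
```

Every variant of a page (sorted, filtered, tracked and so on) would emit the same tag pointing at the one URL you want Google to treat as the preferred version.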