
AJAX and SEO: What You Need To Know

Necessity is the mother of all inventions – Plato

I am a developer first and a marketing professional second. I care more about performance, decoupled code, and reusability than about whether Google will crawl my applications, and I understand APIs and scripting languages better than I can read Google Analytics traffic data. But after joining Rand Group I learned the importance of having search engine optimized websites. Concepts like SEO ranking and duplicate content penalties became considerations as important as page reflows and repaints. Some decisions are painful, and I need to step back when my programmer side threatens to win out over the project's goals. One of those decisions is whether I can use AJAX on a page, and whether it would hurt or help the page I'm building.

What is AJAX?

In comes AJAX, or Asynchronous JavaScript and XML: a web technology comprising multiple techniques that promised a better way of updating website content. What is AJAX exactly? To answer that, we first need to understand how a browser works.
[Diagram: how an AJAX request flows between the browser and the web server]

Basically, when you point your browser at a page via a URL, or Uniform Resource Locator, an HTTP request is sent through the ether to the web server hosting that page or website. The web server then, to put it simply, serves that resource (image, HTML, etc.) with its corresponding MIME type. The browser receives the response and loads the current view (the browser window) with that document, image, or whatever was requested. Before the advent of AJAX, this meant you lost the previous page you were viewing and the window was refreshed with the newly requested resource. That was enough 40 years ago in the days of ARPANET, when the internet was just a bunch of documents linked to other documents, but the netizen of today expects more dynamic and interactive websites, which means every site becomes increasingly heavy and clunky to load. It's a developer's job to make those pages more performant and more like desktop applications, and that meant the long-accepted browser behavior needed to change.

AJAX became the "invention" that arose from that need. Microsoft added a new object called XMLHttpRequest, accessible via JavaScript. With XMLHttpRequest, a web page's JavaScript can asynchronously (a fancy term meaning "running in the background") request an asset from the web server and receive the response as data that can be manipulated with JavaScript. The most common use is to insert dynamic markup into the DOM. To illustrate, imagine a page with a sidebar containing a ticker of the latest stock info. With AJAX, you only need to click a link or button that triggers a request to the stock exchange API and loads the data it returns, all without refreshing the page. This saves time, keeps the user engaged, and lessens the load on the web server.
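A rough sketch of that pattern looks like the following; the endpoint URL and element ids are hypothetical stand-ins, not a real stock exchange API:

```javascript
// Minimal XMLHttpRequest sketch: fetch ticker markup in the background
// and insert it into the page without a full refresh.
function loadTicker() {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/api/stocks/latest', true); // true = asynchronous
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      // Drop the returned markup into the sidebar without reloading the page
      document.getElementById('stock-ticker').innerHTML = xhr.responseText;
    }
  };
  xhr.send();
}

// Refresh the ticker when the user clicks a button, not on page load
document.getElementById('refresh-stocks').addEventListener('click', loadTicker);
```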

At this point, the advantages of AJAX should be a bit clearer: less load on the server, a better experience for the user, web pages that resemble desktop applications, and the flexibility to request any kind of resource (HTML, text files, images, etc.). Although quite useful, it isn't without its shortcomings. AJAX interferes with the browser's history, meaning that, unlike normally loaded sites, clicking the back or forward button will not show the different versions of the page. This leads to confusion and frustration for the user, not to mention the differences in implementation across browsers. AJAX is also more geared toward applications than documents. Applications care about the state of a view in order to accomplish a task; documents are "containers" of information. Extra effort is required to build out this difference in code and gain a uniform interface for issuing AJAX requests, which led to the popularity of a plethora of JavaScript frameworks and libraries like jQuery and AngularJS.

AJAX Broke The Internet

For a web professional, the greatest disadvantage of AJAX is that it's not an optimal solution for getting your page indexed and crawled by search engines. Search engines' automated modules, more affectionately known as bots, are smart applications that scour the internet, following one link after another to catalog it into documents of words and URIs. Bots have to be optimized to accomplish that task, so for the longest time all search bots rendered only the initial markup of a web document. Content loaded after the DOMContentLoaded event, such as the result of an AJAX request, was summarily ignored. So if you had a website dependent on AJAX, you were out of luck. As I mentioned earlier, AJAX is better suited to applications, where the state of the current view exists to indicate a task to be accomplished. That is quite incompatible with document-oriented bots.

Search engine bots are highly efficient parsers whose sole purpose is to read text and archive the URL pointing to each document by storing its content in a structured way. Simpler still, imagine a library of books, where each book is a website page and the library catalog matches each book not by general topic but by the content of the book, every word and sentence. To borrow a book, you pass the librarian a slip containing a passage from that book's catalog card: a search query. AJAX-generated data, on the other hand, is like a loose page that the librarian will only hand you once you reach the page before it. Because the catalog was built without that extra page, you can't use its contents on the slip you hand the librarian to locate the book. Like that torn page, AJAX content gives bots nothing they can use to identify your "book".

In 2009, Google, seeing the significance of AJAX to the future of the internet, proposed a set of practices for creating linkable states in a website. The idea was to enable the server to identify a crawler's request and serve a precompiled version, or snapshot, of the page's state. A state is what the page looks like after a change has occurred on the page, e.g. clicking "page 2" and loading that set of items using AJAX. The URL of a page state is identified using a special token called an escaped fragment.
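To make the scheme concrete, here is the kind of URL mapping it relied on (the domain and parameter value are made up for this illustration): a state reachable through a hash-bang ("#!") URL was requested by the crawler with the fragment moved into an _escaped_fragment_ query parameter, and the server was expected to answer that request with an HTML snapshot of the state.

```
"Pretty" URL the user sees after clicking page 2:
  http://www.example.com/stocks#!page=2

"Ugly" URL the crawler requests instead, expecting an HTML snapshot in response:
  http://www.example.com/stocks?_escaped_fragment_=page=2
```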

Developers the world over rejoiced at their newfound superpower. Google later announced that it no longer recommends the scheme, because its crawlers/bots can now understand and render pages like modern web browsers. There are caveats, though: pages have to be performant, and care has to be given to the features incorporated into a page, meaning no fancy bleeding-edge tech.

I consider this bit of news to be very promising indeed. To confirm Google’s claim I designed some experiments taking inspiration from klikki.

The first question I wanted to answer was whether content built via JavaScript is crawlable by Google's bots. The second was what kind of AJAX-loaded data is crawl-safe.

My Experiment

I created a page with an area where I load AJAX content. The first section is built from a snippet (.shtml) containing normal HTML markup. I added a unique string to the file so it would be easy to test from Google search.
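For reference, the wiring for this first section was roughly the following; the file name, element id, and selector are illustrative stand-ins rather than the exact ones I used:

```javascript
// Load a static HTML snippet (.shtml) into the first section via AJAX.
// The snippet contains a unique string so the page can later be queried in Google.
$(function () {
  $('#first-section').load('/snippets/first-section.shtml');
});
```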

[Image: the first AJAX-loaded section]

After adding this section, I went to Google Webmaster Tools and submitted the page for indexing. After a few minutes, I went to Google Search:

[Screenshot: site search for the first unique string]

So far so good. The answer to whether the content in my first test is crawlable is a resounding yes.

The second section is a schedule of links built from a JSON data source and the magic of jQuery DOM manipulation. As with the first section, I created a unique string, this time built from the concatenated categories.
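The second section was built along these lines; the JSON file, field names, and ids are illustrative assumptions, not the exact ones from my test page:

```javascript
// Build the second section's links from a JSON data source with jQuery.
$.getJSON('/data/schedule.json', function (data) {
  var $section = $('#second-section');
  var categories = [];

  $.each(data.items, function (i, item) {
    categories.push(item.category);
    // Append a schedule link for each item in the feed
    $('<a>', { href: item.url, text: item.title }).appendTo($section);
  });

  // The concatenated category names form the unique string queried in Google later.
  $section.append($('<p>').text(categories.join('')));
});
```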

[Image: the second AJAX section, built from JSON]

This time, I wanted Google to index the page naturally. I waited for an hour after resubmitting my sitemap:

[Screenshot: site search for the second unique string shortly after submission]

It was no surprise to learn the page had not yet been indexed. I decided to leave the experiment there and wait a few days. After five days, lo and behold, I was able to query the second unique string and pull up the host page:

[Screenshot: site search for the second unique string after the wait]

We have proven that both the HTML snippet and the JavaScript-generated markup are crawlable by Google.

To see whether the page would incur any penalties from Googlebot, I resubmitted the page for indexing and waited a few more days:

[Screenshot: site search results after a few more days]

I did not observe any preference for either type of markup. Paragraphs and anchor titles alike were indexed and searchable from Google.

At the same time I submitted my page to Google for indexing, I logged in to Bing and submitted the same page there. As of the writing of this article, I have not been able to use the AJAX content to pull up the page on Bing.

The experiment above demonstrates that Google's crawler can render and crawl JavaScript-generated content. What my experiment lacks is the indexing of states. A state, as I mentioned earlier, is a snapshot of the page after an action has occurred, and extra effort is required to let the browser know about the change.

Push State and History Object

HTML5 introduced the history.pushState() and history.replaceState() methods. These methods manipulate the history object, which the browser uses to keep track of the pages it has visited. Both take a state object, a title, and a URL as parameters. When pushState() is called, for example, a new entry is pushed onto the session history and the address bar is updated to the given URL, all without a page load. When the user later presses the back or forward button, a popstate event is triggered, and it is the page script's job to catch that event and restore the appropriate view. With this you can wire a page to have different "snapshots" and let the user navigate a page's session with the history buttons without leaving or refreshing the current page. The states of the page can then be exposed as URLs to be indexed, whether by links on the page or by entries in a sitemap. This development brought about a new type of application: the SPA, or Single Page Application.
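Here is a minimal sketch of that pattern, assuming a hypothetical loadPage() function that fetches and renders a set of results via AJAX:

```javascript
// Clicking "page 2" updates the content via AJAX and records a new history entry.
document.getElementById('page-2-link').addEventListener('click', function (e) {
  e.preventDefault();
  loadPage(2);                                    // hypothetical AJAX render function
  history.pushState({ page: 2 }, '', '?page=2');  // new state object and URL
});

// Pressing back/forward fires popstate; the script restores the matching snapshot.
window.addEventListener('popstate', function (e) {
  var page = (e.state && e.state.page) || 1;
  loadPage(page);
});
```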

[Image: single page web applications]

The topic of my next experiment will be Single Page Applications. So watch for that.

Conclusion

AJAX, and the methods developers have adopted to work around the browser's shortcomings, have advanced how we experience websites and thus how we expect websites to behave. AJAX is definitely a great tool in a web developer's belt, but like all technologies, care must be taken when including it in a project, especially when the project's goals include better searchability and efficient indexing.

Need help creating a better user experience? Contact Rand Group Digital for a marketing audit to review your current website and help identify areas for improvement.

– Software Delivered as Promised. No Surprises.

Print Friendly, PDF & Email

Insight written by Chris Bautista

Chris Bautista has over 10 years of programming experience in website development, database maintenance, and software solution design.

Ask Chris a Question or call (866) 714-8422