AJAX and SEO: What you need to know

It is common for developers to think about engineering before marketing: performance, decoupled code, and reusability get more consideration than whether Google will crawl the application. A developer typically understands APIs and scripting languages better than Google Analytics traffic data. But developers should not dismiss the importance of search engine optimized websites. SEO ranking and duplicate content penalties deserve the same weight as page reflows and repaints. Some of these decisions can be painful, but you have to step back from your programmer side and weigh all of the project's goals. One such decision is whether to use AJAX on a page, and whether that would help or hurt the page being built.

What is AJAX?

In simple terms, AJAX, or Asynchronous JavaScript and XML, is a collection of web techniques that promise a better way of updating website content. What is AJAX exactly? To answer that, we first need to understand how a browser works.

How a browser works

Basically, when you point your browser at a URL, or Uniform Resource Locator, an HTTP request is sent across the network to the web server hosting that page. The web server then, to put it simply, serves that resource (an image, HTML, etc.) with its corresponding MIME type. The browser receives this response and loads the current view (the browser window) with that document or image. Before the advent of AJAX, this meant you lost the page you were viewing, and the window was refreshed with the newly requested resource.

That model was enough in the early days of the web, when the internet was little more than documents linked to other documents, but today's netizens expect more dynamic and interactive websites. As sites grow increasingly heavy and clunky to load, it is a developer's job to make those pages more performant and more like desktop applications. The long-accepted browser behavior needed to change, and AJAX was the "invention" that arose from that need.

Microsoft added a new object, XMLHttpRequest, accessible via JavaScript. With XMLHttpRequest, a web page's JavaScript can asynchronously (a fancy term meaning "running in the background") request an asset from the web server and receive the response as data that can be manipulated with JavaScript. The most common use is inserting dynamic markup into the DOM. To illustrate, imagine a page with a sidebar ticker showing the latest stock information. With AJAX, you only need to click a link or button that triggers a request to the stock exchange's API and loads the data it returns, all without refreshing the page. This saves time, keeps the user engaged, and lessens the load on the web server.
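A minimal sketch of that pattern with XMLHttpRequest; the endpoint, the response shape, and the `#ticker` element are hypothetical:

```javascript
// Turn an array of {symbol, price} objects into list-item markup.
// Kept separate from the request wiring so it is easy to reason about.
function renderTicker(quotes) {
  return quotes
    .map(function (q) { return '<li>' + q.symbol + ': ' + q.price.toFixed(2) + '</li>'; })
    .join('');
}

// Fetch the latest quotes in the background and update only the sidebar.
function loadTicker() {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/api/stocks/latest'); // hypothetical endpoint
  xhr.onload = function () {
    if (xhr.status === 200) {
      // Insert the generated markup without reloading the page.
      document.getElementById('ticker').innerHTML =
        renderTicker(JSON.parse(xhr.responseText));
    }
  };
  xhr.send(); // asynchronous: the rest of the page keeps running
}
```

Only the sidebar changes; the rest of the document, and the user's place in it, is untouched.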

At this point, the advantages of AJAX should be a bit clearer: less load on the server, a better experience for the user, web pages that resemble desktop applications, and the flexibility to fetch any resource (HTML, text files, images, etc.). Although quite useful, it isn't without its shortcomings. AJAX interferes with the browser's history: unlike normally loaded sites, clicking the back or forward button will not step through the different versions of the page. This leads to confusion and frustration on the part of the user, not to mention the differences in implementation across browsers. AJAX is also geared more toward applications than documents. Applications care about the state of a view in order to accomplish a task; documents are "containers" of information. Extra effort is required in code to bridge that difference.

AJAX and search engines

For a web professional, the greatest disadvantage of AJAX is that it is not an optimal way to get your page indexed and crawled by search engines. Search engines' automated modules, more affectionately known as bots, are smart applications that scour the internet, following one link after another to catalog it into documents of words and URIs. Bots have to be optimized for that task, so for the longest time, search bots rendered only the initial markup of a website document. Anything arriving after the DOMContentLoaded event, like the result of an AJAX request, was summarily ignored. So if you had a website dependent on AJAX, you were out of luck. As mentioned earlier, AJAX is better suited to applications, where the state of the current view represents a task to be accomplished, which is quite incompatible with document-oriented bots.

Search engine bots are highly efficient parsers whose sole purpose is to read text and archive the URL pointing to each document by storing its content in a structured way. Simpler still, imagine a library of books where each book is a website page, and the library catalog matches each book not by general topic but by its content, every word and sentence. To borrow a book, you hand the librarian a slip with a passage from that book's catalog card: a search query. AJAX-generated data, on the other hand, is like a loose page that you can only get from the librarian once you have reached the page before it. Because the catalog was built without that extra page, you cannot quote its contents to the librarian to locate the book and borrow it. Like a torn-out page, bots will not be able to use the information on that extra page to identify your "book".

In 2009, Google, seeing the significance of AJAX to the future of the internet, proposed a set of practices, the AJAX crawling scheme, for creating linkable states in a website. The idea was to let the server identify a crawler's request and serve a precompiled version, or snapshot, of the page's state. A state is what the page looks like after a change on the page has occurred, e.g. clicking "page 2" and loading that set of rows using AJAX. The URL of a page state was marked with a special "#!" (hashbang) fragment, which crawlers rewrote into an _escaped_fragment_ query parameter when requesting the snapshot.
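As a concrete illustration of that (now-deprecated) scheme, the rewrite from a hashbang URL to the form a crawler would actually request looked roughly like this; the example URL is made up:

```javascript
// Map a "pretty" hashbang URL to its ?_escaped_fragment_= equivalent,
// the form Google's 2009 AJAX crawling scheme asked servers to answer.
function toEscapedFragment(url) {
  var i = url.indexOf('#!');
  if (i === -1) return url; // not a hashbang URL; nothing to rewrite
  var base = url.slice(0, i);
  var fragment = url.slice(i + 2);
  // Append with '?' or '&' depending on whether a query string exists.
  var sep = base.indexOf('?') === -1 ? '?' : '&';
  return base + sep + '_escaped_fragment_=' + encodeURIComponent(fragment);
}

// toEscapedFragment('http://example.com/page#!state=2')
// → 'http://example.com/page?_escaped_fragment_=state%3D2'
```

The server was expected to return the prerendered snapshot of that state when it saw the _escaped_fragment_ parameter.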

Developers the world over rejoiced in their newfound superpower. But Google later announced that it was no longer recommending the scheme, because its crawlers/bots can now understand and render pages much like modern web browsers. There are caveats to this, though: pages have to be performant, and care has to be given to the features incorporated into a page, meaning no fancy bleeding-edge tech.

This bit of news was very promising indeed. To verify Google's claim, Rand Group designed some experiments, taking inspiration from klikki.

The first question we wanted to answer was whether content built via JavaScript is crawlable by Google's bots. The second was what kind of AJAX-delivered data is crawl-safe.

The experiment

A page was created with an area where AJAX content would be added. First, a section was built from a snippet (.shtml) containing normal HTML markup. Then a unique string was added to the file, the idea being that this would make it easy to test for in Google Search.

AJAX experiment

After adding this section, we went to Google Webmaster Tools and submitted the page for indexing. After a few minutes, we tested the results in Google Search:

Webmaster tools

So far so good. The answer to whether our first test content is crawlable is a resounding yes.

The second section was a schedule of links built from a JSON data source and the magic of jQuery DOM manipulation. As with the first section, we created a unique string, this time built by concatenating the categories.
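A minimal sketch of how such a section could be generated; the field names (url, category, title) and the #schedule container are assumptions for illustration, not the markup we actually used:

```javascript
// Build anchor tags from an array of schedule items loaded as JSON.
// Each item is assumed to look like {url, category, title}.
function buildScheduleMarkup(items) {
  return items
    .map(function (it) {
      return '<a href="' + it.url + '" title="' + it.category + '">' + it.title + '</a>';
    })
    .join('\n');
}

// In the page itself, jQuery would inject the result into the DOM, e.g.:
// $('#schedule').html(buildScheduleMarkup(scheduleJson));
```

The unique test string can be produced the same way, by concatenating the category fields of the loaded JSON.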

Concatenated categories

This time, we wanted Google to index the page naturally, so we resubmitted our sitemap and waited an hour:

Google index

It was no surprise to learn the page had not yet been indexed. We decided to leave the experiment there and wait a few days. After five days, lo and behold, we were able to query the second unique string and pull up the host page:

Google results

We have proven that both the HTML snippet and the JavaScript-generated markup are crawlable by Google.

To see whether the page would incur penalties from Googlebot, we resubmitted it for indexing and waited a few more days:

Google penalties

We did not observe any preference for either type of markup. Paragraphs and anchor titles alike were indexed and searchable on Google.

At the same time we submitted the page to Google for indexing, we logged in to Bing and submitted the same page. As of this writing, we have not been able to use the AJAX content to find the page on Bing.

The experiment above demonstrates that Google's crawler can render and crawl JavaScript-generated content. What our experiment lacked was the indexing of states. A state, as we mentioned earlier, is a snapshot of the page after an action has occurred, and extra effort is required to let the browser know about the change.

pushState and history object

HTML5 introduced the history.pushState() and history.replaceState() methods. These methods manipulate the history object, which the browser uses to keep track of the pages in a session. Both accept a state object, a title, and a URL as parameters. Calling pushState(), for example, adds a new entry to the session history under the given URL without reloading the page. When the user later navigates those entries with the back or forward button, the browser fires a popstate event, and it is the page script's job to catch that event and restore the appropriate view. With this you can wire a page to have different "snapshots" and use the history buttons to navigate a page's session without escaping or refreshing the current page. Each state's URL can then be exposed for indexing, whether through links on the page or entries in a sitemap. This development brought about a new type of application: the SPA, or Single Page Application.
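A minimal sketch of that wiring, assuming a paginated view addressed by a hypothetical ?page=N URL scheme; loadPageContent is a stand-in for whatever AJAX loader the page uses:

```javascript
// Build the crawlable URL for a given page state (hypothetical scheme).
function pageUrl(n) {
  return '?page=' + n;
}

// Called when the user clicks "page N" in the UI (browser-only).
function goToPage(n) {
  // Record a new session-history entry for this state without reloading.
  history.pushState({ page: n }, '', pageUrl(n));
  loadPageContent(n); // hypothetical AJAX loader for that state
}

// pushState() itself fires no event; popstate fires when the user moves
// through the recorded entries with the back/forward buttons.
function initHistory() {
  window.addEventListener('popstate', function (event) {
    if (event.state) {
      loadPageContent(event.state.page);
    }
  });
}
```

Because each state now has its own URL, those URLs can be linked on the page or listed in a sitemap for crawlers to follow.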

Conclusion

AJAX, and the methods developers have taken to work around the browser's shortcomings, have advanced how we experience websites and thus how we expect websites to behave. AJAX is definitely a great tool in a web developer's belt. Like any technology, care must be taken when including it in a project, especially when the project's goals include better searchability and efficient indexing.

Need help with creating a better user experience? Contact Rand Group for a marketing audit to review your current website and help identify areas for improvement.
