AJAX and SEO: What You Need To Know
Necessity is the mother of invention – Plato
I am a developer first and a marketing professional second. I care more about performance, decoupled code, and reusability than about whether Google will crawl my applications. I understand APIs and scripting languages better than I can read Google Analytics traffic data. But after joining Rand Group I learned the importance of search engine optimized websites. Concepts like SEO ranking and duplicate content penalties became considerations as important as page reflows and repaints. Some decisions are painful, and I need to step back when my programmer side is at odds with the project goals. One of those decisions is whether I can use AJAX on a page, and whether it would hurt or help the page I’m building.
What is AJAX?
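AJAX (Asynchronous JavaScript and XML) is a set of techniques that lets the browser request data in the background and update part of the current page without a full reload. Originally built on the XMLHttpRequest object, the same pattern is expressed today with fetch(). A minimal sketch, where the endpoint URL, element id, and helper name are all hypothetical:

```javascript
// Pure helper: turn fetched data into markup.
function renderHeadlines(items) {
  return items.map(item => `<li>${item}</li>`).join('');
}

// Browser-only wiring (guarded so the sketch also loads outside a browser).
if (typeof window !== 'undefined') {
  fetch('/api/headlines.json')       // hypothetical endpoint
    .then(res => res.json())         // parse the JSON payload
    .then(items => {
      // Update one region of the page in place: no navigation, no reload.
      document.getElementById('headlines').innerHTML = renderHeadlines(items);
    });
}
```

The property that matters for SEO is that this content arrives only after the initial markup has already been delivered.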
AJAX Broke The Internet
For a web professional, the greatest disadvantage of AJAX is that it’s not an optimal solution for getting your page indexed and crawled by search engines. Search engines’ automated modules, more affectionately known as bots, are smart applications that scour the internet, following one link after another to catalog the web into documents of words and URIs. Bots have to be optimized for that task, so for the longest time all search bots rendered only the initial markup of a page. Information arriving after the DOMContentLoaded event, like the result of an AJAX request, was summarily ignored. So if your website depended on AJAX, you were out of luck. As I mentioned earlier, AJAX is better suited to applications where the state of the current view indicates a task to be accomplished, which is quite incompatible with document-oriented bots.
Search engine bots are highly efficient parsers whose sole purpose is to read text and archive the URL pointing to each document by storing its content in a structured way. Simpler still, imagine a library of books, where each book is a website page and the library catalog matches each book not by general topic but by its content, every word and sentence. To borrow a book, you hand the librarian a slip quoting a passage from that book’s catalog card: a search query. AJAX-generated data, on the other hand, is like a loose page slipped into the book after the catalog was built. Because the catalog never included that extra page, you can’t quote its contents to the librarian to locate the book. Like a torn-out page, the information it holds will never lead bots to your “book”.
In 2009, Google, seeing the significance of AJAX to the future of the internet, proposed a set of practices for creating linkable states in a website. This works by enabling the server to identify a crawler’s request and serve a precompiled version, or snapshot, of the page’s state. A state is what the page looks like after a change on the page has occurred, e.g. clicking page 2 and loading that set of lines using AJAX. The URL of a page state is identified using a special token called an escaped fragment.
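The scheme mapped a “hash bang” URL (#!) to a query parameter, _escaped_fragment_, that the server could actually see. A sketch of the rewrite a crawler performed under that scheme (the function name is mine):

```javascript
// Rewrite example.com/page#!page=2 into
// example.com/page?_escaped_fragment_=page%3D2, the form a crawler
// requested so the server could return a snapshot of that state.
function toEscapedFragmentUrl(url) {
  const i = url.indexOf('#!');
  if (i === -1) return url;                    // no AJAX state in the URL
  const base = url.slice(0, i);
  const fragment = url.slice(i + 2);           // the state after "#!"
  const sep = base.includes('?') ? '&' : '?';  // respect an existing query string
  return base + sep + '_escaped_fragment_=' + encodeURIComponent(fragment);
}
```

The server, on seeing _escaped_fragment_ in the query string, would serve the prerendered snapshot instead of the empty AJAX shell.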
Developers the world over rejoiced at their newfound superpower. But in 2015 Google announced it was no longer recommending the scheme, because its crawlers and bots can now understand and render pages much like modern web browsers do. There are caveats, though: pages have to be performant, and care has to be given to the features incorporated into a page, meaning no fancy bleeding-edge tech.
I consider this bit of news to be very promising indeed. To confirm Google’s claim I designed some experiments taking inspiration from klikki.
I created a page with an area where I load AJAX content. First, a section is built from a snippet (.shtml) containing normal HTML markup. I added a unique string to the file, thinking it would make the page easier to find in Google search.
After adding this section, I went to Google Webmaster Tools and submitted the page for indexing. After a few minutes, I went to Google Search:
So far so good. The answer to whether my first test is crawlable is a resounding yes.
The second section is a schedule of links built from a JSON data source and the magic of jQuery DOM manipulation. As with the first section, I created a unique string, this time built from the concatenated categories.
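A minimal sketch of how that section could be generated; the data shape, element id, JSON path, and function names are my assumptions (the original used jQuery, plain functions are shown here):

```javascript
// Build anchor markup from a JSON-like schedule, plus a unique test string
// made from the concatenated categories (the string later queried on Google).
function buildSchedule(items) {
  const html = items
    .map(item => `<a href="${item.url}" title="${item.category}">${item.category}</a>`)
    .join('\n');
  const uniqueString = items.map(item => item.category).join('');
  return { html, uniqueString };
}

// Browser-only wiring (guarded); the JSON source path is hypothetical.
if (typeof document !== 'undefined') {
  fetch('/data/schedule.json')
    .then(res => res.json())
    .then(items => {
      document.getElementById('schedule').innerHTML = buildSchedule(items).html;
    });
}
```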
This time, I wanted Google to index the page naturally. I waited for an hour after resubmitting my sitemap:
It was not a surprise to learn the page had not yet been indexed. I decided to leave the experiment there and wait for a few days. After five days, lo and behold, I was successfully able to query the second unique string and pull up the host page:
We have proven that both HTML snippets and generated markup are crawlable by Google.
To see whether the page would incur penalties from Googlebot, I resubmitted it for indexing and waited a few more days:
I did not observe any preference for either type of markup. Paragraphs and anchor titles were indexed and searchable from Google.
At the same time I submitted my page to Google for indexing, I logged in to Bing and submitted the same page. As of this writing, I have not been able to use the AJAX content to target the page in Bing search.
Push State and History Object
HTML5 introduced the history.pushState() and history.replaceState() methods. These methods manipulate the browser’s history object, which the browser uses to refer to different pages. Both methods accept a URL as a parameter. Whenever pushState is called, for example, a new entry with the given URL is added to the session history without reloading the page. When the user later navigates back or forward, a popstate event fires, and it is the page’s scripting’s job to catch this event and take the appropriate action. With this you can wire a page to have different “snapshots”. You can now use the back button to navigate a page’s session without leaving or refreshing the current page. The states of the page can then be exposed as URLs to be indexed, whether by links on the page or by entries in a sitemap. This development brought about a new type of application: the SPA, or Single Page Application.
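A minimal sketch of wiring a page this way; the route names, element id, and render function are hypothetical:

```javascript
// Pure helper: derive a view name from a path.
function routeFromPath(path) {
  return path.replace(/^\//, '') || 'home';
}

// Browser-only wiring (guarded so the sketch also loads outside a browser).
if (typeof window !== 'undefined') {
  function render(route) {
    // Hypothetical view update for the current "snapshot" of the page.
    document.getElementById('app').textContent = 'Viewing: ' + route;
  }

  function navigate(path) {
    // Add a history entry for this state without reloading the page.
    history.pushState({ route: routeFromPath(path) }, '', path);
    render(routeFromPath(path));
  }

  // Back/forward buttons fire popstate; restore the matching state.
  window.addEventListener('popstate', event => {
    render(event.state ? event.state.route : routeFromPath(location.pathname));
  });
}
```

Each state’s URL can then be listed in a sitemap or linked on the page so crawlers can reach it directly.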
The topic of my next experiment will be Single Page Applications. So watch for that.
AJAX, and the methods developers have taken to work around browser shortcomings, have advanced how we experience websites and thus how we expect websites to behave. AJAX is definitely a great tool in a web developer’s belt. As with all technologies, care must be taken when including it in a project, especially if the project’s goals include better searchability and efficient indexing.
Need help creating a better user experience? Contact Rand Group Digital for a marketing audit to review your current website and help identify areas for improvement.
– Software Delivered as Promised. No Surprises.