Solved - Does Google Crawl Angularjs website pages? If not is there any solution?

23 replies
Hey everyone, I have been working on this AngularJS website, http://www.crushprice.com, for 3 months, and I have noticed that it is not getting crawled by spiders.

How can I overcome this situation? Thank you.
#angularjs #crawl #google #pages #solution #website
  • Profile picture of the author pauloadaoag
    Administrator
    Some suggestions after looking at your site:
    - your sitemap.xml doesn't contain dates for when pages get updated
    - you load *ALL* your content via ajax. Your page w/o javascript is *completely* empty

    Have you looked into doing even *some* server side rendering?
    You may also want to check this out: Does google execute javascript?

    Update: From Google (ca. 2015): Deprecating our AJAX crawling scheme
    • Profile picture of the author Rajender Smts
      Hey, thanks for the answer. How do I update the dates in the sitemap? Can you help me with this? Our team is trying to rectify these JavaScript issues.

      Thank you
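On the question of dates in the sitemap: the sitemap protocol allows an optional `<lastmod>` element per URL, written as a W3C date such as `YYYY-MM-DD`. Below is a minimal sketch of generating such entries, not the site's actual tooling. The function name `build_sitemap`, the example.com URLs, and the dates are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # <lastmod> tells crawlers when the page last changed.
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages and dates, for illustration only.
xml_text = build_sitemap([
    ("http://www.example.com/", date(2017, 8, 1)),
    ("http://www.example.com/mobiles", date(2017, 8, 3)),
])
print(xml_text)
```

In practice the dates would come from wherever the site records content changes (a database column, for instance), and the file would be regenerated whenever pages are updated.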
      • Profile picture of the author Mike Anthony
        Originally Posted by Rajender Smts View Post

        Hey, thanks for the answer. How do I update the dates in the sitemap? Can you help me with this? Our team is trying to rectify these JavaScript issues.

        Thank you
        Can't you use your backend to access the API and then serve it to your frontend? That would solve that issue. Personally, for SEO I hate the all-JS approach. Yes, Google gets better each year at reading JS, but it's not 100%, and using JS to show all content isn't even a good use of JS.
  • Profile picture of the author pauloadaoag
    Administrator
    AH I FIGURED IT OUT.

    Your Angular app fetches content from api.crushprice.com to render the items on your store.


    Your robots.txt however contains this.


    Your robots.txt blocks the resources that google needs to render your page correctly!

    I suggest removing api.crushprice.com from your robots.txt, adding the date updated to your sitemap entries, and then resubmitting the sitemap to Google Webmaster Tools.

    If it works, please keep us updated.
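A quick way to check whether a given robots.txt would stop Googlebot from fetching an API URL is Python's standard `urllib.robotparser`. This is only a sketch: the actual robots.txt rules are not reproduced in this thread, so the rule sets and the api.example.com URL below are hypothetical stand-ins.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rule sets; the site's real robots.txt is not shown here.
BLOCKING_RULES = "User-agent: *\nDisallow: /\n"
OPEN_RULES = "User-agent: *\nDisallow:\n"

API_URL = "https://api.example.com/products"  # placeholder URL

print(can_crawl(BLOCKING_RULES, "Googlebot", API_URL))
print(can_crawl(OPEN_RULES, "Googlebot", API_URL))
```

If the first call reports that the URL is blocked, a crawler that tries to render the page cannot load the data the Angular app depends on.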
    • Profile picture of the author Rajender Smts
      Thank you for the amazing response. I don't want to remove api.crushprice.com, as it contains site security files. I will try to remove it, but I need to discuss this with my team.
      Once again, thank you.
  • As far as I know, resubmitting a sitemap will trigger a crawl of your site.
    To submit your sitemap:
    1. Select your site on your Google Search Console home page.
    2. Click Crawl.
    3. Click Sitemaps.
    4. Click ADD/TEST SITEMAP.
    5. Type sitemap.xml.
    6. Click Submit Sitemap.
  • Profile picture of the author yukon
    Banned
    Originally Posted by Rajender Smts View Post

    Hey everyone, I have been working on this AngularJS website, hxxp://www.crushprice.com, for 3 months, and I have noticed that it is not getting crawled by spiders.

    How can I overcome this situation? Thank you.



    There's 4,720 pages indexed, what's the problem?


    If the on-page text is in javascript (I didn't look) you're pretty much hiding the content on purpose, because the Google algo is based on plain text. Google can crawl some basic javascript, but you should never depend on it for showing up in the SERPs.

    As a matter of fact, I've used javascript for years to hide content from Google, stuff like text/links I didn't want skewing my optimized plain text/links.
    • Profile picture of the author Rajender Smts
      Hey yukon,
      We are using server-side rendering tools like Prerender.io, which is why our pages are indexed, but we are having issues with keyword indexing because the whole page is not getting crawled, along with some other on-page issues.
      • Profile picture of the author yukon
        Banned
        Originally Posted by Rajender Smts View Post

        ...we are having issues with indexing of keywords as the whole page is not getting crawled and some more OnPage issues as well.


        That's because you're displaying content browser side. The Google algo isn't a browser like what we use, their algo is a text scraper. Google scrapes plain text from the HTML source code.

        If you hide that text (ex: javascript), well, you're gambling on purpose when it comes to Google scraping on-page text while indexing the webpage/s.

        If the content is important for SEO you don't have any other option than using plain text in the HTML.

        Look at the Google cache (text version), that will show you what Google is parsing.

        Show me a link to one of your problem pages that's indexed, I'll show you how Google is looking at the page (text version).
        • Profile picture of the author Rajender Smts
          Thanks for the response,

          Here is one of the URLs. I have checked some search bot simulators, but none of the results were useful. Please go ahead with this one: http://www.crushprice.com/all-mobile...es-with-prices
          • Profile picture of the author yukon
            Banned
            Originally Posted by Rajender Smts View Post

            Thanks for the response,

            Here is one of the URLs. I have checked some search bot simulators, but none of the results were useful. Please go ahead with this one: http://www.crushprice.com/all-mobile...es-with-prices



            That URL isn't indexed, I meant a URL that's indexed but you thought it wasn't showing all the content.

            You can still do a Google cache (plain text) simulation with a web/dev browser plugin, even for offline content. Use these plugin settings below:
            1. Disable > Disable Javascript > All Javascript
            2. CSS > Disable Styles > All Styles
            3. Images > Disable Images > All Images

            I went ahead and did a cache (text version) simulation, this is extremely close to what Google will display once the page is indexed. As you can see, your page might as well be a blank page because there's nothing as far as SEO.
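The "no JavaScript, no CSS, no images" view described above can be roughly approximated offline by pulling only the text out of the raw HTML source without executing any scripts. Below is a minimal sketch using Python's standard `html.parser`. The two HTML snippets are hypothetical examples, not the actual markup of the site discussed here, and this is not a claim about how Google's indexer works internally.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the text a non-rendering scraper would see in the HTML source."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical client-side shell: all content arrives later via JavaScript.
SHELL = '<html><body><div ng-app="store"><div ng-view></div></div><script src="app.js"></script></body></html>'
# Hypothetical server-rendered page: content is already in the HTML.
RENDERED = "<html><body><h1>Phone X</h1><p>Price list</p></body></html>"

print(repr(visible_text(SHELL)))
print(repr(visible_text(RENDERED)))
```

Running the same function on a page's "view source" output gives a quick sense of how much content is present before any script runs.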

            • Profile picture of the author Rajender Smts
              Thank you Yukon,
              Is there any way to overcome this with the AngularJS project? My management cannot afford to build a new one. How can I get my keywords indexed? It would be great to resolve this issue.
              • Profile picture of the author yukon
                Banned
                Originally Posted by Rajender Smts View Post

                Thank you Yukon,
                Is there any way to overcome this with the AngularJS project? My management cannot afford to build a new one. How can I get my keywords indexed? It would be great to resolve this issue.


                No, as you can see with the screenshot I posted earlier you might as well be trying to rank a blank page because Google will ignore the javascript content.

                Matter of fact, again, looking at that screenshot you're doing it completely backwards. You should be trying to hide all that fluff (social links, subscribe text, invalid credential errors, etc...) and focus on showing Google the real content of the product you're trying to sell.

                The only solution is to include the content as plain text on the webpage.
              • Profile picture of the author pauloadaoag
                Administrator
                Unblock the api endpoint from google.

                While zero javascript needed to render a page is optimal, there is evidence that google *does* run javascript to render a page. You just have to give it the resources to do so. You mentioned some security credentials being in your endpoint, is it possible for you to just remove them?
                Does google execute javascript
                • Profile picture of the author yukon
                  Banned
                  Originally Posted by pauloadaoag View Post

                  Unblock the api endpoint from google.

                  While zero javascript needed to render a page is optimal, there is evidence that google *does* run javascript to render a page. You just have to give it the resources to do so. You mentioned some security credentials being in your endpoint, is it possible for you to just remove them?
                  Does google execute javascript




                  There's also evidence that some people have won big money in Las Vegas casinos.

                  You willing to bet the farm (all search traffic) on javascript or go for the guarantee (plain text)?

                  Suggesting to use javascript for SEO is bad advice.
                  • Profile picture of the author pauloadaoag
                    Administrator
                    I agree, but I think in this case it would not be trivial for the OP to re-engineer his site from scratch.
                    • Profile picture of the author yukon
                      Banned
                      Originally Posted by pauloadaoag View Post

                      I agree, but I think in this case it would not be trivial for the OP to re-engineer his site from scratch.

                      Trivial?

                      You can't possibly be serious.

                      Look at that screenshot above, he's got nothing. As long as he's playing games (javascript) he'll continue to have nothing as far as onpage SEO.
                      • Profile picture of the author pauloadaoag
                        Administrator
                        Yep, I agree. I do think my recommendation (just unblock the API URL) will help somewhat, and it's just a single-line change rather than overhauling his entire architecture.

                        Checking this even shows Google trying to fetch his own data, but failing due to the restrictions.
  • Profile picture of the author Rajender Smts
    Hey Warriors,

    Thank you for the amazing information. I have removed api.crushprice.com from robots.txt and am using prerender software to render all the content for spiders and crawlers.
    Thank you
  • Profile picture of the author Justinspencer
    Hi,
    you can resubmit the sitemap.
  • Profile picture of the author pauloadaoag
    Administrator
    Rajender Smts, any updates? I see that your site now has 11k pages indexed in Google.
    • Profile picture of the author Rajender Smts
      Thank you for the message. Yes, we now have 11k pages on Google. We are using the prerender.io tool to render content and, as you all suggested, I have removed api.crushprice.com from robots.txt and made it available for robots to access.
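The general idea behind a prerendering setup is to hand crawlers a fully rendered HTML snapshot while regular visitors still get the JavaScript app. The sketch below shows only that routing decision. It is not Prerender.io's actual implementation or API, and the token list and function names are hypothetical and far from exhaustive.

```python
# Illustrative substrings found in some crawler User-Agent headers.
# This list is an assumption for the sketch, not a complete registry.
CRAWLER_TOKENS = ("googlebot", "bingbot")


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header looks like a known crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_response(user_agent: str, app_shell: str, snapshot: str) -> str:
    """Serve the prerendered snapshot to crawlers, the JS app shell otherwise."""
    return snapshot if is_crawler(user_agent) else app_shell


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/60.0 Safari/537.36"

print(choose_response(googlebot_ua, "<div ng-view></div>", "<h1>Rendered page</h1>"))
print(choose_response(browser_ua, "<div ng-view></div>", "<h1>Rendered page</h1>"))
```

The snapshot should carry the same content a visitor ends up seeing, otherwise the setup drifts toward the "hiding content" problem discussed earlier in the thread.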
  • Profile picture of the author markowe
    Thought I'd add some observations about Google crawling Ajax-rendered content that I have seen (disclaimer - this site is my first Ajax-heavy site and still has not been indexed despite upwards of 5000 pages a day being crawled, I am just saying what I have seen):

    1) Google definitely SOMETIMES executes arbitrary JS - I have a custom-made site stats routine that is triggered asynchronously via Ajax code on the page (the Ajax basically fetches a PHP file which doesn't actually return anything, just logs the visit in the database). The GoogleBot triggers this code maybe once every 30 visits (comparing the number of logged visits with actual visits from my Apache access_log), but evidently recognises it as something not worth crawling regularly. The asynchronous stuff on my page is deferred until the static content has fully rendered, so the bot may be executing the deferral routine, and not just the Ajax call.

    2) Google definitely fetches JSON data that is loaded via Ajax in order to fully render a page, and it does so ALMOST as frequently as it fetches the static page that the Ajax was called from (again, I can see this from my Apache logs). What is weird is that at times (more rarely) it will fetch the Ajax content within seconds of crawling the static page, while at other times (more commonly) it will fetch the JSON content completely independently (at a different time). E.g. like this:

    Code:
    14:39:01	66.249.79.135	GET	/wp-json/rest/v1/sitedata/?data=12345678910 [crawl of JSON data usually fetched via Ajax - the JSON contains fully-rendered html]
    12:25:52	66.249.79.135	GET	/data-search/?data=12345678910 [static page crawl]
    It seems like it has a pretty robust approach to crawling and processing that data, and given its interest in that content one would imagine that it will all contribute to the ranking of the given page, so it's all getting put back together somewhere behind the scenes.

    3) About Google not actually rendering pages, well, it is rare to see in the logs that the bot has fetched CSS, but then it doesn't need to do it often, I am sure it is cached. Many have discussed how Google does appear to fully render pages (including JS and CSS) when indexing them. Maybe full rendering of pages is reserved for "important" sites, but we sure as hell know the crawler can do it, because you can Fetch pages via Google's various tools and see them fully rendered. E.g. run a random JS/html5 demo through the PageSpeed tool and you will see it rendered: https://developers.google.com/speed/...ge.sh%2F&hl=en
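The log comparison in point 2 can be sketched as a small script that counts how often the crawler fetched the JSON endpoint versus the static pages. This assumes tab-separated lines shaped like the excerpt above (time, IP, method, path). Matching on the `66.249.` prefix is a simplification taken from that excerpt rather than a reliable way to verify Googlebot, and the function names are hypothetical.

```python
from collections import Counter


def classify(path: str) -> str:
    """Label a request path as a JSON (Ajax) fetch or a static page crawl."""
    return "json" if path.startswith("/wp-json/") else "page"


def count_fetches(lines, ip_prefix="66.249."):
    """Count GET requests per category from IPs starting with ip_prefix."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 4:
            continue  # not in the assumed four-column format
        _time, ip, method, raw_path = parts[:4]
        fields = raw_path.split()
        if method != "GET" or not ip.startswith(ip_prefix) or not fields:
            continue
        counts[classify(fields[0])] += 1  # fields[0] drops trailing annotations
    return counts


sample = [
    "14:39:01\t66.249.79.135\tGET\t/wp-json/rest/v1/sitedata/?data=12345678910",
    "12:25:52\t66.249.79.135\tGET\t/data-search/?data=12345678910",
]
print(count_fetches(sample))
```

A real Apache access_log uses a different line format, so the splitting step would need adjusting before running this on production logs.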

    It has long been theorised that the Chrome browser in fact IS the technology that is used to render pages on the back end of Google.

    So my feeling is that in this day and age it is no longer true that Google is crawling only flat HTML and that the technology is very much able to render even very complicated web apps - it's going to have to be with HTML5/JS running the modern web. Which doesn't explain the problems of OP, I just thought I'd add this as a point of discussion.

    I do still agree with Yukon that your static content is a big deal, especially in terms of page speed, since it's no good making your visitors wait 5 seconds before anything is rendered at all. That is something I addressed on my own site after we had a discussion about it in another thread last month. I saw not only big increases in my pagespeed score (maybe Google is only scoring the static part?!) but also a huge increase in crawl rate, from a few hundred pages a day to thousands. Now, if only it would actually index them, lol.