WMTools - URL SILO with dynamic scripting

I am reworking a large scripted website with over a million pages. The site was developed a while ago, and much has changed in the Internet world since its original implementation. The site is dynamically scripted and there is no way around this.

Its existing format is:

www.domain.com/scriptname?{querystr}.

Many of the query strings are quite long and SEO unfriendly.

I have devised a rather elaborate virtual SILO path system to place in front of the script?{querystr}. Every page now has a unique SILO path such as:

www.domain.com/Books/Fiction/Title/Contents/scriptname?{querystr}

or

www.domain.com/California-Law/Probate-Code/Index/A/scriptname?{querystr}

These are simplified examples, but they illustrate the principle. Every page now has a completely unique path with an associated canonical SILOPATH/script?{querystr} reference. The pages also have completely unique titles and H* tags (and, of course, content).

The paths correlate to overall site structure and hierarchy while minimizing nested sub-directories wherever possible.

In theory, search engines should be able to differentiate pages by the SILO path alone, regardless of the query parameters. The query string IS still required for the server to deliver the page, however.
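A minimal Python sketch of assembling such a URL, assuming the SILO path is held as a list of segments and the query string comes from a dict of parameters. The helper name `build_silo_url` and the parameters `code` and `sec` are hypothetical; the real script's parameters are not given in the post.

```python
from urllib.parse import quote, urlencode


def build_silo_url(host, silo_segments, script, params):
    """Assemble http://<host>/<SILO path>/<script>?<querystr> (hypothetical helper)."""
    # Percent-encode each segment so a space or "/" inside a segment
    # cannot alter the path hierarchy.
    path = "/".join(quote(segment, safe="") for segment in silo_segments)
    return f"http://{host}/{path}/{script}?{urlencode(params)}"


url = build_silo_url("www.domain.com",
                     ["California-Law", "Probate-Code", "Index", "A"],
                     "scriptname", {"code": "prob", "sec": "A"})
print(url)
# http://www.domain.com/California-Law/Probate-Code/Index/A/scriptname?code=prob&sec=A
```

The SILO path carries the hierarchy for readers and crawlers, while the query string still carries what the script needs to find the page.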

My first question is about Webmaster Tools. Can I tell WMT to ignore all parameters? Will this cause bots not to retain the full URL? Or will it simply cause the bot to sort/index pages by the SILO path without trying to work out all the variations of the query string?

In other words, does telling Google to ignore a parameter (such as a session ID) remove it from the indexed SERP URL?

Also, the script-name portion of the URL is ALWAYS the same. Only the path and query parameters differ. I'm assuming this is OK, as Google should see:

/cars/buicks/index.html

and

/cars/dodge/index.html

as two different files.
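A quick check of that assumption with Python's standard `urllib.parse`: the two example URLs share a file name but differ in their full path, so they are different URLs. This is generic URL parsing, not a claim about how Google stores URLs internally.

```python
from urllib.parse import urlsplit

urls = [
    "http://www.domain.com/cars/buicks/index.html",
    "http://www.domain.com/cars/dodge/index.html",
]


def file_name(url):
    """Last path segment only, e.g. "index.html"."""
    return urlsplit(url).path.rsplit("/", 1)[-1]


def page_key(url):
    """Full path plus query string, which is what tells the pages apart."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


print(len({file_name(u) for u in urls}))  # 1: both are "index.html"
print(len({page_key(u) for u in urls}))   # 2: the full URLs differ
```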

Has anyone else ever experimented with a SILO path in front of a scripted?{querystr} format?
  • savidge4
    Originally Posted by Steviebone

    In other words, does telling Google to ignore a parameter (such as a session ID) remove it from the indexed SERP URL?

    Has anyone else ever experimented with a SILO path in front of a scripted?{querystr} format?
    I honestly can't say I have worked on a site quite like the one you describe. However, I stayed at a Holiday Inn Express last night and have this to share...

    I have a website whose search function was getting a ton of use. Watching analytics, I saw my site's page count growing and growing... a check had been missed somewhere and, long story short, I was basically indexing each and every search as a page. So I went into WMT and removed /?s=. There is no option to keep a portion of the URL: you either pull every URL with /?s or you keep it. As far as I know there is no middle ground.
    Signature
    Success is an ACT not an idea
    • Steviebone
      So I went into WMT and removed /?s=. There is no option to keep a portion of the URL: you either pull every URL with /?s or you keep it. As far as I know there is no middle ground.
      Interesting, but I am not sure that is entirely correct. WMT says you can use the feature to ignore things like session IDs, which on many sites would appear on almost EVERY page.

      As for search pages, I just use nofollow noindex on the links to the search as well as on the result pages themselves.
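A minimal sketch of that approach, assuming (hypothetically) the search script lives at `/search` and takes the term in an `s` parameter: result pages get a robots meta tag, and links into the search carry `rel="nofollow"`.

```python
import html
from urllib.parse import quote_plus, urlsplit

SEARCH_PATH = "/search"  # hypothetical location of the site's search script


def robots_meta(url):
    """Return a noindex/nofollow robots meta tag for search result pages only."""
    if urlsplit(url).path == SEARCH_PATH:
        return '<meta name="robots" content="noindex, nofollow">'
    return ""


def search_link(term, label):
    """Build a link to a search result page that asks crawlers not to follow it."""
    href = f"{SEARCH_PATH}?s={quote_plus(term)}"
    return f'<a href="{html.escape(href)}" rel="nofollow">{html.escape(label)}</a>'


print(robots_meta("/search?s=probate"))
print(search_link("probate code", "Search probate code"))
```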

      Putting /?s in a robots.txt file is a whole other thing entirely. Maybe that's what you were referring to?
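For the robots.txt route, Python's standard `urllib.robotparser` can show which URLs a rule blocks. This sketch assumes a single `Disallow: /?s=` rule. Note that this parser only does plain prefix matching and does not implement Google's `*` and `$` wildcard extensions, so Google's own matching may differ on more complex rules.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block only URLs that begin with /?s=
rules = """
User-agent: *
Disallow: /?s=
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)  # parse() also marks the rules as loaded, so can_fetch() works

search_url = "http://www.domain.com/?s=probate"
silo_url = "http://www.domain.com/Books/Fiction/Title/Contents/scriptname?id=1"

print(rp.can_fetch("*", search_url))  # False: matches the /?s= prefix
print(rp.can_fetch("*", silo_url))    # True: no rule applies
```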

      My question actually had more to do with how Google stores the indexed URL for the SERP. In the case of session IDs, if a URL is indexed with a query parameter that has been excluded, is that parameter stripped from the URL shown in the SERP, or just ignored during indexing? This is important to me because, theoretically, I could tell Google to ignore ALL the parameters and just use the SILO path, so long as the URL returned in the SERP still includes the entire query string.

      Thanks.
      • savidge4
        Here is the thing that throws everything off: your /?string HAS to be there. You CAN go into GA and remove it for filtering purposes. However, when it comes to tracking landing pages etc., you would only know which silo, not which specific page.

        As I see it, the hassle starts in the back end (tracking issues) and then carries across to the front end... basically, if you removed that /?string, your links would take the visitor nowhere.

        So the essence of the question becomes moot. If you are unable to remove the /?string because of usability, and more importantly navigation, how do you expect to remove it and still maintain the function? I guess the question you were really asking is whether back-end GA changes affect the front end. And the answer is NO. A GA filter is NOT in any way, shape, or form going to change the URL structure on YOUR site, just the way those things appear in GA specifically.

        So now, as you pointed out in my situation, I had issues with an ever-growing number of pages. I tried knocking them back in GA, but that didn't "Stop" the behavior on the web side, and ultimately I had to create a rule in .htaccess to fix my issue.

        You are in the same boat, of sorts. You need to get in and create a RewriteRule to shorten the "visible" URL. Here is a forum thread you might find of interest: HTML Rewrite
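What a rewrite rule does is map a short public path onto the real script URL on the server. A minimal Python sketch of that mapping, assuming a hypothetical `/Books/<genre>/<title>/Contents` pattern and hypothetical `genre`, `title` and `view` parameters. The comment shows a roughly equivalent Apache mod_rewrite line for `.htaccess`, which would need testing on the actual server.

```python
import re

# Roughly equivalent .htaccess rule (hypothetical, untested):
#   RewriteEngine On
#   RewriteRule ^Books/([^/]+)/([^/]+)/Contents/?$ /scriptname?genre=$1&title=$2&view=contents [L]
RULE = re.compile(r"^/Books/([^/]+)/([^/]+)/Contents/?$")


def rewrite(path):
    """Map a short public path to the internal script URL, or None if no match."""
    match = RULE.match(path)
    if match is None:
        return None
    genre, title = match.groups()
    return f"/scriptname?genre={genre}&title={title}&view=contents"


print(rewrite("/Books/Fiction/Moby-Dick/Contents"))
# /scriptname?genre=Fiction&title=Moby-Dick&view=contents
```

With a rule like this, visitors and crawlers see only the short path; the query string is still built and handed to the script internally.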

        Hope that Helps!

        That staying-at-a-Holiday-Inn-Express stuff is a lie! I am much smarter after sleeping at home! LOL
        • Steviebone
          Well... I don't think I'm making myself clear. This has nothing to do with analytics.

          In Webmaster Tools, under Crawl > URL Parameters, you can tell Google how to handle each parameter. See this page:

          https://support.google.com/webmasters/answer/6080550

          What I am asking is how Google stores the URLs for the SERPs (not how it filters duplicate content). The issue of duplicate content is handled through canonicals anyway.

          What I am trying to determine is how the URLs for SERPs are stored when Google is told to ignore parameters such as session IDs.

          If I tell Google to ignore a parameter such as 'sessid=xxx', I'm assuming the parameter does not appear in the SERP link. For a session ID that's fine. For other parameters, not so.

          Each and EVERY silo path is unique, so theoretically, for the purposes of filtering duplicate content, Google could ignore ALL parameters. But not if it would then strip all those parameters out of the SERP URL.
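The idea of "ignoring" a parameter for duplicate filtering can be illustrated by normalizing URLs: drop the ignored parameters and compare what is left. A minimal sketch, assuming a hypothetical `sessid` session parameter and a hypothetical content parameter `id`. It illustrates deduplication in general and is not Google's actual algorithm.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

IGNORED = {"sessid"}  # parameters assumed not to change page content


def representative_url(url, ignored=IGNORED):
    """Drop ignored parameters so URL variants of one page collapse to one key."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in ignored]
    return urlunsplit(parts._replace(query=urlencode(kept)))


a = "http://www.domain.com/Books/Fiction/Title/Contents/scriptname?id=42&sessid=abc"
b = "http://www.domain.com/Books/Fiction/Title/Contents/scriptname?sessid=xyz&id=42"
print(representative_url(a))
print(representative_url(a) == representative_url(b))  # True
```

Passing `ignored={"id", "sessid"}` leaves `.../scriptname` with no query string at all, which on this site is a URL the server cannot deliver. That is why ignoring ALL parameters can make sense for deduplication but not for the URL shown in a SERP.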

          This may all be moot since Google can simply use the declared canonical for each page.

          After reading more, I am convinced this has more to do with crawling than anything else. At the bottom of the page, the use of canonicals is mentioned. For now, I think I am going to rely on canonicals and let Google handle the parameters to its heart's content.
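Relying on canonicals means every variant of a page must emit the same canonical URL. A minimal sketch of building that tag for the SILOPATH/script?{querystr} scheme, assuming parameters are sorted for consistency and session-only parameters are simply never passed in. The helper name `canonical_tag` and the parameters `id` and `view` are hypothetical.

```python
import html
from urllib.parse import urlencode


def canonical_tag(host, silo_path, script, params):
    """Build a rel=canonical tag pointing at SILOPATH/script?{querystr}."""
    # Sort parameters so every variant of a page yields the same canonical URL;
    # session-only parameters are left out by the caller.
    query = urlencode(sorted(params.items()))
    href = f"http://{host}/{silo_path.strip('/')}/{script}?{query}"
    # Escape for an HTML attribute ("&" between parameters becomes "&amp;").
    return f'<link rel="canonical" href="{html.escape(href)}">'


tag = canonical_tag("www.domain.com", "/Books/Fiction/Title/Contents/", "scriptname",
                    {"view": "contents", "id": "42"})
print(tag)
```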
          • savidge4
            Originally Posted by Steviebone

            Well... I don't think I'm making myself clear. This has nothing to do with analytics. In Webmaster Tools
            I think that makes two of us. But be it GA or GWT, you are only changing the way Google deals with it. In no way are you changing the actual URL, just the way Google looks at it. If you want to change the actual physical URL and the way it is displayed to the world, you have to do that within the confines of your site/domain/server.

            On the page you linked, look at the "Purpose" column and see exactly what each setting does and what Google does in each instance. NONE of them says it changes the URL; each simply says that if you change this, Googlebot will do this, and if you exclude that, Googlebot will do that.

            The greater question at hand, regardless of whether changing the URL in Google could happen: why all the focus on Google only? Server-side changes would be UNIVERSAL in effect. And again, the ONLY way you can change the URL is on your side.
            • Steviebone
              I think that makes two of us. But be it GA or GWT, you are only changing the way Google deals with it. In no way are you changing the actual URL, just the way Google looks at it. If you want to change the actual physical URL and the way it is displayed to the world, you have to do that within the confines of your site/domain/server.
              I don't want to change the physical URL; all that is already handled by the scripting on the server. I was merely curious how Google would serve the URL back in SERPs.

              After reading and experimenting, it appears that the Webmaster tool in question only affects how the site is crawled and how Google interprets duplicate content on a scripted site.

              I think at this point the entire question is made moot by the proper use of canonical links on all pages. Google should serve the stated canonical URL in any SERP. So for now, I am leaving all that parameter stuff alone.

              The greater question at hand, regardless of whether changing the URL in Google could happen: why all the focus on Google only? Server-side changes would be UNIVERSAL in effect. And again, the ONLY way you can change the URL is on your side.
              The focus was on Google only because the question was about that specific Google Webmaster tool. It never had anything to do with anything else. Bing may have a similar tool; I don't know, as I haven't spent much time with the Bing webmaster stuff yet. Bing represents about 1/100th of the referral traffic I get from Google.

              The confusion began because Google spoke about ignoring certain parameters like session IDs. This made me wonder: if I told Google to ignore a parameter used for a session ID, would it then leave that parameter out of the URL it serves in a SERP?
  • Steviebone
    PS: As for tracking, I have in-house tools for all that anyway. Analytics has never been able to handle it to my satisfaction.
  • yukon
    Originally Posted by Steviebone

    Has anyone else ever experimented with a SILO path in front of a scripted?{querystr} format?
    Looking at a random eBay URL, Google has cached the URLs below (the same eBay webpage) on two different dates, which tells me the query string at the end of the URL is enough to skew SEO even if it's the same webpage on the same domain.

    [cached: Feb 13, 2015 13:10:36 GMT]

    [cached: Dec 25, 2014 09:27:19 GMT]
    In the eBay example above, a canonical tag on the 1st and 2nd URLs points back to the shorter URL. That tells Google the URL that includes the query string is redundant.

    Code:
    <link rel="canonical" href="http://www.ebay.com/itm/53-710-GPH-Submersible-Pump-Aquarium-Fish-Tank-Fountain-Water-Hydroponic-/281540553477" >
    This is typical on eCommerce sites, not my idea of fun (things can get touchy).
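On a site like eBay, where the query string does not select the content, the canonical URL is simply the page URL minus its query string. A minimal sketch of that normalization; the `?src=example` query is a made-up stand-in, since the cached URLs themselves are not reproduced here. On a site whose script needs the query string to deliver the page, the canonical has to keep the content-selecting parameters instead.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL with its query string and fragment removed."""
    return urlunsplit(urlsplit(url)._replace(query="", fragment=""))


page = ("http://www.ebay.com/itm/53-710-GPH-Submersible-Pump-Aquarium-Fish-Tank-"
        "Fountain-Water-Hydroponic-/281540553477?src=example")
print(strip_query(page))
```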
    • Steviebone
      Originally Posted by yukon

      Looking at a random eBay URL, Google has cached the URLs below (the same eBay webpage) on two different dates, which tells me the query string at the end of the URL is enough to skew SEO even if it's the same webpage on the same domain.

      In the eBay example above, a canonical tag on the 1st and 2nd URLs points back to the shorter URL. That tells Google the URL that includes the query string is redundant.

      Code:
      <link rel="canonical" href="http://www.ebay.com/itm/53-710-GPH-Submersible-Pump-Aquarium-Fish-Tank-Fountain-Water-Hydroponic-/281540553477" >
      This is typical on eCommerce sites, not my idea of fun (things can get touchy).

      Thanks, Yukon. Yes, I know query strings skew results and can create duplicate content, hence the need for canonicals. And yes, constructing canonicals on a scripted site working from a large database can be quite tedious.

      My question was about what impact using the WMT tool had on the URL served in the SERP. In other words, if I tell Google to ignore a session ID parameter (in WMT), is it removed from the URL served in the SERP?

      I think the question may be moot, however, since using a canonical (for example, one without a session ID) would solve this problem, assuming the only page indexed and served in a SERP would then be the canonical reference. Yes?
      • patadeperro
        Originally Posted by Steviebone

        My question was about what impact using the WMT tool had on the URL served in the SERP. In other words, if I tell Google to ignore a session ID parameter (in WMT), is it removed from the URL served in the SERP?

        I think the question may be moot, however, since using a canonical (for example, one without a session ID) would solve this problem, assuming the only page indexed and served in a SERP would then be the canonical reference. Yes?
        Here you are confusing three things: how Google saves pages, how pages are ranked, and how canonicals are used. The configuration I explained above will keep those URLs from being saved in Google's cache, but the canonical page will keep ranking. The parameters after the "?" (as I understand it) are either tracking parameters or parameters that sort the displayed information.
      • yukon
        Originally Posted by Steviebone

        Thanks, Yukon. Yes, I know query strings skew results and can create duplicate content, hence the need for canonicals. And yes, constructing canonicals on a scripted site working from a large database can be quite tedious.

        My question was about what impact using the WMT tool had on the URL served in the SERP. In other words, if I tell Google to ignore a session ID parameter (in WMT), is it removed from the URL served in the SERP?

        I think the question may be moot, however, since using a canonical (for example, one without a session ID) would solve this problem, assuming the only page indexed and served in a SERP would then be the canonical reference. Yes?
        I would stick with the canonical tag.

        I have no idea about removing query string URLs with WMT, never needed it.
  • patadeperro
    Originally Posted by Steviebone

    In other words, does telling Google to ignore a parameter (such as a session ID) remove it from the indexed SERP URL?
    Yes, there is. Go to Google Webmaster Tools > URL Parameters and configure them there.

    This needs to be done in addition to Yukon's suggestions.
    • Steviebone
      Originally Posted by patadeperro

      Yes, there is. Go to Google Webmaster Tools > URL Parameters and configure them there.

      This needs to be done in addition to Yukon's suggestions.
      <sigh> Please read all the posts. I know this page exists; I have already mentioned it several times. My question was about HOW it affected the URLs stored for the SERPs. Does telling WMT to ignore a parameter (such as a session ID) remove it from the URL served in the SERP?
      • patadeperro
        Originally Posted by Steviebone

        <sigh> Please read all the posts. I know this page exists; I have already mentioned it several times. My question was about HOW it affected the URLs stored for the SERPs. Does telling WMT to ignore a parameter (such as a session ID) remove it from the URL served in the SERP?
        First you were asking where in Webmaster Tools to do this, and that is what I answered. If you know where it is and have done it before, you will see options that say "Let Google decide" and "Representative parameter". When you choose the first, Google will index the URLs that are constantly being requested (even with the parameters after the "?") and will send you error messages (404 errors) when those parameters no longer exist. With the other configuration, those URLs will not be saved in Google's cache.