Finding when a URL was first indexed

17 replies
This is sort of SEO related but not. Let me know if I'm posting in the wrong place. Is there a way to find out when Google and other search engines first indexed a particular URL? The Internet Archive gets me to within a month or so but I need to narrow it down a bit more.
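For the Internet Archive part, here is a minimal sketch of how the capture dates could be listed programmatically rather than read off the calendar view. It assumes the Wayback Machine's CDX endpoint with JSON output; the page URL, the date range, and both function names are hypothetical and only for illustration. A capture date shows when the archive saw the page, not when the page was created.

```python
import json
from datetime import datetime
from urllib.parse import urlencode

# Assumption: the Wayback Machine CDX endpoint, queried with output=json.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url, start="20130101", end="20131231"):
    """Build a CDX query URL listing captures of page_url between two dates."""
    params = {
        "url": page_url,
        "from": start,            # yyyymmdd, inclusive lower bound
        "to": end,                # yyyymmdd, inclusive upper bound
        "output": "json",
        "fl": "timestamp,statuscode",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def earliest_capture(cdx_json_text):
    """Return the datetime of the earliest capture in a CDX JSON reply, or None."""
    if not cdx_json_text.strip():
        return None
    rows = json.loads(cdx_json_text)
    if len(rows) < 2:
        return None               # only a header row, or nothing at all
    header, captures = rows[0], rows[1:]
    ts_col = header.index("timestamp")
    stamps = [datetime.strptime(row[ts_col], "%Y%m%d%H%M%S") for row in captures]
    return min(stamps)
```

Fetch the URL returned by `build_cdx_query` with any HTTP client and pass the response text to `earliest_capture`.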
  • Profile picture of the author Joshua Uebergang
    Originally Posted by troyy0206 View Post

    This is sort of SEO related but not. Let me know if I'm posting in the wrong place. Is there a way to find out when Google and other search engines first indexed a particular URL? The Internet Archive gets me to within a month or so but I need to narrow it down a bit more.
    What's your reason?

    One strategy for new pages is to set a Google Alert.
    • Profile picture of the author troyy0206
      Originally Posted by Joshua Uebergang View Post

      What's your reason?

      One strategy for new pages is to set a Google Alert.
      I have a legal case where I need to prove when a certain URL first appeared in the indexes of the major search engines. I know within about six weeks of when it appeared based on the sitemap file and the Internet Archive. I might be able to pin the case down further if I can show when the URL first appeared in the indexes. Since their sitemap says the updates are weekly, I should be able to prove within a week of when the URL appeared. If I can do that, it solidifies the case that the URL was placed on the site in coordination with some other documents being produced at the same time.

      BTW, I'm not the attorney, I'm a consultant.
  • Profile picture of the author yukon
    Banned
    Maybe check your server logs for Googlebot, since you know the approx. date?
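    If the logs are available, the check could look roughly like this. It is a sketch only, assuming the common Apache/Nginx "combined" log format; the function name and the sample path are hypothetical. The user-agent string can be spoofed, so a match is a first pass, not proof that the visitor was really Google.

```python
import re
from datetime import datetime

# Assumption: "combined" access log format, e.g.
# host ident user [time] "METHOD path PROTO" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def first_googlebot_hit(log_lines, path):
    """Return the earliest time a Googlebot user agent requested path, or None."""
    earliest = None
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match or match.group("path") != path:
            continue
        if "Googlebot" not in match.group("agent"):
            continue
        # Timestamps look like 10/Oct/2013:03:00:00 +0000
        seen = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        if earliest is None or seen < earliest:
            earliest = seen
    return earliest
```

    Usage would be something like `first_googlebot_hit(open("access.log"), "/new-page.html")`, repeated across rotated log files for the date window in question.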
    • Profile picture of the author troyy0206
      Originally Posted by yukon View Post

      Maybe check your server logs for Googlebot, since you know the approx. date?

      No, no, no...I'm trying to prove when SOMEONE ELSE added a specific URL.

      As I mentioned, I have it narrowed to about a 40 day window and suspect it was the date the sitemap was updated as it corresponds with other information, but the SE indexes might confirm that, to a degree.
      • Profile picture of the author yukon
        Banned
        Originally Posted by troyy0206 View Post

        No, no, no...I'm trying to prove when SOMEONE ELSE added a specific URL.

        As I mentioned, I have it narrowed to about a 40 day window and suspect it was the date the sitemap was updated as it corresponds with other information, but the SE indexes might confirm that, to a degree.
        I kinda doubt you'll get an exact date if you don't have access to the site/logs.
      • Profile picture of the author MikeFriedman
        Originally Posted by troyy0206 View Post

        No, no, no...I'm trying to prove when SOMEONE ELSE added a specific URL.

        As I mentioned, I have it narrowed to about a 40 day window and suspect it was the date the sitemap was updated as it corresponds with other information, but the SE indexes might confirm that, to a degree.
        The problem is, I highly doubt that your method of narrowing it down would hold up in court. Just because something was archived on a specific date does not mean that date corresponds in any way to when the URL was created. That could easily be discredited. The same goes for the sitemap.

        For anything definitive, you would have to get the server logs.
        • Profile picture of the author Mike Anthony
          As others have indicated, the answer is no without server logs. The only thing Google would offer is its search-by-date function, but I doubt that's specific enough.
          • Profile picture of the author yukon
            Banned
            Originally Posted by Mike Anthony View Post

            As others have indicated, the answer is no without server logs. The only thing Google would offer is its search-by-date function, but I doubt that's specific enough.
            That SERP date filter is easy to fake.

            I have 6 year old pages with SERP dates that show less than a week old.
            • Profile picture of the author paulgl
              Google keeps very good data records.

              If your case is legit, and that date is important to a judge,
              I guarantee it would be easy to subpoena the records.
              As well as the host of the page, as an afterthought.

              All of those are a big if.

              Server logs for the site's host would tell when it was first
              crawled by a googlebot.

              If this is a nickel and dime case, forget it.

              High profile criminal case, everything is out in the open.
              Nothing truly ever gets wiped clean in most instances.

              Paul
              • Profile picture of the author yukon
                Banned
                Originally Posted by paulgl View Post

                Google keeps very good data records.

                If your case is legit, and that date is important to a judge,
                I guarantee it would be easy to subpoena the records.
                As well as the host of the page, as an afterthought.

                All of those are a big if.

                Server logs for the site's host would tell when it was first
                crawled by a googlebot.

                If this is a nickel and dime case, forget it.

                High profile criminal case, everything is out in the open.
                Nothing truly ever gets wiped clean in most instances.

                Paul
                I doubt it would be easy to force Google to do anything. I'm sure they have the best lawyers on retainer.
  • Profile picture of the author webdevpro
    Since you need accurate results, only analyzing the server logs will serve you well. If they are not directly accessible, you can ask your web host for them.
  • Profile picture of the author patco
    I investigated this and didn't find a way to do it... You could set up logging on your own hosting to show this, but I don't think you can do that for another webmaster's website!
  • Profile picture of the author Kevin Maguire
    Originally Posted by troyy0206 View Post

    This is sort of SEO related but not. Let me know if I'm posting in the wrong place. Is there a way to find out when Google and other search engines first indexed a particular URL? The Internet Archive gets me to within a month or so but I need to narrow it down a bit more.
    Might help, but would never stand up in court. If the URL does not have a post date, you can find the date of the last index by using the "Search Tools" in Google search results and searching by custom time.

    I'd go like this

    site:url-you-want-to-find-without-the-http://-part

    Search Tools>Any Time

    Any Time>Custom Range

    And use this to try to narrow down the time of last index, by finding the first page that looks like this.

    You will end up with the day of index, but as others pointed out, it can be manipulated. And you probably wouldn't be able to build a case around it.

    Also, just to add: you asked for the first time of "index", which does not reflect the time the page was created. Some URLs can take days, weeks, or months to get indexed, or never get indexed at all. The two don't go hand in hand.
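
    The same custom-range search can be built as a URL, which makes it easier to step through a window a week at a time. This is a sketch only: the `tbs=cdr:1,cd_min:...,cd_max:...` parameter is what the Custom Range option appears to put in the address bar and is treated here as an assumption, not a documented API. Both function names are hypothetical.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def _mdy(d):
    """Format a date as M/D/YYYY, the form used in the custom range filter."""
    return "{}/{}/{}".format(d.month, d.day, d.year)


def date_range_search_url(page_url, start, end):
    """Build a Google search URL for site:<page_url> limited to [start, end]."""
    bare = page_url.split("://", 1)[-1]   # drop the http:// part
    # Assumption: tbs=cdr:1,cd_min:...,cd_max:... mirrors the Custom Range option.
    tbs = "cdr:1,cd_min:{},cd_max:{}".format(_mdy(start), _mdy(end))
    return "https://www.google.com/search?" + urlencode({"q": "site:" + bare, "tbs": tbs})


def windows(start, end, days=7):
    """Yield consecutive (first_day, last_day) windows covering start..end."""
    cur = start
    while cur <= end:
        stop = min(cur + timedelta(days=days - 1), end)
        yield cur, stop
        cur = stop + timedelta(days=1)
```

    Opening the URL for each window in turn shows the earliest window in which the page is listed. The displayed dates carry the same caveat as the manual steps: they can be manipulated.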
    • Profile picture of the author yukon
      Banned
      Originally Posted by Kevin Maguire View Post

      Might help, but would never stand up in court. If the url does not have a post date, you can find last index by using the "Search Tools" in Google search results and search by custom time.

      I'd go like this

      site:url-you-want-to-find-without-the-http://-part

      Search Tools>Any Time

      Any Time>Custom Range

      And use this to try to narrow down the time of last index, by finding the first page that looks like this.

      You will end up with the day of index, but as others pointed out, it can be manipulated. And you probably wouldn't be able to build a case around it.

      Also just to add, you asked for first time of "Index". Which does not reflect the time it was created. Some urls can take days, weeks, months or never index at all. The 2 don't go hand in hand.
      That date filter is what I was talking about earlier in this thread.

      Here's an example of a skewed SERP date.

      This WSJ page was last cached on Oct 29, 2013. Today is Nov 8, 2013, yet the Google SERP shows a date for that page that's 6 hours old. The actual web page has a date of "Updated June 30, 2013 7:53 p.m. ET".
      • Profile picture of the author troyy0206
        Thanks for the suggestions, everyone. As I mentioned, I have a date on which the content was certainly not on the website, and I also have the date the URL was either added or edited in the sitemap.xml that's currently on the site, which is the same date the pages linking to the content were added. Those things, combined with other evidence of emails sent the same day, tend to leave a trail of bread crumbs.

        I was planning to suggest that if they feel the need for more certainty, they will need to subpoena the host and/or the search engines to find out when the URL was first indexed. That would provide a much more solid foundation alongside the screenshot I mentioned, which shows when the content was NOT on the site. I'll just have to see how important they think the evidence is. However, if this company manages its own servers in a data center or in-house, the search engine indexes are the only place to get more information, and I don't think that's going to help much more than what I've already proven. It's not as if the Feds are going to raid the place and seize the hardware without notice.

        It's definitely a case where it's worth them trying to figure this out, as long as they feel the evidence will actually help and not be blown off by the judge. It's a huge mass tort case.
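
        For anyone wanting to repeat the sitemap check, here is a minimal sketch of pulling the date for one URL out of a sitemap.xml. It assumes the file follows the standard sitemaps.org schema with an optional `<lastmod>` per entry; the function name and the example URLs are hypothetical. The `lastmod` value is whatever the site chose to publish, so it is a claim by the site, not an independent record.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def lastmod_for(sitemap_xml, page_url):
    """Return the <lastmod> text for page_url in a sitemap, or None if absent."""
    root = ET.fromstring(sitemap_xml)
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        if loc == page_url:
            lastmod = entry.findtext("sm:lastmod", default=None, namespaces=NS)
            return lastmod.strip() if lastmod else None
    return None  # URL not listed in this sitemap
```

        Saving a dated copy of the sitemap alongside the result keeps a record of what the file said on the day it was checked.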
  • Profile picture of the author troyy0206
    The Custom Date search verified my findings. The date shows the exact same date the sitemap shows the page was created. Thanks everyone!!
    • Profile picture of the author yukon
      Banned
      Originally Posted by troyy0206 View Post

      The Custom Date search verified my findings. The date shows the exact same date the sitemap shows the page was created. Thanks everyone!!
      ...again, that SERP date can easily be skewed, even if it wasn't intentionally done by a webmaster. In other words, that SERP date isn't reliable proof of when the page was first indexed.

      Any lawyer that knows the SERPs or has their own hired research done can prove those SERP dates aren't dependable. All they have to do is reference most news related sites like the WSJ example.

      IMO, your only reliable date source is the domain's server logs for the web page you're researching.