Would this be a violation of Google's TOS?

by carlos123

Posted: 14 years ago 3 replies

I have been looking at ways to speed up my keyword research without violating Google's TOS and was wondering if anyone could give me some input on whether the following would run afoul of Google.

Under Linux I can run several different browsers. One such browser is called Lynx. It is a text-only browser that some people still use when they just want to browse the web as plain text, with no images.

I can do the following at a command prompt...

lynx -dump "http://www.google.com/search?hl=en&q=%22black+tuxedos%22" | grep Results | awk -F "of about" '{print $2}' | awk '{print $1}'

That returns the number of results Google reports (the "of about ..." estimate on the results page) for the exact phrase "black tuxedos".
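The grep and two awk calls in that one-liner can also be folded into a single awk filter. This is a sketch under assumptions, not a tested recipe: it assumes the Lynx dump contains a line such as "Results 1 - 10 of about 156,000 for black tuxedos." (Google changes this wording from time to time), and the function name extract_result_count is hypothetical. Unlike the original pipeline, it also strips the thousands separators so the figure can be compared as a number.

```shell
#!/usr/bin/env bash
# Sketch: pull the "of about N" figure out of a text-mode SERP dump on stdin.
# Assumes a line like "Results 1 - 10 of about 156,000 for black tuxedos."
extract_result_count() {
  # Split each line on "of about"; on the first "Results" line that actually
  # contains the separator, take the first word after it, drop the commas,
  # print it and stop.
  awk -F 'of about' '/Results/ && NF > 1 {
    split($2, w, " ")
    gsub(",", "", w[1])
    print w[1]
    exit
  }'
}
```

It reads the dump on standard input, so it slots in where grep and the two awk calls were: lynx -dump "<search URL>" | extract_result_count.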

A couple of things to note here.

I am the one initiating the query through a browser, just as I would through Firefox. The browser, in this case Lynx, fetches the appropriate page for me just as Firefox would.

And I am the one typing in the query phrase I wish to look up, just as I would have to if I were accessing Google the normal way through Firefox.

The only real difference here is that I am typing the search phrase in through the command line and giving it to Lynx instead of Firefox.

The other difference is that Lynx and a few Linux utilities (grep and awk) filter the returned page down to only the information I am interested in and never display the actual SERP to me. But when I look up these numbers through Firefox I hardly look at the rest of the SERP either, so it's really not very different, if at all, from what I do through Firefox.

Any thoughts on whether this would be frowned upon by Google?

For that matter, would Google frown upon Niche Bot, Market Samurai, and all the other tools that do the searching for their users?

Like I said, I don't want to violate the Google terms of use if I can help it. I am just trying to reduce the time it takes me to do keyword research.

Carlos

PS. Please be aware that Google may see your responses to this post. Answer at your own risk.

#google #tos #violation

Harvey Affcash

14 years ago

Wow... paranoid much? Google's algorithm isn't some omniscient entity, nor is it likely to be able to match people's aliases here to their sites, considering that most names are not unique, most people don't use their forum handles in WhoIs info, and Google has no access to the IP records here.

Now, to answer your actual question.
It'll be fine. Most people don't remember a time before IE, but I remember a time before Mosaic, when text browsing was all that was available. The back-end operations Lynx performs are nearly identical to those of FF or IE; the only difference is that the modern browsers render the result in a GUI.
Where you might run into a problem is the limitation of only getting results from one page of returned information at a time, and I have no idea what command parameters you'd need to use to extend the number of results per page, or crawl further pages.

You may also run into an issue: if you do figure out how to surmount the previous problem, you could very well end up with a buffer overflow, unless the niche doesn't have too many results in it. "Black tuxedos" has ~156,000, but something with even one order of magnitude more competition than that is going to be a major resource hog.



carlos123 14 years ago

Originally Posted by Harvey Affcash

Wow... paranoid much?

Thanks for the input, Harvey. You misunderstand my intent. I am not concerned about the great Google finding my post or seeing me ask such a question. If I were, I would not have posted.

I am concerned about violating my own God-given conscience, and I was asking because I wanted to discuss whether such a thing actually violates Google's terms.

I won't quote myself at length, but I just posted a response to a blog post that you might want to read.

It's at "If you use Web CEO, SpyFu, or SEOmoz’s Rank Checker then you are dishonest. | SEO Industry | The Organic SEO".

I am not connected with that blog at all. I just happened to bump into it while doing a search on this issue.

I am not interested in what I can get away with. From that standpoint my question is a moot point, as Google will never be able to tell that I am using Lynx if I do some other things like change the User Agent.

Of course, if using Lynx is deemed okay, it is a short hop, skip, and jump from there to a fully Linux-based solution that gets the data I need in a more automated way without actually violating Google's TOS, all through the Lynx browser.

I am interested in doing what is right while saving myself some keyword research time.

Carlos

carlos123

14 years ago

Having clarified where I am coming from, I would like to address some of the other points you brought up, Harvey.

Originally Posted by Harvey Affcash

Where you might run into a problem is the limitation of only getting results from one page of returned information at a time, and I have no idea what command parameters you'd need to use to extend the number of results per page, or crawl further pages.

The call through the Lynx browser can be put into a bash script with replaceable parameters such that one can call the script with whatever keyword phrase one wants to look up without having to change the Lynx command line every single time.
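A minimal sketch of such a script follows, assuming bash and a plain-ASCII phrase. The names urlencode and build_query_url are hypothetical; wrapping the phrase in double quotes (encoded as %22) is what asks Google for an exact-phrase match. The Lynx call only runs when the file is executed directly with a phrase argument, so the helper functions can also be sourced from another script.

```shell
#!/usr/bin/env bash
# Sketch: wrap the Lynx one-liner in a script that takes the phrase as $1.
# Assumes plain-ASCII phrases; multibyte characters are not encoded correctly.

# Percent-encode a string for use in a query string (spaces become '+').
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                      # unreserved: keep as-is
      ' ') out+='+' ;;                                   # space -> '+'
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;        # anything else -> %XX
    esac
  done
  printf '%s' "$out"
}

# Build the search URL; the surrounding double quotes request the exact phrase.
build_query_url() {
  printf 'http://www.google.com/search?hl=en&q=%s' "$(urlencode "\"$1\"")"
}

# Only fetch when run directly with a phrase, e.g.: ./serp-count.sh "black tuxedos"
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  lynx -dump "$(build_query_url "$1")" | grep Results \
    | awk -F "of about" '{print $2}' | awk '{print $1}'
fi
```

The file name serp-count.sh is also just a placeholder; the filter after lynx is the original grep/awk pipeline, unchanged.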

This script can in turn be driven by a PHP script that passes each desired phrase to the bash script, which in turn runs it through Lynx.

And once PHP gets into the game there is no end to what I can do to get the data I need in whatever way I want.

All of it accessed through Lynx: just a little ol' browser accessing Google through the interface they have set up to be accessible to any HTTP user agent.

I am not accessing their interface through a script (at least not directly). I am accessing Google through the interface they have set up.

What I do with the data returned to me after accessing them that way is up to me, and Google has no legal or ethical right to tell me what I can or can't do with it on my computer (assuming they even have a right to force me into a binding legal agreement with them just for visiting their very public site).

Originally Posted by Harvey Affcash

You may also run into an issue: if you do figure out how to surmount the previous problem, you could very well end up with a buffer overflow, unless the niche doesn't have too many results in it. "Black tuxedos" has ~156,000, but something with even one order of magnitude more competition than that is going to be a major resource hog.

Buffer overflows would not even be a consideration. Each access through Lynx is a one-shot deal that uses no more memory (in fact a lot less) than accessing Google through Firefox would. Google would not return 156,000 web pages; it would return just one HTML page at a time, from which I could extract the data I want. And I would throttle the script controlling the Lynx call so that it does not query Google too often.
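The throttling described above could look something like this sketch. It is illustrative only: the function name throttled_lookup, the tab-separated output, and the 30-second default pause are assumptions rather than anything settled in this thread, and the lookup command is passed in by name so any wrapper around the Lynx call can be plugged in.

```shell
#!/usr/bin/env bash
# Sketch: look up a list of phrases one at a time, pausing between requests.
# $1 names the command that performs a single lookup (hypothetical, e.g. a
# wrapper around the Lynx one-liner); $2 is the pause in seconds (default 30,
# an arbitrary choice). Phrases are read from stdin, one per line.
throttled_lookup() {
  local fetch_cmd="$1" delay="${2:-30}" phrase first=1
  # The "|| [[ -n ... ]]" also handles a last line with no trailing newline.
  while IFS= read -r phrase || [[ -n "$phrase" ]]; do
    [[ -z "$phrase" ]] && continue      # skip blank lines
    (( first )) || sleep "$delay"       # pause before every lookup except the first
    first=0
    # Print "phrase<TAB>result" so the output is easy to sort or import.
    printf '%s\t%s\n' "$phrase" "$("$fetch_cmd" "$phrase")"
  done
}
```

For example, throttled_lookup ./serp-count.sh 60 < phrases.txt > counts.tsv, where both file names are placeholders. Pausing before each request except the first keeps the total wait at (n - 1) times the delay for n phrases.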

Now I know this seems like a big fat hassle to just access Google but as I said my aim is to reduce the time it takes me to do things "manually" while not violating their TOS. Access through Lynx seems like just what I need.

If I wasn't at all interested in doing what was right there are much easier and more efficient ways of accessing Google under PHP than doing so through a call to a script which in turn makes calls to Lynx which in turn does the accessing of Google.

Carlos

