Crawling all css files on the internet

by donza
9 replies
Hi,

I have come up with a business idea, and the only cost-effective way to implement it would be to crawl sites' CSS files for certain information, then return a list of the sites that contain it. Is there a search engine that can do this?

Cheers Don
#crawling #css #files #internet
  • Profile picture of the author Brandon Tanner
    Don't know of any search engines that crawl CSS files, but you could create a scraper that crawls specified websites and returns the info from each site's CSS files, then parse that info however you want.

    Although if you want to crawl ALL of the CSS files on the internet, you'd probably need at least a couple hundred servers, an enterprise-level ISP with a really fast connection, and about 10 years of time. :p
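For a specified list of sites, the scraper idea above might look roughly like the sketch below. It only covers the parsing side: finding the stylesheet URLs a page links to and checking CSS text for a pattern. The names `StylesheetLinkParser`, `find_stylesheets`, and `css_contains` are made up for illustration, the sample HTML is invented, and actually downloading pages (and respecting each site's robots.txt) is left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class StylesheetLinkParser(HTMLParser):
    """Collects the href of every <link rel="stylesheet"> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "stylesheet" in rel and href:
            # Resolve relative hrefs against the page URL.
            self.stylesheets.append(urljoin(self.base_url, href))


def find_stylesheets(html, base_url):
    """Return absolute URLs of the stylesheets linked from an HTML page."""
    parser = StylesheetLinkParser(base_url)
    parser.feed(html)
    return parser.stylesheets


def css_contains(css_text, pattern):
    """True if the CSS text matches the regex pattern (case-insensitive)."""
    return re.search(pattern, css_text, flags=re.IGNORECASE) is not None


# Invented sample page, standing in for HTML fetched from a real site.
SAMPLE_HTML = (
    '<html><head>'
    '<link rel="stylesheet" href="/css/main.css">'
    '<link rel="icon" href="/favicon.ico">'
    '</head></html>'
)

links = find_stylesheets(SAMPLE_HTML, "https://example.com/blog/")
print(links)
print(css_contains("body { font-family: Arial; }", r"font-family"))
```

Each URL in `links` would then be downloaded and passed through something like `css_contains` with whatever pattern you are looking for.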
    • Profile picture of the author donza
      Originally Posted by Brandon Tanner

      Don't know of any search engines that crawl CSS files, but you could create a scraper that crawls specified websites and returns the info from each site's CSS files, then parse that info however you want.

      Although if you want to crawl ALL of the CSS files on the internet, you'd probably need at least a couple hundred servers, an enterprise-level ISP with a really fast connection, and about 10 years of time. :p
      Hi,

      I've just made an offer to Google for a few hundred gillion so I can implement the idea. I've actually got Larry on hold while I'm writing this.

      There is a site where you can search all the HTML files on the net for a very cheap price, so if you can do that, I can't see why CSS files would be any more difficult. I suppose there is just no demand for a CSS search engine.

      Searching a selected subset of sites is a possibility; it would probably cut the number down to a few million. But the truth is I was hoping there would be an "off the shelf" solution, as it isn't a big business opportunity. A site based on the information might, at best, generate 500-1000 dollars a week in revenue.

      I came up with the idea because it would solve a personal problem that has cost me a hundred and fifty bucks and probably a hundred frustrating hours of my time. I also know there are thousands of other people out there just like me. A site based on the information scraped from the CSS files would solve our problem.

      Cheers Don
      • Profile picture of the author Brandon Tanner
        Originally Posted by donza

        There is a site where you can search all the html files on the net, for a very cheap price
        That sounds about as believable as the Nigerian emails I get that say I just inherited millions of dollars from a long-lost uncle I didn't even know I had.

        Just out of curiosity though, what's the URL?
        • Profile picture of the author donza
          Originally Posted by Brandon Tanner

          That sounds about as believable as the Nigerian emails I get that say I just inherited millions of dollars from a long-lost uncle I didn't even know I had.

          Just out of curiosity though, what's the URL?
          Globalogiq.com. I never tried it out, but there is a free trial.
          • Profile picture of the author Brandon Tanner
            Originally Posted by donza

            Globalogiq.com. I never tried it out, but there is a free trial.
            At $14 per 50,000 results, "all the HTML files on the net" (estimated at over 9 billion webpages now) would cost you somewhere in the neighborhood of $2.5 million.
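The arithmetic behind that figure can be checked with a few lines. This is only a back-of-the-envelope sketch using the numbers quoted in the post ($14 per 50,000 results, roughly 9 billion pages); the function name `estimated_cost` is made up, and it assumes the price scales linearly with no volume discount.

```python
PRICE_PER_BATCH_USD = 14          # price quoted in the post
RESULTS_PER_BATCH = 50_000        # results covered by that price
ESTIMATED_PAGES = 9_000_000_000   # rough page count quoted in the post


def estimated_cost(pages, results_per_batch=RESULTS_PER_BATCH,
                   price_per_batch=PRICE_PER_BATCH_USD):
    """Cost in USD to retrieve `pages` results, billed in whole batches."""
    batches = -(-pages // results_per_batch)  # ceiling division
    return batches * price_per_batch


print(f"${estimated_cost(ESTIMATED_PAGES):,}")  # $2,520,000
```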
    • Profile picture of the author mojojuju
      Originally Posted by Brandon Tanner


      Although if you want to crawl ALL of the CSS files on the internet, you'd probably need at least a couple hundred servers, an enterprise-level ISP with a really fast connection, and about 10 years of time. :p
      And a garage. Every great IT company started in a garage.
  • Profile picture of the author OmGz
    Google. Use their API, or crawl it the old way by regexing the search results. They have file-type filters and everything else available.
  • Profile picture of the author freeadstime
    We believe Google already indexes CSS files and uses some portion of them to understand things like the background color or how the contents of a page are styled. It just doesn't show results from them, since users want to see English, not code.
  • Profile picture of the author Andrew H
    This works fairly well.

    Add this to your google search query:

    Code:
    filetype:css
    Source, which is coincidentally the first Google result for 'google search through css files':
    How to search CSS / JavaScript files with Google? - Web Applications Stack Exchange

    Although if you can't type your questions into Google, I'm not sure how much success you will have with more complicated functions such as this.
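If you want to generate such queries for a list of search terms, a small sketch like this could build the search URLs. The helper name `css_search_url` and the example term are made up, and whether Google actually returns CSS files for a given term is not guaranteed.

```python
from urllib.parse import urlencode


def css_search_url(terms):
    """Build a Google search URL restricted to CSS files via filetype:css."""
    query = f"{terms} filetype:css"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(css_search_url("font-face"))
```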
