The hosting problem you'd like to have.

by HN Banned
8 replies
A few days ago I had to upgrade my hosting plan. I was on a $10 plan, so I was enjoying a very good ROI until my sites started exceeding the CPU usage limits. The cause was Googlebot, which was constantly crawling my site at 4 pages per second, or about 350,000 pages per day. So I moved the site to a private server. It gets only 5,000 page views a day, which is nothing compared to the Googlebot activity. That's why I was hosting it on a shared host; it was also a challenge to optimize the site's engine so that it would process each page as fast as possible. Database queries were taking between 0.0004 and 0.002 seconds.
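
For anyone who wants to check the numbers, here is a rough sketch of the arithmetic. The function names are made up for illustration, and it assumes the crawl rate is sustained around the clock.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_day(pages_per_second: float) -> int:
    """Convert a sustained crawl rate into a daily page count."""
    return round(pages_per_second * SECONDS_PER_DAY)


def crawler_share(crawler_pages_per_day: int, human_views_per_day: int) -> float:
    """Fraction of all daily requests that come from the crawler."""
    total = crawler_pages_per_day + human_views_per_day
    return crawler_pages_per_day / total


if __name__ == "__main__":
    googlebot_daily = pages_per_day(4)  # the "about 350,000" figure from the post
    share = crawler_share(googlebot_daily, 5_000)
    print(f"Googlebot at 4 pages/s: {googlebot_daily:,} pages per day")
    print(f"Googlebot at 2 pages/s: {pages_per_day(2):,} pages per day")
    print(f"Crawler share of all requests: {share:.1%}")
```

With 5,000 human page views a day, almost all of the load comes from the crawler, which is why the CPU limit was hit despite the low visitor count.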

My options were to make the queries even faster, limit the crawl rate to 2 requests per second, or upgrade the hosting.

I wonder what you would have done?
#hosting #problem
  • Profile picture of the author DireStraits
    The 2 pages per second limit is the very highest setting in Webmaster Tools, isn't it? You can take it way lower than that, and probably should unless your content changes constantly?
    • Profile picture of the author HN
      Banned
      Originally Posted by DireStraits View Post

      The 2 pages per second limit is the very highest setting in Webmaster Tools, isn't it? You can take it way lower than that, and probably should unless your content changes constantly?
      The default is probably 2.
      For two of my sites I can set the limit to 10 pages per second. All the other sites have a max crawl rate of 2 pages per second, like you pointed out. It's not that the content changes constantly, it's the new pages that are being crawled. My indexed page count went up from 1.8 million on 2/2/2014 to 5.2 million on 3/16/2014.
      It's the first million pages that took the longest, probably a year to get indexed, now they are adding half a million per week.
      I have no idea how many pages my flagship site has, probably 60 million, so it will take two more years before I start getting real traffic.
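
As a sanity check on the "two more years" estimate, here is a minimal sketch of the arithmetic. It assumes the dates are in US month/day/year format, that the indexing rate stays roughly constant, and that the 60 million total is right (it is only a guess). The function names are made up for illustration.

```python
from datetime import date


def weekly_index_rate(start_count: int, end_count: int, start: date, end: date) -> float:
    """Average number of pages indexed per week between two dates."""
    days = (end - start).days
    return (end_count - start_count) / days * 7


def weeks_to_index(total_pages: int, indexed_now: int, pages_per_week: float) -> float:
    """Weeks needed to index the remaining pages at a constant weekly rate."""
    return (total_pages - indexed_now) / pages_per_week


if __name__ == "__main__":
    # Figures from the post: 1.8M indexed on 2/2/2014, 5.2M on 3/16/2014.
    observed = weekly_index_rate(1_800_000, 5_200_000, date(2014, 2, 2), date(2014, 3, 16))
    print(f"Observed rate: {observed:,.0f} pages per week")

    # Using the rounder "half a million per week" and the guessed 60M total.
    weeks = weeks_to_index(60_000_000, 5_200_000, 500_000)
    print(f"Remaining: {weeks:.1f} weeks, about {weeks / 52:.1f} years")
```

The two dates are 42 days apart, so the observed rate comes out a little above half a million pages per week, and the remaining pages take roughly two years at that pace.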
      • Originally Posted by HN View Post

        The default is probably 2.
        For two of my sites I can set the limit to 10 pages per second. All the other sites have a max crawl rate of 2 pages per second, like you pointed out. It's not that the content changes constantly, it's the new pages that are being crawled. My indexed page count went up from 1.8 million on 2/2/2014 to 5.2 million on 3/16/2014.
        It's the first million pages that took the longest, probably a year to get indexed, now they are adding half a million per week.
        I have no idea how many pages my flagship site has, probably 60 million, so it will take two more years before I start getting real traffic.
        What kind of site are you running!?
      • Profile picture of the author DireStraits
        Originally Posted by HN View Post

        It's the first million pages that took the longest, probably a year to get indexed, now they are adding half a million per week.
        [...]
        I have no idea how many pages my flagship site has, probably 60 million, so it will take two more years before I start getting real traffic.
        Aha - no more need be said. What you've got there is what I'd call a bloody great whale of a site. Different "rules". :p
        • Profile picture of the author HN
          Banned
          Originally Posted by Michael Levanduski View Post

          What kind of site are you running!?
          Can't really say. But it's an autopilot site. Took a day to set up. Stumbled upon the idea accidentally.
          Before that I was doing it the old-fashioned way: either (1) writing content myself or (2) buying it. There are two more ways to do business: (3) having others produce content for free, and (4) having others produce it and pay you for the privilege of working for you. That one is pretty cool, huh?

          Originally Posted by DireStraits View Post

          Aha - no more need be said. What you've got there is what I'd call a bloody great whale of a site. Different "rules". :p
          Yeah, I have surpassed ehow.com by indexed page count, 5 million vs. their 4 million pages, but they probably make 1000 times more money.
          Youtube.com, with more than 5 billion pages, is out of reach for now, but half a billion should be doable. (That's my next goal.)
  • Profile picture of the author David Beroff
    A few hits a second seems reasonable. My host (a reseller of Rackspace Cloud Sites) charges $5/mo for starters, but the Cloud dynamically grows to adjust to the correct level to handle such traffic. Obviously the cost grows, as well, but that's likely not as much of an issue for you.

    And yes, database optimization is also key.

    Let me know if you have any questions.
    Signature
    Put MY voice on YOUR video: AwesomeAmericanAudio.com
  • Profile picture of the author bgr
    You need a host that installs something like Varnish. That will allow you to serve much more traffic than you do now. You also need to tune Apache and set up swap correctly so you can handle the traffic. After installing Varnish, tuning Apache, and setting swappiness to less than 10, I have seen 100 requests a second on my server and I never go over 20% CPU.
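
If you want to see where a server currently stands on the swappiness setting, here is a minimal sketch. It assumes a Linux host, where the kernel exposes the value at /proc/sys/vm/swappiness. The function names are made up for illustration, and the threshold of 10 is simply the figure mentioned above, not a general recommendation.

```python
from pathlib import Path

# Standard location on Linux; other systems do not expose this file.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Parse the contents of the swappiness file into an integer."""
    return int(text.strip())


def is_below(value: int, threshold: int = 10) -> bool:
    """True if the swappiness value is under the given threshold."""
    return value < threshold


if __name__ == "__main__":
    if SWAPPINESS_PATH.exists():
        current = parse_swappiness(SWAPPINESS_PATH.read_text())
        verdict = "below" if is_below(current) else "not below"
        print(f"vm.swappiness is {current}, which is {verdict} 10")
    else:
        print("No swappiness file found; this is probably not a Linux host.")
```

This only reads the value. Changing it requires root, typically through sysctl.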

    If you have any real traffic then hosting accounts are a waste of time. My best advice is to set up a VPS at DigitalOcean. For $10 a month you can probably handle 10-30k uniques a day from real traffic. Bots like Googlebot are pretty low overhead if your server is set up correctly. If you had a proper cache, your overhead would drop a lot.

    The downside: you will either have to learn how to run a server or pay someone to help you. If you want to pay someone, Rackspace is good, or look for a managed VPS somewhere. You also need at least 1 GB of RAM.
    • Profile picture of the author David Beroff
      Originally Posted by bgr View Post

      The downside [of a VPS is that] you will either have to learn how to run a server or pay someone to help you. If you want to pay someone, Rackspace is good, or look for a managed VPS somewhere.
      Agreed, and managing a server is a lot of responsibility. (I used to do that years ago, but no more.) That's one of the reasons why I recommend Rackspace's Cloud Sites (link above), because server management is included in that $5/mo rate.
