Online XML sitemap generators that don't need to be installed on your own server

5 replies
Hey guys

I am in a bit of a pickle right now with my XML sitemap generation.

Background:

My website is over 400,000 pages and growing constantly every day. This is proving hard to keep up with when trying to find out exactly how many pages there are. The site is built on the CodeIgniter PHP framework and pulls from 5 databases.

I have tried to create an XML sitemap via a script on my server that generates URLs from the DB. This isn't really working, as it breaks every now and then. It's too server-intensive and kills the site. I have had my developers try to optimise it by adding delays between the DB queries and so on, but there are just too many pages to index.
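
For context, here is roughly what that script is trying to do, as a simplified sketch only; the pages table, the id and slug columns, the connection details and the example.com domain are all placeholders, not our real setup. It pages through the DB in small chunks, splits the output into files of at most 50,000 URLs (the sitemap protocol limit per file), and pauses between batches so the DB gets a breather:

<?php
// Simplified sketch of a chunked sitemap build. Table/column names (pages, id, slug),
// credentials and the domain are placeholders, not the real setup.
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

$batchSize = 500;    // rows pulled per query
$perFile   = 50000;  // sitemap protocol limit per file
$lastId    = 0;
$fileIndex = 1;
$inFile    = 0;

function openSitemap($i) {
    $fh = fopen("sitemap-$i.xml", 'w');
    fwrite($fh, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
    fwrite($fh, "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
    return $fh;
}

$fh = openSitemap($fileIndex);

while (true) {
    // Keyset pagination (WHERE id > ?) instead of OFFSET, so each query stays cheap.
    $stmt = $pdo->prepare('SELECT id, slug FROM pages WHERE id > ? ORDER BY id LIMIT ' . $batchSize);
    $stmt->execute(array($lastId));
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    if (!$rows) break;

    foreach ($rows as $row) {
        if ($inFile >= $perFile) {   // roll over to the next sitemap file
            fwrite($fh, "</urlset>\n");
            fclose($fh);
            $fh = openSitemap(++$fileIndex);
            $inFile = 0;
        }
        $loc = 'http://www.example.com/' . htmlspecialchars($row['slug'], ENT_QUOTES);
        fwrite($fh, "  <url><loc>$loc</loc></url>\n");
        $inFile++;
        $lastId = $row['id'];
    }

    usleep(250000);  // quarter-second pause between batches to keep the DB responsive
}

fwrite($fh, "</urlset>\n");
fclose($fh);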

I did a few Google searches and found xml-sitemaps.com, which is also a script installed on the server to do pretty much the same thing, only it doesn't access the DB tables but actually crawls all the pages. This isn't really working either, as it breaks all the time and never completes the full sitemap generation. With delays added to the crawling, it would take more than 5 days to index everything, but I can't get it to run continuously. It breaks every few hours and I have to restart it.

Anyway, I am looking into alternative solutions where an external sitemap generation tool does all of this for me, so I don't have to worry about extra load on the server. I would still like some control over the crawling, like introducing a delay after every 10 pages, so that site speed is not affected.

Any ideas where I can find such a tool / service?
#generators #installed #online #server #sitemap #xml
  • mogulmedia
    Originally Posted by alitech


    My website is over 400,000 pages and growing constantly every day.
    Holy cow Batman! You've been busy!

    Is there no way you can set your script up to run a certain number of pages at a time so it doesn't time out?
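
    Something along these lines, just as a rough sketch (the state-file path, the batch size and the get_next_page_urls() helper are made up, standing in for whatever your script already does): let cron kick it off every few minutes, have each run handle one fixed slice, and save where it got to so the next run carries on instead of starting over.

    <?php
    // Rough sketch: each cron run (e.g. */5 * * * * php build_sitemap.php) handles one
    // slice of pages and records its position, so no single run can hit a timeout.
    // get_next_page_urls() is a hypothetical stand-in for the existing DB lookup.
    $stateFile = '/tmp/sitemap-offset.txt';
    $offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;
    $batch  = 2000;

    $urls = get_next_page_urls($offset, $batch);  // next $batch page URLs after $offset
    if ($urls) {
        $fh = fopen('sitemap-partial.xml', 'a');  // append this run's slice
        foreach ($urls as $url) {
            fwrite($fh, '  <url><loc>' . htmlspecialchars($url, ENT_QUOTES) . "</loc></url>\n");
        }
        fclose($fh);
        file_put_contents($stateFile, $offset + count($urls));  // resume point for the next run
    }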

    Surely if your site is that big though, Google will find it impossible not to notice all the pages as long as you have a good internal link structure?

    Apart from that I can't help, sorry!

    • Crew Chief
      Originally Posted by alitech

      My website is over 400,000 pages and growing constantly every day. This is proving hard to keep up with when trying to find out exactly how many pages there are.

      I have tried to create an XML sitemap via a script on my server that generates URLs from the DB. This isn't really working, as it breaks every now and then. It's too server-intensive and kills the site. I have had my developers try to optimise it by adding delays between the DB queries and so on, but there are just too many pages to index.
      With 400k pages, and based on the issues you are having at the moment, it sounds to me like you have a "site structure" issue that very few, if any, software applications are going to magically solve.

      Unfortunately, you really need to look at making [major] adjustments to your site's architecture.

      Think about it: if your site is set up to parse and then render 400k+ pages every time a bot, spider or person calls up your sitemap, you would essentially need a redundant array of super servers to meet such a demand.
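
      Even if the pages themselves stay dynamic, the sitemap doesn't have to be: it can be a handful of static files built in the background and simply handed out by the web server, nothing rendered per request. Something like the standard sitemap index format below (example.com and the file names are only placeholders), with each referenced file holding at most 50,000 URLs:

      <?xml version="1.0" encoding="UTF-8"?>
      <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <sitemap>
          <loc>http://www.example.com/sitemap-1.xml.gz</loc>
        </sitemap>
        <sitemap>
          <loc>http://www.example.com/sitemap-2.xml.gz</loc>
        </sitemap>
      </sitemapindex>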

      Giles, the Crew Chief
  • DogScout
    Some people I know have used this on sites of up to 10 million pages with no issues.
    Google has a limit on how many pages it will index per day (which is why it doesn't need 'a redundant array of super servers' to crawl the site), but it may take several weeks to index every page.
    Good luck

    http://code.google.com/p/googlesitemapgenerator/
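
    Whichever generator you end up using, once the files exist you can point the engines at the index from robots.txt (the URL below is only an example) and/or submit it in Google Webmaster Tools:

    Sitemap: http://www.example.com/sitemap_index.xml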
    • alitech
      This Google software seems good, and there is lots of detail on how to use it. The installation, though, looks like a major piece of work for a Linux virgin. It also runs on the server, which may not be the best solution.
      Any other suggestions?
  • DogScout
    You could try CoffeeCup Sitemapper. It would probably take all night to do that many pages, though. It runs on your own computer and uses your computer's resources, so you may not be able to do much else while it runs unless you have a dual core or better. It is cheap. I thought you had to redo the entire map every time you added pages, rather than it just adding the new ones (or it didn't when I bought it), so I used it mostly for static sites that were not updated often.

    (Just looked, and apparently it does have an 'update only' mode, so it doesn't have to crawl the entire site. No idea how it will hold up on a site as large as yours; the most I have used it on is an 800-page site.)

    Sitemapper - Better Search Engine Visibility in Just a Few Clicks. | CoffeeCup Software

    Might be worth a try? There's a free trial before buying, so you could test whether it will handle a site of your size.
