SEO: How do I get 1 million pages indexed?

11 replies • SEO
Hi all!
I have a very specific question regarding SEO. My translation website is massive because it has 14 million words in 570 languages and some 80 million translations (think URLs to index). So what is the best way to get this or parts of it indexed? The crawl budget will probably be very limiting.

I have the possibility to reduce the quantity of words because I have the word use frequency data, so basically I could just put the most used words in the sitemap. This should make it possible to get down to 1 million URLs. Still a lot...

Some of the languages have word definitions (currently 4, but in the future around 90), which means that some pages have a lot of text content while others have little.

[site info removed by moderator]

This means that some pages will have little content. How does that affect the indexing?
#indexed #million #pages #seo
  • Profile picture of the author MikeFriedman
    A couple of things.

    First, you can only have 50,000 URLs in a single sitemap.

    I would create multiple sitemaps based on some sort of category structure, and then a sitemap index file (or several) that references those sitemaps.
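A minimal sketch of that split-and-index approach, using only the standard library (the chunking helper and file naming are illustrative assumptions, not part of the sitemap protocol itself):

```python
from xml.sax.saxutils import escape

SITEMAP_LIMIT = 50_000  # Google's per-file URL cap

def chunk(urls, size=SITEMAP_LIMIT):
    """Split the full URL list into sitemap-sized batches."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_sitemap(urls):
    """Return one <urlset> document for up to 50,000 URLs."""
    assert len(urls) <= SITEMAP_LIMIT
    body = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{body}\n</urlset>\n")

def build_index(sitemap_urls):
    """Return a <sitemapindex> document referencing the per-category sitemaps."""
    body = "\n".join(f"  <sitemap><loc>{escape(u)}</loc></sitemap>"
                     for u in sitemap_urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{body}\n</sitemapindex>\n")
```

One sitemap per category (or per language, for a site like this) also makes it easier to see in Search Console which sections are actually getting indexed.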

    Second, you will want to make sure you have a very good site architecture. How that would look for a site like this, I'm not entirely sure off the top of my head. I would look at some competitors to see how they are doing it. Make sure you have lots of internal links throughout the site to make everything reachable.

    Lastly, if the purpose of the site is just to offer translations or definitions of words, I would strongly reconsider this project before you get too deep into it. Google offers those answers right now directly in the SERPs, so the chance of getting search engine traffic is pretty low.
  • Profile picture of the author HenrikA
    Yes, I know about the 50k limit, which applies to both sitemap files and sitemap index files. Fortunately, Google accepts up to 500 sitemap index files, which gives a theoretical maximum of 1.25 trillion URLs.
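The arithmetic behind that ceiling can be checked quickly (the per-file and per-account limits are as quoted in the post):

```python
# Capacity claim from the post: 500 sitemap index files, each able to
# reference 50,000 sitemaps, each sitemap holding 50,000 URLs.
MAX_INDEX_FILES = 500
MAX_SITEMAPS_PER_INDEX = 50_000
MAX_URLS_PER_SITEMAP = 50_000

capacity = MAX_INDEX_FILES * MAX_SITEMAPS_PER_INDEX * MAX_URLS_PER_SITEMAP
```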

    I'm way too deep into it already. However, Google only has some 120 languages and I have 570.

    The structure of the site is basically

    <domain>/<languages>/<language:word>

    Where
    - languages is a comma-separated list of the user's languages. Example: `en,fr,de`
    - language:word is the language and the word looked up. Example: `en:hi`

    So looking up the English word run in French and Spanish would be:
    /en,fr,es/en:run

    That's it in terms of structure...
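A minimal sketch of building those paths (the helper name is hypothetical; the format follows the scheme above):

```python
def word_url(user_langs, lang, word):
    """Build a lookup path like /en,fr,es/en:run from the scheme above.

    user_langs: the user's comma-separated language list, e.g. ["en", "fr", "es"]
    lang, word: the language and word being looked up, e.g. "en", "run"
    """
    return f"/{','.join(user_langs)}/{lang}:{word}"
```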
    • Profile picture of the author MikeFriedman
      Originally Posted by HenrikA View Post

      [...] Google only has some 120 languages and I have 570.

      And I would imagine a lot of those 450 additional languages are spoken in places where the internet is not very accessible.

      When I mentioned the site structure, I'm mostly concerned with internal linking. That is largely what will determine whether things get crawled and indexed or not. If a lot of pages are 5-6 hops from the home page, they will be a lot less likely to be indexed.

      Anyhow, good luck.
  • Profile picture of the author valida
    Yes, I have the same problem as you. Google is slow to index; it only picks up roughly one in five of the URLs per sitemap file.
  • Profile picture of the author Rahullsharma
    Some strategies:

    Keep each sitemap to 40-50k URLs.

    Google Search Console (formerly Webmaster Tools) allows you to request an increased crawl rate. Try doing that if you haven't already.

    Take another look at your navigation architecture to see if you can't improve access to more of your content. Look at it from a user's perspective: If it's hard for a user to find a specific piece of information, it may be hard for search engines as well.

    Make sure you don't have duplicate content because of inconsistent URL parameters or improper use of slashes. By eliminating duplicate content, you cut down on the time Googlebot spends crawling something it has already indexed.

    Use related content links and in-site linking within your content whenever possible.
    Randomize some of your links. A sidebar with random internal content is a great pattern to use.
  • Profile picture of the author HenrikA
    Thanks for the advice!

    The navigation architecture is search-based. You basically search and select a word to open its page. Very similar to WordReference and GTranslate.

    Each word page (one per language translated into) has internal links to all its translations and synonyms.

    I can put in place a word index for each language, but even that is a lot of words: for example, I have some 600,000 English words, and in the index I present them in a folder-based tree structure with at most 100 words per page, giving an index with some 6,000 leaf pages (far fewer for the smaller languages). That puts any word page within 1-5 hops.
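That hop count can be sanity-checked with a small sketch (the balanced-tree layout is an assumption; the 100-entries-per-page limit and word counts are from the post):

```python
import math

def hops_to_word(words, per_page=100):
    """Hops from the top index page to a word page, assuming a balanced
    tree with at most per_page entries on each index page."""
    index_levels = math.ceil(math.log(words, per_page))
    return index_levels + 1  # one final hop from the deepest index page
```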

    Would that be helpful?
  • Profile picture of the author HenrikA
    A related question:

    Do I have to put a link in the sitemap for every page (word) I want Google to index? Or is it enough to put the above-mentioned index pages and Google would happily hop around and collect it all?
  • Profile picture of the author ZephyrIon
    Before you waste your time: just uploading sitemaps does not mean the pages will be indexed.

    Are these pages hand-made, or auto-generated? If auto-generated, they will share much of the same content.

    Even if the search engine crawls a page you submitted in a sitemap, that still does not mean it will be indexed.

    Read more here https://developers.google.com/search...w-search-works
  • I have tried to promote this type of website...

    It doesn't work. Unless you have unique content with 500+ words per page, and every page is totally unique, your content will not make it into the index. Definitions can't be unique content, because there are already a lot of dictionary websites with the same content.
  • Profile picture of the author doppcall
    I have been facing the same problem for the last couple of months...
  • Profile picture of the author JamesMan
    Try submitting your sitemap and adding schema markup.