what factors should i consider in choosing proxies?

3 replies
i'm planning to build a PBN and am still looking for good proxies to use.
what factors should i consider in choosing proxies?

thanks,
Cat
#proxies
  • Yair Ida
    Hi Cat, here is an overview of the costs and benefits of the proxy options.

    Questions you must ask to understand your proxy service costs, and why looking at list price vastly underestimates costs.


    A major part of the cost of web data harvesting is the cost of a proxy service — required for sending your requests through multiple IPs for anonymity. Looking at their list price alone vastly underestimates their costs.

    Let's say you are comparing two proxy services: one priced at $1000/month and another at $3000/month. You go for the $1000 bargain, only to find that your fail rate is 50%, requiring you to allocate a full-time developer for IP rotation.

    Furthermore, you find that in 30% of your requests the website is sending you back misleading data (i.e. your deception rate is 30%), which reduces your products' profitability. These cost you thousands. So much for the bargain…

    To make better decisions you should be looking at TCO (Total Cost of Operation). TCO is much harder to evaluate, so to help you understand TCO we've provided the 6 key questions you must ask when comparing proxy services and data harvesting solutions. We've also provided a quick reference table comparing your alternatives.
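
    As a rough sketch of that arithmetic: the list prices below are the ones from the example above, while the labor and deception-loss figures are placeholder assumptions, not measured numbers. Plug in your own estimates.

```python
from dataclasses import dataclass


@dataclass
class ProxyOption:
    name: str
    list_price: float      # monthly list price in dollars
    rotation_labor: float  # ASSUMED monthly developer cost spent on IP rotation
    deception_loss: float  # ASSUMED monthly revenue lost to misleading data


def monthly_tco(option: ProxyOption) -> float:
    """Add the hidden monthly costs to the list price."""
    return option.list_price + option.rotation_labor + option.deception_loss


# List prices are from the example; every other figure is a hypothetical placeholder.
bargain = ProxyOption("bargain", 1000.0, rotation_labor=8000.0, deception_loss=2000.0)
premium = ProxyOption("premium", 3000.0, rotation_labor=0.0, deception_loss=0.0)

for option in (bargain, premium):
    print(f"{option.name}: list ${option.list_price:,.0f}, TCO ${monthly_tco(option):,.0f}")
```

    With these placeholder inputs the cheaper list price ends up with the higher monthly total, which is the point of looking at TCO rather than list price.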

    Proxy Service Alternatives

    The primary approaches for setting up web data harvesting are:

    Signing up to a data harvesting service

    Web data harvesting services, or web scraping services, provide the software and the underlying infrastructure for data collection. They may include proxy servers, data management and structuring, and a management layer for geo-location and IP rotation. Consider signing up to a data harvesting service if your project is simple, is not expected to scale much, or if you lack the know-how for developing data harvesting software.

    Developing the software and licensing the infrastructure

    In this case you would develop the data harvesting software yourself, either in-house, or by outsourcing, and would license the infrastructure that will allow you to route your requests through IPs in the target geo-locations.

    Infrastructure licensing breaks down into 3 alternatives:

    Renting cloud infrastructure:

    This includes services that offer cloud infrastructure in various locations around the world and allow you to route your requests through these locations. Geographical distribution is typically limited. For example: the DigitalOcean cloud infrastructure service is available in 7 cities as of Dec 1st 2015.

    Using a traditional proxy service:

    Traditional proxy services may provide thousands of IPs in multiple geo-locations as well as a management layer for IP rotation, IP allocation, and geo-location selection.

    Because proxy IPs are widely known, they are frequently blocked by websites, or fed misleading data meant to jeopardize your research. This problem applies to all of the approaches above, including cloud infrastructure and data harvesting services, because they all use identifiable IPs.

    Using a peer-to-peer proxy network:

    P2P proxy networks are very large networks of residential IPs. Unlike the other alternatives, the IPs are identified as personal and unlikely to be blocked or deceived. P2P networks can offer millions of exit nodes in every city in the world. A vendor currently offering this approach is Luminati, by Hola.

    The 6 Questions You Must Ask to Understand Your TCO

    What is the cost of resources for integration?

    Based on the expected integration complexity you should evaluate the total cost of developers, project managers, IT and QA for the integration phase.

    Consider whether you should develop the data harvesting software in-house or license it as a service. Developing in-house increases your integration costs, and reduces your ongoing costs. Review the API: services offering a convenient API will reduce your integration time and costs.

    What fail rate can I expect?

    Fail rate is the percentage of requests blocked by the website you are researching. Fail rate has the most dramatic impact on your TCO and is the single most underestimated TCO component. Once an IP is blocked you must switch (rotate) to a new IP. The process of IP rotation is resource-intensive and may require one or more developers at full capacity. Increased fail rates can bump up your TCO by thousands of dollars each month.

    Because traditional proxies and data harvesting services use known IPs, they are identified by the website and easily blocked, while a P2P proxy network uses residential IPs, so its fail rate is minimal.

    Fail rate is also affected by the type of data you collect: harvesting a social or ecommerce website will produce higher fail rates than harvesting a weather server. Using a high fail-rate solution to harvest high fail-rate data will result in catastrophic costs. Also bear in mind that fail rate impacts your project timeline and data freshness.
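
    To make "rotate to a new IP" and "fail rate" concrete, here is a minimal sketch. It assumes a hypothetical `fetch(url, proxy)` callable that returns `None` when the request is blocked; the proxy host names and the fake fetcher are stand-ins for illustration, not any vendor's API.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

Fetch = Callable[[str, str], Optional[str]]


def fetch_with_rotation(
    fetch: Fetch, url: str, proxies: Sequence[str], max_attempts: int = 5
) -> Tuple[Optional[str], int]:
    """Try proxies in order until one succeeds; return (body, failed_attempts)."""
    failures = 0
    for proxy in list(proxies)[:max_attempts]:
        body = fetch(url, proxy)
        if body is not None:
            return body, failures
        failures += 1  # blocked: rotate to the next IP
    return None, failures


def make_fake_fetch(block_probability: float, seed: int = 0) -> Fetch:
    """Stand-in for a real HTTP call: blocks each attempt with the given probability."""
    rng = random.Random(seed)

    def fake_fetch(url: str, proxy: str) -> Optional[str]:
        if rng.random() < block_probability:
            return None
        return f"data from {url} via {proxy}"

    return fake_fetch


def observed_fail_rate(fetch: Fetch, url: str, proxies: Sequence[str], requests: int) -> float:
    """Fraction of all attempts that were blocked, measured over many requests."""
    blocked = attempts = 0
    for _ in range(requests):
        body, failures = fetch_with_rotation(fetch, url, proxies)
        blocked += failures
        attempts += failures + (1 if body is not None else 0)
    return blocked / attempts if attempts else 0.0


# Hypothetical pool and target URL, simulated at a 50% block probability.
pool = [f"proxy-{i}.example:8080" for i in range(5)]
rate = observed_fail_rate(make_fake_fetch(0.5), "https://example.com/prices", pool, requests=1000)
print(f"observed fail rate: {rate:.0%}")
```

    Measuring the rate on a small trial run like this, against the sites you actually plan to harvest, gives you a number to put into your TCO estimate before you commit to a vendor.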

    What deception rate can I expect?

    Deception rate is the percentage of misleading data you are getting during your data harvesting. Because proxy IPs are easy to identify, websites that are interested in jeopardizing your data harvesting will reply to your requests with misleading data. This can affect your profitability and damage your brand. This TCO risk applies to any solution that uses known IPs.

    For example: a distributor monitors its retailers to make sure that its premium brand products are not being offered below the minimum price. Some of the affiliate sites are actually violating their agreements and selling below the minimum price. However, when they are monitored by the distributor, they identify the proxy IP and send back misleading prices that appear legitimate. As a result, the distributor's profitability is reduced and, furthermore, its brand positioning is damaged.
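
    One way to estimate a deception rate, sketched under the assumption that you can collect a small trusted reference sample (for instance, prices checked by hand) to compare against what came back through the proxy. The SKU names and prices below are made up for illustration.

```python
from typing import Dict


def deception_rate(
    observed: Dict[str, float], reference: Dict[str, float], tolerance: float = 0.01
) -> float:
    """Share of items, present in both samples, whose observed price differs from the reference."""
    shared = observed.keys() & reference.keys()
    if not shared:
        return 0.0
    mismatched = sum(1 for key in shared if abs(observed[key] - reference[key]) > tolerance)
    return mismatched / len(shared)


# Hypothetical data: prices seen through the proxy vs. a trusted manual check.
via_proxy = {"sku-1": 19.99, "sku-2": 24.99, "sku-3": 9.99, "sku-4": 14.99}
trusted = {"sku-1": 19.99, "sku-2": 21.99, "sku-3": 9.99, "sku-4": 14.99}

print(f"estimated deception rate: {deception_rate(via_proxy, trusted):.0%}")
```

    The reference sample only needs to be large enough to give a stable percentage; it does not have to cover everything you harvest.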

    Will I be receiving clean and structured data?

    Data harvesting services provide “cookie cutter” software which does not fit all types of websites. When using these services expect to get redundant and unstructured data and prepare to allocate resources for data cleansing and structuring.

    For long term projects consider developing your software in-house to avoid recurring data cleansing work. For short term projects this may not be worth your investment.

    Consider a data harvesting service for simple websites. For complex projects, develop your own custom software so that you get the data in precisely the format you want.

    How fast can I scale my operation up and down?

    Data harvesting often scales in both directions. For example: price monitoring activity can fluctuate significantly by season, so make sure you understand how long it will take your vendor to scale its service up or down.

    When you need to scale up and increase the number of IPs it might take your vendor a few hours to several days to configure and integrate new servers. This extends your project timeline and, as a result, your time to revenue. Furthermore, your staff is already allocated to the project and will be put on hold until the scaling process is completed.

    On the other hand, when you need to scale down and are delayed by your vendor, you are paying for resources you do not need.

    Overall, every hour of scaling delay adds to your cost.

    Does the price plan allow me to pay for what I actually use?

    If you’ve signed up to a pricing tier that allows you 1 million requests/month, and used only 500K requests out of the 1 million, you’ve wasted money. Look for a flexible pricing model that allows you to scale up and down according to your actual volume.
    If you expect your traffic to be dynamic you should prefer pricing models that offer unlimited activity, or pay-as-you-go models that allow you to pay for the actual traffic you have used.
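
    The tier example above works out as follows. The 1 million quota and 500K usage come from the example; the $1000 tier price and the pay-as-you-go rate are hypothetical figures chosen only to make the arithmetic visible.

```python
def wasted_spend(tier_price: float, tier_quota: int, used_requests: int) -> float:
    """Money paid for the unused part of a flat-priced tier."""
    unused = max(tier_quota - used_requests, 0)
    return tier_price * unused / tier_quota


def cost_per_used_request(tier_price: float, used_requests: int) -> float:
    """Effective price of each request you actually made on a flat tier."""
    return tier_price / used_requests


def pay_as_you_go_cost(rate_per_request: float, used_requests: int) -> float:
    """Total cost when you are billed only for the requests you made."""
    return rate_per_request * used_requests


TIER_PRICE = 1000.0    # hypothetical monthly price of the tier
TIER_QUOTA = 1_000_000  # requests included in the tier (from the example)
USED = 500_000          # requests actually made (from the example)
PAYG_RATE = 0.0015      # hypothetical pay-as-you-go price per request

print(f"wasted on tier: ${wasted_spend(TIER_PRICE, TIER_QUOTA, USED):,.2f}")
print(f"effective tier cost per request: ${cost_per_used_request(TIER_PRICE, USED):.4f}")
print(f"pay-as-you-go total: ${pay_as_you_go_cost(PAYG_RATE, USED):,.2f}")
```

    Running the same comparison for your quietest and busiest months shows whether a flat tier or a usage-based plan fits your traffic better.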

    TCO Compared


    Use this table as a quick reference to compare the alternative TCOs. It is indicative, with the comparative costs rated high, medium or low.

    Conclusion

    While there are many considerations when comparing your data harvesting tools, hopefully this article was useful in helping you understand TCO, which we found to be the most critical yet vastly underestimated topic for companies involved in web data collection. Asking these 6 questions during your evaluation is the first important step in making better decisions and controlling your budget.

    Luminati
  • yukon
    Banned
    Originally Posted by catzornitta View Post

    i'm planning to build a PBN and am still looking for good proxies to use.
    what factors should i consider in choosing proxies?

    thanks,
    Cat


    That doesn't make sense.

    Why do you care about proxies just because you're building a PBN? Who are you hiding from?
    • MikeFriedman
      Originally Posted by yukon View Post

      That doesn't make sense.

      Why do you care about proxies just because you're building a PBN? Who are you hiding from?

      Exactly. You do not need proxies to build a private network.
