We are happy to have reached 900,000 indexed proxies, but at the same time we are sad to announce that a small bug in our API was showing incorrect data for active proxies.
We don’t want to mislead our clients, nor to chase new customers with marketing tricks. Skyul is here to provide a fresh proxy list daily.
Because of a limitation in our API, we delivered only 10,000 active proxies at once instead of delivering them all, so we have probably been missing some of the active proxies over the past month. This wasn’t done on purpose; it was an API limitation that had remained in our code since the first versions, and we didn’t notice it until now.
Another change in our proxy checker: we reduced the connection time for each IP from 5 seconds to 4 seconds, a small tweak that lets us scan the list even faster. Note that this is not the same as the proxy connection timeout.
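If you check proxies yourself with PHP’s curl extension, the same distinction shows up as two separate options. This is only a sketch of how the two limits differ; the URL and the 10-second total limit are illustrative values, and only the 4-second connect time mirrors the tweak above:

```php
// Sketch: connection timeout vs. total request timeout in PHP curl.
$ch = curl_init('http://example.com');

// Time allowed just to establish the connection (the 4-second tweak above).
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 4);

// Separate limit for the whole request, connection included (illustrative value).
curl_setopt($ch, CURLOPT_TIMEOUT, 10);

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $response = curl_exec($ch);  // run the actual check
curl_close($ch);
```

A proxy can pass the connect phase quickly and still be slow to deliver data, which is why the two limits are configured independently.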
Sorry for this little inconvenience. New proxies are on the way, without a limit!
When you want to start web scraping, it’s always advisable to use proxies to protect your host from getting misleading results or from being banned by the source you are trying to scrape.
Web crawling isn’t that hard to do, but it requires some extra steps to make sure that you don’t get bad results.
An example of a simple web crawler:
# plain and simple web scraper
$str = file_get_contents('http://example.com');
As I mentioned earlier, this isn’t enough if you want a long-term solution. First of all, you will have to find something constant in the website, to ensure that you always fetch the full page.
My first suggestion when scraping a page is to add a retry option and search for that constant value you found on the website.
$retries = 4;
for ($i = 0; $i < $retries; $i++) {
    $str = file_get_contents('http://example.com');
    // when the constant value was found, stop retrying
    if (strpos($str, '<constant value>') !== false) {
        break;
    }
}
In this example we just wanted to demonstrate how easy basic web scraping is and a quick method to improve the results. In the next post we will add proxy support and curl to our web crawler.