Spoofing your Scrapy bot IP using tsocks

Many websites show different content depending on the region they're accessed from. For example, some retailer sites show only the products available in the region (US, Europe) of the user accessing the site.

Although this can be quite convenient for the website's customers, it can be a pain for developers who are writing a spider for the site and running it from their local machines.

There is a simple way to proxy all requests so that they appear to come from another server. You only need SSH access to that server; there is no need to install an HTTP proxy on it. For this, you can use a program called tsocks.

Here’s how to do it on Ubuntu, though this recipe should be easy to extend to other Linux distros.

First, install tsocks with:

$ sudo apt-get install tsocks

Then add this content to ~/.tsocksrc (update: in recent versions the settings are stored in ~/.tsocks.conf, though the location may vary across distributions):

server = 127.0.0.1
server_type = 5
server_port = 9999
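If you switch between remote servers or ports often, you may prefer to generate this file rather than edit it by hand. Here is a minimal sketch in Python; the function names `render_tsocks_conf` and `write_tsocks_conf` are made up for this example, and the target path is left to you because the config location varies.

```python
from pathlib import Path


def render_tsocks_conf(host: str = "127.0.0.1", port: int = 9999,
                       socks_version: int = 5) -> str:
    """Return the three config lines shown above as a single string."""
    # tsocks only speaks SOCKS version 4 or 5.
    if socks_version not in (4, 5):
        raise ValueError("socks_version must be 4 or 5")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return (
        f"server = {host}\n"
        f"server_type = {socks_version}\n"
        f"server_port = {port}\n"
    )


def write_tsocks_conf(path: Path, **options) -> Path:
    """Write the rendered config to `path` (hypothetical helper)."""
    path.write_text(render_tsocks_conf(**options))
    return path
```

For example, `write_tsocks_conf(Path.home() / ".tsocksrc")` would recreate the file above with the default values.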

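Next, SSH to the remote server you want to use. The -D 9999 option makes ssh open a local SOCKS proxy on port 9999, which matches the server_port set above: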

$ ssh -D 9999 some_remote_server
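Before starting the crawl, it can be worth checking that the tunnel is really up. The sketch below is one possible way to do that: it connects to the local port and sends a SOCKS5 greeting offering the "no authentication" method, which is what a proxy opened with ssh -D should accept. The function name `is_socks5_proxy` is hypothetical, and the host and port defaults simply mirror the values used in this post.

```python
import socket


def is_socks5_proxy(host: str = "127.0.0.1", port: int = 9999,
                    timeout: float = 2.0) -> bool:
    """Return True if something at host:port answers a SOCKS5 greeting."""
    # Greeting: version 5, one method offered, method 0x00 (no auth).
    greeting = b"\x05\x01\x00"
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(greeting)
            # A SOCKS5 server replies with two bytes: version, chosen method.
            reply = conn.recv(2)
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False
    return reply == b"\x05\x00"
```

If this returns False, the SSH session has probably dropped or was started with a different port than the one in your tsocks config.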

And finally, in another terminal (without closing the SSH console), just run Scrapy by prefixing it with the tsocks command, like this:

$ tsocks scrapy crawl myspider

That’s all. Your spider will run on your local machine while proxying all communication through the remote server. No need to change any settings or configuration.
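If you start your crawls from a Python script rather than a terminal, the same command can be launched as a subprocess. As far as I know, tsocks works by preloading a library into the process it starts, so the wrapper has to start Scrapy itself; it cannot be switched on inside a Python process that is already running. The following is only a sketch, and both function names are invented for this example.

```python
import shutil
import subprocess
from typing import List


def build_tsocks_command(spider_name: str) -> List[str]:
    """Return the same command line used above, as an argument list."""
    return ["tsocks", "scrapy", "crawl", spider_name]


def run_spider_through_tsocks(spider_name: str) -> int:
    """Run the spider via tsocks and return the process exit code."""
    # Fail early with a clear message if tsocks is not installed.
    if shutil.which("tsocks") is None:
        raise FileNotFoundError("tsocks not found on PATH; install it first")
    # Run this from inside the Scrapy project directory, as you would
    # when typing the command by hand.
    return subprocess.run(build_tsocks_command(spider_name)).returncode
```

Calling `run_spider_through_tsocks("myspider")` is then equivalent to the command shown above, provided the SSH session is still open.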
