There are times when you need to run scraping jobs that last for hours, and a slow internet connection can stretch them out even further. You might consider renting a VPS for a whole month just for a single scraping task; even if you get some money back or start with a free trial, eventually you'll have to pay.
Fortunately, there's a cost-effective solution provided by Google that's accessible to every user. In this post, we'll guide you through setting it up, creating a scraping environment, uploading your code, and running it. So, grab a cup of coffee and let's embark on this journey.
-
Requirements:
All you need for this is a basic understanding of Linux commands and a Google account.
-
Getting Started:
To begin, simply open Google Cloud Shell at shell.cloud.google.com and wait for the session to load. Google grants you a shell account with root access, a 4-core CPU, 16GB of RAM, and 4GB of storage (which can be linked to Google Drive), along with a persistent tmux session that stays open as long as your browser window is active. Familiarity with tmux navigation and shortcuts is helpful, but not mandatory.
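If you do want to manage the session yourself, a handful of tmux commands cover most needs. Here's a minimal sketch; the session name "scrape" is just an example:
tmux new -s scrape        # start a named session
tmux ls                   # list running sessions
tmux attach -t scrape     # reattach later (Ctrl-b, then d, detaches without stopping anything)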
From there, you can create a virtual environment to install all the Python packages you require.
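For instance, a typical setup might look like this (the environment name "venv" and the requests package are just examples; install whatever your scraper actually needs):
python3 -m venv venv          # create the virtual environment
source venv/bin/activate      # activate it for the current shell
pip install requests          # install your scraper's dependencies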
To upload your script, we'll use a web service called "transfer.sh" and the "curl" tool. First, compress your script folder into a single zip file and upload it online using the following commands on your local machine:
zip -r script.zip script_folder
curl --upload-file script.zip https://transfer.sh/script.zip
The command will print a download URL for the uploaded "script.zip" file (the path includes a random token), which you can then fetch inside the Google Shell with this command:
wget https://transfer.sh/vlkmfv/script.zip
Afterward, simply unzip the file with this command:
unzip script.zip
With these steps completed, your scraper folder should be ready. Just navigate to it, install your packages, and you're all set to launch your scraping tasks.
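Putting it all together, the final steps might look like this ("script_folder", "requirements.txt", and "scraper.py" are placeholder names for your own project):
cd script_folder                      # enter the unzipped project
python3 -m venv venv                  # recreate the environment in Cloud Shell
source venv/bin/activate
pip install -r requirements.txt       # install the project's dependencies
python scraper.py > scraper.log 2>&1  # run the scraper, logging output to a file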
-
Conclusion:
Google provides a high-speed internet connection, a 4-core CPU, and 16GB of RAM, making it suitable for a wide range of web scraping tasks, from simple Python requests to browser automation with tools like Selenium or Playwright. This is particularly valuable when your scrapers need those capabilities and you don't want to overload your own machine.
However, there are some downsides to using this service. The storage space is limited to 4GB, which can be problematic for scraping large files or images. Additionally, you must keep the browser window with the session webpage open at all times, or you risk losing your scraping progress. Despite these limitations, Google's free VPS is an excellent choice for one-time scraping tasks where speed is crucial.