Web Scraping with Scrapy and Splash

Using Scrapy framework and Splash to scrape Yescapa website

Have you ever considered renting a caravan for your next adventure? I have recently found myself drawn to the idea of caravan renting, and with 2023 in full swing, I can’t help but wonder if this is still a smart investment.

Scrapy is a fast, open-source web crawling and scraping framework for Python. It has built-in support for common scraping tasks such as managing cookies, setting user agents, and following pagination links, and it parses pages with XPath and CSS selectors. Splash is a lightweight JavaScript rendering service with an HTTP API, which lets Scrapy scrape pages that build their content with JavaScript.

There are several steps before crawling the website.

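1) Install Docker and run Splash

With Docker installed, pull the scrapinghub/splash image and run it with port 8050 published. The Splash web interface should then be reachable in your browser at localhost:8050.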

2) Install libraries and create a Scrapy project

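Install Scrapy and the scrapy-splash plugin with pip (pip install scrapy scrapy-splash). Once they are installed, start a project with the scrapy startproject command followed by a project name of your choice.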

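Your project will look something like this: a scrapy.cfg file at the top level, and a package folder with the project's name that holds items.py, middlewares.py, pipelines.py, settings.py, and a spiders folder.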

3) Splash configuration on the Scrapy project

Inside your project folder, open the settings.py file and add the following lines:
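The original settings are not reproduced here. As a sketch, these are the values the scrapy-splash README recommends, assuming Splash runs locally on its default port 8050:

```python
# settings.py (sketch): wire scrapy-splash into the project.
# Assumes Splash is reachable on its default local port.
SPLASH_URL = "http://localhost:8050"

# Middlewares that send requests through Splash and keep cookies in sync.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Avoids storing duplicate Splash arguments in the request queue.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filter that takes Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

If your Splash container listens on another host or port, only SPLASH_URL needs to change.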

Scrapy works with spiders, which are classes that define how to crawl a website and extract information from it. Start by creating a Python file inside the spiders folder. Mine is called vanspider.py.

Let’s start by creating our Spider class.

I have three main functions inside my class.

One good way to find out which class or element to scrape is the Scrapy shell, which you open by typing scrapy shell in your terminal:

You’ll get something like this:

Image by Author

Fetch the desired URL:
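Because the page has to be rendered by Splash, the URL passed to fetch can point at Splash's render.html endpoint rather than at the site directly. Here is a minimal sketch of building that URL, assuming Splash runs on localhost:8050. The helper name splash_render_url and the example.com address are placeholders, not part of the original script:

```python
from urllib.parse import urlencode

# Assumes Splash runs locally on its default port.
SPLASH_URL = "http://localhost:8050"


def splash_render_url(target_url, wait=2.0):
    """Return the Splash render.html URL that renders target_url.

    `wait` is the number of seconds Splash pauses after the page loads.
    """
    query = urlencode({"url": target_url, "wait": wait})
    return f"{SPLASH_URL}/render.html?{query}"


# Placeholder address; use the page you want to inspect.
rendered = splash_render_url("https://example.com/listings", wait=2)
```

Inside the Scrapy shell, fetch(rendered) then loads the rendered page and makes it available as response.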

Now use the XPath or CSS selectors to extract the elements.

In the example above, I use the command line to make sure I am targeting the right element before using it in the script. In this case, I get the dates of the reviews for each motorhome.

Image from Yescapa Website (edited by author)

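Finally, once your script is finished, you can crawl the website and save the output to a CSV file by running scrapy crawl in your terminal, followed by the spider name and the -o option with an output file name, for example scrapy crawl van -o vans.csv.

Note that Scrapy takes the value of the spider's name attribute (van) here, not the name of the Python class.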
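As an alternative to passing an output file on the command line, the export can be declared once in settings.py through Scrapy's FEEDS setting. This is a sketch of a different technique from the one used above, assuming a reasonably recent Scrapy version; the file name vans.csv is hypothetical:

```python
# settings.py (sketch): write every scraped item to a CSV file.
FEEDS = {
    "vans.csv": {           # hypothetical output file name
        "format": "csv",
        "overwrite": True,  # replace the file on each run instead of appending
    },
}
```

With this in place, running the crawl without an output option still produces the CSV file.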
