What Is Scrapy

Scrapy is an application framework for crawling websites and extracting structured data from them. In this post, we'll implement web scraping in Python using Scrapy.
This blog covers the following topics:

How To Install Scrapy
Create A Scrapy Project
Export Scraped Data As CSV

At the time of writing, Scrapy runs on Python 2.7 and on Python 3.4 or above (Scrapy 2.0 and later releases support Python 3 only). If you're using Anaconda, you can install the package from the conda-forge channel, which has packages for Linux, Windows and OS X.

How To Install Scrapy

You can install Scrapy with conda or, if you're familiar with installing Python packages, you can install Scrapy and its dependencies from PyPI with pip.

Install Scrapy Using Anaconda

conda install -c conda-forge scrapy

Install Scrapy Using PyPI

pip install Scrapy

Install Scrapy On Ubuntu 14.04 And Above

On Ubuntu 14.04 and above, you need to install these dependencies before installing Scrapy:

sudo apt-get install python-dev python-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev

Install Scrapy On Python 3

If you want to install Scrapy on Python 3, you’ll also need Python 3 development headers:

sudo apt-get install python3 python3-dev

Inside a virtualenv, you can then install Scrapy with pip:

pip install scrapy
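
One quick way to confirm that the install worked is to ask Python whether the package can be found. The sketch below uses only the standard library; the helper name installed_version is made up for this example and is not part of Scrapy.

```python
from importlib import metadata, util


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    # find_spec returns None when the top-level module cannot be located
    if util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this name
        return "unknown"


if __name__ == "__main__":
    version = installed_version("scrapy")
    if version is None:
        print("Scrapy is not installed in this environment")
    else:
        print(f"Scrapy {version} is installed")
```

Run it with the same interpreter (or virtualenv) that you used for pip or conda, otherwise it may report on a different environment.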

Create A Scrapy Project

Before you start scraping, you need to create a Scrapy project. Switch to the directory where you want to keep the project and run:

scrapy startproject project_name

This will create the following directory structure:

project_name/
    scrapy.cfg            # deploy configuration file
    project_name/         # project's Python module, you'll import your code from here
        __init__.py
        items.py          # project items definition file
        middlewares.py    # project middlewares file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # a directory where you'll later put your spiders
            __init__.py

The two most important entries to know are:
settings.py – This file holds all the settings for your project.
spiders/ – This folder stores all the custom spiders used in the project.
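
To give an idea of what settings.py typically holds, here is a minimal sketch. The setting names are real Scrapy settings, but the values (and the bot name project_name, taken from the startproject command above) are assumptions chosen for this example rather than recommendations.

```python
# settings.py (sketch): all values below are illustrative assumptions

BOT_NAME = "project_name"

# Where Scrapy looks for spider classes
SPIDER_MODULES = ["project_name.spiders"]
NEWSPIDER_MODULE = "project_name.spiders"

# Respect each site's robots.txt rules
ROBOTSTXT_OBEY = True

# Wait this many seconds between requests to the same site
DOWNLOAD_DELAY = 1

# Cap parallel requests per domain to keep the load on the target site low
CONCURRENT_REQUESTS_PER_DOMAIN = 4
```

A conservative delay and concurrency limit is a reasonable starting point for a small scraper; tune them to the site you are crawling.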

Create A Scrapy Spider

Spiders are classes that you define and that Scrapy uses to scrape information from a website (or a group of websites).
Here's the code for a spider that scrapes famous quotes from the website http://quotes.toscrape.com, following the pagination:

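import scrapy


class QuotesSpider(scrapy.Spider):
    # Unique name used to run the spider: scrapy crawl quotes
    name = 'quotes'
    # The crawl starts from these URLs
    start_urls = [
        'http://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        # Each quote on the page sits in a <div class="quote"> element
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.xpath('span/small/text()').get(),
            }

        # Follow the "next" link, if any, and parse that page the same way
        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)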

The spider subclasses scrapy.Spider and defines some attributes and methods:
name: identifies the spider. It must be unique within a project, so you can't give the same name to two different spiders.
start_urls: the list of URLs where the crawl begins. It is a shortcut for writing a start_requests() method, which must return an iterable of requests that the spider begins to crawl from; subsequent requests are generated successively from these initial requests.
parse(): the method called to handle the response downloaded for each request. The response parameter is an instance of TextResponse that holds the page content.
The parse() method usually parses the response, extracts the scraped data as dicts, finds new URLs to follow and creates new requests (Request) from them.
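
The pagination step works because response.follow accepts a relative link, such as the href of the "next" button, and resolves it against the URL of the current page. The standard-library sketch below mimics that resolution with urllib.parse.urljoin; the helper next_page_url and the sample href value are hypothetical and only stand in for what Scrapy does for you.

```python
from urllib.parse import urljoin


def next_page_url(current_url, href):
    """Resolve a (possibly relative) pagination link against the current page."""
    if href is None:
        # No "next" link on the page: there is nothing left to follow
        return None
    return urljoin(current_url, href)


current = "http://quotes.toscrape.com/tag/humor/"
print(next_page_url(current, "/tag/humor/page/2/"))
# -> http://quotes.toscrape.com/tag/humor/page/2/
print(next_page_url(current, None))
# -> None
```

This is also why the spider checks next_page against None before yielding a new request: on the last page the selector finds no link and the crawl ends.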

How To Run Spider From Scrapy

To put your spider to work, go to the project's top-level directory and run:

scrapy crawl quotes

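This command runs the spider and produces output similar to the following. The exact URLs and log messages depend on the spider; this sample comes from a spider that crawls /page/1/ and /page/2/ and saves each page to an HTML file: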

... (omitted for brevity)
2016-12-16 21:24:05 [scrapy.core.engine] INFO: Spider opened
2016-12-16 21:24:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-16 21:24:05 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: None)
2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-1.html
2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-2.html
2016-12-16 21:24:05 [scrapy.core.engine] INFO: Closing spider (finished)
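
When a crawl logs many requests, it can help to tally the HTTP status codes from the "Crawled (...)" lines. The following is a small sketch using only the standard library; the count_statuses helper is made up for this example, and the regular expression assumes the log line shape shown above.

```python
import re
from collections import Counter

# Matches engine lines such as: DEBUG: Crawled (200) <GET http://...>
CRAWLED = re.compile(r"Crawled \((\d{3})\) <GET (\S+)>")


def count_statuses(log_lines):
    """Count how many requests finished with each HTTP status code."""
    counts = Counter()
    for line in log_lines:
        match = CRAWLED.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Three lines copied from the sample log above
sample = [
    "2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)",
    "2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)",
    "2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: None)",
]
print(count_statuses(sample))
```

In the sample, the 404 belongs to robots.txt, which the site simply does not provide; the two quote pages were fetched successfully.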

Export Scraped Data As CSV

Scrapy prints the scraped items to the command line, but it is usually better to export them in a format such as CSV, JSON or XML. This saves a lot of time, and the exported file can be imported into other programs. To make this easy, Scrapy provides a feature called feed exports, which writes the scraped items to a file in the chosen format.
To use it, add the following code block to the settings.py file:

#Export as CSV Feed
FEED_FORMAT = "csv"
FEED_URI = "your csv name.csv"
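
In Scrapy 2.1 and later, FEED_FORMAT and FEED_URI are deprecated, and the same export can be declared with the single FEEDS setting, which maps each output path to its options. A sketch for settings.py follows; the file name quotes.csv is a placeholder chosen for this example.

```python
# settings.py (sketch): FEEDS replaces FEED_FORMAT / FEED_URI in Scrapy 2.1+
FEEDS = {
    # Output path (placeholder name) mapped to its export options
    "quotes.csv": {
        "format": "csv",
        "encoding": "utf8",
    },
}
```

The same result is available without touching settings.py by passing an output file on the command line: scrapy crawl quotes -o quotes.csv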

That's all, guys! We have successfully exported the data as CSV, and now we know how to implement web scraping using Scrapy.

Implementing Web Scraping In Python Using Scrapy

Allan Watts
