Using Scrapy with Django

22 October, 2017
Python Django Scrapy

Refer to my answer on Stack Overflow.

Here I have created a sample project that uses Scrapy inside Django, and uses Django models and the ORM in one of the pipelines.

https://github.com/bipul21/scrapy_django

The directory structure starts with your Django project. In this case the project name is django_project. Once inside the base project, you create your Scrapy project, i.e. scrapy_project here.

In your Scrapy project settings, add the following lines to initialize Django:
import os
import sys
import django

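# Add the directory two levels above this file's directory to sys.path
# so that the django_project package can be imported.
sys.path.append(os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), ".."))
# Tell Django which settings module to load.
os.environ['DJANGO_SETTINGS_MODULE'] = 'django_project.settings'

# Populate the app registry so that models can be imported afterwards.
django.setup()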
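To see what the sys.path expression above resolves to, here is a small sketch that applies the same expression to a made-up file location. The path /srv/django_project/... and the helper name project_root_for are assumptions for illustration only; your layout may differ.

```python
import os


def project_root_for(settings_file):
    """Apply the same path expression as the settings snippet, then normalize it."""
    raw = os.path.join(
        os.path.dirname(os.path.dirname(os.path.abspath(settings_file))), ".."
    )
    # normpath collapses the trailing ".." so the result is easier to read.
    return os.path.normpath(raw)


# Hypothetical location of the Scrapy settings file inside the Django project.
settings_file = "/srv/django_project/scrapy_project/scrapy_project/settings.py"
print(project_root_for(settings_file))
```

On a POSIX system this prints /srv/django_project, i.e. the base Django project directory, which is what needs to be on sys.path for 'django_project.settings' to be importable.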
In the pipeline, I make a simple query against the Questions model:
from questions.models import Questions

class ScrapyProjectPipeline(object):
    def process_item(self, item, spider):
        try:
            question = Questions.objects.get(identifier=item["identifier"])
            print("Question already exists")
            return item
        except Questions.DoesNotExist:
            pass

        question = Questions()
        question.identifier = item["identifier"]
        question.title = item["title"]
        question.url = item["url"]
        question.save()
        return item
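The "look up, otherwise create" flow of the pipeline can be tried out without a Django setup. The sketch below is not the code from the repository: InMemoryQuestions is a hypothetical stand-in for the model, and DedupPipeline is an illustrative name. The stand-in's get_or_create copies the shape of Django's QuerySet.get_or_create, which returns an (object, created) tuple, so the same process_item body could plausibly be pointed at Questions.objects instead.

```python
class InMemoryQuestions:
    """Hypothetical stand-in for the Questions model manager; not part of Django."""

    def __init__(self):
        self.rows = {}

    def get_or_create(self, identifier, defaults):
        # Return the existing row if there is one, otherwise store a new one.
        if identifier in self.rows:
            return self.rows[identifier], False
        row = dict(defaults, identifier=identifier)
        self.rows[identifier] = row
        return row, True


class DedupPipeline:
    """Illustrative pipeline: saves each identifier once and always passes the item on."""

    def __init__(self, store):
        self.store = store

    def process_item(self, item, spider):
        _, created = self.store.get_or_create(
            identifier=item["identifier"],
            defaults={"title": item["title"], "url": item["url"]},
        )
        if not created:
            print("Question already exists")
        return item


store = InMemoryQuestions()
pipeline = DedupPipeline(store)
item = {"identifier": "q1", "title": "Sample title", "url": "https://example.com/q1"}
pipeline.process_item(item, spider=None)
pipeline.process_item(item, spider=None)  # second call finds the existing row
print(len(store.rows))
```

Processing the same item twice leaves a single stored row, and the item is returned in both cases so that any later pipelines still receive it.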
See the project repository for further details, such as the model schema.