Contributing

Important

Read through the Scrapy Contribution Docs for tips on writing patches, reporting bugs, and coding style.

Issues and bugs

Report on GitHub.

Tests

Include tests in your pull requests.

To run unit tests:

pytest tests

To run integration tests:

# Create a config that enables basic authentication, and the default logs directory
printf "[scrapyd]\nusername = hello12345\npassword = 67890world\n" > scrapyd.conf
mkdir logs
# Start Scrapyd in the background, then run the integration tests against it
scrapyd &
pytest integration_tests

Installation

To install an editable version for development, clone the repository, change to its directory, and run:

pip install -e .[test,docs]

Developer documentation

Configuration

Pass the config object to a class’ __init__ method, but don’t store it on the instance (#526).
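For illustration only (the class and option names are hypothetical, and a Scrapyd Config-like object with a getfloat method is assumed), the pattern looks like this:

class ExamplePoller:
    def __init__(self, config):
        # Read the needed values in __init__ ...
        self.poll_interval = config.getfloat("poll_interval", 5.0)
        # ... and do not keep a reference, i.e. avoid: self.config = config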

Processes

Scrapyd starts Scrapy processes. It runs scrapy crawl in the launcher, and scrapy list in the schedule.json (to check that the spider exists), addversion.json (to return the number of spiders) and listspiders.json (to return the names of spiders) webservices.
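As a simplified sketch (Scrapyd itself goes through its runner module and egg storage, but the mechanism is similar; the project name is hypothetical), running scrapy list for a project could look like this:

import os
import subprocess

env = os.environ.copy()
env["SCRAPY_PROJECT"] = "myproject"  # hypothetical project name
# scrapy list prints one spider name per line
result = subprocess.run(["scrapy", "list"], env=env, capture_output=True, text=True)
spiders = result.stdout.splitlines()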

Environment variables

Scrapyd uses environment variables to communicate between the Scrapyd process and the Scrapy processes that it starts.

SCRAPY_PROJECT

The project to use. See scrapyd/runner.py.

SCRAPYD_EGG_VERSION

The version of the project, to be retrieved as an egg from eggstorage and activated.

SCRAPY_SETTINGS_MODULE

The Python path to the settings module of the project.

This is usually the module from the entry points of the egg, but can be the module from the [settings] section of a scrapy.cfg file. See scrapyd/environ.py.
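As a minimal illustration (not Scrapyd's own code), a spawned process can read these variables as follows; treating all three as optional is an assumption:

import os

# Environment variables set by Scrapyd for the spawned Scrapy process
project = os.environ.get("SCRAPY_PROJECT")
egg_version = os.environ.get("SCRAPYD_EGG_VERSION")
settings_module = os.environ.get("SCRAPY_SETTINGS_MODULE")
print(f"project={project} version={egg_version} settings={settings_module}")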

Jobs

A pending job is a dict object (referred to as a “message”), accessible via an ISpiderQueue’s pop() or list() methods.

Note

The short-lived message returned by IPoller’s poll() method is also referred to as a “message”.

A running job is a ScrapyProcessProtocol object, accessible via Launcher.processes (a dict), in which each key is a slot’s number (an int).

  • Launcher has a finished attribute, which is an IJobStorage.

  • When the process ends, its callback fires, and the Launcher service calls IJobStorage’s add() method, passing the ScrapyProcessProtocol as input.

A finished job is an object with the attributes project, spider, job, start_time and end_time, accessible via an IJobStorage’s list() or __iter__() methods.
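A minimal sketch that ties the three states together (the function name is hypothetical; the queue, launcher and job storage objects are assumed to be already obtained):

def summarize_jobs(queue, launcher, jobstorage):
    # Pending jobs: dict "messages" from the spider queue
    for message in queue.list():
        print("pending:", message)
    # Running jobs: slot number -> ScrapyProcessProtocol
    for slot, process in launcher.processes.items():
        print("running:", slot, process.project, process.spider, process.job)
    # Finished jobs: objects with project, spider, job, start_time and end_time
    for job in jobstorage.list():
        print("finished:", job.project, job.spider, job.job, job.start_time, job.end_time)

The table below maps each concept to the corresponding key or attribute in each interface.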

Concept               | ISpiderQueue   | IPoller        | ScrapyProcessProtocol | IJobStorage
Project               | not specified  | _project       | project               | project
Spider                | name           | _spider        | spider                | spider
Job ID                | _job           | _job           | job                   | job
Egg version           | _version       | _version       |                       |
Scrapy settings       | settings       | settings       | args (-s k=v)         |
Spider arguments      | remaining keys | remaining keys | args (-a k=v)         |
Environment variables |                |                | env                   |
Process ID            |                |                | pid                   |
Start time            |                |                | start_time            | start_time
End time              |                |                | end_time              | end_time