API

If basic authentication is enabled, add curl’s -u option to the examples below. For instance:

curl -u yourusername:yourpassword http://localhost:6800/daemonstatus.json
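
Clients that do not use curl can send the same credentials as a standard HTTP Basic Authorization header. The following is a minimal sketch using only Python's standard library. The helper name basic_auth_request is hypothetical, and the username and password are the placeholders from the curl example. The request is built but not sent.

```python
import base64
import urllib.request


def basic_auth_request(url, username, password):
    """Build (but do not send) a GET request carrying HTTP Basic credentials."""
    # Basic auth is "username:password", base64-encoded, after the word "Basic".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


request = basic_auth_request(
    "http://localhost:6800/daemonstatus.json", "yourusername", "yourpassword"
)
print(request.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would send it to a running Scrapyd instance.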

daemonstatus.json

Added in version 1.2.0.

Check the load status of a service.

Supported request methods

GET

Example:

$ curl http://localhost:6800/daemonstatus.json
{"node_name": "mynodename", "status": "ok", "pending": 0, "running": 0, "finished": 0}
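
The response is plain JSON, so the job counts are easy to read programmatically. As a rough sketch, the snippet below parses the sample response shown above and extracts the three counters. The helper name summarize_load is hypothetical, and treating any status other than "ok" as an error is an assumption of this sketch.

```python
import json

# Sample response body copied from the example above.
body = '{"node_name": "mynodename", "status": "ok", "pending": 0, "running": 0, "finished": 0}'


def summarize_load(status):
    """Return the job counts from a daemonstatus.json response as a dict."""
    # Assumption of this sketch: anything other than "ok" is treated as a failure.
    if status.get("status") != "ok":
        raise RuntimeError(f"unexpected response: {status!r}")
    return {state: status[state] for state in ("pending", "running", "finished")}


counts = summarize_load(json.loads(body))
print(counts)
```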

addversion.json

Add a version to a project in eggstorage, creating the project if needed.

Supported request methods

POST

Parameters
project (required)

the project name

version (required)

the project version

Scrapyd uses the packaging Version to interpret the version numbers you provide.

egg (required)

a Python egg containing the project’s code

The egg must set an entry point to its Scrapy settings. For example, with a setup.py file:

setup(
    name         = 'project',
    version      = '1.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = projectname.settings']},
)

The scrapyd-deploy command from the scrapyd-client package automates building and uploading the egg.

Example:

$ curl http://localhost:6800/addversion.json -F project=myproject -F version=r23 -F egg=@myproject.egg
{"node_name": "mynodename", "status": "ok", "spiders": 3}

schedule.json

Schedule a job. (A job is a Scrapy crawl.)

If the logs_dir setting is set, log files are written to {logs_dir}/{project}/{spider}/{jobid}.log. Set the jobid parameter to configure the basename of the log file.
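
To illustrate the path pattern above, here is a small sketch that composes the log file location from its parts. The function name log_path, the logs_dir value "logs", and the jobid "nightly-crawl" are all hypothetical, and forward slashes are used for readability regardless of the operating system.

```python
import posixpath


def log_path(logs_dir, project, spider, jobid):
    """Compose a path following the pattern {logs_dir}/{project}/{spider}/{jobid}.log."""
    return posixpath.join(logs_dir, project, spider, f"{jobid}.log")


# Hypothetical values: a logs_dir of "logs" and a caller-chosen jobid.
path = log_path("logs", "myproject", "somespider", "nightly-crawl")
print(path)
```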

Important

Like Scrapy’s scrapy.Spider class, spiders should allow an arbitrary number of keyword arguments in their __init__ method, because Scrapyd sets internally-generated spider arguments when starting crawls.
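
One way to meet this requirement is to accept **kwargs and forward them to the parent class. The sketch below uses a hypothetical BaseSpider class as a stand-in for scrapy.Spider so that it runs without Scrapy installed. The _job argument name is an assumption used to illustrate an internally generated argument.

```python
class BaseSpider:
    """Stand-in for scrapy.Spider, used here so the sketch runs without Scrapy."""

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        # Store every extra keyword argument as an attribute.
        self.__dict__.update(kwargs)


class SomeSpider(BaseSpider):
    name = "somespider"

    def __init__(self, arg1=None, **kwargs):
        # Forward unknown keyword arguments instead of rejecting them.
        super().__init__(**kwargs)
        self.arg1 = arg1


# "_job" stands in for an argument the caller did not explicitly request.
spider = SomeSpider(arg1="val1", _job="6487ec79947edab326d6db28a2d86511e8247444")
print(spider.arg1, spider._job)
```

A spider whose __init__ only accepted arg1 would raise a TypeError when given the extra argument.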

Supported request methods

POST

Parameters
project (required)

the project name

spider (required)

the spider name

_version

the project version (the latest project version by default)

jobid

the job’s ID (a hexadecimal UUID v1 by default)

priority

the job’s priority in the project’s spider queue (0 by default; a higher number means a higher priority)

setting

a Scrapy setting

For example, using DOWNLOAD_DELAY:

curl http://localhost:6800/schedule.json -d setting=DOWNLOAD_DELAY=2 -d project=myproject -d spider=somespider

Any other parameter

a spider argument

For example, using arg1:

curl http://localhost:6800/schedule.json -d arg1=val1 -d project=myproject -d spider=somespider

Warning

When such parameters are set multiple times, only the first value is sent to the spider.

To change this behavior, please open an issue.

Example:

$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider
{"node_name": "mynodename", "status": "ok", "jobid": "6487ec79947edab326d6db28a2d86511e8247444"}
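
The same request can be assembled outside of curl. The following is a minimal sketch, using only Python's standard library, that mirrors the form fields of the curl examples above. The helper name build_schedule_request and its keyword arguments are hypothetical. The request is built but not sent.

```python
import urllib.parse
import urllib.request


def build_schedule_request(base_url, project, spider, settings=None, spider_args=None):
    """Build (but do not send) a POST request for schedule.json."""
    fields = [("project", project), ("spider", spider)]
    # Each Scrapy setting becomes a "setting=NAME=VALUE" field, as in the curl example.
    for name, value in (settings or {}).items():
        fields.append(("setting", f"{name}={value}"))
    # Any other field is passed to the spider as an argument.
    fields.extend((spider_args or {}).items())
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(f"{base_url}/schedule.json", data=data, method="POST")


request = build_schedule_request(
    "http://localhost:6800",
    "myproject",
    "somespider",
    settings={"DOWNLOAD_DELAY": 2},
    spider_args={"arg1": "val1"},
)
print(request.data.decode("ascii"))
```

Because spider_args is a dict, each spider argument appears only once, which sidesteps the repeated-parameter behavior described in the warning.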

status.json

Added in version 1.5.0.

Get the status of a job.

Supported request methods

GET

Parameters
job (required)

the job ID

project

the project name

Example:

$ curl http://localhost:6800/status.json?job=6487ec79947edab326d6db28a2d86511e8247444
{"node_name": "mynodename", "status": "ok", "currstate": "running"}
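
Since status.json takes its parameters in the query string, the URL can be built with standard URL encoding. This is a small sketch; the helper name status_url is hypothetical, and the optional project parameter is only added when given.

```python
import urllib.parse


def status_url(base_url, job, project=None):
    """Return the status.json URL for a job, optionally scoped to a project."""
    params = {"job": job}
    if project is not None:
        params["project"] = project
    return f"{base_url}/status.json?{urllib.parse.urlencode(params)}"


url = status_url(
    "http://localhost:6800",
    "6487ec79947edab326d6db28a2d86511e8247444",
    project="myproject",
)
print(url)
```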

cancel.json

Cancel a job.

  • If the job is pending, it is removed from the project’s spider queue.

  • If the job is running, the process is sent a signal to terminate.

Supported request methods

POST

Parameters
project (required)

the project name

job (required)

the job ID

signal

the signal to send to the Scrapy process (BREAK by default on Windows, INT by default otherwise)

Example:

$ curl http://localhost:6800/cancel.json -d project=myproject -d job=6487ec79947edab326d6db28a2d86511e8247444
{"node_name": "mynodename", "status": "ok", "prevstate": "running"}
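
The prevstate field indicates which of the two cases described above applied. As an illustration, the sketch below maps it to a short description. The function name describe_cancel is hypothetical, and the fallback branch for any other value is an assumption of this sketch, not documented behavior.

```python
import json

# Sample response body copied from the example above.
body = '{"node_name": "mynodename", "status": "ok", "prevstate": "running"}'


def describe_cancel(response):
    """Describe what a cancel.json call did, based on the job's previous state."""
    prevstate = response.get("prevstate")
    if prevstate == "pending":
        return "pending job was removed from the spider queue"
    if prevstate == "running":
        return "running job was sent a termination signal"
    # Assumption: any other value means no pending or running job matched.
    return "no pending or running job matched"


message = describe_cancel(json.loads(body))
print(message)
```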

listprojects.json

Get the projects.

Supported request methods

GET

Example:

$ curl http://localhost:6800/listprojects.json
{"node_name": "mynodename", "status": "ok", "projects": ["myproject", "otherproject"]}

listversions.json

Get the versions of a project in eggstorage, in order, with the latest version last.

Supported request methods

GET

Parameters
project (required)

the project name

Example:

$ curl http://localhost:6800/listversions.json?project=myproject
{"node_name": "mynodename", "status": "ok", "versions": ["r99", "r156"]}
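
Because the latest version comes last, a client can pick it by taking the final list element. A minimal sketch, parsing the sample response shown above (the empty-list fallback to None is an assumption of this sketch):

```python
import json

# Sample response body copied from the example above.
body = '{"node_name": "mynodename", "status": "ok", "versions": ["r99", "r156"]}'

versions = json.loads(body)["versions"]
# The list is ordered with the latest version last.
latest = versions[-1] if versions else None
print(latest)
```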

listspiders.json

Get the spiders in a version of a project.

Note

If the project is configured via a scrapy.cfg file rather than uploaded via the addversion.json webservice, don’t set the _version parameter.

Supported request methods

GET

Parameters
project (required)

the project name

_version

the project version (the latest project version by default)

Example:

$ curl http://localhost:6800/listspiders.json?project=myproject
{"node_name": "mynodename", "status": "ok", "spiders": ["spider1", "spider2", "spider3"]}

listjobs.json

Get the pending, running and finished jobs of a project.

  • Pending jobs are in spider queues.

  • Running jobs have Scrapy processes.

  • Finished jobs are in job storage.

    Note

    • The default jobstorage setting stores jobs in memory, so jobs are lost when the Scrapyd process ends.

    • log_url is null in the response if logs_dir is disabled or the file doesn’t exist.

    • items_url is null in the response if items_dir is disabled or the file doesn’t exist.

Supported request methods

GET

Parameters
project

filter results by project name

Example:

$ curl http://localhost:6800/listjobs.json?project=myproject | python -m json.tool
{
    "node_name": "mynodename",
    "status": "ok",
    "pending": [
        {
            "id": "78391cc0fcaf11e1b0090800272a6d06",
            "project": "myproject",
            "spider": "spider1",
            "version": "0.1",
            "settings": {"DOWNLOAD_DELAY": "2"},
            "args": {"arg1": "val1"}
        }
    ],
    "running": [
        {
            "id": "422e608f9f28cef127b3d5ef93fe9399",
            "project": "myproject",
            "spider": "spider2",
            "pid": 93956,
            "start_time": "2012-09-12 10:14:03.594664",
            "log_url": "/logs/myproject/spider2/422e608f9f28cef127b3d5ef93fe9399.log",
            "items_url": "/items/myproject/spider2/422e608f9f28cef127b3d5ef93fe9399.jl"
        }
    ],
    "finished": [
        {
            "id": "2f16646cfcaf11e1b0090800272a6d06",
            "project": "myproject",
            "spider": "spider3",
            "start_time": "2012-09-12 10:14:03.594664",
            "end_time": "2012-09-12 10:24:03.594664",
            "log_url": "/logs/myproject/spider3/2f16646cfcaf11e1b0090800272a6d06.log",
            "items_url": "/items/myproject/spider3/2f16646cfcaf11e1b0090800272a6d06.jl"
        }
    ]
}
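
A client will typically count the jobs in each state and skip log_url values that are null. The following sketch does both on a trimmed-down response modeled on the example above. The sample body is illustrative rather than real server output, and the null log_url on the running job is included only to show the filtering.

```python
import json

# Trimmed-down, illustrative response body (not real server output).
body = """
{
    "status": "ok",
    "pending": [{"id": "78391cc0fcaf11e1b0090800272a6d06", "spider": "spider1"}],
    "running": [{"id": "422e608f9f28cef127b3d5ef93fe9399", "spider": "spider2", "log_url": null}],
    "finished": [
        {
            "id": "2f16646cfcaf11e1b0090800272a6d06",
            "spider": "spider3",
            "log_url": "/logs/myproject/spider3/2f16646cfcaf11e1b0090800272a6d06.log"
        }
    ]
}
"""

jobs = json.loads(body)

# Number of jobs in each state.
counts = {state: len(jobs[state]) for state in ("pending", "running", "finished")}

# Collect log URLs, skipping jobs whose log_url is null (None in Python).
log_urls = [
    job["log_url"]
    for state in ("running", "finished")
    for job in jobs[state]
    if job.get("log_url") is not None
]

print(counts)
print(log_urls)
```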

delversion.json

Delete a version of a project from eggstorage. If no versions of the project remain, delete the project, too.

Supported request methods

POST

Parameters
project (required)

the project name

version (required)

the project version

Example:

$ curl http://localhost:6800/delversion.json -d project=myproject -d version=r99
{"node_name": "mynodename", "status": "ok"}

delproject.json

Delete a project and its versions from eggstorage.

Supported request methods

POST

Parameters
project (required)

the project name

Example:

$ curl http://localhost:6800/delproject.json -d project=myproject
{"node_name": "mynodename", "status": "ok"}