API¶
If basic authentication is enabled, you can use curl’s -u option in the examples below, for example:
curl -u yourusername:yourpassword http://localhost:6800/daemonstatus.json
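For clients that do not go through curl, the same credentials can be sent as an HTTP Basic Authorization header. The following is a minimal sketch using only Python's standard library; the helper name basic_auth_request is hypothetical and not part of Scrapyd, and the code only builds the request without sending it.

```python
import base64
import urllib.request


def basic_auth_request(url, username, password):
    """Build (but do not send) a request carrying HTTP Basic credentials."""
    # HTTP Basic auth is "username:password", base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


request = basic_auth_request(
    "http://localhost:6800/daemonstatus.json", "yourusername", "yourpassword"
)
# Pass `request` to urllib.request.urlopen() to call a running Scrapyd instance.
```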
daemonstatus.json¶
Added in version 1.2.0.
Check the load status of the service.
- Supported request methods
GET
Example:
$ curl http://localhost:6800/daemonstatus.json
{"node_name": "mynodename", "status": "ok", "pending": 0, "running": 0, "finished": 0}
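A monitoring script might reduce this response to a health flag and a rough load figure. The sketch below parses the example response shown above; the function name summarize_daemon_status and the choice to count pending plus running jobs as "load" are assumptions made for illustration.

```python
import json

# The example response body from the documentation above.
SAMPLE = '{"node_name": "mynodename", "status": "ok", "pending": 0, "running": 0, "finished": 0}'


def summarize_daemon_status(body):
    """Return (is_ok, active_jobs) from a daemonstatus.json response body."""
    data = json.loads(body)
    is_ok = data.get("status") == "ok"
    # Assumption: pending plus running jobs approximate the current load.
    active_jobs = data.get("pending", 0) + data.get("running", 0)
    return is_ok, active_jobs


is_ok, active_jobs = summarize_daemon_status(SAMPLE)
```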
addversion.json¶
Add a version to a project in eggstorage, creating the project if needed.
- Supported request methods
POST
- Parameters
project
(required) the project name
version
(required) the project version
Scrapyd uses the packaging Version to interpret the version numbers you provide.
egg
(required) a Python egg containing the project’s code
The egg must set an entry point to its Scrapy settings. For example, with a setup.py file:
from setuptools import setup, find_packages

setup(
    name='project',
    version='1.0',
    packages=find_packages(),
    # Point Scrapy at the project's settings module.
    entry_points={'scrapy': ['settings = projectname.settings']},
)
Do this easily with the scrapyd-deploy command from the scrapyd-client package.
Example:
$ curl http://localhost:6800/addversion.json -F project=myproject -F version=r23 -F egg=@myproject.egg
{"node_name": "mynodename", "status": "ok", "spiders": 3}
schedule.json¶
Schedule a job. (A job is a Scrapy crawl.)
If the logs_dir setting is set, log files are written to {logs_dir}/{project}/{spider}/{jobid}.log. Set the jobid parameter to configure the basename of the log file.
Important
Like Scrapy’s scrapy.Spider class, spiders should allow an arbitrary number of keyword arguments in their __init__ method, because Scrapyd sets internally-generated spider arguments when starting crawls.
- Supported request methods
POST
- Parameters
project
(required) the project name
spider
(required) the spider name
_version
the project version (the latest project version by default)
jobid
the job’s ID (a hexadecimal UUID v1 by default)
priority
the job’s priority in the project’s spider queue (0 by default; a higher number means a higher priority)
setting
a Scrapy setting
For example, using DOWNLOAD_DELAY:
curl http://localhost:6800/schedule.json -d setting=DOWNLOAD_DELAY=2 -d project=myproject -d spider=somespider
- Any other parameter
a spider argument
For example, using arg1:
curl http://localhost:6800/schedule.json -d arg1=val1 -d project=myproject -d spider=somespider
Warning
When such parameters are set multiple times, only the first value is sent to the spider.
To change this behavior, please open an issue.
Example:
$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider
{"node_name": "mynodename", "status": "ok", "jobid": "6487ec79947edab326d6db28a2d86511e8247444"}
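The curl examples above send a form-encoded POST body in which Scrapy settings travel as setting=NAME=VALUE and every other field is treated as a spider argument. The sketch below assembles such a body with Python's standard library. The helper build_schedule_body is hypothetical, and repeating the setting field once per Scrapy setting is an assumption, since the examples above only show a single setting.

```python
from urllib.parse import urlencode


def build_schedule_body(project, spider, settings=None, spider_args=None):
    """Return the form-encoded POST body for a schedule.json request."""
    pairs = [("project", project), ("spider", spider)]
    # Each Scrapy setting is sent as its own "setting" field: NAME=VALUE.
    for name, value in (settings or {}).items():
        pairs.append(("setting", f"{name}={value}"))
    # Any other field is passed through as a spider argument.
    for name, value in (spider_args or {}).items():
        pairs.append((name, value))
    return urlencode(pairs).encode("ascii")


body = build_schedule_body(
    "myproject",
    "somespider",
    settings={"DOWNLOAD_DELAY": 2},
    spider_args={"arg1": "val1"},
)
# Send with urllib.request.urlopen("http://localhost:6800/schedule.json", data=body).
```

Because only the first value of a repeated spider argument reaches the spider, passing arguments as a dictionary keeps each name unique.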
status.json¶
Added in version 1.5.0.
Get the status of a job.
- Supported request methods
GET
- Parameters
job
(required) the job ID
project
the project name
Example:
$ curl http://localhost:6800/status.json?job=6487ec79947edab326d6db28a2d86511e8247444
{"node_name": "mynodename", "status": "ok", "currstate": "running"}
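Since status.json takes its parameters in the query string, a client needs to encode them into the URL. A minimal sketch follows, assuming a hypothetical helper named status_url; it includes the optional project parameter only when one is given.

```python
from urllib.parse import urlencode


def status_url(base_url, job, project=None):
    """Build the status.json URL for a job, optionally scoped to a project."""
    params = {"job": job}
    if project is not None:
        params["project"] = project
    return f"{base_url.rstrip('/')}/status.json?{urlencode(params)}"


url = status_url(
    "http://localhost:6800",
    "6487ec79947edab326d6db28a2d86511e8247444",
    project="myproject",
)
```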
cancel.json¶
Cancel a job.
If the job is pending, it is removed from the project’s spider queue.
If the job is running, the process is sent a signal to terminate.
- Supported request methods
POST
- Parameters
project
(required) the project name
job
(required) the job ID
signal
the signal to send to the Scrapy process (BREAK by default on Windows and INT by default, otherwise)
Example:
$ curl http://localhost:6800/cancel.json -d project=myproject -d job=6487ec79947edab326d6db28a2d86511e8247444
{"node_name": "mynodename", "status": "ok", "prevstate": "running"}
listprojects.json¶
Get the projects.
- Supported request methods
GET
Example:
$ curl http://localhost:6800/listprojects.json
{"node_name": "mynodename", "status": "ok", "projects": ["myproject", "otherproject"]}
listversions.json¶
Get the versions of a project in eggstorage, in order, with the latest version last.
- Supported request methods
GET
- Parameters
project
(required) the project name
Example:
$ curl http://localhost:6800/listversions.json?project=myproject
{"node_name": "mynodename", "status": "ok", "versions": ["r99", "r156"]}
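Because the versions are returned in order with the latest last, a client can pick the current version by taking the final list element. A short sketch against the example response above; the function name latest_version is hypothetical.

```python
import json

# The example response body from the documentation above.
SAMPLE = '{"node_name": "mynodename", "status": "ok", "versions": ["r99", "r156"]}'


def latest_version(body):
    """Return the last (latest) version in a listversions.json response, or None."""
    versions = json.loads(body).get("versions", [])
    return versions[-1] if versions else None


latest = latest_version(SAMPLE)
```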
listspiders.json¶
Get the spiders in a version of a project.
Note
If the project is configured via a scrapy.cfg file rather than uploaded via the addversion.json webservice, don’t set the _version parameter.
- Supported request methods
GET
- Parameters
project
(required) the project name
_version
the project version (the latest project version by default)
Example:
$ curl http://localhost:6800/listspiders.json?project=myproject
{"node_name": "mynodename", "status": "ok", "spiders": ["spider1", "spider2", "spider3"]}
listjobs.json¶
Get the pending, running and finished jobs of a project.
Pending jobs are in spider queues.
Running jobs have Scrapy processes.
Finished jobs are in job storage.
Note
The default jobstorage setting stores jobs in memory, such that jobs are lost when the Scrapyd process ends.
log_url is null in the response if logs_dir is disabled or the file doesn’t exist.
items_url is null in the response if items_dir is disabled or the file doesn’t exist.
- Supported request methods
GET
- Parameters
project
filter results by project name
Example:
$ curl http://localhost:6800/listjobs.json?project=myproject | python -m json.tool
{
    "node_name": "mynodename",
    "status": "ok",
    "pending": [
        {
            "id": "78391cc0fcaf11e1b0090800272a6d06",
            "project": "myproject",
            "spider": "spider1",
            "version": "0.1",
            "settings": {"DOWNLOAD_DELAY": "2"},
            "args": {"arg1": "val1"}
        }
    ],
    "running": [
        {
            "id": "422e608f9f28cef127b3d5ef93fe9399",
            "project": "myproject",
            "spider": "spider2",
            "pid": 93956,
            "start_time": "2012-09-12 10:14:03.594664",
            "log_url": "/logs/myproject/spider2/422e608f9f28cef127b3d5ef93fe9399.log",
            "items_url": "/items/myproject/spider2/422e608f9f28cef127b3d5ef93fe9399.jl"
        }
    ],
    "finished": [
        {
            "id": "2f16646cfcaf11e1b0090800272a6d06",
            "project": "myproject",
            "spider": "spider3",
            "start_time": "2012-09-12 10:14:03.594664",
            "end_time": "2012-09-12 10:24:03.594664",
            "log_url": "/logs/myproject/spider3/2f16646cfcaf11e1b0090800272a6d06.log",
            "items_url": "/items/myproject/spider3/2f16646cfcaf11e1b0090800272a6d06.jl"
        }
    ]
}
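The log_url and items_url values are server-relative paths and may be null, so a client has to resolve them against the Scrapyd base URL and tolerate missing values. The sketch below shows one way to do that; the function name job_log_urls, the trimmed sample payload, and the default base URL are assumptions chosen to match the examples on this page.

```python
import json
from urllib.parse import urljoin

# A trimmed, illustrative listjobs.json payload (not real server output).
SAMPLE = json.dumps({
    "status": "ok",
    "pending": [{"id": "78391cc0fcaf11e1b0090800272a6d06", "spider": "spider1"}],
    "running": [{"id": "422e608f9f28cef127b3d5ef93fe9399", "log_url": None}],
    "finished": [{
        "id": "2f16646cfcaf11e1b0090800272a6d06",
        "log_url": "/logs/myproject/spider3/2f16646cfcaf11e1b0090800272a6d06.log",
    }],
})


def job_log_urls(body, base_url="http://localhost:6800"):
    """Map job IDs of running and finished jobs to absolute log URLs (or None)."""
    data = json.loads(body)
    urls = {}
    # Pending jobs have not started yet, so they carry no log_url.
    for state in ("running", "finished"):
        for job in data.get(state, []):
            log_url = job.get("log_url")
            # A null log_url means logging is disabled or the file is missing.
            urls[job["id"]] = urljoin(base_url, log_url) if log_url else None
    return urls


log_urls = job_log_urls(SAMPLE)
```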
delversion.json¶
Delete a version of a project from eggstorage. If no versions of the project remain, delete the project, too.
- Supported request methods
POST
- Parameters
project
(required) the project name
version
(required) the project version
Example:
$ curl http://localhost:6800/delversion.json -d project=myproject -d version=r99
{"node_name": "mynodename", "status": "ok"}
delproject.json¶
Delete a project and its versions from eggstorage.
- Supported request methods
POST
- Parameters
project
(required) the project name
Example:
$ curl http://localhost:6800/delproject.json -d project=myproject
{"node_name": "mynodename", "status": "ok"}