Developer API reference¶
Interfaces¶
- interface scrapyd.interfaces.IEggStorage[source]¶
A component to store project eggs.
- put(eggfile, project, version)¶
Store the egg (a file object), which represents a version of the project.
- get(project, version=None)¶
Return (version, file) for the egg matching the project and version.
If version is None, the latest version and corresponding file are returned.
If no egg is found, (None, None) is returned.
Tip
Remember to close the file when done.
- list(project)¶
Return all versions of the project in order, with the latest version last.
- list_projects()¶
Return all projects in storage.
Added in version 1.3.0: Move this logic into the interface and its implementations, to allow customization.
- delete(project, version=None)¶
Delete the egg matching the project and version. Delete the project if no versions remain.
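A minimal in-memory implementation can make this contract concrete. The sketch below is illustrative only: the class name, the config constructor argument, and the use of io.BytesIO are assumptions, not part of Scrapyd, and the lexicographic version sort is a simplification. A real deployment would persist eggs to disk.

```python
import io

from zope.interface import implementer

from scrapyd.interfaces import IEggStorage


@implementer(IEggStorage)
class InMemoryEggStorage:
    """Illustrative IEggStorage sketch; stores eggs in a dict, not on disk."""

    def __init__(self, config):  # components typically receive the Config object
        self._eggs = {}  # {project: {version: egg bytes}}

    def put(self, eggfile, project, version):
        self._eggs.setdefault(project, {})[version] = eggfile.read()

    def get(self, project, version=None):
        versions = self._eggs.get(project, {})
        if version is None:
            ordered = self.list(project)
            version = ordered[-1] if ordered else None  # latest version
        if version not in versions:
            return None, None
        return version, io.BytesIO(versions[version])  # caller must close this

    def list(self, project):
        # Lexicographic sort is a simplification; a real implementation
        # would compare parsed version numbers.
        return sorted(self._eggs.get(project, {}))

    def list_projects(self):
        return list(self._eggs)

    def delete(self, project, version=None):
        if version is None:
            self._eggs.pop(project, None)
        else:
            self._eggs.get(project, {}).pop(version, None)
            if not self._eggs.get(project):
                self._eggs.pop(project, None)  # drop the project if no versions remain
```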
- interface scrapyd.interfaces.IPoller[source]¶
A component that tracks capacity for new jobs, and starts jobs when ready.
- queues¶
An object (like a dict) with a __getitem__ method that accepts a project’s name and returns its spider queue of pending jobs.
- poll()¶
Called periodically to start jobs if there’s capacity.
- next()¶
Return the next pending job.
It should return a Deferred that fires once there is capacity, or that has already fired if there is capacity now.
The pending job is a dict containing at least the _project name, _spider name and _job ID. The job ID is unique, at least within the project.
The pending job is later passed to scrapyd.interfaces.IEnvironment.get_environment().
- update_projects()¶
Called when projects may have changed, to refresh the available projects, including at initialization.
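To make the Deferred handshake concrete, here is a hedged sketch (the class name and internal structure are assumptions; Scrapyd’s own poller differs in detail). next() hands out one unfired Deferred per free launcher slot, and poll() fires one of them when a queue yields a pending job:

```python
from twisted.internet.defer import Deferred
from zope.interface import implementer

from scrapyd.interfaces import IPoller


@implementer(IPoller)
class SimplePoller:
    """Illustrative IPoller sketch; not Scrapyd's built-in poller."""

    def __init__(self, config):
        self.queues = {}     # {project: spider queue}, refreshed by update_projects()
        self._waiting = []   # one unfired Deferred per launcher slot awaiting a job

    def next(self):
        # Called once per free slot; the returned Deferred fires with the job.
        d = Deferred()
        self._waiting.append(d)
        return d

    def poll(self):
        if not self._waiting:
            return  # no capacity, so leave jobs in their queues
        for project, queue in self.queues.items():
            message = queue.pop()
            if message is not None:
                job = dict(message, _project=project, _spider=message["name"])
                self._waiting.pop(0).callback(job)  # hand the job to a waiting slot
                return

    def update_projects(self):
        pass  # a real implementation would rebuild self.queues here
```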
- interface scrapyd.interfaces.ISpiderQueue[source]¶
A component to store pending jobs.
The dict keys used by the chosen ISpiderQueue implementation must match the chosen:
- launcher service (which calls scrapyd.interfaces.IPoller.next())
- IEnvironment implementation (see scrapyd.interfaces.IPoller.next())
- webservices that schedule, cancel or list pending jobs
- add(name, priority, **spider_args)¶
Add a pending job, given the spider name, the crawl priority and keyword arguments. Depending on the implementation, the keyword arguments might include the _job ID, egg _version and Scrapy settings; keyword arguments not recognized by the implementation are treated as spider arguments.
Changed in version 1.3.0: Add the priority parameter.
- pop()¶
Pop the next pending job. The pending job is a dict containing the spider name. Depending on the implementation, other keys might include the _job ID, egg _version and Scrapy settings; keyword arguments not recognized by the receiver are treated as spider arguments.
- list()¶
Return the pending jobs.
- count()¶
Return the number of pending jobs.
- remove(func)¶
Remove pending jobs for which func(job) is true, and return the number of removed pending jobs.
- clear()¶
Remove all pending jobs.
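For illustration, a minimal in-memory queue satisfying this contract might look like the sketch below (the class name and list-based storage are assumptions; an in-memory queue loses jobs on restart, whereas a production queue would persist them):

```python
from zope.interface import implementer

from scrapyd.interfaces import ISpiderQueue


@implementer(ISpiderQueue)
class MemorySpiderQueue:
    """Illustrative ISpiderQueue sketch; pending jobs are lost on restart."""

    def __init__(self):
        self.q = []  # list of (priority, message) pairs

    def add(self, name, priority=0.0, **spider_args):
        # Unrecognized keyword arguments travel along as spider arguments.
        self.q.append((priority, dict(spider_args, name=name)))

    def pop(self):
        if not self.q:
            return None
        self.q.sort(key=lambda item: item[0], reverse=True)  # highest priority first
        return self.q.pop(0)[1]

    def list(self):
        return [message for _, message in self.q]

    def count(self):
        return len(self.q)

    def remove(self, func):
        before = len(self.q)
        self.q = [(p, m) for p, m in self.q if not func(m)]
        return before - len(self.q)

    def clear(self):
        self.q = []
```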
- interface scrapyd.interfaces.ISpiderScheduler[source]¶
A component to schedule jobs.
- schedule(project, spider_name, priority, **spider_args)¶
Schedule a crawl.
Changed in version 1.3.0: Add the priority parameter.
- list_projects()¶
Return all projects that can be scheduled.
- update_projects()¶
Called when projects may have changed, to refresh the available projects, including at initialization.
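A scheduler can be a thin layer over the poller’s queues attribute. The following sketch (names assumed for illustration) simply forwards each crawl to the matching project queue:

```python
from zope.interface import implementer

from scrapyd.interfaces import ISpiderScheduler


@implementer(ISpiderScheduler)
class SimpleScheduler:
    """Illustrative ISpiderScheduler sketch delegating to per-project queues."""

    def __init__(self, queues):
        self.queues = queues  # e.g. the IPoller's queues mapping

    def schedule(self, project, spider_name, priority=0.0, **spider_args):
        self.queues[project].add(spider_name, priority=priority, **spider_args)

    def list_projects(self):
        return list(self.queues)

    def update_projects(self):
        pass  # a real implementation would rescan for new or removed projects
```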
- interface scrapyd.interfaces.IEnvironment[source]¶
A component to generate the environment of jobs.
The chosen IEnvironment implementation must match the chosen launcher service.
- get_settings(message)¶
Return the Scrapy settings to use for running the process.
Depending on the chosen launcher, this would be one or more of LOG_FILE or FEEDS.
Added in version 1.4.2: Support for overriding Scrapy settings via SCRAPY_ environment variables was removed in Scrapy 2.8.
- Parameters:
message – the pending job received from the scrapyd.interfaces.IPoller.next() method
- get_environment(message, slot)¶
Return the environment variables to use for running the process.
Depending on the chosen launcher, this would be one or more of SCRAPY_PROJECT, SCRAPYD_EGG_VERSION or SCRAPY_SETTINGS_MODULE.
- Parameters:
message – the pending job received from the scrapyd.interfaces.IPoller.next() method
slot – the launcher slot for tracking the process
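As a sketch (the log path layout and constructor argument are assumptions, not Scrapyd’s defaults), an implementation could derive a per-job log file from the _project, _spider and _job keys of the pending job, and export the project and egg version to the child process:

```python
import os

from zope.interface import implementer

from scrapyd.interfaces import IEnvironment


@implementer(IEnvironment)
class SimpleEnvironment:
    """Illustrative IEnvironment sketch; not Scrapyd's built-in implementation."""

    def __init__(self, logs_dir="logs"):
        self.logs_dir = logs_dir

    def get_settings(self, message):
        # One log file per job, keyed by the pending job's identifiers.
        log_file = os.path.join(
            self.logs_dir, message["_project"], message["_spider"], f"{message['_job']}.log"
        )
        return {"LOG_FILE": log_file}

    def get_environment(self, message, slot):
        env = os.environ.copy()
        env["SCRAPY_PROJECT"] = message["_project"]
        if "_version" in message:  # set when a specific egg version was scheduled
            env["SCRAPYD_EGG_VERSION"] = message["_version"]
        return env
```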
- interface scrapyd.interfaces.IJobStorage[source]¶
A component to store finished jobs.
Added in version 1.3.0.
- add(job)¶
Add a finished job to the storage.
- list()¶
Return the finished jobs.
- __len__()¶
Return the number of finished jobs.
- __iter__()¶
Iterate over the finished jobs in reverse order by end_time.
A job has the attributes project, spider, job, start_time and end_time, and may have the attributes args (scrapy crawl CLI arguments) and env (environment variables).
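An in-memory version of this interface fits in a few lines. The sketch below (class name assumed, for illustration only) keeps finished jobs in a list and sorts them by end_time when iterating:

```python
from zope.interface import implementer

from scrapyd.interfaces import IJobStorage


@implementer(IJobStorage)
class MemoryJobStorage:
    """Illustrative IJobStorage sketch; finished jobs are lost on restart."""

    def __init__(self):
        self.jobs = []

    def add(self, job):
        self.jobs.append(job)

    def list(self):
        return list(self)

    def __len__(self):
        return len(self.jobs)

    def __iter__(self):
        # Most recently finished jobs first.
        return iter(sorted(self.jobs, key=lambda j: j.end_time, reverse=True))
```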
Config¶
Exceptions¶
- exception scrapyd.exceptions.ScrapydError[source]¶
Base class for exceptions from within this package
- exception scrapyd.exceptions.ConfigError[source]¶
Raised if a configuration error prevents Scrapyd from starting
- exception scrapyd.exceptions.DirectoryTraversalError[source]¶
Raised if the resolved path is outside the expected directory
- exception scrapyd.exceptions.ProjectNotFoundError[source]¶
Raised if a project isn’t found in an IEggStorage implementation
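Since these all derive from ScrapydError, callers can catch the base class to handle any Scrapyd-specific failure. A hedged example (the read_latest_egg helper and the storage variable are assumptions for illustration, not part of Scrapyd):

```python
from scrapyd.exceptions import ProjectNotFoundError, ScrapydError


def read_latest_egg(storage, project):
    """Return (version, egg bytes) for the latest egg, or raise if unknown."""
    version, eggfile = storage.get(project)
    if eggfile is None:
        raise ProjectNotFoundError(project)
    try:
        return version, eggfile.read()
    finally:
        eggfile.close()  # per the IEggStorage tip: close the file when done


# Usage (given some IEggStorage implementation bound to `storage`):
# try:
#     version, data = read_latest_egg(storage, "myproject")
# except ScrapydError as exc:  # ProjectNotFoundError is a subclass
#     print(f"could not load egg: {exc}")
```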