Overview#

Projects and versions#

Scrapyd can manage multiple projects, and each project can have multiple versions uploaded; only the latest version is used when launching new spiders.

A common (and useful) convention is to use as the version name the revision number of the version control tool you're using to track your Scrapy project code. For example: r23. Versions are not compared alphabetically but with a smarter algorithm (the same one the packaging library uses), so r10 compares greater than r9, for example.
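You can check which versions Scrapyd knows about for a project with the listversions.json endpoint. A minimal sketch, assuming a project named myproject with versions r9 and r10 already uploaded (versions are returned in order, with the last entry being the one used for new runs):

$ curl http://localhost:6800/listversions.json?project=myproject
{"status": "ok", "versions": ["r9", "r10"]}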

How Scrapyd works#

Scrapyd is an application (typically run as a daemon) that listens for requests to run spiders and spawns a process for each one, which essentially executes:

scrapy crawl myspider

Scrapyd also runs multiple processes in parallel, allocating them across a fixed number of slots given by the max_proc and max_proc_per_cpu options, and starting as many processes as possible to handle the load.
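These options are set in the scrapyd.conf file. A minimal sketch, assuming the default values (max_proc = 0 means no fixed limit, in which case the effective limit is max_proc_per_cpu multiplied by the number of available CPUs):

[scrapyd]
max_proc = 0
max_proc_per_cpu = 4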

In addition to dispatching and managing processes, Scrapyd provides a JSON web service to upload new project versions (as eggs) and schedule spiders. This feature is optional and can be disabled if you want to implement your own custom Scrapyd. The components are pluggable and can be swapped out if you're familiar with the Twisted Application Framework, in which Scrapyd is implemented.
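For example, uploading a new project version is done with the addversion.json endpoint. A sketch, assuming a project named myproject and an egg file myproject.egg built from it (the spiders count in the response reflects the spiders found in the uploaded egg):

$ curl http://localhost:6800/addversion.json -F project=myproject -F version=r23 -F egg=@myproject.egg
{"status": "ok", "spiders": 3}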

Starting from 0.11, Scrapyd also provides a minimal web interface.

Starting Scrapyd#

To start the service, use the scrapyd command provided by the Scrapyd package:

scrapyd

That should get your Scrapyd started.
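To verify that the service is up, you can query the daemonstatus.json endpoint (available in recent Scrapyd versions). A sketch of the expected exchange; node_name will reflect your own hostname:

$ curl http://localhost:6800/daemonstatus.json
{"status": "ok", "running": 0, "pending": 0, "finished": 0, "node_name": "example-node"}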

Scheduling a spider run#

To schedule a spider run:

$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider2
{"status": "ok", "jobid": "26d1b1a6d6f111e0be5c001e648c57f8"}

See the API documentation for more available resources.
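Once scheduled, a job can be tracked with the listjobs.json endpoint. A sketch, assuming the myproject project and the job id returned above (the start_time value is illustrative):

$ curl http://localhost:6800/listjobs.json?project=myproject
{"status": "ok", "pending": [], "running": [{"id": "26d1b1a6d6f111e0be5c001e648c57f8", "spider": "spider2", "start_time": "2012-09-12 10:14:03.594664"}], "finished": []}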

Web Interface#

Scrapyd comes with a minimal web interface (for monitoring running processes and accessing logs), which can be accessed at http://localhost:6800/

Other options to manage your Scrapyd cluster include: