Scrapyd searches for configuration files in the following locations and parses them in order, with later files taking precedence:
/etc/scrapyd/conf.d/* (in alphabetical order, Unix)
~/.scrapyd.conf (user's home directory)
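The "later wins" merge behavior can be sketched with the standard library's configparser, which handles ini-style files like Scrapyd's (the values below are illustrative):

```python
# Sketch of how a later configuration source overrides an earlier one.
# ConfigParser parses inputs in order; options read later silently
# overwrite earlier values for the same section and key.
from configparser import ConfigParser

parser = ConfigParser()
parser.read_string("[scrapyd]\nhttp_port = 6800\n")  # system-wide value
parser.read_string("[scrapyd]\nhttp_port = 6801\n")  # per-user override wins
print(parser.get("scrapyd", "http_port"))  # -> 6801
```

With real files, `parser.read([...])` takes a list of paths and applies the same ordering rule.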
The configuration file supports the following options (see default values in the example).
http_port
The TCP port where the HTTP JSON API will listen. Defaults to 6800.
bind_address
The IP address where the website and JSON webservices will listen. Defaults to 127.0.0.1.
username
New in version 1.3.
Set both username and password to non-empty to enable basic authentication. Defaults to empty.
password
New in version 1.3.
See the username option above. Defaults to empty.
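When basic authentication is enabled, every JSON API request must carry an Authorization header. A minimal stdlib-only sketch of building such a request (the credentials are placeholders, not defaults):

```python
# Hedged sketch: construct an HTTP basic-auth request for the JSON API
# using only the standard library. Username and password are example
# values; the endpoint assumes the default bind_address and http_port.
import base64
from urllib.request import Request

username, password = "admin", "secret"  # illustrative credentials
token = base64.b64encode(f"{username}:{password}".encode()).decode()
req = Request(
    "http://127.0.0.1:6800/daemonstatus.json",
    headers={"Authorization": f"Basic {token}"},
)
print(req.get_header("Authorization"))  # -> Basic YWRtaW46c2VjcmV0
```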
max_proc
The maximum number of concurrent Scrapy processes that will be started. If unset or 0, it will use the number of CPUs available in the system multiplied by the value in the max_proc_per_cpu option. Defaults to 0.
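The rule above can be expressed as a small function (a sketch of the documented behavior, not Scrapyd's actual implementation):

```python
# Minimal sketch of the documented rule: with max_proc = 0, the
# effective cap is cpu_count * max_proc_per_cpu; otherwise max_proc
# is used as-is.
def effective_max_proc(max_proc: int, max_proc_per_cpu: int, cpus: int) -> int:
    """Return the concurrent Scrapy process cap the docs describe."""
    return max_proc if max_proc else cpus * max_proc_per_cpu

# With the defaults (max_proc = 0, max_proc_per_cpu = 4) on a 2-CPU box:
print(effective_max_proc(0, 4, 2))  # -> 8
# An explicit max_proc takes precedence:
print(effective_max_proc(3, 4, 2))  # -> 3
```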
max_proc_per_cpu
The maximum number of concurrent Scrapy processes that will be started per CPU. Defaults to 4.
debug
Whether debug mode is enabled. Defaults to off. When debug mode is enabled, the full Python traceback will be returned (as a plain-text response) when there is an error processing a JSON API call.
eggs_dir
The directory where the project eggs will be stored. Defaults to eggs.
dbs_dir
The directory where the project databases will be stored (this includes the spider queues). Defaults to dbs.
logs_dir
The directory where the Scrapy logs will be stored. Defaults to logs. To disable storing logs, set this option empty, like this:
logs_dir =
items_dir
New in version 0.15.
The directory where the Scrapy items will be stored. This option is disabled by default because you are expected to use a database or a feed exporter. Setting it to non-empty results in scraped item feeds being stored in the specified directory by overriding the Scrapy FEED_URI setting.
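For example, a minimal fragment that turns on local item storage (the directory name here is illustrative):

```ini
[scrapyd]
items_dir = items
```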
jobs_to_keep
New in version 0.15.
The number of finished jobs to keep per spider. Defaults to 5. This refers to logs and items. This setting was named logs_to_keep in previous versions.
finished_to_keep
New in version 0.14.
The number of finished processes to keep in the launcher. Defaults to 100. This only reflects on the website /jobs endpoint and relevant JSON webservices.
poll_interval
The interval used to poll queues, in seconds. Defaults to 5.0. Can be a float, such as 0.2.
runner
The module used to launch sub-processes. You can customize the Scrapy processes launched from Scrapyd by using your own module. Defaults to scrapyd.runner.
application
A function that returns the (Twisted) Application object to use. This can be used if you want to extend Scrapyd by adding or removing your own components and services. Defaults to scrapyd.app.application.
For more info see the Twisted Application Framework documentation.
webroot
A Twisted web resource that represents the interface to Scrapyd. Scrapyd includes a website interface providing simple monitoring and access to the application's web resources. This setting must provide the root class of the Twisted web resource. Defaults to scrapyd.website.Root.
jobstorage
A class that stores finished jobs. There are two implementations provided:
scrapyd.jobstorage.MemoryJobStorage (default): jobs are stored in memory and lost when the daemon is restarted
scrapyd.jobstorage.SqliteJobStorage: jobs are persisted in a SQLite database in dbs_dir
If another backend is needed, you can write your own class implementing the IJobStorage interface.
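A hedged sketch of what a custom backend might look like. It assumes IJobStorage requires add(), list(), __len__() and __iter__(), and that the config object passed in exposes a getint() accessor; verify these names against scrapyd.interfaces.IJobStorage before relying on them:

```python
# Hedged sketch of a custom job storage backend. The method names
# (add, list, __len__, __iter__) and the config.getint() accessor are
# assumptions -- check scrapyd.interfaces.IJobStorage before use.
class ListJobStorage:
    """Keep the N most recently finished jobs in a plain list."""

    def __init__(self, config):
        self.jobs = []
        # Reuse the finished_to_keep option as the retention limit.
        self.finished_to_keep = config.getint("finished_to_keep", 100)

    def add(self, job):
        self.jobs.append(job)
        # Trim the oldest entries beyond the retention limit.
        del self.jobs[:-self.finished_to_keep]

    def list(self):
        return self.jobs

    def __len__(self):
        return len(self.jobs)

    def __iter__(self):
        return iter(self.jobs)
```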
eggstorage
A class that stores and retrieves eggs for running spiders. The default implementation is FilesystemEggStorage, which stores eggs on the file system, based on the eggs_dir setting. You can customize storage by implementing the IEggStorage interface.
node_name
New in version 1.1.
The node name for each node, set to something like the display hostname. Defaults to the value of socket.gethostname().
Example configuration file
Here is an example configuration file with all the defaults:
[scrapyd]
eggs_dir = eggs
logs_dir = logs
items_dir =
jobs_to_keep = 5
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 127.0.0.1
http_port = 6800
username =
password =
debug = off
runner = scrapyd.runner
jobstorage = scrapyd.jobstorage.MemoryJobStorage
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
eggstorage = scrapyd.eggstorage.FilesystemEggStorage

[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
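The [services] section maps URL paths to webservice classes; for instance, schedule.json accepts POST requests to enqueue a spider run. A stdlib-only sketch of building such a request (project and spider names are placeholders, and no server is contacted here):

```python
# Hedged sketch: build a POST request for the schedule.json webservice
# mapped in the [services] section above. "myproject" and "myspider"
# are illustrative names; the URL assumes the default bind_address
# and http_port.
from urllib.parse import urlencode
from urllib.request import Request

data = urlencode({"project": "myproject", "spider": "myspider"}).encode()
req = Request("http://127.0.0.1:6800/schedule.json", data=data)
print(req.get_method())  # -> POST
```

Sending it with `urllib.request.urlopen(req)` would return a JSON response; with debug = off, errors come back as JSON rather than tracebacks.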