--------------------

This release contains the most changes in a decade. A beta release is therefore being made first.

Added
~~~~~
- Add ``version`` (egg version), ``settings`` (Scrapy settings) and ``args`` (spider arguments) to the pending jobs in the response from the :ref:`listjobs.json` webservice (see the sketch after this list).
- Add ``log_url`` and ``items_url`` to the running jobs in the response from the :ref:`listjobs.json` webservice.
- Add a :ref:`status.json` webservice, to get the status of a job.
- Add a :ref:`unix_socket_path` setting, to listen on a Unix socket.
- Add a :ref:`poller` setting.
- Respond to HTTP ``OPTIONS`` method requests.
- Add environment variables to override common options. See :ref:`config-envvars`.
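
A hedged sketch of reading the new fields, assuming a local instance and a project named ``myproject`` (both illustrative, as is the field handling):

.. code-block:: python

    import requests

    base = "http://localhost:6800"

    # Pending jobs now carry "version", "settings" and "args"; running
    # jobs now carry "log_url" and "items_url".
    jobs = requests.get(f"{base}/listjobs.json", params={"project": "myproject"}).json()
    for job in jobs.get("pending", []):
        print(job["id"], job.get("version"), job.get("settings"), job.get("args"))
    for job in jobs.get("running", []):
        print(job["id"], job.get("log_url"), job.get("items_url"))

    # The new status.json webservice reports the state of a single job.
    if jobs.get("running"):
        job_id = jobs["running"][0]["id"]
        print(requests.get(f"{base}/status.json", params={"job": job_id}).json())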

Documentation
^^^^^^^^^^^^^
- How to add webservices (endpoints). See :ref:`config-services`.
- How to create Docker images. See :ref:`docker`.
- How to integrate Scrapy projects, without eggs. See :ref:`config-settings`.

Changed
~~~~~~~
- Every :ref:`poll_interval`, the default :ref:`poller` starts up to :ref:`max_proc` processes, instead of only one. (The number of running jobs still never exceeds :ref:`max_proc`; see the configuration sketch after this list.)
- Drop support for end-of-life Python version 3.7.
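
For illustration, a configuration sketch of the two settings involved (the values are examples, not recommendations):

.. code-block:: ini

    [scrapyd]
    # Hard cap on concurrent Scrapy processes.
    max_proc = 8
    # Seconds between polls; each poll can now start up to max_proc
    # processes, rather than only one.
    poll_interval = 5.0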

Web UI
^^^^^^
- Add basic CSS.
- Add a confirmation dialog to the Cancel button.
- Add "Last modified" column to the directory listings of log files and item feeds.
- The Jobs page responds only to HTTP ``GET`` and ``HEAD`` method requests.

API
^^^
- Clarify error messages (see the sketch after this list), for example:

  - ``'project' parameter is required``, instead of ``'project'`` (KeyError)
  - ``project 'myproject' not found``, instead of ``'myproject'`` (KeyError)
  - ``project 'myproject' not found``, instead of ``Scrapy VERSION - no active project``
  - ``version 'myversion' not found``, instead of a traceback
  - ``exception class: message``, instead of ``message``
  - ``BadEggError``, instead of ``TypeError: 'tuple' object is not an iterator``

- Add error messages for non-UTF-8 bytes and non-float ``priority``.
- "Unsupported method" error messages no longer list ``object`` as an allowed HTTP method.

CLI
^^^
- Scrapyd uses ``twisted.logger`` instead of the legacy ``twisted.python.log``. Some system information changes:

  - ``[scrapyd.basicauthinfo] Basic authentication ...``, instead of ``[-] ...``
  - ``[scrapyd.appinfo] Scrapyd web console available at ...``, instead of ``[-] ...``
  - ``[-] Unhandled Error``, instead of ``[_GenericHTTPChannelProtocol,0,127.0.0.1] ...``

- Data received on standard error, and non-zero exit status codes, are logged at error level.
- Correct the usage message and long description.
- Remove the ``--rundir`` option, which only works if ``*_dir`` settings are absolute paths.
- Remove the ``--nodaemon`` (``-n``) option, which Scrapyd always enables.
- Remove the ``--python=`` (``-y``) option, which Scrapyd must set to its own application.
- Remove all ``twistd`` subcommands (FTP servers, etc.). Run ``twistd``, if needed.
- Run the ``scrapyd.__main__`` module, instead of the ``scrapyd.scripts.scrapyd_run`` module (see the example after this list).
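
With the ``twistd`` plumbing removed, starting Scrapyd reduces to the console script, or equivalently running the package as a module:

.. code-block:: console

    $ scrapyd
    $ python -m scrapyd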

Library
^^^^^^^
- Move functions from ``scrapyd.utils`` into their callers:

  - ``sorted_versions`` to ``scrapyd.eggstorage``
  - ``get_crawl_args`` to ``scrapyd.launcher``

- :ref:`jobstorage` uses the ``ScrapyProcessProtocol`` class, by default. If :ref:`jobstorage` is set to ``scrapyd.jobstorage.SqliteJobStorage``, Scrapyd 1.3.0 uses a ``Job`` class, instead. To promote parity, the ``Job`` class is removed (see the configuration sketch after this list).
- Move the ``activate_egg`` function from the ``scrapyd.eggutils`` module to its caller, the ``scrapyd.runner`` module.
- Move the ``job_log_url`` and ``job_items_url`` functions into the ``Root`` class, since the ``Root`` class is responsible for file URLs.
- Change the ``get_crawl_args`` function to no longer convert ``bytes`` to ``str``, as already done by its caller.
- Change the ``scrapyd.app.create_wrapped_resource`` function to a ``scrapyd.basicauth.wrap_resource`` function.
- Change the ``scrapyd.utils.sqlite_connection_string`` function to a ``scrapyd.sqlite.initialize`` function.
- Change the ``get_spider_list`` function to a ``SpiderList`` class.
- Merge the ``JsonResource`` class into the ``WsResource`` class, removing the ``render_object`` method.
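
As a configuration sketch for the :ref:`jobstorage` note above (``SqliteJobStorage`` is the documented alternative to the in-memory default):

.. code-block:: ini

    [scrapyd]
    # Persist finished jobs across restarts, instead of keeping them in memory.
    jobstorage = scrapyd.jobstorage.SqliteJobStorage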

Fixed
~~~~~
- Restore support for :ref:`eggstorage` implementations whose ``get()`` methods return file-like objects without ``name`` attributes (a 1.4.3 regression; see the sketch after this list).
- If the :ref:`items_dir` setting is a URL and the path component ends with ``/``, the ``FEEDS`` setting no longer contains double slashes.
- The ``MemoryJobStorage`` class returns finished jobs in reverse chronological order, like the ``SqliteJobStorage`` class.
- The ``list_projects`` method of the ``SpiderScheduler`` class returns a ``list``, instead of ``dict_keys``.
- Log errors to Scrapyd's log, even when :ref:`debug` mode is enabled.
- List the closest ``scrapy.cfg`` file as a :ref:`configuration source<config-sources>`.
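
A hedged sketch of the restored behavior: a custom ``get()`` may return an in-memory file object that has no ``name`` attribute (the class and storage layout below are illustrative, not Scrapyd internals):

.. code-block:: python

    import io

    class InMemoryEggStorage:
        """Illustrative fragment of an eggstorage backend; only get() is shown."""

        def __init__(self, config):
            self._eggs = {}  # {(project, version): egg bytes}

        def get(self, project, version=None):
            if version is None:
                # Illustrative: pick the highest version for the project.
                version = max(v for (p, v) in self._eggs if p == project)
            # io.BytesIO has no ``name`` attribute; supported again as of 1.5.0.
            return version, io.BytesIO(self._eggs[(project, version)])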

API
^^^
- The ``Content-Length`` header counts the number of bytes, instead of the number of characters.
- The ``Access-Control-Allow-Methods`` response header contains only the HTTP methods to which webservices respond.
- The :ref:`listjobs.json` webservice sets the ``log_url`` and ``items_url`` fields to ``null`` if the files don't exist.
- The :ref:`schedule.json` webservice sets the ``node_name`` field in error responses.
- The next pending job for all but one project was not reported by the :ref:`daemonstatus.json` and :ref:`listjobs.json` webservices, and could not be cancelled by the :ref:`cancel.json` webservice.

Security
^^^^^^^^
- The ``FilesystemEggStorage`` class used by the :ref:`listversions.json` webservice escapes project names (used in glob patterns) before globbing, to disallow listing arbitrary directories.
- The ``FilesystemEggStorage`` class used by the :ref:`runner` and the :ref:`addversion.json`, :ref:`listversions.json`, :ref:`delversion.json` and :ref:`delproject.json` webservices raises a ``DirectoryTraversalError`` error if the project parameter (used in file paths) would traverse directories.
- The ``Environment`` class used by the :ref:`launcher` raises a ``DirectoryTraversalError`` error if the project, spider or job parameters (used in file paths) would traverse directories.
- The :ref:`webui` escapes user input (project names, spider names, and job IDs) to prevent cross-site scripting (XSS).

Platform support
^^^^^^^^^^^^^^^^
Scrapyd is now tested on macOS and Windows, in addition to Linux.

- The :ref:`cancel.json` webservice now works on Windows, by using SIGBREAK instead of SIGINT or SIGTERM (see the sketch after this list).
- The :ref:`dbs_dir` setting no longer causes an error if it contains a drive letter on Windows.
- The :ref:`items_dir` setting is considered a local path if it contains a drive letter on Windows.
- The :ref:`jobs_to_keep` setting no longer causes an error if a file to delete can't be deleted (for example, if the file is open on Windows).
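
A hedged sketch of the Windows mechanism (illustrative, not Scrapyd's launcher code): a child started in its own process group can receive ``CTRL_BREAK_EVENT``, which Python surfaces as SIGBREAK:

.. code-block:: python

    import os
    import signal
    import subprocess
    import sys

    if sys.platform == "win32":
        # A new process group lets a break event target the child alone.
        proc = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(60)"],
            creationflags=subprocess.CREATE_NEW_PROCESS_GROUP,
        )
        os.kill(proc.pid, signal.CTRL_BREAK_EVENT)  # delivered as SIGBREAK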

Removed
~~~~~~~
- Remove support for parsing URLs in :ref:`dbs_dir`, since SQLite writes only to paths or ``:memory:`` (added in 1.4.2).
- Remove the ``JsonSqliteDict`` and ``UtilsCache`` classes.
- Remove the ``native_stringify_dict`` function.
- Remove undocumented and unused internal environment variables:

  - ``SCRAPYD_FEED_URI``
  - ``SCRAPYD_JOB``
  - ``SCRAPYD_LOG_FILE``
  - ``SCRAPYD_SLOT``
  - ``SCRAPYD_SPIDER``