This release includes the following bugfixes:
* Sdist building:
* Added missing `data_files` to config.cfg.
* Docker Compose / Airflow 2:
* Tasks that use `multiprocessing.Pool` failed with the error "daemonic processes are not allowed to have children". To address this, added `AIRFLOW__CORE__EXECUTE_TASKS_NEW_PYTHON_INTERPRETER` to the Docker Compose file. This is the same error described in [this Stack Overflow post](https://stackoverflow.com/questions/68878031/is-multiprocessing-pool-not-allowed-in-airflow-task-assertionerror-daemonic).
* Set Docker Compose volume paths correctly for editable workflows packages when deployed to Terraform.
* Terraform:
* Fixed the Terraform config so that Google Cloud Secrets are created with the secret value; previously their value was set to the secret key.
* Update TerraformBuilder so that it builds with the latest changes.
* Observatory API: the `postgres` connection prefix is no longer supported as of SQLAlchemy 1.4, so changed it to `postgresql` in the Terraform file.
* Address inconsistent use of dates:
* Change type hints from `pendulum.datetime` to `pendulum.DateTime` (the class, not the function).
* Change `datetime.datetime` calls to `pendulum.datetime`.
* Make `select_table_shard_dates` return `List[pendulum.Date]`.
* Add a `make_release_date` function that returns a `pendulum.DateTime` instance, which some of the downstream functions that use it require.
* `get_airflow_connection_url`: call `get_uri` to get the URI.
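
The connection prefix change in the Observatory API item can be illustrated with a small sketch. The actual fix was made in the Terraform file; the helper below, `normalise_postgres_scheme`, is a hypothetical name used only for illustration and is not part of this release. It shows the same rewrite applied to a connection URI in Python, assuming the URI starts with the legacy `postgres://` prefix.

```python
def normalise_postgres_scheme(uri: str) -> str:
    """Rewrite a legacy ``postgres://`` URI to use the ``postgresql://`` prefix.

    Hypothetical helper for illustration only; URIs with any other prefix
    are returned unchanged.
    """
    legacy_prefix = "postgres://"
    if uri.startswith(legacy_prefix):
        # Keep everything after the prefix (credentials, host, port, database).
        return "postgresql://" + uri[len(legacy_prefix):]
    return uri


if __name__ == "__main__":
    print(normalise_postgres_scheme("postgres://user:password@localhost:5432/observatory"))
```

A URI that already uses `postgresql://` passes through untouched, so the helper is safe to apply more than once.
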
And the following new features:
* Added black to the pre-commit config.
* load_dags.py:
* When DagBag has import errors, raise an exception that has a message with all of the errors so that the Dag import errors are visible in the
* Testing:
* Add a simple threaded HTTP server for use in tests.
* Utilities:
* Add `get_observatory_http_header` to create a simple header dict with a custom user agent.
* Add `get_filename_from_url` to get a filename from an HTTP URL.
* Add `get_chunks` function to split lists into chunks of constant size (the last chunk may be smaller).
* Add `get_airflow_connection_url` to pull a URL from an Airflow connection, validate it, and add a trailing "/" if necessary.
* Add a converter function for CSV to JSONL files.
* Add HTTP GET response functions that provide simple, standardised interfaces for getting the raw text response, XML as a dict, and JSON as a dict.
* Add AsyncHttpFileDownloader with download_file and download_files interfaces for downloading files using http. Allows custom headers to be used in http connection.
* download_files allows concurrent downloading through asyncio and aiohttp. Supports retry on failure with exponential backoff.
* download_file piggybacks off download_files. No speed benefit from asyncio, but provides a simpler interface.
* Add `get_airflow_connection_password`.
* Add `unzip_files` function.
* Add `find_replace_file` (a replacement for the sed CLI).
* Add a function to wrap shell command calls; it treats a non-zero exit code as an error.
* Snapshot telescope:
* Add `upload_downloaded` as a snapshot telescope task. Many of the `upload_downloaded` tasks in snapshot telescopes were identical in implementation: they upload the `download_files` list of files from the release object to the `download_bucket` in the cloud. Since this is a standard pattern we have adopted, it is now part of the snapshot telescope implementation.
* Add download, extract, transform tasks to template.
* Stream telescope:
* Add download, upload_downloaded, extract, transform tasks to template.
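
The chunking behaviour described for `get_chunks` in the Utilities list (constant-size chunks, with a possibly shorter final chunk) can be sketched as follows. This is a minimal illustration, not the released implementation: the function name `split_into_chunks` and the parameter names `items` and `chunk_size` are assumptions.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of ``items`` with at most ``chunk_size`` elements.

    Hypothetical stand-in for ``get_chunks``: every chunk has exactly
    ``chunk_size`` elements except the last, which holds the remainder.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Step through the sequence in strides of chunk_size and slice each stride.
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


if __name__ == "__main__":
    for chunk in split_into_chunks(["a", "b", "c", "d", "e"], 2):
        print(chunk)
```

Because the sketch is a generator, the size check runs when iteration starts, and an empty input simply yields no chunks.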