🔒 Security + Forwards Incompatible Change
- S3 and Azure: Do not save sensitive or ephemeral config in the config library (https://github.com/man-group/ArcticDB/pull/803)
This fixes a security issue in ArcticDB where credentials were saved to storage for:
- Azure
- AWS S3, if the access keys are supplied in the URI instead of using `aws_auth=True`.
[These instructions](https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/upgrade_storage.md) explain how to upgrade your storage to remove the credentials. See also issue https://github.com/man-group/ArcticDB/issues/802.
Compatibility matrix
<table>
<tr>
<th>Storage</th>
<th>Library created with &lt; v3.<br>Library accessed with &gt;= v3.</th>
<th>Library created with or upgraded to &gt;= v3.<br>Library accessed with &lt; v3.</th>
</tr>
<tr>
<th>S3 with <code>aws_auth=True</code></th>
<td>Continues to work</td>
<td>
Raises <code>InternalException: E_INVALID_ARGUMENT S3 Endpoint must be specified</code>.<br>
Will work again if <code>access=_RBAC_&secret=_RBAC_&force_uri_lib_config=true</code> is in the URI passed to <code>Arctic()</code></td>
</tr>
<tr>
<th>S3 with <code>access</code> and <code>secret</code></th>
<td rowspan="2"><p>Will now use the credentials passed to <code>Arctic()</code>, but should continue to work if those credentials are sufficient.</p>
<p>A future release might print a warning with instructions to upgrade.</p></td>
<td>Raises <code>InternalException: E_INVALID_ARGUMENT S3 Endpoint must be specified</code>.<br>
Will work if <code>force_uri_lib_config=true</code> is in the URI passed to <code>Arctic()</code></td>
</tr>
<tr>
<th>Azure</th>
<td>Operations on the library will fail with various internal error messages</td>
</tr>
</table>
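
For clients older than v3 that must keep reading S3 libraries, the workaround in the right-hand column amounts to appending query parameters to the URI passed to `Arctic()`. The helper below is a minimal sketch of that string manipulation only. The function name `force_uri_lib_config`, its `uses_aws_auth` flag, and the endpoint and bucket in the usage note are hypothetical and not part of ArcticDB; the parameter names and the `_RBAC_` placeholder values are taken from the matrix above.

```python
def force_uri_lib_config(uri: str, uses_aws_auth: bool = False) -> str:
    """Return `uri` with the query parameters a < v3 client needs (sketch)."""
    base, _, query = uri.partition("?")
    # Keep existing parameters in their original order, dropping empty fragments.
    params = [p for p in query.split("&") if p]
    keys = {p.split("=", 1)[0] for p in params}
    if uses_aws_auth:
        # The matrix lists placeholder credentials for the aws_auth=True case.
        for key in ("access", "secret"):
            if key not in keys:
                params.append(f"{key}=_RBAC_")
    if "force_uri_lib_config" not in keys:
        params.append("force_uri_lib_config=true")
    return base + "?" + "&".join(params)
```

The result would then be passed to `Arctic()` on the older client, for example `Arctic(force_uri_lib_config("s3://s3.example.com:my-bucket?aws_auth=true", uses_aws_auth=True))`. This does not help with Azure libraries, which the matrix lists as failing on older clients.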
Full details:
What's happened?
Whilst reviewing our codebase we discovered a way that access-keys for ArcticDB storage backends could be saved into the storage in clear text.
This behavior was by design, but it may have affected some third-party users without their being aware of it.
This depends on the backend used and how you connect to the storage.
What is the exact scope of the issue?
If you created an ArcticDB library either on an S3 bucket with the access-keys passed as part of the URI, or on Azure Blob Storage with the access-keys as part of the connection-string, then the credentials were saved into the storage account as part of the ArcticDB library config.
If you then shared that storage account with others using different roles or access-keys, then those users would in theory have been able to access the credentials used to create the library.
What have you done to address this?
We've updated ArcticDB so that all new libraries do not do this, even if the credentials are passed in with the URI/connection-string.
We've prepared a storage-update script which you can run to see if the credentials are there, and then remove them if they are.
What is the impact if I am affected?
If you have shared that storage account with anyone else using different roles/credentials, then your original credentials have also been accessible to those users.
It's possible those users recorded the credentials, and because those credentials must have had write-access to create the library, they could have made changes to the data or otherwise used those credentials.
What can I do to check if I'm affected?
See these [instructions](https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/upgrade_storage.md).
If needed, you can check on previous versions of ArcticDB using the code referenced on GitHub:
https://github.com/man-group/ArcticDB/issues/802#issuecomment-1697814768
What should I do if I am affected?
Follow these [instructions](https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/upgrade_storage.md).
This change is not forwards compatible, so users on earlier clients may need to upgrade:
- S3 libraries created with 3.0.0 will not be readable by earlier ArcticDB versions unless `force_uri_lib_config=true` is included in the URI passed to `Arctic()`.
- Azure libraries created with 3.0.0 will not be readable by earlier ArcticDB versions.
Then,
- Rotate your credentials.
- If you've shared access to that storage account then please also check the integrity of your data and anything else accessible via those credentials.
What was the cause?
Previous use cases of ArcticDB had split storage accounts. One account was used to configure libraries and other accounts held the data for those libraries. Credentials to read those data-libraries were then stored in the configuration account and passed to users as needed for access to the data. This code was not caught during our review, and so was not disabled or removed when we made ArcticDB available to others. When we subsequently added Azure Blob Storage support, the side-effect of saving anything in the connection-string to storage was not anticipated.
Having reviewed the codebase again we are confident that this was the only way that credentials could be saved into storage using our public API.
We plan to continue supporting our split storage solution for some users, but it should always be very clear when access-keys are being stored and what the risks are for that.
🚀 Features
- Conda-forge build now supports Azure Blob Storage
- Enhancement/728/make iclause responsible for processing structure (https://github.com/man-group/ArcticDB/pull/752)
- Add more info in the CI readme; Prepare var for real storage tests (https://github.com/man-group/ArcticDB/pull/663)
- **Enhancement 702: Add option to create library if it does not exist when calling get_library (https://github.com/man-group/ArcticDB/pull/775)**
- **Enhancement 714: Expose library methods to list symbols with staged data, and to delete staged data (https://github.com/man-group/ArcticDB/pull/778)**
- Enhancement 737: Support empty-type columns in QueryBuilder operations (https://github.com/man-group/ArcticDB/pull/794)
- conda-build: Adapt C++ test suite for Linux (https://github.com/man-group/ArcticDB/pull/713)
🐛 Fixes
- conda-build: Use default compilers for macOS (https://github.com/man-group/ArcticDB/pull/662)
- Bugfix/nativeversionstore write metadata batch should never return dataerror objects (https://github.com/man-group/ArcticDB/pull/782)
- Add handling of unspecified ca path in azure uri (https://github.com/man-group/ArcticDB/pull/771)
- Add dep. on packaging (https://github.com/man-group/ArcticDB/pull/795)
- Fix get_num_rows for NativeVersionStore (https://github.com/man-group/ArcticDB/pull/800)
<details>
<summary>Uncategorized</summary>
- First version of AWS S3 setup guide (https://github.com/man-group/ArcticDB/pull/708)
- fix(docs): central docs URL from API docs homepage (https://github.com/man-group/ArcticDB/pull/755)
- Add none type (https://github.com/man-group/ArcticDB/pull/646)
- Azure getting started guide (https://github.com/man-group/ArcticDB/pull/749)
- Docs fixes (https://github.com/man-group/ArcticDB/pull/762)
- Decouple storage headers from implementations & storage.hpp (https://github.com/man-group/ArcticDB/pull/763)
- Bugfix 554: Remove unused argument from write_batch (https://github.com/man-group/ArcticDB/pull/769)
- Partially revert https://github.com/man-group/ArcticDB/pull/763 for consistency (https://github.com/man-group/ArcticDB/pull/766)
- Make it clear to not commit directly to ArcticDB feedstock but use PRs instead (https://github.com/man-group/ArcticDB/pull/741)
- maint: pandas 2.0 forward compatible changes (https://github.com/man-group/ArcticDB/pull/540)
- test: Test the absence of in-place modification on datetime64 normalization for pandas 2.0 (https://github.com/man-group/ArcticDB/pull/801)
- Update README.md (https://github.com/man-group/ArcticDB/pull/799)
- test: Remove test for fallback to pickle (https://github.com/man-group/ArcticDB/pull/805)
- Docs - update release number (https://github.com/man-group/ArcticDB/pull/816)
- conda-build: Pin cmake (https://github.com/man-group/ArcticDB/pull/815)
- Update releasing.md (https://github.com/man-group/ArcticDB/pull/817)
- ArcticDB 3.0.0 update BSL table (https://github.com/man-group/ArcticDB/pull/820)
</details>