* Added `pipeline.json` JSON schema to this package, together with a `problem.json`
JSON schema describing the structure of a parsed problem description. There is also
a `d3m.metadata.pipeline` parser for pipelines in this schema and a Python object
to represent a pipeline.
[53](https://gitlab.com/datadrivendiscovery/d3m/issues/53)
* Updated README to make it explicit that for tabular data the first dimension
is always rows and the second always columns, even in the case of a DataFrame
container type.
[54](https://gitlab.com/datadrivendiscovery/d3m/issues/54)
* Made the `Dataset` container type return a Pandas `DataFrame` instead of a numpy
`ndarray`, and in general we suggest using a Pandas `DataFrame` as the default
container type.
**Backwards incompatible.**
[49](https://gitlab.com/datadrivendiscovery/d3m/issues/49)
* Added `UniformBool` hyper-parameter class.
* Renamed `FeaturizationPrimitiveBase` to `FeaturizationLearnerPrimitiveBase`.
**Backwards incompatible.**
* Defined `ClusteringTransformerPrimitiveBase` and renamed `ClusteringPrimitiveBase`
to `ClusteringLearnerPrimitiveBase`.
**Backwards incompatible.**
[20](https://gitlab.com/datadrivendiscovery/d3m/issues/20)
* Added `inputs_across_samples` decorator to mark which method arguments
are inputs which are computed across samples.
[19](https://gitlab.com/datadrivendiscovery/d3m/issues/19)
* Converted `SingletonOutputMixin` to a `singleton` decorator. This allows
each produce method to be marked separately as a singleton produce method.
**Backwards incompatible.**
[17](https://gitlab.com/datadrivendiscovery/d3m/issues/17)
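The marking pattern behind such a decorator can be sketched in plain Python. This is an illustrative stand-in, not the actual implementation from `d3m.primitive_interfaces`; the class and attribute names below are made up:

```python
# Illustrative stand-in for the `singleton` decorator (the real one lives in
# `d3m.primitive_interfaces.base`); it marks individual produce methods.
def singleton(produce_method):
    # Record the marking on the function itself so callers can detect
    # singleton produce methods (attribute name made up for this sketch).
    produce_method.__singleton__ = True
    return produce_method

class ExamplePrimitive:
    @singleton
    def produce_score(self, *, inputs):
        # A singleton produce method returns one value for all samples.
        return sum(inputs) / len(inputs)

    def produce(self, *, inputs):
        # A regular produce method returns one value per sample.
        return [x * 2 for x in inputs]
```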
* `can_accept` can also raise an exception with information about why it cannot accept.
[13](https://gitlab.com/datadrivendiscovery/d3m/issues/13)
* Added `Primitive` hyper-parameter to describe a primitive or primitives.
Additionally, improved docstring documentation on how to define hyper-parameters
which use primitives for their values and how such primitives-as-values should
be passed to primitives as their hyper-parameters.
[51](https://gitlab.com/datadrivendiscovery/d3m/issues/51)
* Hyper-parameter values can now be converted to and from JSON-compatible structure
using `values_to_json` and `values_from_json` methods. Non-primitive values
are pickled and stored as base64 strings.
[67](https://gitlab.com/datadrivendiscovery/d3m/issues/67)
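The pickling approach for non-primitive values can be illustrated with the standard library alone. The helper functions below are hypothetical stand-ins, not the actual `values_to_json`/`values_from_json` methods:

```python
import base64
import pickle

def value_to_json(value):
    # JSON-compatible primitive values pass through unchanged; other values
    # are pickled and stored as base64 strings, as described above.
    if isinstance(value, (bool, int, float, str, type(None))):
        return value
    pickled = base64.b64encode(pickle.dumps(value)).decode('ascii')
    return {'encoding': 'pickle', 'value': pickled}

def value_from_json(json_value):
    # Reverse the encoding: unpickle base64-wrapped values, pass others through.
    if isinstance(json_value, dict) and json_value.get('encoding') == 'pickle':
        return pickle.loads(base64.b64decode(json_value['value']))
    return json_value
```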
* Added `Choice` hyper-parameter which allows one to define a
combination of hyper-parameters which should exist together.
[28](https://gitlab.com/datadrivendiscovery/d3m/issues/28)
* Added `Set` hyper-parameter which samples another hyper-parameter multiple times.
[52](https://gitlab.com/datadrivendiscovery/d3m/issues/52)
* Added `https://metadata.datadrivendiscovery.org/types/MetafeatureParameter`
semantic type for hyper-parameters which control which meta-features are
computed by the primitive.
[41](https://gitlab.com/datadrivendiscovery/d3m/issues/41)
* Added `supported_media_types` primitive metadata to describe
which media types a primitive knows how to manipulate.
[68](https://gitlab.com/datadrivendiscovery/d3m/issues/68)
* Renamed metadata property `mime_types` to `media_types`.
**Backwards incompatible.**
* Made pyarrow dependency a package extra. You can depend on it using
`d3m[arrow]`.
[66](https://gitlab.com/datadrivendiscovery/d3m/issues/66)
* Added `multi_produce` method to primitive interface which allows primitives
to optimize calls to multiple produce methods they might have.
[21](https://gitlab.com/datadrivendiscovery/d3m/issues/21)
* Added `d3m.utils.redirect_to_logging` context manager which redirects
a primitive's stdout and stderr output to the primitive's logger.
[65](https://gitlab.com/datadrivendiscovery/d3m/issues/65)
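A minimal sketch of what such a context manager might look like, built from standard-library pieces; the real implementation in `d3m.utils` may differ substantially (e.g., in how it handles streaming output):

```python
import contextlib
import io
import logging

@contextlib.contextmanager
def redirect_to_logging(logger, level=logging.INFO):
    # Sketch of the idea: capture anything written to stdout/stderr while
    # the context is active and re-emit it through the given logger.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        try:
            yield
        finally:
            for line in buffer.getvalue().splitlines():
                logger.log(level, line)
```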
* Primitives can now have a dependency on static files and directories.
One can use `FILE` and `TGZ` entries in a primitive's `installation`
metadata to ask the caller to provide paths to those files and/or
extracted directories through the new `volumes` constructor argument.
[18](https://gitlab.com/datadrivendiscovery/d3m/issues/18)
* Core dependencies have been upgraded: `numpy==1.14.2`, `networkx==2.1`.
* LUPI quality in D3M datasets is now parsed into
`https://metadata.datadrivendiscovery.org/types/SuggestedPrivilegedData`
semantic type for a column.
[61](https://gitlab.com/datadrivendiscovery/d3m/issues/61)
* Support for primitives using Docker containers has been put on hold.
We are keeping a way to pass information about running containers to a
primitive and defining dependent Docker images in metadata, but currently
it is not expected that any runtime running primitives will run
Docker containers for a primitive.
[18](https://gitlab.com/datadrivendiscovery/d3m/issues/18)
* Primitives do not have to define all constructor arguments anymore.
This allows them to ignore arguments they do not use, e.g.,
`docker_containers`.
On the other hand, when creating an instance of a primitive, one
now has to check which arguments the constructor accepts, which is
available in the primitive's metadata:
`primitive.metadata.query()['primitive_code'].get('instance_methods', {})['__init__']['arguments']`.
[63](https://gitlab.com/datadrivendiscovery/d3m/issues/63)
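As an illustration of the kind of check a caller might do, here is a hypothetical stdlib-only helper that uses `inspect` in place of the primitive metadata query to filter out unsupported constructor arguments:

```python
import inspect

def filter_constructor_arguments(cls, arguments):
    # Hypothetical helper: keep only the keyword arguments the constructor
    # actually accepts. A real caller would consult the primitive's
    # `instance_methods` metadata instead of `inspect`.
    accepted = set(inspect.signature(cls.__init__).parameters) - {'self'}
    return {name: value for name, value in arguments.items() if name in accepted}

class MinimalPrimitive:
    # A made-up primitive which only declares the arguments it uses.
    def __init__(self, *, hyperparams):
        self.hyperparams = hyperparams

kwargs = filter_constructor_arguments(MinimalPrimitive, {
    'hyperparams': {'alpha': 0.1},
    'docker_containers': {},  # ignored by this primitive
})
primitive = MinimalPrimitive(**kwargs)
```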
* Information about running primitive's Docker container has changed
from just its address to a `DockerContainer` tuple containing both
the address and a map of all exposed ports.
At the same time, support for Docker has been put on hold, so you
do not really have to change anything to upgrade, and can simply
remove the `docker_containers` argument from the primitive's constructor.
**Backwards incompatible.**
[14](https://gitlab.com/datadrivendiscovery/d3m/issues/14)
* Multiple exception classes have been defined in `d3m.exceptions`
module and are now in use. This allows easier and more precise
handling of exceptions.
[12](https://gitlab.com/datadrivendiscovery/d3m/issues/12)
* Fixed inheritance of `Hyperparams` class.
[44](https://gitlab.com/datadrivendiscovery/d3m/issues/44)
* Each primitive's class now automatically gets an instance of
[Python's logging](https://docs.python.org/3/library/logging.html)
logger stored into its ``logger`` class attribute. The instance is made
under the name of the primitive's ``python_path`` metadata value. Primitives
can use this logger to log information at various levels (debug, warning,
error) and even associate extra data with log record using the ``extra``
argument to the logger calls.
[10](https://gitlab.com/datadrivendiscovery/d3m/issues/10)
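For example, the convention can be reproduced with the standard `logging` module; the `python_path` value below is made up for illustration:

```python
import logging

# The logger is created under the primitive's `python_path` metadata
# value (this particular path is made up for the sketch).
python_path = 'd3m.primitives.example.random_forest'
logger = logging.getLogger(python_path)

# Primitives can log at various levels and attach extra data to log records:
logger.debug('Starting fit.')
logger.warning('Column has missing values.', extra={'column': 'age'})
```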
* Made sure container data types can be serialized with Arrow/Plasma
while retaining their metadata.
[29](https://gitlab.com/datadrivendiscovery/d3m/issues/29)
* `Scores` in `GradientCompositionalityMixin` replaced with `Gradients`.
`Scores` only makes sense in a probabilistic context.
* Renamed `TIMESERIES_CLASSIFICATION`, `TIMESERIES_FORECASTING`, and
`TIMESERIES_SEGMENTATION` primitives families to
`TIME_SERIES_CLASSIFICATION`, `TIME_SERIES_FORECASTING`, and
`TIME_SERIES_SEGMENTATION`, respectively, to match naming
pattern used elsewhere.
Similarly, renamed `UNIFORM_TIMESERIES_SEGMENTATION` algorithm type
to `UNIFORM_TIME_SERIES_SEGMENTATION`.
Compound words using hyphens are separated, but hyphens for prefixes
are not separated. So "Time-series" and "Root-mean-squared error"
become `TIME_SERIES` and `ROOT_MEAN_SQUARED_ERROR`
but "Non-overlapping" and "Multi-class" are `NONOVERLAPPING` and `MULTICLASS`.
**Backwards incompatible.**
* Updated performance metrics to include `PRECISION_AT_TOP_K` metric.
* Added support to problem description parsing for additional metric
parameters, and updated performance metric functions to use them.
[42](https://gitlab.com/datadrivendiscovery/d3m/issues/42)
* Merged `d3m_metadata`, `primitive_interfaces` and `d3m` repositories
into `d3m` repository. This requires the following changes of
imports in existing code:
* `d3m_metadata` to `d3m.metadata`
* `primitive_interfaces` to `d3m.primitive_interfaces`
* `d3m_metadata.container` to `d3m.container`
* `d3m_metadata.metadata` to `d3m.metadata.base`
* `d3m_metadata.metadata.utils` to `d3m.utils`
* `d3m_metadata.metadata.types` to `d3m.types`
**Backwards incompatible.**
[11](https://gitlab.com/datadrivendiscovery/d3m/issues/11)
* Fixed computation of sampled values for `LogUniform` hyper-parameter class.
[47](https://gitlab.com/datadrivendiscovery/d3m/issues/47)
* When copying or slicing container values, metadata is now copied over
instead of cleared. This makes it easier to propagate metadata.
This also means one should make sure to update the metadata in the
new container value to reflect changes to the value itself.
**Could be backwards incompatible.**
* `DataMetadata` now has a `set_for_value` method to make a copy of
metadata and set a new `for_value` value. You can use this when you
have made a new value and want to copy over metadata, but also want
the metadata to be associated with the new value. This is done by
default for container values.
* Metadata now includes a SHA256 digest for primitives and datasets.
It is computed automatically during loading. This should allow one to
track the exact versions of primitives and datasets used.
`d3m.container.dataset.get_d3m_dataset_digest` is a reference
implementation of computing digest for D3M datasets.
You can set `compute_digest` to `False` to disable this.
You can set `strict_digest` to `True` to raise an exception instead
of a warning if computed digest does not match one in metadata.
* Datasets can now be loaded in "lazy" mode: only metadata is loaded
when creating a `Dataset` object. You can use the `is_lazy` method to
check if a dataset is lazy and its data has not yet been loaded. You can
use `load_lazy` to load data for a lazy object, making it non-lazy.
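The lazy-loading pattern described above can be sketched as follows; this is a toy stand-in, not the actual `Dataset` class:

```python
class LazyDataset:
    # Toy sketch of the lazy-loading pattern described above; the real
    # `Dataset` class loads D3M dataset metadata and data, not a plain list.
    def __init__(self, loader, *, lazy=False):
        self._loader = loader
        self._data = None
        if not lazy:
            self.load_lazy()

    def is_lazy(self):
        # True while the data itself has not yet been loaded.
        return self._data is None

    def load_lazy(self):
        # Load the data for a lazy object, making it non-lazy.
        if self.is_lazy():
            self._data = self._loader()

    @property
    def data(self):
        return self._data

dataset = LazyDataset(lambda: [1, 2, 3], lazy=True)
was_lazy = dataset.is_lazy()   # only "metadata" exists so far
dataset.load_lazy()            # now the data is loaded
```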
* There is now a utility metaclass `d3m.metadata.utils.AbstractMetaclass`
which makes classes which use it automatically inherit docstrings
for methods from the parent. Primitive base class and some other D3M
classes are now using it.
* `d3m.metadata.base.CONTAINER_SCHEMA_VERSION` and
`d3m.metadata.base.DATA_SCHEMA_VERSION` were fixed to point to the
correct URI.
* Many `data_metafeatures` properties in metadata schema had type
`numeric` which does not exist in JSON schema. They were fixed to
`number`.
* Added to a list of known semantic types:
`https://metadata.datadrivendiscovery.org/types/Target`,
`https://metadata.datadrivendiscovery.org/types/PredictedTarget`,
`https://metadata.datadrivendiscovery.org/types/TrueTarget`,
`https://metadata.datadrivendiscovery.org/types/Score`,
`https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint`,
`https://metadata.datadrivendiscovery.org/types/SuggestedPrivilegedData`,
`https://metadata.datadrivendiscovery.org/types/PrivilegedData`.
* Added to `algorithm_types`: `ARRAY_CONCATENATION`, `ARRAY_SLICING`,
`ROBUST_PRINCIPAL_COMPONENT_ANALYSIS`, `SUBSPACE_CLUSTERING`,
`SPECTRAL_CLUSTERING`, `RELATIONAL_ALGEBRA`, `MULTICLASS_CLASSIFICATION`,
`MULTILABEL_CLASSIFICATION`, `OVERLAPPING_CLUSTERING`, `SOFT_CLUSTERING`,
`STRICT_PARTITIONING_CLUSTERING`, `STRICT_PARTITIONING_CLUSTERING_WITH_OUTLIERS`,
`UNIVARIATE_REGRESSION`, `NONOVERLAPPING_COMMUNITY_DETECTION`,
`OVERLAPPING_COMMUNITY_DETECTION`.