Tamu-d3m

2019.4.4

* With this release, metadata is no longer automatically generated when a DataFrame or ndarray
is wrapped into a corresponding container type. You now have to explicitly set the
`generate_metadata` constructor argument to `True` or call the `generate` method on the metadata
object afterwards.
This has been changed to improve the performance of many primitives and operations on
container types, which were slowed down by unnecessary and unexpected
generation of metadata.
This change requires manual inspection of a primitive's code to determine what change
is necessary. Some suggestions on what to look for:
* The `set_for_value` method has been deprecated: generally it can be replaced with a `generate`
call, or even removed in some cases:
* `value.metadata = value.metadata.set_for_value(value, generate_metadata=False)`: remove the call.
* `value.metadata = new_metadata.set_for_value(value, generate_metadata=False)`: replace with `value.metadata = new_metadata`.
* `value.metadata = new_metadata.set_for_value(value, generate_metadata=True)`: replace with `value.metadata = new_metadata.generate(value)`.
* The `clear` method has been deprecated: generally you can now instead simply create
a fresh instance of `DataMetadata`, potentially calling `generate` as well:
* `outputs_metadata = inputs_metadata.clear(new_metadata, for_value=outputs, generate_metadata=True)`: replace with
`outputs_metadata = metadata_base.DataMetadata(metadata).generate(outputs)`.
* `outputs_metadata = inputs_metadata.clear(for_value=outputs, generate_metadata=False)`: replace with
`outputs_metadata = metadata_base.DataMetadata()`.
* Search for all calls to constructors of the `container.List`, `container.ndarray`,
`container.Dataset`, and `container.DataFrame` container types and explicitly set
`generate_metadata` to `True`. Alternatively, you can manually update
metadata instead of relying on automatic metadata generation.
* The main idea is that if you are using automatic metadata generation in your primitive,
make sure you generate it only once, just before you return the container type from
your primitive. Of course, if from inside your primitive you call code which expects metadata,
you might have to ensure or generate metadata before calling that code as well.
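
A minimal sketch of this migration (the replacement patterns come directly from the notes above; the example data is hypothetical):

```python
from d3m import container
from d3m.metadata import base as metadata_base

# Explicitly request metadata generation when wrapping data; it is no longer automatic.
outputs = container.DataFrame({'value': [1, 2, 3]}, generate_metadata=True)

# Alternatively, generate metadata afterwards, just before returning from the primitive.
outputs.metadata = outputs.metadata.generate(outputs)

# Instead of the deprecated `clear`, create a fresh DataMetadata and generate.
outputs_metadata = metadata_base.DataMetadata().generate(outputs)
```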

Enhancements

* Primitives now get a `temporary_directory` constructor argument pointing
to a directory they can use to store any files for the duration of the current pipeline
run phase. The main intent of this temporary directory is to store files referenced
by any `Dataset` object your primitive might create, so that follow-up primitives in
the pipeline have access to them. To support configuring the location of these
temporary directories, the reference runtime now has a `--scratch` command line argument
and a corresponding `scratch_dir` constructor argument.
[306](https://gitlab.com/datadrivendiscovery/d3m/issues/306)
[!190](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/190)
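
A hedged sketch of how a primitive might use this directory (the helper and file name are hypothetical; in a real primitive, `temporary_directory` is provided by the runtime through the constructor):

```python
import os

def write_shared_file(temporary_directory: str, contents: str) -> str:
    # Store a file in the runtime-provided temporary directory so that a Dataset
    # created by this primitive can reference it and follow-up primitives can read it.
    path = os.path.join(temporary_directory, 'resource.csv')
    with open(path, 'w', encoding='utf-8') as csv_file:
        csv_file.write(contents)
    return path
```
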
* Made sure that the number of inputs provided to the runtime matches the number of inputs the pipeline accepts.
[301](https://gitlab.com/datadrivendiscovery/d3m/issues/301)
[!183](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/183)
* Added support for MIT-LL dataset and problem schemas version 3.3.0. All suggested targets and suggested privileged data
columns are now by default also attributes. The runtime makes sure that if any column is marked as a problem description's
target, it is no longer marked as an attribute.
[291](https://gitlab.com/datadrivendiscovery/d3m/issues/291)
[!182](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/182)
**Backwards incompatible.**
* `steps` and `method_calls` were made optional in the pipeline run schema to allow easier recording of failed pipelines.
[!167](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/167)
* Pipeline run now also records start and end timestamps of pipelines and steps.
[258](https://gitlab.com/datadrivendiscovery/d3m/issues/258)
[!162](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/162)
* `Metadata` has two new methods to query metadata, `query_field` and `query_field_with_exceptions`,
which you can use when you want to query just a field of metadata, and not the whole metadata object.
Similarly, `DataMetadata` has a new method `query_column_field`.
* `DataMetadata`'s `generate` method now has a `compact` argument to control whether automatically
generated metadata is compacted (if all elements of a dimension have equal metadata, it is
compacted into an `ALL_ELEMENTS` selector segment) or not (the default).
There is also a `compact` method available on `Metadata` to compact metadata on demand.
* Automatically generated metadata is no longer automatically compacted by default
(compacting moves metadata which is equal across all elements of a dimension into the
`ALL_ELEMENTS` selector segment).
* The `generate_metadata` argument of container types' constructors has been switched
from a default of `True` to a default of `False` to prevent unnecessary and unexpected
generation of metadata, which slowed down execution of primitives. Moreover,
`DataMetadata` now has a `generate` method which can be used to explicitly
generate and update metadata given a data value.
The metadata methods `set_for_value` and `clear` have been deprecated and can
generally be replaced with a `generate` call, with creating a new metadata
object, or by removing the call altogether.
**Backwards incompatible.**
[143](https://gitlab.com/datadrivendiscovery/d3m/issues/143)
[!180](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/180)
* Loading of datasets with many files has been heavily optimized.
[164](https://gitlab.com/datadrivendiscovery/d3m/issues/164)
[!136](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/136)
[!178](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/178)
[!179](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/179)
* Extended container's `DataFrame.to_csv` method to use by default
metadata column names for the CSV header instead of the column names of the
`DataFrame` itself.
[!158](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/158)
* Problem parsing has been refactored into an extendable system, similar to how
dataset parsing is done. A simple `d3m.metadata.problem.Problem` class has
been defined to contain a problem description. The default implementation supports
loading of D3M problems. The `--problem` command line argument to the reference runtime
can now be a path or URI to a problem description.
[276](https://gitlab.com/datadrivendiscovery/d3m/issues/276)
[!145](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/145)
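
A hedged usage sketch, assuming a `load` class method analogous to `Dataset.load` (the path is hypothetical):

```python
from d3m.metadata import problem as problem_module

# Load a D3M problem description from a path or URI.
problem_description = problem_module.Problem.load('file:///datasets/problem/problemDoc.json')
```
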
* Data metadata is no longer validated at every update, but only when explicitly
validated using the `check` method. This improves metadata performance.
[!144](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/144)

Other

* Top-level runtime functions now return `Result` (or the new `MultiResult`)
objects instead of raising the special `PipelineRunError` exception (which has been
removed) and instead of returning just the pipeline run (which is now available
inside `Result`).
[297](https://gitlab.com/datadrivendiscovery/d3m/issues/297)
[!192](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/192)
**Backwards incompatible.**
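
A hedged sketch of the new calling convention (the exact signatures of the top-level runtime functions are omitted here, and `pipeline_run` as the attribute name is an assumption):

```python
from d3m import runtime

# Hypothetical call shape; d3m.runtime.fit takes a pipeline, inputs, and more.
fitted_runtime, outputs, fit_result = runtime.fit(...)

fit_result.check_success()  # raises on failure instead of a PipelineRunError being thrown
pipeline_run = fit_result.pipeline_run  # the pipeline run now lives inside the Result
```
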
* Metrics have been reimplemented to operate on the whole predictions DataFrame.
[304](https://gitlab.com/datadrivendiscovery/d3m/issues/304)
[311](https://gitlab.com/datadrivendiscovery/d3m/issues/311)
[!171](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/171)
**Backwards incompatible.**
* Pipeline run implementation has been refactored to be in a single class to
facilitate easier subclassing.
[255](https://gitlab.com/datadrivendiscovery/d3m/issues/255)
[305](https://gitlab.com/datadrivendiscovery/d3m/issues/305)
[!164](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/164)
* Added new semantic types:
* `https://metadata.datadrivendiscovery.org/types/PrimaryMultiKey`
* `https://metadata.datadrivendiscovery.org/types/BoundingPolygon`
* `https://metadata.datadrivendiscovery.org/types/UnknownType`
* Removed semantic types:
* `https://metadata.datadrivendiscovery.org/types/BoundingBox`
* `https://metadata.datadrivendiscovery.org/types/BoundingBoxXMin`
* `https://metadata.datadrivendiscovery.org/types/BoundingBoxYMin`
* `https://metadata.datadrivendiscovery.org/types/BoundingBoxXMax`
* `https://metadata.datadrivendiscovery.org/types/BoundingBoxYMax`

**Backwards incompatible.**
* Added to `primitive_family`:
* `SEMISUPERVISED_CLASSIFICATION`
* `SEMISUPERVISED_REGRESSION`
* `VERTEX_CLASSIFICATION`
* Added to `task_type`:
* `SEMISUPERVISED_CLASSIFICATION`
* `SEMISUPERVISED_REGRESSION`
* `VERTEX_CLASSIFICATION`
* Added to `performance_metric`:
* `HAMMING_LOSS`
* Removed from `performance_metric`:
* `ROOT_MEAN_SQUARED_ERROR_AVG`

**Backwards incompatible.**
* Added `https://metadata.datadrivendiscovery.org/types/GPUResourcesUseParameter` and
`https://metadata.datadrivendiscovery.org/types/CPUResourcesUseParameter` semantic types for
primitive hyper-parameters which control the use of GPUs and CPUs (cores), respectively.
You can use these semantic types to mark which hyper-parameter defines a range of how many
GPUs or CPUs (cores), respectively, a primitive can and should use.
[39](https://gitlab.com/datadrivendiscovery/d3m/issues/39)
[!177](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/177)
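
A hedged sketch of marking such a hyper-parameter (the class and bounds are hypothetical; the parent `ResourcesUseParameter` semantic type is included as well, since hyper-parameters must carry at least one of the four base parameter semantic types):

```python
from d3m.metadata import hyperparams

class Hyperparams(hyperparams.Hyperparams):
    n_jobs = hyperparams.UniformInt(
        lower=1,
        upper=65,
        default=1,
        semantic_types=[
            'https://metadata.datadrivendiscovery.org/types/ResourcesUseParameter',
            'https://metadata.datadrivendiscovery.org/types/CPUResourcesUseParameter',
        ],
        description='How many CPU cores the primitive can and should use.',
    )
```
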
* Added `get_hyperparams` and `get_volumes` helper methods to `PrimitiveMetadata`
so that it is easier to obtain, e.g., the hyper-parameters definitions class of a primitive.
[163](https://gitlab.com/datadrivendiscovery/d3m/issues/163)
[!175](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/175)
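
For example (the primitive path is the one used as an example elsewhere in these notes):

```python
from d3m import index

primitive_class = index.get_primitive('d3m.primitives.classification.random_forest.SKLearn')

# Obtain the hyper-parameters definitions class and construct the primitive with defaults.
hyperparams_class = primitive_class.metadata.get_hyperparams()
primitive = primitive_class(hyperparams=hyperparams_class.defaults())
```
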
* Pipeline run schema now records the global seed used by the runtime to run the pipeline.
[!187](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/187)
* Core package scores output now also includes a random seed column.
[299](https://gitlab.com/datadrivendiscovery/d3m/issues/299)
[!185](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/185)
* Metrics in the core package now take whole predictions DataFrame
objects as input and compute scores over them. The `applicability_to_targets` metric
method has been removed, along with the code which handled the list of target
columns a metric used to compute the score. This is not needed anymore
because now all columns are always used by all metrics. Moreover,
the corresponding `dataset_id` and `targets` fields have been removed from
the pipeline run schema.
* The core package now requires pip 19 or later to be installed.
The `--process-dependency-links` argument when installing the package is not needed
nor supported anymore.
Primitives should not require the use of `--process-dependency-links` to install
them either. Instead, use link dependencies as described in
[PEP 508](https://www.python.org/dev/peps/pep-0508/).
[285](https://gitlab.com/datadrivendiscovery/d3m/issues/285)
[!176](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/176)
**Backwards incompatible.**
* The `outputs` field in the parsed problem description has been removed.
[290](https://gitlab.com/datadrivendiscovery/d3m/issues/290)
[!174](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/174)
**Backwards incompatible.**
* `Hyperparameter`'s `value_to_json` and `value_from_json` methods have been
renamed to `value_to_json_structure` and `value_from_json_structure`, respectively.
[122](https://gitlab.com/datadrivendiscovery/d3m/issues/122)
[!173](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/173)
* Moved utility functions from common primitives package to core package:

* `copy_metadata` to `Metadata.copy_to` method
* `select_columns` to `DataFrame.select_columns` method
* `select_columns_metadata` to `DataMetadata.select_columns` method
* `list_columns_with_semantic_types` to `DataMetadata.list_columns_with_semantic_types` method
* `list_columns_with_structural_types` to `DataMetadata.list_columns_with_structural_types` method
* `remove_columns` to `DataFrame.remove_columns` method
* `remove_columns_metadata` to `DataMetadata.remove_columns` method
* `append_columns` to `DataFrame.append_columns` method
* `append_columns_metadata` to `DataMetadata.append_columns` method
* `insert_columns` to `DataFrame.insert_columns` method
* `insert_columns_metadata` to `DataMetadata.insert_columns` method
* `replace_columns` to `DataFrame.replace_columns` method
* `replace_columns_metadata` to `DataMetadata.replace_columns` method
* `get_index_columns` to `DataMetadata.get_index_columns` method
* `horizontal_concat` to `DataFrame.horizontal_concat` method
* `horizontal_concat_metadata` to `DataMetadata.horizontal_concat` method
* `get_columns_to_use` to `d3m.base.utils.get_columns_to_use` function
* `combine_columns` to `d3m.base.utils.combine_columns` function
* `combine_columns_metadata` to `d3m.base.utils.combine_columns_metadata` function
* `set_table_metadata` to `DataMetadata.set_table_metadata` method
* `get_column_index_from_column_name` to `DataMetadata.get_column_index_from_column_name` method
* `build_relation_graph` to `Dataset.get_relations_graph` method
* `get_tabular_resource` to `d3m.base.utils.get_tabular_resource` function
* `get_tabular_resource_metadata` to `d3m.base.utils.get_tabular_resource_metadata` function
* `cut_dataset` to `Dataset.select_rows` method

[148](https://gitlab.com/datadrivendiscovery/d3m/issues/148)
[!172](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/172)
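
A short sketch of calling some of these in their new locations (the column data is hypothetical):

```python
from d3m import container
from d3m.base import utils as base_utils

inputs = container.DataFrame({'a': [1, 2], 'b': [3, 4]}, generate_metadata=True)

# Previously a utility function in the common primitives package, now a method:
selected = inputs.select_columns([0])

# Previously `get_tabular_resource`, now a function in d3m.base.utils
# (here applied to a loaded Dataset; `dataset` is hypothetical):
# resource_id, resource = base_utils.get_tabular_resource(dataset, None)
```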

* Updated core dependencies. Some important packages are now at versions:
* `pyarrow`: 0.12.1

[!156](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/156)

2019.2.18

Bugfixes

* JSON schema for problem descriptions has been fixed to allow loading
D3M problem descriptions with data augmentation fields.
[284](https://gitlab.com/datadrivendiscovery/d3m/issues/284)
[!154](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/154)
* Utils now contains representers to encode numpy float and integer numbers
for YAML. Importing `utils` registers them.
[275](https://gitlab.com/datadrivendiscovery/d3m/issues/275)
[!148](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/148)
* Made sure all JSON files are read with UTF-8 encoding, so that we do
not depend on the encoding of the environment.
[!150](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/150)
[!153](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/153)

2019.2.12

Enhancements

* Runtime now makes sure that target columns are never marked as attributes.
[265](https://gitlab.com/datadrivendiscovery/d3m/issues/265)
[!131](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/131)
* When using the runtime CLI, pipeline run output is produced even in the case of an
exception. Moreover, the exception thrown from `Result.check_success` contains
associated pipeline runs in its `pipeline_runs` attribute.
[245](https://gitlab.com/datadrivendiscovery/d3m/issues/245)
[!120](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/120)
* Made additional relaxations when reading D3M datasets and problem descriptions
so that fields which have defaults are no longer required.
[!128](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/128)
* When loading D3M datasets and problem descriptions, the package now just warns
if they have an unsupported schema version and continues to load them.
[247](https://gitlab.com/datadrivendiscovery/d3m/issues/247)
[!119](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/119)
* Added to `primitive_family`:

* `NATURAL_LANGUAGE_PROCESSING`

[!125](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/125)

Bugfixes

* Fixed an unexpected exception when running a pipeline with the reference
runtime without requesting output values to be returned.
[260](https://gitlab.com/datadrivendiscovery/d3m/issues/260)
[!127](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/127)
* Fixed an infinite recursion loop which happened if Python logging was
configured inside a primitive's method call. Moreover, recording of
logging records for the pipeline run was changed so that it does not modify
the record itself while recording it.
[250](https://gitlab.com/datadrivendiscovery/d3m/issues/250)
[!123](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/123)
* Correctly populate the `volumes` primitive constructor argument.
Before, it was not really possible to use primitive static files with the
reference runtime.
[!132](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/132)
* Fixed runtime/pipeline run configuration through environment variables.
They are now read without throwing an exception.
[274](https://gitlab.com/datadrivendiscovery/d3m/issues/274)
[!118](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/118)
[!137](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/137)

2019.1.21

* Some enumeration classes were moved and renamed:
* `d3m.metadata.pipeline.ArgumentType` to `d3m.metadata.base.ArgumentType`
* `d3m.metadata.pipeline.PipelineContext` to `d3m.metadata.base.Context`
* `d3m.metadata.pipeline.PipelineStep` to `d3m.metadata.base.PipelineStepType`

**Backwards incompatible.**

* Added a `pipeline_run.json` JSON schema which describes the results of running a
pipeline as described by the `pipeline.json` JSON schema. Also implemented
reference pipeline run output for the reference runtime.
[165](https://gitlab.com/datadrivendiscovery/d3m/issues/165)
[!59](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/59)
* When computing primitive digests, the primitive's ID is included in the
hash so that the digest is not the same for all primitives from the same
package.
[154](https://gitlab.com/datadrivendiscovery/d3m/issues/154)
* When datasets are loaded, a digest of their metadata and data can be
computed. To control when this is done, the `compute_digest` argument
to `Dataset.load` can now take the following `ComputeDigest`
enumeration values: `ALWAYS`, `ONLY_IF_MISSING` (the default), and `NEVER`.
[!75](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/75)
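
For example (the dataset URI is hypothetical; the import location of `ComputeDigest` is assumed to be alongside `Dataset`):

```python
from d3m.container.dataset import ComputeDigest, Dataset

dataset = Dataset.load(
    'file:///datasets/seed/datasetDoc.json',
    compute_digest=ComputeDigest.ONLY_IF_MISSING,  # the default; ALWAYS and NEVER also exist
)
```
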
* Added a `digest` field to pipeline descriptions. The digest is computed based
on the pipeline document and helps differentiate between pipelines
with the same `id`. When loading a pipeline, a warning is issued if there
is a digest mismatch. You can use the
`strict_digest` argument to request an exception instead.
[190](https://gitlab.com/datadrivendiscovery/d3m/issues/190)
[!75](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/75)
* Added a `digest` field to problem description metadata.
This `digest` field is computed based on the problem description document
and helps differentiate between problem descriptions with the same `id`.
[190](https://gitlab.com/datadrivendiscovery/d3m/issues/190)
[!75](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/75)
* Moved the `id`, `version`, `name`, `other_names`, and `description` fields
in the problem schema to the top level of the problem description. Moreover, made
`id` required. This aligns it more with the structure of other descriptions we have.
[!75](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/75)
**Backwards incompatible.**
* Pipelines can now provide multiple inputs to the same primitive argument.
In such a case the runtime wraps those inputs into a `List` container type and then
passes the list to the primitive.
[200](https://gitlab.com/datadrivendiscovery/d3m/issues/200)
[!112](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/112)
* Primitives now have a `fit_multi_produce` method which a primitive author can
override to implement an optimized version of both fitting and producing a primitive on the same data.
The default implementation just calls `set_training_data`, `fit`, and produce methods.
If your primitive has non-standard additional arguments in its `produce` method(s), then you
will have to implement the `fit_multi_produce` method to accept those additional arguments
as well, similarly to what you have had to do for `multi_produce`.
[117](https://gitlab.com/datadrivendiscovery/d3m/issues/117)
[!110](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/110)
**Could be backwards incompatible.**
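
A hedged sketch of what an override might look like, mirroring the described default behavior (signatures are simplified here; real d3m methods use keyword-only arguments and wrap results into `CallResult`/`MultiCallResult` objects):

```python
def fit_multi_produce(self, *, produce_methods, inputs, outputs, timeout=None, iterations=None):
    # Default behavior as described above: set training data, fit, then call
    # each requested produce method on the same inputs.
    self.set_training_data(inputs=inputs, outputs=outputs)
    self.fit(timeout=timeout, iterations=iterations)

    values = {}
    for method_name in produce_methods:
        values[method_name] = getattr(self, method_name)(inputs=inputs).value

    return values  # a real implementation wraps this into a MultiCallResult
```
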
* The `source`, `timestamp`, and `check` arguments to all metadata functions and container types'
constructors have been deprecated. You do not have to, and should not, provide them anymore.
[171](https://gitlab.com/datadrivendiscovery/d3m/issues/171)
[172](https://gitlab.com/datadrivendiscovery/d3m/issues/172)
[173](https://gitlab.com/datadrivendiscovery/d3m/issues/173)
[!108](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/108)
[!109](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/109)
* A primitive's constructor is no longer run during importing of the primitive's class,
which allows one to use the constructor to load things and do any resource
allocation/reservation. The constructor is now the preferred place to do so.
[158](https://gitlab.com/datadrivendiscovery/d3m/issues/158)
[!107](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/107)
* `foreign_key` metadata has been extended with a `RESOURCE` type which allows
referencing another resource in the same dataset.
[221](https://gitlab.com/datadrivendiscovery/d3m/issues/221)
[!105](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/105)
* Updated supported D3M dataset and problem schemas to version 3.2.0.
Problem description parsing supports data augmentation metadata.
A new approach for LUPI datasets and problems is now supported,
including runtime support.
Moreover, if a dataset's resource name is `learningData`, it is marked as the
dataset entry point.
[229](https://gitlab.com/datadrivendiscovery/d3m/issues/229)
[225](https://gitlab.com/datadrivendiscovery/d3m/issues/225)
[226](https://gitlab.com/datadrivendiscovery/d3m/issues/226)
[!97](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/97)
* Added support for "raw" datasets.
[217](https://gitlab.com/datadrivendiscovery/d3m/issues/217)
[!94](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/94)
* A warning is issued if a primitive does not provide a description through
its docstring.
[167](https://gitlab.com/datadrivendiscovery/d3m/issues/167)
[!101](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/101)
* A warning is now issued if an installable primitive is lacking contact or bug
tracker URI metadata.
[178](https://gitlab.com/datadrivendiscovery/d3m/issues/178)
[!81](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/81)
* The `Pipeline` class now also has `equals` and `hash` methods which can help
determine whether two pipelines are equal in the sense of isomorphism.
[!53](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/53)
* `Pipeline` and pipeline step classes now have a `get_all_hyperparams`
method to return all hyper-parameters defined for a pipeline and its steps.
[222](https://gitlab.com/datadrivendiscovery/d3m/issues/222)
[!104](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/104)
* Implemented a check for primitive Python paths to assure that they adhere
to the new standard requiring all of them to be of the form `d3m.primitives.primitive_family.primitive_name.kind`
(e.g., `d3m.primitives.classification.random_forest.SKLearn`).
Currently a warning is issued if a primitive has a different Python path;
after January 2019 it will be an error.
For the `primitive_name` segment there is a [`primitive_names.py`](./d3m/metadata/primitive_names.py)
file containing a list of all allowed primitive names.
Everyone is encouraged to help curate this list and suggest improvements (merging, removals, additions)
to the values in that list. The initial version was mostly automatically generated from an existing list of
values used by current primitives.
[3](https://gitlab.com/datadrivendiscovery/d3m/issues/3)
[!67](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/67)
* Added to semantic types:
* `https://metadata.datadrivendiscovery.org/types/TokenizableIntoNumericAndAlphaTokens`
* `https://metadata.datadrivendiscovery.org/types/TokenizableByPunctuation`
* `https://metadata.datadrivendiscovery.org/types/AmericanPhoneNumber`
* `https://metadata.datadrivendiscovery.org/types/UnspecifiedStructure`
* `http://schema.org/email`
* `http://schema.org/URL`
* `http://schema.org/address`
* `http://schema.org/State`
* `http://schema.org/City`
* `http://schema.org/Country`
* `http://schema.org/addressCountry`
* `http://schema.org/postalCode`
* `http://schema.org/latitude`
* `http://schema.org/longitude`

[!62](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/62)
[!95](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/95)
[!94](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/94)

* Updated core dependencies. Some important packages are now at versions:
* `scikit-learn`: 0.20.2
* `numpy`: 1.15.4
* `pandas`: 0.23.4
* `networkx`: 2.2
* `pyarrow`: 0.11.1

[106](https://gitlab.com/datadrivendiscovery/d3m/issues/106)
[175](https://gitlab.com/datadrivendiscovery/d3m/issues/175)

* Added to `algorithm_types`:
* `IDENTITY_FUNCTION`
* `DATA_SPLITTING`
* `BREADTH_FIRST_SEARCH`
* Moved a major part of the README to Sphinx documentation, which is built
and available at [http://docs.datadrivendiscovery.org/](http://docs.datadrivendiscovery.org/).
* Added a `produce_methods` argument to the `Primitive` hyper-parameter class
which allows one to limit matching primitives to only those providing all
of the listed produce methods.
[124](https://gitlab.com/datadrivendiscovery/d3m/issues/124)
[!56](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/56)
* Fixed `sample_multiple` method of the `Hyperparameter` class.
[157](https://gitlab.com/datadrivendiscovery/d3m/issues/157)
[!50](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/50)
* Fixed pickling of `Choice` hyper-parameter.
[!49](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/49)
[!51](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/51)
* Added `Constant` hyper-parameter class.
[186](https://gitlab.com/datadrivendiscovery/d3m/issues/186)
[!90](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/90)
* Added `count` to aggregate values in metafeatures.
[!52](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/52)
* Clarified and generalized some metafeatures, mostly renaming them so that they can be
used on attributes as well:
* `number_of_classes` to `number_distinct_values`
* `class_entropy` to `entropy_of_values`
* `majority_class_ratio` to `value_probabilities_aggregate.max`
* `minority_class_ratio` to `value_probabilities_aggregate.min`
* `majority_class_size` to `value_counts_aggregate.max`
* `minority_class_size` to `value_counts_aggregate.min`
* `class_probabilities` to `value_probabilities_aggregate`
* `target_values` to `values_aggregate`
* `means_of_attributes` to `mean_of_attributes`
* `standard_deviations_of_attributes` to `standard_deviation_of_attributes`
* `categorical_joint_entropy` to `joint_entropy_of_categorical_attributes`
* `numeric_joint_entropy` to `joint_entropy_of_numeric_attributes`
* `pearson_correlation_of_attributes` to `pearson_correlation_of_numeric_attributes`
* `spearman_correlation_of_attributes` to `spearman_correlation_of_numeric_attributes`
* `canonical_correlation` to `canonical_correlation_of_numeric_attributes`

[!52](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/52)

* Added metafeatures:
* `default_accuracy`
* `oner`
* `jrip`
* `naive_bayes_tree`
* `number_of_string_attributes`
* `ratio_of_string_attributes`
* `number_of_other_attributes`
* `ratio_of_other_attributes`
* `attribute_counts_by_structural_type`
* `attribute_ratios_by_structural_type`
* `attribute_counts_by_semantic_type`
* `attribute_ratios_by_semantic_type`
* `value_counts_aggregate`
* `number_distinct_values_of_discrete_attributes`
* `entropy_of_discrete_attributes`
* `joint_entropy_of_discrete_attributes`
* `joint_entropy_of_attributes`
* `mutual_information_of_discrete_attributes`
* `equivalent_number_of_discrete_attributes`
* `discrete_noise_to_signal_ratio`

[!21](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/21)
[!52](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/52)

* Added special handling when reading scoring D3M datasets (those with true targets in a separate
file `targets.csv`). When such a dataset is detected, the values from the separate file are
merged into the dataset, and its ID is changed to end with a `SCORE` suffix. Similarly, the
ID of a scoring problem description gets its suffix changed to `SCORE`.
[176](https://gitlab.com/datadrivendiscovery/d3m/issues/176)
* Organized semantic types and added parent semantic types to some of them to organize/structure
them better. New parent semantic types added: `https://metadata.datadrivendiscovery.org/types/ColumnRole`,
`https://metadata.datadrivendiscovery.org/types/DimensionType`, `https://metadata.datadrivendiscovery.org/types/HyperParameter`.
* Fixed the mapping so that the `dateTime` column type maps to the `http://schema.org/DateTime` semantic
type and not `https://metadata.datadrivendiscovery.org/types/Time`.
**Backwards incompatible.**
* Updated the generated [site for metadata](https://metadata.datadrivendiscovery.org/) and
generated sites describing semantic types.
[33](https://gitlab.com/datadrivendiscovery/d3m/issues/33)
[114](https://gitlab.com/datadrivendiscovery/d3m/issues/114)
[!37](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/37)
* Optimized resolving of primitives in `Resolver` so that, in the common case, loading
a pipeline does not require loading all primitives.
[162](https://gitlab.com/datadrivendiscovery/d3m/issues/162)
[!38](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/38)
* Added `NotFoundError`, `AlreadyExistsError`, and `PermissionDeniedError`
exceptions to `d3m.exceptions`.
* `Pipeline`'s `to_json_structure`, `to_json`, and `to_yaml` methods now have a `nest_subpipelines`
argument which allows conversion with nested sub-pipelines instead of them
being only referenced.
* Made sure that Arrow serialization of metadata does not also pickle linked
values (`for_value`).
* Made sure enumerations are picklable.
* The `PerformanceMetric` class now has `best_value` and `worst_value`, which
return the range of possible values for the metric. Moreover, a `normalize`
method normalizes the metric's value to a range between 0 and 1.
* D3M dataset qualities are now loaded only after data is loaded. This fixes
lazy loading of datasets with qualities, which was broken before.
* Added `load_all_primitives` argument to the default pipeline `Resolver`
which allows one to control loading of primitives outside of the resolver.
* Added `primitives_blacklist` argument to the default pipeline `Resolver`
which allows one to specify a collection of primitive path prefixes to not
(try to) load.
* Fixed return value of the `fit` method in `TransformerPrimitiveBase`.
It now correctly returns `CallResult` instead of `None`.
* Fixed a typo and renamed `get_primitive_hyparparams` to `get_primitive_hyperparams`
in `PrimitiveStep`.
**Backwards incompatible.**
* Additional methods were added to the `Pipeline` class and step classes,
to support runtime and easier manipulation of pipelines programmatically
(`get_free_hyperparams`, `get_input_data_references`, `has_placeholder`,
`replace_step`, `get_exposable_outputs`).
* Added reference implementation of the runtime. It is available
in the `d3m.runtime` module. This module also has an extensive
command line interface you can access through `python3 -m d3m.runtime`.
[115](https://gitlab.com/datadrivendiscovery/d3m/issues/115)
[!57](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/57)
[!72](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/72)
* The `GeneratorPrimitiveBase` interface has been changed so that the `produce` method
accepts a list of non-negative integers as input instead of a list of `None` values.
This allows batching, and lets the caller control which outputs to generate.
Previously, outputs depended on the number of calls to `produce` and the number of outputs
requested in each call. Now these integers serve as an index into the set of potential
outputs.
**Backwards incompatible.**
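
A hedged sketch of the new calling convention (`generator_primitive` itself is hypothetical):

```python
from d3m import container

# Ask the generator for the outputs at indices 0, 1, 2, and 5 of its potential outputs.
outputs = generator_primitive.produce(inputs=container.List([0, 1, 2, 5])).value
```
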
* We now try to preserve the metadata log in the default implementation of `can_accept`.
* Added `sample_rate` field to `dimension` metadata.
* `python3 -m d3m.index download` command now accepts `--prefix` argument to limit the
primitives for which static files are downloaded. Useful for testing.
* Added a `check` argument to `DataMetadata`'s `update` and `remove` methods which allows
one to control whether the selector is checked against `for_value`. When
it is known that the selector is valid, skipping the check can speed up those methods.
* Defined a metadata field `file_columns` which allows storing known columns metadata for
tables referenced from columns. This is now used by the D3M dataset reader to store known
columns metadata for collections of CSV files. Previously, this metadata was lost despite
being available in the Lincoln Labs dataset metadata.

2018.7.10

* Made sure that the `OBJECT_DETECTION_AVERAGE_PRECISION` metric supports operation on
a vectorized target column.
[149](https://gitlab.com/datadrivendiscovery/d3m/issues/149)
* Files in D3M dataset collections are now listed recursively to support datasets
with files split across directories.
[146](https://gitlab.com/datadrivendiscovery/d3m/issues/146)
* When a parameter value for `Params` fails to type check, the name of the parameter is now
reported as well.
[135](https://gitlab.com/datadrivendiscovery/d3m/issues/135)
* `python3 -m d3m.index` now has an additional command, `download`, which downloads all static
files needed by available primitives. Those files are then exposed to primitives through the `volumes`
constructor argument by the TA2/runtime. Files are stored into an output
directory in a standard way, where each volume is stored under a file or directory name
based on its digest.
[102](https://gitlab.com/datadrivendiscovery/d3m/issues/102)
* Fixed standard return type of `log_likelihoods`, `log_likelihood`, `losses`, and `loss`
primitive methods to support multi-target primitives.
* Clarified that `can_accept` receives primitive arguments and not just method arguments.
* Added `https://metadata.datadrivendiscovery.org/types/FilesCollection` for resources which are
file collections. Also moved the main semantic type of file collection's values to the column.
* Fixed conversion of a simple list to a DataFrame.
* Added the `https://metadata.datadrivendiscovery.org/types/Confidence` semantic type for columns
representing confidence, and `confidence_for` metadata which can help a confidence column refer
to the target column for which it holds confidence.
* Fixed default `can_accept` implementation to return type unwrapped from `CallResult`.
* Fixed `DataMetadata.remove` to preserve `for_value` value (and allow it to be set through the call).
* Fixed a case where automatically generated metadata overrode explicitly set existing metadata.
[!25](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/25)
* Fixed `query_with_exceptions` metadata method to correctly return exceptions for
deeper selectors.
* Added to `primitive_family`:
* `SCHEMA_DISCOVERY`
* `DATA_AUGMENTATION`
* Added to `algorithm_types`:
* `HEURISTIC`
* `MARKOV_RANDOM_FIELD`
* `LEARNING_USING_PRIVILEGED_INFORMATION`
* `APPROXIMATE_DATA_AUGMENTATION`
* Added `PrimitiveNotFittedError`, `DimensionalityMismatchError`, and `MissingValueError`
exceptions to `d3m.exceptions`.
[!22](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/22)
* Fixed setting semantic types for boundary columns.
[126](https://gitlab.com/datadrivendiscovery/d3m/issues/126) [!23](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/23)
* Added `video/avi` media type to lists of known media types.
* Fixed a type check which prevented an additional primitive argument to be of `Union` type.
* Fixed erroneous removal of empty dicts (`{}`) from metadata when empty dicts were
explicitly stored in metadata.
[118](https://gitlab.com/datadrivendiscovery/d3m/issues/118)
* Made sure that conflicting entry points are resolved in a deterministic way.
* Made sure primitive metadata's `python_path` matches the path under which
a primitive is registered under `d3m.primitives`. This also prevents
a primitive from being registered twice at different paths in the namespace.
[4](https://gitlab.com/datadrivendiscovery/d3m/issues/4)
* Fixed a bug which prevented registration of primitives at deeper levels
(e.g., `d3m.primitives.<name1>.<name2>.<primitive>`).
[121](https://gitlab.com/datadrivendiscovery/d3m/issues/121)

2018.6.5

* The `Metadata` class got additional methods to manipulate metadata:
* `remove(selector)` removes metadata at `selector`.
* `query_with_exceptions(selector)` returns metadata for selectors which
have metadata which differs from that of `ALL_ELEMENTS`.
* `add_semantic_type`, `has_semantic_type`, `remove_semantic_type`, and
`get_elements_with_semantic_type` help with semantic types.
* `query_column`, `update_column`, `remove_column`, and `get_columns_with_semantic_type`
make it easier to work with tabular data.

[55](https://gitlab.com/datadrivendiscovery/d3m/issues/55)
[78](https://gitlab.com/datadrivendiscovery/d3m/issues/78)
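
A short sketch of the tabular helpers (the column data is hypothetical):

```python
from d3m import container
from d3m.metadata import base as metadata_base

inputs = container.DataFrame({'amount': [1.0, 2.5]}, generate_metadata=True)

# Update and query metadata of the first column.
inputs.metadata = inputs.metadata.update_column(0, {'name': 'amount_usd'})
column_metadata = inputs.metadata.query_column(0)

# Work with semantic types on the first column.
inputs.metadata = inputs.metadata.add_semantic_type(
    (metadata_base.ALL_ELEMENTS, 0),
    'https://metadata.datadrivendiscovery.org/types/Attribute',
)
```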

* Container `List` now inherits from a regular Python `list` and not from `typing.List`.
It no longer has a type variable. Typing information is stored in `metadata`
anyway (`structural_type`). This simplifies type checking (and improves performance)
and fixes pickling issues.
**Backwards incompatible.**
* The `Hyperparams` class' `defaults` method now accepts an optional `path` argument which
allows one to fetch defaults from nested hyper-parameters.
* The `Hyperparameters` class and its subclasses now have a `get_default` method instead
of a `default` property.
**Backwards incompatible.**
* The `Hyperparams` class got a new method, `replace`, which makes it easier to modify
hyper-parameter values.
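
A short sketch of the `defaults` and `replace` methods together (`MyHyperparams` and the hyper-parameter names are hypothetical, as is the dotted-path convention for nested hyper-parameters):

```python
defaults = MyHyperparams.defaults()

# Fetch the default of a nested hyper-parameter by path.
nested_default = MyHyperparams.defaults(path='base_estimator.max_depth')

# Create a modified copy of the defaults.
custom = defaults.replace({'learning_rate': 0.01})
```
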
* The `Set` hyper-parameter can now also accept a hyper-parameters configuration as elements,
which allows one to define a set of multiple hyper-parameters per set element.
[94](https://gitlab.com/datadrivendiscovery/d3m/issues/94)
* Pipeline's `check` method now checks structural types of inputs and outputs and assures
they match.
[!19](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/19)
* The `Set` hyper-parameter now uses a tuple of unique elements instead of a set to represent the set.
This assures that the order of elements is preserved, to help with reproducibility when
iterating over a set.
**Backwards incompatible.**
[109](https://gitlab.com/datadrivendiscovery/d3m/issues/109)
* The `Set` hyper-parameter can now be defined without the `max_samples` argument to allow a set
without an upper limit on the number of elements.
As a consequence, the `min_samples` and `max_samples` arguments to the `Set` constructor have
been swapped, to have a more intuitive order.
Similar changes have been made to the `sample_multiple` method of hyper-parameters.
**Backwards incompatible.**
[110](https://gitlab.com/datadrivendiscovery/d3m/issues/110)
* Core dependencies have been upgraded: `numpy==1.14.3`. `pytypes` is now a released version.
* When converting a numpy array with more than 2 dimensions to a DataFrame, the higher dimensions are
automatically converted to nested numpy arrays inside the DataFrame.
[80](https://gitlab.com/datadrivendiscovery/d3m/issues/80)
* Metadata is now automatically preserved when converting between container types.
[76](https://gitlab.com/datadrivendiscovery/d3m/issues/76)
* Basic metadata for data values is now automatically generated when using D3M container types.
The value is traversed over its structure, and `structural_type` and `dimension` with its `length`
keys are populated. Some `semantic_types` are added in simple cases, and `dimension`'s
`name` as well. In some cases, analyzing all data to generate metadata can take time,
so you might consider disabling automatic generation by setting `generate_metadata`
to `False` in the container's constructor or in `set_for_value` calls, and then manually
populating the necessary metadata.
[35](https://gitlab.com/datadrivendiscovery/d3m/issues/35)
[!6](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/6)
[!11](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/11)
* When reading D3M datasets, `media_types` metadata now includes proper media types
for the column, and also the media type of each particular row (file).
* D3M dataset and problem description parsing has been updated to schema version 3.1.2:
* `Dataset` class now supports loading `edgeList` resources.
* `primitive_family` now includes `OBJECT_DETECTION`.
* `task_type` now includes `OBJECT_DETECTION`.
* `performance_metrics` now includes `PRECISION`, `RECALL`, `OBJECT_DETECTION_AVERAGE_PRECISION`.
* `targets` of a problem description now includes `clusters_number`.
* New metadata `boundary_for` can now describe which other column
a column is a boundary for.
* Support for `realVector`, `json`, and `geojson` column types.
* Support for `boundingBox` column role.
* New semantic types:
* `https://metadata.datadrivendiscovery.org/types/EdgeList`
* `https://metadata.datadrivendiscovery.org/types/FloatVector`
* `https://metadata.datadrivendiscovery.org/types/JSON`
* `https://metadata.datadrivendiscovery.org/types/GeoJSON`
* `https://metadata.datadrivendiscovery.org/types/Interval`
* `https://metadata.datadrivendiscovery.org/types/IntervalStart`
* `https://metadata.datadrivendiscovery.org/types/IntervalEnd`
* `https://metadata.datadrivendiscovery.org/types/BoundingBox`
* `https://metadata.datadrivendiscovery.org/types/BoundingBoxXMin`
* `https://metadata.datadrivendiscovery.org/types/BoundingBoxYMin`
* `https://metadata.datadrivendiscovery.org/types/BoundingBoxXMax`
* `https://metadata.datadrivendiscovery.org/types/BoundingBoxYMax`

[99](https://gitlab.com/datadrivendiscovery/d3m/issues/99)
[107](https://gitlab.com/datadrivendiscovery/d3m/issues/107)

* Unified the naming of attributes/features metafeatures to attributes.
**Backwards incompatible.**
[!13](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/13)
* Unified the naming of categorical/nominal metafeatures to categorical.
**Backwards incompatible.**
[!12](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/12)
* Added more metafeatures:
* `pca`
* `random_tree`
* `decision_stump`
* `naive_bayes`
* `linear_discriminant_analysis`
* `knn_1_neighbor`
* `c45_decision_tree`
* `rep_tree`
* `categorical_joint_entropy`
* `numeric_joint_entropy`
* `number_distinct_values_of_numeric_features`
* `class_probabilities`
* `number_of_features`
* `number_of_instances`
* `canonical_correlation`
* `entropy_of_categorical_features`
* `entropy_of_numeric_features`
* `equivalent_number_of_categorical_features`
* `equivalent_number_of_numeric_features`
* `mutual_information_of_categorical_features`
* `mutual_information_of_numeric_features`
* `categorical_noise_to_signal_ratio`
* `numeric_noise_to_signal_ratio`

[!10](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/10)
[!14](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/14)
[!17](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/17)

* Added metafeatures for present values:
* `number_of_instances_with_present_values`
* `ratio_of_instances_with_present_values`
* `number_of_present_values`
* `ratio_of_present_values`

[84](https://gitlab.com/datadrivendiscovery/d3m/issues/84)

* Implemented interface for saving datasets.
[31](https://gitlab.com/datadrivendiscovery/d3m/issues/31)
* To remove a key in metadata, one should now use the special
`NO_VALUE` value instead of a `None` value.
**Backwards incompatible.**
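
For example (the selector and key are hypothetical; `metadata` is an existing `DataMetadata` instance):

```python
from d3m.metadata import base as metadata_base

# Remove the 'custom_key' field from the metadata of the first column.
metadata = metadata.update(
    (metadata_base.ALL_ELEMENTS, 0),
    {'custom_key': metadata_base.NO_VALUE},
)
```
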
* `None` is now serialized to JSON as `null` instead of string `"None"`.
**Could be backwards incompatible.**
* Unified the naming and behavior of methods dealing with JSON and JSON-related
data. Now across the package:
* `to_json_structure` returns a structure with values fully compatible with JSON and serializable with the default JSON serializer
* `to_simple_structure` returns a structure similar to JSON, but with values left as Python values
* `to_json` returns the value serialized as a JSON string

**Backwards incompatible.**

* Hyper-parameters are now required to specify at least one
semantic type from: `https://metadata.datadrivendiscovery.org/types/TuningParameter`,
`https://metadata.datadrivendiscovery.org/types/ControlParameter`,
`https://metadata.datadrivendiscovery.org/types/ResourcesUseParameter`,
`https://metadata.datadrivendiscovery.org/types/MetafeatureParameter`.
**Backwards incompatible.**
* Made type strings in primitive annotations deterministic.
[93](https://gitlab.com/datadrivendiscovery/d3m/issues/93)
* Reimplemented primitives loading code to load primitives lazily.
[74](https://gitlab.com/datadrivendiscovery/d3m/issues/74)
* The `d3m.index` module now has new and modified functions:
* `search` now returns a list of Python paths of all potential
primitives defined through entry points (but does not load them
or check that the entry points are valid)
* `get_primitive` loads and returns a primitive given its Python path
* `get_primitive_by_id` returns a primitive given its ID, but the primitive
has to be loaded beforehand
* `get_loaded_primitives` returns a list of all currently loaded primitives
* `load_all` tries to load all primitives
* `register_primitive` now accepts a full Python path instead of just a suffix

**Backwards incompatible.**
[74](https://gitlab.com/datadrivendiscovery/d3m/issues/74)
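
For example (the primitive path is the one used as an example elsewhere in these notes):

```python
from d3m import index

python_paths = index.search()  # Python paths from entry points; primitives are not loaded yet
primitive_class = index.get_primitive('d3m.primitives.classification.random_forest.SKLearn')
loaded = index.get_loaded_primitives()  # only primitives loaded so far
```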

* Defined `model_features` primitive metadata to describe features supported
by an underlying model. This is useful for easy matching between
a problem's subtypes and relevant primitives.
[88](https://gitlab.com/datadrivendiscovery/d3m/issues/88)
* Made hyper-parameter space of an existing `Hyperparams` subclass immutable.
[91](https://gitlab.com/datadrivendiscovery/d3m/issues/91)
* The `d3m.index describe` command now accepts a `-s`/`--sort-keys` argument which
makes all keys in the JSON output sorted, making the output JSON easier to
diff and compare.
* `can_accept` now gets a `hyperparams` object with the hyper-parameters under
which to check a method call. This allows `can_accept` to return a result
based on control hyper-parameters.
**Backwards incompatible.**
[81](https://gitlab.com/datadrivendiscovery/d3m/issues/81)
* Documented that all docstrings should be made according to
[numpy docstring format](https://numpydoc.readthedocs.io/en/latest/format.html).
[85](https://gitlab.com/datadrivendiscovery/d3m/issues/85)
* Added to semantic types:
* `https://metadata.datadrivendiscovery.org/types/MissingData`
* `https://metadata.datadrivendiscovery.org/types/InvalidData`
* `https://metadata.datadrivendiscovery.org/types/RedactedTarget`
* `https://metadata.datadrivendiscovery.org/types/RedactedPrivilegedData`
* Added to `primitive_family`:
* `TIME_SERIES_EMBEDDING`
* Added to `algorithm_types`:
* `IVECTOR_EXTRACTION`
* Removed `SparseDataFrame` from standard container types because it is being
deprecated in Pandas.
**Backwards incompatible.**
[95](https://gitlab.com/datadrivendiscovery/d3m/issues/95)
* Defined `other_names` metadata field for any other names a value might have.
* Optimized primitives loading time.
[87](https://gitlab.com/datadrivendiscovery/d3m/issues/87)
* Reduced pickling of values when a hyper-parameter has a `Union` structural type.
[83](https://gitlab.com/datadrivendiscovery/d3m/issues/83)
* `DataMetadata.set_for_value` now first checks new value against the metadata, by default.
**Could be backwards incompatible.**
* Added the `NO_NESTED_VALUES` primitive precondition and effect.
This allows a primitive to specify that it cannot handle values where a container value
contains other values with dimensions nested inside.
