-------------------------
C++ Extension
~~~~~~~~~~~~~
Phi_K contains an optional C++ extension that computes the significance matrix using the `hypergeometric` method
(also called the `Patefield` method).

Note that the wheels distributed on PyPI contain a pre-built extension for Linux, macOS and Windows.
A manual (pip) setup will attempt to build and install the extension; if the build fails, Phi_K is installed without it.
In that case, using the `hypergeometric` method will raise a ``NotImplementedError``.
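
Code that has to run both with and without the extension can catch this error and switch to another method.
The sketch below is only an illustration: ``run_with_fallback`` is a hypothetical helper, not part of Phi_K, and
the fallback name ``"multinominal"`` is an assumption about which alternative method is available.

.. code-block:: python

    def run_with_fallback(compute, preferred="hypergeometric", fallback="multinominal"):
        """Hypothetical helper: try the preferred method, fall back if it is unavailable.

        ``compute`` is any callable that takes a method name and returns a result.
        Returns a tuple of (result, name of the method that was actually used).
        """
        try:
            return compute(preferred), preferred
        except NotImplementedError:
            # Raised when the C++ extension was not built during installation.
            return compute(fallback), fallback

With Phi_K, ``compute`` could be something like ``lambda m: df.significance_matrix(simulation_method=m)``,
assuming ``simulation_method`` is the argument that selects the method in the installed version.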

Compiler requirements through pybind11:

- Clang/LLVM 3.3 or newer (for Apple Xcode's clang, this is 5.0.0 or newer)
- GCC 4.8 or newer
- Microsoft Visual Studio 2015 Update 3 or newer
- Intel classic C++ compiler 18 or newer (ICC 20.2 tested in CI)
- Cygwin/GCC (previously tested on 2.5.1)
- NVCC (CUDA 11.0 tested in CI)
- NVIDIA PGI (20.9 tested in CI)

Other
~~~~~
* You can now manually set the number of parallel jobs used in the evaluation of Phi_K or its statistical significance
  (when using MC simulations). For example, to use 4 parallel jobs:

  .. code-block:: python

    df.phik_matrix(njobs=4)
    df.significance_matrix(njobs=4)

  The default value is -1, in which case all available cores are used. With ``njobs=1``, no parallel processing
  is applied.
* Phi_K can now be calculated with an independent expectation histogram:

  .. code-block:: python

    from phik.phik import phik_from_hist2d

    cols = ["mileage", "car_size"]
    interval_cols = ["mileage"]

    # 2D histograms of the same two columns, one per dataset
    observed = df1[cols].hist2d(interval_cols=interval_cols)
    expected = df2[cols].hist2d(interval_cols=interval_cols)

    phik_value = phik_from_hist2d(observed=observed, expected=expected)

  The expected histogram is assumed to have a (relatively) large number of counts
  compared with the observed histogram.

  You can also compare two (pre-binned) datasets against each other directly. Again, the expected dataset
  is assumed to be relatively large:

  .. code-block:: python

    from phik.phik import phik_observed_vs_expected_from_rebinned_df

    phik_matrix = phik_observed_vs_expected_from_rebinned_df(df1_binned, df2_binned)
* Added links in the README to the basic and advanced Phi_K tutorials on Google Colab.
* Migrated the Spark example Phi_K notebook from popmon to using histogrammar directly for histogram creation.
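
The ``njobs`` convention described above (-1 for all available cores, 1 for no parallel processing) can be
illustrated with a small standalone sketch. ``resolve_njobs`` is a hypothetical helper written for this note;
it is not Phi_K's implementation, and rejecting other non-positive values is an assumption made here.

.. code-block:: python

    import os

    def resolve_njobs(njobs=-1):
        """Hypothetical helper: turn an ``njobs`` setting into a worker count."""
        if njobs == -1:
            # Use every core the OS reports; fall back to 1 if the count is unknown.
            return os.cpu_count() or 1
        if njobs < 1:
            # Assumption: any other non-positive value is treated as an error.
            raise ValueError("njobs must be -1 or a positive integer")
        return njobs

Under these assumptions, ``resolve_njobs(4)`` gives four workers and ``resolve_njobs(1)`` corresponds to
serial execution.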