Pingouin

Latest version: v0.5.5

0.3.7
------------------

**Bugfixes**

This hotfix release brings important changes to the :py:func:`pingouin.pairwise_tukey` and :py:func:`pingouin.pairwise_gameshowell` functions. These two functions were implemented soon after Pingouin's first release and were not as thoroughly tested as more recent and widely used functions. Both are now validated against `JASP <https://jasp-stats.org/>`_.

We strongly recommend that all users upgrade their version of Pingouin (:code:`pip install -U pingouin`).

a. Fixed a bug in :py:func:`pingouin.pairwise_tukey` and :py:func:`pingouin.pairwise_gameshowell` in which the group labels (columns A and B) were incorrect when the ``between`` column was encoded as a :py:class:`pandas.Categorical` with a non-alphabetical category order. This was caused by a discrepancy in how NumPy and pandas sorted the categories in the ``between`` column. For more details, please refer to `issue 111 <https://github.com/raphaelvallat/pingouin/issues/111>`_.
b. Fixed a bug in :py:func:`pingouin.pairwise_gameshowell` in which the reported standard errors were slightly incorrect because of a typo in the code. However, the T-values and p-values were fortunately calculated using the correct standard errors, so this bug only impacted the values in the ``se`` column.
c. Removed the ``tail`` and ``alpha`` arguments from the :py:func:`pingouin.pairwise_tukey` and :py:func:`pingouin.pairwise_gameshowell` functions to be consistent with JASP. Note that the ``alpha`` parameter did not have any impact, and that in previous versions one-sided p-values were simply obtained by halving the two-sided p-values.

.. error:: Please check all previous code and results that called the :py:func:`pingouin.pairwise_tukey` or :py:func:`pingouin.pairwise_gameshowell` functions, especially if the ``between`` column was encoded as a :py:class:`pandas.Categorical`.
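
The label mismatch described in item (a) is an instance of a general failure mode: statistics are computed in one group order while the labels are generated in another. The sketch below only illustrates that failure mode in plain Python. It is not Pingouin's code, and the data, the ``pairwise_diffs`` helper and all variable names are hypothetical.

.. code-block:: python

    from itertools import combinations
    from statistics import mean

    # Hypothetical data: three groups whose declared (categorical) order is
    # deliberately not alphabetical.
    declared_order = ["low", "medium", "high"]
    scores = {
        "low": [1.0, 2.0, 3.0],
        "medium": [4.0, 5.0, 6.0],
        "high": [7.0, 8.0, 9.0],
    }

    # Group means are computed in the declared category order...
    means = [mean(scores[g]) for g in declared_order]

    # ...but the labels come from an alphabetical sort of the same groups.
    alphabetical_order = sorted(declared_order)


    def pairwise_diffs(labels, values):
        """Pair labels with mean differences, both looked up by position."""
        idx = range(len(labels))
        return [(labels[i], labels[j], values[i] - values[j])
                for i, j in combinations(idx, 2)]


    wrong = pairwise_diffs(alphabetical_order, means)  # labels and values disagree
    right = pairwise_diffs(declared_order, means)      # labels and values agree

In ``wrong``, the first row is labelled ``high`` versus ``low`` but carries the ``low`` minus ``medium`` difference, which is why results obtained with a non-alphabetical categorical column deserve a second look.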

**Deprecation**

a. We have now removed the :py:func:`pingouin.plot_skipped_corr` function, as we felt that it may not be useful or relevant to many users (see `issue 105 <https://github.com/raphaelvallat/pingouin/issues/105>`_).

0.3.6
------------------

**Bugfixes**

a. Changed the default scikit-learn solver in :py:func:`pingouin.logistic_regression` from *'lbfgs'* to *'newton-cg'* in order to get results that are `always consistent with R or statsmodels <https://stats.stackexchange.com/questions/203816/logistic-regression-scikit-learn-vs-glmnet>`_. Previous versions of Pingouin used the *'lbfgs'* solver, which internally applied a regularization of the intercept that may have led to different coefficients and p-values for the predictors of interest depending on the scaling of these predictors (e.g. very small or very large values). The new *'newton-cg'* solver is scaling-independent, i.e. no regularization is applied to the intercept, and p-values are therefore unchanged when the data are rescaled. To keep the old behavior, use ``pingouin.logistic_regression(..., solver='lbfgs')``.
b. Fixed invalid results in :py:func:`pingouin.logistic_regression` when ``fit_intercept=False`` was passed as a keyword argument to scikit-learn. The standard errors and p-values were still calculated by taking into account an intercept in the model.

.. warning:: We highly recommend double-checking all previous code and results that called the :py:func:`pingouin.logistic_regression` function, especially if it involved non-standardized predictors and/or custom keyword arguments passed to scikit-learn.

**Enhancements**

a. Added the ``within_first`` boolean argument to :py:func:`pingouin.pairwise_ttests`. This is useful in mixed designs when one wants to change the order of the interaction. The default behavior of Pingouin is to return the within * between pairwise tests for the interaction. Using ``within_first=False``, one can now return the between * within pairwise tests. For more details, see `issue 102 <https://github.com/raphaelvallat/pingouin/issues/102>`_ on GitHub.
b. :py:func:`pingouin.list_dataset` now returns a dataframe instead of simply printing the output.
c. Added the Palmer Station LTER `Penguin dataset <https://github.com/allisonhorst/palmerpenguins>`_, which describes the flipper length and body mass for different species of penguins. It can be loaded with ``pingouin.read_dataset('penguins')``.
d. Added the `Tips dataset <https://vincentarelbundock.github.io/Rdatasets/doc/reshape2/tips.html>`_. It can be loaded with ``pingouin.read_dataset('tips')``.

0.3.5
------------------

**Enhancements**

a. Added support for weighted linear regression in :py:func:`pingouin.linear_regression`. Users can now pass sample weights using the ``weights`` argument (similar to ``lm(..., weights)`` in R and ``LinearRegression.fit(X, y, sample_weight)`` in scikit-learn).
b. The :math:`R^2` in :py:func:`pingouin.linear_regression` is now calculated in a manner similar to statsmodels and R, which give different results from :py:func:`sklearn.metrics.r2_score` when, *and only when*, no constant term (= intercept) is present in the predictor matrix. In that case, scikit-learn (and previous versions of Pingouin) uses the standard :math:`R^2` formula, which assumes a reference model that only includes an intercept:

.. math:: R^2 = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i (y_i - \bar y)^2}

However, statsmodels, R, and newer versions of Pingouin use a modified formula, which uses a reference model corresponding to noise only (i.e. no intercept, as explained `in this post <https://stats.stackexchange.com/questions/26176/removal-of-statistically-significant-intercept-term-increases-r2-in-linear-mo>`_):

.. math:: R_0^2 = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i y_i^2}

Note that this only affects the (rare) cases when no intercept is present in the predictor matrix. Remember that Pingouin automatically adds a constant term in :py:func:`pingouin.linear_regression`, a behavior that can be disabled using ``add_intercept=False``.
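
To make the difference between the two formulas concrete, the following plain-Python sketch fits a regression through the origin and evaluates both :math:`R^2` and :math:`R_0^2` on the same residuals. It is an illustration only, not the Pingouin implementation; the data and the helper names ``fit_no_intercept``, ``r2_centered`` and ``r2_uncentered`` are made up for this example.

.. code-block:: python

    def fit_no_intercept(x, y):
        """Least-squares slope for the model y = b * x (no constant term)."""
        return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


    def r2_centered(y, y_hat):
        """Standard R^2: the reference model is the mean of y (intercept only)."""
        y_bar = sum(y) / len(y)
        ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
        ss_tot = sum((yi - y_bar) ** 2 for yi in y)
        return 1 - ss_res / ss_tot


    def r2_uncentered(y, y_hat):
        """Modified R_0^2: the reference model predicts zero (no intercept)."""
        ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
        ss_tot = sum(yi ** 2 for yi in y)
        return 1 - ss_res / ss_tot


    # Hypothetical data whose mean is far from zero.
    x = [1.0, 2.0, 3.0, 4.0]
    y = [3.1, 3.9, 5.2, 5.8]

    b = fit_no_intercept(x, y)
    y_hat = [b * xi for xi in x]

    r2 = r2_centered(y, y_hat)      # roughly 0.30 for these data
    r2_0 = r2_uncentered(y, y_hat)  # roughly 0.96 for these data

Because :math:`\sum_i y_i^2 \geq \sum_i (y_i - \bar y)^2`, the uncentered value can never be smaller than the centered one for the same residuals, and the gap grows as the mean of the outcome moves away from zero.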

c. Added support for robust `biweight midcorrelation <https://en.wikipedia.org/wiki/Biweight_midcorrelation>`_ (``'bicor'``) in :py:func:`pingouin.corr` and :py:func:`pingouin.pairwise_corr`.

d. The Common Language Effect Size (CLES) is now calculated using the formula given by Vargha and Delaney 2000, which works better when ties are present in the data:

.. math:: \text{CL} = P(X > Y) + .5 \times P(X = Y)

This applies to the :py:func:`pingouin.wilcoxon` and :py:func:`pingouin.compute_effsize` functions. Furthermore, the CLES is now tail-sensitive in the former, but not in the latter since tail is not a valid argument. In :py:func:`pingouin.compute_effsize`, the CLES thus always corresponds to the proportion of pairs where x is *higher* than y. For more details, please refer to `PR 94 <https://github.com/raphaelvallat/pingouin/pull/94>`_.
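
The Vargha and Delaney formula can be evaluated by brute force over all pairs of observations. The sketch below does exactly that in plain Python; the function name ``cles`` is hypothetical and the code is meant as a worked illustration of the formula, not as Pingouin's implementation.

.. code-block:: python

    from itertools import product


    def cles(x, y):
        """Common language effect size: P(X > Y) + 0.5 * P(X = Y)."""
        n_pairs = len(x) * len(y)
        greater = sum(1 for xi, yi in product(x, y) if xi > yi)
        ties = sum(1 for xi, yi in product(x, y) if xi == yi)
        return (greater + 0.5 * ties) / n_pairs


    # Three of the nine pairs have x > y and three are ties: 3/9 + 0.5 * 3/9.
    balanced = cles([1, 2, 3], [2, 2, 2])

    # Every x exceeds every y, so the effect size reaches its maximum.
    maximal = cles([3, 4, 5], [1, 2, 2])

Counting each tie as half a "win" is what keeps the measure at 0.5 when the two samples are indistinguishable, even with heavily tied data.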

e. Confidence intervals around a Cohen d effect size are now calculated using a central T distribution instead of a standard normal distribution in the :py:func:`pingouin.compute_esci` function. This is consistent with the effsize R package.

**Code**

a. Added support for unsigned integers in dtypes safety checks (see `issue 93 <https://github.com/raphaelvallat/pingouin/issues/93>`_).

0.3.4
-----------------

**Bugfixes**

a. The Cohen :math:`d_{avg}` for paired samples was previously calculated using eq. 10 in `Lakens 2013 <https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00863/full>`_. However, this equation was slightly different from the one originally proposed by `Cumming 2012 <https://books.google.com/books/about/Understanding_the_New_Statistics.html?id=AVBDYgEACAAJ>`_, and Lakens has since updated the equation in his effect size conversion `spreadsheet <https://osf.io/vbdah/>`_. Pingouin now uses the correct formula, which is :math:`d_{avg} = \frac{\overline{X} - \overline{Y}}{\sqrt{\frac{(\sigma_1^2 + \sigma_2^2)}{2}}}`.
b. Fixed a minor bug in the internal function *pingouin.utils._flatten_list* that could lead to a TypeError in :py:func:`pingouin.pairwise_ttests` with within/between factors encoded as integers (see `issue 91 <https://github.com/raphaelvallat/pingouin/issues/91>`_).
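
As a worked example of the :math:`d_{avg}` formula in item (a), the sketch below computes it with the standard library. It assumes that :math:`\sigma_1^2` and :math:`\sigma_2^2` are the unbiased sample variances (denominator :math:`n - 1`), which is an assumption of this example; the function name ``cohen_d_avg`` and the data are hypothetical.

.. code-block:: python

    from math import sqrt
    from statistics import mean, variance


    def cohen_d_avg(x, y):
        """Mean difference divided by the root of the average variance."""
        # statistics.variance uses the n - 1 denominator (assumed here).
        return (mean(x) - mean(y)) / sqrt((variance(x) + variance(y)) / 2)


    # Both samples have variance 4, so the denominator is sqrt(4) = 2,
    # and the mean difference of 1 gives d_avg = 0.5.
    x = [2.0, 4.0, 6.0]
    y = [1.0, 3.0, 5.0]
    d_avg = cohen_d_avg(x, y)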

**New functions**

a. Added :py:func:`pingouin.convert_angles` function to convert circular data in arbitrary units to radians (:math:`[-\pi, \pi)` range).
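
The underlying conversion is a linear rescaling followed by a wrap into :math:`[-\pi, \pi)`. The plain-Python sketch below shows that arithmetic for values on an arbitrary circular scale; ``to_radians`` and its ``low`` and ``high`` parameters are hypothetical names chosen for this illustration, and the code is not the implementation of :py:func:`pingouin.convert_angles`.

.. code-block:: python

    import math


    def to_radians(values, low, high):
        """Map values on a circular scale [low, high) to radians in [-pi, pi)."""
        out = []
        for v in values:
            # Rescale to [0, 2 * pi): one full turn spans (high - low) units.
            rad = (v - low) / (high - low) * 2 * math.pi
            # Wrap so that angles beyond half a turn become negative.
            rad = (rad + math.pi) % (2 * math.pi) - math.pi
            out.append(rad)
        return out


    # Hours on a 24-hour clock: midnight, 6 am, noon and 6 pm.
    hours = [0, 6, 12, 18]
    radians = to_radians(hours, low=0, high=24)

Noon sits exactly half a turn from midnight and therefore maps to :math:`-\pi`, the closed end of the interval, while 6 pm maps to :math:`-\pi/2`.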

**Enhancements**

a. Better documentation and testing for descriptive circular statistics functions.
b. Added safety checks that ``angles`` is expressed in radians in circular statistics functions.
c. :py:func:`pingouin.circ_mean` and :py:func:`pingouin.circ_r` now perform calculations omitting missing values.
d. Pingouin no longer changes the default matplotlib style to a Seaborn-default (see `issue 85 <https://github.com/raphaelvallat/pingouin/issues/85>`_).
e. Disabled rounding of floats in most Pingouin functions in order to reduce numerical imprecision. For more details, please refer to `issue 87 <https://github.com/raphaelvallat/pingouin/issues/87>`_. Users can still round the output using the :py:meth:`pandas.DataFrame.round` method, or by changing the default precision of Pandas DataFrame with `pandas.set_option <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.set_option.html>`_.
f. Disabled filling of missing values by ``'-'`` in some ANOVA functions, which may have led to dtype issues.
g. Added partial eta-squared (``np2`` column) to the output of :py:func:`pingouin.ancova` and :py:func:`pingouin.welch_anova`.
h. Added the ``effsize`` option to :py:func:`pingouin.anova` and :py:func:`pingouin.ancova` to return different effect sizes. Must be one of ``'np2'`` (partial eta-squared, default) or ``'n2'`` (eta-squared).
i. Added the ``effsize`` option to :py:func:`pingouin.rm_anova` and :py:func:`pingouin.mixed_anova` to return different effect sizes. Must be one of ``'np2'`` (partial eta-squared, default), ``'n2'`` (eta-squared) or ``'ng2'`` (generalized eta-squared).
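
The difference between ``'n2'`` and ``'np2'`` comes down to the denominator. The sketch below applies the textbook definitions to a set of made-up sums of squares from a hypothetical two-way between-subject ANOVA; the function names are illustrative and do not correspond to Pingouin functions. Generalized eta-squared is left out because its denominator depends on the design.

.. code-block:: python

    def eta_squared(ss_effect, ss_total):
        """Share of the total variability attributed to the effect."""
        return ss_effect / ss_total


    def partial_eta_squared(ss_effect, ss_error):
        """Share of effect-plus-error variability attributed to the effect."""
        return ss_effect / (ss_effect + ss_error)


    # Hypothetical sums of squares (not taken from any real dataset).
    ss = {"A": 20.0, "B": 30.0, "A * B": 10.0, "error": 40.0}
    ss_total = sum(ss.values())

    n2_a = eta_squared(ss["A"], ss_total)               # 20 / 100
    np2_a = partial_eta_squared(ss["A"], ss["error"])   # 20 / (20 + 40)

With a single factor the two measures coincide. They diverge as soon as other effects absorb part of the total variability, as factor B and the interaction do here.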

**Code and dependencies**

a. Compatibility with Python 3.9 (see `PR by tirkarthi <https://github.com/raphaelvallat/pingouin/pull/83>`_).
b. To avoid any confusion, the ``alpha`` argument has been renamed to ``angles`` in all circular statistics functions.
c. Updated flake8 guidelines and added continuous integration for Python 3.8.
d. Added the `tabulate <https://pypi.org/project/tabulate/>`_ package as a dependency. The tabulate package is used by the :py:func:`pingouin.print_table` function as well as the :py:meth:`pandas.DataFrame.to_markdown` method.

0.3.3
----------------------

**Bugfixes**

a. Fixed a bug in :py:func:`pingouin.pairwise_corr` caused by the deprecation of ``pandas.core.index`` in the new version of Pandas (1.0). For now, both Pandas 0.25 and Pandas 1.0 are supported.
b. The standard deviation in :py:func:`pingouin.pairwise_ttests` when using ``return_desc=True`` is now calculated with ``np.nanstd(ddof=1)`` to be consistent with Pingouin/Pandas default unbiased standard deviation.
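
For readers unfamiliar with the ``ddof`` convention in item (b), the plain-Python sketch below reproduces the arithmetic of a NaN-aware standard deviation with a configurable ``ddof``. The ``nan_std`` helper is written for this illustration only; it mimics what ``np.nanstd`` computes on a one-dimensional input but is not NumPy's or Pingouin's code.

.. code-block:: python

    import math


    def nan_std(values, ddof=0):
        """Standard deviation that skips NaN and divides by (n - ddof)."""
        clean = [v for v in values if not math.isnan(v)]
        m = sum(clean) / len(clean)
        ss = sum((v - m) ** 2 for v in clean)
        return math.sqrt(ss / (len(clean) - ddof))


    # Eight valid observations (mean 5, sum of squared deviations 32) and a NaN.
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, float("nan")]

    biased = nan_std(data, ddof=0)    # sqrt(32 / 8) = 2.0
    unbiased = nan_std(data, ddof=1)  # sqrt(32 / 7), about 2.14

The ``ddof=1`` value is the one that matches the default of :py:meth:`pandas.DataFrame.std`, which is why the descriptive statistics now agree with the rest of the output.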

**New functions**

a. Added :py:func:`pingouin.plot_circmean` function to plot the circular mean and circular vector length of a set of angles (in radians) on the unit circle.

0.3.2
---------------------

Hotfix release to fix a critical issue with :py:func:`pingouin.pairwise_ttests` (see below). We strongly recommend that you update to the newest version of Pingouin and double-check your previous results if you've ever used the pairwise T-tests with more than one factor (e.g. mixed, factorial or 2-way repeated measures design).

**Bugfixes**

a. MAJOR: Fixed a bug in :py:func:`pingouin.pairwise_ttests` when using a mixed or two-way repeated measures design. Specifically, the T-tests were performed without averaging over repeated measurements first (i.e. without calculating the marginal means). Note that for mixed designs, this only impacts the between-subject T-test(s). Practically speaking, this led to higher degrees of freedom (because they were conflated with the number of repeated measurements) and ultimately incorrect T and p-values because the assumption of independence was violated. Pingouin now averages over repeated measurements in mixed and two-way repeated measures designs, which is the same behavior as JASP or JAMOVI. As a consequence, and when the data has only two groups, the between-subject p-value of the pairwise T-test should be (almost) equal to the p-value of the same factor in the :py:func:`pingouin.mixed_anova` function. The old behavior of Pingouin can still be obtained using the ``marginal=False`` argument.
b. Minor: Added a check in :py:func:`pingouin.mixed_anova` to ensure that the ``subject`` variable has a unique set of values for each between-subject group defined in the ``between`` variable. For instance, the subject IDs for group1 are [1, 2, 3, 4, 5] and for group2 [6, 7, 8, 9, 10]. The function will throw an error if there are one or more overlapping subject IDs between groups (e.g. the subject IDs for group1 AND group2 are both [1, 2, 3, 4, 5]).
c. Minor: Fixed a bug which caused the :py:func:`pingouin.plot_rm_corr` and :py:func:`pingouin.ancova` (with >1 covariates) to throw an error if any of the input variables started with a number (because of statsmodels / Patsy formula formatting).
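
The effect of the fix in item (a) on the degrees of freedom can be seen by averaging over repeated measurements by hand. The sketch below uses made-up long-format data and plain Python; it assumes a Student T-test with :math:`n_1 + n_2 - 2` degrees of freedom for the between-subject comparison, and it is an illustration rather than Pingouin's implementation.

.. code-block:: python

    from collections import defaultdict
    from statistics import mean

    # Hypothetical long-format data: (subject, group, time, score).
    rows = [
        (1, "ctrl", "pre", 4.0), (1, "ctrl", "post", 6.0),
        (2, "ctrl", "pre", 5.0), (2, "ctrl", "post", 7.0),
        (3, "ctrl", "pre", 3.0), (3, "ctrl", "post", 5.0),
        (4, "treat", "pre", 6.0), (4, "treat", "post", 10.0),
        (5, "treat", "pre", 7.0), (5, "treat", "post", 9.0),
        (6, "treat", "pre", 5.0), (6, "treat", "post", 11.0),
    ]

    # Collect every score of each subject and remember the subject's group.
    per_subject = defaultdict(list)
    group_of = {}
    for subject, group, _time, score in rows:
        per_subject[subject].append(score)
        group_of[subject] = group

    # Marginal means: one averaged value per subject, grouped by condition.
    marginal = defaultdict(list)
    for subject, scores in per_subject.items():
        marginal[group_of[subject]].append(mean(scores))

    # Degrees of freedom of a two-sample Student T-test (assumed n1 + n2 - 2).
    df_without_averaging = len(rows) - 2      # treats all 12 rows as independent
    df_with_averaging = len(per_subject) - 2  # one value per subject

Treating the twelve rows as independent observations yields 10 degrees of freedom, whereas only six subjects contribute independent information, which gives 4.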

**Enhancements**

a. Upon loading, Pingouin will now use the `outdated <https://github.com/alexmojaki/outdated>`_ package to check and warn the user if a newer stable version is available.
b. Globally removed the ``export_filename`` parameter, which allowed exporting the output table to a .csv file. This helps simplify the API and testing. As an alternative, one can simply use :py:meth:`pandas.DataFrame.to_csv` to export the output dataframe generated by Pingouin.
c. Added the ``correction`` argument to :py:func:`pingouin.pairwise_ttests` to enable or disable Welch's correction for independent T-tests.
