- Fixes multiple performance regressions, both small and large. The NYC-Taxi example regression now runs in approximately 22 seconds (... if my laptop is connected to a power charger)!
```python
%load_ext autoreload
%autoreload 2

import duckdb
import time
import numpy as np
import pyfixest as pf

# load the first three months of the 2012 NYC taxi data
nyc = duckdb.sql(
    '''
    FROM 'C:/Users/alexa/Documents/nyc-taxi/**/*.parquet'
    SELECT
        tip_amount, trip_distance, passenger_count,
        vendor_id, payment_type, dropoff_at,
        dayofweek(dropoff_at) AS dofw
    WHERE year = 2012 AND month <= 3
    '''
).df()
```
```python
# convert dofw, vendor_id, payment_type to 'int' / 'category' dtypes
tic = time.time()
nyc["dofw"] = nyc["dofw"].astype(int)
nyc["vendor_id"] = nyc["vendor_id"].astype("category")
nyc["payment_type"] = nyc["payment_type"].astype("category")
print(f"""
I am converting columns of type 'object' to 'category' and 'int' data types outside
of the regression, hence I am cheating a bit. This saves {np.round(time.time() - tic)} seconds.
"""
)
```

```
I am converting columns of type 'object' to 'category' and 'int' data types outside
of the regression, hence I am cheating a bit. This saves 7.0 seconds.
```
```python
run = True

if run:
    # mock regression on a small subset to trigger JIT compilation
    fit = pf.feols(
        fml = "tip_amount ~ trip_distance + passenger_count | vendor_id + payment_type + dofw",
        data = nyc.iloc[1:10_000],
        copy_data = False,
        store_data = False
    )

tic = time.time()
fit = pf.feols(
    fml = "tip_amount ~ trip_distance + passenger_count | vendor_id + payment_type + dofw",
    data = nyc,
    copy_data = False,   # saves a few seconds
    store_data = False   # saves a few seconds
)
passed = time.time() - tic
print(f"Passed time is {np.round(passed)}.")
```

```
Passed time is 22.
```
- Three new function arguments for `feols()` and `fepois()`: `copy_data` (whether to copy the data before estimation), `store_data` (whether to store the data frame on the model object), and `fixef_tol` (the convergence tolerance of the fixed effects demeaning algorithm).
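  For instance, loosening `fixef_tol` can speed up the demeaning step at the cost of slightly less precise estimates. A minimal sketch, where the tolerance value is purely illustrative and not a recommendation:

  ```python
  import pyfixest as pf

  data = pf.get_data()

  # a looser tolerance for the fixed effects demeaning algorithm:
  # estimation gets faster, estimates become slightly less precise
  fit = pf.feols(
      "Y ~ X1 | f1 + f2",
      data = data,
      fixef_tol = 1e-06,
  )
  fit.summary()
  ```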
- Support for frequency weights via the `weights_type` function argument.
```python
import pyfixest as pf

data = pf.get_data(N = 10000, model = "Fepois")

# aggregate the data into unique (Y, X1, f1) combinations with a frequency count
df_weighted = (
    data[["Y", "X1", "f1"]]
    .groupby(["Y", "X1", "f1"])
    .size()
    .reset_index()
    .rename(columns={0: "count"})
)
df_weighted["id"] = list(range(df_weighted.shape[0]))

print("Dimension of the aggregated df:", df_weighted.shape)
print(df_weighted.head())

# estimation on the full data and frequency-weighted estimation on the
# aggregated data yield identical results
fit = pf.feols(
    "Y ~ X1 | f1",
    data = data
)
fit_weighted = pf.feols(
    "Y ~ X1 | f1",
    data = df_weighted,
    weights = "count",
    weights_type = "fweights"
)
pf.etable([fit, fit_weighted], coef_fmt = "b(se) \n (t) \n (p)")
```
- Bugfix: wild cluster bootstrap inference with weights would silently compute unweighted standard errors. Sorry about that! As a result, WLS is not supported for the WCB.
- Adds support for CRV3 inference with weights.
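  A minimal sketch of what this enables, assuming the `weights` and `f1` columns of pyfixest's example data purely for illustration:

  ```python
  import pyfixest as pf

  data = pf.get_data()

  # weighted least squares combined with CRV3 clustered standard errors,
  # clustering by the f1 column of the example data
  fit = pf.feols(
      "Y ~ X1",
      data = data,
      weights = "weights",
      vcov = {"CRV3": "f1"},
  )
  fit.summary()
  ```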
## What's Changed
* Performance Improvements by s3alfisc in https://github.com/s3alfisc/pyfixest/pull/359
* Fweights by s3alfisc in https://github.com/s3alfisc/pyfixest/pull/365
* release version 0.19.0 by s3alfisc in https://github.com/s3alfisc/pyfixest/pull/366
**Full Changelog**: https://github.com/s3alfisc/pyfixest/compare/v0.18.0...v0.19.0