ScraperFC

Latest version: v3.1.2

3.1.2

* Fixed issue 46 (Capology timeout exception when looking for an element)
* Capology now scrapes cleaned column names for the current season.
    * Previously, the column names for the current season included any options from the dropdown menus and help hover icons in the column headers.
* Added numpy docstring validation when building the Sphinx docs

3.1.1

* Added a `get_match_links()` function to Transfermarkt that returns all match links for a given `year` and `league`.
* Updated the output of `scrape_player_match_stats()` in the Sofascore module to also return team name and team ID columns for each player. (Both additions are sketched below.)
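A minimal sketch of how these two additions might be used. The class names follow the docs, but the `year`/`league` strings and the match URL are illustrative placeholders:

```python
import ScraperFC as sfc

# Transfermarkt: grab every match link for a season.
# The year/league strings are illustrative; use values the site's dropdowns show.
tm = sfc.Transfermarkt()
links = tm.get_match_links(year="23/24", league="EPL")

# Sofascore: player match stats now include team name and team ID columns.
ss = sfc.Sofascore()
stats = ss.scrape_player_match_stats("<a Sofascore match URL>")  # placeholder URL
```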

3.1.0

* Removed the FiveThirtyEight module.
    * See https://scraperfc.readthedocs.io/en/latest/fivethirtyeight.html for a really simple way to acquire the FiveThirtyEight data.
* Added Copa Libertadores as a competition to the Sofascore module.
* FBref
    * Revamped the `scrape_match()` function.
    * Updated the rate limiting in the FBref module due to FBref changing their bot rate limit speed.
    * Also added a new ScraperFC exception class that is raised when FBref has temporarily flagged your IP due to rate limit infringements (a hedged handling sketch follows this list).
* Added linting and typechecking to Tox and GitHub Actions.
* Added some new test cases for the FBref module.
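User code might handle the new exception along these lines. The class name `FBrefRateLimitException` and it being importable from the package root are assumptions here; check ScraperFC's exceptions module for the real name:

```python
import time
import ScraperFC as sfc

fbref = sfc.FBref()
try:
    data = fbref.scrape_match("<an FBref match URL>")  # placeholder URL
except sfc.FBrefRateLimitException:  # name assumed; see ScraperFC's exceptions module
    # FBref has temporarily flagged this IP; back off before retrying.
    time.sleep(60)
```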

3.0.0

Why the change?
This is a big update and it's not backwards compatible; some of you will have to rewrite small parts of your own code. I know this can be frustrating, so I want to explain why I'm making these changes. If you're not interested in the "why", feel free to skip to the Changelog below and see what the changes are!

A lot of the changes are non-codebase changes, things I should have done from Day 1: unit tests, CI pipelines for testing, docs, and builds, etc. Most of you won't care or see these unless you a) look for them or b) contribute code in the future.

The codebase changes fall into a few categories:
1. Making it easier for me to maintain the code moving forward. The code got pretty messy and hard for me to take care of.
2. Making the code run faster and more reliably.
3. Making it easier for community members (you!) to contribute new code.
Changelog
Now the part you've all been waiting for.
Shared functions
- Moved the ScraperFC exceptions into their own file.
- Got rid of the overly-complicated function to check years and leagues, `get_source_comp_info()`. This was a function from very early on in ScraperFC. It was poor architecting and was too much of a pain in the a$$ to fix before this. Now, each module has a `comps` dict in its `.py` file, and any checks to make sure year and league inputs are valid are done in the module functions (a rough sketch of the pattern follows).
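Roughly, the new pattern looks like this. This is an illustrative sketch, not the actual ScraperFC source; the league keys and URL placeholders are made up:

```python
# Module-level dict mapping league names to whatever the module needs (illustrative).
comps = {
    "EPL": "<competition URL>",
    "La Liga": "<competition URL>",
}

def scrape_league_table(year: str, league: str):
    # Input validation now happens inside each module function.
    if league not in comps:
        raise ValueError(f"{league!r} is not a valid league. Options: {list(comps.keys())}")
    # ... scrape using comps[league] ...
```
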
FBref
- Updated the capitalization, I finally realized the "r" is lowercase 🤦‍♂️.
- `FBref.close()` has been removed. Only 1 function used the Selenium driver and that function has been updated to open, use, and then close the driver without the user needing to call `close()`.
- Added `FBref.get_valid_seasons()`. This returns the valid seasons for a given competition, scraped directly from the competition's history page on FBref.
- The `year` argument is no longer an `int`. This is a byproduct of adding `get_valid_seasons()`. The year is now a `str` and needs to match the year as it appears on the competition's history page on FBref. This will require a lot of user code changes but makes it far easier to assert the year is valid. See the year parameter page on ReadTheDocs for more details, and the sketch after this list.
- `FBref.scrape_league_table()` now returns all tables from the season's league table page. The first table should be the league table and then any tables after that vary by competition.
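The new flow looks roughly like this. The season and league strings are illustrative; use whatever `get_valid_seasons()` actually returns for your competition:

```python
import ScraperFC as sfc

fbref = sfc.FBref()
seasons = fbref.get_valid_seasons("EPL")   # scraped from the competition's history page
print(seasons)                             # pick a season string from here

# `year` is a str now and must match the history page exactly.
tables = fbref.scrape_league_table(year="2023-2024", league="EPL")
league_table = tables[0]                   # first table should be the league table
```
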
Understat
- No longer need to call `Understat.close()`. The Understat module doesn't even need Selenium anymore! They embed a lot of the raw data as JSON in JS scripts right in the HTML (a sketch of that technique follows this list).
- As a result of getting the data in a different format, *a lot* of the functions have changed functionality or been deprecated in favor of new functions. Please read the ReadTheDocs page for this module.
- Added `Understat.get_valid_seasons()`.
- The `year` argument is a string now. Write the year as it appears in the season dropdown on the Understat website. See the year parameter page on ReadTheDocs for more details.
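For the curious, the underlying technique looks something like this minimal sketch. It is not the actual Understat module code; the URL, regex, and escape handling are illustrative:

```python
import json
import re
import requests

# Understat embeds data in inline scripts, roughly: var someVar = JSON.parse('...')
html = requests.get("<an Understat page URL>").text  # placeholder URL
m = re.search(r"JSON\.parse\('(.*?)'\)", html)
if m:
    # The embedded string uses \xNN escapes, so decode them before parsing.
    data = json.loads(m.group(1).encode("utf-8").decode("unicode_escape"))
```
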
Sofascore
- I switched from requests to the [Botasaurus](https://github.com/omkarcloud/botasaurus) library. Requests was no longer returning accurate data, but using Botasaurus fixes this.
- I renamed a lot of the functions to more closely match the naming convention of the rest of the modules.
- Just about the only complaint I ever heard about this module was that it wasn't automated enough; a lot of the functions required a match link as input but there was no way to get all of the match URLs for a given season. So....
- I've added a function to return basic info for all of the matches, `Sofascore.get_match_dicts()`.
- You can use the match IDs in the output of this function as input to a lot of the other functions because they now take match URLs *or* match IDs as inputs. Match URLs must be strings, match IDs must be ints (see the sketch after this list).
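Putting those together, the flow might look like this. The `year`/`league` arguments and the `id` key are assumptions; check the module docs for the real signature:

```python
import ScraperFC as sfc

ss = sfc.Sofascore()
matches = ss.get_match_dicts(year="23/24", league="EPL")  # arguments assumed

# Downstream functions accept a match URL (str) or a match ID (int).
match_id = matches[0]["id"]                 # key name assumed
stats = ss.scrape_player_match_stats(match_id)
```
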
Transfermarkt
- Removed `Transfermarkt.close()`. The Transfermarkt module now uses cloudscraper instead of a Selenium driver.
- Added `Transfermarkt.get_valid_seasons()`.
- `year` argument is a string now. Enter the string as it appears in the competition's season dropdown on the Transfermarkt website. See the year parameter page on ReadTheDocs for more details.
Capology
- No longer need to call `Capology.close()`. Driver will be closed on its own when scraping is done.
- Added `Capology.get_valid_seasons()`.
- The `year` argument is a string now. Write the year as it appears in the season dropdown on the Capology website. See the year parameter page on ReadTheDocs for more details.
- Removed `Capology.scrape_payrolls()`. It ended up doing the same thing as `Capology.scrape_salaries()`.
ClubELO
- Minor changes to how invalid team names are detected. Shouldn't impact anything.
FiveThirtyEight
- No longer need to call `FiveThirtyEight.close()`. Driver will be closed on its own when scraping is done.
"Behind the Scenes"
- Unit tests
    - Uses [pytest](https://docs.pytest.org/en/8.2.x/) and [pytest-cov](https://pytest-cov.readthedocs.io/en/latest/).
    - These are in the `test` folder at the root of the GitHub repository.
    - There's a test file for each ScraperFC module (a minimal illustrative test is sketched below).
- Python packaging tooling changes
    - [tox](https://tox.wiki/en/latest/index.html): I've created tox environments for running the unit tests, building the docs, and building the package.
    - GitHub Actions:
        - Every push now automatically runs the test suite and does a test build of the docs.
        - Tagged commits will trigger a workflow to build from that commit and upload to PyPI.
- I've updated the layout of the [documentation on Read the Docs](https://scraperfc.readthedocs.io/en/latest/).
- I've updated the examples in `Examples.ipynb` in the GitHub repo to reflect all of the changes introduced in ScraperFC 3.0.
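For reference, one of those per-module tests might look as simple as this (illustrative only, not a copy of the repo's test files):

```python
# test/test_fbref.py (illustrative)
import ScraperFC as sfc

def test_get_valid_seasons_returns_something():
    fbref = sfc.FBref()
    seasons = fbref.get_valid_seasons("EPL")
    assert len(seasons) > 0
```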

2.9.2

* Removed the webdriver_manager import in shared_functions.py because it's no longer required and not included in requirements.txt (v2.9.1, technically)
* Renamed the Sofascore file, class, and `__init__.py` entry to all align on capitalization
* Updates to FBRef.py for issues found during unit test development

2.9.0

* Added RFPL as a scrape-able league for Understat
* Fixed some residual bugs from the transition away from ChromeDriverManager and to new `get_source_comp_info()` function
* Added Oddsportal module (unstable)
* Fixed issue 32
