Paperscraper

Latest version: v0.2.14

0.2.8

What's Changed
* Graceful handling of connection errors by jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/35
* chore(deps): bump requests from 2.24.0 to 2.31.0 by dependabot in https://github.com/PhosphorylatedRabbits/paperscraper/pull/30

New Contributors
* dependabot made their first contribution in https://github.com/PhosphorylatedRabbits/paperscraper/pull/30

**Full Changelog**: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.7...v0.2.8

0.2.7

What's Changed
* fix: OS agnostic urljoining by jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/29

A bugfix for an issue that prevented Windows users from querying the chemrxiv API.
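To illustrate the class of bug this release addresses (a sketch only, not the actual patch from the pull request): building URLs with filesystem path helpers inserts backslashes on Windows, whereas URL-aware joining always uses forward slashes. The `join_url` helper and the `example.org` base URL below are hypothetical.

```python
import ntpath
from urllib.parse import urljoin


def join_url(base: str, *parts: str) -> str:
    """Join URL segments with forward slashes on every operating system."""
    url = base
    for part in parts:
        # A trailing slash makes urljoin append to the base instead of
        # replacing its last path segment.
        url = urljoin(url.rstrip("/") + "/", part.lstrip("/"))
    return url


base = "https://example.org/public-api/v1"  # hypothetical endpoint

# ntpath is the Windows flavour of os.path, so this reproduces the
# Windows behaviour on any platform: the separator is a backslash.
windows_style = ntpath.join(base, "items")

# URL-aware joining gives the same result regardless of the OS.
portable = join_url(base, "items")
```

Here `windows_style` ends in `v1\items`, which a web server will not resolve, while `portable` ends in `v1/items`.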


**Full Changelog**: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.6...v0.2.7

0.2.6

What's Changed
* Save DOIs from arxiv papers by jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/27
--> This also allows scraping PDFs from arxiv metadata
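As a rough illustration of why a stored DOI makes PDF scraping possible: arXiv registers DOIs of the form `10.48550/arXiv.<identifier>`, so a DOI can be mapped back to an arXiv identifier and from there to a PDF location. The helper names below are hypothetical and are not taken from paperscraper; the identifier is used purely as an example.

```python
ARXIV_DOI_PREFIX = "10.48550/arXiv."


def arxiv_id_to_doi(arxiv_id: str) -> str:
    """Build the DOI that arXiv registers for a given identifier."""
    return ARXIV_DOI_PREFIX + arxiv_id


def arxiv_doi_to_pdf_url(doi: str) -> str:
    """Map an arXiv DOI to the corresponding PDF URL on arxiv.org."""
    # DOIs are case-insensitive, so compare the prefix in lower case.
    if not doi.lower().startswith(ARXIV_DOI_PREFIX.lower()):
        raise ValueError(f"not an arXiv DOI: {doi}")
    arxiv_id = doi[len(ARXIV_DOI_PREFIX):]
    return f"https://arxiv.org/pdf/{arxiv_id}"


doi = arxiv_id_to_doi("2207.03928")
pdf_url = arxiv_doi_to_pdf_url(doi)
```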

**Full Changelog**: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.5...v0.2.6

0.2.5

What's Changed
* Extract records from **biorxiv** and **medrxiv** based on **start date** and **end date** by achouhan93 in https://github.com/PhosphorylatedRabbits/paperscraper/pull/24
* Extract records from **chemrxiv** based on **start date** and **end date** by achouhan93 and jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/25

EXAMPLE
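Since v0.2.5, `paperscraper` also allows scraping {med/bio/chem}rxiv for specific dates:

    from paperscraper.get_dumps import medrxiv

    # Only fetch metadata for papers posted within this date range
    medrxiv(begin_date="2023-04-01", end_date="2023-04-08")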

But watch out: the resulting `.jsonl` file will be labelled according to the current date, and all your subsequent searches will be based on this file **only**. If you use this option, you might want to keep an eye on the source files (`paperscraper/server_dumps/*jsonl`) to ensure they contain the metadata for all papers you're interested in.
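One way to keep an eye on a dump file is to check which dates it actually covers. The sketch below assumes each line of the `.jsonl` file is a JSON object with a `date` field in `YYYY-MM-DD` format; the field name and the `dump_date_range` helper are assumptions for illustration, not part of paperscraper.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def dump_date_range(path: str, date_key: str = "date") -> Optional[Tuple[str, str]]:
    """Return the (earliest, latest) date found in a .jsonl dump, or None if none exist."""
    dates = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if record.get(date_key):
                dates.append(record[date_key])
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return (min(dates), max(dates)) if dates else None
```

If the returned range is narrower than the period you intend to search, the dump was probably created with a restricted date window and should be regenerated.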

New Contributors
* achouhan93 made their first contribution in https://github.com/PhosphorylatedRabbits/paperscraper/pull/24

**Full Changelog**: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.4...v0.2.5

0.2.4

1. Support for scraping PDFs
2. Harmonize return types of scraper classes to `pd.DataFrame` rather than `List[Dict]`.


**1. Scraping PDFs**
v0.2.4 now supports downloading PDFs. The core function is `paperscraper.pdf.save_pdf` which receives a dictionary with the key `doi` and downloads the PDF for the desired DOI. There's also a wrapper function `paperscraper.pdf.save_pdf_from_dump` that can be called with a filepath of a `.jsonl` file that was previously obtained in the metadata search. This wrapper downloads all PDFs from the metadata search. Examples are given in the [README](https://github.com/PhosphorylatedRabbits/paperscraper/commit/f8ed29c8801e7d1dc94e81e4dd273819bded7cdc).
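The wrapper's job can be pictured as reading the metadata file line by line and handing every record that has a DOI to the single-paper function. The sketch below shows only that collection step, using the standard library; `iter_doi_records` is a hypothetical helper, not paperscraper's implementation, and it assumes each line is a JSON object that may carry a `doi` key.

```python
import json
from typing import Dict, Iterator


def iter_doi_records(jsonl_path: str) -> Iterator[Dict[str, str]]:
    """Yield one {"doi": ...} dictionary per paper that has a DOI."""
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            paper = json.loads(line)
            doi = paper.get("doi")
            if doi:  # skip records without a usable DOI
                yield {"doi": doi}
```

Each yielded dictionary has the shape that `paperscraper.pdf.save_pdf` is described as receiving; the download itself is left to the library.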

Thanks to daenuprobst for suggestions!

**2. Return types**
As of this version, all scraper classes return their results as a pandas DataFrame (one paper per row) rather than a list of dictionaries (one paper per dict).

**Full Changelog**: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.3...v0.2.4

0.2.3

What's Changed
* fix: preprint['id'] should be preprint['item']['id'] by oppih in https://github.com/PhosphorylatedRabbits/paperscraper/pull/22


**Full Changelog**: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.2...v0.2.3
