Paperscraper

Latest version: v0.3.0

0.2.5

What's Changed
* Extract records from **biorxiv** and **medrxiv** based on **start date** and **end date** by achouhan93 in https://github.com/PhosphorylatedRabbits/paperscraper/pull/24
* Extract records from **chemrxiv** based on **start date** and **end date** by achouhan93 and jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/25

EXAMPLE
Since v0.2.5, `paperscraper` can also scrape {med/bio/chem}rxiv for a specific date range:
```py
from paperscraper.get_dumps import medrxiv
medrxiv(begin_date="2023-04-01", end_date="2023-04-08")  # dump only this date window
```

But watch out: the resulting `.jsonl` file is labelled with the current date, and all your subsequent searches will be based on this file **only**. If you use this option, keep an eye on the source files (`paperscraper/server_dumps/*jsonl`) to ensure they contain the metadata for all papers you're interested in.
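One way to check which dates a dump actually covers is to scan the file directly. The following is a minimal stdlib-only sketch, assuming each line of the dump is one JSON object with a `date` field in `YYYY-MM-DD` form. The helper name `dump_date_range` and the demo records are hypothetical and not part of `paperscraper`.

```python
import json
import os
import tempfile
from datetime import date


def dump_date_range(path):
    """Return (earliest, latest, count) over the `date` field of a .jsonl dump."""
    dates = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Assumption: dates are stored as ISO strings, e.g. "2023-04-01".
            dates.append(date.fromisoformat(record["date"]))
    if not dates:
        return None, None, 0
    return min(dates), max(dates), len(dates)


# Demo on a throwaway file with made-up records (not real paper metadata).
demo_records = [
    {"title": "demo A", "date": "2023-04-01"},
    {"title": "demo B", "date": "2023-04-08"},
    {"title": "demo C", "date": "2023-04-03"},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_dump.jsonl")
    with open(demo_path, "w", encoding="utf-8") as handle:
        for record in demo_records:
            handle.write(json.dumps(record) + "\n")
    earliest, latest, count = dump_date_range(demo_path)

print(earliest, latest, count)
```

Pointing the helper at a file under `paperscraper/server_dumps/` would show whether the dump spans the period you intend to search, provided the field name matches.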

New Contributors
* achouhan93 made their first contribution in https://github.com/PhosphorylatedRabbits/paperscraper/pull/24

**Full Changelog**: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.4...v0.2.5

0.2.4

1. Support for scraping PDFs
2. Harmonize return types of scraper classes to `pd.DataFrame` rather than `List[Dict]`.


**1. Scraping PDFs**
v0.2.4 supports downloading PDFs. The core function is `paperscraper.pdf.save_pdf`, which receives a dictionary with the key `doi` and downloads the PDF for that DOI. There's also a wrapper function, `paperscraper.pdf.save_pdf_from_dump`, which takes the filepath of a `.jsonl` file previously obtained from a metadata search and downloads the PDFs for all papers in it. Examples are given in the [README](https://github.com/PhosphorylatedRabbits/paperscraper/commit/f8ed29c8801e7d1dc94e81e4dd273819bded7cdc).
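To illustrate the input shape described above, here is a minimal sketch that turns metadata lines into dictionaries carrying a `doi` key, skipping entries without one. It does not call `paperscraper` itself; the helper name `doi_records` and the DOIs are made-up placeholders, and the assumption is that each line holds one JSON object.

```python
import json


def doi_records(lines):
    """Yield {'doi': ...} dicts for every JSON line that carries a non-empty DOI."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        doi = record.get("doi")
        if doi:
            yield {"doi": doi}


# Made-up demo lines; the second one has no DOI and is skipped.
demo_lines = [
    '{"title": "demo A", "doi": "10.1101/demo.0001"}',
    '{"title": "demo B"}',
    '{"title": "demo C", "doi": "10.1101/demo.0002"}',
]
records = list(doi_records(demo_lines))
print(records)
```

Each resulting dictionary has the form that `save_pdf` is described as accepting; see the README for the exact call and its remaining arguments.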

Thanks to daenuprobst for suggestions!

**2. Return types**
As of this version, all scraper classes return their results as a pandas DataFrame (one paper per row) rather than a list of dictionaries (one paper per dict).
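For code that still expects the old shape, converting between the two representations is straightforward with plain pandas. This is a small sketch with made-up records; the column names are illustrative assumptions, not the actual schema returned by the scrapers.

```python
import pandas as pd

# Old shape: one dict per paper (made-up demo data).
papers = [
    {"title": "demo A", "doi": "10.1101/demo.0001"},
    {"title": "demo B", "doi": "10.1101/demo.0002"},
]

# New shape: one row per paper.
frame = pd.DataFrame(papers)

# Going back to a list of dicts, e.g. for legacy code paths.
as_dicts = frame.to_dict(orient="records")

print(frame.shape)
```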

**Full Changelog**: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.3...v0.2.4

0.2.3

What's Changed
* fix: preprint['id'] should be preprint['item']['id'] by oppih in https://github.com/PhosphorylatedRabbits/paperscraper/pull/22


**Full Changelog**: https://github.com/PhosphorylatedRabbits/paperscraper/compare/v0.2.2...v0.2.3

0.2.2

What's Changed
* refactor: extraction of published DOI/URL by oppih in https://github.com/PhosphorylatedRabbits/paperscraper/pull/21

New Contributors
* oppih made their first contribution in https://github.com/PhosphorylatedRabbits/paperscraper/pull/21

**Full Changelog**: https://github.com/PhosphorylatedRabbits/paperscraper/compare/0.2.1...v0.2.2

0.2.1

This version streamlines the handling of `.jsonl` files throughout the package. It removes an inconsistency between the arxiv/pubmed and the biorxiv/chemrxiv/medrxiv entry points: the former dumped each paper as one string per line, whereas the latter dumped each paper as one dict (JSON) per line.
Thanks to juliusbierk for pointing this out.
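The "one JSON object per line" convention can be sketched with the standard library alone. The helper names `dump_jsonl` and `load_jsonl` below are hypothetical and are not the package's own functions; the records are made up.

```python
import io
import json


def dump_jsonl(papers, handle):
    """Write one JSON object per line."""
    for paper in papers:
        handle.write(json.dumps(paper) + "\n")


def load_jsonl(handle):
    """Read the records back, ignoring blank lines."""
    return [json.loads(line) for line in handle if line.strip()]


papers = [
    {"title": "demo A", "authors": ["X", "Y"]},
    {"title": "demo B", "authors": []},
]

buffer = io.StringIO()  # stands in for a .jsonl file on disk
dump_jsonl(papers, buffer)
buffer.seek(0)
restored = load_jsonl(buffer)

print(restored == papers)
```

Because every line parses to a dict on its own, a file written this way can be read back record by record without loading it whole.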

What's Changed
* Export to json format by juliusbierk in https://github.com/PhosphorylatedRabbits/paperscraper/pull/19
* 0.2.1 - Streamline jsonl file saving/loading by jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/20

New Contributors
* juliusbierk made their first contribution in https://github.com/PhosphorylatedRabbits/paperscraper/pull/19

**Full Changelog**: https://github.com/PhosphorylatedRabbits/paperscraper/compare/0.2.0...0.2.1

0.2.0

What's Changed
* 0.2.0 - Chemrxiv engage api by jannisborn in https://github.com/PhosphorylatedRabbits/paperscraper/pull/18
* Bring back the support of chemrxiv
* Extend functionalities compared to old figshare API (more searchable fields)


**Full Changelog**: https://github.com/PhosphorylatedRabbits/paperscraper/compare/0.1.1...0.2.0
