Task
- RIS Parser
- Expose `get_article_by_arxiv_id` to the CLI
- `Article.Published` must be fixed for PubMed and is a string for arXiv
- Data Extraction from Unstructured PDFs
  - https://www.analyticsvidhya.com/blog/2021/06/data-extraction-from-unstructured-pdfs/
  - https://unstructured-io.github.io/unstructured/core/partition.html
  - https://python.plainenglish.io/how-to-create-a-pdf2text-preprocessing-microservice-using-python-8b844b85c797
  - https://github.com/arXiv/arxiv-fulltext
- Complete `get_full_text`
- `move_state_forward` may be broken in TinyDB
- Check all TinyDB usage
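
For the RIS parser item above, a minimal sketch of what such a parser could look like. It assumes the common RIS layout (a two-character tag, two spaces, a hyphen, then the value, with `TY` opening a record and `ER` closing it). The name `parse_ris` and the returned structure are hypothetical and are not part of the current code base.

```python
import re

# A RIS line looks like "AU  - Doe, Jane": tag, two spaces, hyphen, value.
_TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")


def parse_ris(text):
    """Return a list of records; each record maps a RIS tag to a list of values.

    Hypothetical helper for illustration only.
    """
    records = []
    current = None   # record being built, or None when outside a record
    last_tag = None  # tag of the most recent value, used for continuation lines
    for raw in text.splitlines():
        line = raw.rstrip()
        match = _TAG_LINE.match(line)
        if match is None:
            # Not a tag line: treat non-empty text as a continuation of the last value.
            if current is not None and last_tag is not None and line.strip():
                current[last_tag][-1] += " " + line.strip()
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TY":
            # "TY" (type of reference) starts a new record.
            current = {"TY": [value]}
            last_tag = "TY"
        elif tag == "ER":
            # "ER" (end of reference) closes the current record.
            if current is not None:
                records.append(current)
            current = None
            last_tag = None
        elif current is not None:
            # Repeated tags such as "AU" accumulate into a list.
            current.setdefault(tag, []).append(value)
            last_tag = tag
    return records
```

Keeping every tag as a list avoids special-casing repeatable tags such as `AU`; mapping the tags onto `Article` fields would be a separate step.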
Improvements
- Add `update_cstate_by_id` in SERVICE.REPOSITORY.PERSIST
- Add `precalculate` and `reset_flag_llm_by_function` in SERVICE.LLM
- Add `AAA_LLM_TEMPLATE_FILE` in SETTING
- Add `FlagShortReviewByLLM`
- Add `get_article_by_arxiv_id` (minor version 003)
- Add `convert_full_text2string` to convert full-text PDFs to strings
- Add `mongo_nav` with some query functions for MongoDB
- Add `get_full_text`
- Add `article_embedding` and `scigenius_article_embedding` (2024-01-27)
- Add `unified_export_json`
- Replace `update_article_by_pmid` with `update_article_by_id`
- Add `get_article_id_list_by_cstate` to replace `get_article_pmid_list_by_cstate`
- Add `get_article_by_id` to replace `get_article_by_pmid`
- Add `get_all_article_id_list` to replace `get_all_article_pmid_list`
- Add `print_error` in `utils.general` for unified error printing
- Add `Published`, `ArxivID`, and `SourceBank` fields to `Article`
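
To illustrate the unified error printing item above, here is a minimal sketch of a helper that formats the active exception in one consistent way. The real signature of `print_error` in `utils.general` is not given here, so the name `print_error_sketch`, its `file` parameter, and the message layout are all assumptions.

```python
import sys
import traceback


def print_error_sketch(file=None):
    """Print the exception currently being handled in one format and return the text.

    Hypothetical stand-in; not the project's actual `print_error`.
    """
    exc_type, exc_value, exc_tb = sys.exc_info()
    if exc_type is None:
        message = "No active exception."
    else:
        # The last traceback entry is the frame where the exception was raised.
        frame = traceback.extract_tb(exc_tb)[-1]
        message = (
            f"{exc_type.__name__}: {exc_value} "
            f"({frame.filename}, line {frame.lineno}, in {frame.name})"
        )
    print(message, file=file or sys.stderr)
    return message
```

It is meant to be called inside an `except` block, so that every module reports failures with the same layout.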
Bug Fixes
- Repackaging `pyproject.toml`
- Fix `parsing_details_pubmed.py`, line 214: `abstract_all = abstract_all + " " + abstract_part["text"]`
- Fix `triplea/config/settings.py`, line 27 - FileNotFoundError: [Errno 2] No such file or directory: 'pyproject.toml'
- Fix `print_error()`
- Fix bug with the new pydantic version (2024-02-03)
- Fix session of `extract_triple`
- Fix `tinydb.get_all_article_id_list`
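
The `FileNotFoundError` for `pyproject.toml` listed above typically appears when a relative path is resolved against the current working directory. The sketch below shows one possible way to avoid that by searching upward from a known directory. It is not necessarily the fix applied in `triplea/config/settings.py`, and `find_pyproject` is a hypothetical name.

```python
from pathlib import Path


def find_pyproject(start):
    """Return the nearest pyproject.toml at or above `start`, or None if there is none.

    Hypothetical helper for illustration only.
    """
    start = Path(start).resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for folder in [start, *start.parents]:
        candidate = folder / "pyproject.toml"
        if candidate.is_file():
            return candidate
    return None
```

A caller could pass `Path(__file__).parent` as `start` and fall back to a default when the result is `None`, for example when the package is installed without its `pyproject.toml`.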