Chrisdata

Latest version: v0.5.1

0.5.1

**Full Changelog**: https://github.com/chrisjihee/chrisdata/compare/v0.5.0...v0.5.1
* Add kg_generation_messages()
* Update convert_message_to_jsonl()
* Remove compare_eval_results()
* Add convert_to_entity_query_samples()
* Update convert_conll_to_jsonl()
* Add sample_jsonl_by_dataset()
* Update GenNERSample.set_instruction_prompt(): instruction_file may be a txt file
* Update GenNERSample
* Add GenNERSample.set_prompt_labels()
* Update convert_to_entity_query_version()
* Update GenSeq2SeqSample: add prediction_output
* Update GenNERSample: add get_prompt_labels()
* Add GenNERSampleEntitySpan
* Add convert_to_word_query_version()
* Update GenNERSample: target_word -> target_index
* Add make_prompt_label()
* Make _generate_labeled_string() a staticmethod
* Update convert_to_word_query_version(): not accept output_dir
* Update convert_to_word_query_version(): modify output_post
* Update convert_to_word_query_version(): replace " " with "_" in label
* Add extract() in GenNERSample
* Fix convert_to_word_query_version(): sample.label_list = [x.replace(" ", "_") for x in sample.label_list]
* Update ner_convert_to_WQ.py: target_label_levels = ["1", "3", "5"]
* Add stratified_sample_jsonl_lines()
* Remove ner_convert_conll.sh and update convert_conll_to_jsonl()
* Update download_hf_dataset(): save source.txt
* Update download_hf_dataset(): accept dataset_path and split!
* Add HfNerDatasetInfo and use it!
* Update download_hf_dataset(): output_dir
* Update save_conll_format(): save if len(tokens) > 0 and len(tags) > 0
* Update save_conll_format(): count if len(tokens) > 0 and len(tags) > 0
* class_name = re.sub(r"^[BIES]-|^O$", "", label_name)
* Add normalize_conll()
* Add read_class_names()
* Move F1: generate.py -> chrisdata/metric.py
* Add RegressionSample
* Add convert_to_hybrid_round_version()
* Remove convert_to_entity_query_version()
* Update ner_samples_from_json(), ner_samples_from_jsonl(): sample.id = sample.instance.id = sample.id or sample.instance.id
* Update convert_to_hybrid_round_version(): accept mr_input_file, sr_input_file, mr_inst_file, sr_inst_file
* Update stratified_sample_jsonl_lines(): accept random_seed, uppercased labels
* Update convert_to_hybrid_round_version(): uppercased labels
* Update stratified_sample_jsonl(), convert_to_hybrid_round_version(): rename after close!
* Fix convert_to_hybrid_round_version(): use final_labels, target_label="*" (single-round)

0.5.0

Start publishing as an official Python package: https://pypi.org/project/chrisdata/

chrisdata

Data processing tools for data analysis


Installation

1. Install Miniforge
```bash
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
```
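
After the installer finishes, it helps to confirm that `mamba` and `conda` are actually available. A minimal check, assuming the default install prefix `~/miniforge3`:

```bash
# Verify the Miniforge installation (assumes the default ~/miniforge3 prefix)
~/miniforge3/bin/mamba --version
~/miniforge3/bin/conda --version
```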


2. Clone the repository
```bash
rm -rf chrisdata*
git clone git@github.com:chrisjihee/chrisdata.git
cd chrisdata*
```
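
The SSH URL above requires a GitHub SSH key. Cloning over HTTPS works as well, using the repository URL listed under Reference below:

```bash
# Alternative: clone over HTTPS instead of SSH
rm -rf chrisdata*
git clone https://github.com/chrisjihee/chrisdata.git
cd chrisdata*
```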


3. Create a new environment
```bash
mamba create -n chrisdata python=3.11 -y
mamba activate chrisdata
```


4. Install the required packages
```bash
pip install -U -e .
pip list | grep -E "mongo|search|Wiki|wiki|json|pydantic|chris|Flask"
```
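
To confirm that the editable install succeeded and see where the package is installed from, `pip` can report the package metadata:

```bash
# Show metadata and the editable install location of chrisdata
pip show chrisdata
```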


5. Install MongoDB
```bash
mkdir mongodb; cd mongodb; mkdir data log
if [ "$(uname)" = "Linux" ]; then
  aria2c https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu2204-8.0.0.tgz
elif [ "$(uname)" = "Darwin" ]; then
  aria2c https://fastdl.mongodb.org/osx/mongodb-macos-arm64-8.0.0.tgz
fi
tar zxvf mongodb-*.tgz --strip-components=1
cd ..
```
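
Step 6 below points `mongod` at `../cfg/mongod-8800.yaml`. If that file is not already present in the repository, a minimal sketch of such a config, assuming port 8800 from the file name and the `data`/`log` directories created above, could look like this (run from the repository root):

```bash
# Hypothetical minimal mongod config; relative paths are resolved from the
# directory mongod is started in (mongodb/ in step 6), and port 8800 is
# assumed from the config file name.
mkdir -p cfg
cat > cfg/mongod-8800.yaml <<'EOF'
storage:
  dbPath: data
systemLog:
  destination: file
  path: log/mongod-8800.log
  logAppend: true
net:
  bindIp: 127.0.0.1
  port: 8800
EOF
```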


6. Run MongoDB
```bash
cd mongodb
bin/mongod --config ../cfg/mongod-8800.yaml
cd ..
```
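
To check that the server came up, a quick sketch (assuming it listens on port 8800, as suggested by the config file name):

```bash
# Check that a mongod process is running and that port 8800 accepts connections
pgrep -fl mongod
(echo > /dev/tcp/127.0.0.1/8800) && echo "mongod is listening on port 8800"
```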


7. Link input data
```bash
cd input
ln -s /mnt/geo/data/wikidata .
ln -s /mnt/geo/data/wikipedia .
cd ..
```
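
The `/mnt/geo/data/...` paths are specific to the original machine; point the links at wherever your local copies of the Wikidata and Wikipedia dumps live, then confirm they resolve:

```bash
# Verify that the symlinks point to existing directories
ls -lL input/wikidata input/wikipedia
```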



Execution

1. Show help
```bash
python -m chrisdata.cli --help
```


```bash
python -m chrisdata.cli wikipedia --help
```


```bash
python -m chrisdata.cli wikidata --help
```


2. Run command
* To convert Wikipedia articles
```bash
python -m chrisdata.cli wikipedia convert
```


* To parse Wikidata dump
```bash
python -m chrisdata.cli wikidata parse
```


* To filter Wikidata entities
```bash
python -m chrisdata.cli wikidata filter
```


* To convert Wikidata entities
```bash
python -m chrisdata.cli wikidata convert
```
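
For a full Wikidata pass, the subcommands above can be chained in the listed order. A sketch, assuming each step consumes the previous step's output and that the MongoDB instance from step 6 is still running:

```bash
# Run the documented Wikidata steps back to back; stop at the first failure
python -m chrisdata.cli wikidata parse &&
python -m chrisdata.cli wikidata filter &&
python -m chrisdata.cli wikidata convert
```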



Reference

* https://github.com/chrisjihee/chrisdata/
* https://github.com/chrisjihee/chrisbase/
