Nlu

Latest version: v5.3.2


1.0.1

- Fixed bug that caused NER pipelines to crash in NLU when input string caused the NER model to predict without additional metadata

1.0

We are glad to announce that NLU 1.0.0 has been released!

**Changes and Updates:**
- Automatic conversion of embeddings to NumPy arrays
- Added various testing classes
- Added various new aliases to the NLU namespace for easier access to models
- Removed irrelevant information from Component Infos
- Integration of Spark NLP 2.6.2 enhancements and bugfixes https://github.com/JohnSnowLabs/spark-nlp/releases/tag/2.6.2
- Updated old T-SNE notebooks with the more elegant and simpler generation of t-SNE embeddings
- [New 6 embeddings at once notebook with t-SNE and Medium article](https://medium.com/spark-nlp/1-line-of-code-for-bert-albert-elmo-electra-xlnet-glove-part-of-speech-with-nlu-and-t-sne-9ebcd5379cd)
<img src="https://miro.medium.com/max/1296/1*WI4AJ78hwPpT_2SqpRpolA.png" >

1.0.0

1.0000

Train with default GloVe embeddings
```python
import nlu

# df is the training DataFrame (not defined in these notes)
untrained_chunk_resolver = nlu.load('train.resolve_chunks')
trained_chunk_resolver = untrained_chunk_resolver.fit(df)
trained_chunk_resolver.predict(df)
```


Train with custom embeddings
```python
import nlu

# Use Bio GloVe embeddings; df is the training DataFrame (not defined in these notes)
untrained_chunk_resolver = nlu.load('en.embed.glove.biovec train.resolve_chunks')
trained_chunk_resolver = untrained_chunk_resolver.fit(df)
trained_chunk_resolver.predict(df)
```




Rule based NER with Context Matcher
[Rule based NER with context matching tutorial notebook](https://github.com/JohnSnowLabs/nlu/blob/master/examples/colab/Training/rule_based_named_entity_recognition_and_resolution/rule_based_NER_and_resolution_with_context_matching.ipynb)
Define a rule-based NER algorithm by providing regex patterns and resolution mappings.
The confidence value is computed with a heuristic based on how many matches a rule has.
A dictionary can be provided with `setDictionary` to map extracted entities to a unified representation. The first column of the dictionary file should hold the unified representation, and the following columns the possible matches.
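
To make the dictionary layout concrete, here is a minimal pure-Python sketch of how such a file could be turned into a lookup from each possible match to its unified representation. It only illustrates the file format described above: `build_match_lookup` is a hypothetical helper, not part of NLU, and the matcher's actual implementation may differ.

```python
import csv
import io


def build_match_lookup(dictionary_text, delimiter=","):
    """Map every possible match to the representation in the first column."""
    lookup = {}
    for row in csv.reader(io.StringIO(dictionary_text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        representation, *matches = (cell.strip() for cell in row)
        for match in matches:
            lookup[match.lower()] = representation
    return lookup


# Same layout as a dictionary file: representation first, then its matches.
gender_dictionary = """male,man,male,boy,gentleman,he,him
female,woman,female,girl,lady,old-lady,she,her
neutral,neutral"""

lookup = build_match_lookup(gender_dictionary)
print(lookup["boy"])  # male
print(lookup["she"])  # female
```

The lowercasing of matches is an assumption made for this sketch; whether the matcher itself is case-sensitive is not stated in these notes.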


```python
import nlu
import json

# Define helper functions to write NER rules to file
def dump_dict_to_json_file(data, path):
    """Generate json with dict contexts at target path"""
    with open(path, 'w') as f:
        json.dump(data, f)

def dump_file_to_csv(data, path):
    """Dump raw text file"""
    with open(path, 'w') as f:
        f.write(data)

sample_text = """A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ), one prior episode of HTG-induced pancreatitis three years prior to presentation , associated with an acute hepatitis , and obesity with a body mass index ( BMI ) of 33.5 kg/m2 , presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting. Two weeks prior to presentation , she was treated with a five-day course of amoxicillin for a respiratory tract infection . She was on metformin , glipizide , and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG . She had been on dapagliflozin for six months at the time of presentation . Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness , guarding , or rigidity . Pertinent laboratory findings on admission were : serum glucose 111 mg/dl , bicarbonate 18 mmol/l , anion gap 20 , creatinine 0.4 mg/dL , triglycerides 508 mg/dL , total cholesterol 122 mg/dL , glycated hemoglobin ( HbA1c ) 10% , and venous pH 7.27 . Serum lipase was normal at 43 U/L . Serum acetone levels could not be assessed as blood samples kept hemolyzing due to significant lipemia . The patient was initially admitted for starvation ketosis , as she reported poor oral intake for three days prior to admission . However , serum chemistry obtained six hours after presentation revealed her glucose was 186 mg/dL , the anion gap was still elevated at 21 , serum bicarbonate was 16 mmol/L , triglyceride level peaked at 2050 mg/dL , and lipase was 52 U/L . β-hydroxybutyrate level was obtained and found to be elevated at 5.29 mmol/L - the original sample was centrifuged and the chylomicron layer removed prior to analysis due to interference from turbidity caused by lipemia again . 
The patient was treated with an insulin drip for euDKA and HTG with a reduction in the anion gap to 13 and triglycerides to 1400 mg/dL , within 24 hours . Twenty days ago. Her euDKA was thought to be precipitated by her respiratory tract infection in the setting of SGLT2 inhibitor use . At birth the typical boy is growing slightly faster than the typical girl, but the velocities become equal at about seven months, and then the girl grows faster until four years. From then until adolescence no differences in velocity can be detected. 21-02-2020 21/04/2020 """

# Define Gender NER matching rules
gender_rules = {
    "entity": "Gender",
    "ruleScope": "sentence",
    "completeMatchRegex": "true",
}

# Define dict data in csv format
gender_data = '''male,man,male,boy,gentleman,he,him
female,woman,female,girl,lady,old-lady,she,her
neutral,neutral'''

# Dump configs to file (the CSV is written as raw text, the rules as JSON)
dump_file_to_csv(gender_data, 'gender.csv')
dump_dict_to_json_file(gender_rules, 'gender.json')
gender_NER_pipe = nlu.load('match.context')
gender_NER_pipe.print_info()
gender_NER_pipe['context_matcher'].setJsonPath('gender.json')
gender_NER_pipe['context_matcher'].setDictionary('gender.csv', options={"delimiter": ","})
gender_NER_pipe.predict(sample_text)
```


| context_match | context_match_confidence |
| :------------ | -----------------------: |
| female | 0.13 |
| she | 0.13 |
| she | 0.13 |
| she | 0.13 |
| she | 0.13 |
| boy | 0.13 |
| girl | 0.13 |
| girl | 0.13 |

Context Matcher Parameters
You can define the following parameters in your rules.json file to define the entities to be matched

| Parameter | Type | Description |
| --------------------- | ----------------------- | ------------------------------------------------------------ |
| entity | `str` | The name of this rule |
| regex | `Optional[str]` | Regex pattern to extract candidates |
| contextLength | `Optional[int]` | Maximum distance that prefix and suffix words can be from the word to match, whereas context words must be immediately before or after the word to match |
| prefix | `Optional[List[str]]` | Words preceding the regex match that are at most `contextLength` characters away |
| regexPrefix | `Optional[str]` | Regex pattern of words preceding the regex match that are at most `contextLength` characters away |
| suffix | `Optional[List[str]]` | Words following the regex match that are at most `contextLength` characters away |
| regexSuffix | `Optional[str]` | Regex pattern of words following the regex match that are at most `contextLength` away |
| context | `Optional[List[str]]` | List of words that must be immediately before/after a match |
| contextException | `Optional[List[str]]` | List of words that may not be immediately before/after a match |
| exceptionDistance | `Optional[int]` | Distance that exceptions must be away from a match |
| regexContextException | `Optional[str]` | Regex pattern of exceptions that may not be within `exceptionDistance` range of the match |
| matchScope | `Optional[str]` | Either `token`, or `sub-token` to match on a character basis |
| completeMatchRegex | `Optional[str]` | Whether to use complete or partial matching, either `"true"` or `"false"` |
| ruleScope | `str` | Currently only `sentence` is supported |
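
As a rough illustration of how the parameters in the table fit together, the sketch below builds a rule dict and checks its keys before serializing it to JSON. The parameter names come from the table; everything else is an assumption made for the example: `validate_rule` is a hypothetical helper that is not part of NLU, the date regex is illustrative, and `entity` and `ruleScope` are treated as required only because the table lists them without `Optional`.

```python
import json

# Parameter names taken from the table above.
KNOWN_PARAMETERS = {
    "entity", "regex", "contextLength", "prefix", "regexPrefix", "suffix",
    "regexSuffix", "context", "contextException", "exceptionDistance",
    "regexContextException", "matchScope", "completeMatchRegex", "ruleScope",
}
# Assumption: the two non-Optional parameters are required.
REQUIRED_PARAMETERS = {"entity", "ruleScope"}


def validate_rule(rule):
    """Return a list of problems found in a rule dict (empty if none)."""
    problems = []
    for name in sorted(REQUIRED_PARAMETERS - set(rule)):
        problems.append(f"missing required parameter: {name}")
    for name in sorted(set(rule) - KNOWN_PARAMETERS):
        problems.append(f"unknown parameter: {name}")
    if rule.get("ruleScope", "sentence") != "sentence":
        problems.append("ruleScope must be 'sentence'")
    return problems


# Illustrative rule: dates such as 21-02-2020 or 21/04/2020.
date_rule = {
    "entity": "Date",
    "ruleScope": "sentence",
    "regex": r"\d{2}[-/]\d{2}[-/]\d{4}",
    "matchScope": "token",
}

print(validate_rule(date_rule))  # []
rule_json = json.dumps(date_rule, indent=2)  # text to save as the rules file
```

Checking the keys up front catches misspelled parameter names before the file is handed to the matcher.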

Drug Normalizer
[Drug Normalizer tutorial notebook](https://github.com/JohnSnowLabs/nlu/blob/master/examples/colab/healthcare/drug_normalization/drug_norm.ipynb)

Normalize raw text from clinical documents, e.g. scraped web pages or XML documents. Removes all dirty characters from text following one or more input regex patterns. Can remove unwanted characters according to a specific policy, and can apply lowercase normalization.

**Parameters are**
- `lowercase`: whether to convert strings to lowercase. Default is False.
- `policy`: rule to remove patterns from text. Valid policy values are `all`, `abbreviations`, and `dosages`. Default is `all`. The `abbreviations` policy expands common drug abbreviations; the `dosages` policy converts drug dosages and values to the standard form (see examples below).

```python
import nlu

data = ["Agnogenic one half cup", "adalimumab 54.5 + 43.2 gm", "aspirin 10 meq/ 5 ml oral sol", "interferon alfa-2b 10 million unit ( 1 ml ) injec", "Sodium Chloride/Potassium Chloride 13bag"]
nlu.load('norm_drugs').predict(data)
```



| drug_norm | text |
| :--------------------------------------------------- | :------------------------------------------------ |
| Agnogenic 0.5 oral solution | Agnogenic one half cup |
| adalimumab 97700 mg | adalimumab 54.5 + 43.2 gm |
| aspirin 2 meq/ml oral solution | aspirin 10 meq/ 5 ml oral sol |
| interferon alfa - 2b 10000000 unt ( 1 ml ) injection | interferon alfa-2b 10 million unit ( 1 ml ) injec |
| Sodium Chloride / Potassium Chloride 13 bag | Sodium Chloride/Potassium Chloride 13bag |
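
The dosage rows in the table can be checked by hand. The short sketch below redoes the arithmetic behind three of them with Python's `decimal` module. It is only a sanity check of the numbers shown above and does not reproduce how the normalizer works internally.

```python
from decimal import Decimal

# "adalimumab 54.5 + 43.2 gm": add the amounts, then convert grams to milligrams.
total_grams = Decimal("54.5") + Decimal("43.2")
total_milligrams = total_grams * 1000
print(total_milligrams)  # 97700.0

# "aspirin 10 meq/ 5 ml": divide to get the concentration per millilitre.
concentration = Decimal("10") / Decimal("5")
print(concentration)  # 2

# "10 million unit": spell out the multiplier.
units = 10 * 1_000_000
print(units)  # 10000000
```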




New NLU Spells
These new magical 1-liners load the following models:

Open Source NLU Spells

| NLU Spell | Spark NLP Model |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [nlu.load('de.ner.wikiner.6B_100')](https://nlp.johnsnowlabs.com//2019/07/13/wikiner_6B_100_de.html) | [wikiner_6B_100](https://nlp.johnsnowlabs.com//2019/07/13/wikiner_6B_100_de.html) |
| nlu.load('xx.embed.glove.glove_6B_100') | glove_6B_100 |


Healthcare NLU spells

| NLU Spell | Spark NLP Model |
| --------- | --------------- |
|[nlu.load('en.resolve.snomed_body_structure_med')](https://nlp.johnsnowlabs.com//2021/06/15/sbertresolve_snomed_bodyStructure_med_en.html) | [sbertresolve_snomed_bodyStructure_med](https://nlp.johnsnowlabs.com//2021/06/15/sbertresolve_snomed_bodyStructure_med_en.html)
|[nlu.load('en.resolve.snomed_body_structure')](https://nlp.johnsnowlabs.com//2021/06/15/sbiobertresolve_snomed_bodyStructure_en.html) | [sbiobertresolve_snomed_bodyStructure](https://nlp.johnsnowlabs.com//2021/06/15/sbiobertresolve_snomed_bodyStructure_en.html)
|[nlu.load('en.resolve.icdo_augmented')](https://nlp.johnsnowlabs.com//2021/06/22/sbiobertresolve_icdo_augmented_en.html) | [sbiobertresolve_icdo_augmented](https://nlp.johnsnowlabs.com//2021/06/22/sbiobertresolve_icdo_augmented_en.html)
|[nlu.load('en.embed_sentence.biobert.jsl_cased')](https://nlp.johnsnowlabs.com//2021/05/14/sbiobert_jsl_cased_en.html) | [sbiobert_jsl_cased](https://nlp.johnsnowlabs.com//2021/05/14/sbiobert_jsl_cased_en.html)
|[nlu.load('en.embed_sentence.biobert.jsl_umls_cased')](https://nlp.johnsnowlabs.com//2021/05/14/sbiobert_jsl_umls_cased_en.html) | [sbiobert_jsl_umls_cased](https://nlp.johnsnowlabs.com//2021/05/14/sbiobert_jsl_umls_cased_en.html)
|[nlu.load('en.embed_sentence.bert.jsl_medium_uncased')](https://nlp.johnsnowlabs.com//2021/05/14/sbert_jsl_medium_uncased_en.html) | [sbert_jsl_medium_uncased](https://nlp.johnsnowlabs.com//2021/05/14/sbert_jsl_medium_uncased_en.html)
|[nlu.load('en.embed_sentence.bert.jsl_medium_umls_uncased')](https://nlp.johnsnowlabs.com//2021/05/14/sbert_jsl_medium_umls_uncased_en.html) | [sbert_jsl_medium_umls_uncased](https://nlp.johnsnowlabs.com//2021/05/14/sbert_jsl_medium_umls_uncased_en.html)
|[nlu.load('en.embed_sentence.bert.jsl_mini_uncased')](https://nlp.johnsnowlabs.com//2021/05/14/sbert_jsl_mini_uncased_en.html) | [sbert_jsl_mini_uncased](https://nlp.johnsnowlabs.com//2021/05/14/sbert_jsl_mini_uncased_en.html)
|[nlu.load('en.embed_sentence.bert.jsl_mini_umlsuncased')](https://nlp.johnsnowlabs.com//2021/05/14/sbert_jsl_mini_umls_uncased_en.html) | [sbert_jsl_mini_umls_uncased](https://nlp.johnsnowlabs.com//2021/05/14/sbert_jsl_mini_umls_uncased_en.html)
|[nlu.load('en.embed_sentence.bert.jsl_tiny_uncased')](https://nlp.johnsnowlabs.com//2021/05/14/sbert_jsl_tiny_uncased_en.html) | [sbert_jsl_tiny_uncased](https://nlp.johnsnowlabs.com//2021/05/14/sbert_jsl_tiny_uncased_en.html)
|[nlu.load('en.embed_sentence.bert.jsl_tiny_umls_uncased')](https://nlp.johnsnowlabs.com//2021/05/14/sbert_jsl_tiny_umls_uncased_en.html) | [sbert_jsl_tiny_umls_uncased](https://nlp.johnsnowlabs.com//2021/05/14/sbert_jsl_tiny_umls_uncased_en.html)
|[nlu.load('en.resolve.icd10cm.slim_billable_hcc')](https://nlp.johnsnowlabs.com//2021/05/21/sbiobertresolve_icd10cm_slim_billable_hcc_en.html) | [sbiobertresolve_icd10cm_slim_billable_hcc](https://nlp.johnsnowlabs.com//2021/05/21/sbiobertresolve_icd10cm_slim_billable_hcc_en.html)
|[nlu.load('en.resolve.icd10cm.slim_billable_hcc_med')](https://nlp.johnsnowlabs.com//2021/05/21/sbertresolve_icd10cm_slim_billable_hcc_med_en.html) | [sbertresolve_icd10cm_slim_billable_hcc_med](https://nlp.johnsnowlabs.com//2021/05/21/sbertresolve_icd10cm_slim_billable_hcc_med_en.html)
|[nlu.load('med_ner.deid.generic_augmented')](https://nlp.johnsnowlabs.com//2021/06/30/ner_deid_generic_augmented_en.html) | [ner_deid_generic_augmented](https://nlp.johnsnowlabs.com//2021/06/30/ner_deid_generic_augmented_en.html)
|[nlu.load('med_ner.deid.subentity_augmented')](https://nlp.johnsnowlabs.com//2021/06/30/ner_deid_subentity_augmented_en.html) | [ner_deid_subentity_augmented](https://nlp.johnsnowlabs.com//2021/06/30/ner_deid_subentity_augmented_en.html)
|[nlu.load('en.assert.radiology')](https://nlp.johnsnowlabs.com//2021/03/18/assertion_dl_radiology_en.html) | [assertion_dl_radiology](https://nlp.johnsnowlabs.com//2021/03/18/assertion_dl_radiology_en.html)
|[nlu.load('en.relation.test_result_date')](https://nlp.johnsnowlabs.com//2021/02/24/re_test_result_date_en.html) | [re_test_result_date](https://nlp.johnsnowlabs.com//2021/02/24/re_test_result_date_en.html)
|[nlu.load('en.med_ner.admission_events')](https://nlp.johnsnowlabs.com//2021/03/01/ner_events_admission_clinical_en.html) | [ner_events_admission_clinical](https://nlp.johnsnowlabs.com//2021/03/01/ner_events_admission_clinical_en.html)
|[nlu.load('en.classify.ade.clinicalbert')](https://nlp.johnsnowlabs.com//2021/01/21/classifierdl_ade_clinicalbert_en.html) | [classifierdl_ade_clinicalbert](https://nlp.johnsnowlabs.com//2021/01/21/classifierdl_ade_clinicalbert_en.html)
|[nlu.load('en.recognize_entities.posology')](https://nlp.johnsnowlabs.com//2021/03/29/recognize_entities_posology_en.html) | [recognize_entities_posology](https://nlp.johnsnowlabs.com//2021/03/29/recognize_entities_posology_en.html)
|[nlu.load('en.embed_sentence.bluebert_cased_mli')](TODO.com) | [spark_name](todo.com)

Improved NER defaults
When loading licensed models that require NER features, such as `Assertion`, `Relation`, or `Resolution` models,
NLU will now use the `en.med_ner` model, which maps to the Spark NLP model `jsl_ner_wip_clinical`, as the default.
See https://nlp.johnsnowlabs.com/2021/03/31/jsl_ner_wip_clinical_en.html for more information on this model.




New Notebooks
- [Rule based NER with context matching tutorial notebook](https://github.com/JohnSnowLabs/nlu/blob/master/examples/colab/Training/rule_based_named_entity_recognition_and_resolution/rule_based_NER_and_resolution_with_context_matching.ipynb)
- [Drug Normalizer tutorial notebook](https://github.com/JohnSnowLabs/nlu/blob/master/examples/colab/healthcare/drug_normalization/drug_norm.ipynb)
- [Generic Deep Learning Tensorflow Classifier](https://github.com/JohnSnowLabs/nlu/blob/master/examples/colab/Training/generic_TF_classifier/generic_classifier.ipynb)








Additional NLU resources
* [140+ NLU Tutorials](https://github.com/JohnSnowLabs/nlu/tree/master/examples)
* [Streamlit visualizations docs](https://nlu.johnsnowlabs.com/docs/en/streamlit_viz_examples)
* The complete list of all 4000+ models & pipelines in 200+ languages is available on [Models Hub](https://nlp.johnsnowlabs.com/models).
* [Spark NLP publications](https://medium.com/spark-nlp)
* [NLU in Action](https://nlp.johnsnowlabs.com/demo)
* [NLU documentation](https://nlu.johnsnowlabs.com/docs/en/install)
* [Discussions](https://github.com/JohnSnowLabs/spark-nlp/discussions) Engage with other community members, share ideas, and show off how you use Spark NLP and NLU!




1 line Install NLU on Google Colab
!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash
1 line Install NLU on Kaggle
!wget https://setup.johnsnowlabs.com/nlu/kaggle.sh -O - | bash
Install via PIP
! pip install nlu pyspark==3.0.3

0.9998000264167786

NLU Installation

```bash
# PyPI
!pip install nlu pyspark==2.4.7
# Conda: install NLU from Anaconda/Conda
conda install -c johnsnowlabs nlu
```
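
After installing, it can help to confirm that the pinned PySpark version is the one actually present in the environment. The sketch below is one way to do that with the standard library's `importlib.metadata` (Python 3.8+). `check_pin` is a hypothetical helper written for this example, not something NLU provides.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pin(package, expected_version):
    """Report whether `package` is installed at `expected_version`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package}: not installed"
    if installed == expected_version:
        return f"{package}: {installed} (matches pin)"
    return f"{package}: {installed} (expected {expected_version})"


# The pin comes from the pip command above.
print(check_pin("pyspark", "2.4.7"))
```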


Additional NLU resources
- [NLU Website](https://nlu.johnsnowlabs.com/)
- [All NLU Tutorial Notebooks](https://nlu.johnsnowlabs.com/docs/en/notebooks)
- [NLU Videos and Blogposts on NLU](https://nlp.johnsnowlabs.com/learn#pythons-nlu-library)
- [NLU on Github](https://github.com/JohnSnowLabs/nlu)

0.9987999796867371

[Recognize Entities OntoNotes - ELECTRA Large](https://nlp.johnsnowlabs.com/2020/12/09/onto_recognize_entities_electra_large_en.html)

```python
import nlu

nlu.load("en.ner.onto.large").predict("Johnson first entered politics when elected in 2001 as a member of Parliament. He then served eight years as the mayor of London, from 2008 to 2016, before rejoining Parliament.", output_level="document")
```

output :

| ner_confidence | entities | Entities_classes |
|---------------:|:----------------------------------------------------------------------|:-----------------------------------------------------|
