Documentation
Changes
- Added the `allow_overlap_entities` parameter to the `FrameExtractor.extract_frames()` method. It allows the LLM to output multiple frames whose entity spans overlap. For example, the text below has two "headache" mentions:
```python
text = "In trial 12345, headache was reported in 5% of patients, while headache was reported in 10% of patients in arm A."
```
The LLM generated:
```json
[
  {"ClinicalTrial": "12345", "Arm": "A", "AdverseReaction": "headache", "Percentage": "10%"},
  {"ClinicalTrial": "12345", "Arm": "", "AdverseReaction": "headache", "Percentage": "5%"}
]
```
When `allow_overlap_entities=False`, the two frames are assigned to the two distinct "headache" mentions:
```python
[
    {'frame_id': '0', 'start': 16, 'end': 24, 'entity_text': 'headache', 'attr': {'ClinicalTrial': 'trial 12345', 'Arm': 'arm A', 'Percentage': '5%'}},
    {'frame_id': '1', 'start': 63, 'end': 71, 'entity_text': 'headache', 'attr': {'ClinicalTrial': 'trial 12345', 'Arm': 'arm A', 'Percentage': '10%'}},
]
```
When `allow_overlap_entities=True`, both frames are placed on the first mention, so their spans overlap:
```python
[
    {'frame_id': '0', 'start': 16, 'end': 24, 'entity_text': 'headache', 'attr': {'ClinicalTrial': '12345', 'Arm': 'A', 'Percentage': '10%'}},
    {'frame_id': '1', 'start': 16, 'end': 24, 'entity_text': 'headache', 'attr': {'ClinicalTrial': '12345', 'Percentage': '5%'}},
]
```
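To make the difference between the two settings concrete, here is a minimal sketch of one way LLM frames could be mapped to character spans using exact string search. It is not the `FrameExtractor` implementation: `match_frames`, its `entity_key` argument, and the "first non-overlapping occurrence" policy are assumptions for illustration. Because it assigns frames in the order the LLM emitted them, its attribute values may differ from the library output listed above; only the span behavior is meant to match.

```python
from typing import Dict, List, Tuple


def _overlaps(start: int, end: int, spans: List[Tuple[int, int]]) -> bool:
    """True if [start, end) intersects any span already claimed."""
    return any(start < e and s < end for s, e in spans)


def match_frames(text: str, frames: List[Dict[str, str]], entity_key: str,
                 allow_overlap_entities: bool) -> List[Dict]:
    """Hypothetical sketch: map LLM-generated frames to character spans in `text`."""
    claimed: List[Tuple[int, int]] = []  # spans assigned to earlier frames
    results = []
    for frame in frames:
        entity = frame.get(entity_key, "")
        if not entity:
            continue  # no entity text to locate
        start = text.find(entity)
        # Without overlap, skip occurrences that collide with an earlier frame's span.
        while (start != -1 and not allow_overlap_entities
               and _overlaps(start, start + len(entity), claimed)):
            start = text.find(entity, start + 1)
        if start == -1:
            continue  # not found, or every occurrence is already taken: drop frame
        end = start + len(entity)
        claimed.append((start, end))
        # Keep the remaining non-empty fields as attributes.
        attr = {k: v for k, v in frame.items() if k != entity_key and v}
        results.append({"frame_id": str(len(results)), "start": start, "end": end,
                        "entity_text": entity, "attr": attr})
    return results


TEXT = ("In trial 12345, headache was reported in 5% of patients, "
        "while headache was reported in 10% of patients in arm A.")
FRAMES = [
    {"ClinicalTrial": "12345", "Arm": "A", "AdverseReaction": "headache", "Percentage": "10%"},
    {"ClinicalTrial": "12345", "Arm": "", "AdverseReaction": "headache", "Percentage": "5%"},
]

for flag in (False, True):
    spans = [(f["start"], f["end"]) for f in match_frames(TEXT, FRAMES, "AdverseReaction", flag)]
    print(flag, spans)
# False [(16, 24), (63, 71)]
# True [(16, 24), (16, 24)]
```

With overlap disallowed, the second frame is pushed to the next free "headache" occurrence; with overlap allowed, both frames land on the first occurrence, since nothing forces them apart.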
- Fixed an **UnboundLocalError** in `extract_async()` that occurred when the input `text_content` was too short to be sentence-tokenized.
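The changelog does not show the cause, but a common way an `UnboundLocalError` arises is a variable that is assigned only inside a loop over tokenized sentences: if the tokenizer returns no sentences, the loop body never runs and the variable is never bound. The sketch below is hypothetical (the naive `split_sentences` tokenizer and both functions are illustrative, not code from `extract_async()`) and shows that pattern alongside the usual fix of binding the variable before the loop.

```python
from typing import List


def split_sentences(text: str) -> List[str]:
    """Hypothetical stand-in for a sentence tokenizer: split on periods."""
    return [s.strip() for s in text.split(".") if s.strip()]


def last_sentence_buggy(text: str) -> str:
    for sentence in split_sentences(text):
        last = sentence  # only bound if the loop body runs at least once
    return last  # raises UnboundLocalError when no sentences were produced


def last_sentence_fixed(text: str) -> str:
    last = ""  # bind before the loop so too-short input is handled safely
    for sentence in split_sentences(text):
        last = sentence
    return last
```

For input such as `"."`, the tokenizer returns an empty list, so the buggy version fails while the fixed version returns an empty string.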