Title: Relationship Extraction and Data Augmentation from Small Dataset Within the Medical Domain
Overview
- Date: Starts 30 October 2023, 15:00; ends 30 October 2023, 16:00
- Available seats: 24
- Location:
- Language: Swedish and English
Opponents:
Nils Eickhoff
Axel Eiman
Abstract:
This work evaluates the possibility of using machine learning (ML) models for relationship extraction in unstructured text, such as news articles and social media posts, within the medical domain. The ML models are trained on a small dataset of approximately 800 example texts with varying numbers of entities and relations, and are compared to a statistical baseline. The thesis further evaluates three augmentation methods: synonym replacement, GPT augmentation (generating new examples with OpenAI's GPT-3), and perturbation. It is found that a combination of binary models is the most suitable model structure and outperforms the baseline on the majority of classes. It is also concluded that, of the three augmentation methods compared, GPT-generated examples are the most reliable: they preserve the semantic meaning of the original examples while increasing the model's F1-score.
Welcome!
Lukas, Viktor and Giuseppe