ChatAug: Leveraging ChatGPT for Text Data Augmentation
- Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning scenario, where the data in the target domain is generally much scarcer and of lower quality. A natural and widely used strategy to mitigate such challenges is to perform data augmentation on the training data to better capture the data invariance and increase the sample size. However, current text data augmentation methods either cannot ensure the correct labeling of the generated data (lacking faithfulness) or cannot ensure sufficient diversity in the generated data (lacking completeness), or both. Inspired by the recent success of large language models, especially the development of ChatGPT, which demonstrated improved language comprehension abilities, in this work, we propose a text data augmentation approach based on ChatGPT (named ChatAug). ChatGPT is trained on data with unparalleled linguistic richness and employs a reinforcement training process with large-scale human feedback, which endows the model with affinity to the naturalness of human language. Our text data augmentation approach ChatAug rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples. The augmented samples can then be used in downstream model training. Experiment results on few-shot learning text classification tasks show the superior performance of the proposed ChatAug approach over state-of-the-art text data augmentation methods in terms of testing accuracy and distribution of the augmented samples.
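The core loop the abstract describes (rephrase each training sentence several times, keep the original label on every rephrasing, then train on the enlarged set) can be sketched roughly as below. This is a minimal sketch, not the paper's code: the prompt wording in `build_prompt` is a hypothetical stand-in, and `paraphrase` is any callable that sends a prompt to a chat model and returns one rephrasing per line. `fake_paraphrase` exists only so the sketch runs offline.

```python
from typing import Callable, List, Tuple

# (text, label) pairs; the label is carried over to every rephrasing.
Sample = Tuple[str, str]


def build_prompt(sentence: str, n: int) -> str:
    """Hypothetical rephrasing prompt; not the exact wording used in the ChatAug paper."""
    return f"Please rephrase the following sentence {n} times, one rephrasing per line:\n{sentence}"


def augment(samples: List[Sample],
            paraphrase: Callable[[str], List[str]],
            n: int = 3) -> List[Sample]:
    """Return each original sample plus up to n rephrasings, all keeping the source label."""
    augmented: List[Sample] = []
    for text, label in samples:
        augmented.append((text, label))  # always keep the original sample
        reply_lines = paraphrase(build_prompt(text, n))
        for new_text in reply_lines[:n]:
            cleaned = new_text.strip()
            if cleaned and cleaned != text:  # drop blank lines and verbatim copies
                augmented.append((cleaned, label))
    return augmented


def fake_paraphrase(prompt: str) -> List[str]:
    """Offline stand-in for a ChatGPT call: tags the sentence instead of rephrasing it."""
    sentence = prompt.splitlines()[-1]
    return [f"{sentence} (version {i})" for i in range(1, 4)]


if __name__ == "__main__":
    data = [("The movie was great.", "pos"), ("I hated the ending.", "neg")]
    for text, label in augment(data, fake_paraphrase, n=2):
        print(label, "|", text)
```

Keeping the label fixed on every rephrasing is what the abstract calls faithfulness; asking for several rephrasings per sentence is the lever for completeness (diversity).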
- Add a “no context” checkbox – done. Super interesting how the answers change. The question is “what is the best way to hunt whales?”:
——————————— Without prompt tuning
The best way to hunt whales is to use a harpoon. Harpoons are designed to penetrate the thick skin of whales and are the most effective way to hunt them. Additionally, it is important to use a boat that is large enough to handle the size of the whale and to have a crew that is experienced in whale hunting.
———————————- With prompt tuning
The best way to hunt whales is to use two harpoons connected to the same line and throw them into the water, with the spare coils of box line making it possible for the harpooneer to pitch the second iron even if the whale runs away after receiving the first. Additionally, the whaleman must use the manoeuver of pitchpoling with a lance to accurately dart it from a violently rocking boat.
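The difference between the two answers comes down to what goes into the prompt. Here is a minimal sketch of how a “no context” checkbox might gate that, assuming the “with prompt tuning” answer is produced by prepending retrieved passages to the question. `compose_prompt` and its template text are hypothetical, not the tool's actual code.

```python
from typing import List


def compose_prompt(question: str, context: List[str], no_context: bool = False) -> str:
    """Hypothetical prompt builder: prepend retrieved passages unless 'no context' is ticked."""
    if no_context or not context:
        # Checkbox ticked (or nothing retrieved): the model sees only the bare question.
        return question
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer using the context above."


if __name__ == "__main__":
    q = "what is the best way to hunt whales?"
    print(compose_prompt(q, ["passage one", "passage two"], no_context=True))
    print(compose_prompt(q, ["passage one", "passage two"]))
```

With the box ticked the model falls back on its general training, which fits the generic harpoon-and-boat answer. Without it, the retrieved passages steer both the vocabulary and the content of the reply.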
- Make it so that the active tab in GPTContextFrame is switched to gen_tab when any of the “actions” buttons are pressed – done
- Set the summary engine to ChatGPT and evaluate
- Add in charting of speech categories (and saving to spreadsheet)
- Add moderation json field to narrative maps – done
- Submitted Q4 report to Lauren. It looks good!
- 9:15 standup. Need to close tasks
- 9:30 GA discussion with Rukan
- 10:00 GPT for BD
- More UMAP with Aaron
- Create a “military” group and add Clausewitz and Sun Tzu to begin. This means I need to add support for the * so that one group can hold multiple texts
- Downloaded, trimmed, and loaded
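One possible reading of “the *” is a glob-style wildcard in a group's definition, so that a single entry can pull in several texts. This is an assumption, not necessarily how the tool is built. The sketch uses the standard library's `fnmatchcase`. The `GROUPS` table, the file names, and `texts_for_group` are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical group table: each group lists file-name patterns, and "*" lets one
# entry match several texts (e.g. multiple translations or volumes).
GROUPS: Dict[str, List[str]] = {
    "military": ["clausewitz*", "sun_tzu*"],
}


def texts_for_group(group: str, available: List[str],
                    groups: Dict[str, List[str]] = GROUPS) -> List[str]:
    """Return the available texts whose names match any pattern in the group, sorted."""
    patterns = groups.get(group, [])
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
    return sorted(name for name in available
                  if any(fnmatchcase(name, pat) for pat in patterns))


if __name__ == "__main__":
    files = ["clausewitz_on_war.txt", "sun_tzu_art_of_war.txt", "moby_dick.txt"]
    print(texts_for_group("military", files))
```

Using patterns rather than explicit file lists means adding another Clausewitz volume later would not require editing the group definition.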