Minh Nguyen

Minh Nguyen is an Applied Scientist II at Amazon Web Services (AWS) AI Labs, where he works on the science team behind Amazon Q for Business. Minh obtained a PhD in Computer Science from the University of Oregon under the supervision of Professor Thien Huu Nguyen in the UONLP lab. His research areas include multilingual natural language processing, information extraction, knowledge graph construction, retrieval-augmented generation, and question answering.

Education

  • University of Oregon
    • Ph.D. in Computer and Information Science, 2019 - 2024.
    • Advisor: Prof. Thien Huu Nguyen.
  • Hanoi University of Science and Technology
    • B.E. in Information Systems, 2014 - 2019.
    • Advisor: Dr. Linh Ngo Van.

Experience

Publications (*=equal contribution)

2024


  • Reinforcement Learning from Answer Reranking Feedback for Retrieval-Augmented Answer Generation [To Appear]
    Minh Nguyen, Toan Quoc Nguyen, Kishan KC, Zeyu Zhang, and Thuy Vu.
    Proceedings of INTERSPEECH 2024.


  • Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models [To Appear]
    Minh Nguyen, Franck Dernoncourt, Seunghyun Yoon, Hanieh Deilamsalehy, Hao Tan, Ryan Rossi, Quan Hung Tran, Trung Bui, and Thien Huu Nguyen.
    Proceedings of INTERSPEECH 2024.

2023


  • Efficient Fine-tuning Large Language Models for Knowledge-Aware Response Planning [To Appear]
    Minh Nguyen, Kishan KC, Toan Quoc Nguyen, Ankit Chadha, and Thuy Vu.
    Proceedings of ECML-PKDD 2023.


  • Question-Context Alignment and Answer-Context Dependencies for Effective Answer Sentence Selection [To Appear]
    Minh Nguyen, Kishan KC, Toan Nguyen, Thien Huu Nguyen, Ankit Chadha, and Thuy Vu.
    Proceedings of INTERSPEECH 2023.

2022


  • Learning Cross-Task Dependencies for Joint Extraction of Entities, Events, Event Arguments, and Relations [To Appear]
    Minh Nguyen, Bonan Min, Franck Dernoncourt, and Thien Huu Nguyen.
    Proceedings of EMNLP 2022.

  • FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction [Paper] [Github] [Demo] [Documentation]
    Minh Nguyen, Nghia Trung Ngo, Bonan Min, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2022 (System Demonstrations).

  • Joint Extraction of Entities, Relations, and Events via Modeling Inter-Instance and Inter-Label Dependencies [To Appear]
    Minh Nguyen, Bonan Min, Franck Dernoncourt, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2022.

  • MECI: A Multilingual Dataset for Event Causality Identification [To Appear]
    Viet Dac Lai, Amir Pouran Ben Veyseh, Minh Nguyen, Franck Dernoncourt, and Thien Huu Nguyen.
    Proceedings of COLING 2022.

  • BehanceMT: A Machine Translation Corpus for Livestreaming Video Transcripts [To Appear]
    Minh Nguyen, Franck Dernoncourt, and Thien Huu Nguyen.
    Proceedings of TU@COLING 2022.

  • Cross-Lingual Event Detection via Optimized Adversarial Training [To Appear]
    Luis Fernando Guzman-Nateras, Minh Nguyen, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2022.

  • MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection [To Appear]
    Amir Pouran Ben Veyseh, Minh Nguyen, Franck Dernoncourt, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2022.

  • Document-Level Event Argument Extraction via Optimal Transport [To Appear]
    Amir Pouran Ben Veyseh, Minh Nguyen, Franck Dernoncourt, Bonan Min, and Thien Huu Nguyen.
    Proceedings of ACL 2022 (Findings).

  • Event Causality Identification via Generation of Important Context Words [To Appear]
    Hieu Man Duc Trong, Minh Nguyen, and Thien Huu Nguyen.
    Proceedings of *SEM@NAACL-HLT 2022.

2021


  • Crosslingual Transfer Learning for Relation and Event Extraction via Word Category and Class Alignments [Paper]
    Minh Nguyen, Tuan Ngo Nguyen, Bonan Min and Thien Huu Nguyen.
    Proceedings of EMNLP 2021.

  • Modeling Document-Level Context for Event Detection via Important Context Selection [Paper]
    Amir Pouran Ben Veyseh, Minh Nguyen, Nghia Ngo Trung, Bonan Min and Thien Huu Nguyen.
    Proceedings of EMNLP 2021.

  • Cross-Task Instance Representation Interactions and Label Dependencies for Joint Information Extraction with Graph Convolutional Networks [Paper] [Demo]
    Minh Nguyen, Viet Dac Lai and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2021.

  • Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing [Paper] [Github] [Demo] [Documentation]
    Minh Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh and Thien Huu Nguyen.
    Proceedings of EACL 2021 (System Demonstrations).
    (EACL2021 Outstanding Demo Paper Award)

  • Event Extraction from Historical Texts: A New Dataset for Black Rebellions [Paper]
    Viet Dac Lai, Minh Nguyen, Heidi Kaufman, and Thien Huu Nguyen.
    Proceedings of ACL-IJCNLP 2021 (Findings).

  • Augmenting Open-Domain Event Detection with Synthetic Data from GPT-2 [Paper]
    Amir Pouran Ben Veyseh, Minh Nguyen, Bonan Min and Thien Huu Nguyen.
    Proceedings of ECML PKDD 2021.

  • Graph Learning Regularization and Transfer Learning for Few-Shot Event Detection [Paper]
    Viet Dac Lai, Minh Nguyen, Thien Huu Nguyen, and Franck Dernoncourt.
    Proceedings of SIGIR 2021.

  • Fine-grained Temporal Relation Extraction with Ordered-Neuron LSTM and Graph Convolutional Networks [Paper]
    Minh Phu Tran *, Minh Nguyen *, and Thien Huu Nguyen.
    Proceedings of WNUT@EMNLP 2021.

  • Improving Cross-Lingual Transfer for Event Argument Extraction with Language-Universal Sentence Structures [Paper]
    Minh Nguyen and Thien Huu Nguyen.
    Proceedings of WANLP@EACL 2021.

  • Learning Cross-lingual Representations for Event Coreference Resolution with Multi-view Alignment and Optimal Transport [Paper]
    Duy Phung, Hieu Minh Tran, Minh Nguyen, and Thien Huu Nguyen.
    Proceedings of MRL@EMNLP 2021.

2018


  • Who is Killed by Police: Introducing Supervised Attention for Hierarchical LSTMs [Paper]
    Minh Nguyen and Thien Huu Nguyen.
    Proceedings of COLING 2018.

  • A Deep Learning Model with Hierarchical LSTMs and Supervised Attention for Anti-Phishing [Paper]
    Minh Nguyen, Toan Nguyen and Thien Huu Nguyen.
    Proceedings of IWSPA@CODASPY 2018.

Software

I am the lead author of the following software:

  • Trankit: a state-of-the-art multilingual NLP system that outperforms other popular toolkits such as Stanza, UDPipe, Stanford CoreNLP, and spaCy on sentence and word segmentation, part-of-speech tagging, dependency parsing, and morphological tagging for 56 languages. Github, Demo, Documentation.

  • FAMIE: a novel multilingual active learning framework that supports smart data labeling and model training for named entity recognition, event detection, and event argument extraction tasks for 56 languages. Github, Demo, Documentation.

  • FourIE: a state-of-the-art multilingual system that simultaneously extracts events, entities, and relations from English, Chinese, and Spanish text. Demo for English.

Skills

Python, PyTorch, NumPy, scikit-learn, Bash/Shell, Vim, Tmux, Git, Docker, Linux.

Honors and Awards

  • Gurdeep Pall Graduate Student Fellowship, University of Oregon, 2022-2023.
  • Invited to present our work, FAMIE, at IARPA’s Demo Day 2022.
  • Outstanding Demo Paper Award, EACL 2021.
  • Erwin & Gertrude Juilfs Scholarship, University of Oregon, 2021.

Academic Service

  • Program Committee: SDU@AAAI {2021,2022}, AAAI {2021,2024}, EMNLP {2021,2022}, SemEval {2022}, ARR {2022}, ECAI {2023}.
  • Reviewer: Neural Computing Journal 2022.