Hi!

I am a third-year Ph.D. student in Computer and Information Science at the University of Oregon. I have been working with Prof. Thien Huu Nguyen in the UONLP lab on Multilingual Information Extraction (Multilingual IE), which aims to develop systems that extract structured information from unstructured text in different languages. Recent notable projects on which I am the lead author include Trankit (a state-of-the-art multilingual NLP toolkit for 56 languages), FourIE (a state-of-the-art multilingual joint IE system for English, Chinese, and Spanish), and FAMIE (a fast active learning framework for multilingual IE in 56 languages). I am also a research assistant on IARPA’s BETTER project, where I have been building systems that extract events and their arguments at different granularities (i.e., the Abstract, Basic, and Granular levels) from English, Arabic, Farsi, Chinese, Russian, and Korean text.

Education

  • University of Oregon
    • Ph.D. in Computer and Information Science, 2019 - Present.
    • Advisor: Prof. Thien Huu Nguyen.
  • Hanoi University of Science and Technology
    • B.E. in Information Systems, 2014 - 2019.
    • Advisor: Dr. Linh Ngo Van.

Experience

Publications (*=equal contribution)

2022


  • FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction [Paper] [Github] [Demo] [Documentation]
    Minh Van Nguyen, Nghia Trung Ngo, Bonan Min, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2022 (System Demonstrations).

  • Joint Extraction of Entities, Relations, and Events via Modeling Inter-Instance and Inter-Label Dependencies [To Appear]
    Minh Van Nguyen, Bonan Min, Franck Dernoncourt, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2022.

  • Cross-Lingual Event Detection via Optimized Adversarial Training [To Appear]
    Luis Fernando Guzman-Nateras, Minh Van Nguyen, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2022.

  • MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection [To Appear]
    Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2022.

  • Document-Level Event Argument Extraction via Optimal Transport [To Appear]
    Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt, Bonan Min, and Thien Huu Nguyen.
    Proceedings of ACL 2022 (Findings).

  • Event Causality Identification via Generation of Important Context Words [To Appear]
    Hieu Man Duc Trong, Minh Van Nguyen, and Thien Huu Nguyen.
    Proceedings of the 11th Joint Conference on Lexical and Computational Semantics (*SEM 2022) at NAACL-HLT 2022.

  • MECI: A Multilingual Dataset for Event Causality Identification [To Appear]
    Viet Dac Lai, Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt, and Thien Huu Nguyen.
    Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022).

2021


  • Crosslingual Transfer Learning for Relation and Event Extraction via Word Category and Class Alignments [Paper]
    Minh Van Nguyen, Tuan Ngo Nguyen, Bonan Min, and Thien Huu Nguyen.
    Proceedings of EMNLP 2021.

  • Modeling Document-Level Context for Event Detection via Important Context Selection [Paper]
    Amir Pouran Ben Veyseh, Minh Van Nguyen, Nghia Ngo Trung, Bonan Min, and Thien Huu Nguyen.
    Proceedings of EMNLP 2021.

  • Cross-Task Instance Representation Interactions and Label Dependencies for Joint Information Extraction with Graph Convolutional Networks [Paper] [Demo]
    Minh Van Nguyen, Viet Dac Lai, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2021.

  • Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing [Paper] [Github] [Demo] [Documentation]
    Minh Van Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh, and Thien Huu Nguyen.
    Proceedings of EACL 2021 (System Demonstrations).
    (EACL2021 Outstanding Demo Paper Award)

  • Event Extraction from Historical Texts: A New Dataset for Black Rebellions [Paper]
    Viet Dac Lai, Minh Van Nguyen, Heidi Kaufman, and Thien Huu Nguyen.
    Proceedings of ACL-IJCNLP 2021 (Findings).

  • Augmenting Open-Domain Event Detection with Synthetic Data from GPT-2 [Paper]
    Amir Pouran Ben Veyseh, Minh Van Nguyen, Bonan Min, and Thien Huu Nguyen.
    Proceedings of ECML PKDD 2021.

  • Graph Learning Regularization and Transfer Learning for Few-Shot Event Detection [Paper]
    Viet Dac Lai, Minh Van Nguyen, Thien Huu Nguyen, and Franck Dernoncourt.
    Proceedings of SIGIR 2021.

  • Fine-grained Temporal Relation Extraction with Ordered-Neuron LSTM and Graph Convolutional Networks [Paper]
    Minh Phu Tran*, Minh Van Nguyen*, and Thien Huu Nguyen.
    Proceedings of WNUT@EMNLP 2021.

  • Improving Cross-Lingual Transfer for Event Argument Extraction with Language-Universal Sentence Structures [Paper]
    Minh Van Nguyen and Thien Huu Nguyen.
    Proceedings of WANLP@EACL 2021.

  • Learning Cross-lingual Representations for Event Coreference Resolution with Multi-view Alignment and Optimal Transport [Paper]
    Duy Phung, Hieu Minh Tran, Minh Van Nguyen, and Thien Huu Nguyen.
    Proceedings of MRL@EMNLP 2021.

2018


  • Who is Killed by Police: Introducing Supervised Attention for Hierarchical LSTMs [Paper]
    Minh Van Nguyen and Thien Huu Nguyen.
    Proceedings of COLING 2018.

  • A Deep Learning Model with Hierarchical LSTMs and Supervised Attention for Anti-Phishing [Paper]
    Minh Van Nguyen, Toan Nguyen, and Thien Huu Nguyen.
    Proceedings of IWSPA@CODASPY 2018.

Software

I am the lead author of the following software:

  • Trankit: a state-of-the-art multilingual NLP toolkit that outperforms other popular toolkits such as Stanza, UDPipe, Stanford CoreNLP, and spaCy on sentence and word segmentation, part-of-speech tagging, morphological tagging, and dependency parsing for 56 languages. Github, Demo, Documentation.

  • FAMIE: a novel multilingual active learning framework that supports smart data labeling and model training for named entity recognition, event detection, and event argument extraction tasks for 56 languages. Github, Demo, Documentation.

  • FourIE: a state-of-the-art multilingual system that simultaneously extracts entities, relations, and events from English, Chinese, and Spanish text. Demo (English).

Skills

Python, PyTorch, NumPy, scikit-learn, Bash/Shell, Vim, Tmux, Git, Docker, Linux.

Honors and Awards

  • Gurdeep Pall Graduate Student Fellowship, University of Oregon, 2022.
  • Invited to present FAMIE at IARPA’s Demo Day 2022.
  • Outstanding Demo Paper Award, EACL 2021.
  • Erwin & Gertrude Juilfs Scholarship, University of Oregon, 2021.

Academic Service

  • Program Committee: SDU@AAAI (2021, 2022), AAAI (2021, 2023), EMNLP (2021, 2022), SemEval 2022, ARR 2022.
  • Reviewer: Neural Computing Journal 2022.