Hi!

I am a third-year Ph.D. student in Computer and Information Science at the University of Oregon. I have been working with Prof. Thien Huu Nguyen in the UONLP lab on Multilingual Information Extraction (Multilingual IE), which aims to develop systems for extracting structured information from unstructured text in different languages. Recent notable projects on which I am the lead author include Trankit (a state-of-the-art multilingual NLP toolkit for 56 languages), FourIE (a state-of-the-art multilingual joint IE system for English, Chinese, and Spanish), and FAMIE (a fast active learning framework for multilingual IE for 100 languages). I am also a research assistant on IARPA’s BETTER project, where I have been building systems to extract events and arguments at different granularities (i.e., Abstract, Basic, and Granular levels) from English, Arabic, Farsi, Chinese, Russian, and Korean text.

In pursuing Multilingual IE, my ultimate goal is to expand the applications of IE (e.g., news summarization, information retrieval) to many languages and to contribute to the democratization of IE across languages. I believe this can give populations speaking different languages equal access to information.

Education

  • University of Oregon
    • Ph.D. in Computer and Information Science, 2019 - Present.
    • Advisor: Prof. Thien Huu Nguyen.
  • Hanoi University of Science and Technology
    • B.E. in Information Systems, 2014 - 2019.
    • Advisor: Dr. Linh Ngo Van.

Experience

Publications (*=equal contribution)

2022


  • FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction [Paper] [Github] [Demo] [Documentation]
    Minh Van Nguyen, Nghia Trung Ngo, Bonan Min, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2022 (System Demonstrations).

  • Joint Extraction of Entities, Relations, and Events via Modeling Inter-Instance and Inter-Label Dependencies [To Appear]
    Minh Van Nguyen, Bonan Min, Franck Dernoncourt, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2022.

  • Cross-Lingual Event Detection via Optimized Adversarial Training [To Appear]
    Luis Fernando Guzman-Nateras, Minh Van Nguyen, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2022.

  • MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection [To Appear]
    Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt, and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2022.

  • Document-Level Event Argument Extraction via Optimal Transport [To Appear]
    Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt, Bonan Min, and Thien Huu Nguyen.
    Proceedings of ACL 2022 (Findings).

  • Event Causality Identification via Generation of Important Context Words [To Appear]
    Hieu Man Duc Trong, Minh Van Nguyen, and Thien Huu Nguyen.
    Proceedings of the 11th Joint Conference on Lexical and Computational Semantics (*SEM 2022) at NAACL-HLT 2022.

2021


  • Crosslingual Transfer Learning for Relation and Event Extraction via Word Category and Class Alignments [Paper]
    Minh Van Nguyen, Tuan Ngo Nguyen, Bonan Min and Thien Huu Nguyen.
    Proceedings of EMNLP 2021.

  • Modeling Document-Level Context for Event Detection via Important Context Selection [Paper]
    Amir Pouran Ben Veyseh, Minh Van Nguyen, Nghia Ngo Trung, Bonan Min and Thien Huu Nguyen.
    Proceedings of EMNLP 2021.

  • Cross-Task Instance Representation Interactions and Label Dependencies for Joint Information Extraction with Graph Convolutional Networks [Paper] [Demo]
    Minh Van Nguyen, Viet Dac Lai and Thien Huu Nguyen.
    Proceedings of NAACL-HLT 2021.

  • Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing [Paper] [Github] [Demo] [Documentation]
    Minh Van Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh and Thien Huu Nguyen.
    Proceedings of EACL 2021 (System Demonstrations).
    (EACL 2021 Outstanding Demo Paper Award)

  • Event Extraction from Historical Texts: A New Dataset for Black Rebellions [Paper]
    Viet Dac Lai, Minh Van Nguyen, Heidi Kaufman, and Thien Huu Nguyen.
    Proceedings of ACL-IJCNLP 2021 (Findings).

  • Augmenting Open-Domain Event Detection with Synthetic Data from GPT-2 [Paper]
    Amir Pouran Ben Veyseh, Minh Van Nguyen, Bonan Min and Thien Huu Nguyen.
    Proceedings of ECML PKDD 2021.

  • Graph Learning Regularization and Transfer Learning for Few-Shot Event Detection [Paper]
    Viet Dac Lai, Minh Van Nguyen, Thien Huu Nguyen, and Franck Dernoncourt.
    Proceedings of SIGIR 2021.

  • Fine-grained Temporal Relation Extraction with Ordered-Neuron LSTM and Graph Convolutional Networks [Paper]
    Minh Phu Tran *, Minh Van Nguyen *, and Thien Huu Nguyen.
    Proceedings of WNUT@EMNLP 2021.

  • Improving Cross-Lingual Transfer for Event Argument Extraction with Language-Universal Sentence Structures [Paper]
    Minh Van Nguyen and Thien Huu Nguyen.
    Proceedings of WANLP@EACL 2021.

  • Learning Cross-lingual Representations for Event Coreference Resolution with Multi-view Alignment and Optimal Transport [Paper]
    Duy Phung, Hieu Minh Tran, Minh Van Nguyen, and Thien Huu Nguyen.
    Proceedings of MRL@EMNLP 2021.

2018


  • Who is Killed by Police: Introducing Supervised Attention for Hierarchical LSTMs [Paper]
    Minh Van Nguyen and Thien Huu Nguyen.
    Proceedings of COLING 2018.

  • A Deep Learning Model with Hierarchical LSTMs and Supervised Attention for Anti-Phishing [Paper]
    Minh Van Nguyen, Toan Nguyen and Thien Huu Nguyen.
    Proceedings of IWSPA@CODASPY 2018.

Software

I am the lead author of the following software:

  • Trankit: a light-weight transformer-based toolkit for multilingual NLP that processes raw text and supports fundamental NLP tasks for 56 languages. Trankit builds on recent advances in multilingual pre-trained language models, providing state-of-the-art performance for Sentence Segmentation, Part-of-Speech Tagging, Morphological Feature Tagging, Dependency Parsing, and Named Entity Recognition over 90 Universal Dependencies treebanks. Trankit is written in Python and can be installed via pip. Github, Demo, Documentation.

  • FAMIE: a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction (IE). FAMIE is designed to address a fundamental problem in existing AL frameworks: annotators must wait a long time between annotation batches because model training and data selection at each AL iteration are time-consuming. With a novel proxy AL mechanism and the integration of our SOTA multilingual toolkit Trankit, FAMIE takes only a few hours to provide users with a labeled dataset and a ready-to-use model for different IE tasks over 100 languages. Github, Demo, Documentation.

  • FourIE: a neural information extraction system that annotates text for entity mentions (names, pronouns, and nominals of people, organizations, locations, etc.), relations (between two entity mentions), event triggers, and argument roles using the information schema defined in the ACE 2005 dataset. FourIE leverages deep learning and graph convolutional networks to jointly perform four information extraction tasks (entity mention detection, relation extraction, event detection, and argument role prediction) in an end-to-end fashion. FourIE supports three languages (English, Chinese, and Spanish). Demo for English.

Skills

Python, PyTorch, NumPy, scikit-learn, Bash/Shell, Vim, tmux, Git, Docker, Linux.

Honors and Awards

  • Outstanding Demo Paper Award, EACL 2021.
  • Erwin & Gertrude Juilfs Scholarship, University of Oregon, 2021.

Academic Service

  • Program Committee: SDU@AAAI {2021,2022}, AAAI 2021, EMNLP 2021, SemEval 2022, ARR 2022.
  • Reviewer: Neural Computing Journal 2022.