Hi!

News: I am looking for an internship for Summer 2022. Please contact me if you are interested.

I am a third-year Ph.D. student in Computer and Information Science at the University of Oregon, working with Prof. Thien Huu Nguyen in the UONLP lab. My research is in Multilingual Natural Language Processing with a focus on Information Extraction, whose goal is to develop systems that extract structured information from unstructured text in different languages. Two recent notable projects, on which I am the lead author, are Trankit (a state-of-the-art multilingual NLP toolkit for 56 languages) and FourIE (a state-of-the-art multilingual joint IE system for English, Chinese, and Spanish). I am also a research assistant on IARPA’s BETTER project, where I have been building systems to extract events and arguments at different levels of granularity (i.e., the Abstract, Basic, and Granular levels) from English, Arabic, and Farsi text.

Before starting my Ph.D., I received my bachelor’s degree in Computer Science from the Hanoi University of Science and Technology and was a member of the DS lab under the supervision of Prof. Khoat Than and Dr. Linh Ngo Van.

Education

  • University of Oregon
    • Ph.D. in Computer and Information Science, 2019 - present.
    • Advisor: Prof. Thien Huu Nguyen.
  • Hanoi University of Science and Technology
    • B.S. in Computer Science, 2014 - 2019.
    • Advisor: Dr. Linh Ngo Van.

Publications (*=equal contribution)

2021


  • Crosslingual Transfer Learning for Relation and Event Extraction via Word Category and Class Alignments [To Appear]
    Minh Van Nguyen, Tuan Ngo Nguyen, Bonan Min and Thien Huu Nguyen.
    Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021).

  • Modeling Document-Level Context for Event Detection via Important Context Selection [To Appear]
    Amir Pouran Ben Veyseh, Minh Van Nguyen, Nghia Ngo Trung, Bonan Min and Thien Huu Nguyen.
    Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021).

  • Cross-Task Instance Representation Interactions and Label Dependencies for Joint Information Extraction with Graph Convolutional Networks [Paper] [Demo]
    Minh Van Nguyen, Viet Dac Lai and Thien Huu Nguyen.
    Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021).

  • Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing [Paper] [Github] [Demo] [Documentation]
    Minh Van Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh and Thien Huu Nguyen.
    Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations (EACL 2021 Demo).
    (EACL 2021 Outstanding Demo Paper Award)

  • Event Extraction from Historical Texts: A New Dataset for Black Rebellions [Paper]
    Viet Dac Lai, Minh Van Nguyen, Heidi Kaufman, and Thien Huu Nguyen.
    Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Findings ACL-IJCNLP 2021).

  • Augmenting Open-Domain Event Detection with Synthetic Data from GPT-2 [To Appear]
    Amir Pouran Ben Veyseh, Minh Van Nguyen, Bonan Min and Thien Huu Nguyen.
    Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021).

  • Graph Learning Regularization and Transfer Learning for Few-Shot Event Detection [Paper]
    Viet Dac Lai, Minh Van Nguyen, Thien Huu Nguyen, and Franck Dernoncourt.
    Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021).

  • Fine-grained Temporal Relation Extraction with Ordered-Neuron LSTM and Graph Convolutional Networks [To Appear]
    Minh Phu Tran *, Minh Van Nguyen *, and Thien Huu Nguyen.
    Proceedings of the 7th Workshop on Noisy User-generated Text at EMNLP 2021 (WNUT@EMNLP 2021).

  • Improving Cross-Lingual Transfer for Event Argument Extraction with Language-Universal Sentence Structures [Paper]
    Minh Van Nguyen and Thien Huu Nguyen.
    Proceedings of the 6th Arabic Natural Language Processing Workshop at EACL 2021 (WANLP@EACL 2021).

  • Learning Cross-lingual Representations for Event Coreference Resolution with Multi-view Alignment and Optimal Transport [To Appear]
    Duy Phung, Hieu Minh Tran, Minh Van Nguyen, and Thien Huu Nguyen.
    Proceedings of the 1st Workshop on Multilingual Representation Learning at EMNLP 2021 (MRL@EMNLP 2021).

2018


  • Who is Killed by Police: Introducing Supervised Attention for Hierarchical LSTMs [Paper]
    Minh Van Nguyen and Thien Huu Nguyen.
    Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018).

  • A Deep Learning Model with Hierarchical LSTMs and Supervised Attention for Anti-Phishing [Paper]
    Minh Van Nguyen, Toan Nguyen and Thien Huu Nguyen.
    Proceedings of the 1st Anti-Phishing Shared Pilot at the 4th ACM International Workshop on Security and Privacy Analytics Academic Service (IWSPA@CODASPY 2018).

Software

I am the lead author of the following software:

  • FourIE: a neural information extraction system that annotates text for entity mentions (names, pronouns, and nominals of people, organizations, locations, etc.), relations (between two entity mentions), event triggers, and argument roles, using the information schema defined in the ACE 2005 dataset. FourIE leverages deep learning and graph convolutional networks to jointly perform four information extraction tasks in an end-to-end fashion: entity mention detection, relation extraction, event detection, and argument role prediction. FourIE supports three languages (English, Chinese, and Spanish). Demo for English.
  • Trankit: a light-weight transformer-based toolkit for multilingual NLP that processes raw text and supports fundamental NLP tasks for 56 languages. Trankit builds on recent advances in multilingual pre-trained language models, providing state-of-the-art performance for Sentence Segmentation, Part-of-Speech Tagging, Morphological Feature Tagging, Dependency Parsing, and Named Entity Recognition over 90 Universal Dependencies treebanks. Trankit is written in Python and can be installed via pip. GitHub, Demo, Documentation.

Projects

  • IARPA Better Extraction from Text Towards Enhanced Retrieval (BETTER)
    • Research Assistant, January 2020 - present.
    • I build cross-lingual information extraction systems (with English as the source language) that extract events in the form of who-did-what-to-whom-when-where, at different levels of granularity, across various target languages (e.g., Arabic, Farsi).

Skills

Python, PyTorch, NumPy, scikit-learn, Bash/Shell, Vim, tmux, Git, Docker, Linux.

Teaching

  • CIS 471: Introduction to Artificial Intelligence [Class Page]
    • Teaching Assistant, Fall 2019.
    • I helped undergraduate students understand fundamental concepts and problems in Artificial Intelligence.

Honors and Awards

  • Outstanding Demo Paper Award, EACL 2021.
  • Erwin & Gertrude Juilfs Scholarship, University of Oregon, 2021.

Academic Service

  • Program Committee: SDU@AAAI 2021, AAAI 2021, EMNLP 2021.
  • Reviewer: Neural Computing Journal.