Professor Hanjung Lee’s Research Team Identifies the Principles Behind English Sentence Structure Choices Through Big Data Analysis
- First to identify the contextual factors that influence how causal event structures are realized

(From left) First author: Ph.D. candidate Ji-Yeon Kim; corresponding author: Professor Hanjung Lee; co-author: Ph.D. candidate Ye-Eun Cho
A research team led by Professor Hanjung Lee of the Department of English Language and Literature at Sungkyunkwan University has published a paper in the internationally renowned journal Corpus Linguistics and Linguistic Theory, revealing the key factors that influence the choice of causal event structures in English sentences.
The study was conducted by the Language, Cognition, and Artificial Intelligence research group, consisting of Professor Hanjung Lee (Department of English Language and Literature), Ph.D. candidate Ji-Yeon Kim (Department of German Language and Literature), and Ph.D. candidate Ye-Eun Cho (Department of English Language and Literature).
Using natural language processing (NLP) tools, the team extracted approximately 15,000 sentences describing events brought about by a direct cause from the British National Corpus (BNC), a corpus of about 100 million words widely used in English language research. They then conducted both qualitative analyses and machine-learning-based multifactorial analyses.
The results showed that the clarity and intentionality of the cause are the most significant factors influencing how causal event structures are realized. For instance, structures where the agent is explicitly mentioned, such as "The protesters broke the window," are influenced by different factors than structures where the cause is implicit, such as "The window broke" or "The window was broken." In addition, the study identified various contextual conditions affecting the choice between active and passive constructions, empirically demonstrating how linguistic informativeness, contextual appropriateness, and communicative efficiency interact to govern language use.
This research was supported by the National Research Foundation of Korea. As a follow-up, the team is currently investigating how nonverbal perceptual factors influence linguistic expression in English and Korean speakers, as well as in artificial intelligence language models. Professor Lee stated, “We plan to continue pioneering research that explores the intersection of language, cognition, emotion, perception, and culture, utilizing large language models such as GPT and BERT.”
Title: Semantic and contextual constraints on the causative alternation in English: A multifactorial analysis
Journal: Corpus Linguistics and Linguistic Theory (De Gruyter Brill)
Authors: Ji-Yeon Kim (First Author), Hanjung Lee (Corresponding Author), Ye-Eun Cho (Co-Author)
Link: