How infants' utterances grow: A probabilistic account of early language development

Qihui Xu, Martin Chodorow, Virginia Valian

Research output: Contribution to journal › Article › peer-review


Why are children's first utterances short and ungrammatical, with some obvious constructions missing? What determines the lengthening of children's early utterances over time? The literature is replete with references to a one-word, a two-word, and a later multiword stage in language development, but with little empirical evidence, and with little account for how and why utterances grow. To address these questions, we analyze speech samples from 25 children between the ages of 14 and 43 months; we construct distributions of their utterances of lengths one to five by age. Our novel findings are that multiword utterances of different lengths appear early in acquisition and increase together until they reach relatively stable proportions similar to those found in parents' input. To explain such patterns, we develop a probabilistic computational model, VIRTUAL, that posits an interaction between a) varying, increasing resources from various developmental domains and b) target utterance lengths mirroring the input. VIRTUAL successfully accounts for most of the empirical patterns, suggesting a probabilistic and dynamic process that is nonetheless compatible with apparent distinct milestones in development. We provide a new, systematic way of showing how developmental cascade theories could work in language development. Our findings and model also suggest insights into syntactic, semantic, and cognitive development.
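As a rough illustration of the descriptive step mentioned in the abstract (distributions of utterance lengths one to five by age), the sketch below tabulates the proportion of utterances of each length within each age group. It is not the authors' code: the function name `length_distributions`, the whitespace tokenization, the grouping by exact age in months, and the toy data are all assumptions made for illustration, and the study's own word-counting and age-binning rules may differ.

```python
from collections import Counter, defaultdict


def length_distributions(samples, max_len=5):
    """Return {age_months: {length: proportion}} for utterance lengths 1..max_len.

    `samples` is an iterable of (age_in_months, utterance_text) pairs.
    Utterance length is counted by whitespace splitting (an assumption).
    Utterances longer than `max_len` words, or empty, are ignored.
    """
    counts = defaultdict(Counter)
    for age_months, utterance in samples:
        n_words = len(utterance.split())
        if 1 <= n_words <= max_len:
            counts[age_months][n_words] += 1

    dist = {}
    for age, counter in counts.items():
        total = sum(counter.values())
        # Proportion of each length among the retained utterances at this age.
        dist[age] = {k: counter[k] / total for k in range(1, max_len + 1)}
    return dist


# Hypothetical toy data, not taken from the study.
samples = [
    (18, "ball"),
    (18, "more juice"),
    (18, "doggie"),
    (18, "want that"),
    (30, "I want the ball"),
    (30, "where doggie go"),
    (30, "no"),
    (30, "mommy read book now please"),
]

dist = length_distributions(samples)
for age in sorted(dist):
    print(age, dist[age])
```

Plotting these proportions against age, one curve per length, gives the kind of picture the abstract describes: whether multiword utterances of different lengths rise together rather than appearing in discrete stages.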

Original language: English
Article number: 105275
State: Published - Jan 2023
Externally published: Yes


  • Computational models
  • Corpus linguistics
  • Language acquisition
  • Utterance length

