Reuse what you can, differentiate what you must: Universal principles in the organization of meaning

Thomas Brochhagen Jun 15, 2026

Here is a question that sounds far easier than it is: what is a word?

It feels like something the language sciences must have settled long ago. They have not. Once you look across the world's languages, the idea of a neat unit called "the word" starts to dissolve. Where one language draws a crisp boundary, another runs several meanings together into a single long form, or splits apart what we would treat as one word. The "word" is one of the most useful and most slippery objects we work with.

That slipperiness is exactly where this paper began. A lot of recent work, including my own, has shown that there are striking regularities in how languages map meanings onto whole word forms. But if the boundary of the word is itself blurry, there was never any reason to expect the interesting patterns to stop neatly at that boundary. Why would the same forces that shape whole words fall silent just below them, among the pieces of words? That question had largely gone untackled, and it is the gap we set out to fill.

Two kinds of reuse

Start with a familiar pattern. Languages love to reuse the same word form for related meanings. Spanish lengua means both "tongue" and "language." Quechua shimi means both "word" and "mouth." When one form carries multiple meanings like this, linguists call it a colexification. Colexification follows a universal tendency: languages tend to give the same word to meanings that are related to one another.

But languages do not only recycle whole word forms. They also recycle parts of them. English grandfather and grandmother share the piece grand, which attaches to family words to mean "one generation further away," as in grandaunt or, more playfully, grand-supervisor. Mandarin uses zuǐ (嘴) for "mouth" and zuǐchún (嘴唇) for "lips," reusing the first piece. When two meanings share part, but not all, of a form, the result is called a partial colexification, and it is just as widespread across the world's languages as the whole-word kind. It had simply received far less attention.

One idea, pulling two ways

Our hypothesis was that both kinds of reuse are shaped by the same two pressures, pulling against each other.

The first is a pressure for compression. Reusing a form is efficient: fewer things to memorize, and a way to lean on meanings you already know when guessing at new ones. If two meanings are closely related, assigning them to the same form makes the system easier to learn and to use.

The second is a pressure for differentiation. If two meanings are easy to confuse (if they keep turning up in the same kinds of situations) then giving them the same form invites misunderstanding. Here it pays to keep the forms apart.

Reusing a whole form is what you get when meanings are both similar and easy to tell apart in context. But what about meanings that are similar yet genuinely confusable? Our idea was that reusing just a piece of a word is the elegant compromise: you keep some of the savings of reuse, because the forms share material, while sidestepping the confusion, because the forms are not identical. A little of both.

Deciding what counts

To test this we used a curated collection of data spanning nearly 2,000 languages from almost 200 language families. For any pair of meanings we asked which of three things a language does: give them completely different forms, share part of a form, or share the whole form.
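To make the three-way question concrete, here is a minimal sketch in Python. It is not the procedure from the paper: the function name classify_pair, the use of a longest shared substring, and the min_overlap cutoff are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def classify_pair(form_a: str, form_b: str, min_overlap: int = 3) -> str:
    """Label two word forms from one language as 'full', 'partial', or 'none'.

    Assumption: 'partial' means the forms share a contiguous substring of at
    least `min_overlap` characters. This cutoff is hypothetical.
    """
    if form_a == form_b:
        return "full"  # one form carries both meanings
    matcher = SequenceMatcher(None, form_a, form_b, autojunk=False)
    match = matcher.find_longest_match(0, len(form_a), 0, len(form_b))
    return "partial" if match.size >= min_overlap else "none"


print(classify_pair("lengua", "lengua"))            # identical forms
print(classify_pair("grandfather", "grandmother"))  # shared piece "grand"
print(classify_pair("beard", "two"))                # no shared material
```

A raw character overlap like this will also fire on coincidental matches, which is exactly why deciding what counts as genuine reuse is harder than the computation itself.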

And here is where one of our many challenges lay: not in the computation, but in deciding what our units of analysis even were. If the notion of a "word" is already fuzzy, the notion of a meaningfully reused piece of one is fuzzier still. How much overlap between two forms counts as genuine reuse rather than coincidence? How often does a pattern have to recur before we are willing to call it a pattern at all?

Our pragmatic answer was to set a threshold: we only counted a case of reuse if it showed up in at least a few different language families, so that we were tracking a real cross-linguistic signal. Reassuringly, our findings hold whether we loosen or tighten that threshold. But, to be clear: this is a choice, not a discovery. Just how much recurrence, within a language, should count as partial reuse is a genuinely open question, one I hope this paper helps push to the foreground rather than pretends to close.
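The threshold idea can be sketched as a simple filter. Everything here is illustrative: the record format, the function name recurring_pairs, the default of three families, and the toy records are assumptions, not the paper's data or code.

```python
from collections import defaultdict


def recurring_pairs(observations, min_families=3):
    """Keep meaning pairs whose reuse is attested in >= min_families families.

    observations: iterable of (meaning_pair, family) records, one per language
    in which the pair shares a form or part of one (hypothetical format).
    """
    families_by_pair = defaultdict(set)
    for meaning_pair, family in observations:
        families_by_pair[meaning_pair].add(family)  # a family counts once
    return {
        pair
        for pair, families in families_by_pair.items()
        if len(families) >= min_families
    }


# Made-up toy records with placeholder names, not real data.
toy = [
    (("meaning_1", "meaning_2"), "family_A"),
    (("meaning_1", "meaning_2"), "family_A"),
    (("meaning_1", "meaning_2"), "family_B"),
    (("meaning_1", "meaning_2"), "family_C"),
    (("meaning_3", "meaning_4"), "family_A"),
    (("meaning_3", "meaning_4"), "family_A"),
]

# Loosen and tighten the threshold to see which pairs survive.
for k in (1, 2, 3, 4):
    print(k, sorted(recurring_pairs(toy, min_families=k)))
```

Counting families rather than languages keeps many closely related languages from inflating a pattern, and rerunning the filter at several values of min_families is the simplest way to check that a result does not hinge on one cutoff.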

What we found

Two results stand out. First, whole-word and part-word reuse recruit different meanings. Second, the interaction between how similar two meanings are and how confusable they are in context best predicts which pattern a language tends to use. As we hypothesized, meanings that are close but rarely share a context, like "breast" and "suck," are more likely to share a whole form. Meanings that are close but crowd into the same contexts, like "hear" and "listen" or "ankle" and "wrist," are more likely to share only a piece. And meanings that are neither close nor contextually similar, like "beard" and "two," stay fully apart. The compromise we predicted appears where the theory says it should.
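The qualitative pattern can be caricatured as a decision rule. This is a deliberately crude sketch, not the statistical model from the paper: the scores, the 0.5 cutoffs, and the function name predicted_pattern are all hypothetical, and the real finding is a tendency, not a hard rule.

```python
def predicted_pattern(similarity, context_overlap, sim_cut=0.5, ctx_cut=0.5):
    """Return the reuse pattern a meaning pair is most likely to show.

    similarity and context_overlap are assumed to be scores in [0, 1];
    both cutoffs are illustrative placeholders.
    """
    if similarity < sim_cut:
        # Unrelated meanings: little to gain from reuse. Treating every
        # low-similarity pair this way, whatever its context overlap,
        # is a simplifying assumption of this sketch.
        return "none"
    if context_overlap < ctx_cut:
        # Related and easy to tell apart in context: share the whole form.
        return "full"
    # Related but confusable in context: share only a piece.
    return "partial"


# Made-up scores, for illustration only.
print(predicted_pattern(0.9, 0.1))  # close, rarely in the same contexts
print(predicted_pattern(0.9, 0.9))  # close, crowding into the same contexts
print(predicted_pattern(0.1, 0.1))  # neither close nor contextually similar
```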

Why it matters

The satisfying part is the unification. Reuse at the word level and reuse below it turn out to be two settings on the same dial: squeeze the lexicon for efficiency, but keep forms distinct enough to tease meanings apart in context.

It also points somewhere. Partial word reuse turned out to be more variable across language families than whole-word reuse, probably because it bundles together several different ways of building words, each with its own typological quirks. Teasing those apart is the obvious next step, as is the deeper, still-unsettled question we kept bumping into: what, exactly, is a unit worth counting? For a problem that may start with something as small as a shared syllable, that turns out to be a surprisingly deep place to end up.

You can read the full paper in Nature Human Behaviour: https://www.nature.com/articles/s41562-026-02488-3