The Story of H
Well, maybe not the whole story, but some interesting parts of it.

H is a fascinating letter, protean in the variety of roles that it plays in the different versions of the Latin alphabet, according to language. Even in Latin, from the start, it had two distinct functions.

Among the members of the alphabet, it’s an oddball: it comes from the Greek H (eta), but this is a vowel letter (representing long /ɛ/ in classical Greek and /i/ in Byzantine and Modern Greek). The Roman H, on the other hand, is closer in function to the Semitic ancestor of eta, ḥet (ח in Hebrew): it’s also a consonant letter, at least in its primary function, and in classical Latin it represents the voiceless glottal fricative /h/.

When it stands at the beginning of a word derived from Greek it represents the rough breathing or aspiration of the initial vowel, something that classical Greek didn’t indicate at all, while Byzantine Greek used a reverse apostrophe (‘).

At some point in late Roman history the /h/ was lost, since none of the Romance languages have kept it. Some of them reacquired the sound later from other sources (French from Germanic loanwords, Romanian from Greek or Slavic ones, Spanish from an evolution of Latin words with F), only to lose it again, except for Romanian.

The other function of H in Latin is as the equivalent of a diacritic in modifying the sound of certain consonants, namely C, P, R and T. The digraphs CH, PH, RH and TH are used to represent X (chi), Φ (phi), P (rho) and Θ (theta), respectively, in words of Greek origin. Several European languages (including French, English and German) have retained these digraphs, with the pronunciation of the words adapted to the language in question.

Several languages have also adopted the digraphs to represent sounds in words that are not borrowed from Greek via Latin, including native words. In German, CH typically represents /x/ or /ɕ/, but at the beginning of a word it’s usually /k/ (as in Charakter, Chemnitz or Chur), except in Swiss-German, where word-initial /x/ (or /χ/) is common. In Italian and Romanian, CH came to stand for /k/ before E or I. French, English, Spanish and Portuguese, on the other hand, employed CH to denote /ʧ/ (which in French and Portuguese eventually changed to /ʃ/). English also adopted TH for the sounds that previously had been written with þ and ð.

Languages have further extended the quasi-diacritic use of H to other consonants. Italian and Romanian have GH analogous to CH. In English the main example is SH (since /ʃ/ is an important phoneme in English). But in Middle English there was also GH for /χ/; this no longer exists in modern English, having changed to /f/ or become silent, while in Scottish – which still has it – it’s written CH. English also has GH where the H is altogether redundant, as in ghost or burgher, and the variety of possible pronunciations of words containing GH (cough – tough – though – through) is practically a joke.

In transcriptions from other languages, especially Middle Eastern or South Asian (sometimes also Modern Greek), GH is used to represent /ɣ/ or /gh/ (Baghdad, afghan), though in practice English-speakers read this as /g/.

Similarly, in the transcription of words containing /x/, /χ/ or /kh/ from languages that don’t use the Latin alphabet, English typically uses KH (with the sound usually reverting to /k/ – as in khaki – except in careful speech). In Yiddish words, however, it’s CH that is commonly used (as in chutzpah), perhaps under the (often mistaken) assumption that such words are of German origin, though (as I wrote above) standard German has no word-initial /x/. On the other hand, many Jews, especially Orthodox, unaccountably (given the potential for misreading) seem to prefer Chanukah to Hanukkah and the like.

In the same way, SCH is used to designate /ʃ/ (for example schmooze). This trigraph represents the German practice (though the perfectly good German nasch is usually spelled nosh in English); this convention has in turn led to the tetragraph TSCH (and sometimes even the pentagraph TZSCH, as in Nietzsche) for /ʧ/, and the tetragraph DSCH for /ʤ/.

In English the normal pronunciation of SCH may be either /sk/ (as in school) or, less often, /ʃ/ (as in schist). Ignorance of how the combination is pronounced in other languages (/sk/ in Italian, /sx/ or [word-final] /s/ in Dutch) often leads to mispronunciation (/bru'ʃɛtə/ for bruschetta, /grolʃ/ for Grolsch and the like).

Lastly, English uses ZH to represent /ʒ/, and this pronunciation is usually maintained even in reading Chinese words written in pinyin, where this digraph represents (approximately) /ʤ/, so that a word written zhu is read as [ʒu] rather than [ʤu], while this last reading would be applied to ju, meant to be pronounced somewhat like [ʥy].

Occitan adopted H to mark the palatalization of L and N, something that scribes in Castile and Leon – whether writing in Castilian, Asturian or Galician – did by doubling the letter. (By the scribal convention whereby a tilde stood for an intercalated lower-case N, NN eventually became Ñ.) Aragonese scribes, on the other hand, followed the Catalan convention of LL or YL – depending on position – and NY. (In modern Catalan YL has been replaced by LL.)

When Portugal became independent and Portuguese emerged as a language separate from Galician, the Occitan rule (LH, NH) was adopted, probably in an effort to make Portuguese look different from Spanish.

Another language with a fondness for the quasi-diacritic use of H in its alphabet is Albanian. Not only does it use SH, TH (for /θ/ only) and ZH as in English, but also DH for /ð/ and XH for /ʤ/. Only /ʧ/ is written Ç (as in Turkish), not CH.

Back to Spanish

In Spanish, as I wrote above, the only diacritic use of H is in CH. There is also – as in French, and unlike Italian and Romanian – the use of H (always silent) in words derived from Latin ones that have it, like hombre (from hominem, man). But in many Latin words with F, the /f/ sound changed (by way of /φ/) to /h/, and this F came to be replaced by an H that was pronounced: hambre (from the vulgar famine, hunger), hembra (from femina, female) and most importantly – for a reason that will be explained – hoder (from futuere, to fuck). Now, those familiar with Spanish will know this last verb as joder. And thereby hangs a linguistic tale.

But before going on, it’s worth noting that Spanish has a fourth use for H: it’s required before U when this is followed by a vowel letter either at the beginning of a word, as in huevo or huarache, or after another vowel letter, as in marihuana. The reason for this is that until the eighteenth century, U and V were considered the same letter, whichever shape it took (V at the beginning of a word and U elsewhere), while the sound it represented – vowel (or semivowel) or consonant – depended on neighboring letters. In particular, if it was followed by a vowel letter at the beginning of a word, or between two vowel letters, then the default was the consonant value (which might be /v/, /b/, /β/ or /f/, depending on the language). If the semivowel (/w/ or, in French, /ɥ/) was desired, the convention adopted by the Spanish and the French was to put an H (silent by default) in front. The H in the French words huile and huit is not etymological – the corresponding Latin words are oleum and octo – and the fact that huit has the badly named phonetic feature known as “aspirated H” has nothing to do with the letter being there, and everything with its being a numeral, since un/une and onze have the same feature.
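The default-reading rule for the old single U/V letter can be put as a small Python sketch. This is only an illustration of the rule as stated above, not a model of any real scribal practice: the function name uv_default_value and the sample spellings (written in an old style, with U and V interchangeable) are my own assumptions.

```python
VOWEL_LETTERS = set("aeiou")

def uv_default_value(word, i):
    """Return the default reading ('consonant' or 'vowel') of the U/V at index i.

    Hypothetical helper: it encodes only the rule described in the text.
    The letter defaults to a consonant when it is followed by a vowel letter
    and either begins the word or follows another vowel letter.
    """
    word = word.lower()
    if word[i] not in "uv":
        raise ValueError("position i must hold a U or V")
    next_is_vowel = i + 1 < len(word) and word[i + 1] in VOWEL_LETTERS
    prev_is_vowel = i > 0 and word[i - 1] in VOWEL_LETTERS
    if next_is_vowel and (i == 0 or prev_is_vowel):
        return "consonant"
    return "vowel"

# 'vino': word-initial and followed by a vowel letter, so consonant.
print(uv_default_value("vino", 0))   # consonant
# 'vno' (uno): followed by a consonant letter, so vowel.
print(uv_default_value("vno", 0))    # vowel
# 'hueuo' (huevo): the silent H keeps the first U from being word-initial.
print(uv_default_value("hueuo", 1))  # vowel
print(uv_default_value("hueuo", 3))  # consonant
```

Without the H, the first letter of ueuo would satisfy the consonant condition, which is exactly the misreading the convention was meant to prevent.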

Languages in which consonantal H was generally pronounced, such as German or English, didn’t have that option, so what scribes did in those languages was to write U (or V) twice, eventually producing the new letter W, which continues to represent /w/ in English (where it’s called “double U”) and some dialects of Dutch; in others, as well as in German, this sound shifted to /v/ while /v/ (written V) became /f/.

So, let’s get to hoder/joder.

It turns out that when the H derived from Latin F began to be silenced – a trend that, like most others in the development of Spanish, seems to have begun in Burgos in northern Castile and gradually spread southward – this one word, because of its forceful nature (its “brutal character,” according to the linguist Joan Corominas), remained the exception.

Meanwhile, other changes were happening in the Castilian language. In its medieval form it had a large panoply of sibilants, similar to those of modern Italian – fricative and affricate, voiceless and voiced, alveolar and postalveolar: /s/ (written S, or SS between vowels), /ʦ/ (Ç, C or Ç before E or I, Z at the end of a word), /z/ (S between vowels), /ʣ/ (Z), /ʃ/ (X), /ʒ/ (J, with [ʤ] as a variant), and /ʧ/ (CH). But two more or less parallel processes took place in the course of the late Middle and early Modern Ages: devoicing (/z/ → /s/, /ʒ/ → /ʃ/, /ʣ/ → /ʦ/) and alveolar deaffrication (/ʦ/ → /s/, /ʣ/ → /z/). Eventually these changes affected all of Spain, but devoicing happened first in the north, while deaffrication happened first in the south.

It’s my belief that both changes were influenced by the fact that during the late Middle Ages large non-Romance-speaking communities became Spanish-speaking. In the north these were Basques: in the course of the thirteenth and fourteenth centuries the Basque Provinces came under Castilian rule, but with substantial privileges that enabled many Basques to make careers in the Castilian realm. Many of them settled among Castilians and intermarried with them, with a consequent influence on the language. (Those who returned to their homeland in turn brought the Castilian language with them, so that large parts of the Basque Country, including most of Alava and cities such as Bilbao, became Spanish-speaking.)

Now the Basque language has no voiced sibilants but it has two kinds of “s” sound: the alveolar /s/ (spelled Z) and the retroflex /ṣ/ (spelled S). It so happened that [ṣ] was the allophone of /s/ that was already used in northern Castilian, and consequently /z/ devolved to this sound. What happened, then, was that the word casa (house) changed from ['kaza] to ['kaṣa], somewhat close to caxa (box), still pronounced ['kaʃa], while caça (hunt) was still ['kaʦa].

In the south deaffrication took place before devoicing, and so caça became ['kasa] while casa was still ['kaza]. This was the phonetic system that the Jews, expelled from Spain in 1492, took with them to the Eastern Mediterranean lands, and it has so remained in Judaeo-Spanish (also called Ladino) to the present day. It may be that the loss of affricates (other than /ʧ/) in the south was due to the large number of Arabic-speakers (not only Muslims but Jews and Christians as well) who became Spanish-speaking, since Arabic has neither /ʦ/ nor /ʣ/.

When the northern devoicing reached the south, casa became ['kasa], indistinguishable from caça (whose spelling eventually – centuries later – was changed to caza).

Eventually deaffrication reached Burgos, either from the south or from the many French people who passed through there on the pilgrim’s way to Santiago de Compostela, and so caça became ['kasa], with an alveolar /s/ which was now a distinct phoneme from /ṣ/. The three words caça/casa/caxa were now ['kasa]/['kaṣa]/['kaʃa], with the sibilants differing only slightly according to the position of the tongue against the palate (front, middle and back). There was consequently a lot of potential for confusion. (An example: Cervantes’ account, in the first paragraph of Don Quixote, that the knight's surname may have been either Quixada or Quesada.) What the Castilians did to make distinction easier was to exaggerate the extreme positions. For the front position, the tongue was pushed even more forward until it touched the backs of the teeth, resulting in caça becoming ['kaθa]. For the back position the tongue was pulled even more back until the consonant became the velar /x/, and caxa became ['kaxa]. This new sound was spelled X if it came from an original /ʃ/, and J (or G before E or I) if the /ʃ/ devolved from /ʒ/. Later spelling reforms, eliminating variant spellings of the same sound, caused the X to be replaced by J, except in a few vestigial words such as México, mexicano (though in Spain, at one time, a spelling with J was preferred).

Similarly, the new /θ/ was Ç (or C before E or I) if the original was /ʦ/, and Z if it had once been /ʣ/. In modern Spanish Z has replaced Ç, so that the three words are now written caza/casa/caja.
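Since the outcome depends on the order in which the changes arrived, the whole sequence for caça/casa/caxa can be replayed as a short Python sketch. This is a toy model under my own simplifying assumptions: each word is a list of phoneme symbols, each sound change is a plain substitution table, and all the names (apply_changes, NORTHERN_DEVOICING and so on) are invented for the illustration. It covers only these three words; original northern /s/, which was already [ṣ], is not modeled.

```python
# Each sound change maps an old phoneme to its replacement.
NORTHERN_DEVOICING = {"z": "ṣ", "ʒ": "ʃ", "ʣ": "ʦ"}  # /z/ lands on retroflex [ṣ]
SOUTHERN_DEVOICING = {"z": "s", "ʒ": "ʃ"}
DEAFFRICATION = {"ʦ": "s", "ʣ": "z"}
EXAGGERATION = {"s": "θ", "ʃ": "x"}  # front pushed forward, back pulled back

MEDIEVAL = {
    "caça": ["k", "a", "ʦ", "a"],
    "casa": ["k", "a", "z", "a"],
    "caxa": ["k", "a", "ʃ", "a"],
}

def apply_changes(phonemes, changes):
    """Apply the substitution tables one after another, in the given order."""
    for change in changes:
        phonemes = [change.get(p, p) for p in phonemes]
    return "".join(phonemes)

def evolve(changes):
    """Run all three sample words through the same ordered list of changes."""
    return {word: apply_changes(ph, changes) for word, ph in MEDIEVAL.items()}

# North: devoicing first, then deaffrication, then the exaggerated positions.
north = evolve([NORTHERN_DEVOICING, DEAFFRICATION, EXAGGERATION])
# South at the time of the expulsion (Judaeo-Spanish): deaffrication only.
judaeo_spanish = evolve([DEAFFRICATION])
# South after northern devoicing arrived: caça and casa fall together.
south = evolve([DEAFFRICATION, SOUTHERN_DEVOICING])

print(north)           # {'caça': 'kaθa', 'casa': 'kaṣa', 'caxa': 'kaxa'}
print(judaeo_spanish)  # {'caça': 'kasa', 'casa': 'kaza', 'caxa': 'kaʃa'}
print(south)           # {'caça': 'kasa', 'casa': 'kasa', 'caxa': 'kaʃa'}
```

The same two processes, applied in opposite orders, keep three distinct words in the north and merge two of them in the south.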

Because the velar fricative /x/ is similar to the glottal /h/, the fact that the former was now widespread, while the latter had – in the north – survived only in one lexeme, led to this special /h/ being replaced by /x/, and this is how hoder came to be joder.

In the south the assimilation went the other way around: when the northern shift from /ʃ/ to /x/ got there, the /h/ (derived from /f/) was still pronounced, and so the new /x/ became [h], a variant that has become typical of Andalusia and parts of Latin America, especially the Caribbean. Eventually the old /h/ was also largely lost (though it survives in parts of Andalusia), except in a few words mainly associated with Gypsy culture. These words, then, also came to be spelled with J: jondo (‘deep’ when describing Flamenco singing, originally hondo), juerga (‘wild party’ or ‘binge,’ from huelga, originally ‘leisure’ or ‘idleness,’ now mainly ‘strike’).

Sundry uses of H

In addition to the quasi-diacritic function, several European languages have developed idiosyncratic uses of H.

In Catalan, until the adoption of the new orthography of Pompeu Fabra in 1913, it was necessary to append an H to a word-final C. This convention had no effect on the pronunciation, which was /k/, and may have been developed by medieval scribes as a way of differentiating word-final C from Ç (pronounced /s/ or, much earlier, /ʦ/). An H was also often placed between two consecutive vowel letters, as in rahó (modern spelling raó, ‘reason’). The spelling change eliminated the H from common words and from most place-names, but in some cases the old spelling remained in the Spanish form of the name; for example, Vic was called Vich in Spanish until 1982. In any case the CH has remained in surnames such as Samaranch and Domènech, as well as in such names as Bach, March and Bosch, which look English or German and are often pronounced as if they were so by non-Catalan-speakers. The usual Spanish-speakers’ pronunciation of this CH is /ʧ/, as if it were Spanish, though Spanish has no word-final CH. But Juan Bosch (1909–2001), the Dominican writer and politician, is generally known as [boʃ] to his fellow Dominicans; only those familiar with Catalan, to my knowledge, call him [bosk].

German, some time in the early modern period (Frühneuhochdeutsch), began to use H to mark a lengthening of a preceding vowel. It is not the only such marker – a doubling of the vowel letter or simply its position may do the same – so that there is no difference between the vowels of Jahr, Haar and Bar; of wer and Wehr, Meer and mehr; of ihr, wir and hier; and so on.

Other languages (including English, French and Italian) use this device occasionally in interjections such as oh and ah. In English, in which vowel letters have no fixed value, the H in ah (and hah), eh (and feh, meh), and uh (and huh) also indicates that the vowel is different from what it would have been without the H. The interjections oh and O, while they sound the same, have different uses.

Word-final -ah is also found in English, especially in words from Middle Eastern or South Asian languages (for example mullah, hookah, purdah, verandah), originally intended to indicate that this vowel is to be pronounced /ɑ/ rather than neutralized to /ə/, though the effect has generally been lost.

In the name Mariah, the H indicates the pronunciation [mə'rajə] as distinct from Maria [mə'riə]. This pronunciation, as well as that of pariah, may be influenced by the fact that the ending -iah [ajə] is frequent in Old Testament names (Jeremiah, Hezekiah, Josiah). In these names, along with many others (such as Judah, Sarah, Elijah, Shiloh), the H has no effect on pronunciation per se and is meant to indicate the presence of a ה in the Hebrew original, in contrast to such names as Ezra, Elisha and Bathsheba, where there is no ה. It may well be that the King James translators wanted to show in this way that their version was taken “out of the original tongue,” in contrast to others who translated from the Latin and who wrote the names as Jeremias, Judas, Sara and so on.

In Italian, word-initial H (silent, of course) is used to distinguish between some forms of the verb avere and their homophones: ho (‘I have’) and o (‘or’), hai (‘you [sg.] have’) and ai (‘to the [pl.]’), ha (‘has’) and a (‘to’), hanno (‘they have’) and anno (‘year’).


Enough of Germanic and Romance. Let me get to Slavic, of special interest to me since Polish is my native language.

In southern Slavic H is the equivalent of Cyrillic X and represents the common Slavic phoneme /x/. It’s also used to transcribe /h/, /x/ and /χ/ from other languages. For example, when Johann Sebastian Bach is written in Cyrillic (Serbian, Macedonian or Bulgarian), both the H of Johann and the CH of Bach are transliterated as X, and when Serbian is written in Latin script, the name becomes Johan Sebastijan Bah. (Croatian and Slovenian use the original German spelling, and this difference is in fact the chief distinction between written Serbian – which observes the principle of one-to-one equivalence between Cyrillic and Latin characters – and Croatian.)
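The one-to-one principle of written Serbian makes the conversion from Cyrillic to Latin script purely mechanical, as the following Python sketch suggests. The table is the standard correspondence between the two Serbian alphabets (with lj, nj and dž each counting as a single Latin letter); the function name serbian_to_latin is my own, and the sketch deliberately ignores refinements such as all-capitals text.

```python
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

def serbian_to_latin(text):
    """Replace each Serbian Cyrillic letter with its Latin counterpart."""
    out = []
    for ch in text:
        latin = CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)                  # spaces, punctuation, other scripts
        elif ch.isupper():
            out.append(latin.capitalize())  # Љ becomes Lj, not LJ
        else:
            out.append(latin)
    return "".join(out)

print(serbian_to_latin("Јохан Себастијан Бах"))  # Johan Sebastijan Bah
print(serbian_to_latin("Христос"))               # Hristos
```

Both the H of Johann and the CH of Bach pass through Cyrillic Х and come out as Latin H.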

The same phoneme /x/ occurs in northern Slavic, generally realized as [x] except that in upper Sorbian the realization is an aspirated ‘k’ sound, [kh]. In the eastern branch (Russian, Ukrainian and Belarusian, all of which use Cyrillic) it’s likewise represented by X. But in the western languages (Czech, Slovak, Sorbian and Polish), which use Latin script, it’s written CH, as it also has been by those who – perhaps due to being brought up under Polish rule – use Latin script to write Ukrainian (Łatynka) or Belarusian (Łacinka), though the usual English transliteration from these languages is KH, as from Russian. CH/X is used consistently in northern Slavic in the transcription of foreign words containing /x/ or /χ/.

All the western Slavic languages also include H in their alphabets, and use it to write foreign words with /h/. The value of H in Czech and Slovak is the “voiced h” phoneme /ɦ/, which corresponds (as does its Upper Sorbian value, voiceless /h/) to /g/ in Russian, Polish, Lower Sorbian and southern Slavic. In the former group of languages /g/ (written G) appears only in non-native names and relatively recent loan words.

Ukrainian and Belarusian have the same /ɦ/, written Г (the character for /g/ in Russian and southern Slavic) and transliterated into Latin script as H. Here, too, this character is used to transcribe /h/ from foreign languages (Johann is Iaгaн or Ёгaн in Belarusian and Йoгaнн in Ukrainian), and /g/ appears only in foreign names and recent loan words. For this sound Belarusian uses the same Г, but Ukrainian has developed the special character Ґ.

What Russian does with the transcription of /h/ is inconsistent: sometimes (especially in names) as /g/ (written Г), other times as /χ/ (written X). Heil Hitler, for example, is transliterated as Xaйль Гитлep.

This leaves Polish, and this is where I have a personal problem (about which I've written in a recent blog post).

All writings on Polish phonology assert that in Polish H is pronounced /x/, just like CH. But this is not how I learned the language as a child in Łódź and Piotrków just before and during World War II. In my native version of Polish, H is pronounced /h/; the other pronunciation was, as I remember, regarded as “peasant talk” (po chłopsku).

It may be that my pronunciation was conditioned by the fact that the Polish-speaking environment of my childhood was pretty much entirely Jewish. Polish Jews probably needed the distinction between /x/ and /h/ because it’s necessary in Yiddish and Hebrew (or perhaps, conversely, it’s a holdover from Yiddish). I distinctly remember that there was no confusion between women named Hana (a Polish name) and Chana (a Jewish one).

Polish, incidentally, is the only Slavic language written in Latin script to use CH in the name of Christ (Chrystus); all the others use K, since in Latin the sound of CH became /k/ early in the Christian era, if not before. The languages written in Cyrillic, on the other hand, are spoken primarily by populations that are historically Greek Orthodox. Accordingly, they use the Greek Xριστóς and write it as Xpистoс (Xpыстoс in Belarusian). This means of course that in Latin-scripted Serbian (latinica) it’s Hristos, and here another difference between Serbian and Croatian shows up: in Croatian it’s Krist. Correspondingly, Christianity is Hrišćanstvo and Kršćanstvo, respectively.

December 6, 2008

Addendum (October 5, 2014): An interesting account of the use of H in the Fula language of West Africa, by Don Osborn, can be found here.

© 2008 by Jacob Lubliner

