Third draft, Peter Kirk, August 2003
The Hebrew block of the Unicode Standard (http://www.unicode.org/charts/PDF/U0590.pdf) is intended to include all of the characters needed for proper representation of Hebrew texts from all periods of the Hebrew language, including fully pointed and cantillated ancient texts such as that of the Hebrew Bible. It is also intended to cover other languages written in Hebrew script, including Aramaic as used in biblical and other religious texts1 as well as Yiddish and a few other modern languages.
In practice there are a number of issues and minor deficiencies in the Hebrew block as currently defined, in version 4.0 of the Unicode Standard (http://www.unicode.org/versions/Unicode4.0.0/), which affect its usefulness for representation of pointed Hebrew texts and of Hebrew script texts in some other languages. Some of these simply require clarification and agreed guidelines for implementers. Others require further discussion and decision, and possibly additions to the Unicode standard or other action by the Unicode Technical Committee. The conclusion reached in this paper is that two new Unicode characters should be proposed; other issues can be resolved by use of suitable sequences of existing characters, provided that such use is generally agreed by content providers and rendering systems.
Several of these issues relate to different typographical conventions for publishing of Hebrew texts. It seems that a particular set of conventions is used for general publications in Hebrew, especially in Israel, but various other conventions, in which more fine distinctions are made, are used mainly for quality editions of biblical and other religious texts. A major aim of this paper is to document these different conventions, and to define ways in which the finer distinctions made in the latter conventions can be supported in Unicode without increasing complexity for those who use the former set of conventions.
The text below refers in several places to Convention A# and Convention B#, where # stands for a digit. These are independent conventions for each issue. For each issue Convention A# is the one commonly used for general publication in Israel, and Convention B# is the one used in BHS2, the most widely used scholarly edition of the Hebrew Bible. But there are many other publications which use some of the Conventions A# and some of the Conventions B#.
For conciseness abbreviated names are used for Unicode characters and sequences. For the full names and Unicode code points, see Appendix A.
These issues may be encountered in the encoding of any Hebrew text with vowel points, including modern Hebrew texts in which vowel points are written as an aid to pronunciation. They may also apply to texts in other languages written in Hebrew script.
The vowel point holam consists of a dot which is usually placed either above the top left corner of any Hebrew base character or a little further to the left, above the space between this base character and the following (to the left) one, to indicate a long O sound following that of the base character. By Convention B1 but not by Convention A1, when the base character is alef and is not word initial, the point holam may also appear above its top right corner; this also indicates a long O sound, and the alef is not pronounced. This holam is not in fact logically associated with the alef, but is associated with the preceding base character. It is shifted from above and to the left of the preceding (to the right) letter to above the top right of the alef as a typographical convention. This shift generally takes place only when the alef is silent, which is when there is no vowel point or dagesh combined with it and it is not followed by vav shruqa or holam male.

bəzō’t
yē’ōtû “on this [they]
will agree”, from Genesis 34:22 BHS3
<bet,
sheva,
dagesh,
telisha
gedola,
zayin,
holam,
alef,
tav,
space,
yod,
tsere,
alef,
holam,
qadma,
tav,
masora
circle,
vav,
dagesh>4
Holam
above the right of alef
(right hand word) and above the left of alef
(left hand word)
In principle the options for encoding of collocations of holam and alef are the same as for collocations of holam and vav as described below. In practice the encoding described here is already in general and uncontroversial use, and so there is no good reason to change it. However, if option 5 below is chosen for holam and vav, it might be sensible to use the same new character also when holam appears above the right of alef.
When a collocation of alef and holam is intended to be pronounced as the consonant sound alef followed by the vowel sound holam, and the holam is intended to be positioned above and to the left of alef, the collocation should be encoded as <alef, holam>. When Convention B1 is in use and a collocation of alef and holam is intended to be pronounced as the vowel sound holam alone with the alef not pronounced, with the holam positioned over the top right of alef, the collocation should be encoded as <holam, alef>, with the holam combined with the preceding base character. Where, exceptionally, a holam should not be shifted on to a following alef regardless of the Convention, the encoding <holam, ZWNJ, alef> should be used. Where, exceptionally, a holam should be shifted on to a following alef, when Convention B1 is used, in a context where this shift would not normally happen, the encoding <holam, ZWJ, alef> should be used. The encoding <RLM, holam, alef>5 may be used for special purposes to represent an isolated or word initial alef with holam above its top right, but will be rendered as such only by Convention B1.
Convention A1
No special rendering is required for collocations of alef and holam, as holam is never shifted on to following alef. The sequence <RLM, holam, alef> should be rendered as an isolated holam followed by an alef.
Convention B1
When a composite character sequence including holam is followed by alef, the holam should be shifted in rendering from its regular position and rendered above the top right of the alef. But this shift should not take place when the alef is combined with any vowel point or with dagesh. It should also not take place when the alef is followed immediately by vav shruqa6 or holam male. It should never take place when the alef is preceded by ZWNJ, but always when the alef is preceded by ZWJ. The sequence <RLM, holam, alef> should be rendered as an alef with a holam above its top right. When a holam follows an alef within the same composite character sequence, the holam should be rendered in the regular way above and to the left of the alef.
Furtive patah is a patah or short A vowel sound pronounced in Hebrew before the consonants ayin, het, and he with dagesh, when these are word final or followed by maqaf. Although furtive patah is pronounced before the word final consonant, it is represented by a patah glyph positioned under this final consonant. By Convention B2 but not by Convention A2, furtive patah is positioned under the right side of the final consonant, thus distinguishing it from regular patah which is centred under the consonant. In Hebrew any patah which appears under the final base character of a word, or under a base character followed by maqaf, is a furtive patah, but this rule may not apply to other languages written in Hebrew script.7

wəhôkiax
“and [he] was complaining”, from Genesis 21:25 BHS
<vav,
sheva,
he,
holam,
vav,
kaf,
hiriq,
merkha,
masora
circle,
het,
patah>
Furtive
patah
displaced to the right
Furtive patah is generally encoded in Unicode and in legacy encodings as patah following the word final base character, although this does not correspond to the pronunciation order. Any change to a more logical encoding would further complicate the issue of multiple vowel points described below, and so no change is suggested.
Furtive patah should be encoded as patah following the word final consonant.
Convention A2
Furtive patah is rendered as any other patah.
Convention B2
Patah should be shifted from its regular place to below the right side of the base character when this base character is one of ayin, he and het, and it is the last base character in a word or the next base character is maqaf. This Convention is suitable for Hebrew but may not be suitable for every other language written in Hebrew script.
By Convention B3, the vowel point holam may appear in two positions relative to the base letter vav. The first position is above its top left or a little further to the left; the second position is above its top right or its top centre.8 Thus a similar distinction is made to the one with alef. However, because vav is a narrow letter, the typographical distinction between these forms is often small. Sometimes, to make the distinction more clear, holam which would otherwise appear above the top left of vav is shifted further to the left, to over the space between the vav and the following base character. As with holam and alef, so similarly with holam and vav: holam above and to the left of vav is pronounced as a long O sound following the V or W sound of the vav; holam above the top right or centre of vav is pronounced as a long O sound but the vav is not pronounced.9 This holam above the top right or centre of vav is not in fact logically associated with the vav, but is associated with the preceding base character. It is shifted from above and to the left of the preceding (to the right) letter to above the top right of the vav as a typographical convention, to form what is sometimes considered to be a compound character, known in Hebrew as holam male. This shift takes place only when the vav is silent, which is when there is no vowel point or dagesh combined with it and it is not followed by vav shruqa or holam male.
By Convention A3, no distinction is made between the two positions of holam above vav, so that holam male and consonantal vav with holam are graphically identical.

gādôl
‘ăwōnî “great is my
iniquity”, from Genesis 4:13 BHS
<gimel,
qamats,
dagesh,
dalet,
holam,
merkha,
vav,
lamed,
space,
ayin,
hataf
patah,
vav,
holam,
nun,
hiriq,
tipeha,
yod>
Holam
above the right of vav,
i.e. holam
male
(right hand word), and holam
above the left of vav
(left hand word)10

The
words ’im-šāmôa‘
“if hearing” and ləmiṣwōtayw
“to his commandments”,
from Exodus 15:26 in a facsimile
of the Codex Leningradensis (dated 1008-9 CE)11
<alef,
hiriq,
final
mem,
maqaf,
shin,
qamats,
shin
dot,
mem,
qadma,
holam,
vav,
ayin,
patah>
<lamed,
sheva,
mem,
hiriq,
tsadi,
sheva,
vav,
holam,
tav,
qamats,
zaqef
qatan,
yod,
vav>
Holam
above the right of vav,
i.e. holam
male
(upper word), and holam
above the left of vav
(lower word)
Also, furtive patah
displaced to the right under the last letter of the upper word

ba‘ăwōnô
“in his iniquity”, from Joshua 22:20 in the Aleppo Codex
(dated c. 900 CE)
<bet,
patah,
dagesh,
ayin,
hataf
patah,
vav,
holam,
nun,
holam,
meteg,
vav>
Holam
above the left of vav
(third letter from the end (left)) and holam
male
(last letter, with holam
well to the right)
Exceptionally, when a collocation of holam and vav occurs within the divine name as written in the Bible text, i.e. in a word whose base characters are <yod, he, vav, he> (sometimes preceded by other base characters and/or followed by maqaf and another word), holam may appear above the top right of vav when there is another vowel point combined with and positioned below the same vav. See also section 3.1 below. Other exceptional collocations may occur in some texts.

The
divine name, from Genesis 3:14 BHS
<yod,
sheva,
he,
holam,
ZWJ,
vav,
qamats,
qadma,
he>
Holam
above the right of vav
and qamats
below it
Six options have been considered for encoding and rendering of collocations of holam and vav. The preferred option, which is described here, is based on the consensus reached in discussions on the Unicode Hebrew list. This option treats collocations of holam and vav in the same way as collocations of holam and alef. Arguably, it corresponds best to the historical development of Hebrew script and the linguistic properties of the Hebrew language. The other options are described in Appendix B, section B.1.
The following guidelines should be used for texts which may be rendered according to either Convention A3 or Convention B3:
When a collocation of vav and holam is intended to be pronounced as the consonant sound vav followed by the vowel sound holam, and the holam is intended to be positioned above and to the left of vav when Convention B3 is used, the collocation should be encoded as <vav, holam>. When a collocation of vav and holam is holam male, and so intended to be pronounced as the vowel sound holam alone with the vav not pronounced, the collocation should be encoded as <holam, vav>, with the holam combined with the preceding base character. Where, exceptionally, a holam followed by a vav with no vowel should not be taken together as holam male, the encoding <holam, ZWNJ, vav> should be used. Where, exceptionally as for example in the divine name as shown above, a holam should appear above the top right of a following vav in a context where holam male would not normally be formed, the encoding <holam, ZWJ, vav> should be used. An isolated or word initial holam male, which might be used for example in a list of Hebrew characters or possibly in other languages written in Hebrew script, should be encoded as <RLM, holam, vav>.12
The following guidelines may be used for texts which are intended for rendering only according to Convention A3, and in practice have been used for existing modern Hebrew texts:
All collocations of vav and holam should be encoded as <vav, holam>.
A possible mechanism for requesting a distinctive rendering of consonantal vav with holam with rendering systems using Convention A3, to distinguish them from the commoner holam male, is to encode such collocations as <vav, ZWNJ, holam>.13 This should be understood as a request for the holam not to be rendered above the vav at all but to be moved to the left and above it. This encoding is recommended for use only where there is no other way to make the required distinction.
Convention A3
A general rendering system for Convention A3 should follow the same rendering guidelines as for Convention B3 below, except that the position of the shifted holam should be the same as for rendering of <vav, holam> sequences. If a rendering system is intended to render only texts in which all collocations of vav and holam are encoded as <vav, holam>, no special rendering is required. In any case the sequence <vav, ZWNJ, holam> (with accents possibly inserted) should be rendered as a vav with a holam to the left of it and above.
Convention B3
When a composite character sequence including holam is followed by vav, the holam should be shifted in rendering from its regular position and rendered above the top right of the vav. But this shift should not take place when the vav is combined with any vowel point or with dagesh. It should also not take place when the vav is followed immediately by vav shruqa14 or holam male. It should never take place when the vav is preceded by ZWNJ, but always when the vav is preceded by ZWJ. The sequence <RLM, holam, vav> should be rendered as a vav with a holam above its top right. When a holam follows a vav within the same composite character sequence, the holam should be rendered in the regular way above and to the left of the vav. The sequence <vav, ZWNJ, holam> (with accents possibly inserted) should be rendered as a vav with a holam to the left of it and above, i.e. in many cases further to the left than the regular positioning of a holam above the top left of a vav.
When a composite character sequence including holam is followed by a composite character sequence including shin and shin dot, by Convention A4 but not Convention B4 the holam dot is notionally shifted from above and to the left of the preceding (to the right) letter to the top right of the shin and merged with the shin dot which is in the same position; this may be in effect equivalent to deletion of the holam, although the merged dot is not necessarily rendered exactly like shin dot. This shift does not apply to any holam which should be positioned above the top right of alef or vav as above.
Holam should normally be encoded in such cases even though it may not be visibly rendered. If a content provider wishes to ensure that such a holam is never displayed with a particular text, regardless of the Convention, the holam may be omitted.
Convention A4
Holam should not be rendered, i.e. simply omitted, above and to the left of any base character when followed by a composite character sequence including shin and shin dot. The rendering of shin dot may be adjusted e.g. replaced by the glyph for holam. Rendering of holam above the top right of any base character should not be adjusted.
Convention B4
The holam should be rendered in its regular position.
When a composite character sequence includes shin, sin dot and holam, by Convention A5 but not Convention B5 the holam point is notionally merged with the sin dot which is in the same position; this is in effect equivalent to deletion of the holam although the merged dot is not necessarily rendered exactly like sin dot. (Probably Convention A5 goes with Convention A4 and Convention B5 with Convention B4).
Holam should normally be encoded in such cases even though it may not be visibly rendered. If a content provider wishes to ensure that such a holam is never displayed with a particular text, regardless of the Convention, the holam may be omitted.
Convention A5
Holam should not be rendered at all when part of a composite character sequence including shin and sin dot. The rendering of sin dot may be adjusted e.g. replaced by the glyph for holam. This guideline should take precedence over any merging with shin dot. Where shin with sin dot is followed by holam male, the holam male should be rendered fully i.e. including the holam dot; but where shin with sin dot is followed by holam and alef, merging of the holam with the sin dot should take precedence over the shift of the holam according to Convention B1.
Convention B5
The holam should be rendered adjacent to the sin dot, or shifted according to the normal rules when followed by alef or vav.
Certain Hebrew punctuation marks are not correctly described in Unicode 4.0.
Sof pasuq is used to indicate the end of a verse in the Hebrew Bible (although it is missing from the end of a few verses in some texts, and completely absent from some others) and as the equivalent of a full stop in other Hebrew writings such as prayer books. It should be classed and processed as Terminal_Punctuation and as Sentence_Terminal.
Paseq is also used only at the ends of words, and so should also be classed as Terminal_Punctuation, but not as Sentence_Terminal. Paseq has two uses, one as part of the Hebrew accent system and the other as a special textual mark in the Hebrew Bible; it is normally found only in the Hebrew Bible and in quotations from it.
Maqaf is also generally considered to be a word divider and so should also be classed as Terminal_Punctuation. As its usage is analogous to that of hyphen and line breaks commonly occur after it in pointed Hebrew texts, it should also be listed in Unicode Standard Annex #14, along with hyphen, as a “break opportunity after”.
In post-biblical but pre-modern Hebrew texts, including the Masora (mediaeval editorial notes) which are printed with many Hebrew Bibles, numbers are indicated by Hebrew letters. Commonly but not always, one dot is placed above each of these letters to indicate this numerical use; sometimes two dots are used to indicate numbers greater than 1000.15 The same dots are apparently used for other purposes in the Masora, such as indicating abbreviations. These dots do not normally co-occur with other Hebrew points.

The
number 134, indicated by number dots, from the introduction (p. XV)
to BHS
<qof,
dot
above,
lamed,
dot
above,
dalet,
dot
above>
There are no specific Unicode characters for these functions, but it is possible to use general purpose combining diacritical marks. (See section 3.6 below concerning the suitability for this purpose of the Unicode Hebrew mark upper dot.) And so it is proposed that the single number dot should be represented by dot above and the double number dot by diaeresis. Rendering systems need to be able to handle this use of general purpose diacritical marks with Hebrew base characters.
For an alternative involving use of upper dot, see Appendix B, section B.5.
Hebrew grammars, dictionaries and pedagogical materials commonly use a mark similar in shape to < (the less-than sign) positioned above a base character to indicate the stress accent in Hebrew words. This mark has a glyph and positioning similar to the accent ole used in the cantillated Bible text. The simplest suggestion here is to use the character ole also as the generic accent.
The generic accent should be encoded as ole.
Rendering systems intended for rendering of pointed Hebrew text should render ole appropriately even if they do not support the fully cantillated biblical text.
These issues are known to apply to the Hebrew Bible text. Some of them may also apply to other pre-modern texts in Hebrew and Aramaic. They are not normally relevant to modern works, except perhaps to quotations.
The Hebrew Bible text was originally written with consonants only, and with some vowels represented, but not fully specified, by yod, vav and he, perhaps also alef. This consonantal text came to be regarded as sacred and unchangeable. In parallel a tradition of reading aloud and chanting the text developed. After many centuries efforts were made to codify this pronunciation tradition by adding marks to the consonantal text, to indicate in detail vowels and chanting (cantillation) patterns, without actually changing the consonants. The result is the current Hebrew Bible text, consisting of consonants as base characters, and vowels and accents as combining marks (and a few as base characters e.g. maqaf, paseq).
In some cases there was a mismatch between the consonantal text and the pronunciation tradition, such that a word to be pronounced, known as Qere, required different consonants from the word written in the text, known as Ketiv. In such cases the consonants of the Qere were written in the margin of the text but the vowel points to be used to pronounce it were arranged around the unchangeable Ketiv consonants. The resulting form in the text was a hybrid, never intended to be pronounced as written. In some extreme cases it would in fact be unpronounceable, as for example in a few cases where there is a Qere word with no corresponding Ketiv, represented in the biblical text as blank space (or in some editions an asterisk or a similar glyph) surrounded by vowel points and accents.16

The
vowel
points
of the Qere ’ēlay “to
me”, from Ruth 3:17 BHS
<RLM,
NBSP,
masora
circle,
NBSP,
tsere,
NBSP,
patah,
zaqef
qatan>
Qere
without Ketiv
When a particular Qere form is used regularly with the same Ketiv, the Qere is not written in the margin but is assumed to be known, as a perpetual Qere. All that is visible is the sometimes anomalous combination of the Ketiv consonants and the Qere vowels. One well known example is the divine name, written with the Ketiv consonants yod, he, vav, he but with the Qere vowels sheva, holam, qamats or sheva, holam, hiriq taken from one of two alternative words used in pronunciation – although often the holam is not actually written. (The example of the divine name given above, from Genesis 3:14, is one of the exceptional cases in BHS where the holam is written.) For another important example of perpetual Qere, see section 3.3 below.
Various approaches may be taken to encoding each case of Qere and Ketiv. It would be possible to encode only the Qere, or only the Ketiv, or only the mixed form actually seen in the text. If the Ketiv is encoded, it may be left unpointed, or points may be added according to a reconstruction of its intended pronunciation. In cases of perpetual Qere the mixed forms are commonly encoded. Alternatively, any two or three of these options may be combined, with a higher level mark-up to indicate which form is which. The mixed forms in the text may be highly anomalous, and so the correct encoding for them may be unclear. One approach in cases like Qere without Ketiv is to encode NBSP in place of each missing base character followed by the associated diacritical marks; in this case the word should be preceded by RLM to ensure correct directionality as NBSP is directionally neutral. See also section 3.3 below.
Rendering systems should be aware that anomalous combinations are likely to occur, especially in mixed Ketiv and Qere forms. So they should not be unduly restrictive in failing to render properly sequences which are not normally accepted in Hebrew.
For historical reasons, in the two passages in the Hebrew Bible which list the Ten Commandments (Exodus 20:2-17 and Deuteronomy 5:6-21) two different pronunciation traditions are recorded in the pointing. The main consequence of this is that two different accents (cantillation marks) are attached to many of the words in these passages, and as the accent is generally positioned above or below the stressed syllable in the word, in most cases the two different accents are combined with the same base character. When two accents on the same base character would otherwise be in the same position, they are displaced sideways so that one is to the left of the other. Also both rafe and dagesh are combined with some base characters in these passages. For one unique word in Exodus 20:4 with two vowel points, see section 3.3 below.

mittāaxat
“from below”, from Exodus 20:4 BHS
<mem,
hiriq,
tav,
dagesh,
qamats,
etnahta,
CGJ,
patah,
geresh,
het,
patah,
tav>
Two
vowel
points
and two accents, also dagesh,
combined with one base character
These passages should be encoded with two accents combined with a single base character where necessary. When the two accents would normally be in the same position, they have the same canonical combining class and so their relative ordering is significant; these accents should be encoded such that the one to the right precedes the one to the left. Co-occurrence of rafe and dagesh causes no problem as they have different canonical combining classes and do not interfere typographically.
Rendering systems should expect to have to render two accents combined with a single base character, and if necessary to adjust their positioning relative to one another, and relative to vowel points, as required. A rather small set of combinations is actually attested in the biblical text, and so positioning algorithms can be tested exhaustively. Rendering systems should also expect to render rafe and dagesh, in either canonically equivalent order, combined with the same base character.
The general rule in the Hebrew writing system is that no more than one vowel point is combined with any one base character (and indeed that, except at the end of a word or before vav shruqa or holam male, exactly one vowel point should be combined with each base character which is not silent).
The few exceptions to this rule are in fact almost all cases where the consonants of a Ketiv word have been combined with the vowel points of a Qere word. But this is not immediately apparent in the one repeated example of perpetual Qere in which two vowel points appear with one consonant, because the mixed form with two vowel points is the only one which appears in standard electronic texts. This repeated example is the biblical form of the name of the city Jerusalem, written with the base characters <yod, resh, vav, shin, lamed, final mem>, with both patah (or sometimes qamats) and hiriq combined with the lamed, but pronounced, as written in modern Hebrew, as if ending <..., lamed, patah (or qamats), yod, hiriq, final mem>. This form is encoded in existing standard texts as it is printed, as <..., lamed, patah (or qamats), hiriq, final mem>, so with two vowel points combined with the lamed. There are also a few variant forms of this word with a directional he suffix and sheva replacing hiriq, so encoded <..., lamed, patah (or qamats), sheva, mem, qamats, he>.17

yərûšālaim,
the name of Jerusalem, from Joshua 10:1 BHS
<yod,
sheva,
resh,
vav,
dagesh,
shin,
qamats,
shin
dot,
lamed,
patah,
revia,
CGJ,
hiriq,
final
mem>
Sequence
of patah
and hiriq

yərûšālaəmāh,
the name of Jerusalem with suffixed he,
from 1 Kings 10:2 BHS
<yod,
sheva,
resh,
vav,
dagesh,
shin,
qamats,
shin
dot,
masora
circle,
lamed,
patah,
revia,
CGJ,
sheva,
mem,
qamats,
he>
Sequence
of patah
and sheva
Other mixed forms of Ketiv consonants and Qere vowel points have multiple vowel points with a single base character, as well as other more complex anomalies as described above in section 3.1 above.
A very few other anomalous cases are found in Hebrew Bible texts. For example, in the BHS Hebrew Bible and in the Leningrad Codex on which it is based, the fourth base character of the first word of 2 Kings 21:26 carries both sheva and holam. This was probably originally a scribal error as many manuscripts lack the sheva, but it has been perpetuated in a standard scholarly edition and in electronic texts based on it.

wayyiqəbəōr
“and he buried”, from 2 Kings 21:26 BHS
<vav,
patah,
yod,
hiriq,
dagesh,
qof,
sheva,
bet,
sheva,
dagesh,
merkha,
CGJ,
holam,
resh>
Sheva
and holam
with the same base character
In just one unique word in the Ten Commandments, the twelfth word (counting maqaf as a word divider) in Exodus 20:4, two different vowel points, qamats and patah, as well as dagesh and two accents are combined with a single base character tav. In this word only, the two vowel points represent alternative pronunciations and are not to be pronounced sequentially.18
Encoding of these cases presents a problem because these pairs of vowel points combined with a single base character are combining characters with different canonical combining classes, and so each pair is canonically equivalent to the same pair reversed. In fact, by chance, most of the attested pairs are in descending order of combining class and so are reversed by Unicode normalisation processes. It seems the possibility of multiple vowel points on a single base character was ignored by those who defined the combining classes for the vowel points. Unfortunately it is not possible to adjust the combining classes so that they meet the requirements of encoding such cases; these adjustments are prohibited by the Unicode Stability Policy because they would destabilise the normalisation of existing texts.
Several proposals have been put forward for working around this problem. Of these various proposals, the most promising is based on the character CGJ, which is a combining character with no visual representation. As its canonical combining class is zero it inhibits canonical reordering of combining characters between which it is encoded. The alternative proposals are outlined in Appendix B, section B.2.
The recommended encoding for any pair of vowel points combined with a single base character is for CGJ to be inserted between them, at least if the ordering of the pair is significant, either semantically or visually, and is different from the canonical ordering. Thus, for example, the city name Jerusalem should be encoded <..., lamed, patah (or qamats), CGJ, hiriq, final mem>; and the special case word in Exodus 20:4 should be encoded with a composite character sequence <tav, dagesh, qamats, etnahta, CGJ, patah, geresh> – hopefully the longest composite character sequence in the Hebrew Bible.
Note: In any Hebrew composite character sequence including CGJ, any of dagesh, rafe, sin dot, and shin dot should be encoded before the CGJ. This is for the purpose of collation, to ensure that after normalisation these characters are suitably positioned relative to the base character for collation contractions to be applied.
Rendering systems should simply ignore CGJ when it appears between two vowel points, and render any pair of vowel points, if both are positioned below the base character, in right to left order below the base character. In the case of the city name Jerusalem, one practice, as in BHS, is to centre the patah or qamats under the lamed and shift the hiriq to below the right side of the final mem. In the special case word in Exodus 20:4, the three combining marks qamats, etnahta, patah should all appear in right to left order under one base character.
The point meteg, which is really part of the Hebrew accent system but is sometimes used e.g. to indicate secondary stress in texts which are not fully cantillated, is a short vertical line which is positioned below a base character and generally to the left of any vowel point below the base character. Note that the vowel point is not generally shifted from its centred position below the base character. See section 3.5 below for special rules with the hataf vowels.
In a minority of cases in some manuscripts and printed texts of the Hebrew Bible, meteg occurs not to the left but to the right of a low positioned vowel point. This is especially common at the beginning of a word. The distinction between these two positions seems to be not semantic but only graphical, and is not made in Convention A6. Nevertheless it is important to retain the distinction in scholarly texts, using Convention B6.19

wayəhî-kēn
“and it was so”, from Genesis 1:7 BHS
<vav,
meteg,
CGJ,
patah,
yod,
sheva,
he,
hiriq,
yod,
maqaf,
masora
circle,
kaf,
tsere,
meteg,
final
nun