SWML SignWriting Markup Language

5. Matching procedure for searching in sign language texts

Having document servers able to serve sign language texts stored in SWML would be almost useless if one couldn't define searching procedures for such texts.

However, there is a major problem in searching sign language texts: dealing with the small graphical variations that people can introduce in the way they write the same signs. The SignWriting system explicitly distinguishes some graphical properties of the symbols, such as rotation and flop, but it neither distinguishes nor identifies tiny variations due to vertical and/or horizontal displacements of symbols within signs, because such values are not discretized in the system (as opposed to, e.g., rotation, which can assume only a small set of discrete values).

The solution we have found, which lets the user control the criteria used for judging the similarity of two signs, is to define a kind of degree of similarity between the component symbols of a sign, ensuring that two corresponding symbols have the same symbol type, rotation and flop, while allowing some variation in their relative positions within the respective signs. This kind of similarity is formalized here as a parameterized, reflexive and symmetric relation, which we call the sign similarity relation.
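As a rough illustration of the idea just described, the following Python sketch compares two symbols placed within their signs. It is not the formal relation itself: the record `PlacedSymbol`, the function `symbols_similar`, and the single pixel `tolerance` parameter are all assumptions made for this example, standing in for whatever parameterization is actually chosen.

```python
from typing import NamedTuple, Tuple


class PlacedSymbol(NamedTuple):
    """Hypothetical record: a symbol plus its position inside its sign."""
    symbol_type: Tuple[int, int, int]  # assumed encoding of the symbol type
    rotation: int
    flop: bool
    x: int  # horizontal position within the sign
    y: int  # vertical position within the sign


def symbols_similar(a: PlacedSymbol, b: PlacedSymbol, tolerance: int) -> bool:
    """True if a and b agree on type, rotation and flop, and their
    positions differ by at most `tolerance` in each direction."""
    same_kind = (a.symbol_type, a.rotation, a.flop) == (
        b.symbol_type, b.rotation, b.flop)
    close = abs(a.x - b.x) <= tolerance and abs(a.y - b.y) <= tolerance
    return same_kind and close


a = PlacedSymbol((0, 1, 1), 1, False, 23, 28)
b = PlacedSymbol((0, 1, 1), 1, False, 25, 27)
print(symbols_similar(a, b, tolerance=2))  # positions differ by 2 and 1
print(symbols_similar(a, b, tolerance=1))  # horizontal difference too large
```

Because the test uses absolute differences, this sketch is reflexive for any non-negative tolerance and symmetric in its two arguments, which matches the two properties required of the relation.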

5.1. Basic Definitions and Geometric Considerations

Each SignWriting symbol can be characterized, in principle, by three basic features: its shape number, its filling information, and its so-called variation.

Let s be a SignWriting symbol. Let t be its symbol shape, let f = 0,1,...,5 be its filling information (which, e.g., codes both the way the hands are facing the signer and whether they are in a horizontal or a vertical plane), and let v, the variation, be complementary information about the symbol. The identification tuple for symbol s can thus be defined as the tuple ids = (t,f,v), allowing us to identify the symbol with its identification tuple.

A set of symbols that represent the same essential linguistic information, having the same symbol shape and differing only in their filling or variation information, is called a symbol group. Each symbol group is an equivalence class represented by the basic symbol, whose identification tuple is ids = (t,0,0).

Any symbol s (with its shape, fill and variation information) of any symbol group can, in principle, be given two additional pieces of spatial information: rotation, denoted by r, and flop, denoted by fl, where r = 0,1,...,7 indicates a counterclockwise rotation applied to s (in steps of 45 degrees), and fl is a Boolean value indicating whether or not the symbol is vertically mirrored relative to the basic symbol. A symbol with such additional information is called an oriented symbol, and is identified by the tuple (ids, r, fl), where ids is the symbol identification tuple.

Example: The symbol group called index, whose symbols represent hands with index finger straight up and closed fist, is shown in the figure below. Each symbol s in the group is identified by a tuple ids = (0,f,0): shape number 0, f=0,1,...,5 (from left to right in the figure), and variation 0. The symbols in the first line have the basic orientation (no rotations, no flop) and are identified by codes of the form (ids,0,0). In the second line, a rotation of 45 degrees was applied to each symbol, and the symbols in that line are thus identified by (ids,1,0). In the third and fourth lines, there are flopped symbols, identified by (ids,0,1) (with no rotations) and (ids,7,1) (with rotations of 315 degrees).

The group of symbols called index, which represents hands with index finger straight up and closed fist, and some of its rotated and flopped symbols.
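The identification tuple and the oriented symbol defined above map naturally onto small record types. The sketch below is only one possible encoding: the names `SymbolId`, `OrientedSymbol`, `make_oriented`, `basic_symbol` and `same_group` are hypothetical and not part of SWML, and the range checks simply restate the value ranges given in the text (f = 0,...,5 and r = 0,...,7).

```python
from typing import NamedTuple


class SymbolId(NamedTuple):
    """Identification tuple ids = (t, f, v)."""
    shape: int
    fill: int       # f in 0..5
    variation: int


class OrientedSymbol(NamedTuple):
    """Oriented symbol (ids, r, fl)."""
    ids: SymbolId
    rotation: int   # r in 0..7, counterclockwise steps of 45 degrees
    flop: bool

    def degrees(self) -> int:
        return self.rotation * 45


def make_oriented(ids: SymbolId, rotation: int = 0,
                  flop: bool = False) -> OrientedSymbol:
    """Build an oriented symbol, rejecting values outside the stated ranges."""
    if not 0 <= ids.fill <= 5:
        raise ValueError("fill must be in 0..5")
    if not 0 <= rotation <= 7:
        raise ValueError("rotation must be in 0..7")
    return OrientedSymbol(ids, rotation, flop)


def basic_symbol(ids: SymbolId) -> SymbolId:
    """Representative (t, 0, 0) of the symbol group containing ids."""
    return SymbolId(ids.shape, 0, 0)


def same_group(a: SymbolId, b: SymbolId) -> bool:
    """Two symbols are in the same group when they share the symbol shape."""
    return a.shape == b.shape


# Second line of the index group example: (ids, 1, 0) for f = 0,...,5.
index_second_line = [make_oriented(SymbolId(0, f, 0), rotation=1)
                     for f in range(6)]
```

With this encoding, the flopped symbols of the fourth line in the example would be built as `make_oriented(SymbolId(0, f, 0), rotation=7, flop=True)`, for which `degrees()` gives 315.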

A symbol_box is the least box that contains a symbol. It is identified by sb = (x,y,w,h), where x and y are, respectively, the horizontal and vertical coordinates of the upper left corner of the symbol_box (relative to the upper left corner of the sign_box containing the symbol_box; see below), w is its width and h is its height. Symbol_boxes serve the purpose of indicating the placement of symbol instances within signs. A symbol instance, that is, an occurrence of an oriented symbol within a sign, is defined as a pair si = (s, sb), where s = (ids, r, fl) is an oriented symbol and sb is a symbol_box.

A sign, denoted by Sg, is a finite set of symbol instances. A sign_box is the least box that contains a sign. It is identified by Sgb = (w,h), where w is the box's width and h is its height. A sign instance is a sign Sg together with its sign_box Sgb and an index j indicating the position of the sign within a sign sequence (a phrase in sign language). It is represented by the tuple Sgi = (Sg, Sgb, j).
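The nesting of these definitions (symbol_box inside symbol instance, symbol instances inside a sign, sign inside a sign instance) can be made concrete with a few immutable records. This is a sketch under assumptions: the class names are hypothetical, and the widths, heights and index used in the example values are invented for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Oriented symbol (ids, r, fl), with ids = (t, f, v).
OrientedSymbol = Tuple[Tuple[int, int, int], int, bool]


@dataclass(frozen=True)
class SymbolBox:
    """sb = (x, y, w, h); x and y are relative to the sign_box corner."""
    x: int
    y: int
    w: int
    h: int


@dataclass(frozen=True)
class SymbolInstance:
    """si = (s, sb): an occurrence of an oriented symbol within a sign."""
    symbol: OrientedSymbol
    box: SymbolBox


@dataclass(frozen=True)
class SignBox:
    """Sgb = (w, h); note that it carries no coordinates."""
    w: int
    h: int


@dataclass(frozen=True)
class SignInstance:
    """Sgi = (Sg, Sgb, j): a sign, its sign_box, and its position j."""
    sign: FrozenSet[SymbolInstance]  # a sign is a finite set of instances
    box: SignBox
    index: int


# Hypothetical values: the box sizes and the index are made up.
hand = SymbolInstance(((0, 1, 1), 1, False), SymbolBox(23, 28, 10, 15))
head = SymbolInstance(((87, 1, 0), 0, False), SymbolBox(15, 27, 20, 20))
sign_instance = SignInstance(frozenset({hand, head, hand}), SignBox(50, 60), 0)
```

Using a `frozenset` for the sign mirrors the set-based definition: listing the same symbol instance twice, as done with `hand` above, still yields a sign with two elements.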

Remark: All the definitions presented above are reflected in the SWML format. Note, in particular, that as defined above, sign_boxes (and consequently, sign instances) have no coordinate information. This is so because, in the current version of SWML, sign language texts are treated as simple strings of signs, with no formatting information. The format in which texts are rendered is left completely to the discretion of the application.

A sign instance for the sign "idea" in LIBRAS.

Example: The representation in SWML of the sign for "idea" in LIBRAS (see figure above) is:

<sign_box>
  <symbol x="20" y="9">
    <shape number="215" fill="1" variation="0"/>
    <transformation rotation="3" flop="0" />
  </symbol>
  <symbol x="15" y="33">
    <shape number="114" fill="1" variation="1"/>
    <transformation rotation="7" flop="0" />
  </symbol>
  <symbol x="15" y="27">
    <shape number="87" fill="1" variation="0"/>
    <transformation rotation="0" flop="0" />
  </symbol>
  <symbol x="23" y="28">
    <shape number="0" fill="1" variation="1"/>
    <transformation rotation="1" flop="0" />
  </symbol>
</sign_box>
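To show how the markup above relates to the tuples defined in Section 5.1, the following sketch reads the fragment with Python's standard `xml.etree.ElementTree` module. The function name `parse_sign_box` and the plain-tuple result format are assumptions for this example; only the elements and attributes that appear in the fragment are read, and the flop attribute is assumed to use "1" for a flopped symbol.

```python
import xml.etree.ElementTree as ET

SWML_IDEA = """
<sign_box>
  <symbol x="20" y="9">
    <shape number="215" fill="1" variation="0"/>
    <transformation rotation="3" flop="0" />
  </symbol>
  <symbol x="15" y="33">
    <shape number="114" fill="1" variation="1"/>
    <transformation rotation="7" flop="0" />
  </symbol>
  <symbol x="15" y="27">
    <shape number="87" fill="1" variation="0"/>
    <transformation rotation="0" flop="0" />
  </symbol>
  <symbol x="23" y="28">
    <shape number="0" fill="1" variation="1"/>
    <transformation rotation="1" flop="0" />
  </symbol>
</sign_box>
"""


def parse_sign_box(xml_text):
    """Return a list of ((ids, r, fl), (x, y)) pairs, one per <symbol>."""
    root = ET.fromstring(xml_text.strip())
    symbols = []
    for sym in root.findall("symbol"):
        shape = sym.find("shape")
        transf = sym.find("transformation")
        ids = (int(shape.get("number")),
               int(shape.get("fill")),
               int(shape.get("variation")))
        # Assumption: flop="1" means the symbol is flopped.
        oriented = (ids, int(transf.get("rotation")), transf.get("flop") == "1")
        position = (int(sym.get("x")), int(sym.get("y")))
        symbols.append((oriented, position))
    return symbols


for oriented, position in parse_sign_box(SWML_IDEA):
    print(oriented, position)
```

The fragment gives only the x and y coordinates of each symbol, so the sketch returns positions rather than full symbol_boxes.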