5. Matching procedure for searching in sign language texts
Having document servers able to serve sign
language texts stored in SWML would be almost useless if one
couldn't define searching procedures for such texts.
However, there is a major problem in searching
sign language texts: dealing with the small graphical variations
that people introduce when writing the same signs. The
SignWriting system distinguishes explicitly some graphical properties
of the symbols, such as rotation and flop, but does
not distinguish (or even identify) tiny variations due to vertical
and/or horizontal displacements of symbols within signs, because
such values aren't discretized in the system (as opposed to,
e.g., rotation, which can only assume one of a small set of discrete values).
The solution we've found, which lets the user
control the criteria used for judging the similarity
of two signs, is to define a kind of degree of similarity between
the component symbols of a sign, ensuring that two corresponding
symbols have the same symbol type, rotation and flop,
but allowing some variation in their relative positions
within the respective signs. This kind of similarity is formalized
here as a parameterized, reflexive and symmetric relation, which
we call the sign similarity relation.
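As an informal illustration of the idea (not the formal definition), the symbol-level test can be sketched in Python as follows. The record PlacedSymbol, the function symbols_similar and the tolerance parameter tol are hypothetical names introduced here; the sketch also assumes that "symbol type" means the triple of shape, fill and variation, and that positional variation is measured per axis in the same units as the symbol coordinates.

```python
from typing import NamedTuple


class PlacedSymbol(NamedTuple):
    """Hypothetical flat record for one symbol occurrence within a sign."""
    shape: int
    fill: int
    variation: int
    rotation: int
    flop: bool
    x: int
    y: int


def symbols_similar(a: PlacedSymbol, b: PlacedSymbol, tol: int) -> bool:
    """Return True if a and b agree on type, rotation and flop, and their
    positions differ by at most `tol` units on each axis (assumed metric)."""
    same_kind = (a.shape, a.fill, a.variation, a.rotation, a.flop) == (
        b.shape, b.fill, b.variation, b.rotation, b.flop)
    close = abs(a.x - b.x) <= tol and abs(a.y - b.y) <= tol
    return same_kind and close
```

A relation of this kind is reflexive and symmetric for any tol >= 0, but in general it is not transitive: two symbols that are each close to a third one may still be too far from each other.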
5.1. Basic Definitions and Geometric Considerations
Each SignWriting symbol can be characterized,
in principle, by three basic features: its shape number, its
filling information, and its so-called variation.
Let s be a SignWriting symbol.
Let t be its symbol shape, let f (one of 0,1,2,3,4,5)
be its filling information (which, e.g., encodes both the way the hands
face the signer and whether they lie in a horizontal or a vertical
plane), and let v, the variation, be complementary information
about the symbol. The identification tuple for symbol
s can thus be defined as the tuple ids = (t,f,v),
allowing us to identify the symbol with its identification tuple.
A set of symbols that represent the same
essential linguistic information, having the same symbol shape
and differing only in their filling or variation information,
is called a symbol group. Each symbol group is an equivalence
class represented by the basic symbol, whose
identification tuple is ids = (t,0,0).
Any symbol s (with its shape,
fill and variation information) of any symbol group
can, in principle, be given two additional pieces of spatial information:
rotation, denoted by r, and flop, denoted
by fl, where r (one of 0,1,2,3,4,5,6,7) indicates a
counterclockwise rotation applied to s (given in steps of
45 degrees), and fl is a Boolean value indicating whether the
symbol is vertically mirrored relative to the basic symbol.
A symbol with such additional information is called an oriented
symbol, and is identified by the tuple (ids,r,fl),
where ids is the symbol identification tuple.
Example:
The symbol group called index, whose symbols represent
hands with index finger straight up and closed fist, is shown
in the figure below. Each symbol s in the group is identified
by a tuple ids = (0,f,0): shape number 0, f=0,1,...,5
(from left to right in the figure), and variation 0. The
symbols in the first line have the basic orientation (no rotations,
no flop) and are identified by codes of the form (ids,0,0).
In the second line, a rotation of 45 degrees was applied to each
symbol, and the symbols in that line are thus identified by (ids,1,0).
In the third and fourth lines, there are flopped symbols, identified
by (ids,0,1) (with no rotations) and (ids,7,1)
(with rotations of 315 degrees).
The group of symbols called index,
which represents hands with index finger straight up and closed
fist, and some of its rotated and flopped symbols.
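The identification tuple and the oriented symbol defined above can be modelled, for instance, with two small Python classes. This is only a sketch: the class names SymbolId and OrientedSymbol, the method basic and the property rotation_degrees are hypothetical, and the range checks simply restate the value ranges given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolId:
    """Identification tuple ids = (t, f, v)."""
    shape: int       # t, the symbol shape number
    fill: int        # f, one of 0..5
    variation: int   # v, complementary information

    def __post_init__(self):
        if not 0 <= self.fill <= 5:
            raise ValueError("fill must be in 0..5")

    def basic(self) -> "SymbolId":
        """Basic symbol (t, 0, 0) representing this symbol's group."""
        return SymbolId(self.shape, 0, 0)


@dataclass(frozen=True)
class OrientedSymbol:
    """Oriented symbol (ids, r, fl)."""
    ids: SymbolId
    rotation: int = 0    # r, one of 0..7, in steps of 45 degrees
    flop: bool = False   # fl, True if mirrored relative to the basic symbol

    def __post_init__(self):
        if not 0 <= self.rotation <= 7:
            raise ValueError("rotation must be in 0..7")

    @property
    def rotation_degrees(self) -> int:
        """Counterclockwise rotation angle in degrees."""
        return 45 * self.rotation
```

For the index group of the example, OrientedSymbol(SymbolId(0, 3, 0), 7, True) would describe a flopped symbol with fill 3 rotated by 315 degrees, and SymbolId(0, 3, 0).basic() gives the group representative (0,0,0).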
A symbol_box is the least box that
contains a symbol. It is identified by sb = (x,y,w,h),
where x and y are, respectively, the horizontal
and vertical coordinates of the upper left corner of the symbol_box
(relative to the upper left corner of the sign_box containing
the symbol_box, see below), w is its width and
h is its height. Symbol_boxes serve the purpose of indicating
the placement of symbol instances within signs. A symbol instance,
that is, an occurrence of an oriented symbol within a sign, is
defined as a pair si = (s,sb), where s = (ids,r,fl)
is an oriented symbol and sb is a symbol_box.
A sign, denoted by Sg, is
a finite set of symbol instances. A sign_box is the least
box that contains a sign. It is identified by Sgb = (w,h),
where w is the box's width and h is its height.
A sign instance is a sign Sg together with its
sign_box Sgb and an index j indicating the position
of the sign within a sign sequence (a phrase in sign language).
It is represented by the tuple Sgi = (Sg,Sgb,j).
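These definitions map naturally onto plain records. The Python sketch below is one possible encoding, not a prescribed one: all class and function names are hypothetical, the oriented symbol is kept as a nested tuple ((t,f,v),r,fl) for brevity, and least_sign_box assumes that symbol coordinates are non-negative and measured from the upper left corner of the sign_box.

```python
from typing import FrozenSet, NamedTuple, Tuple

# Oriented symbol as a nested tuple: ((t, f, v), r, fl)
Oriented = Tuple[Tuple[int, int, int], int, bool]


class SymbolBox(NamedTuple):
    """sb = (x, y, w, h), with (x, y) relative to the sign_box corner."""
    x: int
    y: int
    w: int
    h: int


class SymbolInstance(NamedTuple):
    """si = (s, sb): an oriented symbol placed within a sign."""
    symbol: Oriented
    box: SymbolBox


class SignBox(NamedTuple):
    """Sgb = (w, h); note that it carries no coordinates."""
    w: int
    h: int


class SignInstance(NamedTuple):
    """Sgi = (Sg, Sgb, j)."""
    sign: FrozenSet[SymbolInstance]   # Sg, a finite set of symbol instances
    box: SignBox
    index: int                        # j, position in the sign sequence


def least_sign_box(sign: FrozenSet[SymbolInstance]) -> SignBox:
    """Smallest box anchored at (0, 0) that contains every symbol_box
    (assumes non-negative coordinates; an empty sign yields (0, 0))."""
    width = max((si.box.x + si.box.w for si in sign), default=0)
    height = max((si.box.y + si.box.h for si in sign), default=0)
    return SignBox(width, height)
```

Using a frozenset for Sg reflects the fact that a sign is defined as a set, so the order in which its symbol instances are listed carries no meaning.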
Remark:
All the definitions presented above are reflected in the SWML
format. Note, in particular, that as defined above, sign_boxes
(and consequently, sign instances) have no coordinate information.
This is so because, in the current version of SWML, sign language
texts are treated as simple strings of signs, with no formatting
information. The format in which texts are rendered is left completely
to the discretion of the application.
A sign instance for the sign "idea"
in LIBRAS.
Example:
The representation in SWML of the sign for "idea"
in LIBRAS (see figure above) is:
<sign_box>
  <!-- sign "idea" in LIBRAS -->
  <symbol x="20" y="9">
    <!-- the head -->
    <shape number="215" fill="1" variation="0"/>
    <transformation rotation="3" flop="0"/>
  </symbol>
  <symbol x="15" y="33">
    <!-- the arrow -->
    <shape number="114" fill="1" variation="1"/>
    <transformation rotation="7" flop="0"/>
  </symbol>
  <symbol x="15" y="27">
    <!-- the asterisk -->
    <shape number="87" fill="1" variation="0"/>
    <transformation rotation="0" flop="0"/>
  </symbol>
  <symbol x="23" y="28">
    <!-- the hand -->
    <shape number="0" fill="1" variation="1"/>
    <transformation rotation="1" flop="0"/>
  </symbol>
</sign_box>
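To show how such a fragment relates to the definitions above, the following Python sketch reads the SWML for "idea" with the standard xml.etree.ElementTree module and returns, for each symbol, its oriented symbol ((t,f,v),r,fl) together with its (x,y) position. The function name parse_sign_box and the tuple layout are hypothetical; the sketch only handles the elements and attributes that appear in this example.

```python
import xml.etree.ElementTree as ET

SWML_IDEA = """<sign_box>
  <symbol x="20" y="9">
    <shape number="215" fill="1" variation="0"/>
    <transformation rotation="3" flop="0"/>
  </symbol>
  <symbol x="15" y="33">
    <shape number="114" fill="1" variation="1"/>
    <transformation rotation="7" flop="0"/>
  </symbol>
  <symbol x="15" y="27">
    <shape number="87" fill="1" variation="0"/>
    <transformation rotation="0" flop="0"/>
  </symbol>
  <symbol x="23" y="28">
    <shape number="0" fill="1" variation="1"/>
    <transformation rotation="1" flop="0"/>
  </symbol>
</sign_box>"""


def parse_sign_box(xml_text):
    """Return a list of (oriented_symbol, (x, y)) pairs for one sign_box."""
    root = ET.fromstring(xml_text)
    symbols = []
    for sym in root.iter("symbol"):
        shape = sym.find("shape")
        transf = sym.find("transformation")
        # Identification tuple ids = (t, f, v)
        ids = (int(shape.get("number")),
               int(shape.get("fill")),
               int(shape.get("variation")))
        # Oriented symbol (ids, r, fl); flop is stored as "0" or "1"
        oriented = (ids, int(transf.get("rotation")), transf.get("flop") == "1")
        symbols.append((oriented, (int(sym.get("x")), int(sym.get("y")))))
    return symbols


if __name__ == "__main__":
    for oriented, position in parse_sign_box(SWML_IDEA):
        print(oriented, position)
```

Only positions are read here, because the fragment records x and y for each symbol but no width or height.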