SignWriting List Forum
Message 1543
From:  Greg Noel
Date:  Wed Jun 30, 1999  7:41 am
Subject:  Re: Typing Left-Handed SignWriting


Valerie Sutton wrote:
>Perhaps you could share with us how Unicode and SGML differ?

Apples and oranges. Unicode is about alphabets. SGML is about display.

(The rest of this is oversimplified, but bear with me anyway.)

Unicode is an international agreement on what numerical values to assign to
glyphs(*) in the various alphabets. There's a set of values for the
English (well, Latin-1) alphabet, the first 128 of which are exactly the
same as the ASCII values. There's another set for the Greek alphabet, Cyrillic,
Arabic, Hebrew, and so forth. The values are disjoint (that is, they don't
overlap) so that it's theoretically possible to use all of them in the same
document. Theoretically.

(*In this context, a displayable shape is called a glyph. It could be a
letter, number, punctuation, or just about anything---including whitespace;
it doesn't have to be something visible. For example, the Dingbat glyphs
have been assigned Unicode values, so you could theoretically use them in
any application that's Unicode-aware, just as you could the letter G or a
comma. Theoretically.)

This makes it relatively easy to display languages like English, French,
German, Russian, Arabic, and Hebrew---languages where the words are built
up out of glyphs displayed linearly(*). It's more problematic for
languages like Japanese and Chinese---and SignWriting---where words are a
single symbol. I recall reading that Chinese has 300,000 words; it's
likely infeasible to include all of them as individual Unicode values. (If
the words are composed out of smaller pieces, those pieces are candidates
for inclusion in Unicode, but there still needs to be some way of combining
them into the composite symbol.)

(*Yes, I know Arabic has, ah, interesting issues with ligatures in order to
display it properly, probably enough to take it out of the "easy" category,
but we're keeping things simple, remember?)
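Korean Hangul is one real case of the "pieces plus a way of combining them" approach: Unicode encodes the individual jamo (the consonant and vowel pieces) and defines an arithmetic rule that maps a sequence of pieces to a single precomposed syllable. The Python sketch below implements that rule for one syllable and checks it against the standard library's NFC normalization. It's an analogy only; nothing here says SignWriting would or should be encoded the same way.

```python
import unicodedata
from typing import Optional

# Constants from the Unicode Standard's Hangul syllable composition rule.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_hangul(lead: int, vowel: int, trail: Optional[int] = None) -> str:
    """Combine a leading consonant, a vowel, and an optional trailing
    consonant (the 'smaller pieces') into one composite syllable."""
    l_index = lead - L_BASE
    v_index = vowel - V_BASE
    t_index = 0 if trail is None else trail - T_BASE
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

# HIEUH + A + NIEUN -> one syllable
syllable = compose_hangul(0x1112, 0x1161, 0x11AB)
print(f"U+{ord(syllable):04X}", unicodedata.name(syllable))

# Unicode normalization (NFC) performs the same composition.
assert syllable == unicodedata.normalize("NFC", "\u1112\u1161\u11AB")
```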

On the other hand, SGML is the ur-language from which HTML and XML are
descended. SGML is concerned with marking up text so that it can be
printed; its roots are in the publishing industry. It allows text (the
"content") to be annotated by the author with things like "this is a
paragraph" or "this is a list." This can then be passed to a publisher,
who can say things like "set paragraphs in Helvetica 14-on-16 at 130%
spaced 20, initial line indented a half-inch." (The font is Helvetica at
fourteen points, line leading (i.e., the line height) is sixteen points, the
spread (character spacing) is increased 30%, there's twenty points of space
between paragraphs, and each paragraph's first line is indented half an inch.)
If the book is popular, they can change the
specification for a paperback edition (or replace it with a different one
to produce a Braille edition) without changing the content. Since they
wouldn't have to re-proof the content, they'd save a fair amount of money.
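Here's a toy sketch of that content/presentation split, in Python, with the standard library's XML parser standing in for a real SGML system (which it certainly isn't). The markup says only what each piece of text is; HARDCOVER and PAPERBACK are made-up publisher specifications, and swapping one for the other changes the output without touching the content.

```python
import xml.etree.ElementTree as ET

# The author's content, annotated only with what each piece *is*.
CONTENT = """<book>
  <para>This is a paragraph.</para>
  <list><item>This is a list item.</item><item>So is this.</item></list>
</book>"""

# Two hypothetical publisher specifications for the same content.
HARDCOVER = {"para": "[Helvetica 14-on-16] {}", "item": "    * {}"}
PAPERBACK = {"para": "[Times 10-on-12] {}", "item": "  - {}"}

def render(markup, spec):
    """Apply a style spec to every element that has a rule, in document order."""
    root = ET.fromstring(markup)
    return [spec[el.tag].format(el.text) for el in root.iter() if el.tag in spec]

for edition in (HARDCOVER, PAPERBACK):
    print("\n".join(render(CONTENT, edition)))
    print()
```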

Although the idea of SGML is straightforward and SGML itself as a language
is not all that complex, to implement all of SGML is an enormous task.
SGML is also not designed with efficiency of implementation as a goal---if
you're buying a million-dollar printing press, $50K for a box to do the
SGML layout is small change.

Implementing all of SGML just to display a few CERN technical documents was
too much. So HTML was developed as an easily implementable and
efficient subset of SGML that pre-selected all of the markup types
("paragraph" "list") and left most of the display decisions (font type and
size, paragraph delineation, list indentation) to the discretion of the
browser. It seemed reasonable at the time; they didn't know where it was
going to go.

Once the Web began to explode, HTML's restrictions (all markups built-in,
few display options available to the author, display markup necessarily
in-line) became apparent. XML, together with its companion standards, is the
attempt to rectify that. XML is also an SGML subset, but it's still designed so that it can be
efficiently implemented.

XML is not quite a superset of HTML---it's almost possible to implement
HTML as a set of XML display markups---but there are a few critical
differences. In fact, XML doesn't really add much in the way of language
concepts---but the difference in capability is enormous.
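One of those critical differences is easy to demonstrate. HTML as written on the Web lets you leave elements like <p> and <br> unclosed, while XML insists that every element be explicitly closed (XHTML, the later XML reformulation of HTML, requires exactly this). In the Python sketch below, the standard library's forgiving HTML parser accepts the HTML-style text, but its XML parser rejects it and accepts only the fully closed version.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Ordinary HTML: <p> and <br> are never closed.
HTML_STYLE = "<body><p>First paragraph<p>Second<br>line</body>"
# The same content with every element explicitly closed.
XML_STYLE = "<body><p>First paragraph</p><p>Second<br/>line</p></body>"

class TagCollector(HTMLParser):
    """Records start tags; an HTML parser tolerates the unclosed ones."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def is_well_formed_xml(text):
    """True if the XML parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

collector = TagCollector()
collector.feed(HTML_STYLE)
print(collector.tags)                   # ['body', 'p', 'p', 'br']
print(is_well_formed_xml(HTML_STYLE))   # False
print(is_well_formed_xml(XML_STYLE))    # True
```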

>You mentioned a "document type definition". Would that be different than
>Unicode - another way of typing SignWriting symbols?

XML allows the display markup to be separated from the content markup and
"published" (made available) independently. Because of this (and some
other stuff I'm skipping over), third-party groups can publish display
markups that dramatically change how content is shown. Although the intent
is to improve performance (and interactivity) by moving some of the
processing from the server to the browser, the effective result is that
external authorities can extend browser functionality in some really
astonishing ways.

So, while we're still simplifying outrageously, let's just say that a DTD is
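the display markup for some type of content. (Strictly, a DTD declares which
elements and attributes a kind of document may contain, and the display rules
live in a separate stylesheet.) A DTD is a necessary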
component to be able to display SignWriting, but I don't know if it would
be sufficient. As the canonical "external authority" you could publish the
appropriate DTD.

Presumably, the content given to the DTD could be Unicode (but I'll hold
some reservations on that; it's a topic for another time).
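For a rough idea of what that might look like, here's a purely hypothetical Python sketch: a tiny document type for one sign, with an internal DTD saying that a <sign> holds one or more positioned <symbol> elements, and Unicode characters as the content. The element and attribute names, the gloss, and the code points (U+E001 and U+E102, placeholders from the Private Use Area rather than any real assignment) are all invented for illustration. Note that Python's ElementTree reads the DOCTYPE but doesn't validate against it; enforcing the DTD would take a validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document type: element names, attributes, and code points
# are illustrative placeholders, not a real SignWriting encoding.
SIGN = """<!DOCTYPE sign [
  <!ELEMENT sign (symbol+)>
  <!ATTLIST sign gloss CDATA #IMPLIED>
  <!ELEMENT symbol (#PCDATA)>
  <!ATTLIST symbol x CDATA #REQUIRED y CDATA #REQUIRED>
]>
<sign gloss="HELLO">
  <symbol x="10" y="4">&#xE001;</symbol>
  <symbol x="22" y="18">&#xE102;</symbol>
</sign>"""

# ElementTree parses the document; it does NOT check it against the DTD.
root = ET.fromstring(SIGN)

# Each symbol: its position plus the Unicode value of its content.
symbols = [(int(s.get("x")), int(s.get("y")), ord(s.text))
           for s in root.findall("symbol")]

for x, y, cp in symbols:
    print(f"U+{cp:04X} at ({x}, {y})")
```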

>I am preparing a document for the Unicode developers right now, so that we
>can type SignWriting someday using Unicode. Would the SGML work be
>separate? If so, do you need the listing of our thousands of symbol
>variations too?

As a passing comment, you may not need "thousands of symbol variations" in
Unicode. Geometric transformations like rotations and reflections are dirt
simple for a computer to make (reversing a display for a left-handed
version is trivial compared to getting things to display in the first
place), so you probably don't need the eight different orientations of each
symbol. And going from palm-facing to palm-away is just a matter of
coloring the interior pixels.

On the other hand, the code would be simpler if it didn't have to calculate
all the extra shapes. (-;)

So my message is to be prepared with a backup position if the Unicode
authority bridles at adding that many symbols....

Hope this helps,
-- Greg Noel, retired UNIX guru


  Replies Author Date
1545 Symbols Rotate Automatically in SignWriter Valerie Sutton Wed  6/30/1999
