SignWriting List Forum

From: Greg Noel
Date: Wed Jun 30, 1999 7:41 am
Subject: Re: Typing Left-Handed SignWriting
Valerie Sutton wrote:

>Perhaps you could share with us how Unicode and SGML differ?

Apples and oranges. Unicode is about alphabets. SGML is about display.

(The rest of this is oversimplified, but bear with me anyway.)

Unicode is an international agreement on what numerical values to assign to glyphs(*) in the various alphabets. There's a set of values for the English (well, Latin-1) alphabet, the first 128 of which happen to be exactly the same as the ASCII values. There's another set for the Greek alphabet, and others for Cyrillic, Arabic, Hebrew, and so forth. The values are disjoint (that is, they don't overlap), so it's theoretically possible to use all of them in the same document. Theoretically.

(*In this context, a displayable shape is called a glyph. It could be a letter, number, punctuation, or just about anything---including whitespace; it doesn't have to be something visible. For example, the Dingbat glyphs have been assigned Unicode values, so you could theoretically use them in any application that's Unicode-aware, just as you could the letter G or a comma. Theoretically.)

This makes it relatively easy to display languages like English, French, German, Russian, Arabic, and Hebrew---languages where the words are built up out of glyphs displayed linearly(*). It's more problematic for languages like Japanese and Chinese---and SignWriting---where a word is a single symbol. I recall reading that Chinese has 300,000 words; it's likely infeasible to include all of them as individual Unicode values. (If the words are composed out of smaller pieces, those pieces are candidates for inclusion in Unicode, but there still needs to be some way of combining them into the composite symbol.)

(*Yes, I know Arabic has, ah, interesting issues with ligatures in order to display it properly, probably enough to take it out of the "easy" category, but we're keeping things simple, remember?)

On the other hand, SGML is the ur-language from which HTML and XML are descended.
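Stepping back to the Unicode half of the answer for a moment: the claims about disjoint values, ASCII compatibility, and combining smaller pieces into a composite symbol can be checked with Python's standard `unicodedata` module. This is a minimal sketch for illustration only. The sample letters are arbitrary picks, and the accented `é` (a base letter plus a combining accent) merely stands in for the general idea of composition; it is not a proposal for how SignWriting itself would be encoded.

```python
import unicodedata

# One letter from each of several alphabets, written as escapes for portability.
samples = {
    "Latin": "G",          # U+0047
    "Greek": "\u0393",     # GREEK CAPITAL LETTER GAMMA
    "Cyrillic": "\u0413",  # CYRILLIC CAPITAL LETTER GHE
    "Hebrew": "\u05D2",    # HEBREW LETTER GIMEL
    "Arabic": "\u062C",    # ARABIC LETTER JEEM
}

for script, ch in samples.items():
    # ord() gives the numerical value (code point); unicodedata.name() gives
    # the official character name, which begins with the script's name.
    print(f"{script:<9} U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Within the ASCII range, the Unicode value and the ASCII byte are identical.
assert ord("G") == "G".encode("ascii")[0] == 71

# Because the values don't overlap, letters from every alphabet can share one string.
mixed = "".join(samples.values())
print(len(mixed), "characters from", len(samples), "alphabets")

# Composition: a base letter plus a combining mark (two values) normalizes
# to a single precomposed character (one value).
decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed), hex(ord(composed)))   # prints: 2 1 0xe9
```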
SGML is concerned with marking up text so that it can be printed; its roots are in the publishing industry. It allows text (the "content") to be annotated by the author with things like "this is a paragraph" or "this is a list." This can then be passed to a publisher, who can say things like "set paragraphs in Helvetica 14-on-16 at 130% spaced 20, initial line indented a half-inch." (The font is Helvetica at fourteen points, the leading, or line height, is sixteen points, the spread (character spacing) is increased 30%, and there are twenty points of space between paragraphs.) If the book is popular, they can change the specification for a paperback edition (or replace it with a different one to produce a Braille edition) without changing the content. Since they wouldn't have to re-proof the content, they'd save a fair amount of money.

Although the idea of SGML is straightforward and SGML itself as a language is not all that complex, implementing all of SGML is an enormous task. SGML is also not designed with efficiency of implementation as a goal---if you're buying a million-dollar printing press, $50K for a box to do the SGML layout is small change. Implementing all of SGML just to display a few CERN technical documents was too much. So HTML was developed: an easily implementable and efficient subset of SGML that pre-selected all of the markup types ("paragraph," "list") and left most of the display decisions (font type and size, paragraph delineation, list indentation) to the discretion of the browser. It seemed reasonable at the time; they didn't know where it was going to go.

Once the Web began to explode, HTML's restrictions (all markup built in, few display options available to the author, display markup necessarily inline) became apparent. XML et al. is the attempt to rectify that. It's also an SGML subset, but it's still designed so that it can be efficiently implemented.
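The publisher scenario above (the same content printed under different specifications) can be sketched with Python's standard `xml.etree.ElementTree`. The content is marked up only as paragraphs and list items. The two "specifications" (`HARDCOVER`, `PAPERBACK`) and the `render` function are hypothetical stand-ins for a real style language such as DSSSL, XSL, or CSS, chosen only to show that a new edition needs a new specification, not new content.

```python
import xml.etree.ElementTree as ET

# The author's content: marked up only with what each piece *is*.
CONTENT = """<book>
  <para>Markup says what a piece of text is.</para>
  <list><item>paragraph</item><item>list</item></list>
  <para>The publisher decides how it looks.</para>
</book>"""

# Two hypothetical "specifications" mapping each element type to a formatting
# rule. A real system would use a style language (DSSSL, XSL, CSS) instead.
HARDCOVER = {
    "para": lambda text: "    " + text,   # indent the first line of each paragraph
    "item": lambda text: "  * " + text,   # indented, bulleted list items
}
PAPERBACK = {
    "para": lambda text: text,            # no indent, to save space
    "item": lambda text: "- " + text,     # plainer bullets
}

def render(xml_text, spec):
    """Format every element that `spec` has a rule for, in document order."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():              # depth-first walk, document order
        rule = spec.get(elem.tag)
        if rule is not None and elem.text and elem.text.strip():
            lines.append(rule(elem.text.strip()))
    return "\n".join(lines)

# Same content, two editions; CONTENT itself is never edited.
print(render(CONTENT, HARDCOVER))
print()
print(render(CONTENT, PAPERBACK))
```

Keeping the rules in a separate mapping is the design point: swapping `HARDCOVER` for `PAPERBACK` changes every paragraph and list item at once, while the marked-up content stays exactly as the author proofed it.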
XML is not quite a superset of HTML---it's almost possible to implement HTML as a set of XML display markups---but there are a few critical differences. In fact, XML doesn't really add much in the way of language concepts---but the difference in capability is enormous.

>You mentioned a "document type definition". Would that be different than
>Unicode - another way of typing SignWriting symbols?

XML allows the display markup to be separated from the content markup and "published" (made available) independently. Because of this (and some other stuff I'm skipping over), third-party groups can publish display markups that dramatically change how content is shown. Although the intent is to improve performance (and interactivity) by moving some of the processing from the server to the browser, the effective result is that external authorities can extend browser functionality in some really astonishing ways.

So, while we're still simplifying outrageously, let's just say that a DTD is the display markup for some type of content. A DTD is a necessary component for displaying SignWriting, but I don't know if it would be sufficient. As the canonical "external authority," you could publish the appropriate DTD. Presumably, the content given to the DTD could be Unicode (but I'll hold some reservations on that; it's a topic for another time).

>I am preparing a document for the Unicode developers right now, so that we
>can type SignWriting someday using Unicode. Would the SGML work be
>separate? If so, do you need the listing of our thousands of symbol
>variations too?

As a passing comment, you may not need "thousands of symbol variations" in Unicode. Transformations like rotations and reflections are dirt simple for a computer to make (reversing a display for a left-handed version is trivial compared to getting things to display in the first place), so you probably don't need the eight different orientations of each symbol.
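The claim that rotations and reflections are cheap to compute can be illustrated with a toy bitmap. This sketch assumes a symbol is stored as a small grid of characters (`#` for ink, `.` for blank); `HAND`, `rotate90`, `mirror`, and `orientations` are hypothetical names, and a real renderer would work on outlines or pixel images rather than strings. Generating all eight orientations (four rotations, each with and without a left-right mirror) takes only a few lines.

```python
# A symbol as a tiny bitmap: '#' is ink, '.' is blank. HAND is a made-up,
# deliberately asymmetric shape, not a real SignWriting symbol.
HAND = [
    "##.",
    "#..",
    "#..",
]

def rotate90(bmp):
    """Rotate 90 degrees clockwise: column c, read bottom-up, becomes row c."""
    return ["".join(row[c] for row in reversed(bmp)) for c in range(len(bmp[0]))]

def mirror(bmp):
    """Reflect left-to-right, e.g. to derive a left-hand form from a right-hand one."""
    return [row[::-1] for row in bmp]

def orientations(bmp):
    """All eight orientations: four rotations, each with and without a mirror."""
    result = []
    current = bmp
    for _ in range(4):
        result.append(current)
        result.append(mirror(current))
        current = rotate90(current)
    return result

# The left-handed version is one function call away.
print("\n".join(mirror(HAND)))

# Eight distinct shapes generated from one stored symbol.
print(len({tuple(o) for o in orientations(HAND)}))   # prints 8
```

Storing one shape per symbol and deriving the rest trades a little computation for a much smaller table, which is the trade-off weighed in the next paragraph.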
Going from palm-facing to palm-away is likewise just a matter of coloring the interior pixels. On the other hand, the code would be simpler if it didn't have to calculate all the extra shapes. (-;) So my message is to be prepared with a backup position if the Unicode authority bridles at adding that many symbols....

Hope this helps,
-- 
Greg Noel, retired UNIX guru