gnu.java.lang

Interface CharData

public interface CharData

This contains the information about Unicode characters that java.lang.Character needs. It is generated automatically from ../doc/unicode/UnicodeData-4.0.0.txt and ../doc/unicode/SpecialCasing-4.0.0.txt by some Perl scripts. These Unicode definition files can be found on the http://www.unicode.org website. JDK 1.5 uses Unicode version 4.0.0.

The data is stored as string constants, but Character will convert these Strings to their respective char[] components. The per-plane fields are stored in arrays of 17 elements each, one element per Unicode plane. BLOCKS stores the offset of a block of 2^SHIFT characters within DATA. The DATA field, in turn, stores information about each character in the low order bits, and an offset into the attribute tables UPPER, LOWER, NUM_VALUE, and DIRECTION. Notice that the attribute tables are much smaller than 0xffff entries, because many characters in Unicode share common attributes. Numbers that are too large to fit into NUM_VALUE as 16-bit chars are stored in LARGENUMS, and a number N is stored in NUM_VALUE such that (-N - 3) is the offset into LARGENUMS for the particular character. The DIRECTION table also contains a field for detecting characters with multi-character uppercase expansions.

Next, there is a listing for TITLE exceptions (most characters just have the same title case as upper case). Finally, there are two tables for multi-character capitalization: UPPER_SPECIAL, which lists the characters that are special cased, and UPPER_EXPAND, which lists their expansions.
See Also:
Character, String

Field Summary

static String[]
BLOCKS
The mapping of character blocks to their location in DATA.
static String[]
DATA
Information about each character.
static String[]
DIRECTION
This is the attribute table for computing the directionality class of a character, as well as a marker of characters with a multi-character capitalization.
static int[]
LARGENUMS
The array containing the numeric values that are too large to be stored as chars in NUM_VALUE.
static String[]
LOWER
This is the attribute table for computing the lowercase representation of a character.
static String[]
NUM_VALUE
This is the attribute table for computing the numeric value of a character.
static int[]
SHIFT
The character shift amount to look up the block offset.
static String
SOURCE
The Unicode definition file that was parsed to build this database.
static String
TITLE
This is the listing of titlecase special cases (all other characters can use UPPER to determine their titlecase).
static String[]
UPPER
This is the attribute table for computing the single-character uppercase representation of a character.
static String
UPPER_EXPAND
This is the listing of special case multi-character uppercase sequences.
static String
UPPER_SPECIAL
This is a listing of characters with multi-character uppercase sequences.

Field Details

BLOCKS

public static final String[] BLOCKS
The mapping of character blocks to their location in DATA. Each entry has been adjusted so that the 16-bit sum with the desired character gives the actual index into DATA.

DATA

public static final String[] DATA
Information about each character. The low order 5 bits form the character type, the next bit is a flag for non-breaking spaces, and the next bit is a flag for mirrored directionality. The high order 9 bits form the offset into the attribute tables. Note that this limits the number of unique character attributes to 512, which is not a problem as of Unicode version 4.0.0, but may soon become one.
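The bit layout described above can be sketched as follows. This is only an illustration: the class name, the method names, and the sample entry are hypothetical and are not part of CharData.

```java
// Sketch (hypothetical helper, not part of CharData): unpack one 16-bit
// DATA entry into the fields described in the text.
class DataEntrySketch {
    // Low-order 5 bits: the character type.
    static int type(char entry) { return entry & 0x1F; }

    // Next bit (bit 5): flag for non-breaking spaces.
    static boolean isNoBreakSpace(char entry) { return (entry & 0x20) != 0; }

    // Next bit (bit 6): flag for mirrored directionality.
    static boolean isMirrored(char entry) { return (entry & 0x40) != 0; }

    // High-order 9 bits: offset into the attribute tables (0..511).
    static int attributeOffset(char entry) { return entry >>> 7; }

    public static void main(String[] args) {
        // Made-up entry: attribute offset 3, mirrored, type 21.
        char entry = (char) ((3 << 7) | 0x40 | 21);
        System.out.println(type(entry));            // prints 21
        System.out.println(isNoBreakSpace(entry));  // prints false
        System.out.println(isMirrored(entry));      // prints true
        System.out.println(attributeOffset(entry)); // prints 3
    }
}
```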

DIRECTION

public static final String[] DIRECTION
This is the attribute table for computing the directionality class of a character, as well as a marker of characters with a multi-character capitalization. The direction is taken by performing a signed shift right by 2 (where a result of -1 means an unknown direction, such as for undefined characters). The lower 2 bits form a count of the additional characters that will be added to a String when performing multi-character uppercase expansion. This count is also used, along with the offset in UPPER_SPECIAL, to determine how much of UPPER_EXPAND to use when performing the case conversion. Note that this information is stored as an unsigned char since this is a String literal.
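A minimal sketch of decoding one DIRECTION entry, assuming the entry is first reinterpreted as a signed 16-bit value so that the shift is signed. The class, the method names, and the sample entries are hypothetical; how Character itself performs the conversion is not shown here.

```java
// Sketch (hypothetical helper, not part of CharData): split one DIRECTION
// entry into the directionality class and the expansion count.
class DirectionEntrySketch {
    // Reinterpret the unsigned char as a signed short, then shift right by 2.
    // A result of -1 means the direction is unknown.
    static int direction(char entry) { return ((short) entry) >> 2; }

    // Lower 2 bits: number of additional characters produced by a
    // multi-character uppercase expansion (0 if there is none).
    static int extraUppercaseChars(char entry) { return entry & 3; }

    public static void main(String[] args) {
        char unknown = '\ufffc';   // made-up entry: direction -1, no expansion
        char expanding = '\u0001'; // made-up entry: direction 0, one extra char
        System.out.println(direction(unknown));             // prints -1
        System.out.println(extraUppercaseChars(unknown));   // prints 0
        System.out.println(direction(expanding));           // prints 0
        System.out.println(extraUppercaseChars(expanding)); // prints 1
    }
}
```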

LARGENUMS

public static final int[] LARGENUMS
The array containing the numeric values that are too large to be stored as chars in NUM_VALUE. NUM_VALUE in this case will contain a negative integer N such that LARGENUMS[-N - 3] contains the correct numeric value.

LOWER

public static final String[] LOWER
This is the attribute table for computing the lowercase representation of a character. The value is the signed difference between the character and its lowercase version. Note that this is stored as an unsigned char since this is a String literal.
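Because the difference is kept in an unsigned char, a negative difference wraps around modulo 0x10000. The sketch below assumes the stored value is added to the character and the sum truncated to 16 bits; that sign convention is an assumption, as are the class name and the sample deltas, which were computed by hand from the well-known mappings A to a and U+0130 to i rather than read from LOWER.

```java
// Sketch (hypothetical helper, not part of CharData): apply a stored case
// delta using 16-bit wrap-around arithmetic.
class LowerDeltaSketch {
    // Adding the unsigned delta and truncating to char is equivalent to
    // adding the signed difference.
    static char applyDelta(char ch, char storedDelta) {
        return (char) (ch + storedDelta);
    }

    public static void main(String[] args) {
        // 'a' - 'A' = +0x20, stored as 0x0020.
        System.out.println(applyDelta('A', '\u0020'));      // prints a
        // 0x0069 - 0x0130 = -0xC7, stored as 0xFF39 after wrap-around.
        System.out.println(applyDelta('\u0130', '\uff39')); // prints i
    }
}
```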

NUM_VALUE

public static final String[] NUM_VALUE
This is the attribute table for computing the numeric value of a character. The value is -1 if Unicode does not define a value, -2 if the value is not a non-negative integer (for example, a fraction), otherwise it is the value. Note that this is a signed value, but stored as an unsigned char since this is a String literal.
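Putting the NUM_VALUE and LARGENUMS descriptions together, decoding one stored entry might look like the sketch below. The class name, the method, and the toy LARGENUMS contents are hypothetical; only the -1 and -2 sentinels and the (-N - 3) indexing come from the descriptions above.

```java
// Sketch (hypothetical helper, not part of CharData): turn one NUM_VALUE
// entry into a numeric value, consulting a LARGENUMS-style array if needed.
class NumericValueSketch {
    static int numericValue(char stored, int[] largeNums) {
        short n = (short) stored;  // reinterpret the unsigned char as signed
        if (n >= -2) {
            return n;              // the value itself, or the -1 / -2 sentinel
        }
        return largeNums[-n - 3];  // N <= -3 selects an entry of LARGENUMS
    }

    public static void main(String[] args) {
        int[] toyLargeNums = { 70000, 80000 };  // made-up values for illustration
        System.out.println(numericValue('\u0007', toyLargeNums));  // prints 7
        System.out.println(numericValue((char) -1, toyLargeNums)); // prints -1
        System.out.println(numericValue((char) -3, toyLargeNums)); // prints 70000
        System.out.println(numericValue((char) -4, toyLargeNums)); // prints 80000
    }
}
```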

SHIFT

public static final int[] SHIFT
The character shift amount to look up the block offset. In other words, if ch is in Unicode plane p and off denotes the low 16 bits of ch, then (char) (BLOCKS[p].charAt(off >> SHIFT[p]) + off) is the index where ch is described in DATA[p]. Note that p is simply the integer division of ch by 0x10000.
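The two-level lookup can be illustrated with toy tables. Everything in this sketch is an assumption made for illustration: the tables are tiny hand-built stand-ins for a single plane (blocks of 4 characters, two blocks sharing one run of data), and the offset within the plane is taken to be the low 16 bits of the code point.

```java
// Sketch with toy tables (not the real CharData contents): find the DATA
// entry for a code point via BLOCKS and SHIFT.
class BlockLookupSketch {
    // One plane only. Block 0 maps to data index 0; block 1 shares the same
    // data, so its entry is -4 stored as 0xFFFC: (char) (0xFFFC + 4) == 0.
    static final String[] BLOCKS = { "\u0000\ufffc" };
    static final String[] DATA = { "WXYZ" };
    static final int[] SHIFT = { 2 };

    static char describe(int codePoint) {
        int p = codePoint >>> 16;     // plane number
        char off = (char) codePoint;  // low 16 bits within the plane
        // 16-bit sum of the adjusted block entry and the character.
        int index = (char) (BLOCKS[p].charAt(off >> SHIFT[p]) + off);
        return DATA[p].charAt(index);
    }

    public static void main(String[] args) {
        System.out.println(describe(2)); // prints Y
        System.out.println(describe(5)); // prints X
    }
}
```

Sharing one run of DATA between identical blocks is what keeps the tables small: only the BLOCKS entry differs between the two blocks.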

SOURCE

public static final String SOURCE
The Unicode definition file that was parsed to build this database.
Field Value:
"../doc/unicode/UnicodeData-4.0.0.txt"

TITLE

public static final String TITLE
This is the listing of titlecase special cases (all other characters can use UPPER to determine their titlecase). The listing is a sorted sequence of character pairs; converting the first character of the pair to titlecase produces the second character.
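A lookup over the pair listing could be sketched like this. The string is copied from the field value documented for TITLE; the class name and method are hypothetical, and the JDK's own Character.toUpperCase stands in for the UPPER table as the fallback.

```java
// Sketch (hypothetical helper, not part of CharData): titlecase lookup over
// a listing of (character, titlecase) pairs.
class TitleLookupSketch {
    static final String TITLE =
        "\u01c4\u01c5\u01c5\u01c5\u01c6\u01c5\u01c7\u01c8\u01c8\u01c8\u01c9\u01c8"
        + "\u01ca\u01cb\u01cb\u01cb\u01cc\u01cb\u01f1\u01f2\u01f2\u01f2\u01f3\u01f2";

    static char toTitleCase(char ch) {
        // Even positions hold the characters, odd positions their titlecase.
        for (int i = 0; i < TITLE.length(); i += 2) {
            if (TITLE.charAt(i) == ch) {
                return TITLE.charAt(i + 1);
            }
        }
        // Not a special case: fall back to the uppercase mapping
        // (the JDK method is used here in place of the UPPER table).
        return Character.toUpperCase(ch);
    }

    public static void main(String[] args) {
        System.out.println(toTitleCase('\u01c6') == '\u01c5'); // prints true
        System.out.println(toTitleCase('a'));                  // prints A
    }
}
```

The listing is short, so a linear scan is adequate here; since it is sorted, a binary search over the even positions would also work.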
Field Value:
"\u01c4\u01c5\u01c5\u01c5\u01c6\u01c5\u01c7\u01c8\u01c8\u01c8\u01c9\u01c8\u01ca\u01cb\u01cb\u01cb\u01cc\u01cb\u01f1\u01f2\u01f2\u01f2\u01f3\u01f2"

UPPER

public static final String[] UPPER
This is the attribute table for computing the single-character uppercase representation of a character. The value is the signed difference between the character and its uppercase version. Note that this is stored as an unsigned char since this is a String literal. When capitalizing a String, you must first check whether a multi-character uppercase sequence exists before falling back to this single-character mapping.

UPPER_EXPAND

public static final String UPPER_EXPAND
This is the listing of special case multi-character uppercase sequences. Characters listed in UPPER_SPECIAL index into this table to find their uppercase expansion. Remember that you must also perform special-casing on two single-character sequences in the Turkish locale, which are not covered here in CharData.
Field Value:
"SS\u02bcNJ\u030c\u0399\u0308\u0301\u03a5\u0308\u0301\u0535\u0552H\u0331T\u0308W\u030aY\u030aA\u02be\u03a5\u0313\u03a5\u0313\u0300\u03a5\u0313\u0301\u03a5\u0313\u0342\u1f08\u0399\u1f09\u0399\u1f0a\u0399\u1f0b\u0399\u1f0c\u0399\u1f0d\u0399\u1f0e\u0399\u1f0f\u0399\u1f08\u0399\u1f09\u0399\u1f0a\u0399\u1f0b\u0399\u1f0c\u0399\u1f0d\u0399\u1f0e\u0399\u1f0f\u0399\u1f28\u0399\u1f29\u0399\u1f2a\u0399\u1f2b\u0399\u1f2c\u0399\u1f2d\u0399\u1f2e\u0399\u1f2f\u0399\u1f28\u0399\u1f29\u0399\u1f2a\u0399\u1f2b\u0399\u1f2c\u0399\u1f2d\u0399\u1f2e\u0399\u1f2f\u0399\u1f68\u0399\u1f69\u0399\u1f6a\u0399\u1f6b\u0399\u1f6c\u0399\u1f6d\u0399\u1f6e\u0399\u1f6f\u0399\u1f68\u0399\u1f69\u0399\u1f6a\u0399\u1f6b\u0399\u1f6c\u0399\u1f6d\u0399\u1f6e\u0399\u1f6f\u0399\u1fba\u0399\u0391\u0399\u0386\u0399\u0391\u0342\u0391\u0342\u0399\u0391\u0399\u1fca\u0399\u0397\u0399\u0389\u0399\u0397\u0342\u0397\u0342\u0399\u0397\u0399\u0399\u0308\u0300\u0399\u0308\u0301\u0399\u0342\u0399\u0308\u0342\u03a5\u0308\u0300\u03a5\u0308\u0301\u03a1\u0313\u03a5\u0342\u03a5\u0308\u0342\u1ffa\u0399\u03a9\u0399\u038f\u0399\u03a9\u0342\u03a9\u0342\u0399\u03a9\u0399FFFIFLFFIFFLSTST\u0544\u0546\u0544\u0535\u0544\u053b\u054e\u0546\u0544\u053d"

UPPER_SPECIAL

public static final String UPPER_SPECIAL
This is a listing of characters with multi-character uppercase sequences. A character appears in this list exactly when it has a non-zero entry in the low-order 2-bit field of DIRECTION. The listing is a sorted sequence of pairs (hence a binary search on the even elements is an efficient way to look up a character). The first element of a pair is the character with the expansion, and the second is the index into UPPER_EXPAND where the expansion begins. Use the 2-bit field of DIRECTION to determine where the expansion ends.
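The binary search and the expansion step might be sketched as follows. The two constants are only the leading portions of the documented UPPER_SPECIAL and UPPER_EXPAND values (the first five pairs and their expansions); the class and method names are hypothetical, and the count that would normally come from the 2-bit field of DIRECTION is passed in by the caller.

```java
// Sketch (hypothetical helper, not part of CharData): find a character in an
// UPPER_SPECIAL-style listing and cut its expansion out of UPPER_EXPAND.
class UpperExpandSketch {
    // Excerpts: the first five pairs and the expansions they point to.
    static final String SPECIAL = "\u00df\000\u0149\002\u01f0\004\u0390\006\u03b0\011";
    static final String EXPAND = "SS\u02bcNJ\u030c\u0399\u0308\u0301\u03a5\u0308\u0301";

    // Binary search over the even elements; returns the pair's position or -1.
    static int findPair(char ch) {
        int lo = 0;
        int hi = SPECIAL.length() / 2 - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            char key = SPECIAL.charAt(2 * mid);
            if (key == ch) {
                return 2 * mid;
            } else if (key < ch) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    // extraChars is the count of additional characters, i.e. the value that
    // the low 2 bits of the character's DIRECTION entry would supply.
    static String expand(char ch, int extraChars) {
        int pair = findPair(ch);
        if (pair < 0) {
            return null;  // no multi-character uppercase sequence in the excerpt
        }
        int start = SPECIAL.charAt(pair + 1);
        return EXPAND.substring(start, start + extraChars + 1);
    }

    public static void main(String[] args) {
        System.out.println(expand('\u00df', 1)); // prints SS
        System.out.println(expand('a', 0));      // prints null
    }
}
```

A caller would fall back to the single-character UPPER mapping whenever no pair is found.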
Field Value:
"\u00df\000\u0149\002\u01f0\004\u0390\006\u03b0\011\u0587\014\u1e96\016\u1e97\020\u1e98\022\u1e99\024\u1e9a\026\u1f50\030\u1f52\032\u1f54\035\u1f56 \u1f80#\u1f81%\u1f82'\u1f83)\u1f84+\u1f85-\u1f86/\u1f871\u1f883\u1f895\u1f8a7\u1f8b9\u1f8c;\u1f8d=\u1f8e?\u1f8fA\u1f90C\u1f91E\u1f92G\u1f93I\u1f94K\u1f95M\u1f96O\u1f97Q\u1f98S\u1f99U\u1f9aW\u1f9bY\u1f9c[\u1f9d]\u1f9e_\u1f9fa\u1fa0c\u1fa1e\u1fa2g\u1fa3i\u1fa4k\u1fa5m\u1fa6o\u1fa7q\u1fa8s\u1fa9u\u1faaw\u1faby\u1fac{\u1fad}\u1fae\u007f\u1faf\u0081\u1fb2\u0083\u1fb3\u0085\u1fb4\u0087\u1fb6\u0089\u1fb7\u008b\u1fbc\u008e\u1fc2\u0090\u1fc3\u0092\u1fc4\u0094\u1fc6\u0096\u1fc7\u0098\u1fcc\u009b\u1fd2\u009d\u1fd3\u00a0\u1fd6\u00a3\u1fd7\u00a5\u1fe2\u00a8\u1fe3\u00ab\u1fe4\u00ae\u1fe6\u00b0\u1fe7\u00b2\u1ff2\u00b5\u1ff3\u00b7\u1ff4\u00b9\u1ff6\u00bb\u1ff7\u00bd\u1ffc\u00c0\ufb00\u00c2\ufb01\u00c4\ufb02\u00c6\ufb03\u00c8\ufb04\u00cb\ufb05\u00ce\ufb06\u00d0\ufb13\u00d2\ufb14\u00d4\ufb15\u00d6\ufb16\u00d8\ufb17\u00da"

gnu/java/lang/CharData -- Database for java.lang.Character Unicode info
Copyright (C) 2002 Free Software Foundation, Inc.
*** This file is generated by scripts/unicode-muncher.pl ***

This file is part of GNU Classpath.

GNU Classpath is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

GNU Classpath is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with GNU Classpath; see the file COPYING. If not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

Linking this library statically or dynamically with other modules is making a combined work based on this library. Thus, the terms and conditions of the GNU General Public License cover the whole combination.

As a special exception, the copyright holders of this library give you permission to link this library with independent modules to produce an executable, regardless of the license terms of these independent modules, and to copy and distribute the resulting executable under terms of your choice, provided that you also meet, for each linked independent module, the terms and conditions of the license of that module. An independent module is a module which is not derived from or based on this library. If you modify this library, you may extend this exception to your version of the library, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version.