The uax-15 Reference Manual

This is the uax-15 Reference Manual, generated automatically by Declt version 3.0 "Montgomery Scott" on Fri Jun 26 12:32:12 2020 GMT+0.


1 Introduction

uax-15

This package provides a Common Lisp Unicode normalization function supporting the NFC, NFD, NFKC and NFKD forms, as specified in Unicode Standard Annex #15 (http://www.unicode.org/reports/tr15/tr15-22.html).

This is a fork of a subset of work done by Takeru Ohta in 2010. Future work is intended to provide support for https://tools.ietf.org/html/rfc8264 and https://tools.ietf.org/html/rfc7564.

Implementation Notes

This has been successfully tested on SBCL, CCL, ECL, ABCL, Allegro and CMUCL against the Unicode test file found at http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt

CLISP still has some issues. It has not been tested against LispWorks or other Common Lisp implementations.

Usage

It has one major exported function, normalize, which takes a string and a normalization form.

The currently supported normalization forms are :nfc, :nfkc, :nfd and :nfkd.
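
Normalized strings often print identically to their inputs, so the effect of a normalization form can be hard to see at the REPL. The following is a minimal sketch of a helper that lists the code points of a string. The name code-point-labels and the variable *e-acute* are hypothetical and not part of uax-15, and the sketch assumes an implementation where char-code returns full Unicode code points (as on SBCL or CCL).

```lisp
;; Hypothetical helper, not part of uax-15: list the code points of a
;; string as "U+XXXX" labels so that normalization changes become visible.
;; Assumes CHAR-CODE returns full Unicode code points.
(defun code-point-labels (string)
  (map 'list
       (lambda (ch) (format nil "U+~4,'0X" (char-code ch)))
       string))

;; LATIN SMALL LETTER E WITH ACUTE as one precomposed character, built
;; with CODE-CHAR so the example does not depend on source file encoding.
(defparameter *e-acute* (string (code-char #xE9)))

(code-point-labels *e-acute*)
```

With uax-15 loaded, calling code-point-labels on the result of (normalize *e-acute* :nfd) should, going by UAX #15, list U+0065 followed by U+0301.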

Normalization examples, with reference to the relevant xkcd (https://www.xkcd.com/936/):

    (normalize "正しい馬バッテリーステープル" :nfkc)
    "正しい馬バッテリーステープル"

    (normalize "الحصان الصحيح البطارية التيلة" :nfkc)
    "الحصان الصحيح البطارية التيلة"

    (normalize "stáplacha ceart ceallraí capall" :nfkc)
    "stáplacha ceart ceallraí capall"

To Do list

More relevant xkcd https://xkcd.com/1726/, https://xkcd.com/1953/, https://www.xkcd.com/1209/, https://xkcd.com/1137/

Data Files

Other References


2 Systems

The main system appears first, followed by any subsystem dependency.


2.1 uax-15

Author

Takeru Ohta, Sabra Crolleton <sabra.crolleton@gmail.com>

License

MIT

Description

Common Lisp implementation of the Unicode normalization forms :nfc, :nfd, :nfkc and :nfkd (UAX #15)

Dependencies
Source

uax-15.asd (file)

Component

src (module)


3 Modules

Modules are listed depth-first from the system components tree.


3.1 uax-15/src

Parent

uax-15 (system)

Location

src/

Components

4 Files

Files are sorted by type and then listed depth-first from the systems components trees.


4.1 Lisp


4.1.1 uax-15.asd

Location

uax-15.asd

Systems

uax-15 (system)

Packages

uax-15-system

Internal Definitions

*string-file* (special variable)


4.1.2 uax-15/src/package.lisp

Parent

src (module)

Location

src/package.lisp

Packages

uax-15


4.1.3 uax-15/src/utilities.lisp

Dependency

package.lisp (file)

Parent

src (module)

Location

src/utilities.lisp

Internal Definitions

4.1.4 uax-15/src/trivial-utf-16.lisp

Dependency

package.lisp (file)

Parent

src (module)

Location

src/trivial-utf-16.lisp

Internal Definitions

4.1.5 uax-15/src/precomputed-tables.lisp

Dependencies
Parent

src (module)

Location

src/precomputed-tables.lisp

Internal Definitions

4.1.6 uax-15/src/normalize-backend.lisp

Dependencies
Parent

src (module)

Location

src/normalize-backend.lisp

Internal Definitions

4.1.7 uax-15/src/uax-15.lisp

Dependencies
Parent

src (module)

Location

src/uax-15.lisp

Exported Definitions
Internal Definitions

5 Packages

Packages are listed by definition order.


5.1 uax-15-system

Source

uax-15.asd

Use List
Internal Definitions

*string-file* (special variable)


5.2 uax-15

Source

package.lisp (file)

Use List

common-lisp

Exported Definitions
Internal Definitions

6 Definitions

Definitions are sorted by export status, category, package, and then by lexicographic order.


6.1 Exported definitions


6.1.1 Functions

Function: get-canonical-combining-class-map ()
Package

uax-15

Source

uax-15.lisp (file)

Function: get-illegal-char-list NORMALIZATION-FORM

Takes a normalization form, e.g. :nfkc, and returns a list of lists of the form (#\NO-BREAK_SPACE NIL), where the first item is the character name and the second item is N, M or NIL, indicating whether the character may require renormalization.

Package

uax-15

Source

uax-15.lisp (file)

Function: get-mapping NORMALIZATION-FORM &aux MAPPING

Note: there is no mapping for :nfkc.

Package

uax-15

Source

uax-15.lisp (file)

Function: normalize STR NORMALIZATION-FORM &key RFC

Base external function which calls the appropriate normalization function for the given normalization form. The default normalization form is :nfkc, but :nfd, :nfkd and :nfc are also available.

Package

uax-15

Source

uax-15.lisp (file)


6.2 Internal definitions


6.2.1 Special variables

Special Variable: *canonical-combining-class*
Package

uax-15

Source

precomputed-tables.lisp (file)

Special Variable: *canonical-comp-map*
Package

uax-15

Source

precomputed-tables.lisp (file)

Special Variable: *canonical-decomp-map*
Package

uax-15

Source

precomputed-tables.lisp (file)

Special Variable: *compatible-decomp-map*
Package

uax-15

Source

precomputed-tables.lisp (file)

Special Variable: *composition-exclusions-data*
Package

uax-15

Source

precomputed-tables.lisp (file)

Special Variable: *data-directory*
Package

uax-15

Source

precomputed-tables.lisp (file)

Special Variable: *derived-normalization-props-data*
Package

uax-15

Source

uax-15.lisp (file)

Special Variable: *derived-normalization-props-data-file*
Package

uax-15

Source

uax-15.lisp (file)

Special Variable: *string-file*
Package

uax-15-system

Source

uax-15.asd

Special Variable: *unicode-data*
Package

uax-15

Source

precomputed-tables.lisp (file)


6.2.2 Macros

Macro: nconcf LIST1 LIST2
Package

uax-15

Source

utilities.lisp (file)


6.2.3 Functions

Function: bad-char-error MESSAGE &key VALUE NORMALIZATION-FORM
Package

uax-15

Source

utilities.lisp (file)

Function: canonical-ordering DECOMPOSED-STRING &aux S
Package

uax-15

Source

normalize-backend.lisp (file)
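
UAX #15 defines canonical ordering as a stable sort, by canonical combining class, of each run of characters whose class is non-zero. The following is a minimal sketch of that idea and not the library's source. The names sketch-ccc and sketch-canonical-ordering are hypothetical, the sketch works on a list of integer code points rather than a string, and its combining class table covers only three marks for illustration.

```lisp
;; Illustrative subset of canonical combining classes. Anything not
;; listed is treated as a starter (class 0), which is an assumption
;; made to keep the sketch short.
(defun sketch-ccc (codepoint)
  (case codepoint
    (#x0327 202)   ; COMBINING CEDILLA
    (#x0323 220)   ; COMBINING DOT BELOW
    (#x0301 230)   ; COMBINING ACUTE ACCENT
    (t 0)))

;; Reorder each run of non-starters by combining class. STABLE-SORT
;; keeps marks of equal class in their original relative order.
(defun sketch-canonical-ordering (codepoints)
  (let ((result '())
        (run '()))
    (flet ((flush-run ()
             (setf result (append result
                                  (stable-sort (reverse run) #'<
                                               :key #'sketch-ccc))
                   run '())))
      (dolist (cp codepoints)
        (if (zerop (sketch-ccc cp))
            (progn (flush-run)
                   (setf result (append result (list cp))))
            (push cp run)))
      (flush-run))
    result))

;; "a" + acute (230) + dot below (220): the dot below moves first.
(sketch-canonical-ordering '(#x61 #x0301 #x0323))
```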

Function: codepoint-as-utf-16 CODEPOINT

Translate a Unicode code point to its UTF-16 representation. Returns a list of one or two codepoints. Passes surrogate code points straight through.

Package

uax-15

Source

trivial-utf-16.lisp (file)
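
The translation described above can be sketched with plain integer arithmetic. This is an illustration of the standard UTF-16 encoding rule, not the library's source, and sketch-codepoint-as-utf-16 is a hypothetical name.

```lisp
;; Sketch only: returns a list of one or two UTF-16 code units.
(defun sketch-codepoint-as-utf-16 (codepoint)
  (if (< codepoint #x10000)
      ;; BMP code points, including lone surrogates, pass straight through.
      (list codepoint)
      ;; Supplementary code points: subtract #x10000, then split the
      ;; remaining 20 bits into two 10-bit halves.
      (let ((offset (- codepoint #x10000)))
        (list (+ #xD800 (ash offset -10))
              (+ #xDC00 (logand offset #x3FF))))))

;; U+1F600 lies outside the BMP and becomes a surrogate pair.
(sketch-codepoint-as-utf-16 #x1F600)   ; => (#xD83D #xDE00)
```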

Function: compose DECOMPOSED-STRING
Package

uax-15

Source

normalize-backend.lisp (file)

Function: compose-hangul STR &aux LEN
Package

uax-15

Source

normalize-backend.lisp (file)

Function: decode-utf-16 UTF-16-STRING

Turn a vector of UTF-16 code units into a vector of Unicode code points. Passes unpaired surrogate codepoints straight through.

Package

uax-15

Source

trivial-utf-16.lisp (file)
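
A minimal sketch of such a decoder follows, assuming the input is a vector of integers. It is not the library's source, and sketch-decode-utf-16 is a hypothetical name. A high surrogate is combined with the following unit only when that unit is a low surrogate. Every other unit, including an unpaired surrogate, is copied unchanged.

```lisp
(defun sketch-decode-utf-16 (code-units)
  (let ((result (make-array (length code-units) :fill-pointer 0))
        (n (length code-units))
        (i 0))
    (loop while (< i n)
          do (let ((unit (aref code-units i)))
               (if (and (<= #xD800 unit #xDBFF)
                        (< (1+ i) n)
                        (<= #xDC00 (aref code-units (1+ i)) #xDFFF))
                   ;; Valid surrogate pair: combine into one code point.
                   (progn
                     (vector-push (+ #x10000
                                     (ash (- unit #xD800) 10)
                                     (- (aref code-units (1+ i)) #xDC00))
                                  result)
                     (incf i 2))
                   ;; BMP unit or unpaired surrogate: pass through.
                   (progn
                     (vector-push unit result)
                     (incf i)))))
    (coerce result 'simple-vector)))

;; "A", a surrogate pair for U+1F600, then an unpaired high surrogate.
(sketch-decode-utf-16 (vector #x41 #xD83D #xDE00 #xD800))
```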

Function: decompose S TYPE
Package

uax-15

Source

normalize-backend.lisp (file)

Function: decompose-char CHAR &optional TYPE
Package

uax-15

Source

normalize-backend.lisp (file)

Function: decompose-hangul-char CH
Package

uax-15

Source

normalize-backend.lisp (file)
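
The Unicode Standard defines Hangul syllable decomposition arithmetically rather than through table lookup. The following is a sketch of that arithmetic and not the library's source. The name sketch-decompose-hangul is hypothetical, and the sketch takes and returns integer code points, which is an assumption about the interface.

```lisp
;; Constants are those given by the Unicode Standard for Hangul syllables.
(defun sketch-decompose-hangul (codepoint)
  (let* ((s-base #xAC00) (l-base #x1100) (v-base #x1161) (t-base #x11A7)
         (v-count 21) (t-count 28) (s-count 11172)
         (s-index (- codepoint s-base)))
    (if (not (< -1 s-index s-count))
        ;; Not a precomposed Hangul syllable: return it unchanged.
        (list codepoint)
        (multiple-value-bind (lv-index t-index) (floor s-index t-count)
          (multiple-value-bind (l-index v-index) (floor lv-index v-count)
            (append (list (+ l-base l-index) (+ v-base v-index))
                    ;; A trailing consonant is present only when T-INDEX > 0.
                    (unless (zerop t-index)
                      (list (+ t-base t-index)))))))))

;; U+D55C decomposes into leading, vowel and trailing jamo.
(sketch-decompose-hangul #xD55C)   ; => (#x1112 #x1161 #x11AB)
```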

Function: encode-utf-16 UNICODE-STRING

Turn a vector of Unicode code points into a vector of UTF-16 code units. Indifferent to unpaired surrogates.

Package

uax-15

Source

trivial-utf-16.lisp (file)

Function: from-unicode-string UNICODE-STRING

Take a vector of Unicode code points and turn it into a Lisp string.

Package

uax-15

Source

trivial-utf-16.lisp (file)

Function: get-canonical-combining-class CH
Package

uax-15

Source

normalize-backend.lisp (file)

Function: int-to-hex-string INT
Package

uax-15

Source

utilities.lisp (file)

Function: nfc S
Package

uax-15

Source

normalize-backend.lisp (file)

Function: nfd S
Package

uax-15

Source

normalize-backend.lisp (file)

Function: nfkc S
Package

uax-15

Source

normalize-backend.lisp (file)

Function: nfkd S
Package

uax-15

Source

normalize-backend.lisp (file)

Function: normalize-char CHR NORMALIZATION-FORM

Runs normalize on a single-character input. You must provide the normalization form (:nfd, :nfkd, :nfc, or :nfkc).

Package

uax-15

Source

uax-15.lisp (file)

Function: parse-hex-list-to-string LST

Takes a list of numbers and returns a string of characters.

Package

uax-15

Source

utilities.lisp (file)

Function: parse-hex-string-to-char STR

Parse a hex string which is a single character into a character using code-char.

Package

uax-15

Source

utilities.lisp (file)

Function: parse-hex-string-to-int STR

Parse a hex string representing a single character into an integer.

Package

uax-15

Source

utilities.lisp (file)

Function: parse-hex-string-to-string STR

Takes a string which may be one or more hex numbers, e.g. "0044 0307", builds an array of characters, coerces it to a string and returns the string. Mostly used for testing.

Package

uax-15

Source

utilities.lisp (file)
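
The behaviour described for parse-hex-string-to-string can be sketched with standard functions only. This is an illustration, not the library's source. The name sketch-hex-string-to-string is hypothetical, and the sketch assumes that tokens are separated by spaces and that code-char accepts every code point in the input.

```lisp
(defun sketch-hex-string-to-string (str)
  (let ((chars '())
        (start 0)
        (end (length str)))
    (loop
      ;; Find the start of the next hex token, skipping any spaces.
      (let ((token-start (position-if (lambda (c) (char/= c #\Space))
                                      str :start start)))
        (unless token-start (return))
        (let ((token-end (or (position #\Space str :start token-start)
                             end)))
          (push (code-char (parse-integer str
                                          :start token-start
                                          :end token-end
                                          :radix 16))
                chars)
          (setf start token-end))))
    (coerce (nreverse chars) 'string)))

;; "D" followed by COMBINING DOT ABOVE, as in the example above.
(sketch-hex-string-to-string "0044 0307")
```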

Function: surrogates-to-codepoint HIGH-SURROGATE LOW-SURROGATE

Translate a pair of surrogate codepoints to a non-BMP codepoint. Returns the codepoint as an integer.

Package

uax-15

Source

trivial-utf-16.lisp (file)
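
The arithmetic behind this translation is the inverse of UTF-16 encoding. A minimal sketch follows. It is not the library's source, sketch-surrogates-to-codepoint is a hypothetical name, and it assumes both arguments are already known to be valid surrogates.

```lisp
;; Each surrogate carries 10 bits; the high surrogate holds the upper
;; half. The result is offset by #x10000, the first non-BMP code point.
(defun sketch-surrogates-to-codepoint (high-surrogate low-surrogate)
  (+ #x10000
     (ash (- high-surrogate #xD800) 10)
     (- low-surrogate #xDC00)))

(sketch-surrogates-to-codepoint #xD83D #xDE00)   ; => #x1F600
```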

Function: to-unicode-string LISP-STRING

Take a Lisp string and turn it into a vector of Unicode code points.

Package

uax-15

Source

trivial-utf-16.lisp (file)


6.2.4 Generic functions

Generic Function: bad-char-error-message CONDITION
Generic Function: (setf bad-char-error-message) NEW-VALUE CONDITION
Package

uax-15

Methods
Method: bad-char-error-message (CONDITION bad-char-error)
Method: (setf bad-char-error-message) NEW-VALUE (CONDITION bad-char-error)
Source

utilities.lisp (file)

Generic Function: bad-char-error-normalization-form CONDITION
Generic Function: (setf bad-char-error-normalization-form) NEW-VALUE CONDITION
Package

uax-15

Methods
Method: bad-char-error-normalization-form (CONDITION bad-char-error)
Method: (setf bad-char-error-normalization-form) NEW-VALUE (CONDITION bad-char-error)
Source

utilities.lisp (file)

Generic Function: bad-char-error-value CONDITION
Generic Function: (setf bad-char-error-value) NEW-VALUE CONDITION
Package

uax-15

Methods
Method: bad-char-error-value (CONDITION bad-char-error)
Method: (setf bad-char-error-value) NEW-VALUE (CONDITION bad-char-error)
Source

utilities.lisp (file)


6.2.5 Conditions

Condition: bad-char-error ()
Package

uax-15

Source

utilities.lisp (file)

Direct superclasses

error (condition)

Direct methods
Direct slots
Slot: message

Text message indicating what went wrong with the validation.

Initargs

:message

Initform

(quote nil)

Readers

bad-char-error-message (generic function)

Writers

(setf bad-char-error-message) (generic function)

Slot: value

The value of the field for which the error is signalled.

Initargs

:value

Initform

(quote nil)

Readers

bad-char-error-value (generic function)

Writers

(setf bad-char-error-value) (generic function)

Slot: normalization-form

The normalization form for which the error was signalled.

Initargs

:normalization-form

Initform

(quote nil)

Readers

bad-char-error-normalization-form (generic function)

Writers

(setf bad-char-error-normalization-form) (generic function)
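
A condition with the slots, initargs and initforms listed above could be declared and handled roughly as follows. This is a sketch under assumptions, not the library's source. All names carry a sketch- prefix to mark them as hypothetical, and the :report clause is an addition that the listing above does not mention.

```lisp
(define-condition sketch-bad-char-error (error)
  ((message :initarg :message
            :initform nil
            :accessor sketch-bad-char-error-message)
   (value :initarg :value
          :initform nil
          :accessor sketch-bad-char-error-value)
   (normalization-form :initarg :normalization-form
                       :initform nil
                       :accessor sketch-bad-char-error-normalization-form))
  ;; Assumed addition: a report function for readable error output.
  (:report (lambda (condition stream)
             (format stream "~A (value ~S, form ~S)"
                     (sketch-bad-char-error-message condition)
                     (sketch-bad-char-error-value condition)
                     (sketch-bad-char-error-normalization-form condition)))))

;; Signal the condition and read two of its slots back in a handler.
(defun sketch-handle-bad-char ()
  (handler-case
      (error 'sketch-bad-char-error
             :message "Illegal character"
             :value (code-char #xA0)
             :normalization-form :nfkc)
    (sketch-bad-char-error (c)
      (list (sketch-bad-char-error-message c)
            (sketch-bad-char-error-normalization-form c)))))
```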


6.2.6 Types

Type: high-surrogate ()

A Unicode High Surrogate.

Package

uax-15

Source

trivial-utf-16.lisp (file)

Type: low-surrogate ()

A Unicode Low Surrogate.

Package

uax-15

Source

trivial-utf-16.lisp (file)

Type: unicode-point ()

A Unicode code point.

Package

uax-15

Source

trivial-utf-16.lisp (file)

Type: unicode-string ()

A vector of Unicode code points.

Package

uax-15

Source

trivial-utf-16.lisp (file)
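
The three integer types above could be expressed as integer ranges. The definitions below are a sketch built from the code point ranges in the Unicode Standard and are not taken from the library's source. The sketch- names are hypothetical.

```lisp
;; Unicode code points run from 0 to #x10FFFF.
(deftype sketch-unicode-point () '(integer 0 #x10FFFF))

;; High (leading) surrogates occupy #xD800 through #xDBFF.
(deftype sketch-high-surrogate () '(integer #xD800 #xDBFF))

;; Low (trailing) surrogates occupy #xDC00 through #xDFFF.
(deftype sketch-low-surrogate () '(integer #xDC00 #xDFFF))

(typep #xD83D 'sketch-high-surrogate)   ; => T
```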


Appendix A Indexes


A.1 Concepts

Index Entry  Section

F
File, Lisp, uax-15.asd: The uax-15.asd file
File, Lisp, uax-15/src/normalize-backend.lisp: The uax-15/src/normalize-backend.lisp file
File, Lisp, uax-15/src/package.lisp: The uax-15/src/package.lisp file
File, Lisp, uax-15/src/precomputed-tables.lisp: The uax-15/src/precomputed-tables.lisp file
File, Lisp, uax-15/src/trivial-utf-16.lisp: The uax-15/src/trivial-utf-16.lisp file
File, Lisp, uax-15/src/uax-15.lisp: The uax-15/src/uax-15.lisp file
File, Lisp, uax-15/src/utilities.lisp: The uax-15/src/utilities.lisp file

L
Lisp File, uax-15.asd: The uax-15.asd file
Lisp File, uax-15/src/normalize-backend.lisp: The uax-15/src/normalize-backend.lisp file
Lisp File, uax-15/src/package.lisp: The uax-15/src/package.lisp file
Lisp File, uax-15/src/precomputed-tables.lisp: The uax-15/src/precomputed-tables.lisp file
Lisp File, uax-15/src/trivial-utf-16.lisp: The uax-15/src/trivial-utf-16.lisp file
Lisp File, uax-15/src/uax-15.lisp: The uax-15/src/uax-15.lisp file
Lisp File, uax-15/src/utilities.lisp: The uax-15/src/utilities.lisp file

M
Module, uax-15/src: The uax-15/src module

U
uax-15.asd: The uax-15.asd file
uax-15/src: The uax-15/src module
uax-15/src/normalize-backend.lisp: The uax-15/src/normalize-backend.lisp file
uax-15/src/package.lisp: The uax-15/src/package.lisp file
uax-15/src/precomputed-tables.lisp: The uax-15/src/precomputed-tables.lisp file
uax-15/src/trivial-utf-16.lisp: The uax-15/src/trivial-utf-16.lisp file
uax-15/src/uax-15.lisp: The uax-15/src/uax-15.lisp file
uax-15/src/utilities.lisp: The uax-15/src/utilities.lisp file

A.2 Functions

Index Entry  Section

(
(setf bad-char-error-message): Internal generic functions
(setf bad-char-error-message): Internal generic functions
(setf bad-char-error-normalization-form): Internal generic functions
(setf bad-char-error-normalization-form): Internal generic functions
(setf bad-char-error-value): Internal generic functions
(setf bad-char-error-value): Internal generic functions

B
bad-char-error: Internal functions
bad-char-error-message: Internal generic functions
bad-char-error-message: Internal generic functions
bad-char-error-normalization-form: Internal generic functions
bad-char-error-normalization-form: Internal generic functions
bad-char-error-value: Internal generic functions
bad-char-error-value: Internal generic functions

C
canonical-ordering: Internal functions
codepoint-as-utf-16: Internal functions
compose: Internal functions
compose-hangul: Internal functions

D
decode-utf-16: Internal functions
decompose: Internal functions
decompose-char: Internal functions
decompose-hangul-char: Internal functions

E
encode-utf-16: Internal functions

F
from-unicode-string: Internal functions
Function, bad-char-error: Internal functions
Function, canonical-ordering: Internal functions
Function, codepoint-as-utf-16: Internal functions
Function, compose: Internal functions
Function, compose-hangul: Internal functions
Function, decode-utf-16: Internal functions
Function, decompose: Internal functions
Function, decompose-char: Internal functions
Function, decompose-hangul-char: Internal functions
Function, encode-utf-16: Internal functions
Function, from-unicode-string: Internal functions
Function, get-canonical-combining-class: Internal functions
Function, get-canonical-combining-class-map: Exported functions
Function, get-illegal-char-list: Exported functions
Function, get-mapping: Exported functions
Function, int-to-hex-string: Internal functions
Function, nfc: Internal functions
Function, nfd: Internal functions
Function, nfkc: Internal functions
Function, nfkd: Internal functions
Function, normalize: Exported functions
Function, normalize-char: Internal functions
Function, parse-hex-list-to-string: Internal functions
Function, parse-hex-string-to-char: Internal functions
Function, parse-hex-string-to-int: Internal functions
Function, parse-hex-string-to-string: Internal functions
Function, surrogates-to-codepoint: Internal functions
Function, to-unicode-string: Internal functions

G
Generic Function, (setf bad-char-error-message): Internal generic functions
Generic Function, (setf bad-char-error-normalization-form): Internal generic functions
Generic Function, (setf bad-char-error-value): Internal generic functions
Generic Function, bad-char-error-message: Internal generic functions
Generic Function, bad-char-error-normalization-form: Internal generic functions
Generic Function, bad-char-error-value: Internal generic functions
get-canonical-combining-class: Internal functions
get-canonical-combining-class-map: Exported functions
get-illegal-char-list: Exported functions
get-mapping: Exported functions

I
int-to-hex-string: Internal functions

M
Macro, nconcf: Internal macros
Method, (setf bad-char-error-message): Internal generic functions
Method, (setf bad-char-error-normalization-form): Internal generic functions
Method, (setf bad-char-error-value): Internal generic functions
Method, bad-char-error-message: Internal generic functions
Method, bad-char-error-normalization-form: Internal generic functions
Method, bad-char-error-value: Internal generic functions

N
nconcf: Internal macros
nfc: Internal functions
nfd: Internal functions
nfkc: Internal functions
nfkd: Internal functions
normalize: Exported functions
normalize-char: Internal functions

P
parse-hex-list-to-string: Internal functions
parse-hex-string-to-char: Internal functions
parse-hex-string-to-int: Internal functions
parse-hex-string-to-string: Internal functions

S
surrogates-to-codepoint: Internal functions

T
to-unicode-string: Internal functions

A.3 Variables

Index Entry  Section

*
*canonical-combining-class*: Internal special variables
*canonical-comp-map*: Internal special variables
*canonical-decomp-map*: Internal special variables
*compatible-decomp-map*: Internal special variables
*composition-exclusions-data*: Internal special variables
*data-directory*: Internal special variables
*derived-normalization-props-data*: Internal special variables
*derived-normalization-props-data-file*: Internal special variables
*string-file*: Internal special variables
*unicode-data*: Internal special variables

M
message: Internal conditions

N
normalization-form: Internal conditions

S
Slot, message: Internal conditions
Slot, normalization-form: Internal conditions
Slot, value: Internal conditions
Special Variable, *canonical-combining-class*: Internal special variables
Special Variable, *canonical-comp-map*: Internal special variables
Special Variable, *canonical-decomp-map*: Internal special variables
Special Variable, *compatible-decomp-map*: Internal special variables
Special Variable, *composition-exclusions-data*: Internal special variables
Special Variable, *data-directory*: Internal special variables
Special Variable, *derived-normalization-props-data*: Internal special variables
Special Variable, *derived-normalization-props-data-file*: Internal special variables
Special Variable, *string-file*: Internal special variables
Special Variable, *unicode-data*: Internal special variables

V
value: Internal conditions

A.4 Data types

Index Entry  Section

B
bad-char-error: Internal conditions

C
Condition, bad-char-error: Internal conditions

H
high-surrogate: Internal types

L
low-surrogate: Internal types

P
Package, uax-15: The uax-15 package
Package, uax-15-system: The uax-15-system package

S
System, uax-15: The uax-15 system

T
Type, high-surrogate: Internal types
Type, low-surrogate: Internal types
Type, unicode-point: Internal types
Type, unicode-string: Internal types

U
uax-15: The uax-15 system
uax-15: The uax-15 package
uax-15-system: The uax-15-system package
unicode-point: Internal types
unicode-string: Internal types
