The uax-14 Reference Manual

This is the uax-14 Reference Manual, version 1.0.0, generated automatically by Declt version 3.0 "Montgomery Scott" on Mon Dec 02 11:35:42 2019 GMT+0.



1 Introduction

## About UAX-14
This is an implementation of the line breaking algorithm from "Unicode Standard Annex #14"(http://www.unicode.org/reports/tr14/). It provides a fast and convenient way to determine line breaking opportunities in text.

Note that this algorithm does not support break opportunities that require morphological analysis. To handle such cases, use a system that provides this capability, such as a hyphenation algorithm.

Also note that this system is completely unaware of layout decisions. All such decisions, including which breaks to pick, how to space words, how to handle bidirectionality, and what to do in emergency situations when an overfull line has no breaks, are left up to the user.

The system passes all tests offered by the Unicode standard.

## How To
The system will compile binary database files on first load. Should anything go wrong during this process, a note is produced on load. If you would like to prevent this automated loading, push ``uax-14-no-load`` to ``*features*`` before loading. You can then manually load the database files when convenient through ``load-databases``.
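
As a minimal sketch of the deferred setup described above: it assumes the feature is the keyword ``:uax-14-no-load`` (the text only gives the name ``uax-14-no-load``), that ASDF can locate the ``uax-14`` system, and that the forms are evaluated in order, for example at the REPL.

```lisp
;; Assumption: the feature is checked as a keyword on *FEATURES*.
(pushnew :uax-14-no-load *features*)

;; Load the system; with the feature present, the databases should not be
;; loaded automatically.
(asdf:load-system :uax-14)

;; Later, once it is convenient, load the databases by hand.
(org.shirakumo.alloy.uax-14:load-databases)
```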

Once loaded, you can produce a list of line breaks for a string with ``list-breaks`` or break a string at every opportunity with ``break-string``. Typically, however, you will want to scan for the next break as you move along the string during layout. To do so, create a breaker with ``make-breaker`` and call ``next-break`` whenever the next line break opportunity is required.

In pseudo-code, that could look something like this. We assume the local nickname ``uax-14`` for ``org.shirakumo.alloy.uax-14`` here.

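::common lisp
;; INSERT-BREAK, BEYOND-EXTENTS-P, and FIND-LAST-FITTING-CLUSTER stand in
;; for functions provided by your own layout code.
(loop with breaker = (uax-14:make-breaker string)
      with start = 0 and last = 0
      do (multiple-value-bind (pos mandatory) (uax-14:next-break breaker)
           (unless pos (return)) ; No further break opportunities.
           (cond (mandatory
                  (insert-break pos)
                  (setf start pos))
                 ((beyond-extents-p start pos)
                  (if (<= last start) ; Force a break if we are overfull.
                      (loop while (beyond-extents-p start pos)
                            do (let ((next (find-last-fitting-cluster start)))
                                 (insert-break next)
                                 (setf start next))
                            finally (setf pos start))
                      (progn ; Otherwise break at the last opportunity that fit.
                        (insert-break last)
                        (setf start last)))))
           (setf last pos)))
::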

## External Files
The following files are from their corresponding external sources, last accessed on 2019.09.03:

- ``LineBreak.txt`` https://www.unicode.org/Public/UCD/latest/ucd/LineBreak.txt
- ``LineBreakTest.txt`` https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/LineBreakTest.txt

At the time, Unicode 12.1 was considered the latest version.

## Acknowledgements
The code in this project is largely based on the "linebreak"(https://github.com/foliojs/linebreak) project by Devon Govett et al.



2 Systems

The main system appears first, followed by any subsystem dependency.



2.1 uax-14

Maintainer

Nicolas Hafner <shinmera@tymoon.eu>

Author

Nicolas Hafner <shinmera@tymoon.eu>

Home Page

https://github.com/Shinmera/uax-14

Source Control

(:git "https://github.com/shinmera/uax-14.git")

Bug Tracker

https://github.com/Shinmera/uax-14/issues

License

zlib

Description

Implementation of the Unicode Standards Annex #14’s line breaking algorithm

Version

1.0.0

Dependency

documentation-utils

Source

uax-14.asd (file)


3 Files

Files are sorted by type and then listed depth-first from the systems' component trees.



3.1 Lisp



3.1.1 uax-14.asd

Location

uax-14.asd

Systems

uax-14 (system)



3.1.2 uax-14/package.lisp

Parent

uax-14 (system)

Location

package.lisp

Packages

org.shirakumo.alloy.uax-14



3.1.3 uax-14/database.lisp

Dependency

package.lisp (file)

Parent

uax-14 (system)

Location

database.lisp


3.1.4 uax-14/uax-14.lisp

Dependency

database.lisp (file)

Parent

uax-14 (system)

Location

uax-14.lisp


3.1.5 uax-14/documentation.lisp

Dependency

uax-14.lisp (file)

Parent

uax-14 (system)

Location

documentation.lisp



4 Packages

Packages are listed by definition order.



4.1 org.shirakumo.alloy.uax-14

Source

package.lisp (file)

Use List

common-lisp


5 Definitions

Definitions are sorted by export status, category, package, and then by lexicographic order.



5.1 Exported definitions



5.1.1 Special variables

Special Variable: *line-break-database-file*

Variable containing the absolute path of the line break database file.

See LOAD-DATABASES
See COMPILE-DATABASES

Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Special Variable: *pair-table-file*

Variable containing the absolute path of the pair table file.

See LOAD-DATABASES
See COMPILE-DATABASES

Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)



5.1.2 Functions

Function: break-string STRING &optional MANDATORY-ONLY BREAKER

Returns a list of all the pieces of the string, broken.

If MANDATORY-ONLY is T, the string is only split at mandatory line break opportunities, otherwise it is split at every opportunity.

See MAKE-BREAKER
See NEXT-BREAK

Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: compile-databases ()

Compiles the database files from their sources.

This will load an optional part of the system and compile the database files to an efficient byte representation. If the compilation is successful, LOAD-DATABASES is called automatically.

See *LINE-BREAK-DATABASE-FILE*
See *PAIR-TABLE-FILE*
See LOAD-DATABASES

Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Function: list-breaks STRING &optional BREAKER

Returns a list of all line break opportunities in the string.

The list has the following form:

LIST ::= ENTRY+
ENTRY ::= (position mandatory)

This is equivalent to constructing a breaker and collecting the values of NEXT-BREAK in a loop.

See MAKE-BREAKER
See NEXT-BREAK

Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)
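
A minimal sketch of consuming the (position mandatory) list form described for LIST-BREAKS above. SPLIT-AT-BREAKS is a hypothetical helper, not part of this system, and it is not confirmed to produce exactly the same pieces as BREAK-STRING.

```lisp
(defun split-at-breaks (string breaks &optional mandatory-only)
  "Split STRING at the positions in BREAKS, a list of (position mandatory) entries.
If MANDATORY-ONLY is true, only entries marked as mandatory are used."
  (let ((start 0)
        (pieces '()))
    (loop for (position mandatory) in breaks
          ;; Skip optional breaks when only mandatory ones are wanted, and
          ;; never emit an empty piece.
          when (and (or mandatory (not mandatory-only))
                    (< start position))
            do (push (subseq string start position) pieces)
               (setf start position))
    ;; Keep any text left over after the last break that was used.
    (when (< start (length string))
      (push (subseq string start) pieces))
    (nreverse pieces)))
```

Assuming the system is loaded, this could be called as (split-at-breaks string (org.shirakumo.alloy.uax-14:list-breaks string)).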

Function: load-databases ()

Loads the databases from their files into memory.

If one of the files is missing, a warning of type NO-DATABASE-FILES is signalled. If the loading succeeds, T is returned.

See *LINE-BREAK-DATABASE-FILE*
See *PAIR-TABLE-FILE*
See NO-DATABASE-FILES

Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Function: make-breaker STRING &optional BREAKER

Returns a breaker that can find line break opportunities in the given string.

If the optional breaker argument is supplied, the supplied breaker is modified and reset to work with the new string instead. This allows you to re-use a breaker.

Note that while you may pass a non-simple string, modifying this string without resetting any breaker using it will result in undefined behaviour.

See BREAKER

Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: next-break BREAKER

Returns the next line breaking opportunity of the breaker, if any.

Returns two values:

POSITION: The character index in the string at which the break is located, or NIL if no further breaks are possible.
MANDATORY: Whether the break must be made at this location.

Note that there is always at least one break opportunity, namely at the end of the string. However, after consuming this break opportunity, NEXT-BREAK will return NIL.

Note that you may have to insert additional line breaks as required by the layout constraints.

See BREAKER

Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)
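
A minimal sketch of the collection loop implied by the two return values described above: keep calling until POSITION is NIL. COLLECT-BREAKS is a hypothetical helper, not part of this system. It takes the stepping function as an argument so that the loop itself does not depend on the system being loaded.

```lisp
(defun collect-breaks (next-break-fn)
  "Call NEXT-BREAK-FN until its first value is NIL.
Returns a list of (position mandatory) entries, one per call."
  (loop for (position mandatory) = (multiple-value-list (funcall next-break-fn))
        ;; A NIL position means the break opportunities are exhausted.
        while position
        collect (list position mandatory)))
```

Assuming the system is loaded and BREAKER was created with MAKE-BREAKER, it could be called as (collect-breaks (lambda () (org.shirakumo.alloy.uax-14:next-break breaker))).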



5.1.3 Conditions

Condition: no-database-files ()

Warning signalled when LOAD-DATABASES is called and the files are not present.

Two restarts must be active when this condition is signalled:

COMPILE: Call COMPILE-DATABASES.
ABORT: Abort loading the databases, leaving them in their previous state.

See LOAD-DATABASES

Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Direct superclasses

warning (condition)
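
A minimal sketch of choosing one of the restarts described above from a handler. Both functions are hypothetical helpers, not part of this system. The restart is looked up by name only, because the package of the restart symbols is not stated here.

```lisp
(defun invoke-restart-named (name condition)
  "Invoke the innermost restart active for CONDITION whose name matches NAME.
Returns NIL if there is no such restart."
  (let ((restart (find name (compute-restarts condition)
                       :key #'restart-name :test #'string-equal)))
    (when restart
      (invoke-restart restart))))

(defun call-choosing-restart (thunk condition-type restart-name)
  "Call THUNK. If a condition of CONDITION-TYPE is signalled while it runs,
invoke the restart named RESTART-NAME, if it is active."
  (handler-bind ((condition
                   (lambda (condition)
                     (when (typep condition condition-type)
                       (invoke-restart-named restart-name condition)))))
    (funcall thunk)))
```

Assuming the system is loaded, this could be called as (call-choosing-restart #'org.shirakumo.alloy.uax-14:load-databases 'org.shirakumo.alloy.uax-14:no-database-files "COMPILE").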



5.1.4 Structures

Structure: breaker ()

Contains line breaking state.

An instance of this is only useful for passing to MAKE-BREAKER and NEXT-BREAK. It contains internal state that manages the line breaking algorithm.

See MAKE-BREAKER
See NEXT-BREAK

Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Direct superclasses

structure-object (structure)

Direct methods

print-object (method)

Direct slots
Slot: string
Type

string

Readers

breaker-string (function)

Writers

(setf breaker-string) (function)

Slot: pos
Type

org.shirakumo.alloy.uax-14::idx

Initform

0

Readers

breaker-pos (function)

Writers

(setf breaker-pos) (function)

Slot: last-pos
Type

org.shirakumo.alloy.uax-14::idx

Initform

0

Readers

breaker-last-pos (function)

Writers

(setf breaker-last-pos) (function)

Slot: cur-class
Type

(unsigned-byte 8)

Initform

0

Readers

breaker-cur-class (function)

Writers

(setf breaker-cur-class) (function)

Slot: next-class
Type

(unsigned-byte 8)

Initform

0

Readers

breaker-next-class (function)

Writers

(setf breaker-next-class) (function)

Slot: lb8a
Type

boolean

Readers

breaker-lb8a (function)

Writers

(setf breaker-lb8a) (function)

Slot: lb21a
Type

boolean

Readers

breaker-lb21a (function)

Writers

(setf breaker-lb21a) (function)

Slot: lb30a
Type

org.shirakumo.alloy.uax-14::idx

Initform

0

Readers

breaker-lb30a (function)

Writers

(setf breaker-lb30a) (function)



5.2 Internal definitions



5.2.1 Special variables

Special Variable: *here*
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)



5.2.2 Macros

Macro: defglobal NAME VALUE
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)



5.2.3 Compiler macros

Compiler Macro: pair-id PAIR
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Compiler Macro: type-id TYPE
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)



5.2.4 Functions

Function: %make-breaker STRING
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: breaker-cur-class INSTANCE
Function: (setf breaker-cur-class) VALUE INSTANCE
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: breaker-last-pos INSTANCE
Function: (setf breaker-last-pos) VALUE INSTANCE
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: breaker-lb21a INSTANCE
Function: (setf breaker-lb21a) VALUE INSTANCE
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: breaker-lb30a INSTANCE
Function: (setf breaker-lb30a) VALUE INSTANCE
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: breaker-lb8a INSTANCE
Function: (setf breaker-lb8a) VALUE INSTANCE
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: breaker-next-class INSTANCE
Function: (setf breaker-next-class) VALUE INSTANCE
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: breaker-p OBJECT
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: breaker-pos INSTANCE
Function: (setf breaker-pos) VALUE INSTANCE
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: breaker-string INSTANCE
Function: (setf breaker-string) VALUE INSTANCE
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: char-line-break-type CHAR
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Function: code-point-at STRING START
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: copy-breaker INSTANCE
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: handle-simple-break NEXT-CLASS CUR-CLASS
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: line-break-id ID
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Function: load-line-break-database &optional SOURCE
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Function: load-pair-table &optional SOURCE
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Function: normalize-break-id ID
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: normalize-first-break ID
Package

org.shirakumo.alloy.uax-14

Source

uax-14.lisp (file)

Function: pair-id PAIR
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Function: pair-type B A
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Function: pair-type-id B A
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Function: type-id TYPE
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)



5.2.5 Types

Type: code ()
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)

Type: idx ()
Package

org.shirakumo.alloy.uax-14

Source

database.lisp (file)



Appendix A Indexes



A.1 Concepts

Index Entry  Section

F
File, Lisp, uax-14.asd: The uax-14.asd file
File, Lisp, uax-14/database.lisp: The uax-14/database.lisp file
File, Lisp, uax-14/documentation.lisp: The uax-14/documentation.lisp file
File, Lisp, uax-14/package.lisp: The uax-14/package.lisp file
File, Lisp, uax-14/uax-14.lisp: The uax-14/uax-14.lisp file

L
Lisp File, uax-14.asd: The uax-14.asd file
Lisp File, uax-14/database.lisp: The uax-14/database.lisp file
Lisp File, uax-14/documentation.lisp: The uax-14/documentation.lisp file
Lisp File, uax-14/package.lisp: The uax-14/package.lisp file
Lisp File, uax-14/uax-14.lisp: The uax-14/uax-14.lisp file

U
uax-14.asd: The uax-14.asd file
uax-14/database.lisp: The uax-14/database.lisp file
uax-14/documentation.lisp: The uax-14/documentation.lisp file
uax-14/package.lisp: The uax-14/package.lisp file
uax-14/uax-14.lisp: The uax-14/uax-14.lisp file


A.2 Functions

Index Entry  Section

%
%make-breaker: Internal functions

(
(setf breaker-cur-class): Internal functions
(setf breaker-last-pos): Internal functions
(setf breaker-lb21a): Internal functions
(setf breaker-lb30a): Internal functions
(setf breaker-lb8a): Internal functions
(setf breaker-next-class): Internal functions
(setf breaker-pos): Internal functions
(setf breaker-string): Internal functions

B
break-string: Exported functions
breaker-cur-class: Internal functions
breaker-last-pos: Internal functions
breaker-lb21a: Internal functions
breaker-lb30a: Internal functions
breaker-lb8a: Internal functions
breaker-next-class: Internal functions
breaker-p: Internal functions
breaker-pos: Internal functions
breaker-string: Internal functions

C
char-line-break-type: Internal functions
code-point-at: Internal functions
compile-databases: Exported functions
Compiler Macro, pair-id: Internal compiler macros
Compiler Macro, type-id: Internal compiler macros
copy-breaker: Internal functions

D
defglobal: Internal macros

F
Function, %make-breaker: Internal functions
Function, (setf breaker-cur-class): Internal functions
Function, (setf breaker-last-pos): Internal functions
Function, (setf breaker-lb21a): Internal functions
Function, (setf breaker-lb30a): Internal functions
Function, (setf breaker-lb8a): Internal functions
Function, (setf breaker-next-class): Internal functions
Function, (setf breaker-pos): Internal functions
Function, (setf breaker-string): Internal functions
Function, break-string: Exported functions
Function, breaker-cur-class: Internal functions
Function, breaker-last-pos: Internal functions
Function, breaker-lb21a: Internal functions
Function, breaker-lb30a: Internal functions
Function, breaker-lb8a: Internal functions
Function, breaker-next-class: Internal functions
Function, breaker-p: Internal functions
Function, breaker-pos: Internal functions
Function, breaker-string: Internal functions
Function, char-line-break-type: Internal functions
Function, code-point-at: Internal functions
Function, compile-databases: Exported functions
Function, copy-breaker: Internal functions
Function, handle-simple-break: Internal functions
Function, line-break-id: Internal functions
Function, list-breaks: Exported functions
Function, load-databases: Exported functions
Function, load-line-break-database: Internal functions
Function, load-pair-table: Internal functions
Function, make-breaker: Exported functions
Function, next-break: Exported functions
Function, normalize-break-id: Internal functions
Function, normalize-first-break: Internal functions
Function, pair-id: Internal functions
Function, pair-type: Internal functions
Function, pair-type-id: Internal functions
Function, type-id: Internal functions

H
handle-simple-break: Internal functions

L
line-break-id: Internal functions
list-breaks: Exported functions
load-databases: Exported functions
load-line-break-database: Internal functions
load-pair-table: Internal functions

M
Macro, defglobal: Internal macros
make-breaker: Exported functions

N
next-break: Exported functions
normalize-break-id: Internal functions
normalize-first-break: Internal functions

P
pair-id: Internal compiler macros
pair-id: Internal functions
pair-type: Internal functions
pair-type-id: Internal functions

T
type-id: Internal compiler macros
type-id: Internal functions


A.3 Variables

Index Entry  Section

*
*here*: Internal special variables
*line-break-database-file*: Exported special variables
*pair-table-file*: Exported special variables

C
cur-class: Exported structures

L
last-pos: Exported structures
lb21a: Exported structures
lb30a: Exported structures
lb8a: Exported structures

N
next-class: Exported structures

P
pos: Exported structures

S
Slot, cur-class: Exported structures
Slot, last-pos: Exported structures
Slot, lb21a: Exported structures
Slot, lb30a: Exported structures
Slot, lb8a: Exported structures
Slot, next-class: Exported structures
Slot, pos: Exported structures
Slot, string: Exported structures
Special Variable, *here*: Internal special variables
Special Variable, *line-break-database-file*: Exported special variables
Special Variable, *pair-table-file*: Exported special variables
string: Exported structures


A.4 Data types

Index Entry  Section

B
breaker: Exported structures

C
code: Internal types
Condition, no-database-files: Exported conditions

I
idx: Internal types

N
no-database-files: Exported conditions

O
org.shirakumo.alloy.uax-14: The org.shirakumo.alloy.uax-14 package

P
Package, org.shirakumo.alloy.uax-14: The org.shirakumo.alloy.uax-14 package

S
Structure, breaker: Exported structures
System, uax-14: The uax-14 system

T
Type, code: Internal types
Type, idx: Internal types

U
uax-14: The uax-14 system
