The uax-14 Reference Manual

This is the uax-14 Reference Manual, version 1.0.0, generated automatically by Declt version 4.0 beta 2 "William Riker" on Thu Sep 15 06:25:27 2022 GMT+0.

1 Introduction

## About UAX-14
This is an implementation of the line breaking algorithm from "Unicode Standard Annex #14"(http://www.unicode.org/reports/tr14/). It provides a fast and convenient way to determine line breaking opportunities in text.

Note that this algorithm does not support break opportunities that require morphological analysis. In order to handle such cases, please consult a system that provides this kind of capability, such as a hyphenation algorithm.

Also note that this system makes no layout decisions. All of them, including which breaks to pick, how to space words, how to handle bidirectional text, and what to do in emergencies when an overfull line has no break opportunity, are left to the user.

The system passes all tests offered by the Unicode standard.

## How To
The system will compile binary database files on first load. Should anything go wrong during this process, a note is produced on load. If you would like to prevent this automated loading, push ``uax-14-no-load`` to ``*features*`` before loading. You can then manually load the database files when convenient through ``load-databases``.
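A minimal sketch of deferring the database load. It assumes ASDF can find the system (for instance through Quicklisp) and that the feature is the keyword ``:uax-14-no-load``, since feature expressions are read in the keyword package. The package ``uax-14-user`` is a hypothetical name used only to set up the ``uax-14`` local nickname, which most current implementations support.

```lisp
;; Push the feature before the system is loaded so the automatic load is skipped.
(push :uax-14-no-load *features*)
(asdf:load-system :uax-14)

;; Hypothetical user package that gives ORG.SHIRAKUMO.ALLOY.UAX-14 the nickname UAX-14.
(defpackage #:uax-14-user
  (:use #:cl)
  (:local-nicknames (#:uax-14 #:org.shirakumo.alloy.uax-14)))
(in-package #:uax-14-user)

;; Later, at a convenient time, load the database files manually.
(uax-14:load-databases)
```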

Once loaded, you can produce a list of line breaks for a string with ``list-breaks``, or break a string at every opportunity with ``break-string``. Typically, however, you will want to scan for the next break as you move along the string during layout. To do so, create a breaker with ``make-breaker`` and call ``next-break`` whenever the next line break opportunity is required.

In pseudo-code, that could look something like this. We assume the local nickname ``uax-14`` for ``org.shirakumo.alloy.uax-14`` here.

::common lisp
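;; INSERT-BREAK, BEYOND-EXTENTS-P and FIND-LAST-FITTING-CLUSTER are placeholders
;; for functions your layout engine provides; STRING is the text being laid out.
(loop with breaker = (uax-14:make-breaker string)
      with start = 0 and last = 0 ; START of the current line, LAST opportunity seen.
      do (multiple-value-bind (pos mandatory) (uax-14:next-break breaker)
           (unless pos (return))   ; No break opportunities left.
           (when (beyond-extents-p start pos)
             ;; Overfull: break at the last opportunity on this line, if any.
             (when (< start last)
               (insert-break last)
               (setf start last))
             ;; Force a break if we are still overfull.
             (loop while (beyond-extents-p start pos)
                   do (let ((next (find-last-fitting-cluster start)))
                        (insert-break next)
                        (setf start next))))
           (when mandatory
             (insert-break pos)
             (setf start pos))
           (setf last pos)))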
::

## External Files
The following files are from their corresponding external sources, last accessed on 2019.09.03:

- ``LineBreak.txt`` https://www.unicode.org/Public/UCD/latest/ucd/LineBreak.txt
- ``LineBreakTest.txt`` https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/LineBreakTest.txt

At the time, Unicode 12.1 was considered the latest version.

## Acknowledgements
The code in this project is largely based on the "linebreak"(https://github.com/foliojs/linebreak) project by Devon Govett et al.


2 Systems

The main system appears first, followed by any subsystem dependency.


2.1 uax-14

Implementation of the line breaking algorithm of Unicode Standard Annex #14

Maintainer

Nicolas Hafner <shinmera@tymoon.eu>

Author

Nicolas Hafner <shinmera@tymoon.eu>

Home Page

https://github.com/Shinmera/uax-14

Source Control

(GIT https://github.com/Shinmera/uax-14.git)

Bug Tracker

https://github.com/Shinmera/uax-14/issues

License

zlib

Version

1.0.0

Dependency

documentation-utils (system).

Source

uax-14.asd.

Child Components

package.lisp (file).
database.lisp (file).
uax-14.lisp (file).
documentation.lisp (file).

3 Files

Files are sorted by type and then listed depth-first from the systems' component trees.


3.1 Lisp


3.1.1 uax-14/uax-14.asd

Source

uax-14.asd.

Parent Component

uax-14 (system).

ASDF Systems

uax-14.


3.1.2 uax-14/package.lisp

Source

uax-14.asd.

Parent Component

uax-14 (system).

Packages

org.shirakumo.alloy.uax-14.


3.1.3 uax-14/database.lisp

Dependency

package.lisp (file).

Source

uax-14.asd.

Parent Component

uax-14 (system).

Public Interface
Internals

3.1.4 uax-14/uax-14.lisp

Dependency

database.lisp (file).

Source

uax-14.asd.

Parent Component

uax-14 (system).

Public Interface
Internals

3.1.5 uax-14/documentation.lisp

Dependency

uax-14.lisp (file).

Source

uax-14.asd.

Parent Component

uax-14 (system).


4 Packages

Packages are listed by definition order.


4.1 org.shirakumo.alloy.uax-14

Source

package.lisp.

Use List

common-lisp.

Public Interface
Internals

5 Definitions

Definitions are sorted by export status, category, package, and then by lexicographic order.


5.1 Public Interface


5.1.1 Special variables

Special Variable: *line-break-database-file*

Variable containing the absolute path of the line break database file.

See LOAD-DATABASES
See COMPILE-DATABASES

Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.

Special Variable: *pair-table-file*

Variable containing the absolute path of the pair table file.

See LOAD-DATABASES
See COMPILE-DATABASES

Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.


5.1.2 Ordinary functions

Function: break-string (string &optional mandatory-only breaker)

Returns a list of the pieces of STRING, split at its line break opportunities.

If MANDATORY-ONLY is true, the string is split only at mandatory line break opportunities; otherwise it is split at every opportunity.

See MAKE-BREAKER
See NEXT-BREAK

Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.
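A short sketch comparing the two modes on a string that contains a newline. It assumes the system is loadable through ASDF; the package UAX-14-USER and the variables *TEXT*, *ALL-PIECES* and *HARD-PIECES* are hypothetical. Because every mandatory opportunity is also an ordinary one, the mandatory-only split should never produce more pieces than the full split, and the full split should concatenate back to the original string.

```lisp
(asdf:load-system :uax-14)
(defpackage #:uax-14-user
  (:use #:cl)
  (:local-nicknames (#:uax-14 #:org.shirakumo.alloy.uax-14)))
(in-package #:uax-14-user)

(defparameter *text* (format nil "First line of text.~%Second line."))

;; Split at every opportunity: for plain ASCII words this is roughly one
;; piece per word, with the trailing space kept in each piece.
(defparameter *all-pieces* (uax-14:break-string *text*))

;; Split only where a break is required, here after the newline.
(defparameter *hard-pieces* (uax-14:break-string *text* t))

(print *all-pieces*)
(print *hard-pieces*)
```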

Function: compile-databases ()

Compiles the database files from their sources.

This will load an optional part of the system and compile the database files to an efficient byte representation. If the compilation is successful, LOAD-DATABASES is called automatically.

See *LINE-BREAK-DATABASE-FILE*
See *PAIR-TABLE-FILE*
See LOAD-DATABASES

Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.
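A sketch of compiling the databases into a different location. It assumes, as the cross-references suggest, that COMPILE-DATABASES writes to and LOAD-DATABASES reads from the paths held in *LINE-BREAK-DATABASE-FILE* and *PAIR-TABLE-FILE*. The cache directory, the file names and the package UAX-14-USER are hypothetical.

```lisp
(asdf:load-system :uax-14)
(defpackage #:uax-14-user
  (:use #:cl)
  (:local-nicknames (#:uax-14 #:org.shirakumo.alloy.uax-14)))
(in-package #:uax-14-user)

;; Hypothetical cache directory under the user's home directory.
(defparameter *cache-dir*
  (merge-pathnames ".cache/uax-14/" (user-homedir-pathname)))
(ensure-directories-exist *cache-dir*)

;; Both variables are documented as holding absolute paths.
(setf uax-14:*line-break-database-file* (merge-pathnames "line-break.db" *cache-dir*)
      uax-14:*pair-table-file* (merge-pathnames "pair-table.db" *cache-dir*))

;; Compile from the sources; on success LOAD-DATABASES is called automatically.
(uax-14:compile-databases)
```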

Function: list-breaks (string &optional breaker)

Returns a list of all line break opportunities in the string.

The list has the following form:

LIST ::= ENTRY+
ENTRY ::= (position mandatory)

This is equivalent to constructing a breaker and collecting the values of NEXT-BREAK in a loop.

See MAKE-BREAKER
See NEXT-BREAK

Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.
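LIST-BREAKS is described as equivalent to draining a breaker with NEXT-BREAK. A sketch of that loop follows; COLLECT-BREAKS and UAX-14-USER are hypothetical names, and the system is assumed to be loadable through ASDF.

```lisp
(asdf:load-system :uax-14)
(defpackage #:uax-14-user
  (:use #:cl)
  (:local-nicknames (#:uax-14 #:org.shirakumo.alloy.uax-14)))
(in-package #:uax-14-user)

(defun collect-breaks (string)
  "Hypothetical restatement of LIST-BREAKS: call NEXT-BREAK on a fresh
breaker and collect (POSITION MANDATORY) entries until it returns NIL."
  (let ((breaker (uax-14:make-breaker string)))
    (loop for (pos mandatory) = (multiple-value-list (uax-14:next-break breaker))
          while pos
          collect (list pos mandatory))))
```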

Function: load-databases ()

Loads the databases from their files into memory.

If one of the files is missing, a warning of type NO-DATABASE-FILES is signalled. If the loading succeeds, T is returned.

See *LINE-BREAK-DATABASE-FILE*
See *PAIR-TABLE-FILE*
See NO-DATABASE-FILES

Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.

Function: make-breaker (string &optional breaker)

Returns a breaker that can find line break opportunities in the given string.

If the optional BREAKER argument is supplied, that breaker is modified and reset to work with the new string instead. This allows you to reuse a breaker.

Note that while you may pass a non-simple string, modifying that string without resetting every breaker that uses it results in undefined behaviour.

See BREAKER

Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.
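A sketch of reusing one breaker across several strings instead of allocating a new one each time. BREAKS-PER-STRING and UAX-14-USER are hypothetical names. The first breaker is created normally, since the behaviour of passing NIL as the BREAKER argument is not documented here.

```lisp
(asdf:load-system :uax-14)
(defpackage #:uax-14-user
  (:use #:cl)
  (:local-nicknames (#:uax-14 #:org.shirakumo.alloy.uax-14)))
(in-package #:uax-14-user)

(defun breaks-per-string (strings)
  "Hypothetical helper: return the break positions of each of STRINGS,
reusing a single breaker for all of them."
  (let ((breaker nil))
    (loop for string in strings
          do (setf breaker (if breaker
                               (uax-14:make-breaker string breaker) ; Reset and reuse.
                               (uax-14:make-breaker string)))
          collect (loop for pos = (uax-14:next-break breaker)
                        while pos
                        collect pos))))
```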

Function: next-break (breaker)

Returns the next line breaking opportunity of the breaker, if any.

Returns two values:

POSITION: The character index in the string at which the break is located, or NIL if no further breaks are possible.
MANDATORY: Whether the break must be made at this location.

Note that there is always at least one break opportunity, namely at the end of the string. After this opportunity has been consumed, NEXT-BREAK returns NIL.

Note that you may have to insert additional line breaks as required by the layout constraints.

See BREAKER

Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.
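Additional breaks required by layout constraints can be inserted greedily. The following sketch assumes a monospaced display where every character occupies one column, which ignores wide and combining characters. WRAP-COLUMNS, VISIBLE-WIDTH and UAX-14-USER are hypothetical names, not part of uax-14.

```lisp
(asdf:load-system :uax-14)
(defpackage #:uax-14-user
  (:use #:cl)
  (:local-nicknames (#:uax-14 #:org.shirakumo.alloy.uax-14)))
(in-package #:uax-14-user)

(defun visible-width (string start end)
  "Columns used by STRING[START,END), one per character, not counting
trailing spaces or newlines."
  (loop while (and (> end start)
                   (member (char string (1- end)) '(#\Space #\Newline)))
        do (decf end))
  (- end start))

(defun wrap-columns (string width)
  "Hypothetical greedy wrapper: split STRING into lines of at most WIDTH
columns, breaking at the last opportunity that fits and hard-splitting
words longer than WIDTH."
  (check-type width (integer 1))
  (let ((breaker (uax-14:make-breaker string))
        (lines ())
        (start 0)  ; Start of the current line.
        (last 0))  ; Last break opportunity seen on the current line.
    (flet ((emit (end)
             (push (string-right-trim '(#\Space #\Newline) (subseq string start end))
                   lines)
             (setf start end)))
      (loop for (pos mandatory) = (multiple-value-list (uax-14:next-break breaker))
            while pos
            do (when (> (visible-width string start pos) width)
                 ;; Overfull: prefer the last opportunity on this line.
                 (when (> last start)
                   (emit last))
                 ;; Still overfull (no opportunity, or a very long word): force splits.
                 (loop while (> (visible-width string start pos) width)
                       do (emit (+ start width))))
               (if mandatory
                   (emit pos)
                   (setf last pos)))
      ;; The final opportunity is the end of the string; flush what is left.
      (when (< start (length string))
        (emit (length string))))
    (nreverse lines)))
```

The final flush makes the result independent of whether the end-of-string opportunity is reported as mandatory.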


5.1.3 Standalone methods

Method: print-object ((breaker breaker) stream)
Source

uax-14.lisp.


5.1.4 Conditions

Condition: no-database-files

Warning signalled when LOAD-DATABASES is called and the files are not present.

Two restarts must be active when this condition is signalled:

COMPILE: Call COMPILE-DATABASES.
ABORT: Abort loading the databases, leaving them in their previous state.

See LOAD-DATABASES

Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.

Direct superclasses

warning.
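A sketch of a handler that reacts to NO-DATABASE-FILES by invoking the COMPILE restart. It assumes the restart is named by the symbol COMPILE from COMMON-LISP, which the uax-14 package uses; ENSURE-DATABASES and UAX-14-USER are hypothetical names. If no such restart is found, the handler declines and the warning proceeds normally.

```lisp
(asdf:load-system :uax-14)
(defpackage #:uax-14-user
  (:use #:cl)
  (:local-nicknames (#:uax-14 #:org.shirakumo.alloy.uax-14)))
(in-package #:uax-14-user)

(defun ensure-databases ()
  "Hypothetical helper: load the databases, compiling them first if the
files are missing."
  (handler-bind ((uax-14:no-database-files
                   (lambda (warning)
                     ;; COMPILE is one of the two restarts the warning guarantees.
                     (let ((restart (find-restart 'compile warning)))
                       (when restart
                         (invoke-restart restart))))))
    (uax-14:load-databases)))

(ensure-databases)
```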


5.1.5 Structures

Structure: breaker

Contains line breaking state.

An instance of this is only useful for passing to MAKE-BREAKER, NEXT-BREAK, LIST-BREAKS and BREAK-STRING. It contains internal state that manages the line breaking algorithm.

See MAKE-BREAKER
See NEXT-BREAK

Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Direct superclasses

structure-object.

Direct methods

print-object.

Direct slots
Slot: string
Package

common-lisp.

Type

string

Readers

breaker-string.

Writers

(setf breaker-string).

Slot: pos
Type

org.shirakumo.alloy.uax-14::idx

Initform

0

Readers

breaker-pos.

Writers

(setf breaker-pos).

Slot: last-pos
Type

org.shirakumo.alloy.uax-14::idx

Initform

0

Readers

breaker-last-pos.

Writers

(setf breaker-last-pos).

Slot: cur-class
Type

(unsigned-byte 8)

Initform

0

Readers

breaker-cur-class.

Writers

(setf breaker-cur-class).

Slot: next-class
Type

(unsigned-byte 8)

Initform

0

Readers

breaker-next-class.

Writers

(setf breaker-next-class).

Slot: lb8a
Type

boolean

Readers

breaker-lb8a.

Writers

(setf breaker-lb8a).

Slot: lb21a
Type

boolean

Readers

breaker-lb21a.

Writers

(setf breaker-lb21a).

Slot: lb30a
Type

org.shirakumo.alloy.uax-14::idx

Initform

0

Readers

breaker-lb30a.

Writers

(setf breaker-lb30a).


5.2 Internals


5.2.1 Special variables

Special Variable: *here*
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.


5.2.2 Macros

Macro: defglobal (name value)
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.


5.2.3 Compiler macros

Compiler Macro: pair-id (pair)
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.

Compiler Macro: type-id (type)
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.


5.2.4 Ordinary functions

Function: %make-breaker (string)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Reader: breaker-cur-class (instance)
Writer: (setf breaker-cur-class) (instance)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Target Slot

cur-class.

Reader: breaker-last-pos (instance)
Writer: (setf breaker-last-pos) (instance)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Target Slot

last-pos.

Reader: breaker-lb21a (instance)
Writer: (setf breaker-lb21a) (instance)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Target Slot

lb21a.

Reader: breaker-lb30a (instance)
Writer: (setf breaker-lb30a) (instance)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Target Slot

lb30a.

Reader: breaker-lb8a (instance)
Writer: (setf breaker-lb8a) (instance)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Target Slot

lb8a.

Reader: breaker-next-class (instance)
Writer: (setf breaker-next-class) (instance)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Target Slot

next-class.

Function: breaker-p (object)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Reader: breaker-pos (instance)
Writer: (setf breaker-pos) (instance)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Target Slot

pos.

Reader: breaker-string (instance)
Writer: (setf breaker-string) (instance)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Target Slot

string.

Function: char-line-break-type (char)
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.

Function: code-point-at (string start)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Function: copy-breaker (instance)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Function: handle-simple-break (next-class cur-class)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Function: line-break-id (id)
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.

Function: load-line-break-database (&optional source)
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.

Function: load-pair-table (&optional source)
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.

Function: normalize-break-id (id)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Function: normalize-first-break (id)
Package

org.shirakumo.alloy.uax-14.

Source

uax-14.lisp.

Function: pair-id (pair)
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.

Function: pair-type (b a)
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.

Function: pair-type-id (b a)
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.

Function: type-id (type)
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.


5.2.5 Types

Type: code ()
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.

Type: idx ()
Package

org.shirakumo.alloy.uax-14.

Source

database.lisp.


Appendix A Indexes


A.1 Concepts


A.2 Functions

Index Entry  Section

%
%make-breaker: Private ordinary functions

(
(setf breaker-cur-class): Private ordinary functions
(setf breaker-last-pos): Private ordinary functions
(setf breaker-lb21a): Private ordinary functions
(setf breaker-lb30a): Private ordinary functions
(setf breaker-lb8a): Private ordinary functions
(setf breaker-next-class): Private ordinary functions
(setf breaker-pos): Private ordinary functions
(setf breaker-string): Private ordinary functions

B
break-string: Public ordinary functions
breaker-cur-class: Private ordinary functions
breaker-last-pos: Private ordinary functions
breaker-lb21a: Private ordinary functions
breaker-lb30a: Private ordinary functions
breaker-lb8a: Private ordinary functions
breaker-next-class: Private ordinary functions
breaker-p: Private ordinary functions
breaker-pos: Private ordinary functions
breaker-string: Private ordinary functions

C
char-line-break-type: Private ordinary functions
code-point-at: Private ordinary functions
compile-databases: Public ordinary functions
Compiler Macro, pair-id: Private compiler macros
Compiler Macro, type-id: Private compiler macros
copy-breaker: Private ordinary functions

D
defglobal: Private macros

F
Function, %make-breaker: Private ordinary functions
Function, (setf breaker-cur-class): Private ordinary functions
Function, (setf breaker-last-pos): Private ordinary functions
Function, (setf breaker-lb21a): Private ordinary functions
Function, (setf breaker-lb30a): Private ordinary functions
Function, (setf breaker-lb8a): Private ordinary functions
Function, (setf breaker-next-class): Private ordinary functions
Function, (setf breaker-pos): Private ordinary functions
Function, (setf breaker-string): Private ordinary functions
Function, break-string: Public ordinary functions
Function, breaker-cur-class: Private ordinary functions
Function, breaker-last-pos: Private ordinary functions
Function, breaker-lb21a: Private ordinary functions
Function, breaker-lb30a: Private ordinary functions
Function, breaker-lb8a: Private ordinary functions
Function, breaker-next-class: Private ordinary functions
Function, breaker-p: Private ordinary functions
Function, breaker-pos: Private ordinary functions
Function, breaker-string: Private ordinary functions
Function, char-line-break-type: Private ordinary functions
Function, code-point-at: Private ordinary functions
Function, compile-databases: Public ordinary functions
Function, copy-breaker: Private ordinary functions
Function, handle-simple-break: Private ordinary functions
Function, line-break-id: Private ordinary functions
Function, list-breaks: Public ordinary functions
Function, load-databases: Public ordinary functions
Function, load-line-break-database: Private ordinary functions
Function, load-pair-table: Private ordinary functions
Function, make-breaker: Public ordinary functions
Function, next-break: Public ordinary functions
Function, normalize-break-id: Private ordinary functions
Function, normalize-first-break: Private ordinary functions
Function, pair-id: Private ordinary functions
Function, pair-type: Private ordinary functions
Function, pair-type-id: Private ordinary functions
Function, type-id: Private ordinary functions

H
handle-simple-break: Private ordinary functions

L
line-break-id: Private ordinary functions
list-breaks: Public ordinary functions
load-databases: Public ordinary functions
load-line-break-database: Private ordinary functions
load-pair-table: Private ordinary functions

M
Macro, defglobal: Private macros
make-breaker: Public ordinary functions
Method, print-object: Public standalone methods

N
next-break: Public ordinary functions
normalize-break-id: Private ordinary functions
normalize-first-break: Private ordinary functions

P
pair-id: Private compiler macros
pair-id: Private ordinary functions
pair-type: Private ordinary functions
pair-type-id: Private ordinary functions
print-object: Public standalone methods

T
type-id: Private compiler macros
type-id: Private ordinary functions
