The trivial-utf-8 Reference Manual

This is the trivial-utf-8 Reference Manual, generated automatically by Declt version 4.0 beta 2 "William Riker" on Thu Sep 15 06:24:34 2022 GMT+0.


1 Introduction

# Trivial UTF-8 Manual

###### \[in package TRIVIAL-UTF-8\]
## TRIVIAL-UTF-8 ASDF System

- Description: A small library for doing UTF-8-based input and output.
- Licence: ZLIB
- Author: Marijn Haverbeke 
- Maintainer: Gábor Melis 
- Homepage: [https://common-lisp.net/project/trivial-utf-8/](https://common-lisp.net/project/trivial-utf-8/)
- Bug tracker: [https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8/-/issues](https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8/-/issues)
- Source control: [GIT](https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8.git)

## Introduction

Trivial UTF-8 is a small library for doing UTF-8-based input and
output on a Lisp implementation that already supports Unicode,
meaning that CHAR-CODE and CODE-CHAR deal with Unicode character codes.
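
Whether an implementation meets this prerequisite can be checked roughly at the REPL. The following is a minimal sketch in portable Common Lisp; the helper name `UNICODE-CHAR-CODES-P` is hypothetical and not part of Trivial UTF-8, and the test only shows that a code point outside Latin-1 maps to a character, not that every Unicode property is supported.

```lisp
;; Hypothetical helper, not part of Trivial UTF-8: returns true when
;; CODE-CHAR yields a character for a code point outside Latin-1 and
;; CHAR-CODE maps that character back to the same code.
(defun unicode-char-codes-p (&optional (code #x3BB)) ; U+03BB, Greek small letter lambda
  (and (< code char-code-limit)
       (let ((char (code-char code)))   ; CODE-CHAR may return NIL
         (and char (= (char-code char) code)))))

(unicode-char-codes-p)
```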

The rationale for the existence of this library is that while
Unicode-enabled implementations usually do provide some kind of
interface for dealing with character encodings, these are typically
not terribly flexible or uniform.

The [Babel][babel] library solves a similar problem while
understanding more encodings. Trivial UTF-8 was written before Babel
existed, but for new projects you might be better off going with
Babel. The one advantage of Trivial UTF-8 is that it does not depend
on any other libraries.

[babel]: https://common-lisp.net/project/babel/ 


## Links

Here is the [official repository][trivial-utf-8-repo] and the
[HTML documentation][trivial-utf-8-doc] for the latest version.

[trivial-utf-8-repo]: https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8 

[trivial-utf-8-doc]: http://melisgl.github.io/mgl-pax-world/trivial-utf-8-manual.html 


## Reference

- [function] UTF-8-BYTE-LENGTH STRING

    Calculate the number of bytes needed to encode STRING as UTF-8.

- [function] STRING-TO-UTF-8-BYTES STRING &KEY NULL-TERMINATE

    Convert STRING into an array of unsigned bytes containing its UTF-8
    representation. If NULL-TERMINATE, add an extra 0 byte at the end.

- [function] UTF-8-GROUP-SIZE BYTE

    Determine the number of bytes that are part of the character whose
    encoding starts with BYTE. May signal UTF-8-DECODING-ERROR.

- [function] UTF-8-BYTES-TO-STRING BYTES &KEY (START 0) (END (LENGTH BYTES))

    Convert the subsequence of the array of BYTES bounded by START and
    END, containing UTF-8 encoded characters, to a STRING. The element
    type of BYTES may be anything as long as it can be `COERCE`d into
    an `(UNSIGNED-BYTE 8)` array. May signal UTF-8-DECODING-ERROR.

- [function] READ-UTF-8-STRING INPUT &KEY NULL-TERMINATED STOP-AT-EOF (CHAR-LENGTH -1) (BYTE-LENGTH -1)

    Read UTF-8 encoded data from INPUT, a byte stream, and construct a
    string with the characters found. When NULL-TERMINATED is given,
    stop reading at a null character. If STOP-AT-EOF, then stop at
    END-OF-FILE without signaling an error. The CHAR-LENGTH and
    BYTE-LENGTH parameters can be used to specify the maximum number of
    characters or bytes to read, where -1 means no limit. May signal
    UTF-8-DECODING-ERROR.

- [condition] UTF-8-DECODING-ERROR SIMPLE-ERROR
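
As an illustration of how the functions listed above fit together, here is a short round-trip sketch. It assumes that ASDF is available and can locate the `trivial-utf-8` system (for example after installing it through Quicklisp); the variable names `*TEXT*` and `*BYTES*` are made up for the example.

```lisp
;; Assumption: ASDF is present and knows where trivial-utf-8 lives.
(require "asdf")
(asdf:load-system "trivial-utf-8")

;; Build the word "naive" with U+00EF as its third letter from a
;; character code, so the example does not depend on the encoding of
;; this source file. U+00EF needs two bytes in UTF-8.
(defparameter *text* (format nil "na~Cve" (code-char #xEF)))

(defparameter *bytes* (trivial-utf-8:string-to-utf-8-bytes *text*))

(length *text*)                                   ; 5 characters
(trivial-utf-8:utf-8-byte-length *text*)          ; 6 bytes
(trivial-utf-8:utf-8-group-size (aref *bytes* 2)) ; 2, the lead byte of U+00EF

;; Decoding the whole array gives back an equal string.
(string= *text* (trivial-utf-8:utf-8-bytes-to-string *bytes*))

;; START and END select a byte range: bytes 0 and 1 encode "na".
(trivial-utf-8:utf-8-bytes-to-string *bytes* :start 0 :end 2)

;; NULL-TERMINATE appends a single 0 byte, giving 7 bytes here.
(length (trivial-utf-8:string-to-utf-8-bytes *text* :null-terminate t))
```

Note that START and END count bytes, not characters, so a range should begin and end on character boundaries.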

* * *
###### \[generated by [MGL-PAX](https://github.com/melisgl/mgl-pax)\]


2 Systems

The main system appears first, followed by any subsystem dependency.


2.1 trivial-utf-8

A small library for doing UTF-8-based input and output.

Maintainer

Gábor Melis <mega@retes.hu>

Author

Marijn Haverbeke <marijnh@gmail.com>

Home Page

https://common-lisp.net/project/trivial-utf-8/

Source Control

(GIT https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8.git)

Bug Tracker

https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8/-/issues

License

ZLIB

Source

trivial-utf-8.asd.

Child Component

trivial-utf-8.lisp (file).


3 Files

Files are sorted by type and then listed depth-first from the systems' component trees.


3.1 Lisp


3.1.1 trivial-utf-8/trivial-utf-8.asd

Source

trivial-utf-8.asd.

Parent Component

trivial-utf-8 (system).

ASDF Systems

trivial-utf-8.


3.1.2 trivial-utf-8/trivial-utf-8.lisp

Source

trivial-utf-8.asd.

Parent Component

trivial-utf-8 (system).

Packages

trivial-utf-8.

Public Interface
Internals

4 Packages

Packages are listed by definition order.


4.1 trivial-utf-8

Source

trivial-utf-8.lisp.

Use List

common-lisp.

Public Interface
Internals

5 Definitions

Definitions are sorted by export status, category, package, and then by lexicographic order.


5.1 Public Interface


5.1.1 Ordinary functions

Function: read-utf-8-string (input &key null-terminated stop-at-eof char-length byte-length)

Read UTF-8 encoded data from INPUT, a byte stream, and construct a string with the characters found. When NULL-TERMINATED is given, stop reading at a null character. If STOP-AT-EOF, then stop at END-OF-FILE without signaling an error. The CHAR-LENGTH and BYTE-LENGTH parameters can be used to specify the maximum number of characters or bytes to read, where -1 means no limit. May signal UTF-8-DECODING-ERROR.

Package

trivial-utf-8.

Source

trivial-utf-8.lisp.

Function: string-to-utf-8-bytes (string &key null-terminate)

Convert STRING into an array of unsigned bytes containing its UTF-8 representation. If NULL-TERMINATE, add an extra 0 byte at the end.

Package

trivial-utf-8.

Source

trivial-utf-8.lisp.

Function: utf-8-byte-length (string)

Calculate the number of bytes needed to encode STRING as UTF-8.

Package

trivial-utf-8.

Source

trivial-utf-8.lisp.
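
To make the byte count concrete, the sketch below derives it from the standard UTF-8 code point ranges. It is a self-contained illustration in portable Common Lisp, not the library's implementation; the names `SKETCH-CHAR-BYTE-LENGTH` and `SKETCH-BYTE-LENGTH` are hypothetical.

```lisp
;; Hypothetical sketch, not the library's code: the number of bytes
;; UTF-8 needs for one character follows from the size of its code.
(defun sketch-char-byte-length (char)
  (let ((code (char-code char)))
    (cond ((< code #x80) 1)        ; U+0000..U+007F
          ((< code #x800) 2)       ; U+0080..U+07FF
          ((< code #x10000) 3)     ; U+0800..U+FFFF
          (t 4))))                 ; U+10000..U+10FFFF

;; The length of a string is the sum over its characters.
(defun sketch-byte-length (string)
  (loop for char across string
        sum (sketch-char-byte-length char)))

(sketch-byte-length "abc")   ; 3, since ASCII characters take one byte each
```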

Function: utf-8-bytes-to-string (bytes &key start end)

Convert the subsequence of the array of BYTES bounded by START and END, containing UTF-8 encoded characters, to a string. The element type of BYTES may be anything as long as it can be COERCEd into an (UNSIGNED-BYTE 8) array. May signal UTF-8-DECODING-ERROR.

Package

trivial-utf-8.

Source

trivial-utf-8.lisp.

Function: utf-8-group-size (byte)

Determine the number of bytes that are part of the character whose encoding starts with BYTE. May signal UTF-8-DECODING-ERROR.

Package

trivial-utf-8.

Source

trivial-utf-8.lisp.
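
In UTF-8 the high bits of the first byte announce how many bytes the character occupies. The following sketch classifies a lead byte according to those standard bit patterns. It is an illustration only: the name `SKETCH-GROUP-SIZE` is hypothetical, and it returns NIL for an invalid lead byte where the function documented above may signal UTF-8-DECODING-ERROR.

```lisp
;; Hypothetical sketch of the standard UTF-8 lead-byte patterns.
(defun sketch-group-size (byte)
  (cond ((= (logand byte #b10000000) #b00000000) 1) ; 0xxxxxxx
        ((= (logand byte #b11100000) #b11000000) 2) ; 110xxxxx
        ((= (logand byte #b11110000) #b11100000) 3) ; 1110xxxx
        ((= (logand byte #b11111000) #b11110000) 4) ; 11110xxx
        (t nil)))                                   ; continuation or invalid byte

(sketch-group-size #x41)   ; 1, an ASCII byte
(sketch-group-size #xC3)   ; 2
(sketch-group-size #x80)   ; NIL, a continuation byte cannot start a character
```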

Function: write-utf-8-bytes (string byte-stream &key null-terminate)

Write STRING to BYTE-STREAM, encoding it as UTF-8. If NULL-TERMINATE, write an extra 0 byte at the end.

Package

trivial-utf-8.

Source

trivial-utf-8.lisp.
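
The two stream functions can be exercised together by writing to a file and reading it back. The sketch below assumes ASDF (and therefore UIOP, whose `with-temporary-file` macro is used for a scratch file) and a loadable `trivial-utf-8` system; the names `*GREETING*` and `UTF-8-FILE-ROUND-TRIP` are hypothetical.

```lisp
;; Assumption: ASDF is present and knows where trivial-utf-8 lives.
(require "asdf")
(asdf:load-system "trivial-utf-8")

;; U+00E9 is built from its code so the source encoding does not matter.
(defparameter *greeting* (format nil "caf~C au lait" (code-char #xE9)))

(defun utf-8-file-round-trip (string &rest read-options)
  "Write STRING as UTF-8 to a temporary file, then read it back.
READ-OPTIONS are passed on to READ-UTF-8-STRING."
  (uiop:with-temporary-file (:pathname path)
    ;; Both functions work on byte streams, hence the element type.
    (with-open-file (out path :direction :output
                              :if-exists :supersede
                              :element-type '(unsigned-byte 8))
      (trivial-utf-8:write-utf-8-bytes string out))
    (with-open-file (in path :element-type '(unsigned-byte 8))
      ;; With no terminator or length limit, STOP-AT-EOF lets the
      ;; reader return at end of file instead of signaling an error.
      (apply #'trivial-utf-8:read-utf-8-string in
             :stop-at-eof t read-options))))

(utf-8-file-round-trip *greeting*)                 ; the whole string
(utf-8-file-round-trip *greeting* :char-length 4)  ; only the first 4 characters
```

CHAR-LENGTH counts decoded characters, so the second call stops after the two-byte character U+00E9 even though five bytes have been consumed by then.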


5.1.2 Conditions

Condition: utf-8-decoding-error
Package

trivial-utf-8.

Source

trivial-utf-8.lisp.

Direct superclasses

simple-error.

Direct slots
Slot: message
Initargs

:message

Slot: byte
Package

common-lisp.

Initform

(quote nil)

Initargs

:byte
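
Because the condition is exported, callers can handle malformed input with HANDLER-CASE. A minimal sketch follows, assuming ASDF can load the `trivial-utf-8` system; the helpers `OCTETS` and `DECODE-OR-NIL` are hypothetical names introduced for the example.

```lisp
;; Assumption: ASDF is present and knows where trivial-utf-8 lives.
(require "asdf")
(asdf:load-system "trivial-utf-8")

;; Hypothetical helper: build an (UNSIGNED-BYTE 8) vector from integers.
(defun octets (&rest bytes)
  (make-array (length bytes)
              :element-type '(unsigned-byte 8)
              :initial-contents bytes))

;; Hypothetical wrapper: decode BYTES, or return NIL plus the condition
;; object when the input is not valid UTF-8.
(defun decode-or-nil (bytes)
  (handler-case (trivial-utf-8:utf-8-bytes-to-string bytes)
    (trivial-utf-8:utf-8-decoding-error (condition)
      (values nil condition))))

(decode-or-nil (octets #x68 #x69))  ; "hi"
(decode-or-nil (octets #xFF))       ; NIL, since #xFF cannot start a character
```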


5.2 Internals


5.2.1 Special variables

Special Variable: *optimize*
Package

trivial-utf-8.

Source

trivial-utf-8.lisp.


5.2.2 Macros

Macro: as-utf-8-bytes (char writer)

Given the character CHAR, call the WRITER for every byte in the UTF-8 encoded form of that character. WRITER may be a function or a macro.

Package

trivial-utf-8.

Source

trivial-utf-8.lisp.
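
The idea of handing each encoded byte to a writer can be sketched with an ordinary function. This is not the macro documented above, only a functional analogue based on the standard UTF-8 bit layout; `SKETCH-CALL-WITH-UTF-8-BYTES` and `SKETCH-CHAR-BYTES` are hypothetical names, and WRITER must be a function here.

```lisp
;; Hypothetical functional analogue: call WRITER once per UTF-8 byte
;; of CHAR, in order.
(defun sketch-call-with-utf-8-bytes (char writer)
  (let ((code (char-code char)))
    (flet ((lead (prefix shift)   ; first byte: length prefix + top bits
             (funcall writer (logior prefix (ash code (- shift)))))
           (cont (shift)          ; continuation byte: 10xxxxxx
             (funcall writer
                      (logior #x80 (logand #x3F (ash code (- shift)))))))
      (cond ((< code #x80) (funcall writer code))
            ((< code #x800) (lead #xC0 6) (cont 0))
            ((< code #x10000) (lead #xE0 12) (cont 6) (cont 0))
            (t (lead #xF0 18) (cont 12) (cont 6) (cont 0))))))

;; Collect the bytes of one character into a list.
(defun sketch-char-bytes (char)
  (let ((bytes '()))
    (sketch-call-with-utf-8-bytes char (lambda (byte) (push byte bytes)))
    (nreverse bytes)))

(sketch-char-bytes (code-char #xE9))   ; (195 169), that is #xC3 #xA9
```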


5.2.3 Ordinary functions

Function: get-utf-8-character (bytes group-size &optional start)

Extract the character that is encoded in GROUP-SIZE bytes, starting at position START, from the array BYTES. May signal UTF-8-DECODING-ERROR.

Package

trivial-utf-8.

Source

trivial-utf-8.lisp.
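
Decoding one group amounts to joining the payload bits of the lead byte with six bits from each continuation byte. The sketch below shows that arithmetic in portable Common Lisp. It is not the library's code: `SKETCH-DECODE-CODE-POINT` is a hypothetical name, it returns a code point rather than a character, and it performs none of the validation that could lead to UTF-8-DECODING-ERROR.

```lisp
;; Hypothetical sketch: combine the payload bits of GROUP-SIZE bytes
;; starting at START into one code point. No validation is done.
(defun sketch-decode-code-point (bytes group-size &optional (start 0))
  ;; The lead byte keeps its low 7, 5, 4 or 3 bits.
  (let ((code (logand (aref bytes start)
                      (ecase group-size
                        (1 #x7F) (2 #x1F) (3 #x0F) (4 #x07)))))
    ;; Each continuation byte 10xxxxxx contributes six more bits.
    (loop for i from (1+ start) below (+ start group-size)
          do (setf code (logior (ash code 6)
                                (logand (aref bytes i) #x3F))))
    code))

(sketch-decode-code-point #(#xC3 #xA9) 2)   ; 233, that is U+00E9
```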

Function: utf-8-string-length (bytes &key start end)

Calculate the length of the string encoded by the subsequence of the array of BYTES bounded by START and END.

Package

trivial-utf-8.

Source

trivial-utf-8.lisp.


Appendix A Indexes


A.1 Concepts


A.3 Variables

Index Entry  Section

*
*optimize*: Private special variables

B
byte: Public conditions

M
message: Public conditions

S
Slot, byte: Public conditions
Slot, message: Public conditions
Special Variable, *optimize*: Private special variables
