The alexa Reference Manual

Table of Contents

Next: , Previous: , Up: (dir)   [Contents][Index]

The alexa Reference Manual

This is the alexa Reference Manual, version 2.1.1, generated automatically by Declt version 3.0 "Montgomery Scott" on Fri Jun 26 09:43:55 2020 GMT+0.


Next: , Previous: , Up: Top   [Contents][Index]

1 Introduction

ALEXA: A Lexical Analyzer Generator

Introduction

ALEXA is a tool similar to lex or flex for generating lexical analyzers. Unlike tools like lex, however, ALEXA defines a domain-specific language within your Lisp program, so you don't need to invoke a separate tool.

The alexa package exports a single macro: define-string-lexer. You can view the documentation for this macro using the standard Common Lisp facilities, e.g., describe or

(format t "~A"
        (documentation 'alexa:define-string-lexer
                       'function))

ALEXA may be used with parsing libraries such as cl-yacc.

Special Features

ALEXA has a few unique features to make lexical analysis easier. These include:

ALEXA has been written with efficiency in mind. Generated lexers will avoid consing in most cases, unless you use the $@ variable which expands into a SUBSEQ call on the string.

Example

Arithmetic Expression Tokenization

The following simple example shows how to tokenize simple arithmetic expressions. First, we define what a token is and how to make one.

(deftype token ()
  `(cons keyword t))

(defun tok (type &optional val)
  (cons type val))

Advanced applications may opt to store other information in their token data structure, such as token position in the string (which can be extracted with $< and $>).

Next, we define the lexer. We create two aliases :num and :name to make our lexical rules a little bit easier to read.

(alexa:define-string-lexer arith-lexer
  "Make a lexical analyzer for arithmetic expressions."
  ((:num "\\d+")
   (:name "[A-Za-z][A-Za-z0-9_]*"))
  ("pi"       (return (tok :number pi)))
  ("{{NAME}}" (return (tok :variable (intern $@))))
  ("{{NUM}}"  (return (tok :number (parse-integer $@))))
  ("[+*/-]"   (return (tok :operator (intern $@ 'keyword))))
  ("\\("      (return (tok :left-paren)))
  ("\\)"      (return (tok :right-paren)))
  ("\\s+"     nil))

To use the lexer, we make one by calling arith-lexer on the string being lexically analyzed. To get the next token, we just funcall our lexer. We can make a small helper function to lex an entire string until the lexer has been exhausted.

(defun lex-line (string)
  (loop :with lexer := (arith-lexer string)
        :for tok := (funcall lexer)
        :while tok
          :collect tok))

Calling lex-line on arithmetic expressions now gives us our expected results.

> (lex-line "2*(x+1)/z")
((:NUMBER . 2)
 (:OPERATOR . :*)
 (:LEFT-PAREN)
 (:VARIABLE . |x|)
 (:OPERATOR . :+)
 (:NUMBER . 1)
 (:RIGHT-PAREN)
 (:OPERATOR . :/)
 (:VARIABLE . |z|))

> (lex-line "1/(1/R_1 + 1/R_2)")
((:NUMBER . 1)
 (:OPERATOR . :/)
 (:LEFT-PAREN)
 (:NUMBER . 1)
 (:OPERATOR . :/)
 (:VARIABLE . R_1)
 (:OPERATOR . :+)
 (:NUMBER . 1)
 (:OPERATOR . :/)
 (:VARIABLE . R_2)
 (:RIGHT-PAREN))

In our lexer, we have pi as the rule for one of the lexical actions. ALEXA will generate code to correctly identify it. ALEXA follows the rule of matching the longest string possible, breaking ties depending on the order of the rules stated in the lexer's definition. In the following example, we have both pi and pip. Even though two rules match pi, the first one breaks the tie and we correctly resolve it to 3.141.... Similarly, the lexer matches pip against the correct rule because it was the longest match.

> (lex-line "pi+2*pip")

((:NUMBER . 3.141592653589793d0)
 (:OPERATOR . :+)
 (:NUMBER . 2)
 (:OPERATOR . :*)
 (:VARIABLE . |pip|))

Note that ALEXA has no notion of "specificity". If the rules for pi and {{NAME}} were flipped, then {{NAME}} would always supersede pi.

Eager Matching

In our arithmetic expression lexer, we can optimize the lexing process. If we match against any of the latter six rules, then we know the match is unambiguous and we can fire the rule. ALEXA is very conservative, and has no idea about this. You can tell ALEXA that this is the case by declaring said rules as EAGER. In addition, we can re-order the rules favorably. We get the equivalent lexer:

(alexa:define-string-lexer arith-lexer-opt
  "Make a lexical analyzer for arithmetic expressions."
  ((:num "\\d+")
   (:name "[A-Za-z][A-Za-z0-9_]*"))
  ((eager "{{NUM}}")  (return (tok :number (parse-integer $@))))
  ((eager "[+*/-]")   (return (tok :operator (intern $@ 'keyword))))
  ((eager "\\(")      (return (tok :left-paren)))
  ((eager "\\)")      (return (tok :right-paren)))
  ((eager "\\s+")     nil)
  ("pi"               (return (tok :number pi)))
  ((eager "{{NAME}}") (return (tok :variable (intern $@)))))

As a microbenchmark, we generate random strings of tokenizable content.

(defun lex (lexer)
  (loop :for tok := (funcall lexer)
        :while tok
          :collect tok))

(defun random-word ()
  (substitute #\_ #\- (format nil "~R" (random 1000))))

(defun random-expr (n)
  (with-output-to-string (s)
    (loop :repeat n :do
      (alexandria:whichever
       (format s "~D" (random most-positive-fixnum))
       (format s "~C" (alexandria:random-elt "+-*/"))
       (write-string "pi " s)
       (write-string (random-word) s)
       (format s "(")
       (format s ")")
       (loop :repeat (random 8) :do (write-char #\Space s))))))

(defun test (&optional (string (random-expr 1000000)))
  (format t "Length of string: ~D~%" (length string))
  (let ((l (arith-lexer string))
        (lo (arith-lexer-opt string)))
    (sb-ext:gc :full t)
    (time (lex l))
    (sb-ext:gc :full t)
    (time (lex lo))
    nil))

Calling test on SBCL 1.4.7.192-8b8c9bcc0 gives results like

Length of string: 7037267
Evaluation took:
  2.276 seconds of real time
  2.276657 seconds of total run time (2.274266 user, 0.002391 system)
  100.04% CPU
  6,357,950,986 processor cycles
  0 bytes consed

Evaluation took:
  1.571 seconds of real time
  1.572094 seconds of total run time (1.570309 user, 0.001785 system)
  100.06% CPU
  4,389,463,410 processor cycles
  0 bytes consed

In this example, we reduce the runtime by about 31%.

Contributing

If you have suggestions or questions, please file an issue in GitHub. If you have any bug fixes or improvements, please make a pull request.

License and Copyright

This software is released under the BSD 3-clause license. See LICENSE.txt for details.

Copyright © 2016–2018 Rigetti Comput


Next: , Previous: , Up: Top   [Contents][Index]

2 Systems

The main system appears first, followed by any subsystem dependency.


Previous: , Up: Systems   [Contents][Index]

2.1 alexa

Author

Robert Smith <robert@rigetti.com>

License

BSD 3-clause (See LICENSE.txt)

Description

A lexical analyzer generator.

Version

2.1.1

Dependencies
Source

alexa.asd (file)

Components

Next: , Previous: , Up: Top   [Contents][Index]

3 Modules

Modules are listed depth-first from the system components tree.


Previous: , Up: Modules   [Contents][Index]

3.1 alexa/src

Dependency

license.txt (file)

Parent

alexa (system)

Location

src/

Components

Next: , Previous: , Up: Top   [Contents][Index]

4 Files

Files are sorted by type and then listed depth-first from the systems components trees.


Next: , Previous: , Up: Files   [Contents][Index]

4.1 Lisp


Next: , Previous: , Up: Lisp files   [Contents][Index]

4.1.1 alexa.asd

Location

alexa.asd

Systems

alexa (system)


Next: , Previous: , Up: Lisp files   [Contents][Index]

4.1.2 alexa/src/package.lisp

Parent

src (module)

Location

src/package.lisp

Packages

Previous: , Up: Lisp files   [Contents][Index]

4.1.3 alexa/src/alexa.lisp

Dependency

package.lisp (file)

Parent

src (module)

Location

src/alexa.lisp

Exported Definitions
Internal Definitions

Previous: , Up: Files   [Contents][Index]

4.2 Static


Previous: , Up: Static files   [Contents][Index]

4.2.1 alexa/LICENSE.txt

Parent

alexa (system)

Location

LICENSE.txt


Next: , Previous: , Up: Top   [Contents][Index]

5 Packages

Packages are listed by definition order.


Next: , Previous: , Up: Packages   [Contents][Index]

5.1 alexa-internal

A package to stash away generated symbols.

Source

package.lisp (file)


Previous: , Up: Packages   [Contents][Index]

5.2 alexa

A lexical analyzer generator.

Source

package.lisp (file)

Use List

common-lisp

Exported Definitions
Internal Definitions

Next: , Previous: , Up: Top   [Contents][Index]

6 Definitions

Definitions are sorted by export status, category, package, and then by lexicographic order.


Next: , Previous: , Up: Definitions   [Contents][Index]

6.1 Exported definitions


Next: , Previous: , Up: Exported definitions   [Contents][Index]

6.1.1 Macros

Macro: define-string-lexer NAME &body BODY

Define a lexical analyzer named NAME.

Defining a lexical analyzer is actually defining a function named NAME whose lambda list is

(STRING &KEY (START 0) (END (LENGTH STRING)))

The STRING is the string to be analyzed, and START/END are the starting and ending positions to be looked at. Calling the function named NAME will produce a closure which, if called repeatedly, will produce results according to the lexical rules defined. When the input string is exhausted, NIL is returned, and the string will be unbound within the closure to allow garbage collection.

If STRING is not a SIMPLE-STRING, then it will be coerced into one (which will cons).

The lexer will fire the action which had the longest match, and ties are broken based on the order of the actions (earlier ones are preferred). This rule can be selectively disabled for a particular action if one declares it to be a short circuiting (see below).

Signals LEXER-MATCH-ERROR as a continuable error if no match was found.

The syntax of BODY is:

<doc string>?
(<alias definition>*)
<lexical action>*

An <alias definition> is a list

(<keyword> <regex string>)

The name of the keyword may be used in the <lexical action> regexes. A <lexical action> is a list

(<pattern spec> &body <code>)

A <pattern spec> has the following grammar:

<pattern spec> := <regex string>
| (EAGER <regex string>)

The EAGER option is defined below.

The <regex string> is matched against the input string greedily and in the order they are listed in the BODY. When the longest match is found, assuming no EAGER declarations, it will execute <code>. Within <code>, the following symbols are bound:

$1, $2, ..., $n: String match on (unnamed) register n
$NAME : String match on named register (?<NAME>...)
$@ : Entire string match.
$<, $> : Start and end position of match.

Generally, <code> should explicitly RETURN some token object for a semantic analyzer to examine. Currently, only a single value can be returned. (All other values will be ignored.) An explicit RETURN is needed. If no RETURN is provided, then the lexer will throw away the match and move on as if the lexer were called again. (This is most often used to ignore matches, like whitespace.)

The <regex string> of the lexical action may use the names of the symbols defined in the <alias definition> forms. For example, given the alias definitions

((:int "\\d+")
(:ident "[a-z]+"))

one can use {{INT}} and {{IDENT}} within the <regex string>s of the <lexical action>.

If the <pattern spec> uses EAGER, then the lexical action will "short circuit". The EAGER option states that if a match occurs on this pattern, <code> should be executed immediately, disregarding the "longest match" rule. This can be used for certain kinds of optimizations.

Package

alexa

Source

alexa.lisp (file)


Previous: , Up: Exported definitions   [Contents][Index]

6.1.2 Conditions

Condition: lexer-match-error ()

Error to be signaled if the lexer didn’t find a match.

Package

alexa

Source

alexa.lisp (file)

Direct superclasses

simple-error (condition)


Previous: , Up: Definitions   [Contents][Index]

6.2 Internal definitions


Next: , Previous: , Up: Internal definitions   [Contents][Index]

6.2.1 Macros

Macro: let-lazy BINDINGS &body BODY
Package

alexa

Source

alexa.lisp (file)


Next: , Previous: , Up: Internal definitions   [Contents][Index]

6.2.2 Functions

Function: copy-pattern INSTANCE
Package

alexa

Source

alexa.lisp (file)

Function: empty-match-error REGEX
Package

alexa

Source

alexa.lisp (file)

Function: extract-registers PARSE-TREE
Package

alexa

Source

alexa.lisp (file)

Function: fill-in-aliases ALIASES REGEX
Package

alexa

Source

alexa.lisp (file)

Function: generate-pattern-execution-code PAT STRING-VAR MATCH-START-VAR MATCH-END-VAR REG-STARTS-VAR REG-ENDS-VAR
Package

alexa

Source

alexa.lisp (file)

Function: generate-pattern-match-code PAT EXECUTE-TAG STRING-VAR START-VAR END-VAR MATCH-START-VAR MATCH-END-VAR REG-STARTS-VAR REG-ENDS-VAR MAX-MATCH-LENGTH-VAR MATCH-RULE-INDEX-VAR I
Package

alexa

Source

alexa.lisp (file)

Function: make-pattern &key (REGEX REGEX) (SHORT-CIRCUIT-P SHORT-CIRCUIT-P) (PARSE-TREE PARSE-TREE) (NUM-REGISTERS NUM-REGISTERS) (REGISTER-NAMES REGISTER-NAMES) (SCANNER-NAME SCANNER-NAME) (FIRE-NAME FIRE-NAME) (CODE CODE)
Package

alexa

Source

alexa.lisp (file)

Function: pattern-code INSTANCE
Function: (setf pattern-code) VALUE INSTANCE
Package

alexa

Source

alexa.lisp (file)

Function: pattern-fire-name INSTANCE
Function: (setf pattern-fire-name) VALUE INSTANCE
Package

alexa

Source

alexa.lisp (file)

Function: pattern-num-registers INSTANCE
Function: (setf pattern-num-registers) VALUE INSTANCE
Package

alexa

Source

alexa.lisp (file)

Function: pattern-p OBJECT
Package

alexa

Source

alexa.lisp (file)

Function: pattern-parse-tree INSTANCE
Function: (setf pattern-parse-tree) VALUE INSTANCE
Package

alexa

Source

alexa.lisp (file)

Function: pattern-regex INSTANCE
Function: (setf pattern-regex) VALUE INSTANCE
Package

alexa

Source

alexa.lisp (file)

Function: pattern-register-names INSTANCE
Function: (setf pattern-register-names) VALUE INSTANCE
Package

alexa

Source

alexa.lisp (file)

Function: pattern-register-variables PAT PACKAGE
Package

alexa

Source

alexa.lisp (file)

Function: pattern-scanner-name INSTANCE
Function: (setf pattern-scanner-name) VALUE INSTANCE
Package

alexa

Source

alexa.lisp (file)

Function: pattern-short-circuit-p INSTANCE
Function: (setf pattern-short-circuit-p) VALUE INSTANCE
Package

alexa

Source

alexa.lisp (file)

Function: walk-tree F TREE
Package

alexa

Source

alexa.lisp (file)


Next: , Previous: , Up: Internal definitions   [Contents][Index]

6.2.3 Structures

Structure: pattern ()
Package

alexa

Source

alexa.lisp (file)

Direct superclasses

structure-object (structure)

Direct slots
Slot: regex
Type

string

Readers

pattern-regex (function)

Writers

(setf pattern-regex) (function)

Slot: short-circuit-p
Type

boolean

Readers

pattern-short-circuit-p (function)

Writers

(setf pattern-short-circuit-p) (function)

Slot: parse-tree
Readers

pattern-parse-tree (function)

Writers

(setf pattern-parse-tree) (function)

Slot: num-registers
Type

integer

Readers

pattern-num-registers (function)

Writers

(setf pattern-num-registers) (function)

Slot: register-names
Type

list

Readers

pattern-register-names (function)

Writers

(setf pattern-register-names) (function)

Slot: scanner-name
Type

symbol

Readers

pattern-scanner-name (function)

Writers

(setf pattern-scanner-name) (function)

Slot: fire-name
Type

symbol

Readers

pattern-fire-name (function)

Writers

(setf pattern-fire-name) (function)

Slot: code
Readers

pattern-code (function)

Writers

(setf pattern-code) (function)


Previous: , Up: Internal definitions   [Contents][Index]

6.2.4 Types

Type: non-negative-fixnum ()
Package

alexa

Source

alexa.lisp (file)


Previous: , Up: Top   [Contents][Index]

Appendix A Indexes


Next: , Previous: , Up: Indexes   [Contents][Index]

A.1 Concepts

Jump to:   A   F   L   M   S  
Index Entry  Section

A
alexa.asd: The alexa․asd file
alexa/LICENSE.txt: The alexa/license․txt file
alexa/src: The alexa/src module
alexa/src/alexa.lisp: The alexa/src/alexa․lisp file
alexa/src/package.lisp: The alexa/src/package․lisp file

F
File, Lisp, alexa.asd: The alexa․asd file
File, Lisp, alexa/src/alexa.lisp: The alexa/src/alexa․lisp file
File, Lisp, alexa/src/package.lisp: The alexa/src/package․lisp file
File, static, alexa/LICENSE.txt: The alexa/license․txt file

L
Lisp File, alexa.asd: The alexa․asd file
Lisp File, alexa/src/alexa.lisp: The alexa/src/alexa․lisp file
Lisp File, alexa/src/package.lisp: The alexa/src/package․lisp file

M
Module, alexa/src: The alexa/src module

S
Static File, alexa/LICENSE.txt: The alexa/license․txt file

Jump to:   A   F   L   M   S  

Next: , Previous: , Up: Indexes   [Contents][Index]

A.2 Functions

Jump to:   (  
C   D   E   F   G   L   M   P   W  
Index Entry  Section

(
(setf pattern-code): Internal functions
(setf pattern-fire-name): Internal functions
(setf pattern-num-registers): Internal functions
(setf pattern-parse-tree): Internal functions
(setf pattern-regex): Internal functions
(setf pattern-register-names): Internal functions
(setf pattern-scanner-name): Internal functions
(setf pattern-short-circuit-p): Internal functions

C
copy-pattern: Internal functions

D
define-string-lexer: Exported macros

E
empty-match-error: Internal functions
extract-registers: Internal functions

F
fill-in-aliases: Internal functions
Function, (setf pattern-code): Internal functions
Function, (setf pattern-fire-name): Internal functions
Function, (setf pattern-num-registers): Internal functions
Function, (setf pattern-parse-tree): Internal functions
Function, (setf pattern-regex): Internal functions
Function, (setf pattern-register-names): Internal functions
Function, (setf pattern-scanner-name): Internal functions
Function, (setf pattern-short-circuit-p): Internal functions
Function, copy-pattern: Internal functions
Function, empty-match-error: Internal functions
Function, extract-registers: Internal functions
Function, fill-in-aliases: Internal functions
Function, generate-pattern-execution-code: Internal functions
Function, generate-pattern-match-code: Internal functions
Function, make-pattern: Internal functions
Function, pattern-code: Internal functions
Function, pattern-fire-name: Internal functions
Function, pattern-num-registers: Internal functions
Function, pattern-p: Internal functions
Function, pattern-parse-tree: Internal functions
Function, pattern-regex: Internal functions
Function, pattern-register-names: Internal functions
Function, pattern-register-variables: Internal functions
Function, pattern-scanner-name: Internal functions
Function, pattern-short-circuit-p: Internal functions
Function, walk-tree: Internal functions

G
generate-pattern-execution-code: Internal functions
generate-pattern-match-code: Internal functions

L
let-lazy: Internal macros

M
Macro, define-string-lexer: Exported macros
Macro, let-lazy: Internal macros
make-pattern: Internal functions

P
pattern-code: Internal functions
pattern-fire-name: Internal functions
pattern-num-registers: Internal functions
pattern-p: Internal functions
pattern-parse-tree: Internal functions
pattern-regex: Internal functions
pattern-register-names: Internal functions
pattern-register-variables: Internal functions
pattern-scanner-name: Internal functions
pattern-short-circuit-p: Internal functions

W
walk-tree: Internal functions

Jump to:   (  
C   D   E   F   G   L   M   P   W  

Next: , Previous: , Up: Indexes   [Contents][Index]

A.3 Variables

Jump to:   C   F   N   P   R   S  
Index Entry  Section

C
code: Internal structures

F
fire-name: Internal structures

N
num-registers: Internal structures

P
parse-tree: Internal structures

R
regex: Internal structures
register-names: Internal structures

S
scanner-name: Internal structures
short-circuit-p: Internal structures
Slot, code: Internal structures
Slot, fire-name: Internal structures
Slot, num-registers: Internal structures
Slot, parse-tree: Internal structures
Slot, regex: Internal structures
Slot, register-names: Internal structures
Slot, scanner-name: Internal structures
Slot, short-circuit-p: Internal structures

Jump to:   C   F   N   P   R   S  

Previous: , Up: Indexes   [Contents][Index]

A.4 Data types

Jump to:   A   C   L   N   P   S   T  
Index Entry  Section

A
alexa: The alexa system
alexa: The alexa package
alexa-internal: The alexa-internal package

C
Condition, lexer-match-error: Exported conditions

L
lexer-match-error: Exported conditions

N
non-negative-fixnum: Internal types

P
Package, alexa: The alexa package
Package, alexa-internal: The alexa-internal package
pattern: Internal structures

S
Structure, pattern: Internal structures
System, alexa: The alexa system

T
Type, non-negative-fixnum: Internal types

Jump to:   A   C   L   N   P   S   T