This is the alexa Reference Manual, version 2.1.1, generated automatically by Declt version 3.0 "Montgomery Scott" on Tue Dec 22 11:37:59 2020 GMT+0.
• Introduction | What alexa is all about | |
• Systems | The systems documentation | |
• Modules | The modules documentation | |
• Files | The files documentation | |
• Packages | The packages documentation | |
• Definitions | The symbols documentation | |
• Indexes | Concepts, functions, variables and data types |
ALEXA is a tool similar to lex or flex for generating lexical analyzers. Unlike tools like lex, however, ALEXA defines a domain-specific language within your Lisp program, so you don't need to invoke a separate tool.
The alexa package exports a single macro: define-string-lexer. You can view the documentation for this macro using the standard Common Lisp facilities, e.g., describe, or

(format t "~A"
        (documentation 'alexa:define-string-lexer
                       'function))
ALEXA may be used with parsing libraries such as cl-yacc.
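For instance, cl-yacc's parse-with-lexer expects a lexer function of no arguments that returns two values, the terminal and its semantic value, and (values nil nil) at end of input. Here is a minimal sketch of an adapter, assuming tokens are represented as (TYPE . VALUE) conses like those built by the example lexer later in this manual; ALEXA itself provides no such function.

(defun alexa-lexer-for-yacc (lexer)
  "Wrap an ALEXA lexer closure so it follows cl-yacc's lexer protocol."
  (lambda ()
    (let ((tok (funcall lexer)))
      (if (null tok)
          (values nil nil)              ; end of input
          (values (car tok) (cdr tok))))))

;; Hypothetical usage, where *ARITH-PARSER* names a parser defined with
;; YACC:DEFINE-PARSER:
;;   (yacc:parse-with-lexer (alexa-lexer-for-yacc (arith-lexer "1+2"))
;;                          *arith-parser*)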
ALEXA has a few unique features to make lexical analysis easier. These include:

• The variables $< and $> to determine the start and end of your matches. The value of (- $> $<) is equal to the length of the match.
• Named registers: a pattern such as (?<DIGIT>\d) will match a digit, and bind it in your Lisp code to the lexical variable $DIGIT.
• Pattern aliases, so you don't have to repeat a regex such as [A-Za-z][A-Za-z0-9_]* for each identifier. You can instead define the alias (:ident "[A-Za-z][A-Za-z0-9_]*") and use {{IDENT}} in your lexical rules.
• Eager, short-circuiting rules, declared with EAGER.

ALEXA has been written with efficiency in mind. Generated lexers will avoid consing in most cases, unless you use the $@ variable, which expands into a SUBSEQ call on the string.
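As a quick sketch of the named-register, alias, and position features listed above (the lexer name and rules here are invented purely for illustration):

(alexa:define-string-lexer demo-lexer
  "Illustrative lexer exercising an alias, a named register, and $@, $<, $>."
  ((:ident "[A-Za-z][A-Za-z0-9_]*"))
  ("(?<DIGIT>\\d)" (return (list :digit $DIGIT :from $< :to $>)))
  ("{{IDENT}}"     (return (list :ident $@)))
  ("\\s+"          nil))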
The following example shows how to tokenize simple arithmetic expressions. First, we define what a token is and how to make one.
(deftype token ()
  `(cons keyword t))

(defun tok (type &optional val)
  (cons type val))
Advanced applications may opt to store other information in their token data structure, such as token position in the string (which can be extracted with $< and $>).
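For example, a hypothetical richer token representation might record the match position as well, using $< and $> inside the lexical action:

(defstruct rich-token
  type   ; keyword naming the kind of token
  value  ; semantic value, if any
  start  ; starting position of the match, taken from $<
  end)   ; ending position of the match, taken from $>

;; Inside a rule one might then write, e.g.:
;;   ("{{NUM}}" (return (make-rich-token :type :number
;;                                       :value (parse-integer $@)
;;                                       :start $< :end $>)))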
Next, we define the lexer. We create two aliases :num and :name to make our lexical rules a little bit easier to read.
(alexa:define-string-lexer arith-lexer
  "Make a lexical analyzer for arithmetic expressions."
  ((:num "\\d+")
   (:name "[A-Za-z][A-Za-z0-9_]*"))
  ("pi"       (return (tok :number pi)))
  ("{{NAME}}" (return (tok :variable (intern $@))))
  ("{{NUM}}"  (return (tok :number (parse-integer $@))))
  ("[+*/-]"   (return (tok :operator (intern $@ 'keyword))))
  ("\\("      (return (tok :left-paren)))
  ("\\)"      (return (tok :right-paren)))
  ("\\s+"     nil))
To use the lexer, we make one by calling arith-lexer on the string being lexically analyzed. To get the next token, we just funcall our lexer. We can make a small helper function to lex an entire string until the lexer has been exhausted.
(defun lex-line (string)
  (loop :with lexer := (arith-lexer string)
        :for tok := (funcall lexer)
        :while tok
        :collect tok))
Calling lex-line on arithmetic expressions now gives us our expected results.
> (lex-line "2*(x+1)/z")
((:NUMBER . 2)
 (:OPERATOR . :*)
 (:LEFT-PAREN)
 (:VARIABLE . |x|)
 (:OPERATOR . :+)
 (:NUMBER . 1)
 (:RIGHT-PAREN)
 (:OPERATOR . :/)
 (:VARIABLE . |z|))

> (lex-line "1/(1/R_1 + 1/R_2)")
((:NUMBER . 1)
 (:OPERATOR . :/)
 (:LEFT-PAREN)
 (:NUMBER . 1)
 (:OPERATOR . :/)
 (:VARIABLE . R_1)
 (:OPERATOR . :+)
 (:NUMBER . 1)
 (:OPERATOR . :/)
 (:VARIABLE . R_2)
 (:RIGHT-PAREN))
In our lexer, we have pi as the pattern for one of the lexical actions. ALEXA will generate code to identify it correctly. ALEXA follows the rule of matching the longest string possible, breaking ties according to the order in which the rules are stated in the lexer's definition. In the following example, we have both pi and pip. Even though two rules match pi, the first one breaks the tie and we correctly resolve it to 3.141.... Similarly, the lexer matches pip against the correct rule because it is the longest match.
> (lex-line "pi+2*pip")
((:NUMBER . 3.141592653589793d0)
 (:OPERATOR . :+)
 (:NUMBER . 2)
 (:OPERATOR . :*)
 (:VARIABLE . |pip|))
Note that ALEXA has no notion of "specificity". If the rules for pi and {{NAME}} were flipped, then {{NAME}} would always supersede pi.
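For example, in a hypothetical flipped variant (illustrative only), the input pi would lex as a :VARIABLE token rather than a :NUMBER, because the earlier {{NAME}} rule now wins the length tie:

(alexa:define-string-lexer arith-lexer-flipped
  "Illustrative only: {{NAME}} now shadows the \"pi\" rule."
  ((:num "\\d+")
   (:name "[A-Za-z][A-Za-z0-9_]*"))
  ("{{NAME}}" (return (tok :variable (intern $@))))
  ("pi"       (return (tok :number pi)))
  ("{{NUM}}"  (return (tok :number (parse-integer $@))))
  ("\\s+"     nil))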
In our arithmetic expression lexer, we can optimize the lexing process. If we match against any of the latter six rules, then we know the match is unambiguous and we can fire the rule immediately. ALEXA is very conservative and cannot deduce this on its own, but you can tell it that this is the case by declaring said rules as EAGER. In addition, we can re-order the rules favorably. We get the equivalent lexer:
(alexa:define-string-lexer arith-lexer-opt
  "Make a lexical analyzer for arithmetic expressions."
  ((:num "\\d+")
   (:name "[A-Za-z][A-Za-z0-9_]*"))
  ((eager "{{NUM}}")  (return (tok :number (parse-integer $@))))
  ((eager "[+*/-]")   (return (tok :operator (intern $@ 'keyword))))
  ((eager "\\(")      (return (tok :left-paren)))
  ((eager "\\)")      (return (tok :right-paren)))
  ((eager "\\s+")     nil)
  ("pi"               (return (tok :number pi)))
  ((eager "{{NAME}}") (return (tok :variable (intern $@)))))
As a microbenchmark, we generate random strings of tokenizable content.
(defun lex (lexer)
  (loop :for tok := (funcall lexer)
        :while tok
        :collect tok))

(defun random-word ()
  (substitute #\_ #\- (format nil "~R" (random 1000))))

(defun random-expr (n)
  (with-output-to-string (s)
    (loop :repeat n :do
      (alexandria:whichever
       (format s "~D" (random most-positive-fixnum))
       (format s "~C" (alexandria:random-elt "+-*/"))
       (write-string "pi " s)
       (write-string (random-word) s)
       (format s "(")
       (format s ")")
       (loop :repeat (random 8) :do (write-char #\Space s))))))
(defun test (&optional (string (random-expr 1000000)))
  (format t "Length of string: ~D~%" (length string))
  (let ((l  (arith-lexer string))
        (lo (arith-lexer-opt string)))
    (sb-ext:gc :full t)
    (time (lex l))
    (sb-ext:gc :full t)
    (time (lex lo))
    nil))
Calling test on SBCL 1.4.7.192-8b8c9bcc0 gives results like:
Length of string: 7037267
Evaluation took:
  2.276 seconds of real time
  2.276657 seconds of total run time (2.274266 user, 0.002391 system)
  100.04% CPU
  6,357,950,986 processor cycles
  0 bytes consed

Evaluation took:
  1.571 seconds of real time
  1.572094 seconds of total run time (1.570309 user, 0.001785 system)
  100.06% CPU
  4,389,463,410 processor cycles
  0 bytes consed
In this example, the optimized lexer reduces the runtime by about 31% ((2.276 - 1.571) / 2.276 ≈ 0.31).
If you have suggestions or questions, please file an issue on GitHub. If you have any bug fixes or improvements, please make a pull request.

This software is released under the BSD 3-clause license. See LICENSE.txt for details.

Copyright © 2016–2018 Rigetti Computing
The main system appears first, followed by any subsystem dependency.
• The alexa system |
Author: Robert Smith <robert@rigetti.com>
License: BSD 3-clause (See LICENSE.txt)
Description: A lexical analyzer generator.
Version: 2.1.1
Source: alexa.asd (file)
Modules are listed depth-first from the system components tree.
• The alexa/src module |
Dependency: license.txt (file)
Parent: alexa (system)
Location: src/
Files are sorted by type and then listed depth-first from the systems components trees.
• Lisp files | ||
• Static files |
• The alexa.asd file | ||
• The alexa/src/package.lisp file | ||
• The alexa/src/alexa.lisp file |
Location: alexa.asd
Parent: alexa (system)
Parent: src (module)
Location: src/package.lisp
Dependency: package.lisp (file)
Parent: src (module)
Location: src/alexa.lisp
• The alexa/license.txt file |
Parent: alexa (system)
Location: LICENSE.txt
Packages are listed by definition order.
• The alexa-internal package | ||
• The alexa package |
A package to stash away generated symbols.
Source: package.lisp (file)
A lexical analyzer generator.
Source: package.lisp (file)
Use List: common-lisp
Definitions are sorted by export status, category, package, and then by lexicographic order.
• Exported definitions | ||
• Internal definitions |
• Exported macros | ||
• Exported conditions |
Define a lexical analyzer named NAME.
Defining a lexical analyzer is actually defining a function named NAME whose lambda list is
(STRING &KEY (START 0) (END (LENGTH STRING)))
The STRING is the string to be analyzed, and START/END are the starting and ending positions to be looked at. Calling the function named NAME will produce a closure which, if called repeatedly, will produce results according to the lexical rules defined. When the input string is exhausted, NIL is returned, and the string will be unbound within the closure to allow garbage collection.
If STRING is not a SIMPLE-STRING, then it will be coerced into one (which will cons).
The lexer will fire the action which had the longest match, and ties are broken based on the order of the actions (earlier ones are preferred). This rule can be selectively disabled for a particular action if one declares it to be short-circuiting (see below).
Signals LEXER-MATCH-ERROR as a continuable error if no match was found.
The syntax of BODY is:
<doc string>?
(<alias definition>*)
<lexical action>*
An <alias definition> is a list
(<keyword> <regex string>)
The name of the keyword may be used in the <lexical action> regexes. A <lexical action> is a list
(<pattern spec> &body <code>)
A <pattern spec> has the following grammar:
<pattern spec> := <regex string>
                | (EAGER <regex string>)
The EAGER option is defined below.
The <regex string>s are matched against the input string greedily and in the order they are listed in the BODY. When the longest match is found, assuming no EAGER declarations, the corresponding <code> is executed. Within <code>, the following symbols are bound:
$1, $2, ..., $n: String match on (unnamed) register n
$NAME : String match on named register (?<NAME>...)
$@ : Entire string match.
$<, $> : Start and end position of match.
Generally, <code> should explicitly RETURN some token object for a semantic analyzer to examine. Currently, only a single value can be returned. (All other values will be ignored.) An explicit RETURN is needed. If no RETURN is provided, then the lexer will throw away the match and move on as if the lexer were called again. (This is most often used to ignore matches, like whitespace.)
The <regex string> of the lexical action may use the names of the symbols defined in the <alias definition> forms. For example, given the alias definitions
((:int "\\d+")
 (:ident "[a-z]+"))
one can use {{INT}} and {{IDENT}} within the <regex string>s of the <lexical action>.
If the <pattern spec> uses EAGER, then the lexical action will "short circuit". The EAGER option states that if a match occurs on this pattern, <code> should be executed immediately, disregarding the "longest match" rule. This can be used for certain kinds of optimizations.
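As a small sketch tying the pieces above together (the lexer name and rules are illustrative, not part of ALEXA), the following uses the optional doc string, an alias, a named register, the $@, $< and $> variables, an EAGER pattern, and the START/END keywords of the generated function:

(alexa:define-string-lexer sketch-lexer
  "Lexer used only to illustrate the BODY syntax."
  ((:int "\\d+"))
  ("(?<SIGN>[+-]){{INT}}" (return (list :signed-int $@ :sign $SIGN
                                        :from $< :to $>)))
  ("{{INT}}"              (return (list :int $@)))
  ((eager "\\s+")         nil))

;; Lex only the substring "+3 -41" of the input:
(let ((lexer (sketch-lexer "___ +3 -41 ___" :start 4 :end 10)))
  (loop :for tok := (funcall lexer) :while tok :collect tok))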
Source: alexa.lisp (file)
Error to be signaled if the lexer didn’t find a match.
Source: alexa.lisp (file)
Direct superclasses: simple-error (condition)
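Since a failed match signals this condition, a caller can trap it explicitly. A minimal sketch using the lex-line helper and arith-lexer from the introduction (the "@" character has no matching rule there):

(handler-case (lex-line "2 @ 3")
  (alexa:lexer-match-error (c)
    (format *error-output* "Lexing failed: ~A~%" c)
    nil))  ; give up on this input and return NIL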
• Internal macros | ||
• Internal functions | ||
• Internal structures | ||
• Internal types |
alexa.lisp (file)
alexa.lisp (file)
Source: alexa.lisp (file)
Direct superclasses: structure-object (structure)
Direct slots:
  regex
    Type: string
    Readers: pattern-regex (function)
    Writers: (setf pattern-regex) (function)
  short-circuit-p
    Type: boolean
    Readers: pattern-short-circuit-p (function)
    Writers: (setf pattern-short-circuit-p) (function)
  parse-tree
    Readers: pattern-parse-tree (function)
    Writers: (setf pattern-parse-tree) (function)
  num-registers
    Type: integer
    Readers: pattern-num-registers (function)
    Writers: (setf pattern-num-registers) (function)
  register-names
    Type: list
    Readers: pattern-register-names (function)
    Writers: (setf pattern-register-names) (function)
  scanner-name
    Type: symbol
    Readers: pattern-scanner-name (function)
    Writers: (setf pattern-scanner-name) (function)
  fire-name
    Type: symbol
    Readers: pattern-fire-name (function)
    Writers: (setf pattern-fire-name) (function)
  code
    Readers: pattern-code (function)
    Writers: (setf pattern-code) (function)
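From the accessors listed above, the internal structure (apparently named PATTERN, judging by the PATTERN- accessor prefix) plausibly looks something like the sketch below; the actual definition lives in alexa.lisp and may differ.

(defstruct pattern
  (regex           ""  :type string)   ; the rule's regex, as written
  (short-circuit-p nil :type boolean)  ; true if the rule was declared EAGER
  parse-tree                           ; parsed form of the regex
  (num-registers   0   :type integer)  ; how many registers the regex has
  (register-names  nil :type list)     ; names of the named registers
  (scanner-name    nil :type symbol)   ; name of the generated scanner
  (fire-name       nil :type symbol)   ; name of the generated action-firing function
  code)                                ; the user-supplied action code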
alexa.lisp (file)
• Concept index | ||
• Function index | ||
• Variable index | ||
• Data type index |