The chtml-matcher Reference Manual

This is the chtml-matcher Reference Manual, version 1.0, generated automatically by Declt version 4.0 beta 2 "William Riker" on Mon Feb 26 14:54:32 2024 GMT+0.

1 Introduction


2 Systems

The main system appears first, followed by any subsystem dependency.


2.1 chtml-matcher

A unifying template matcher based on closure-html for web scraping and extraction

Maintainer

Ian Eslick

Author

Ian Eslick

License

MIT style license

Version

1.0

Dependencies
  • closure-html (system).
  • stdutils (system).
  • f-underscore (system).
  • cl-ppcre (system).
Source

chtml-matcher.asd.

Child Components

3 Files

Files are sorted by type and then listed depth-first from the systems' component trees.


3.1 Lisp


3.1.1 chtml-matcher/chtml-matcher.asd

Source

chtml-matcher.asd.

Parent Component

chtml-matcher (system).

ASDF Systems

chtml-matcher.

Packages

chtml-matcher-system.


3.1.2 chtml-matcher/package.lisp

Source

chtml-matcher.asd.

Parent Component

chtml-matcher (system).

Packages

chtml-matcher.


3.1.3 chtml-matcher/bindings.lisp

Source

chtml-matcher.asd.

Parent Component

chtml-matcher (system).

Public Interface
Internals

3.1.4 chtml-matcher/matcher.lisp

Source

chtml-matcher.asd.

Parent Component

chtml-matcher (system).

Public Interface
Internals

4 Packages

Packages are listed by definition order.


4.1 chtml-matcher

Source

package.lisp.

Use List
  • common-lisp.
  • f-underscore.
  • stdutils.
Public Interface
Internals

4.2 chtml-matcher-system

Source

chtml-matcher.asd.

Use List
  • asdf/interface.
  • common-lisp.

5 Definitions

Definitions are sorted by export status, category, package, and then by lexicographic order.


5.1 Public Interface


5.1.1 Macros

Macro: with-bindings (vars bindings &body body)
Package

chtml-matcher.

Source

bindings.lisp.


5.1.2 Ordinary functions

Function: clear-bindings (dict)

Clear all bindings

Package

chtml-matcher.

Source

bindings.lisp.

Function: find-in-lhtml (lhtml tag attributes &optional n)

Convenience function for generating state from an lhtml tree

Package

chtml-matcher.

Source

matcher.lisp.

Function: get-binding (var dict)
Package

chtml-matcher.

Source

bindings.lisp.

Function: get-bindings (dict)

Return an alist of bindings

Package

chtml-matcher.

Source

bindings.lisp.

Function: html->lhtml (html)
Package

chtml-matcher.

Source

matcher.lisp.

Function: lhtml->html (lhtml)
Package

chtml-matcher.

Source

matcher.lisp.

Function: lhtml-constant-node-p (tree)
Package

chtml-matcher.

Source

matcher.lisp.

Function: lhtml-node-attribute-name (attr)
Package

chtml-matcher.

Source

matcher.lisp.

Function: lhtml-node-attribute-value (attr)
Package

chtml-matcher.

Source

matcher.lisp.

Function: lhtml-node-attributes (node)
Package

chtml-matcher.

Source

matcher.lisp.

Function: lhtml-node-body (node)
Package

chtml-matcher.

Source

matcher.lisp.

Function: lhtml-node-name (node)
Package

chtml-matcher.

Source

matcher.lisp.

Function: lhtml-node-string (lhtml-node)
Package

chtml-matcher.

Source

matcher.lisp.

Function: make-bindings (&optional variable value)

Make bindings, optionally with a seed variable and value

Package

chtml-matcher.

Source

bindings.lisp.

Function: match-template (template datum)

Top level matcher

Package

chtml-matcher.

Source

matcher.lisp.

Function: set-binding (var value dict)
Package

chtml-matcher.

Source

bindings.lisp.


5.1.3 Generic functions

Generic Function: set-bindings (bind1 bind2)
Package

chtml-matcher.

Methods
Method: set-bindings ((dict1 binding-dictionary) dict2)
Source

bindings.lisp.

Method: set-bindings ((bindings cons) (dict binding-dictionary))

Set all bindings in bindings list and return the dict. First argument dominates.

Source

bindings.lisp.

Method: set-bindings (bindings (dict binding-dictionary))
Source

bindings.lisp.

Method: set-bindings (bind1 bind2)
Source

bindings.lisp.
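
The "first argument dominates" rule in the set-bindings docstring can be illustrated with plain association lists. The following is a minimal sketch of one reading of that rule, not the library's implementation: sketch-merge-bindings is a hypothetical helper, and it assumes bindings are alists like the one get-bindings is documented to return.

```lisp
;; Hypothetical helper for illustration; not part of chtml-matcher.
(defun sketch-merge-bindings (dominant other)
  "Return an alist holding every binding of DOMINANT, plus those bindings
of OTHER whose variable is not already bound in DOMINANT."
  (append dominant
          ;; Drop any pair from OTHER that DOMINANT already binds.
          (remove-if (lambda (pair) (assoc (car pair) dominant))
                     other)))
```

With this sketch, merging ((:title . "A")) over ((:title . "B") (:url . "u")) yields ((:title . "A") (:url . "u")): the conflicting :title keeps the value from the first argument.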


5.1.4 Standalone methods

Method: print-object ((dict binding-dictionary) stream)
Source

bindings.lisp.

Method: print-object ((state parser-state) stream)
Source

matcher.lisp.


5.1.5 Classes

Class: binding-dictionary
Package

chtml-matcher.

Source

bindings.lisp.

Direct methods
Direct slots
Slot: binds
Initargs

:bindings

Readers

bindings.

Writers

(setf bindings).


5.2 Internals


5.2.1 Special variables

Special Variable: *match-logging*
Package

chtml-matcher.

Source

matcher.lisp.

Special Variable: *match-logging-indent*
Package

chtml-matcher.

Source

matcher.lisp.

Special Variable: *match-logging-stream*
Package

chtml-matcher.

Source

matcher.lisp.


5.2.2 Macros

Macro: assert-state ()

Test the invariant properties of the state

Package

chtml-matcher.

Source

matcher.lisp.

Macro: match-log-message (op state &rest criterion)
Package

chtml-matcher.

Source

matcher.lisp.

Macro: tglambda (args msg &body body)
Package

chtml-matcher.

Source

matcher.lisp.

Macro: with-body-binds ((var state fn) &body body)
Package

chtml-matcher.

Source

matcher.lisp.

Macro: with-local-parse-state ((var state) &body body)

Make it easy to perform a non-destructive parse operation over a subtree based on the current parse state

Package

chtml-matcher.

Source

matcher.lisp.

Macro: with-state ((state) &body body)
Package

chtml-matcher.

Source

matcher.lisp.

Macro: with-template ((tag args body) template &body rest)

Generate local variables for the various template components

Package

chtml-matcher.

Source

matcher.lisp.


5.2.3 Ordinary functions

Function: as-keyword (object)

Convert a string or symbol to a keyword symbol

Package

chtml-matcher.

Source

matcher.lisp.
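
As an illustration of the conversion the as-keyword docstring describes, here is a minimal sketch in portable Common Lisp. The name sketch-as-keyword is hypothetical, and upcasing the name before interning is an assumption made so that "div" and the symbol div map to the same keyword; the library may treat case differently.

```lisp
;; Hypothetical stand-in for illustration; not the library's definition.
(defun sketch-as-keyword (object)
  "Convert a string or symbol OBJECT to a keyword symbol.
Assumption: names are upcased before interning."
  (intern (string-upcase (string object)) :keyword))
```

Under these assumptions, both (sketch-as-keyword "div") and (sketch-as-keyword 'div) return :DIV.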

Function: attributes-equal-p (tattrs nattrs)

Verify that the template attributes (tattrs) are a proper subset of the node attributes (nattrs),
comparing names under equalp on their string forms. Variable attribute values are ignored.

Package

chtml-matcher.

Source

matcher.lisp.
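
The subset check described for attributes-equal-p can be sketched as follows. This is one interpretation of the docstring, not the library's code: the function names are hypothetical, attributes are assumed to be LHTML-style (name value) lists, variables are assumed to be symbols whose name starts with ?, and constant template values are assumed to be compared with equalp.

```lisp
;; Hypothetical helpers for illustration; not part of chtml-matcher.
(defun sketch-template-variable-p (object)
  "True when OBJECT is a symbol whose name starts with a question mark."
  (and (symbolp object)
       (plusp (length (symbol-name object)))
       (char= (char (symbol-name object) 0) #\?)))

(defun sketch-attributes-subset-p (template-attrs node-attrs)
  "True when every attribute in TEMPLATE-ATTRS also occurs in NODE-ATTRS.
Names are compared with EQUALP on their string forms; a template value
that is a variable matches any node value."
  (every (lambda (tattr)
           (let ((nattr (find (string (first tattr)) node-attrs
                              :key (lambda (attr) (string (first attr)))
                              :test #'equalp)))
             (and nattr
                  (or (sketch-template-variable-p (second tattr))
                      (equalp (second tattr) (second nattr))))))
         template-attrs))
```

For example, the template attributes ((:href ?url)) are accepted against ((:href "/a") (:class "x")), while ((:class "price")) is rejected against ((:class "other")).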

Function: bind-attributes (node attr-template bindings)

Given an attribute template and the current node, when bindings exist and a variable occurs in the template attribute value position, add it to the bindings

Package

chtml-matcher.

Source

matcher.lisp.

Function: bind-node (node variable attributes-template bindings)

Bind the node to the variable including attributes if wanted

Package

chtml-matcher.

Source

matcher.lisp.

Function: bind-node-body (node variable bindings)

Bind the body list to the variable in bindings

Package

chtml-matcher.

Source

matcher.lisp.

Function: children-done-p (state)

If body is empty or has one element, return t

Package

chtml-matcher.

Source

matcher.lisp.

Function: clean-var (var)
Package

chtml-matcher.

Source

bindings.lisp.

Function: copy-parser-state (instance)
Package

chtml-matcher.

Source

matcher.lisp.

Function: copy-state (state)

Make a duplicate of the current state

Package

chtml-matcher.

Source

matcher.lisp.

Function: current-node (state)

Current node is always first element of the body list (Invariant)

Package

chtml-matcher.

Source

matcher.lisp.

Function: current-path-tags (state)

List of tags from root to current

Package

chtml-matcher.

Source

matcher.lisp.

Function: find-all-nodes (state tag attributes)

Find all matching instances of a node in the tree

Package

chtml-matcher.

Source

matcher.lisp.
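
To show what "all matching instances of a node in the tree" means for an LHTML tree, the sketch below collects every node with a given tag in document order. It is a simplified, hypothetical illustration: it works directly on a tree rather than on a parser state, matches on the tag only, and assumes LHTML nodes have the shape (tag attributes . body) with strings as text nodes.

```lisp
;; Hypothetical helper for illustration; not part of chtml-matcher.
(defun sketch-find-all (tree tag)
  "Return a list of every node in TREE whose tag is TAG, in document order."
  (when (consp tree)                       ; strings (text nodes) never match
    (append (when (eq (first tree) tag)
              (list tree))
            ;; Recurse into the body, which follows the tag and attributes.
            (loop for child in (cddr tree)
                  append (sketch-find-all child tag)))))
```

Applied to (:ul nil (:li nil "a") (:p nil (:li nil "b"))) with the tag :li, the sketch returns both :li nodes, including the nested one.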

Function: find-and-bind-node (state tag attributes variable bindings &optional n)

Find a node and bind it and its attributes if provided

Package

chtml-matcher.

Source

matcher.lisp.

Function: find-child (state tag attributes)

Walk the child list of the parser until a match is found. If no more children, returns nil.

Package

chtml-matcher.

Source

matcher.lisp.

Function: find-node (state tag attributes &optional n)

Find the nth occurrence of tag and attributes from the current state via next-node

Package

chtml-matcher.

Source

matcher.lisp.

Function: finish-body (state)

When we’re done with the body, return to prior path, popping as necessary

Package

chtml-matcher.

Source

matcher.lisp.

Function: get-attribute (name attributes)

Given a name, equalp match the string forms of the name and the attribute names

Package

chtml-matcher.

Source

matcher.lisp.
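
The name comparison described for get-attribute can be sketched like this. The helper name is hypothetical, attributes are assumed to be LHTML-style (name value) lists, and returning the whole attribute entry is an assumption; the documented function may return something else.

```lisp
;; Hypothetical helper for illustration; not part of chtml-matcher.
(defun sketch-get-attribute (name attributes)
  "Return the first entry of ATTRIBUTES whose name matches NAME.
Both names are converted with STRING and compared with EQUALP, so the
comparison ignores case and accepts strings, symbols, and keywords."
  (find (string name) attributes
        :key (lambda (attr) (string (first attr)))
        :test #'equalp))
```

Because equalp ignores case when comparing strings, (sketch-get-attribute "href" '((:class "x") (:href "/a"))) returns (:HREF "/a").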

Function: instance-variable-p (symbol)
Package

chtml-matcher.

Source

matcher.lisp.

Function: log-state (state)
Package

chtml-matcher.

Source

matcher.lisp.

Function: make-local-state (state)

Make a new state object rooted at the current node

Package

chtml-matcher.

Source

matcher.lisp.

Function: make-parser-state (&key tree path body)
Package

chtml-matcher.

Source

matcher.lisp.

Function: make-state (lhtml)

The initial state consists of a virtual body
of which the current node is the top level node of the tree. We keep track of the root of the tree.

Package

chtml-matcher.

Source

matcher.lisp.

Function: map-child-bindings (fn state body-fns)

Map fn across sequential applications of body-fns for the body list of the provided state. Moves state to end of child list and returns bindings if all match

Package

chtml-matcher.

Source

matcher.lisp.

Function: match-log-end (result)
Package

chtml-matcher.

Source

matcher.lisp.

Function: next-child (state)

Linear walk of the current child list, nil on end of list

Package

chtml-matcher.

Source

matcher.lisp.

Function: next-node (state)

Depth-first tree walker. Given the current state, update the state so that (first body) contains the next node in the tree. Returns the side-effected state.

Package

chtml-matcher.

Source

matcher.lisp.
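
The depth-first walk described above can be sketched with a small state structure that mirrors the tree, path, and body slots documented for parser-state. This is a minimal sketch under stated assumptions, not the library's implementation: all sketch- names are hypothetical, and LHTML nodes are assumed to have the shape (tag attributes . body) with strings as text nodes.

```lisp
;; Hypothetical re-creation for illustration; not part of chtml-matcher.
(defstruct sketch-state tree path body)

(defun sketch-make-state (lhtml)
  "Start with a virtual body whose only element is the root node."
  (make-sketch-state :tree lhtml :path '() :body (list lhtml)))

(defun sketch-current-node (state)
  "Invariant: the current node is the first element of the body list."
  (first (sketch-state-body state)))

(defun sketch-next-node (state)
  "Advance STATE to the next node in depth-first order and return STATE."
  (let* ((node (sketch-current-node state))
         (children (and (consp node) (cddr node))))
    (if children
        ;; Descend: save the remaining siblings on the path stack.
        (progn
          (push (rest (sketch-state-body state)) (sketch-state-path state))
          (setf (sketch-state-body state) children))
        ;; Otherwise move to the next sibling, popping finished levels.
        (progn
          (setf (sketch-state-body state) (rest (sketch-state-body state)))
          (loop while (and (null (sketch-state-body state))
                           (sketch-state-path state))
                do (setf (sketch-state-body state)
                         (pop (sketch-state-path state))))))
    state))

(defun sketch-walk-order (lhtml)
  "Collect tags (or text) in the order SKETCH-NEXT-NODE visits them."
  (loop with state = (sketch-make-state lhtml)
        for node = (sketch-current-node state)
        while node
        collect (if (consp node) (first node) node)
        do (sketch-next-node state)))
```

For the tree (:div nil (:p nil "a") (:span nil)), sketch-walk-order returns (:DIV :P "a" :SPAN); the walk ends when the body list and the path stack are both empty.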

Function: node-match-p (state tag attributes)

Match current node to tag and attributes

Package

chtml-matcher.

Source

matcher.lisp.

Reader: parser-state-body (instance)
Writer: (setf parser-state-body) (instance)
Package

chtml-matcher.

Source

matcher.lisp.

Target Slot

body.

Function: parser-state-p (object)
Package

chtml-matcher.

Source

matcher.lisp.

Reader: parser-state-path (instance)
Writer: (setf parser-state-path) (instance)
Package

chtml-matcher.

Source

matcher.lisp.

Target Slot

path.

Reader: parser-state-tree (instance)
Writer: (setf parser-state-tree) (instance)
Package

chtml-matcher.

Source

matcher.lisp.

Target Slot

tree.

Function: reset-state (state)

Reset state to the initial state

Package

chtml-matcher.

Source

matcher.lisp.

Function: search-tag-p (tag)

Is this tag a search variable?

Package

chtml-matcher.

Source

matcher.lisp.

Function: start-body (state node-body)

Modify state to make the first node of the current node’s body the current node, and record the state of the current body variable in the path variable. When we pop, the next node is at the top, so we push the rest of the current body.

Package

chtml-matcher.

Source

matcher.lisp.

Function: start-current-body (state)

Modify state to make the current node the first

Package

chtml-matcher.

Source

matcher.lisp.

Function: state-done-p (state)
Package

chtml-matcher.

Source

matcher.lisp.

Function: symbol->base (var)

Return a symbol minus the leading character

Package

chtml-matcher.

Source

matcher.lisp.

Function: symbol->base-keyword (var)

Return a symbol minus the leading character

Package

chtml-matcher.

Source

matcher.lisp.

Function: symbol-base (var)

Return the base string by stripping the leading character

Package

chtml-matcher.

Source

matcher.lisp.
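
Stripping the leading character, as the symbol-base, symbol->base, and symbol->base-keyword docstrings describe, can be sketched as below. The helper names are hypothetical, and the example assumes template variables are symbols such as ?title read with the standard (upcasing) reader.

```lisp
;; Hypothetical helpers for illustration; not part of chtml-matcher.
(defun sketch-symbol-base (var)
  "Return the name of the symbol VAR without its first character."
  (subseq (symbol-name var) 1))

(defun sketch-symbol->base-keyword (var)
  "Return a keyword named after VAR minus its first character."
  (intern (sketch-symbol-base var) :keyword))
```

With the standard reader, (sketch-symbol-base '?title) returns "TITLE" and (sketch-symbol->base-keyword '?title) returns :TITLE.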

Function: tag-equal-p (tag1 tag2)

Ensure that two tags are equal

Package

chtml-matcher.

Source

matcher.lisp.

Function: tgen-bind-children (variable body-fns)

Same as tgen-merge-children but records the list of bindings from the body-fns to variable in a fresh bindings set

Package

chtml-matcher.

Source

matcher.lisp.

Function: tgen-find (tag attributes body-fn)

Find a node by tag and attributes and bind via tgen-match. State points to the child node after the bound node

Package

chtml-matcher.

Source

matcher.lisp.

Function: tgen-find-bind (variable tag attributes body-fn)

Like tgen-find, but uses tgen-match-bind

Package

chtml-matcher.

Source

matcher.lisp.

Function: tgen-match (tag attributes body-fn)

Try to match the current node to tag & attributes if body-fn is satisfied and return any bound attributes. Moves parse state to the next child node.

Package

chtml-matcher.

Source

matcher.lisp.

Function: tgen-match-bind (variable tag attributes body-fn)

Match node and add a reference to it to the bindings. Parse state is unchanged. Relies on tgen-match debug info

Package

chtml-matcher.

Source

matcher.lisp.

Function: tgen-match-fn (fn)

Returns: result from calling function. Side effect: next-child.

Package

chtml-matcher.

Source

matcher.lisp.

Function: tgen-match-nth (count body-fn)

Find the nth match for the provided state, assuming body-fn moves the state to the next relevant node to test. Basically, it is a closure that, when called, recursively calls body-fn until the counter hits zero and returns the last value of body-fn.

Package

chtml-matcher.

Source

matcher.lisp.
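
The counting closure that the tgen-match-nth docstring describes can be sketched as follows. This is a simplified illustration under assumptions, not the library's code: sketch-match-nth is a hypothetical name, the sketch loops rather than recursing, it treats count as the number of calls to make, and it stops early with nil as soon as body-fn fails.

```lisp
;; Hypothetical helper for illustration; not part of chtml-matcher.
(defun sketch-match-nth (count body-fn)
  "Return a closure that calls BODY-FN on a state COUNT times and returns
the value of the last call, or NIL as soon as one call returns NIL."
  (lambda (state)
    (let ((result nil))
      (loop repeat count
            do (setf result (funcall body-fn state))
            while result)
      result)))
```

In this sketch the "state" can be anything body-fn knows how to advance. For instance, if state is (list (list :a :b :c)) and body-fn is (lambda (s) (pop (car s))), then a closure built with a count of 2 returns :B and leaves (:C) in the state.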

Function: tgen-match-regex (variable expr)

Returns: binding with variable matched to regex register result or nil

Package

chtml-matcher.

Source

matcher.lisp.

Function: tgen-match-string (string)

Returns: t when it matches

Package

chtml-matcher.

Source

matcher.lisp.

Function: tgen-match-var (variable)

Matches anything and binds it to variable in a fresh binding. Returns: bindings

Package

chtml-matcher.

Source

matcher.lisp.

Function: tgen-merge-children (body-fns)

Assumes the parse tree is looking at the first element of a tag body
and that the body-fns are required sequential matches. Walks children until (current-node subtree) is null or all body-fns have been processed. Merges all the bindings returned from each body-fn. Each body-fn goes to next-child.

Package

chtml-matcher.

Source

matcher.lisp.

Function: trace-tgen ()
Package

chtml-matcher.

Source

matcher.lisp.

Function: variable-p (symbol)

Identify matching variables by leading #?

Package

chtml-matcher.

Source

matcher.lisp.
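
A leading-character test of the kind this docstring describes can be sketched as below. The helper name is hypothetical, and the sketch assumes the marker is the character ?, which is one reading of the "#?" in the docstring.

```lisp
;; Hypothetical helper for illustration; not part of chtml-matcher.
(defun sketch-variable-p (object)
  "True when OBJECT is a symbol whose name starts with a question mark."
  (and (symbolp object)
       (let ((name (symbol-name object)))
         (and (plusp (length name))
              (char= (char name 0) #\?)))))
```

Under that assumption, the sketch accepts the symbol ?title and rejects both the symbol title and the string "?title".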


5.2.4 Generic functions

Generic Reader: bindings (object)
Package

chtml-matcher.

Methods
Reader Method: bindings ((binding-dictionary binding-dictionary))

automatically generated reader method

Source

bindings.lisp.

Target Slot

binds.

Generic Writer: (setf bindings) (object)
Package

chtml-matcher.

Methods
Writer Method: (setf bindings) ((binding-dictionary binding-dictionary))

automatically generated writer method

Source

bindings.lisp.

Target Slot

binds.

Generic Function: gen-template-matcher (head tbody)
Package

chtml-matcher.

Methods
Method: gen-template-matcher ((head symbol) tbody)
Source

matcher.lisp.

Method: gen-template-matcher ((head (eql :merge)) forms)
Source

matcher.lisp.

Method: gen-template-matcher ((head (eql :all)) tbody)
Source

matcher.lisp.

Method: gen-template-matcher ((head (eql :regex)) tbody)
Source

matcher.lisp.

Method: gen-template-matcher ((head (eql :fn)) tbody)
Source

matcher.lisp.

Method: gen-template-matcher ((head (eql :nth)) tbody)
Source

matcher.lisp.

Generic Function: generate-template (template)
Package

chtml-matcher.

Methods
Method: generate-template (template)

Recursively walk the template, generating nested matcher functions

Source

matcher.lisp.


5.2.5 Structures

Structure: parser-state
Package

chtml-matcher.

Source

matcher.lisp.

Direct superclasses

structure-object.

Direct methods

print-object.

Direct slots
Slot: tree
Readers

parser-state-tree.

Writers

(setf parser-state-tree).

Slot: path
Readers

parser-state-path.

Writers

(setf parser-state-path).

Slot: body
Readers

parser-state-body.

Writers

(setf parser-state-body).


Appendix A Indexes


A.1 Concepts


A.2 Functions

Index Entry  Section

(
(setf bindings): Private generic functions
(setf bindings): Private generic functions
(setf parser-state-body): Private ordinary functions
(setf parser-state-path): Private ordinary functions
(setf parser-state-tree): Private ordinary functions

A
as-keyword: Private ordinary functions
assert-state: Private macros
attributes-equal-p: Private ordinary functions

B
bind-attributes: Private ordinary functions
bind-node: Private ordinary functions
bind-node-body: Private ordinary functions
bindings: Private generic functions
bindings: Private generic functions

C
children-done-p: Private ordinary functions
clean-var: Private ordinary functions
clear-bindings: Public ordinary functions
copy-parser-state: Private ordinary functions
copy-state: Private ordinary functions
current-node: Private ordinary functions
current-path-tags: Private ordinary functions

F
find-all-nodes: Private ordinary functions
find-and-bind-node: Private ordinary functions
find-child: Private ordinary functions
find-in-lhtml: Public ordinary functions
find-node: Private ordinary functions
finish-body: Private ordinary functions
Function, (setf parser-state-body): Private ordinary functions
Function, (setf parser-state-path): Private ordinary functions
Function, (setf parser-state-tree): Private ordinary functions
Function, as-keyword: Private ordinary functions
Function, attributes-equal-p: Private ordinary functions
Function, bind-attributes: Private ordinary functions
Function, bind-node: Private ordinary functions
Function, bind-node-body: Private ordinary functions
Function, children-done-p: Private ordinary functions
Function, clean-var: Private ordinary functions
Function, clear-bindings: Public ordinary functions
Function, copy-parser-state: Private ordinary functions
Function, copy-state: Private ordinary functions
Function, current-node: Private ordinary functions
Function, current-path-tags: Private ordinary functions
Function, find-all-nodes: Private ordinary functions
Function, find-and-bind-node: Private ordinary functions
Function, find-child: Private ordinary functions
Function, find-in-lhtml: Public ordinary functions
Function, find-node: Private ordinary functions
Function, finish-body: Private ordinary functions
Function, get-attribute: Private ordinary functions
Function, get-binding: Public ordinary functions
Function, get-bindings: Public ordinary functions
Function, html->lhtml: Public ordinary functions
Function, instance-variable-p: Private ordinary functions
Function, lhtml->html: Public ordinary functions
Function, lhtml-constant-node-p: Public ordinary functions
Function, lhtml-node-attribute-name: Public ordinary functions
Function, lhtml-node-attribute-value: Public ordinary functions
Function, lhtml-node-attributes: Public ordinary functions
Function, lhtml-node-body: Public ordinary functions
Function, lhtml-node-name: Public ordinary functions
Function, lhtml-node-string: Public ordinary functions
Function, log-state: Private ordinary functions
Function, make-bindings: Public ordinary functions
Function, make-local-state: Private ordinary functions
Function, make-parser-state: Private ordinary functions
Function, make-state: Private ordinary functions
Function, map-child-bindings: Private ordinary functions
Function, match-log-end: Private ordinary functions
Function, match-template: Public ordinary functions
Function, next-child: Private ordinary functions
Function, next-node: Private ordinary functions
Function, node-match-p: Private ordinary functions
Function, parser-state-body: Private ordinary functions
Function, parser-state-p: Private ordinary functions
Function, parser-state-path: Private ordinary functions
Function, parser-state-tree: Private ordinary functions
Function, reset-state: Private ordinary functions
Function, search-tag-p: Private ordinary functions
Function, set-binding: Public ordinary functions
Function, start-body: Private ordinary functions
Function, start-current-body: Private ordinary functions
Function, state-done-p: Private ordinary functions
Function, symbol->base: Private ordinary functions
Function, symbol->base-keyword: Private ordinary functions
Function, symbol-base: Private ordinary functions
Function, tag-equal-p: Private ordinary functions
Function, tgen-bind-children: Private ordinary functions
Function, tgen-find: Private ordinary functions
Function, tgen-find-bind: Private ordinary functions
Function, tgen-match: Private ordinary functions
Function, tgen-match-bind: Private ordinary functions
Function, tgen-match-fn: Private ordinary functions
Function, tgen-match-nth: Private ordinary functions
Function, tgen-match-regex: Private ordinary functions
Function, tgen-match-string: Private ordinary functions
Function, tgen-match-var: Private ordinary functions
Function, tgen-merge-children: Private ordinary functions
Function, trace-tgen: Private ordinary functions
Function, variable-p: Private ordinary functions

G
gen-template-matcher: Private generic functions
gen-template-matcher: Private generic functions
gen-template-matcher: Private generic functions
gen-template-matcher: Private generic functions
gen-template-matcher: Private generic functions
gen-template-matcher: Private generic functions
gen-template-matcher: Private generic functions
generate-template: Private generic functions
generate-template: Private generic functions
Generic Function, (setf bindings): Private generic functions
Generic Function, bindings: Private generic functions
Generic Function, gen-template-matcher: Private generic functions
Generic Function, generate-template: Private generic functions
Generic Function, set-bindings: Public generic functions
get-attribute: Private ordinary functions
get-binding: Public ordinary functions
get-bindings: Public ordinary functions

H
html->lhtml: Public ordinary functions

I
instance-variable-p: Private ordinary functions

L
lhtml->html: Public ordinary functions
lhtml-constant-node-p: Public ordinary functions
lhtml-node-attribute-name: Public ordinary functions
lhtml-node-attribute-value: Public ordinary functions
lhtml-node-attributes: Public ordinary functions
lhtml-node-body: Public ordinary functions
lhtml-node-name: Public ordinary functions
lhtml-node-string: Public ordinary functions
log-state: Private ordinary functions

M
Macro, assert-state: Private macros
Macro, match-log-message: Private macros
Macro, tglambda: Private macros
Macro, with-bindings: Public macros
Macro, with-body-binds: Private macros
Macro, with-local-parse-state: Private macros
Macro, with-state: Private macros
Macro, with-template: Private macros
make-bindings: Public ordinary functions
make-local-state: Private ordinary functions
make-parser-state: Private ordinary functions
make-state: Private ordinary functions
map-child-bindings: Private ordinary functions
match-log-end: Private ordinary functions
match-log-message: Private macros
match-template: Public ordinary functions
Method, (setf bindings): Private generic functions
Method, bindings: Private generic functions
Method, gen-template-matcher: Private generic functions
Method, gen-template-matcher: Private generic functions
Method, gen-template-matcher: Private generic functions
Method, gen-template-matcher: Private generic functions
Method, gen-template-matcher: Private generic functions
Method, gen-template-matcher: Private generic functions
Method, generate-template: Private generic functions
Method, print-object: Public standalone methods
Method, print-object: Public standalone methods
Method, set-bindings: Public generic functions
Method, set-bindings: Public generic functions
Method, set-bindings: Public generic functions
Method, set-bindings: Public generic functions

N
next-child: Private ordinary functions
next-node: Private ordinary functions
node-match-p: Private ordinary functions

P
parser-state-body: Private ordinary functions
parser-state-p: Private ordinary functions
parser-state-path: Private ordinary functions
parser-state-tree: Private ordinary functions
print-object: Public standalone methods
print-object: Public standalone methods

R
reset-state: Private ordinary functions

S
search-tag-p: Private ordinary functions
set-binding: Public ordinary functions
set-bindings: Public generic functions
set-bindings: Public generic functions
set-bindings: Public generic functions
set-bindings: Public generic functions
set-bindings: Public generic functions
start-body: Private ordinary functions
start-current-body: Private ordinary functions
state-done-p: Private ordinary functions
symbol->base: Private ordinary functions
symbol->base-keyword: Private ordinary functions
symbol-base: Private ordinary functions

T
tag-equal-p: Private ordinary functions
tgen-bind-children: Private ordinary functions
tgen-find: Private ordinary functions
tgen-find-bind: Private ordinary functions
tgen-match: Private ordinary functions
tgen-match-bind: Private ordinary functions
tgen-match-fn: Private ordinary functions
tgen-match-nth: Private ordinary functions
tgen-match-regex: Private ordinary functions
tgen-match-string: Private ordinary functions
tgen-match-var: Private ordinary functions
tgen-merge-children: Private ordinary functions
tglambda: Private macros
trace-tgen: Private ordinary functions

V
variable-p: Private ordinary functions

W
with-bindings: Public macros
with-body-binds: Private macros
with-local-parse-state: Private macros
with-state: Private macros
with-template: Private macros