This is the chtml-matcher Reference Manual, version 1.0, generated automatically by Declt version 4.0 beta 2 "William Riker" on Sun Dec 15 04:36:07 2024 GMT+0.
The main system appears first, followed by any subsystem dependency.
chtml-matcher
A unifying template matcher based on closure-html for web scraping and extraction
Maintainer and author: Ian Eslick
License: MIT style license
Version: 1.0
Dependencies:
closure-html (system).
stdutils (system).
f-underscore (system).
cl-ppcre (system).
Child components:
package.lisp (file).
bindings.lisp (file).
matcher.lisp (file).
Files are sorted by type and then listed depth-first from the systems components trees.
chtml-matcher/chtml-matcher.asd
chtml-matcher/package.lisp
chtml-matcher/bindings.lisp
chtml-matcher/matcher.lisp
chtml-matcher/chtml-matcher.asd
chtml-matcher (system).
chtml-matcher/bindings.lisp
Parent component: chtml-matcher (system).
Public interface:
binding-dictionary (class).
clear-bindings (function).
get-binding (function).
get-bindings (function).
make-bindings (function).
print-object (method).
set-binding (function).
set-bindings (method).
set-bindings (method).
set-bindings (method).
set-bindings (method).
with-bindings (macro).
Internals:
bindings (reader method).
(setf bindings) (writer method).
clean-var (function).
chtml-matcher/matcher.lisp
Parent component: chtml-matcher (system).
Public interface:
find-in-lhtml (function).
html->lhtml (function).
lhtml->html (function).
lhtml-constant-node-p (function).
lhtml-node-attribute-name (function).
lhtml-node-attribute-value (function).
lhtml-node-attributes (function).
lhtml-node-body (function).
lhtml-node-name (function).
lhtml-node-string (function).
match-template (function).
print-object (method).
Internals:
*match-logging* (special variable).
*match-logging-indent* (special variable).
*match-logging-stream* (special variable).
as-keyword (function).
assert-state (macro).
attributes-equal-p (function).
bind-attributes (function).
bind-node (function).
bind-node-body (function).
children-done-p (function).
copy-parser-state (function).
copy-state (function).
current-node (function).
current-path-tags (function).
find-all-nodes (function).
find-and-bind-node (function).
find-child (function).
find-node (function).
finish-body (function).
gen-template-matcher (method).
gen-template-matcher (method).
gen-template-matcher (method).
gen-template-matcher (method).
gen-template-matcher (method).
gen-template-matcher (method).
generate-template (method).
get-attribute (function).
instance-variable-p (function).
log-state (function).
make-local-state (function).
make-parser-state (function).
make-state (function).
map-child-bindings (function).
match-log-end (function).
match-log-message (macro).
next-child (function).
next-node (function).
node-match-p (function).
parser-state (structure).
parser-state-body (reader).
(setf parser-state-body) (writer).
parser-state-p (function).
parser-state-path (reader).
(setf parser-state-path) (writer).
parser-state-tree (reader).
(setf parser-state-tree) (writer).
reset-state (function).
search-tag-p (function).
start-body (function).
start-current-body (function).
state-done-p (function).
symbol->base (function).
symbol->base-keyword (function).
symbol-base (function).
tag-equal-p (function).
tgen-bind-children (function).
tgen-find (function).
tgen-find-bind (function).
tgen-match (function).
tgen-match-bind (function).
tgen-match-fn (function).
tgen-match-nth (function).
tgen-match-regex (function).
tgen-match-string (function).
tgen-match-var (function).
tgen-merge-children (function).
tglambda (macro).
trace-tgen (function).
variable-p (function).
with-body-binds (macro).
with-local-parse-state (macro).
with-state (macro).
with-template (macro).
Packages are listed by definition order.
chtml-matcher
Use list: common-lisp, f-underscore, stdutils.
Public interface:
binding-dictionary (class).
clear-bindings (function).
find-in-lhtml (function).
get-binding (function).
get-bindings (function).
html->lhtml (function).
lhtml->html (function).
lhtml-constant-node-p (function).
lhtml-node-attribute-name (function).
lhtml-node-attribute-value (function).
lhtml-node-attributes (function).
lhtml-node-body (function).
lhtml-node-name (function).
lhtml-node-string (function).
make-bindings (function).
match-template (function).
set-binding (function).
set-bindings (generic function).
with-bindings (macro).
Internals:
*match-logging* (special variable).
*match-logging-indent* (special variable).
*match-logging-stream* (special variable).
as-keyword (function).
assert-state (macro).
attributes-equal-p (function).
bind-attributes (function).
bind-node (function).
bind-node-body (function).
bindings (generic reader).
(setf bindings) (generic writer).
children-done-p (function).
clean-var (function).
copy-parser-state (function).
copy-state (function).
current-node (function).
current-path-tags (function).
find-all-nodes (function).
find-and-bind-node (function).
find-child (function).
find-node (function).
finish-body (function).
gen-template-matcher (generic function).
generate-template (generic function).
get-attribute (function).
instance-variable-p (function).
log-state (function).
make-local-state (function).
make-parser-state (function).
make-state (function).
map-child-bindings (function).
match-log-end (function).
match-log-message (macro).
next-child (function).
next-node (function).
node-match-p (function).
parser-state (structure).
parser-state-body (reader).
(setf parser-state-body) (writer).
parser-state-p (function).
parser-state-path (reader).
(setf parser-state-path) (writer).
parser-state-tree (reader).
(setf parser-state-tree) (writer).
reset-state (function).
search-tag-p (function).
start-body (function).
start-current-body (function).
state-done-p (function).
symbol->base (function).
symbol->base-keyword (function).
symbol-base (function).
tag-equal-p (function).
tgen-bind-children (function).
tgen-find (function).
tgen-find-bind (function).
tgen-match (function).
tgen-match-bind (function).
tgen-match-fn (function).
tgen-match-nth (function).
tgen-match-regex (function).
tgen-match-string (function).
tgen-match-var (function).
tgen-merge-children (function).
tglambda (macro).
trace-tgen (function).
variable-p (function).
with-body-binds (macro).
with-local-parse-state (macro).
with-state (macro).
with-template (macro).
Definitions are sorted by export status, category, package, and then by lexicographic order.
Clear all bindings
Convenience function for generating state from an lhtml tree
Return an alist of bindings
Make bindings, optionally with a seed variable and value
Top level matcher
set-bindings: ... binding-dictionary) dict2)
set-bindings: ... cons) (dict binding-dictionary))
Set all bindings in bindings list and return the dict. First argument dominates.
set-bindings: ... binding-dictionary))
print-object: ... binding-dictionary) stream)
print-object: ... parser-state) stream)
Test the invariant properties of the state
Make it easy to perform a non-destructive parse operation over a subtree based on the current parse state
Generate local vars for the various template components
Convert a string or symbol to a keyword symbol
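The conversion described above can be sketched in portable Common Lisp. This is only an illustration, not the library's source: the name sketch-as-keyword is hypothetical, and upcasing the name is an assumption made so that "href" and HREF map to the same keyword under the standard readtable.

```lisp
;; Hypothetical helper: a minimal sketch of string-or-symbol to keyword
;; conversion. Upcasing is an assumption, not confirmed by the manual.
(defun sketch-as-keyword (designator)
  "Return the keyword whose name is the upcased string form of DESIGNATOR."
  (intern (string-upcase (string designator)) :keyword))

;; Usage: both calls return the keyword :HREF.
(sketch-as-keyword "href")
(sketch-as-keyword 'href)
```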
Verify that attrs1 is a proper subset of attrs2 under equalp of string form of names. Ignore variable attribute values
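A minimal sketch of such an attribute comparison follows. It assumes attributes use the closure-html LHTML shape, a list of (name value) pairs, and that template variables are symbols whose names start with ?. Both helper names are hypothetical, and the sketch accepts any subset (including an equal set), which may differ from the library's exact behaviour.

```lisp
;; Hypothetical helpers, not the library's own definitions.
(defun sketch-variable-p (x)
  "True when X is a symbol whose name starts with a question mark."
  (and (symbolp x)
       (plusp (length (symbol-name x)))
       (char= (char (symbol-name x) 0) #\?)))

(defun sketch-attributes-subset-p (attrs1 attrs2)
  "True when every (name value) pair of ATTRS1 has a counterpart in ATTRS2.
Names are compared with EQUALP on their string forms; a variable in the
value position of ATTRS1 matches any value."
  (every (lambda (a1)
           (destructuring-bind (name1 value1) a1
             (some (lambda (a2)
                     (destructuring-bind (name2 value2) a2
                       (and (equalp (string name1) (string name2))
                            (or (sketch-variable-p value1)
                                (equalp value1 value2)))))
                   attrs2)))
         attrs1))

;; Usage: the template attribute list comes first, the node's list second.
(sketch-attributes-subset-p '((:class "nav")) '((:id "top") (:class "nav")))
```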
Given an attribute template and the current node, when bindings exist and a variable occurs in the template attribute value position, add it to the bindings
Bind the node to the variable including attributes if wanted
Bind the body list to the variable in bindings
If body is empty or has one element, return t
Make a duplicate of the current state
Current node is always first element of the body list (Invariant)
List of tags from root to current
Find all matching instances of a node in the tree
Find a node and bind it and its attributes if provided
Walk the child list of the parser until a match is found. If no more children, returns nil.
Find the nth occurrence of tag and attributes from current state via next-node
When we’re done with the body, return to prior path, popping as necessary
Given a name, equalp match string forms of name and attribute names
Make a new state object rooted at the current node
The initial state consists of a virtual body of which the current node is the top level node of the tree. We keep track of the root of the tree.
Map fn across sequential applications of body-fns for the body list of the provided state. Moves state to end of child list and returns bindings if all match
Linear walk of the current child list, nil on end of list
Depth-first tree walker. Given the current state, update the state so that (first body) contains the next node in the tree. Returns the side-effected state.
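The body-and-path bookkeeping behind a walker of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: the walk-state structure and the ws-next and walk-names functions are hypothetical names, and nodes are assumed to be LHTML, that is, either strings or lists of the form (name attributes . children).

```lisp
;; Hypothetical state: BODY holds the remaining siblings, with the current
;; node first; PATH is a stack of suspended sibling lists, innermost first.
(defstruct (walk-state (:conc-name ws-))
  body
  path)

(defun ws-current (state)
  "The current node is the first element of the body list."
  (first (ws-body state)))

(defun ws-next (state)
  "Advance STATE to the next node in depth-first order and return STATE.
The body is NIL once the whole tree has been visited."
  (let ((node (ws-current state)))
    (if (and (consp node) (cddr node))
        ;; Descend: suspend the remaining siblings, then enter the children.
        (progn (push (rest (ws-body state)) (ws-path state))
               (setf (ws-body state) (cddr node)))
        ;; Leaf or childless element: step to the next sibling.
        (pop (ws-body state)))
    ;; Climb back up while the current sibling list is exhausted.
    (loop while (and (null (ws-body state)) (ws-path state))
          do (setf (ws-body state) (pop (ws-path state))))
    state))

(defun walk-names (tree)
  "Collect element names and text nodes of TREE in depth-first order."
  (loop with state = (make-walk-state :body (list tree))
        while (ws-body state)
        collect (let ((node (ws-current state)))
                  (if (consp node) (first node) node))
        do (ws-next state)))

;; Usage: visits :DIV, :P, "a", :SPAN, "b" in that order.
(walk-names '(:div nil (:p nil "a") (:span nil "b")))
```

Keeping the suspended sibling lists on an explicit stack lets the walk resume after a subtree without recursion, which matches the manual's description of a state that is updated in place.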
Match current node to tag and attributes
parser-state slots: body, path, tree.
Reset state to the initial state
Is this tag a search variable?
Modify state to make the first node of the current node’s body the current node and record the state of the current body variable to the path variable. When we pop, the next node is at the top, so we push the rest of the current body.
Modify state to make the current node the first
Return a symbol minus the leading character
Return a symbol minus the leading character
Return the base string by stripping the leading character
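Stripping the leading marker character from a template symbol can be sketched as below. The function names are hypothetical, and the marker characters shown in the usage lines (? and <) are assumptions for illustration only.

```lisp
;; Hypothetical helpers: minimal sketches of the "base" conversions.
(defun sketch-symbol-base (symbol)
  "Return the name of SYMBOL as a string, without its first character."
  (subseq (symbol-name symbol) 1))

(defun sketch-symbol->base-keyword (symbol)
  "Return a keyword named like SYMBOL minus its first character."
  (intern (sketch-symbol-base symbol) :keyword))

;; Usage, assuming the standard upcasing readtable:
(sketch-symbol-base '?title)        ; the string "TITLE"
(sketch-symbol->base-keyword '<div) ; the keyword :DIV
```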
Ensure that two tags are equal
Same as tgen-merge-children but records the list of bindings from the body-fns to variable in a fresh bindings set
Find a node by tag and attributes and bind via tgen-match. State points to the child node after the bound node
Like tgen-find, but uses tgen-match-bind
Try to match the current node to tag & attributes if body-fn is satisfied and return any bound attributes. Moves parse state to the next child node.
Match node and add a reference to it to the bindings. Parse state is unchanged. Relies on tgen-match debug info
Returns: result from calling function. Side effect: next-child.
Find the nth match for the provided state, assuming body-fn moves the state to the next relevant node to test. The result is a closure that, when called, recursively calls body-fn until the counter hits zero and returns the last value of body-fn.
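The counting closure described above might look like the following sketch. The names sketch-match-nth and next-even are hypothetical, the state is modelled as a cons whose car is a list of pending items, and giving up as soon as body-fn returns NIL is an assumption about the failure case.

```lisp
;; Hypothetical sketch of an nth-match closure.
(defun sketch-match-nth (n body-fn)
  "Return a closure that calls BODY-FN on a state until it has succeeded
N times, returning the last result, or NIL as soon as BODY-FN fails."
  (lambda (state)
    (labels ((rec (remaining)
               (let ((result (funcall body-fn state)))
                 (cond ((null result) nil)          ; no further match
                       ((<= remaining 1) result)    ; counter exhausted
                       (t (rec (1- remaining)))))))
      (rec n))))

;; Hypothetical body-fn: consume items from (car STATE) until an even
;; number is found; return it, or NIL when the items run out.
(defun next-even (state)
  (loop for item = (pop (car state))
        while item
        when (evenp item) return item))

;; Usage: the second even number in the sequence is 4.
(funcall (sketch-match-nth 2 #'next-even) (list (list 1 2 3 4 5 6)))
```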
Returns: binding with variable matched to regex register result or nil
Returns: t when it matches
Matches anything and binds it to variable in a fresh binding. Returns: bindings
Assumes the parse tree is looking at the first element of a tag body and that the body-fns are required sequential matches. Walks children until (current-node subtree) is null or all body-fns have been processed. Merges all the bindings returned from each body-fn. Each body-fn goes to next-child.
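The sequential merge described above can be sketched over a plain list of children. Everything here is an assumption made for illustration: bindings are modelled as alists, a failed match is signalled with :FAIL, and the names sketch-merge-children, sketch-bind-as and sketch-expect are hypothetical.

```lisp
;; Hypothetical sketch: apply matchers to successive children and merge
;; the alists they return.
(defun sketch-merge-children (children body-fns)
  "Apply each function in BODY-FNS to the next element of CHILDREN.
Stops when either list is exhausted. Returns the appended bindings,
or :FAIL as soon as one matcher returns :FAIL."
  (let ((bindings '()))
    (loop for fn in body-fns
          while children
          do (let ((result (funcall fn (pop children))))
               (when (eq result :fail)
                 (return-from sketch-merge-children :fail))
               (setf bindings (append bindings result))))
    bindings))

(defun sketch-bind-as (name)
  "Matcher that accepts any child and binds it to NAME."
  (lambda (child) (list (cons name child))))

(defun sketch-expect (value)
  "Matcher that accepts only a child EQUAL to VALUE and binds nothing."
  (lambda (child) (if (equal child value) '() :fail)))

;; Usage: binds the first and third children, requiring "b" in between.
(sketch-merge-children '("a" "b" "c")
                       (list (sketch-bind-as :first)
                             (sketch-expect "b")
                             (sketch-bind-as :third)))
```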
Identify matching variables by a leading ? character
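bindings: ... binding-dictionary))
automatically generated reader method
(setf bindings): ... binding-dictionary))
automatically generated writer method
gen-template-matcher: ... symbol) tbody)
gen-template-matcher: ... (eql :merge)) forms)
gen-template-matcher: ... (eql :all)) tbody)
gen-template-matcher: ... (eql :regex)) tbody)
gen-template-matcher: ... (eql :fn)) tbody)
gen-template-matcher: ... (eql :nth)) tbody)
Recursively walk the template, generating nested matcher functions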