This is the cl-conllu Reference Manual, version 0.15, generated automatically by Declt version 4.0 beta 2 "William Riker" on Sun Sep 15 03:50:48 2024 GMT+0.
The main system appears first, followed by any subsystem dependency.
cl-conllu
Common Lisp library for dealing with CoNLL-U files
Alexandre Rademaker <alexrad@br.ibm.com>
Apache 2.0
This library provides a set of functions to work with CoNLL-U files. See https://universaldependencies.org/format.html for details about the CoNLL-U format adopted by the Universal Dependencies community. The library has functions for reading and writing files, applying rules for sentence transformation in batch mode, visualizing trees, comparing and evaluating trees, etc. Documentation is available at https://github.com/own-pt/cl-conllu/wiki.
0.15
cl-ppcre
(system).
uuid
(system).
alexandria
(system).
cl-log
(system).
split-sequence
(system).
xmls
(system).
yason
(system).
lispbuilder-lexer
(system).
wilbur
(system).
cl-markup
(system).
optima.ppcre
(system).
packages.lisp
(file).
data.lisp
(file).
read-write.lisp
(file).
evaluate.lisp
(file).
confusion-matrix.lisp
(file).
html.lisp
(file).
query.lisp
(file).
utils.lisp
(file).
projective.lisp
(file).
rdf.lisp
(file).
rdf-wilbur.lisp
(file).
command-line.lisp
(file).
rules.lisp
(file).
editor.lisp
(file).
conllu-prolog.lisp
(file).
tag-converter.lisp
(file).
draw.lisp
(file).
Files are sorted by type and then listed depth-first from the systems' component trees.
cl-conllu/cl-conllu.asd
cl-conllu/packages.lisp
cl-conllu/data.lisp
cl-conllu/read-write.lisp
cl-conllu/evaluate.lisp
cl-conllu/confusion-matrix.lisp
cl-conllu/html.lisp
cl-conllu/query.lisp
cl-conllu/utils.lisp
cl-conllu/projective.lisp
cl-conllu/rdf.lisp
cl-conllu/rdf-wilbur.lisp
cl-conllu/command-line.lisp
cl-conllu/rules.lisp
cl-conllu/editor.lisp
cl-conllu/conllu-prolog.lisp
cl-conllu/tag-converter.lisp
cl-conllu/draw.lisp
cl-conllu/data.lisp
packages.lisp
(file).
cl-conllu
(system).
adjust-sentence
(function).
initialize-instance
(method).
mtoken
(class).
mtoken-end
(reader method).
(setf mtoken-end)
(writer method).
mtoken-start
(reader method).
(setf mtoken-start)
(writer method).
print-object
(method).
print-object
(method).
sentence
(class).
sentence->text
(function).
sentence-binary-tree
(function).
sentence-equal
(function).
sentence-fill-offsets
(function).
sentence-hash-table
(function).
sentence-id
(function).
sentence-meta
(reader method).
(setf sentence-meta)
(writer method).
sentence-meta-value
(function).
sentence-mtokens
(reader method).
(setf sentence-mtokens)
(writer method).
sentence-root
(function).
sentence-size
(function).
sentence-start
(reader method).
(setf sentence-start)
(writer method).
sentence-text
(function).
sentence-tokens
(reader method).
(setf sentence-tokens)
(writer method).
sentence-valid?
(function).
simple-deprel
(function).
token
(class).
token-add-feature
(function).
token-cfrom
(reader method).
(setf token-cfrom)
(writer method).
token-children
(function).
token-cto
(reader method).
(setf token-cto)
(writer method).
token-deprel
(reader method).
(setf token-deprel)
(writer method).
token-deps
(reader method).
(setf token-deps)
(writer method).
token-feats
(reader method).
(setf token-feats)
(writer method).
token-form
(reader method).
(setf token-form)
(writer method).
token-hash-features
(function).
token-head
(reader method).
(setf token-head)
(writer method).
token-id
(reader method).
(setf token-id)
(writer method).
token-lemma
(reader method).
(setf token-lemma)
(writer method).
token-misc
(reader method).
(setf token-misc)
(writer method).
token-parent
(function).
token-rem-feature
(function).
token-sentence
(reader method).
(setf token-sentence)
(writer method).
token-upostag
(reader method).
(setf token-upostag)
(writer method).
token-xpostag
(reader method).
(setf token-xpostag)
(writer method).
*token-fields*
(special variable).
abstract-token
(class).
etoken
(class).
etoken-deps
(reader method).
(setf etoken-deps)
(writer method).
etoken-feats
(reader method).
(setf etoken-feats)
(writer method).
etoken-index
(reader method).
(setf etoken-index)
(writer method).
etoken-lemma
(reader method).
(setf etoken-lemma)
(writer method).
etoken-prev
(reader method).
(setf etoken-prev)
(writer method).
etoken-upostag
(reader method).
(setf etoken-upostag)
(writer method).
etoken-xpostag
(reader method).
(setf etoken-xpostag)
(writer method).
hash-to-features
(function).
is-descendant?
(function).
mtoken-equal
(function).
sentence-by-id
(function).
sentence-etokens
(reader method).
(setf sentence-etokens)
(writer method).
sentence-get-token-by-id
(function).
sentence-matrix
(function).
set-head
(function).
token-attached
(function).
token-equal
(function).
token-lineno
(reader method).
(setf token-lineno)
(writer method).
token-misc-value
(function).
cl-conllu/read-write.lisp
data.lisp
(file).
cl-conllu
(system).
encode
(method).
encode
(method).
lazy-stream-reader
(function).
make-sentence
(function).
read-conllu
(function).
read-directory
(function).
read-file
(function).
read-stream
(function).
write-conllu
(function).
write-conllu-to-stream
(function).
collect-meta
(function).
line->token
(function).
malformed-field
(condition).
malformed-line
(condition).
parse-field
(function).
write-sentence
(function).
write-token
(generic function).
cl-conllu/evaluate.lisp
data.lisp
(file).
read-write.lisp
(file).
cl-conllu
(system).
attachment-score-by-sentence
(function).
attachment-score-by-word
(function).
exact-match
(function).
exact-match-score
(function).
non-projectivity-accuracy
(function).
non-projectivity-precision
(function).
non-projectivity-recall
(function).
precision
(function).
recall
(function).
*deprel-value-list*
(special variable).
*token-fields*
(special variable).
exact-match-sentence
(function).
sentence-average-score
(function).
sentence-diff
(function).
token-diff
(function).
word-average-score
(function).
cl-conllu/confusion-matrix.lisp
data.lisp
(file).
evaluate.lisp
(file).
cl-conllu
(system).
confusion-matrix
(class).
confusion-matrix-cell-count
(function).
confusion-matrix-cell-tokens
(function).
confusion-matrix-cells-labels
(function).
confusion-matrix-columns-labels
(function).
confusion-matrix-corpus-id
(reader method).
(setf confusion-matrix-corpus-id)
(writer method).
confusion-matrix-labels
(function).
confusion-matrix-normalize
(function).
confusion-matrix-rows-labels
(function).
confusion-matrix-update
(function).
initialize-instance
(method).
make-confusion-matrix
(function).
print-object
(method).
confusion-matrix-copy
(function).
confusion-matrix-key-fn
(reader method).
(setf confusion-matrix-key-fn)
(writer method).
confusion-matrix-rows
(reader method).
(setf confusion-matrix-rows)
(writer method).
confusion-matrix-sort-fn
(reader method).
(setf confusion-matrix-sort-fn)
(writer method).
confusion-matrix-test-fn
(reader method).
(setf confusion-matrix-test-fn)
(writer method).
confusion-matrix-update-sentences
(function).
confusion-matrix-update-tokens
(function).
create-cell
(function).
existing-cell-p
(function).
insert-entry-confusion-matrix
(function).
cl-conllu/html.lisp
data.lisp
(file).
confusion-matrix.lisp
(file).
cl-conllu
(system).
*confusion-matrix-style*
(special variable).
format-html
(generic function).
write-columns-headers
(function).
write-rows
(function).
cl-conllu/query.lisp
cl-conllu
(system).
query
(function).
query-as-json
(function).
cl-conllu/utils.lisp
data.lisp
(file).
query.lisp
(file).
cl-conllu
(system).
diff
(function).
levenshtein
(function).
blank-line?
(function).
find-min
(function).
insert-at
(function).
mappend
(function).
print-diff
(function).
range
(function).
cl-conllu/projective.lisp
utils.lisp
(file).
cl-conllu
(system).
get-projection
(function).
is-sentence-projective
(function).
is-token-projective
(function).
non-projective-punct
(function).
cl-conllu/rdf.lisp
data.lisp
(file).
projective.lisp
(file).
cl-conllu
(system).
convert-rdf
(function).
convert-rdf-file
(function).
components->string
(function).
convert-sentence-to-turtle
(function).
escape-string
(function).
escape-turtle-char
(function).
make-dep
(function).
make-featurename
(function).
make-features
(function).
make-features-bnode
(function).
make-literal
(function).
make-metadata
(function).
make-metadata-bnode
(function).
make-token-id
(function).
make-upos
(function).
unspecified-field?
(function).
cl-conllu/rdf-wilbur.lisp
cl-conllu
(system).
convert-to-rdf
(function).
convert-features-to-rdf
(function).
convert-sentence-metadata
(function).
convert-sentence-to-rdf
(function).
convert-token-to-rdf
(function).
cl-conllu/command-line.lisp
data.lisp
(file).
rdf-wilbur.lisp
(file).
cl-conllu
(system).
adjust-conllu
(function).
draw-conllu
(function).
modify-conllu
(function).
select-sentence
(function).
write-selected-sentence
(function).
cl-conllu/rules.lisp
utils.lisp
(file).
data.lisp
(file).
command-line.lisp
(file).
cl-conllu
(system).
apply-rules
(function).
apply-rules-from-files
(function).
add-value
(function).
apply-changes
(function).
apply-conditions-in-token
(function).
apply-rhs
(function).
apply-rule-in-sentence
(function).
apply-rule-in-tokens
(function).
apply-rules-in-sentence
(function).
apply-rules-in-sentence-aux
(function).
equal-op
(function).
intern-pattern
(function).
intern-rule
(function).
intern-sides
(function).
match-token
(function).
match?
(function).
member-op
(function).
modify-value
(function).
read-rules
(function).
regex-op
(function).
rhs
(function).
rhs-vars
(function).
rls
(function).
rls-vars
(function).
valid-vars
(function).
variable-p
(function).
cl-conllu/editor.lisp
utils.lisp
(file).
data.lisp
(file).
rules.lisp
(file).
cl-conllu
(system).
conlluedit
(function).
action
(function).
actions
(function).
add-or-subt
(function).
apply-act
(function).
apply-rule
(function).
apply-rules
(function).
assocs
(function).
clear
(function).
def-match
(function).
definitions
(function).
defs-tests
(function).
error-test
(macro).
examine-acts
(function).
get-field-value
(function).
index
(reader method).
join-matchs
(function).
malformed-rule
(condition).
match-test
(function).
merge-matchs
(function).
merge-sets
(function).
multiple-merges
(function).
node-matchs
(function).
normalize-shortcut
(function).
rel-match
(function).
relation
(function).
relation-test
(function).
relations
(function).
result-act
(function).
result-acts
(function).
result-set
(function).
result-sets
(function).
sets
(function).
test-feats&misc
(function).
token-matchs
(function).
cl-conllu/conllu-prolog.lisp
data.lisp
(file).
editor.lisp
(file).
cl-conllu
(system).
convert-filename
(function).
*clauses*
(special variable).
*dependencies*
(special variable).
clean-dep-rel
(function).
clean-whitespace
(function).
emit-prolog
(function).
is-root
(function).
make-id
(function).
process-features
(function).
process-tokens
(function).
prolog-string
(function).
toprologid
(function).
valid-line
(function).
write-prolog
(function).
cl-conllu/tag-converter.lisp
data.lisp
(file).
conllu-prolog.lisp
(file).
cl-conllu
(system).
read-file-tag-suffix
(function).
read-sentence-tag-suffix
(function).
write-sentence-tag-suffix-to-stream
(function).
write-sentences-tag-suffix
(function).
write-sentences-tag-suffix-to-stream
(function).
write-token-tag-suffix
(function).
cl-conllu/draw.lisp
data.lisp
(file).
tag-converter.lisp
(file).
cl-conllu
(system).
tree-sentence
(function).
tree-sentence-by-id
(function).
adopted-unfinished?
(function).
get-data
(function).
get-kids
(function).
get-stroke-size
(function).
make-tree
(function).
make-twigs
(function).
update-lines
(function).
Packages are listed by definition order.
conllu.draw
conllu.converters.tags
cl-conllu
conllu.editor
conllu.user
conllu.rules
conllu.prolog
conllu.evaluate
conllu.html
conllu.converters.niceline
conllu.rdf
conllu.draw
cl-conllu
.
common-lisp
.
tree-sentence
(function).
tree-sentence-by-id
(function).
adopted-unfinished?
(function).
get-data
(function).
get-kids
(function).
get-stroke-size
(function).
make-tree
(function).
make-twigs
(function).
update-lines
(function).
conllu.converters.tags
cl-conllu
.
cl-ppcre
.
common-lisp
.
read-file-tag-suffix
(function).
read-sentence-tag-suffix
(function).
write-sentence-tag-suffix-to-stream
(function).
write-sentences-tag-suffix
(function).
write-sentences-tag-suffix-to-stream
(function).
write-token-tag-suffix
(function).
cl-conllu
cl-ppcre
.
com.ravenbrook.common-lisp-log
.
common-lisp
.
split-sequence
.
adjust-sentence
(function).
convert-rdf
(function).
convert-rdf-file
(function).
convert-to-rdf
(function).
diff
(function).
lazy-stream-reader
(function).
levenshtein
(function).
make-sentence
(function).
mtoken
(class).
mtoken-end
(generic reader).
(setf mtoken-end)
(generic writer).
mtoken-start
(generic reader).
(setf mtoken-start)
(generic writer).
query
(function).
query-as-json
(function).
read-conllu
(function).
read-directory
(function).
read-file
(function).
read-stream
(function).
sentence
(class).
sentence->text
(function).
sentence-binary-tree
(function).
sentence-equal
(function).
sentence-fill-offsets
(function).
sentence-hash-table
(function).
sentence-id
(function).
sentence-meta
(generic reader).
(setf sentence-meta)
(generic writer).
sentence-meta-value
(function).
sentence-mtokens
(generic reader).
(setf sentence-mtokens)
(generic writer).
sentence-root
(function).
sentence-size
(function).
sentence-start
(generic reader).
(setf sentence-start)
(generic writer).
sentence-text
(function).
sentence-tokens
(generic reader).
(setf sentence-tokens)
(generic writer).
sentence-valid?
(function).
simple-deprel
(function).
token
(class).
token-add-feature
(function).
token-cfrom
(generic reader).
(setf token-cfrom)
(generic writer).
token-children
(function).
token-cto
(generic reader).
(setf token-cto)
(generic writer).
token-deprel
(generic reader).
(setf token-deprel)
(generic writer).
token-deps
(generic reader).
(setf token-deps)
(generic writer).
token-feats
(generic reader).
(setf token-feats)
(generic writer).
token-form
(generic reader).
(setf token-form)
(generic writer).
token-hash-features
(function).
token-head
(generic reader).
(setf token-head)
(generic writer).
token-id
(generic reader).
(setf token-id)
(generic writer).
token-lemma
(generic reader).
(setf token-lemma)
(generic writer).
token-misc
(generic reader).
(setf token-misc)
(generic writer).
token-parent
(function).
token-rem-feature
(function).
token-sentence
(generic reader).
(setf token-sentence)
(generic writer).
token-upostag
(generic reader).
(setf token-upostag)
(generic writer).
token-xpostag
(generic reader).
(setf token-xpostag)
(generic writer).
write-conllu
(function).
write-conllu-to-stream
(function).
*deprels*
(special variable).
*token-fields*
(special variable).
abstract-token
(class).
adjust-conllu
(function).
and%
(function).
blank-line?
(function).
children
(function).
collect-meta
(function).
draw-conllu
(function).
etoken
(class).
etoken-deps
(generic reader).
(setf etoken-deps)
(generic writer).
etoken-feats
(generic reader).
(setf etoken-feats)
(generic writer).
etoken-index
(generic reader).
(setf etoken-index)
(generic writer).
etoken-lemma
(generic reader).
(setf etoken-lemma)
(generic writer).
etoken-prev
(generic reader).
(setf etoken-prev)
(generic writer).
etoken-upostag
(generic reader).
(setf etoken-upostag)
(generic writer).
etoken-xpostag
(generic reader).
(setf etoken-xpostag)
(generic writer).
eval-query
(function).
find-min
(function).
get-projection
(function).
hash-to-features
(function).
insert-at
(function).
is-descendant?
(function).
is-sentence-projective
(function).
is-token-projective
(function).
line->token
(function).
malformed-field
(condition).
malformed-line
(condition).
mappend
(function).
modify-conllu
(function).
mtoken-equal
(function).
non-projective-punct
(function).
or%
(function).
parse-field
(function).
print-diff
(function).
range
(function).
r~
(function).
select-sentence
(function).
sentence-by-id
(function).
sentence-etokens
(generic reader).
(setf sentence-etokens)
(generic writer).
sentence-get-token-by-id
(function).
sentence-matrix
(function).
set-head
(function).
token-attached
(function).
token-equal
(function).
token-lineno
(generic reader).
(setf token-lineno)
(generic writer).
token-misc-value
(function).
t~
(function).
write-selected-sentence
(function).
write-sentence
(function).
write-token
(generic function).
conllu.editor
Rule-based transformations
cl-conllu
.
common-lisp
.
conlluedit
(function).
action
(function).
actions
(function).
add-or-subt
(function).
apply-act
(function).
apply-rule
(function).
apply-rules
(function).
assocs
(function).
clear
(function).
def-match
(function).
definitions
(function).
defs-tests
(function).
error-test
(macro).
examine-acts
(function).
get-field-value
(function).
index
(generic reader).
join-matchs
(function).
malformed-rule
(condition).
match-test
(function).
merge-matchs
(function).
merge-sets
(function).
multiple-merges
(function).
node-matchs
(function).
normalize-shortcut
(function).
rel-match
(function).
relation
(function).
relation-test
(function).
relations
(function).
result-act
(function).
result-acts
(function).
result-set
(function).
result-sets
(function).
sets
(function).
test-feats&misc
(function).
token-matchs
(function).
conllu.rules
cl-conllu
.
common-lisp
.
apply-rules
(function).
apply-rules-from-files
(function).
add-value
(function).
apply-changes
(function).
apply-conditions-in-token
(function).
apply-rhs
(function).
apply-rule-in-sentence
(function).
apply-rule-in-tokens
(function).
apply-rules-in-sentence
(function).
apply-rules-in-sentence-aux
(function).
equal-op
(function).
intern-pattern
(function).
intern-rule
(function).
intern-sides
(function).
match-token
(function).
match?
(function).
member-op
(function).
modify-value
(function).
read-rules
(function).
regex-op
(function).
rhs
(function).
rhs-vars
(function).
rls
(function).
rls-vars
(function).
valid-vars
(function).
variable-p
(function).
conllu.prolog
alexandria
.
cl-conllu
.
common-lisp
.
split-sequence
.
convert-filename
(function).
*clauses*
(special variable).
*dependencies*
(special variable).
clean-dep-rel
(function).
clean-whitespace
(function).
emit-prolog
(function).
is-root
(function).
make-id
(function).
process-features
(function).
process-tokens
(function).
prolog-string
(function).
toprologid
(function).
valid-line
(function).
write-prolog
(function).
conllu.evaluate
Functions for evaluating datasets and parser outputs in the CoNLL-U format.
cl-conllu
.
common-lisp
.
attachment-score-by-sentence
(function).
attachment-score-by-word
(function).
confusion-matrix
(class).
confusion-matrix-cell-count
(function).
confusion-matrix-cell-tokens
(function).
confusion-matrix-cells-labels
(function).
confusion-matrix-columns-labels
(function).
confusion-matrix-corpus-id
(generic reader).
(setf confusion-matrix-corpus-id)
(generic writer).
confusion-matrix-labels
(function).
confusion-matrix-normalize
(function).
confusion-matrix-rows-labels
(function).
confusion-matrix-update
(function).
exact-match
(function).
exact-match-score
(function).
make-confusion-matrix
(function).
non-projectivity-accuracy
(function).
non-projectivity-precision
(function).
non-projectivity-recall
(function).
precision
(function).
recall
(function).
*deprel-value-list*
(special variable).
*token-fields*
(special variable).
confusion-matrix-copy
(function).
confusion-matrix-key-fn
(generic reader).
(setf confusion-matrix-key-fn)
(generic writer).
confusion-matrix-rows
(generic reader).
(setf confusion-matrix-rows)
(generic writer).
confusion-matrix-sort-fn
(generic reader).
(setf confusion-matrix-sort-fn)
(generic writer).
confusion-matrix-test-fn
(generic reader).
(setf confusion-matrix-test-fn)
(generic writer).
confusion-matrix-update-sentences
(function).
confusion-matrix-update-tokens
(function).
create-cell
(function).
exact-match-sentence
(function).
existing-cell-p
(function).
insert-entry-confusion-matrix
(function).
sentence-average-score
(function).
sentence-diff
(function).
token-diff
(function).
word-average-score
(function).
conllu.html
Functions for producing HTML formatting of objects in the library.
cl-conllu
.
common-lisp
.
*confusion-matrix-style*
(special variable).
format-html
(generic function).
write-columns-headers
(function).
write-rows
(function).
conllu.rdf
alexandria
.
cl-conllu
.
common-lisp
.
split-sequence
.
wilbur
.
components->string
(function).
convert-features-to-rdf
(function).
convert-sentence-metadata
(function).
convert-sentence-to-rdf
(function).
convert-sentence-to-turtle
(function).
convert-token-to-rdf
(function).
escape-string
(function).
escape-turtle-char
(function).
make-dep
(function).
make-featurename
(function).
make-features
(function).
make-features-bnode
(function).
make-literal
(function).
make-metadata
(function).
make-metadata-bnode
(function).
make-token-id
(function).
make-upos
(function).
unspecified-field?
(function).
Definitions are sorted by export status, category, package, and then by lexicographic order.
HTML for styling the confusion matrix.
Receives a sentence and renumbers the ID and HEAD values of each token so that their order (as in sentence-tokens) is respected.
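The renumbering described above can be sketched in plain Common Lisp. This is only an illustration under stated assumptions: a token is represented here as a hypothetical list (ID HEAD FORM) rather than the library's token class, and the function name renumber-tokens is invented for the example.

```lisp
;; Hypothetical representation: a token is a list (ID HEAD FORM).
;; This is NOT the library's token class, only a stand-in.
(defun renumber-tokens (tokens)
  "Return fresh tokens with IDs 1..N in list order and HEADs remapped."
  (let ((new-id (make-hash-table)))
    ;; HEAD 0 denotes the root and must stay 0.
    (setf (gethash 0 new-id) 0)
    ;; First pass: record the new position of every old ID.
    (loop for token in tokens
          for position from 1
          do (setf (gethash (first token) new-id) position))
    ;; Second pass: rewrite both ID and HEAD through the mapping.
    (loop for (id head form) in tokens
          collect (list (gethash id new-id) (gethash head new-id) form))))

(defparameter *renumbered*
  (renumber-tokens '((5 7 "O") (7 9 "gato") (9 0 "dorme"))))
;; *renumbered* => ((1 2 "O") (2 3 "gato") (3 0 "dorme"))
```

Two passes are used so that a HEAD pointing at a later token can still be remapped.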
Attachment score by sentence (macro-average).
The attachment score is the percentage of words that have correct
arcs to their heads. The unlabeled attachment score (UAS) considers
only who is the head of the token, while the labeled attachment
score (LAS) considers both the head and the arc label (dependency
label / syntactic class).
In order to choose between labeled or unlabeled,
set the key argument LABELED.
References:
- Dependency Parsing. Kübler, McDonald and Nivre (pp. 79-80)
Attachment score by word (micro-average).
The attachment score is the percentage of words that have correct
arcs to their heads. The unlabeled attachment score (UAS) considers
only who is the head of the token, while the labeled attachment
score (LAS) considers both the head and the arc label (dependency
label / syntactic class).
In order to choose between labeled or unlabeled,
set the key argument LABELED.
References:
- Dependency Parsing. Kübler, McDonald and Nivre (pp. 79-80)
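The difference between the macro-average (by sentence) and the micro-average (by word) can be made concrete with a small self-contained sketch. It does not use the library's classes: a token is assumed to be a (HEAD . DEPREL) pair, a sentence a list of such pairs, and all function names are hypothetical. Scores are returned as exact ratios rather than percentages.

```lisp
;; Hypothetical stand-in data: a token is a (HEAD . DEPREL) pair and a
;; sentence is a list of tokens. This is not the library's token class.
(defun token-correct-p (predicted gold labeled)
  "True if PREDICTED matches GOLD on head, and also on deprel when LABELED."
  (and (eql (car predicted) (car gold))
       (or (not labeled)
           (string= (cdr predicted) (cdr gold)))))

(defun count-correct (predicted-sentence gold-sentence labeled)
  (loop for p in predicted-sentence
        for g in gold-sentence
        count (token-correct-p p g labeled)))

(defun micro-attachment-score (predicted gold &key labeled)
  "Correct tokens over all tokens, pooled across sentences."
  (/ (loop for ps in predicted
           for gs in gold
           sum (count-correct ps gs labeled))
     (loop for gs in gold sum (length gs))))

(defun macro-attachment-score (predicted gold &key labeled)
  "Mean of the per-sentence scores."
  (/ (loop for ps in predicted
           for gs in gold
           sum (/ (count-correct ps gs labeled) (length gs)))
     (length gold)))

(defparameter *gold*
  '(((2 . "nsubj") (0 . "root"))
    ((2 . "det") (3 . "nsubj") (0 . "root") (3 . "obj"))))

(defparameter *predicted*
  '(((2 . "nsubj") (0 . "root"))
    ((2 . "det") (3 . "obj") (0 . "root") (1 . "obj"))))

;; UAS: micro = 5/6, macro = 7/8.  LAS: micro = 2/3, macro = 3/4.
```

With sentences of different lengths the two averages diverge, because the macro-average weights each sentence equally while the micro-average weights each word equally.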
Returns the number of tokens that are contained in the cell defined
by LABEL1 LABEL2 in the confusion matrix CM.
If DEFAULT-IF-UNDEFINED, returns 0. Otherwise, raises an error in case there is no such cell.
Returns the list of (SENT-ID . TOKEN-ID) of tokens in the cell
LABEL1 LABEL2.
If DEFAULT-IF-UNDEFINED, returns the empty list. Otherwise, raises an error in case there is no such cell.
Returns a list of '(LABEL1 LABEL2) for each cell in the confusion matrix CM.
Returns the list of all labels in the confusion matrix CM.
Returns a new CONFUSION-MATRIX with new empty cells for each pair (LABEL1 LABEL2) of labels in (confusion-matrix-labels CM) that are undefined in CM.
Returns the list of labels occurring in the rows of the confusion matrix CM.
Updates an existing confusion matrix by a list of sentences LIST-SENT1 and LIST-SENT2.
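The cell behaviour described for the confusion matrix (a count and a list of (SENT-ID . TOKEN-ID) pairs per cell, with an optional default for undefined cells) can be sketched as follows. This is an illustration only and makes no claim about how the library stores its cells: here they are assumed to live in an EQUAL hash table keyed by (LABEL1 LABEL2), and every function name is hypothetical.

```lisp
;; A sketch only: cells live in an EQUAL hash table keyed by (LABEL1 LABEL2);
;; each cell holds a list of (SENT-ID . TOKEN-ID) pairs. Names are hypothetical.
(defun make-cells ()
  (make-hash-table :test #'equal))

(defun cell-insert (cells label1 label2 sent-id token-id)
  "Record one token occurrence in the cell LABEL1 LABEL2."
  (push (cons sent-id token-id) (gethash (list label1 label2) cells))
  cells)

(defun cell-tokens (cells label1 label2 &key default-if-undefined)
  "Return the occurrences in a cell, or the empty list / an error if undefined."
  (multiple-value-bind (tokens found) (gethash (list label1 label2) cells)
    (cond (found tokens)
          (default-if-undefined '())
          (t (error "No cell for ~a / ~a" label1 label2)))))

(defun cell-count (cells label1 label2 &key default-if-undefined)
  (length (cell-tokens cells label1 label2
                       :default-if-undefined default-if-undefined)))

(defparameter *cells* (make-cells))
(cell-insert *cells* "NOUN" "NOUN" "s1" 1)
(cell-insert *cells* "NOUN" "PROPN" "s1" 4)
(cell-insert *cells* "NOUN" "NOUN" "s2" 2)

;; (cell-count *cells* "NOUN" "NOUN")                         => 2
;; (cell-count *cells* "VERB" "NOUN" :default-if-undefined t) => 0
```

Keeping the token references in each cell, rather than only a counter, is what makes it possible to list the tokens behind any given count.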
Converts the collection of sentences (as generated by READ-CONLLU) in CONLL, using the function TEXT-FN to extract the text of each sentence and ID-FN to extract the id of each sentence (both are needed because there is no standardized way of obtaining these values). The generated Turtle file contains a lot of duplication, so after importing it into your triple store, make sure you remove all duplicate triples.
Converts a list of sentences (e.g. as generated by READ-CONLLU)
in SENTENCES, using the function TEXT-FN to extract the text of each
sentence and ID-FN to extract the id of each sentence (we need this
as there is no standardized way of knowing this.)
Currently only ntriples is supported as RDF-FORMAT.
Returns the list of sentences of LIST-SENT1 that are an exact
match to the corresponding sentence of LIST-SENT2 (same position in
list).
LIST-SENT1 and LIST-SENT2 must have the same size with corresponding sentences in order.
Returns the percentage of sentences of LIST-SENT1 that are an exact
match to LIST-SENT2.
LIST-SENT1 and LIST-SENT2 must have the same size with corresponding
sentences in order.
The typical use case is comparing the result of a tagger (or
parser) against a test set, where an exact match is a completely
correct tagging (or parse) for the sentence.
References:
- Dependency Parsing. Kübler, McDonald and Nivre (p. 79)
Return a function that returns one CoNLL-U sentence per call.
Creates a new confusion matrix from the lists of sentences LIST-SENT1 and LIST-SENT2.
Restricted to words which are classified as of syntactical class
(dependency type to head) DEPREL, returns the precision:
the number of true positives divided by the number of words
predicted positive (that is, predicted as of class DEPREL).
We assume that LIST-SENT1 is the classified (predicted) result
and LIST-SENT2 is the list of golden (correct) sentences.
ERROR-TYPE defines what is considered an error (a false positive).
Some usual values are:
- '(deprel) :: for the deprel tagging task only
- '(head) :: for considering errors for each syntactic class
- '(deprel head) :: for considering correct only when both deprel
and head are correct.
Reads input from STREAM as sentence objects. Each token is expected as
FORM.SEPARATOR.TAGVALUE (without dots), followed by a whitespace
character.
Example:
;; Consider the file example.txt, with contents:
;; Pudim_NOUN é_VERB bom_ADJ ._PUNCT
;; E_CONJ torta_NOUN também_ADV ._PUNCT
(with-open-file (s "./example.txt")
(write-conllu-to-stream (read-sentence-tag-suffix s 'upostag "_")))
1 Pudim _ NOUN _ _ _ _ _ _
2 é _ VERB _ _ _ _ _ _
3 bom _ ADJ _ _ _ _ _ _
4 . _ PUNCT _ _ _ _ _ _
1 E _ CONJ _ _ _ _ _ _
2 torta _ NOUN _ _ _ _ _ _
3 também _ ADV _ _ _ _ _ _
4 . _ PUNCT _ _ _ _ _ _
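The FORM.SEPARATOR.TAGVALUE convention in the example above can be illustrated with a small self-contained helper. It is a sketch, not the library's parser: the name split-tagged-word is hypothetical, and splitting at the last occurrence of the separator is an assumption made here so that forms containing the separator still split sensibly.

```lisp
;; Hypothetical helper, not part of the library.
;; Assumption: the tag follows the LAST occurrence of SEPARATOR.
(defun split-tagged-word (word separator)
  "Split WORD at the last occurrence of SEPARATOR into (FORM . TAG)."
  (let ((pos (search separator word :from-end t)))
    (if pos
        (cons (subseq word 0 pos)
              (subseq word (+ pos (length separator))))
        ;; No separator found: use the CoNLL-U placeholder for the tag.
        (cons word "_"))))

;; (split-tagged-word "Pudim_NOUN" "_") => ("Pudim" . "NOUN")
;; (split-tagged-word "._PUNCT" "_")    => ("." . "PUNCT")
```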
Restricted to words which are originally of syntactic class
(dependency type to head) DEPREL, returns the recall:
the number of true positives divided by the number of words
originally positive (that is, originally of class DEPREL).
We assume that LIST-SENT1 is the classified result
and LIST-SENT2 is the list of golden (correct) sentences.
ERROR-TYPE defines what is considered an error (a false negative).
Some usual values are:
- '(deprel) :: for the deprel tagging task only
- '(head) :: for considering errors for each syntactic class
- '(deprel head) :: for considering correct only when both deprel
and head are correct.
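Precision and recall for a single dependency relation, as defined in this manual, can be sketched on plain data. The example is deliberately simplified and is not the library's implementation: it assumes two parallel lists of deprel strings (predicted and gold, one entry per word), it only covers the deprel-only notion of error, and all names are hypothetical. It does not guard against a zero denominator.

```lisp
;; Hypothetical flattened data: parallel lists of deprel strings,
;; one entry per word (predicted vs. gold).
(defun deprel-counts (predicted gold deprel)
  "Return true positives, predicted positives and gold positives for DEPREL."
  (loop for p in predicted
        for g in gold
        count (and (string= p deprel) (string= g deprel)) into tp
        count (string= p deprel) into predicted-positive
        count (string= g deprel) into gold-positive
        finally (return (values tp predicted-positive gold-positive))))

(defun deprel-precision (predicted gold deprel)
  "True positives divided by the number of words predicted as DEPREL."
  (multiple-value-bind (tp predicted-positive)
      (deprel-counts predicted gold deprel)
    (/ tp predicted-positive)))

(defun deprel-recall (predicted gold deprel)
  "True positives divided by the number of words that are DEPREL in gold."
  (multiple-value-bind (tp predicted-positive gold-positive)
      (deprel-counts predicted gold deprel)
    (declare (ignore predicted-positive))
    (/ tp gold-positive)))

(defparameter *predicted-deprels* '("nsubj" "root" "nsubj" "obj" "nsubj"))
(defparameter *gold-deprels*      '("nsubj" "root" "obj"   "obj" "obl"))

;; nsubj: precision 1/3, recall 1.   obj: precision 1, recall 1/2.
```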
Receives SENTENCE, a sentence object, and returns a string
reconstructed from its tokens and mtokens.
If IGNORE-MTOKENS, then tokens’ forms are used. Else, tokens with
id contained in a mtoken are not used, with mtoken’s form being
used instead.
It is possible to apply special formatting to some tokens. To do so, both SPECIAL-FORMAT-TEST and SPECIAL-FORMAT-FUNCTION should be passed. Then, for each object (token or mtoken) for which SPECIAL-FORMAT-TEST returns a non-nil result, its form is modified by SPECIAL-FORMAT-FUNCTION in the final string.
Based on the idea from [1], produces a tree view of the
sentence. The priorities of children still need to be improved.
Code at https://github.com/sivareddyg/UDepLambda in file src/deplambda/parser/TreeTransformer.java, method 'binarizeTree'.
[1] S. Reddy, O. Täckström, M. Collins, T. Kwiatkowski, D. Das, M. Steedman, and M. Lapata, Transforming Dependency Structures to Logical Forms for Semantic Parsing, Transactions of the Association for Computational Linguistics, pp. 127–140, Apr. 2016.
Tests if, for each slot, sent-1 has the same values as sent-2. For tokens and multiword tokens, it uses token-equal and mtoken-equal, respectively.
Writes a CoNLL-U sentence to STREAM, with each token as FORM.SEPARATOR.TAGVALUE (without
dots), followed by a whitespace character.
If TAG is NIL, then writes only FORMs, followed by a whitespace character.
Example:
;; supposing sentence already defined
(write-sentence-tag-suffix-to-stream sentence :tag 'xpostag :separator "_")
Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ ,_, will_MD join_VB the_DT board_NN as_IN
a_DT nonexecutive_JJ director_NN Nov._NNP 29_CD ._.
=> NIL
See documentation for write-sentence-tag-suffix-to-stream
See documentation for write-sentence-tag-suffix-to-stream
confusion-matrix (reader and writer methods): Identifier of the corpus or experiment.
Outputs an HTML string for the object.
confusion-matrix
abstract-token: automatically generated reader method
abstract-token: automatically generated writer method
abstract-token: automatically generated reader method
abstract-token: automatically generated writer method
abstract-token: automatically generated reader method
abstract-token: automatically generated writer method
abstract-token: automatically generated reader method
abstract-token: automatically generated writer method
abstract-token: automatically generated reader method
abstract-token: automatically generated writer method
sentence &key tokens &allow-other-keys: Sets the TOKEN-SENTENCE slot for each token attributed to the initialized sentence OBJ.
confusion-matrix &key &allow-other-keys
confusion-matrix out
(setf confusion-matrix-corpus-id)
.
confusion-matrix-corpus-id
.
(setf confusion-matrix-key-fn)
.
confusion-matrix-key-fn
.
(setf confusion-matrix-rows)
.
confusion-matrix-rows
.
(setf confusion-matrix-sort-fn)
.
confusion-matrix-sort-fn
.
(setf confusion-matrix-test-fn)
.
confusion-matrix-test-fn
.
format-html
.
initialize-instance
.
print-object
.
Identifier of the corpus or experiment.
:corpus-id
Function used to label each token.
(function cl-conllu:token-upostag)
:key-fn
Function which compares two labels. Typically a form of equality.
(function equal)
:test-fn
Function which sorts labels. By default,
converts labels to string and uses lexicographical order
(function (lambda (conllu.evaluate::x conllu.evaluate::y) (string<= (format nil "~a" conllu.evaluate::x) (format nil "~a" conllu.evaluate::y))))
:sort-fn
Parameter which contains the contents of the confusion matrix.
:id
:lemma
"_"
:upostag
"_"
:xpostag
"_"
:feats
"_"
:head
"_"
:deprel
"_"
:deps
List of the 37 universal syntactic relations in UD.
Returns a new CONFUSION-MATRIX with the same cells values as CM.
Updates an existing confusion matrix by a pair of matching sentences SENT1 and SENT2. That is, SENT1 and SENT2 should be alternative analyses of the same natural language sentence.
Input: string
Output: list of feature nodes
Returns a list of nodes to be used as objects in triples with the predicate "conll:feats".
Examples:
(let ((wilbur:*nodes* (make-instance 'wilbur:dictionary))
(wilbur:*db* (make-instance 'wilbur:db)))
(wilbur:add-namespace "olia-sys" "http://purl.org/olia/system.owl#")
(wilbur:add-namespace "conll" "http://br.ibm.com/conll/LEMMA")
(convert-features-to-rdf "Mood=Ind|Tense=Past|VerbForm=Fin" (node "sentence1")))
=>
(!"http://br.ibm.com/conll/LEMMA#verbFormFin"
!"http://br.ibm.com/conll/LEMMA#tensePast"
!"http://br.ibm.com/conll/LEMMA#moodInd")
Input: list of pairs (name value), node
Output: List of triples
Example:
(let ((wilbur:*db* (make-instance 'wilbur:db))
(metadata '(("sent_id" . "test")
("text" . "The US troops fired into the hostile crowd, killing 4.")))
(sentence-node (node "sentence")))
(convert-sentence-metadata metadata sentence-node))
=>
((#<WILBUR:TRIPLE !"sentence" !conll:metadata/sent_id #"test" {10048AA2E3}>)
(#<WILBUR:TRIPLE !"sentence" !conll:metadata/text #"The US troops fired into the hostile crowd, killing 4." {10048AA753}>))
Creates the cell for row LABEL1 and column LABEL2 in confusion matrix CM.
Checks whether two sentences are an exact match.
The typical use case is comparing the result of a tagger (or
parser) against a test sentence.
Returns SENT1 if SENT1 and SENT2 agree for all tokens with respect to
the COMPARED-FIELDS. Otherwise returns NIL.
If they do not have the same number of tokens or if the tokens do not agree on each IDENTITY-FIELD, then the sentences are not ’the same’ and thus an error is returned.
Predicate for verifying whether the cell for row LABEL1 and column LABEL2 already exists in the confusion matrix CM.
Inserts TOKEN as an occurrence in the cell LABEL1 LABEL2 of the confusion matrix CM.
Receives a function and a list of lists and returns the appended result of the application of the function to each list.
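A minimal sketch of this map-then-append behaviour, written independently of the library (the name mappend-sketch is hypothetical and the library's own definition may differ):

```lisp
;; Hypothetical re-implementation for illustration only.
(defun mappend-sketch (function lists)
  "Apply FUNCTION to each element of LISTS and append the resulting lists."
  (loop for list in lists
        append (funcall function list)))

;; (mappend-sketch #'reverse '((1 2) (3 4))) => (2 1 4 3)
```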
The original file contains a set of sentences. The modified file contains modified versions of some of those sentences. This function replaces, in the original, the sentences present in the modified file, matching them by sentence id. If the modified file contains sentences that are not in the original, the flag 'add-new', if true, causes these sentences to be added at the end of the original file.
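The replace-by-id behaviour described above can be sketched on in-memory data. This is an illustration under stated assumptions, not the library's code: sentences are represented as hypothetical (ID . TEXT) pairs instead of sentence objects read from files, and merge-by-id is an invented name.

```lisp
;; Hypothetical stand-in: a sentence is an (ID . TEXT) pair.
(defun merge-by-id (original modified &key add-new)
  "Replace sentences of ORIGINAL by those of MODIFIED with the same id.
When ADD-NEW is true, append sentences of MODIFIED that have no match."
  (let ((by-id (make-hash-table :test #'equal)))
    (loop for (id . text) in modified
          do (setf (gethash id by-id) text))
    (append
     ;; Keep the original order, swapping in modified text where available.
     (loop for (id . text) in original
           collect (cons id (gethash id by-id text)))
     ;; Optionally append sentences that only exist in MODIFIED.
     (when add-new
       (loop for (id . text) in modified
             unless (assoc id original :test #'equal)
               collect (cons id text))))))

(defparameter *original* '(("s1" . "a") ("s2" . "b")))
(defparameter *modified* '(("s2" . "B") ("s3" . "c")))

;; (merge-by-id *original* *modified*)
;;   => (("s1" . "a") ("s2" . "B"))
;; (merge-by-id *original* *modified* :add-new t)
;;   => (("s1" . "a") ("s2" . "B") ("s3" . "c"))
```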
Tests if, for each slot, mtoken-1 has the same values as mtoken-2.
Score by sentence (macro-average).
This is the mean of the percentage of words in each sentence
that are correct with respect to the fields in FIELDS.
We assume that LIST-SENT1 is the classified result
and LIST-SENT2 is the list of golden (correct) sentences.
If IGNORE-PUNCT, tokens which have ’PUNCT’ as upostag
in the golden sentences are ignored.
Returns a list of differences in SENT1 and SENT2.
They must have the same size.
If IGNORE-PUNCT, tokens which have ’PUNCT’ as upostag in sent2 are ignored.
Assuming the token's MISC is a list of key-value pairs, returns the value corresponding to the key MISC-KEY.
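A key lookup of this kind can be sketched on a raw MISC string. The sketch assumes the common CoNLL-U convention of pipe-separated Key=Value pairs (for example "SpaceAfter=No|Lang=pt") with "_" meaning empty. The helper names split-on and misc-value are hypothetical and do not reflect the library's implementation.

```lisp
;; Hypothetical helpers for illustration; not the library's functions.
(defun split-on (char string)
  "Split STRING on CHAR into a list of substrings."
  (loop with start = 0
        for end = (position char string :start start)
        collect (subseq string start end)
        while end
        do (setf start (1+ end))))

(defun misc-value (misc key)
  "Return the value for KEY in a MISC string of Key=Value pairs, or NIL."
  (unless (string= misc "_")
    (loop for pair in (split-on #\| misc)
          for eq-pos = (position #\= pair)
          when (and eq-pos (string= key (subseq pair 0 eq-pos)))
            return (subseq pair (1+ eq-pos)))))

;; (misc-value "SpaceAfter=No|Lang=pt" "Lang") => "pt"
;; (misc-value "_" "Lang")                     => NIL
```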
Score by word (micro-average).
This is the proportion of words, over all sentences, that
are correct with respect to the fields in FIELDS.
We assume that LIST-SENT1 is the classified result and LIST-SENT2 is the list of golden (correct) sentences.
If IGNORE-PUNCT, tokens which have ’PUNCT’ as upostag in the golden sentences are ignored.
Auxiliary function for format-html for confusion-matrix.
Auxiliary function for format-html for the confusion-matrix CM.
Returns the sentence with a matching 'sent_id' from the list of sentences in 'filename'.
confusion-matrix (reader and writer methods): Function used to label each token.
confusion-matrix (reader and writer methods): Parameter which contains the contents of the confusion matrix.
rows
.
confusion-matrix (reader and writer methods): Function which sorts labels. By default, converts labels to string and uses lexicographical order.
confusion-matrix (reader and writer methods): Function which compares two labels. Typically a form of equality.
malformed-rule
abstract-token: automatically generated reader method
abstract-token: automatically generated writer method
Writes a token to a line in 10 tab-separated columns.
error
.
:lineno
:form
-1
:cfrom
-1
:cto
"_"
:misc
:prev
:index
:lemma
"_"
:upostag
"_"
:xpostag
"_"
:feats
"_"
:deps