The darts.lib.sequence-metrics Reference Manual

This is the darts.lib.sequence-metrics Reference Manual, version 0.1, generated automatically by Declt version 4.0 beta 2 "William Riker" on Mon Feb 26 16:11:57 2024 GMT+0.


1 Introduction


2 Systems

The main system appears first, followed by any subsystem dependency.


2.1 darts.lib.sequence-metrics

Provides various distance metrics on sequences

Maintainer

Dirk Esser

Author

Dirk Esser

License

MIT

Long Description
Version

0.1

Source

darts.lib.sequence-metrics.asd.

Child Component

src (module).


3 Modules

Modules are listed depth-first from the system components tree.


3.1 darts.lib.sequence-metrics/src

Source

darts.lib.sequence-metrics.asd.

Parent Component

darts.lib.sequence-metrics (system).

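Child Components
  • package.lisp (file).
  • types.lisp (file).
  • levenshtein.lisp (file).
  • hamming.lisp (file).
  • lcs.lisp (file).
  • jaro-winkler.lisp (file).
  • ngrams.lisp (file).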

4 Files

Files are sorted by type and then listed depth-first from the systems components trees.


4.1 Lisp


4.1.1 darts.lib.sequence-metrics/darts.lib.sequence-metrics.asd

Source

darts.lib.sequence-metrics.asd.

Parent Component

darts.lib.sequence-metrics (system).

ASDF Systems

darts.lib.sequence-metrics.

Packages

darts.asdf.


4.1.2 darts.lib.sequence-metrics/src/package.lisp

Source

darts.lib.sequence-metrics.asd.

Parent Component

src (module).

Packages

darts.lib.sequence-metrics.


4.1.3 darts.lib.sequence-metrics/src/types.lisp

Dependency

package.lisp (file).

Source

darts.lib.sequence-metrics.asd.

Parent Component

src (module).

Internals
  • array-index (type).
  • sequence-function (type).
  • string-function (type).

4.1.4 darts.lib.sequence-metrics/src/levenshtein.lisp

Dependency

types.lisp (file).

Source

darts.lib.sequence-metrics.asd.

Parent Component

src (module).

Public Interface
  • levenshtein-distance (function).
  • string-levenshtein-distance (function).

4.1.5 darts.lib.sequence-metrics/src/hamming.lisp

Dependency

types.lisp (file).

Source

darts.lib.sequence-metrics.asd.

Parent Component

src (module).

Public Interface
  • hamming-distance (function).
  • string-hamming-distance (function).

4.1.6 darts.lib.sequence-metrics/src/lcs.lisp

Dependency

types.lisp (file).

Source

darts.lib.sequence-metrics.asd.

Parent Component

src (module).

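Public Interface
  • longest-common-subsequence*-length (function).
  • longest-common-subsequence-length (function).
  • longest-common-subsequences* (function).
  • longest-common-substring-length (function).
  • longest-common-substrings (function).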

4.1.7 darts.lib.sequence-metrics/src/jaro-winkler.lisp

Dependency

types.lisp (file).

Source

darts.lib.sequence-metrics.asd.

Parent Component

src (module).

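Public Interface
  • jaro-distance (function).
  • jaro-winkler-distance (function).
  • string-jaro-distance (function).
  • string-jaro-winkler-distance (function).

Internals
  • jaro-winkler-min-prefix-length (constant).
  • jaro-winkler-prefix-adjustment-scale (constant).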

4.1.8 darts.lib.sequence-metrics/src/ngrams.lisp

Dependency

types.lisp (file).

Source

darts.lib.sequence-metrics.asd.

Parent Component

src (module).

Public Interface
  • do-ngrams (macro).
  • list-ngrams (function).
  • map-ngrams (function).

5 Packages

Packages are listed by definition order.


5.1 darts.lib.sequence-metrics

This package exports various forms of metric functions
on sequences. Among the ones provided are:

- Levenshtein distance
- Jaro and Jaro/Winkler distance
- Hamming distance

Most distances are provided in a very general form, working on arbitrary sequences. However, since these distance functions are usually applied to strings, optimized string versions are provided for some frequently used metrics.

This package also exports a few other utility functions that, strictly speaking, do not belong here, such as the n-gram functions. They live in this package for historical reasons.

Source

package.lisp.

Use List

common-lisp.

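Public Interface
  • do-ngrams (macro).
  • hamming-distance (function).
  • jaro-distance (function).
  • jaro-winkler-distance (function).
  • levenshtein-distance (function).
  • list-ngrams (function).
  • longest-common-subsequence*-length (function).
  • longest-common-subsequence-length (function).
  • longest-common-subsequences* (function).
  • longest-common-substring-length (function).
  • longest-common-substrings (function).
  • map-ngrams (function).
  • string-hamming-distance (function).
  • string-jaro-distance (function).
  • string-jaro-winkler-distance (function).
  • string-levenshtein-distance (function).

Internals
  • array-index (type).
  • jaro-winkler-min-prefix-length (constant).
  • jaro-winkler-prefix-adjustment-scale (constant).
  • sequence-function (type).
  • string-function (type).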

5.2 darts.asdf

Source

darts.lib.sequence-metrics.asd.

Use List
  • asdf/interface.
  • common-lisp.

6 Definitions

Definitions are sorted by export status, category, package, and then by lexicographic order.


6.1 Public Interface


6.1.1 Macros

Macro: do-ngrams ((&rest vars) (list-form &key start-padding end-padding) &body body)
Package

darts.lib.sequence-metrics.

Source

ngrams.lisp.


6.1.2 Ordinary functions

Function: hamming-distance (seq1 seq2 &key start1 end1 start2 end2 test test-not key normalized)
Package

darts.lib.sequence-metrics.

Source

hamming.lisp.

Function: jaro-distance (str1 str2 &key start1 end1 start2 end2 test test-not key)

jaro-distance SEQ1 SEQ2 &key START1 END1 START2 END2 TEST TEST-NOT KEY => DISTANCE

Package

darts.lib.sequence-metrics.

Source

jaro-winkler.lisp.

Function: jaro-winkler-distance (str1 str2 &key start1 end1 start2 end2 test test-not key prefix-length adjustment-scale)
Package

darts.lib.sequence-metrics.

Source

jaro-winkler.lisp.

Function: levenshtein-distance (s1 s2 &key start1 end1 start2 end2 test test-not key normalized)

levenshtein-distance S1 S2 &key START1 END1 START2 END2 TEST TEST-NOT KEY => DISTANCE

Computes the Levenshtein distance between sequences S1 and S2. The result
value DISTANCE is the minimum number of edit operations required to
transform S1 into S2 (or vice versa), where the allowed operations are:

- insert a single element at some position
- delete a single element at some position
- substitute a single element at some position by another one

The Levenshtein distance measures how much two sequences differ. If
two sequences have a distance of 0, they are equal. The TEST function is
used to compare sequence elements. The default test function is eql.

This function is a generalization of string-levenshtein-distance
to arbitrary sequence types. Use string-levenshtein-distance if you
need a version optimized for use with strings.
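
The edit-distance definition above can be illustrated with the classic dynamic-programming table. The following is only a minimal sketch under stated assumptions: sketch-levenshtein is a hypothetical name, it ignores the START/END, TEST-NOT and KEY arguments, and it is not claimed to be how levenshtein.lisp is implemented.

```lisp
;; Illustrative sketch only: SKETCH-LEVENSHTEIN is a hypothetical name and
;; is not part of darts.lib.sequence-metrics.
(defun sketch-levenshtein (s1 s2 &key (test #'eql))
  "Return the edit distance between sequences S1 and S2."
  (let* ((v1 (coerce s1 'vector))
         (v2 (coerce s2 'vector))
         (n (length v1))
         (m (length v2))
         ;; (aref table i j) holds the distance between the first I
         ;; elements of V1 and the first J elements of V2.
         (table (make-array (list (1+ n) (1+ m)) :initial-element 0)))
    (dotimes (i (1+ n)) (setf (aref table i 0) i)) ; delete I elements
    (dotimes (j (1+ m)) (setf (aref table 0 j) j)) ; insert J elements
    (loop for i from 1 to n
          do (loop for j from 1 to m
                   do (setf (aref table i j)
                            (min (1+ (aref table (1- i) j))     ; deletion
                                 (1+ (aref table i (1- j)))     ; insertion
                                 (+ (aref table (1- i) (1- j))  ; substitution or match
                                    (if (funcall test
                                                 (aref v1 (1- i))
                                                 (aref v2 (1- j)))
                                        0
                                        1))))))
    (aref table n m)))
```

With this sketch, (sketch-levenshtein "kitten" "sitting") evaluates to 3: two substitutions and one insertion. Passing :test #'char-equal makes the comparison ignore case.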

Package

darts.lib.sequence-metrics.

Source

levenshtein.lisp.

Function: list-ngrams (size list &key start-padding end-padding transform)
Package

darts.lib.sequence-metrics.

Source

ngrams.lisp.

Function: longest-common-subsequence*-length (seq1 seq2 &key start1 end1 start2 end2 test test-not key)

longest-common-subsequence*-length SEQ1 SEQ2 &key START1 END1 START2 END2 TEST TEST-NOT KEY => LENGTH

Returns the length of the (or: a) longest common contiguous subsequence
of sequences SEQ1 and SEQ2. This problem is usually called the longest
common "substring" problem, but in order to avoid confusion, we use the
term subsequence* here to refer to contiguous subsequences.
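
To make the notion of a contiguous common subsequence concrete, here is a minimal sketch that tracks, for each pair of positions, the length of the common run ending there. The name sketch-lcs*-length is hypothetical, the bounding and KEY arguments are omitted, and the code in lcs.lisp may work differently.

```lisp
;; Illustrative sketch only: SKETCH-LCS*-LENGTH is a hypothetical name and
;; is not part of darts.lib.sequence-metrics.
(defun sketch-lcs*-length (seq1 seq2 &key (test #'eql))
  "Return the length of a longest run of elements contiguous in both sequences."
  (let* ((v1 (coerce seq1 'vector))
         (v2 (coerce seq2 'vector))
         (m (length v2))
         ;; (aref prev j) is the length of the common run that ends at the
         ;; previous element of V1 and at element J-1 of V2.
         (prev (make-array (1+ m) :initial-element 0))
         (best 0))
    (loop for x across v1
          do (let ((curr (make-array (1+ m) :initial-element 0)))
               (loop for j from 1 to m
                     do (when (funcall test x (aref v2 (1- j)))
                          ;; Extend the run that ended one step earlier.
                          (setf (aref curr j) (1+ (aref prev (1- j))))
                          (setf best (max best (aref curr j)))))
               (setf prev curr)))
    best))
```

For example, (sketch-lcs*-length "xabcdy" "zabcw") evaluates to 3, for the shared run "abc", while "abcde" and "axbxc" share only runs of length 1 because their common elements are not adjacent in both.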

Package

darts.lib.sequence-metrics.

Source

lcs.lisp.

Function: longest-common-subsequence-length (seq1 seq2 &key start1 end1 start2 end2 test test-not key)
Package

darts.lib.sequence-metrics.

Source

lcs.lisp.

Function: longest-common-subsequences* (seq1 seq2 &key start1 end1 start2 end2 test test-not key)

longest-common-subsequences* SEQ1 SEQ2 &key START1 END1 START2 END2 TEST TEST-NOT KEY => LIST

Returns a list containing all contiguous longest common subsequences
of SEQ1 and SEQ2. This problem is usually called the longest common
"substring" problem, but in order to avoid confusion, we use the
term subsequence* here to refer to contiguous subsequences.
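
The following sketch shows one way to collect every longest common contiguous run rather than just its length. It rests on assumptions the manual does not confirm: the name sketch-lcs* is hypothetical, results are returned as subsequences of the first argument, and equal runs are reported only once. The library's actual result format may differ.

```lisp
;; Illustrative sketch only: SKETCH-LCS* is a hypothetical name and is not
;; part of darts.lib.sequence-metrics.
(defun sketch-lcs* (seq1 seq2 &key (test #'eql))
  "Return a list of the distinct longest common contiguous runs."
  (let* ((v1 (coerce seq1 'vector))
         (v2 (coerce seq2 'vector))
         (n (length v1))
         (m (length v2))
         ;; (aref table i j) is the length of the common run ending at
         ;; element I-1 of V1 and element J-1 of V2.
         (table (make-array (list (1+ n) (1+ m)) :initial-element 0))
         (best 0)
         (ends '()))                    ; exclusive end positions in V1
    (loop for i from 1 to n
          do (loop for j from 1 to m
                   do (when (funcall test (aref v1 (1- i)) (aref v2 (1- j)))
                        (let ((len (1+ (aref table (1- i) (1- j)))))
                          (setf (aref table i j) len)
                          (cond ((> len best) (setf best len
                                                    ends (list i)))
                                ((= len best) (pushnew i ends)))))))
    ;; All collected runs have length BEST, so EVERY compares them fully.
    (remove-duplicates
     (mapcar (lambda (end) (subseq v1 (- end best) end))
             (reverse ends))
     :test (lambda (a b) (every test a b))
     :from-end t)))
```

Under these assumptions, (sketch-lcs* "abcxyz" "xyzabc") returns the two runs "abc" and "xyz", and two sequences with no common element yield the empty list.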

Package

darts.lib.sequence-metrics.

Source

lcs.lisp.

Function: longest-common-substring-length (seq1 seq2 &key start1 end1 start2 end2 case-sensitive)
Package

darts.lib.sequence-metrics.

Source

lcs.lisp.

Function: longest-common-substrings (seq1 seq2 &key start1 end1 start2 end2 case-sensitive)
Package

darts.lib.sequence-metrics.

Source

lcs.lisp.

Function: map-ngrams (function size list &key start-padding end-padding)

map-ngrams FUNCTION N LIST &key START-PADDING END-PADDING => UNDEFINED

Calls FUNCTION for each n-gram of size N constructed from the elements in the given LIST. Initially, the value of START-PADDING is used to fill the first N - 1 elements in the call. When the list is exhausted, the value of END-PADDING is used to pad to a size of N. The function must accept N arguments.
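
The padding behaviour can be sketched as a sliding window over the list after N - 1 padding values have been added at each end. This is one plausible reading of the description above, not the code from ngrams.lisp; sketch-map-ngrams is a hypothetical name.

```lisp
;; Illustrative sketch only: SKETCH-MAP-NGRAMS is a hypothetical name and
;; is not part of darts.lib.sequence-metrics.
(defun sketch-map-ngrams (function n list &key start-padding end-padding)
  "Call FUNCTION on each window of N consecutive elements of the padded LIST."
  (let* ((pad (1- n))
         (padded (append (make-list pad :initial-element start-padding)
                         list
                         (make-list pad :initial-element end-padding))))
    ;; Slide a window of width N over the padded list; stop as soon as
    ;; fewer than N elements remain.
    (loop for tail on padded
          while (nthcdr pad tail)
          do (apply function (subseq tail 0 n)))
    (values)))

;; Usage: collect the bigrams of (X Y Z) with explicit padding markers.
(defun sketch-bigrams (list)
  (let ((grams '()))
    (sketch-map-ngrams (lambda (a b) (push (list a b) grams))
                       2 list :start-padding :s :end-padding :e)
    (nreverse grams)))
```

Under this reading, (sketch-bigrams '(x y z)) collects the four bigrams (:s x), (x y), (y z) and (z :e).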

Package

darts.lib.sequence-metrics.

Source

ngrams.lisp.

Function: string-hamming-distance (seq1 seq2 &key start1 end1 start2 end2 case-sensitive normalized)
Package

darts.lib.sequence-metrics.

Source

hamming.lisp.

Function: string-jaro-distance (str1 str2 &key start1 end1 start2 end2 case-sensitive)

string-jaro-distance STR1 STR2 &key START1 END1 START2 END2 CASE-SENSITIVE => DISTANCE

Package

darts.lib.sequence-metrics.

Source

jaro-winkler.lisp.

Function: string-jaro-winkler-distance (str1 str2 &key start1 end1 start2 end2 case-sensitive prefix-length adjustment-scale)
Package

darts.lib.sequence-metrics.

Source

jaro-winkler.lisp.

Function: string-levenshtein-distance (s1 s2 &key start1 end1 start2 end2 case-sensitive normalized)

string-levenshtein-distance S1 S2 &key START1 END1 START2 END2 CASE-SENSITIVE => DISTANCE

Computes the Levenshtein distance between strings S1 and S2. The result value DISTANCE is the minimum number of edit operations required to transform S1 into S2 (or vice versa), where allowed operations are:

- insert a single character at some position
- delete a single character at some position
- substitute a single character at some position by another one

The Levenshtein distance measures how much two strings differ. If two strings have a distance of 0, they are equal.

If CASE-SENSITIVE is true (which is the default), characters are compared in a case-sensitive way, distinguishing between lower case and upper case letters. If CASE-SENSITIVE is false, this function does not distinguish between lower case and upper case letters.

See function levenshtein-distance for a generalization of this function to arbitrary sequences.
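
The effect of CASE-SENSITIVE can be pictured as a choice between two standard character predicates. The sketch below is an assumption about the behaviour, not the library's code: sketch-char-test is a hypothetical helper, and the standard function mismatch is used only to show where two strings first differ under each predicate.

```lisp
;; Illustrative sketch only: SKETCH-CHAR-TEST is a hypothetical helper and
;; is not part of darts.lib.sequence-metrics.
(defun sketch-char-test (case-sensitive)
  "Return a character comparison predicate for the CASE-SENSITIVE flag."
  (if case-sensitive #'char= #'char-equal))

;; MISMATCH is standard Common Lisp; it returns the first differing
;; position, or NIL when the sequences match under the given test.
(defun sketch-first-difference (s1 s2 &key (case-sensitive t))
  (mismatch s1 s2 :test (sketch-char-test case-sensitive)))
```

Here (sketch-first-difference "Lisp" "LISP") returns 1, whereas the same call with :case-sensitive nil returns NIL because the strings differ only in case.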

Package

darts.lib.sequence-metrics.

Source

levenshtein.lisp.


6.2 Internals


6.2.1 Constants

Constant: jaro-winkler-min-prefix-length
Package

darts.lib.sequence-metrics.

Source

jaro-winkler.lisp.

Constant: jaro-winkler-prefix-adjustment-scale
Package

darts.lib.sequence-metrics.

Source

jaro-winkler.lisp.


6.2.2 Types

Type: array-index ()

Integer type, which is large enough to hold an index into some arbitrary array (in particular, into a string).
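
A type of this kind is commonly written in terms of the standard constant array-dimension-limit. The definition below is a sketch of that idiom under a hypothetical name, sketch-array-index; the manual does not show the actual definition in types.lisp, which may use different bounds.

```lisp
;; Illustrative sketch only: SKETCH-ARRAY-INDEX is a hypothetical name.
;; Valid indices are the non-negative integers strictly below
;; ARRAY-DIMENSION-LIMIT, hence the exclusive upper bound.
(deftype sketch-array-index ()
  `(integer 0 (,array-dimension-limit)))
```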

Package

darts.lib.sequence-metrics.

Source

types.lisp.

Type: sequence-function (result &rest additional-keys)

Type of a sequence metric. This is the function type shared by most functions exported from this package that operate on two generic sequences.

Package

darts.lib.sequence-metrics.

Source

types.lisp.

Type: string-function (result &rest additional-keys)

Type of a string metric. This is the function type shared by most functions exported from this package that operate on two actual strings.

Package

darts.lib.sequence-metrics.

Source

types.lisp.


Appendix A Indexes


A.1 Concepts


A.2 Functions

Index Entry  Section

D
do-ngrams: Public macros

F
Function, hamming-distance: Public ordinary functions
Function, jaro-distance: Public ordinary functions
Function, jaro-winkler-distance: Public ordinary functions
Function, levenshtein-distance: Public ordinary functions
Function, list-ngrams: Public ordinary functions
Function, longest-common-subsequence*-length: Public ordinary functions
Function, longest-common-subsequence-length: Public ordinary functions
Function, longest-common-subsequences*: Public ordinary functions
Function, longest-common-substring-length: Public ordinary functions
Function, longest-common-substrings: Public ordinary functions
Function, map-ngrams: Public ordinary functions
Function, string-hamming-distance: Public ordinary functions
Function, string-jaro-distance: Public ordinary functions
Function, string-jaro-winkler-distance: Public ordinary functions
Function, string-levenshtein-distance: Public ordinary functions

H
hamming-distance: Public ordinary functions

J
jaro-distance: Public ordinary functions
jaro-winkler-distance: Public ordinary functions

L
levenshtein-distance: Public ordinary functions
list-ngrams: Public ordinary functions
longest-common-subsequence*-length: Public ordinary functions
longest-common-subsequence-length: Public ordinary functions
longest-common-subsequences*: Public ordinary functions
longest-common-substring-length: Public ordinary functions
longest-common-substrings: Public ordinary functions

M
Macro, do-ngrams: Public macros
map-ngrams: Public ordinary functions

S
string-hamming-distance: Public ordinary functions
string-jaro-distance: Public ordinary functions
string-jaro-winkler-distance: Public ordinary functions
string-levenshtein-distance: Public ordinary functions


A.4 Data types

Index Entry  Section

A
array-index: Private types

D
darts.asdf: The darts․asdf package
darts.lib.sequence-metrics: The darts․lib․sequence-metrics system
darts.lib.sequence-metrics: The darts․lib․sequence-metrics package
darts.lib.sequence-metrics.asd: The darts․lib․sequence-metrics/darts․lib․sequence-metrics․asd file

F
File, darts.lib.sequence-metrics.asd: The darts․lib․sequence-metrics/darts․lib․sequence-metrics․asd file
File, hamming.lisp: The darts․lib․sequence-metrics/src/hamming․lisp file
File, jaro-winkler.lisp: The darts․lib․sequence-metrics/src/jaro-winkler․lisp file
File, lcs.lisp: The darts․lib․sequence-metrics/src/lcs․lisp file
File, levenshtein.lisp: The darts․lib․sequence-metrics/src/levenshtein․lisp file
File, ngrams.lisp: The darts․lib․sequence-metrics/src/ngrams․lisp file
File, package.lisp: The darts․lib․sequence-metrics/src/package․lisp file
File, types.lisp: The darts․lib․sequence-metrics/src/types․lisp file

H
hamming.lisp: The darts․lib․sequence-metrics/src/hamming․lisp file

J
jaro-winkler.lisp: The darts․lib․sequence-metrics/src/jaro-winkler․lisp file

L
lcs.lisp: The darts․lib․sequence-metrics/src/lcs․lisp file
levenshtein.lisp: The darts․lib․sequence-metrics/src/levenshtein․lisp file

M
Module, src: The darts․lib․sequence-metrics/src module

N
ngrams.lisp: The darts․lib․sequence-metrics/src/ngrams․lisp file

P
Package, darts.asdf: The darts․asdf package
Package, darts.lib.sequence-metrics: The darts․lib․sequence-metrics package
package.lisp: The darts․lib․sequence-metrics/src/package․lisp file

S
sequence-function: Private types
src: The darts․lib․sequence-metrics/src module
string-function: Private types
System, darts.lib.sequence-metrics: The darts․lib․sequence-metrics system

T
Type, array-index: Private types
Type, sequence-function: Private types
Type, string-function: Private types
types.lisp: The darts․lib․sequence-metrics/src/types․lisp file