The edit-distance Reference Manual

Next: , Previous: , Up: (dir)   [Contents][Index]

The edit-distance Reference Manual

This is the edit-distance Reference Manual, version 1.0.0, generated automatically by Declt version 4.0 beta 2 "William Riker" on Thu Sep 15 03:43:30 2022 GMT+0.

Table of Contents


1 Introduction

edit-distance

Build Status Coverage Status

Using

Computes the edit distance (aka Levenshtein distance) between two sequences. This is a common distance measure.

For example, to compute the distance between two sequences:

(edit-distance:distance '(1 2 3 4 5) '(2 3 4 5 6))

To compute the sequence of edit operations needed to convert sequence 1 into sequence two, use the diff function

(edit-distance:diff '(1 2 3 4 5) '(2 3 4 5 6))

That will return two values a structure, as follows, and the distance.

((:DELETION 1 NIL) (:MATCH 2 2) (:MATCH 3 3) (:MATCH 4 4) (:MATCH 5 5) (:INSERTION NIL 6))
2

That structure can be printed more readibly with the FORMAT-DIFF function

(edit-distance:format-diff *path*)

Or, you can compute the diff and print it readably together by calling PRINT-DIFF:

(edit-distance:print-diff '(1 2 3 4 5) '(2 3 4 5 6))

which will print a result like this:

seq1: 1   2 3 4 5 *** []
seq2: *** 2 3 4 5 6   []

Several options are available to the FORMAT-DIFF and PRINT-DIFF to print a prefix and suffix for each line. Note that displaying substitutions relys on captialization and so substitutions are not visible for non-alphabetic sequence elements.

Additionally, other equality functions can be used, so this evaluates to a distance of zero:

 (edit-distance:distance '("ONE" "TWO" "THREE") '("one" "two"
 "three") :test 'string-equal)

Any type of sequence may be used, but for speed the input sequences are copied into new arrays. If speed is a major concern make sure to provide simple vectors as your input sequences.

Testing

To test with sbcl, run:

sbcl --eval "(asdf:load-system 'edit-distance-test)" --eval "(unless (lisp-unit:run-tests :all :edit-distance-tests) (uiop:quit 1))"

cl-edit-distance by Ben Lambert is licensed under a Creative Commons Attribution 4.0 International License.


2 Systems

The main system appears first, followed by any subsystem dependency.


Previous: , Up: Systems   [Contents][Index]

2.1 edit-distance

Compute edit distance between sequences.

Author

Ben Lambert <belambert@mac.com>

License

CC-BY-4.0

Version

1.0.0

Source

edit-distance.asd.

Child Component

src (module).


3 Modules

Modules are listed depth-first from the system components tree.


Previous: , Up: Modules   [Contents][Index]

3.1 edit-distance/src

Source

edit-distance.asd.

Parent Component

edit-distance (system).

Child Components

4 Files

Files are sorted by type and then listed depth-first from the systems components trees.


Previous: , Up: Files   [Contents][Index]

4.1 Lisp


Next: , Previous: , Up: Lisp   [Contents][Index]

4.1.1 edit-distance/edit-distance.asd

Source

edit-distance.asd.

Parent Component

edit-distance (system).

ASDF Systems

edit-distance.


4.1.2 edit-distance/src/package.lisp

Source

edit-distance.asd.

Parent Component

src (module).

Packages

edit-distance.


4.1.3 edit-distance/src/distance.lisp

Dependency

package.lisp (file).

Source

edit-distance.asd.

Parent Component

src (module).

Internals

4.1.4 edit-distance/src/interface.lisp

Dependency

distance.lisp (file).

Source

edit-distance.asd.

Parent Component

src (module).

Public Interface
Internals

4.1.5 edit-distance/src/print.lisp

Dependency

interface.lisp (file).

Source

edit-distance.asd.

Parent Component

src (module).

Internals

print-differences (function).


5 Packages

Packages are listed by definition order.


Previous: , Up: Packages   [Contents][Index]

5.1 edit-distance

Source

package.lisp.

Use List

common-lisp.

Public Interface
Internals

6 Definitions

Definitions are sorted by export status, category, package, and then by lexicographic order.


Next: , Previous: , Up: Definitions   [Contents][Index]

6.1 Public Interface


6.1.1 Ordinary functions

Function: diff (s1 s2 &key test)
Package

edit-distance.

Source

interface.lisp.

Function: distance (s1 s2 &key test)
Package

edit-distance.

Source

interface.lisp.

Function: format-diff (path &key file-stream prefix1 prefix2 suffix1 suffix2)

Print a human readable ’diff’ of the two sequences to the given stream (standard out by default). Optionally prefixes and suffixes of each line can be printed for easier identification and analysis.

Package

edit-distance.

Source

interface.lisp.

Function: insertions-and-deletions (seq1 seq2 &key test)
Package

edit-distance.

Source

interface.lisp.

Function: print-diff (seq1 seq2 &key file-stream test prefix1 prefix2 suffix1 suffix2)

Print a human readable ’diff’ of the two sequences to the given stream (standard out by default). Optionally prefixes and suffixes of each line can be printed for easier identification and analysis.

Package

edit-distance.

Source

interface.lisp.


6.2 Internals


Previous: , Up: Internals   [Contents][Index]

6.2.1 Ordinary functions

Function: compute-alignment (s1 s2 &key test)
Package

edit-distance.

Source

interface.lisp.

Function: compute-insertions-and-deletions (seq1 seq2 &key test)

Return a cons that points to two lists. The first is a list of the insertions; The second is a list of deletions.

Package

edit-distance.

Source

interface.lisp.

Function: edit-distance (s1 s2 &key test return-path)
Package

edit-distance.

Source

interface.lisp.

Function: get-path-from-bp-table (bp width height)

Given a back-pointer table, return a representation of the shortest path.

Package

edit-distance.

Source

distance.lisp.

Function: levenshtein-distance (s1 s2 &key test return-path)

Compute the Levenshtein distance between two sequences. If return path is t, returns the return path. O/w just returns the distance.

Package

edit-distance.

Source

distance.lisp.

Function: levenshtein-distance-fast (seq1 seq2 &key test)

Just like the previous version, but also return the number of *matches*.
This is pretty fast now... It looks like most of the time taken is in applying the test (e.g. string-equal) some non-negligible part of it is taken by the MAKE-ARRAY calls and GC’ing those...

Package

edit-distance.

Source

distance.lisp.

Function: print-differences (path &key file-stream prefix1 prefix2 suffix1 suffix2)

Given a ’path’ as produced by the above function LEVENSTEIN-DISTANCE function above, print the differences in the format of the ’align’ program.
The prefix and suffix args allow the caller to supply a string to print before, and after each.

Package

edit-distance.

Source

print.lisp.

Function: sequence-diff (seq1 seq2 &key file-stream test prefix1 prefix2 suffix1 suffix2)

Print a human readable ’diff’ of the two sequences to the given stream (standard out by default). Optionally prefixes and suffixes of each line can be printed for easier identification and analysis.

Package

edit-distance.

Source

interface.lisp.


Appendix A Indexes


Next: , Previous: , Up: Indexes   [Contents][Index]

A.1 Concepts


Next: , Previous: , Up: Indexes   [Contents][Index]

A.2 Functions

Jump to:   C   D   E   F   G   I   L   P   S  
Index Entry  Section

C
compute-alignment: Private ordinary functions
compute-insertions-and-deletions: Private ordinary functions

D
diff: Public ordinary functions
distance: Public ordinary functions

E
edit-distance: Private ordinary functions

F
format-diff: Public ordinary functions
Function, compute-alignment: Private ordinary functions
Function, compute-insertions-and-deletions: Private ordinary functions
Function, diff: Public ordinary functions
Function, distance: Public ordinary functions
Function, edit-distance: Private ordinary functions
Function, format-diff: Public ordinary functions
Function, get-path-from-bp-table: Private ordinary functions
Function, insertions-and-deletions: Public ordinary functions
Function, levenshtein-distance: Private ordinary functions
Function, levenshtein-distance-fast: Private ordinary functions
Function, print-diff: Public ordinary functions
Function, print-differences: Private ordinary functions
Function, sequence-diff: Private ordinary functions

G
get-path-from-bp-table: Private ordinary functions

I
insertions-and-deletions: Public ordinary functions

L
levenshtein-distance: Private ordinary functions
levenshtein-distance-fast: Private ordinary functions

P
print-diff: Public ordinary functions
print-differences: Private ordinary functions

S
sequence-diff: Private ordinary functions

Jump to:   C   D   E   F   G   I   L   P   S  

Next: , Previous: , Up: Indexes   [Contents][Index]

A.3 Variables