The bk-tree Reference Manual

Table of Contents

Next: , Previous: , Up: (dir)   [Contents][Index]

The bk-tree Reference Manual

This is the bk-tree Reference Manual, generated automatically by Declt version 2.3 "Robert April" on Tue Jan 09 13:13:23 2018 GMT+0.


Next: , Previous: , Up: Top   [Contents][Index]

1 Introduction

     ___           ___
    /  /\         /  /\
   /  /::\       /  /:/
  /  /:/\:\     /  /:/
 /  /::\ \:\   /  /::\____
/__/:/\:\_\:| /__/:/\:::::\
\  \:\ \:\/:/ \__\/~|:|~~~~              ___           ___           ___
 \  \:\_\::/     |  |:|    ___          /  /\         /  /\         /  /\
  \  \:\/:/      |  |:|   /__/\        /  /::\       /  /::\       /  /::\
   \__\::/       |__|:|   \  \:\      /  /:/\:\     /  /:/\:\     /  /:/\:\
       ~~         \__\|    \__\:\    /  /::\ \:\   /  /::\ \:\   /  /::\ \:\
                           /  /::\  /__/:/\:\_\:\ /__/:/\:\ \:\ /__/:/\:\ \:\
                          /  /:/\:\ \__\/~|::\/:/ \  \:\ \:\_\/ \  \:\ \:\_\/
                         /  /:/__\/    |  |:|::/   \  \:\ \:\    \  \:\ \:\
                        /__/:/         |  |:|\/     \  \:\_\/     \  \:\_\/
                        \__\/          |__|:|~       \  \:\        \  \:\
                                        \__\|         \__\/         \__\/

About

This program implements a derivative of BK-Tree data structure described in "Some Approaches to Best-Match File Searching" paper of W. A. Burkhard and R. M. Keller. For more information about the paper, see

@article{362025,
 author = {W. A. Burkhard and R. M. Keller},
 title = {Some approaches to best-match file searching},
 journal = {Commun. ACM},
 volume = {16},
 number = {4},
 year = {1973},
 issn = {0001-0782},
 pages = {230--236},
 doi = {http://doi.acm.org/10.1145/362003.362025},
 publisher = {ACM},
 address = {New York, NY, USA},
}

In the implementation, I have used below structure to store values in the nodes:

struct node {
  distance: Metric distance between current node and its parent.
  value   : Value stored in current node.
  nodes   : Nodes collected under this node.
}

See below figure for an example.

Example BK-Tree

During every search phase, instead of walking through nodes via

j = {j, j+1, j-1, j+2, j-2, ...}
  = {0, (-1)^i+1 * ceil(i/2)}, i = 1, 2, 3, ...

as described in the original paper, program sorts nodes according their relative distance to value being searched:

distance = d(searched-value, current-node-value)
sort(nodes, lambda(node) { abs(distance - distance-of(node)) }

There is no restriction on the type of the value which will be stored in the tree, as long as you supply appropriate metric function.

Performance

Here is the results of a detailed test performed using BK-TREE package.

In every test, 100 random words are searched in the randomly created word databases. Words stored in the database are varying from 5 characters upto 10 characters.

| DB Size (words) | Threshold (distance) | Scanned Node % | Found Node % | | ---------------:| --------------------:| --------------:| ------------:| | 10,000 | 1 | 0.110 | 0.0100 | | | 2 | 0.110 | 0.0100 | | | 3 | 0.110 | 0.0100 | | | 4 | 0.160 | 0.0300 | | | 5 | 0.370 | 0.1100 | | | 6 | 7.600 | 6.5800 | | | 7 | 24.460 | 23.4300 | | | 8 | 51.360 | 49.0900 | | 50,000 | 1 | 0.0022 | 0.0020 | | | 2 | 0.0023 | 0.0021 | | | 3 | 0.0251 | 0.0030 | | | 4 | 0.0408 | 0.0127 | | | 5 | 0.3943 | 0.3196 | | | 6 | 2.5430 | 2.3869 | | | 7 | 7.6876 | 7.3874 | | | 8 | 23.6635 | 22.9339 | | 100,000 | 1 | 0.0012 | 0.0010 | | | 2 | 0.0012 | 0.0011 | | | 3 | 0.0013 | 0.0017 | | | 4 | 0.0231 | 0.0085 | | | 5 | 0.3383 | 0.2998 | | | 6 | 1.7957 | 1.7079 | | | 7 | 6.3571 | 6.1654 | | | 8 | 18.5599 | 18.0996 | | 500,000 | 1 | 0.0027 | 0.0002 | | | 2 | 0.0029 | 0.0002 | | | 3 | 0.0039 | 0.0011 | | | 4 | 0.0012 | 0.0081 | | | 5 | 0.3444 | 0.3213 | | | 6 | 0.4244 | 0.4201 | | | 7 | 13.4834 | 13.3619 | | | 8 | 30.3728 | 30.1665 |

How this table should be interpreted? The lower the difference between the third and fourth columns, the less redundant node visit performed. And the stability of this difference (which means no fluctuations in the difference) indicates the stability of the convergence.

Here is the graph of above results.

Results

Example

Here is an example about how to used supplied interface.

(defpackage :bk-tree-test (:use :cl :bk-tree))

(in-package :bk-tree-test)

(defvar *words* nil)

(defvar *tree* (make-instance 'bk-tree))

;; Build *WORDS* list.
(with-open-file (in "/home/vy/lisp/english-words.txt")
  (loop for line = (read-line in nil nil)
        while line
        do (push
            (string-trim '(#\space #\tab #\cr #\lf) line)
            *words*)))

;; Check *WORDS*.
(if (endp *words*)
    (error "*WORDS* is empty!"))

;; Fill the *TREE*.
(mapc
 (lambda (word)
   (handler-case (insert-value word *tree*)
     (duplicate-value (ctx)
       (format t "Duplicated: ~a~%" (value-of ctx)))))
 *words*)

;; Let's see that green tree.
(print-tree *tree*)

;; Test BK-Tree.
(time
 (mapc
  (lambda (result)
    (format t "~a ~a~%" (distance-of result) (value-of result)))
  (search-value "kernel" *tree* :threshold 2)))

;; Test brute levenshtein.
(time
 (loop with target-word = "kernel"
       with results = (sort
                       (mapcar
                        (lambda (word)
                          (cons (levenshtein target-word word) word))
                        *words*)
                       #'<
                       :key #'car)
       repeat 50      
       for (distance . value) in results
       while (<= distance 2)
       do (format t "~a ~a~%" distance value)))

Caveats

For performance reasons, LEVENSHTEIN function coming with the package has some limitations both on the input string and penalty costs.

(deftype levenshtein-cost ()
  "Available penalty costs."
  '(integer 0 7))

(deftype levenshtein-input-length ()
  "Maximum distance a comparison can result."
  `(integer 0 ,(- most-positive-fixnum 7)))

Just in case, configure these variables suitable to your needs.


Next: , Previous: , Up: Top   [Contents][Index]

2 Systems

The main system appears first, followed by any subsystem dependency.


Previous: , Up: Systems   [Contents][Index]

2.1 bk-tree

Source

bk-tree.asd (file)

Components

Next: , Previous: , Up: Top   [Contents][Index]

3 Files

Files are sorted by type and then listed depth-first from the systems components trees.


Previous: , Up: Files   [Contents][Index]

3.1 Lisp


Next: , Previous: , Up: Lisp files   [Contents][Index]

3.1.1 bk-tree.asd

Location

bk-tree.asd

Systems

bk-tree (system)


Next: , Previous: , Up: Lisp files   [Contents][Index]

3.1.2 bk-tree/packages.lisp

Parent

bk-tree (system)

Location

packages.lisp

Packages

bk-tree


Next: , Previous: , Up: Lisp files   [Contents][Index]

3.1.3 bk-tree/specials.lisp

Dependency

packages.lisp (file)

Parent

bk-tree (system)

Location

specials.lisp

Exported Definitions

Next: , Previous: , Up: Lisp files   [Contents][Index]

3.1.4 bk-tree/utils.lisp

Dependency

specials.lisp (file)

Parent

bk-tree (system)

Location

utils.lisp

Exported Definitions

Previous: , Up: Lisp files   [Contents][Index]

3.1.5 bk-tree/bk-tree.lisp

Dependency

utils.lisp (file)

Parent

bk-tree (system)

Location

bk-tree.lisp

Exported Definitions

Next: , Previous: , Up: Top   [Contents][Index]

4 Packages

Packages are listed by definition order.


Previous: , Up: Packages   [Contents][Index]

4.1 bk-tree

Source

packages.lisp (file)

Use List

common-lisp

Exported Definitions

Next: , Previous: , Up: Top   [Contents][Index]

5 Definitions

Definitions are sorted by export status, category, package, and then by lexicographic order.


Previous: , Up: Definitions   [Contents][Index]

5.1 Exported definitions


Next: , Previous: , Up: Exported definitions   [Contents][Index]

5.1.1 Functions

Function: average-children-count TREE

Returns average children count per node of the supplied ‘TREE’.

Package

bk-tree

Source

bk-tree.lisp (file)

Function: hairy-search-value VALUE TREE &key THRESHOLD METRIC LIMIT ORDERED-RESULTS ORDERED-TRAVERSAL

Return a list of ‘SEARCH-RESULT’ instances built from ‘TREE’ and its children whose value is no more distant from ‘VALUE’ than ‘THRESHOLD’, using ‘METRIC’ to measure the distance.

If ‘LIMIT’ is non-NIL, given number of first found results will be
returned.

If ‘ORDERED-RESULTS’ is non-NIL, returned results will be ordered according to their distances from ‘VALUES’.

If ‘ORDERED-TRAVERSAL’ is non-NIL, candidates in a level (e.g. children of a validated node) will be traversed in sorted order according to the absolute difference between the parent distance and child distance – the (probably) more similar is first.

‘HAIRY-SEARCH-VALUE’ is a feature rich and hence slower derivative of ‘SEARCH-VALUE’. For simple query patterns, consider using ‘SEARCH-VALUE’.

Package

bk-tree

Source

bk-tree.lisp (file)

Function: insert-value VALUE TREE &key METRIC

Inserts given ‘VALUE’ into supplied ‘TREE’.

Package

bk-tree

Source

bk-tree.lisp (file)

Function: levenshtein SRC DST &key INSERT-COST DELETE-COST SUBSTITUTE-COST

An O(mn) implementation of the Levenshtein distance metric.

Package

bk-tree

Source

utils.lisp (file)

Function: maximum-depth TREE

Returns maximum depth of the ‘TREE’.

Package

bk-tree

Source

bk-tree.lisp (file)

Function: print-tree TREE &key STREAM DEPTH

Prints supplied ‘TREE’ in a human-readable(?) format.

Package

bk-tree

Source

bk-tree.lisp (file)

Function: search-value VALUE TREE &key THRESHOLD METRIC ORDERED-RESULTS

Return a list of ‘SEARCH-RESULT’ instances built from ‘TREE’ and its children whose value is no more distant from ‘VALUE’ than ‘THRESHOLD’, using ‘METRIC’ to measure the distance.

If ‘ORDERED-RESULTS’ is non-NIL, collected results will be sorted according to their distances from ‘VALUE’.

Package

bk-tree

Source

bk-tree.lisp (file)


Next: , Previous: , Up: Exported definitions   [Contents][Index]

5.1.2 Generic functions

Generic Function: distance-of OBJECT
Generic Function: (setf distance-of) NEW-VALUE OBJECT
Package

bk-tree

Methods
Method: distance-of (SEARCH-RESULT search-result)

automatically generated reader method

Source

specials.lisp (file)

Method: (setf distance-of) NEW-VALUE (SEARCH-RESULT search-result)

automatically generated writer method

Source

specials.lisp (file)

Method: distance-of (BK-TREE bk-tree)
Method: (setf distance-of) NEW-VALUE (BK-TREE bk-tree)

Metric distance between current node and its parent.

Source

specials.lisp (file)

Generic Function: nodes-of OBJECT
Generic Function: (setf nodes-of) NEW-VALUE OBJECT
Package

bk-tree

Methods
Method: nodes-of (BK-TREE bk-tree)
Method: (setf nodes-of) NEW-VALUE (BK-TREE bk-tree)

Nodes collected under this node.

Source

specials.lisp (file)

Generic Function: value-of OBJECT
Generic Function: (setf value-of) NEW-VALUE OBJECT
Package

bk-tree

Methods
Method: value-of (CONDITION duplicate-value)
Method: (setf value-of) NEW-VALUE (CONDITION duplicate-value)
Source

specials.lisp (file)

Method: value-of (SEARCH-RESULT search-result)

automatically generated reader method

Source

specials.lisp (file)

Method: (setf value-of) NEW-VALUE (SEARCH-RESULT search-result)

automatically generated writer method

Source

specials.lisp (file)

Method: value-of (BK-TREE bk-tree)

automatically generated reader method

Source

specials.lisp (file)

Method: (setf value-of) NEW-VALUE (BK-TREE bk-tree)

automatically generated writer method

Source

specials.lisp (file)


Next: , Previous: , Up: Exported definitions   [Contents][Index]

5.1.3 Conditions

Condition: duplicate-value ()

Signaled upon every duplicated entry insertion.

Package

bk-tree

Source

specials.lisp (file)

Direct superclasses

error (condition)

Direct methods
Direct slots
Slot: value
Initargs

:value

Readers

value-of (generic function)

Writers

(setf value-of) (generic function)


Next: , Previous: , Up: Exported definitions   [Contents][Index]

5.1.4 Classes

Class: bk-tree ()
Package

bk-tree

Source

specials.lisp (file)

Direct superclasses

standard-object (class)

Direct methods
  • print-object (method)
  • nodes-of (method)
  • nodes-of (method)
  • value-of (method)
  • value-of (method)
  • distance-of (method)
  • distance-of (method)
Direct slots
Slot: distance

Metric distance between current node and its parent.

Type

unsigned-byte

Initargs

:distance

Initform

0

Readers

distance-of (generic function)

Writers

(setf distance-of) (generic function)

Slot: value
Initargs

:value

Readers

value-of (generic function)

Writers

(setf value-of) (generic function)

Slot: nodes

Nodes collected under this node.

Type

list

Initargs

bk-tree::nodes

Readers

nodes-of (generic function)

Writers

(setf nodes-of) (generic function)

Class: search-result ()
Package

bk-tree

Source

specials.lisp (file)

Direct superclasses

standard-object (class)

Direct methods
  • print-object (method)
  • value-of (method)
  • value-of (method)
  • distance-of (method)
  • distance-of (method)
Direct slots
Slot: distance
Type

unsigned-byte

Initargs

:distance

Readers

distance-of (generic function)

Writers

(setf distance-of) (generic function)

Slot: value
Initargs

:value

Readers

value-of (generic function)

Writers

(setf value-of) (generic function)


Previous: , Up: Exported definitions   [Contents][Index]

5.1.5 Types

Type: levenshtein-cost ()

Available penalty costs.

Package

bk-tree

Source

utils.lisp (file)

Type: levenshtein-max-distance ()

Maximum distance a comparison can result.

Package

bk-tree

Source

utils.lisp (file)


Previous: , Up: Top   [Contents][Index]

Appendix A Indexes


Next: , Previous: , Up: Indexes   [Contents][Index]

A.1 Concepts

Jump to:   B   F   L  
Index Entry  Section

B
bk-tree.asd: The bk-tree<dot>asd file
bk-tree/bk-tree.lisp: The bk-tree/bk-tree<dot>lisp file
bk-tree/packages.lisp: The bk-tree/packages<dot>lisp file
bk-tree/specials.lisp: The bk-tree/specials<dot>lisp file
bk-tree/utils.lisp: The bk-tree/utils<dot>lisp file

F
File, Lisp, bk-tree.asd: The bk-tree<dot>asd file
File, Lisp, bk-tree/bk-tree.lisp: The bk-tree/bk-tree<dot>lisp file
File, Lisp, bk-tree/packages.lisp: The bk-tree/packages<dot>lisp file
File, Lisp, bk-tree/specials.lisp: The bk-tree/specials<dot>lisp file
File, Lisp, bk-tree/utils.lisp: The bk-tree/utils<dot>lisp file

L
Lisp File, bk-tree.asd: The bk-tree<dot>asd file
Lisp File, bk-tree/bk-tree.lisp: The bk-tree/bk-tree<dot>lisp file
Lisp File, bk-tree/packages.lisp: The bk-tree/packages<dot>lisp file
Lisp File, bk-tree/specials.lisp: The bk-tree/specials<dot>lisp file
Lisp File, bk-tree/utils.lisp: The bk-tree/utils<dot>lisp file

Jump to:   B   F   L  

Next: , Previous: , Up: Indexes   [Contents][Index]

A.2 Functions

Jump to:   (  
A   D   F   G   H   I   L   M   N   P   S   V  
Index Entry  Section

(
(setf distance-of): Exported generic functions
(setf distance-of): Exported generic functions
(setf distance-of): Exported generic functions
(setf nodes-of): Exported generic functions
(setf nodes-of): Exported generic functions
(setf value-of): Exported generic functions
(setf value-of): Exported generic functions
(setf value-of): Exported generic functions
(setf value-of): Exported generic functions

A
average-children-count: Exported functions

D
distance-of: Exported generic functions
distance-of: Exported generic functions
distance-of: Exported generic functions

F
Function, average-children-count: Exported functions
Function, hairy-search-value: Exported functions
Function, insert-value: Exported functions
Function, levenshtein: Exported functions
Function, maximum-depth: Exported functions
Function, print-tree: Exported functions
Function, search-value: Exported functions

G
Generic Function, (setf distance-of): Exported generic functions
Generic Function, (setf nodes-of): Exported generic functions
Generic Function, (setf value-of): Exported generic functions
Generic Function, distance-of: Exported generic functions
Generic Function, nodes-of: Exported generic functions
Generic Function, value-of: Exported generic functions

H
hairy-search-value: Exported functions

I
insert-value: Exported functions

L
levenshtein: Exported functions

M
maximum-depth: Exported functions
Method, (setf distance-of): Exported generic functions
Method, (setf distance-of): Exported generic functions
Method, (setf nodes-of): Exported generic functions
Method, (setf value-of): Exported generic functions
Method, (setf value-of): Exported generic functions
Method, (setf value-of): Exported generic functions
Method, distance-of: Exported generic functions
Method, distance-of: Exported generic functions
Method, nodes-of: Exported generic functions
Method, value-of: Exported generic functions
Method, value-of: Exported generic functions
Method, value-of: Exported generic functions

N
nodes-of: Exported generic functions
nodes-of: Exported generic functions

P
print-tree: Exported functions

S
search-value: Exported functions

V
value-of: Exported generic functions
value-of: Exported generic functions
value-of: Exported generic functions
value-of: Exported generic functions

Jump to:   (  
A   D   F   G   H   I   L   M   N   P   S   V  

Next: , Previous: , Up: Indexes   [Contents][Index]

A.3 Variables

Jump to:   D   N   S   V  
Index Entry  Section

D
distance: Exported classes
distance: Exported classes

N
nodes: Exported classes

S
Slot, distance: Exported classes
Slot, distance: Exported classes
Slot, nodes: Exported classes
Slot, value: Exported conditions
Slot, value: Exported classes
Slot, value: Exported classes

V
value: Exported conditions
value: Exported classes
value: Exported classes

Jump to:   D   N   S   V  

Previous: , Up: Indexes   [Contents][Index]

A.4 Data types

Jump to:   B   C   D   L   P   S   T  
Index Entry  Section

B
bk-tree: The bk-tree system
bk-tree: The bk-tree package
bk-tree: Exported classes

C
Class, bk-tree: Exported classes
Class, search-result: Exported classes
Condition, duplicate-value: Exported conditions

D
duplicate-value: Exported conditions

L
levenshtein-cost: Exported types
levenshtein-max-distance: Exported types

P
Package, bk-tree: The bk-tree package

S
search-result: Exported classes
System, bk-tree: The bk-tree system

T
Type, levenshtein-cost: Exported types
Type, levenshtein-max-distance: Exported types

Jump to:   B   C   D   L   P   S   T