SPEDIS
Category: Character
Syntax
SPEDIS(query,keyword)
Details
SPEDIS returns the distance between the query and the keyword as a nonnegative value that is usually less than 100 and, with the default costs, never greater than 200.
SPEDIS computes an asymmetric spelling distance between two words: the normalized cost of converting the keyword to the query word through a sequence of operations. Because the cost depends on the direction of the conversion, SPEDIS(QUERY, KEYWORD) is generally not the same as SPEDIS(KEYWORD, QUERY).
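To see the asymmetry concretely, the following DATA step is a minimal sketch that evaluates one pair of words in both argument orders. The variable names a, b, d_ab, and d_ba are illustrative only and are not part of the function's interface. Going by the documented operation costs, the first call should correspond to a truncate (cost 50, query length 4) and the second to an append (cost 35, query length 5), so the two results are expected to differ. The exact values are not asserted here.

```sas
/* Sketch: compare both argument orders for one pair of words.     */
/* Variable names are hypothetical; only SPEDIS itself is standard. */
data _null_;
   length a b $ 10;
   a = 'fuzz';
   b = 'fuzzy';
   d_ab = spedis(a, b);   /* converts keyword 'fuzzy' to query 'fuzz'  */
   d_ba = spedis(b, a);   /* converts keyword 'fuzz'  to query 'fuzzy' */
   put a= b= d_ab= d_ba=; /* writes the values to the SAS log          */
run;
```

When SPEDIS is used to look up user input against a list of valid words, this suggests passing the user input as the query and the valid word as the keyword, consistently in that order.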
The costs for each operation that is required to convert the keyword to the query are as follows:
Operation | Cost | Explanation |
---|---|---|
match | 0 | no change |
singlet | 25 | delete one of a double letter |
doublet | 50 | double a letter |
swap | 50 | reverse the order of two consecutive letters |
truncate | 50 | delete a letter from the end |
append | 35 | add a letter to the end |
delete | 50 | delete a letter from the middle |
insert | 100 | insert a letter in the middle |
replace | 100 | replace a letter in the middle |
firstdel | 100 | delete the first letter |
firstins | 200 | insert a letter at the beginning |
firstrep | 200 | replace the first letter |
The distance is the sum of the costs divided (in integer arithmetic) by the length of the query.
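The effect of the integer division can be sketched with a small DATA step. This illustrates the arithmetic only and is not the internal implementation: raw_cost and recovered are hypothetical names, and the FLOOR function stands in for the integer division described above. The pair used is the append case, where the query 'fuzzys' has length 6 and the table cost is 35.

```sas
/* Sketch of the normalization arithmetic for one append operation. */
data _null_;
   length query $ 10;
   query     = 'fuzzys';
   raw_cost  = 35;                             /* append cost from the table       */
   dist      = floor(raw_cost / length(query)); /* 35 / 6 truncates to 5            */
   recovered = dist * length(query);           /* 5 * 6 = 30, not the original 35  */
   dist_fn   = spedis(query, 'fuzzy');         /* the example output reports 5     */
   put query= raw_cost= dist= recovered= dist_fn=;
run;
```

Because the remainder is discarded, multiplying the distance by the query length recovers 30 rather than 35. This is why a COST value computed as DIST * LENGTH(QUERY) can fall slightly below the cost listed in the table.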
Examples
    options nodate pageno=1 linesize=64;

    /* Compute the spelling distance for one example of each operation. */
    data words;
       input oper $ query $ keyword $;
       dist = spedis(query,keyword);
       /* Approximate total cost: DIST is truncated, so this value can */
       /* be slightly less than the sum of the operation costs.        */
       cost = dist * length(query);
       put oper $10. query $10. keyword $10. dist 5. cost 5.;
       datalines;
    match fuzzy fuzzy
    singlet fuzy fuzzy
    doublet fuuzzy fuzzy
    swap fzuzy fuzzy
    truncate fuzz fuzzy
    append fuzzys fuzzy
    delete fzzy fuzzy
    insert fluzzy fuzzy
    replace fizzy fuzzy
    firstdel uzzy fuzzy
    firstins pfuzzy fuzzy
    firstrep wuzzy fuzzy
    several floozy fuzzy
    ;

    proc print data = words;
    run;
The output from PROC PRINT is as follows:
                         The SAS System                        1

    OBS    OPER        QUERY     KEYWORD    DIST    COST

      1    match       fuzzy     fuzzy         0       0
      2    singlet     fuzy      fuzzy         6      24
      3    doublet     fuuzzy    fuzzy         8      48
      4    swap        fzuzy     fuzzy        10      50
      5    truncate    fuzz      fuzzy        12      48
      6    append      fuzzys    fuzzy         5      30
      7    delete      fzzy      fuzzy        12      48
      8    insert      fluzzy    fuzzy        16      96
      9    replace     fizzy     fuzzy        20     100
     10    firstdel    uzzy      fuzzy        25     100
     11    firstins    pfuzzy    fuzzy        33     198
     12    firstrep    wuzzy     fuzzy        40     200
     13    several     floozy    fuzzy        50     300
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.