A spell Corrector in perl6 part 2
In the first part we saw how we could write routines to generate string mutations by using perl6's regexen, but we also noticed that the implementation won't run on pugs, yet.
So, here is an alternative implementation of the same methods, based on Str.substr. The only thing to keep in mind is that this method does side effet modifying the current string, so in each case we have to create a copy (which for strings in perl we can do via simple assignment). The code should be quite self-explaining:
# 'abc' -> 'ac'
sub deletion($word) {
(^$word.chars).map: {substr(my $tmp = $word,$_,1)=''; $tmp};
}
# 'abc' -> 'adc'
sub substitution($word) {
gather {
for (0..$word.chars-1) X @ALPHA {
substr(my $tmp = $word,$_[0],1)=$_[1];
take $tmp;
}
}
}
# 'abc' -> 'abbc'
sub insertion($word) {
gather {
for (0..$word.chars) X @ALPHA {
substr(my $tmp = $word,$_[0],0)=$_[1];
take $tmp;
}
}
}
# 'abc' -> 'acb'
sub transposition($w) {
gather for ^$w.chars {
my $tmp=$w;
my $removed =(substr($tmp,$_,1)='');
substr($tmp,$_+1,0)=$removed;
take $tmp;
}
}
Remember that the value returned by the function is just the last computed one, so we always return implicitly the value of the gather (or of mao in the first function.
Now we can define a edits1 function that returns all the possible 1-letter variations of a word like this:
sub edits1($w) {
transposition($w),insertion($w),substitution($w),deletion($w)
}
the comma operator concatenates the arrays so edits1 simply returns a big array of all the variations.
If you compare this code with norvig's you will notice that in edits1 he is already using Sets while we keep using simple lists. Since perl6 doesn't provide a builtin Set type we will use the any() junction which provide basically the same behaviour (i.e. we can check for the presence of an object, there are no duplicates and we can iterate over it).
Now we need a known(@array) routine that selects the elements of the list that actually exist. Rmember that we have an Hash variable %WORDS[word]=>number that we have initialized somehow with the existing english words and their frequency.
Our function is thus very easy:
sub known(@words) {
@words.grep: { %NWORDS{$_}
}
The correct($word) function then becomes:
sub correct($word) {
my @values = known([$word]) or # the word is known
known(edits1($word)) or # a single letter variation is known
[$word]; # we don't have suggestions
# single argument max() doesn't work yet
@values.max: {%NWORDS{$^a} <=> %NWORDS{$^b}}
}
The last line coud be prettier, but I have the feeling the code is quite nice and clean.
For the last bit of code please wait a little bit more, but hey, it should be easy enough for you to write :)
Sorry, comments are closed for this article.