portable perspectives

a blog about programming, life, universe and everything

Spell Corrector in AliceML

It seems that writing a spell correcor is becoming my new post hello world program, so well, after having this in my hard disk for a while I thought I would publish my new solution, wrote using AliceML.

I'm really not good at ML, so this code must not be considered a good example of how to code, but I thing it's a good thing to keep track of my learning :)


import structure MkRedBlackSet  from "x-alice:/lib/data/MkRedBlackSet";
import structure MkRedBlackMap from "x-alice:/lib/data/MkRedBlackMap";

structure Set = MkRedBlackSet String;
structure Map = MkRedBlackMap String;

val wordlist = "/home/rff/wordlist.txt";
val alphabet = String.explode "abcdefghijklmnopqrstuvwxyz";

open List;

fun take2(i1,i2) = (fn cl => (take (cl,i1), drop (cl,i2)))

fun delete cl i = 
  let
    val (first, second) = take2 (i,i+1) cl
  in
    String.implode (first @ second)
  end

fun deletions cl = tabulate ((length cl), (delete cl));


fun substitute cl i = 
  let 
    val (first,second) = take2 (i, i+1) cl
    fun sub_char c = String.implode (first @ [c] @ second)
  in  
    map sub_char alphabet
  end 

fun substitutions cl =  concat( tabulate ((length cl), (substitute cl)));

fun insert cl i = 
  let 
    val (first, second) = take2 (i,i) cl
    fun ins_char c = String.implode (first @ [c] @ second)
  in  
    map ins_char alphabet
  end 

fun insertions cl =  concat( tabulate ((length cl)+1, (insert cl)));

fun transpose cl i = 
  let 
    val (first,second) = take2 (i,i+2) cl
    val c1 = nth (cl,i)
    val c2 = nth (cl,i+1)
  in  
    String.implode (first @ [c2] @[c1] @ second)
  end 

fun transpositions cl =  tabulate ((length cl)-1, (transpose cl));


fun edits1 word = 
  let 
    val w = String.explode word 
    val ws = deletions w @ transpositions w @ insertions w @  substitutions w
  in 
    foldl (Fn.flip Set.insert) (Set.empty) ws
  end 

fun insertOrUpdate map key = 
  case Map.lookup (map, key) of
     NONE => Map.insert (map, key, ref 1)
    |SOME value => (value := (!value +1) ; map)


(* train *)
fun train io =
  let 
    val chop = fn str => String.substring (str, 0, (String.size str) - 1)
  in
    case TextIO.inputLine io of
       NONE => Map.empty
      |SOME line => insertOrUpdate (train io) (chop line)
    end  

val nwords = train ( TextIO.openIn wordlist);

fun known_edits2  word = 
  let 
    fun set_edits set = Set.fold (fn (str,set) => Set.union (set, known (edits1 str))) Set.empty set
  in
    set_edits (edits1 word)
  end

fun look k = !(valOf (Map.lookup  (nwords, k)));


fun correct word =
  let 
    val candidates = find (not o Set.isEmpty) 
                           [ known (Set.fromList [word]),
                             known (edits1 word),
                             known_edits2 word]
     val sort = sort (fn (a,b)=>Int.compare ((look b), (look a))) o Set.toList
   in
     case candidates of
       SOME words => hd(sort words)
      |NONE     => word
   end  

correct "ciao" (*ciao match*);
correct "cao"  (*ciao edit 1*);
correct "mioo" (*miao edit 1*);
correct "miaoo"(*miao edit 1*);
correct "mooo" (*mao  edit 2 max value*);
correct "quux" (*quux no match*);

The worse part is the string manipulation, I think, and maybe the fact that I use references (aka mutable variables) when building the frequency count Map, but in the end I'm quite satisfied. Feel free to point out obvious errors, if you see them.

See all comments

darcs pull vs git pull

I mean, I'm not going to say that you should use $VCS1 over $VCS2, I'm not proficient enough with either. I just think that:

rff@ut:~/ramaze$ darcs pull
Pulling from \"http://manveru.mine.nu/ramaze\"...

Wed Sep 12 14:52:55 CEST 2007  Michael Fellinger <mail@foo.com>
  * Adding path for OSX to tool/tidy and improve readability of the spec for it a bit.
Shall I pull this patch? (1/1)  [ynWvpxqadjk], or ? for help: v
[Adding path for OSX to tool/tidy and improve readability of the spec for it a bit.
Michael Fellinger <mail@foo.com>**20070912125255] {
hunk ./lib/ramaze/tool/tidy.rb 18
+        /usr/lib/libtidy.dylib
hunk ./spec/ramaze/tidy.rb 10
-    Ramaze::Tool::Tidy.tidy(\"<html></html>\").should =~ %r{<html>\\s+<head>\\s+<meta name=\"generator\" content=\"HTML Tidy (.*?)\" />\\s+<title></title>\\s+</head>\\s+<body></body>\\s+</html>}
+    Ramaze::Tool::Tidy.tidy(\"<html></html>\").
+      should =~ %r{<html>\\s+<head>\\s+<meta name=\"generator\" content=\"HTML Tidy (.*?)\" />\\s+<title></title>\\s+</head>\\s+<body></body>\\s+</html>}
}

Wed Sep 12 14:52:55 CEST 2007  Michael Fellinger <mail@foo.com>
  * Adding path for OSX to tool/tidy and improve readability of the spec for it a bit.
Shall I pull this patch? (1/1)  [ynWvpxqadjk], or ? for help: y
Finished pulling and applying.


is slightly more clear than:

rff@ut:~/rubinius$ git pull
Fetching refs/heads/master from http://git.rubini.us/code using http
got 2fb92489ed4a0e58ea3d941a6f2ff8c7b8b44c2d
walk 2fb92489ed4a0e58ea3d941a6f2ff8c7b8b44c2d
got 6ae85832a699f3048ee260d9091aacc05c686fe4
got 6c0e0e63c0bb5979e2dd2b3df6e511b031656ef4
walk 6c0e0e63c0bb5979e2dd2b3df6e511b031656ef4
Getting alternates list for http://git.rubini.us/code
Getting pack list for http://git.rubini.us/code
Getting index for pack cf217ebdaa4f745cc827450e9cac23ec258d45f9
Getting pack cf217ebdaa4f745cc827450e9cac23ec258d45f9
 which contains cf987cd736618331bf421603c1491030b0fc8e35

got c77671cc8a43049bd61dd22f13fbbbc1ab4f96ec
... many more.. 
got 933282575047f8924edc6cc426c0a234bdbe16a3
* refs/remotes/origin/master: fast forward to branch \'master\' of http://git.rubini.us/code
  old..new: e85ecf8..2fb9248
Updating e85ecf8..2fb9248

Fast forward
 .gitignore                            |    1 +
 Makefile                              |    2 +-
 bin/sirb.rb                           |    1 -
 compiler/bytecode/compiler.rb         |   22 +-
 compiler/bytecode/primitive_names.rb  |    2 +-
 compiler/bytecode/rubinius.rb         |    1 +
 compiler/bytecode/system_hints.rb     |   20 +-
 compiler/translation/local_scoping.rb |    7 +-
 compiler/translation/normalize.rb     |   24 ++
 compiler/translation/states.rb        |   30 ++-
 kernel/core/compile.rb                |    6 +-
 kernel/core/compiled_method.rb        |    7 +-
 kernel/core/context.rb                |   26 ++-
 kernel/core/io.rb                     |    3 +
 kernel/core/string.rb                 |   91 ++++++--
 lib/bin/compile.rb                    |  176 ++++++++++++---
 lib/bin/sirb.rb                       |   22 +-
 lib/ext/syck/build.rb                 |    6 +
 lib/ext/syck/rbxext.c                 |   12 +-
 lib/stringio.rb                       |  413 +++++++++++++++++++++++++++++++++
 runtime/bootstrap.rba                 |  Bin 29758 -> 29986 bytes
 runtime/compiler.rba                  |  Bin 79208 -> 81314 bytes
 runtime/core.rba                      |  Bin 162952 -> 168854 bytes
 runtime/loader.rbc                    |  Bin 12565 -> 12651 bytes
 shotgun/lib/cpu.h                     |   13 +-
 shotgun/lib/cpu_primitives.c          |   18 ++
 shotgun/lib/cpu_task.c                |   11 +
 shotgun/lib/instructions.rb           |   13 +-
 shotgun/lib/methctx.c                 |    5 +-
 shotgun/lib/methctx.h                 |    2 +-
 shotgun/lib/primitives.rb             |   75 +++----
 spec/mini_rspec.rb                    |    8 +-
 32 files changed, 849 insertions(+), 168 deletions(-)
 delete mode 120000 bin/sirb.rb
 create mode 100644 lib/ext/syck/build.rb
 create mode 100644 lib/stringio.rb

But, hei, maybe it's just me :)

See all comments

AddThis Social Bookmark Button