portable perspectives

a blog about programming, life, universe and everything

Spell Corrector in AliceML

It seems that writing a spell correcor is becoming my new post hello world program, so well, after having this in my hard disk for a while I thought I would publish my new solution, wrote using AliceML.

I'm really not good at ML, so this code must not be considered a good example of how to code, but I thing it's a good thing to keep track of my learning :)


import structure MkRedBlackSet  from "x-alice:/lib/data/MkRedBlackSet";
import structure MkRedBlackMap from "x-alice:/lib/data/MkRedBlackMap";

structure Set = MkRedBlackSet String;
structure Map = MkRedBlackMap String;

val wordlist = "/home/rff/wordlist.txt";
val alphabet = String.explode "abcdefghijklmnopqrstuvwxyz";

open List;

fun take2(i1,i2) = (fn cl => (take (cl,i1), drop (cl,i2)))

fun delete cl i = 
  let
    val (first, second) = take2 (i,i+1) cl
  in
    String.implode (first @ second)
  end

fun deletions cl = tabulate ((length cl), (delete cl));


fun substitute cl i = 
  let 
    val (first,second) = take2 (i, i+1) cl
    fun sub_char c = String.implode (first @ [c] @ second)
  in  
    map sub_char alphabet
  end 

fun substitutions cl =  concat( tabulate ((length cl), (substitute cl)));

fun insert cl i = 
  let 
    val (first, second) = take2 (i,i) cl
    fun ins_char c = String.implode (first @ [c] @ second)
  in  
    map ins_char alphabet
  end 

fun insertions cl =  concat( tabulate ((length cl)+1, (insert cl)));

fun transpose cl i = 
  let 
    val (first,second) = take2 (i,i+2) cl
    val c1 = nth (cl,i)
    val c2 = nth (cl,i+1)
  in  
    String.implode (first @ [c2] @[c1] @ second)
  end 

fun transpositions cl =  tabulate ((length cl)-1, (transpose cl));


fun edits1 word = 
  let 
    val w = String.explode word 
    val ws = deletions w @ transpositions w @ insertions w @  substitutions w
  in 
    foldl (Fn.flip Set.insert) (Set.empty) ws
  end 

fun insertOrUpdate map key = 
  case Map.lookup (map, key) of
     NONE => Map.insert (map, key, ref 1)
    |SOME value => (value := (!value +1) ; map)


(* train *)
fun train io =
  let 
    val chop = fn str => String.substring (str, 0, (String.size str) - 1)
  in
    case TextIO.inputLine io of
       NONE => Map.empty
      |SOME line => insertOrUpdate (train io) (chop line)
    end  

val nwords = train ( TextIO.openIn wordlist);

fun known_edits2  word = 
  let 
    fun set_edits set = Set.fold (fn (str,set) => Set.union (set, known (edits1 str))) Set.empty set
  in
    set_edits (edits1 word)
  end

fun look k = !(valOf (Map.lookup  (nwords, k)));


fun correct word =
  let 
    val candidates = find (not o Set.isEmpty) 
                           [ known (Set.fromList [word]),
                             known (edits1 word),
                             known_edits2 word]
     val sort = sort (fn (a,b)=>Int.compare ((look b), (look a))) o Set.toList
   in
     case candidates of
       SOME words => hd(sort words)
      |NONE     => word
   end  

correct "ciao" (*ciao match*);
correct "cao"  (*ciao edit 1*);
correct "mioo" (*miao edit 1*);
correct "miaoo"(*miao edit 1*);
correct "mooo" (*mao  edit 2 max value*);
correct "quux" (*quux no match*);

The worse part is the string manipulation, I think, and maybe the fact that I use references (aka mutable variables) when building the frequency count Map, but in the end I'm quite satisfied. Feel free to point out obvious errors, if you see them.

See all comments

AliceML concurrency

In these days it seem that Erlang is considered the end of all concurrency debates. and it actually seem to provides a very nice environment for concurrent programming, possibly because it was designed to do so :)

Yet there is another environment that was designed to provide good support for concurrent programming and that I find very interesting, but is not as "hyped". I'm thinking of AliceML.

I'll try to explain the basic concepts, but I'm not really proficient in ML and I may be wrong about something, so please feel free to correct my code.

A simple echo server in Alice is something like this


import structure Socket from "x-alice:/lib/system/Socket"

fun serve (socket, host, port) =
   let 
      val lineOpt = Socket.inputLine socket
   in
      case lineOpt of
          (* read something *)
           SOME(line) => print line
          (* read nothing *)
         | NONE => ()
      ;

      Socket.close socket
   end


val port = SOME 8802;
val (socket, port) = Socket.server (port, serve);

(I know the case line is somewhat strange, but keep in mind that Alice is a completely type safe programming language. Yes, no type declarations in the above code, Yay for type inference!)

Anyway, the above code will impose an ordering on the serving of clients, because client Y will be waiting for client X to finish before the serve function is called for it.

Now, in AliceML to make an expression concurrent you just need to prefix it with spawn, so for example you can write:

spawn print ( complex_calc () )

And you'll have the calculation happen in a concurrent thread.

So, how does this relates to our serve function? Simple, you can declare a function as being always computed in a different thread replacing fun with fun spawn.

By adding one word you have completed your transition from a serial server to a multithreaded uber-responsive concurrent HTTP server that competes with lighttpd, YAWS and ngix. Or, actually, not, but you have a concurrent echo server, which is nice.

Getting values

As of now there is little difference with pthread-style or unix-processes concurrency at all. The real difference is in the presence of a result in the spawned expression. Basically whenever a spawned expression is evaluated it returns a Future value, meaning something that is not yet a real value, but will become soon. It's like your void-ish concurrent stuff in C/Java/Ruby/Python suddenly could return something. the question is: what?

Well, a spawned expression will return something called a Future, that works as a placeholder for the real value.

The main computation will work concurrently with the spawned computation and it will lock just when it needs the actual value of the spawned computation. At that point:

  • the main computation waits
  • the concurrent computation finish its work
  • the Future in the main computation is replaced with the concurrently produced value
  • the main computation can resume its work and go on

This means that, for example, to concurrently transform a list you may do:


map (fn x => spawn calc x) [1,2,3,4]

And it will not lock as long as you don't access the list elements. Given the ease you can write this stuff, it is also ultra-simple to define your own parallel map like this:


fun pmap f l = map (fn x => spawn f x) l;

When trouble strikes

But wait: what happens if a Thread fails for some reason? Simple, the Future it returns will be a Failed Future and it will do exactly what you expect: it will cause a failure whenever it is accessed. So you have two chances:

  • the result is actually important, and a failure means the global result is invalid (i.e. using the values in a concurrently computed list). Then you are accessing the Failed Future and the error propagates, as expected.
  • if a single concurrent computation fails utterly you don't care (i.e. network server), so you just avoid handling the Future object and you will never know anything about the failure, somewhat like erlang's asynchronous message passing.

I like this, it seems like an evolutionary step from old-style concurrency that is not complicated yet works, plus it blurs the line between synchronous and asynchronous message passing, which is a great thing in my opinion.

There is more than this in Alice's concurrency, for example handling shared values, the difference between Thread and OS Processes, atomic state change and so on, but all built on top of this primitives, AFAICT. Take a look at it, it even comes with a GTK gui ;)

See all comments

AddThis Social Bookmark Button