portable perspectives

a blog about programming, life, universe and everything

Matching nested structures with Regexps in Ruby 1.9

Following a discussion on hacker news I have found myself wondering about regular expressions in ruby 1.9.

In this major version ruby switched its regex engine to oniguruma (and, since a few days ago to a fork of it called Onigmo ).

This engine is widely more powerful than the original ruby one, supporting positive and negative lookbehind, named captures, better support for backreferences and most importantly for our topic, callable backreferences.

The latter means that, among other things, ruby has gained the ability to recognize arbitrarily nested structures.

Since I hadn't played with this before, I spent some time to craft a Regexp that would match a nested tree like structure, and what better example is there than good ol' lisp?

This is what I got

# this pattern has many issues, consider it an example
R = %r{
      (?<atom> [-\w]+){0}
      (?<quote> '\g<expr>){0}
      (?<expr>  
        \g<atom> | 
        \g<quote> | 
        \(  
          \g<expr>?
          (\s+  \g<expr>)* 
        \)
      ){0}
      ^\g<expr>$
    }imx

which has a rather nice feel to it, reading bottom up

  • a regex is an expression
  • an expression is
    • an atom or
    • a quotation or
    • an expression (possibly empty) followed by other expressions
  • a quotation is an expression with a single quote in the front
  • an atom is a sequence of word characters or a dash

syntax wise, there are only a couple of things that are unusual: the "grammar rules" are defined like

 (?<foo> some pattern){0}

which is a clever trick (which I first saw in the oniguruma documentation): create a named capturing group, but say that it should match zero times. This way, the definition of the group stays around, but it is not required to actually match anywhere.

The other thing is how we refer to this rules:

\g<foo> 

this simply means: apply the named pattern in this spot, via the awesome "callable backreferences" of oniguruma (also available in PCRE now). Notice that we also get forward declarations, which is rather cool.

Some quick tests show that this works (for an arguable value of "works") in some simple cases:

t('()')
t('(ciao)')
t('(ciao miao)')
t('(ciao miao)')
t('(ciao (miao))')
t('(ciao (miao bau))')
t("(ciao '(miao bau))")

t('(sum-iter k 1)')
t(<<-SEXPR)
(define sum1  (lambda (k)
    (sum-iter k 0 1)))
SEXPR
t('ok')

t("nope '(miao bau))")
t('(nope')
t(<<-SEXPR)
(define sum1  (lambda (k)
    (sum-iter k 0 1))
SEXPR

t(<<-SEXPR)
(define @  (lambda (k)
    (sum-iter k 0 1))
SEXPR


which correctly returns:

(): ok
(ciao): ok
(ciao miao): ok
(ciao miao): ok
(ciao (miao)): ok
(ciao (miao bau)): ok
(ciao '(miao bau)): ok
(sum-iter k 1): ok
(define sum1  (lambda (k)
    (sum-iter k 0 1)))
: ok
ok: ok
nope '(miao bau)): no
(nope: no
(define sum1  (lambda (k)
    (sum-iter k 0 1))
: no
(define @  (lambda (k)
    (sum-iter k 0 1))
: no

So the pattern correctly matches recursive structures with balanced parenthesis.

Sadly, this approach appears to have one serious downside: it's impossible to get at the matched elements.

From what I understand, when the regexp engine compiles the pattern, it reserves a certain number of slots for the capturing groups which is independent of the actual string being examinated.

This means that each named group only stores the value of the last match, e.g.

# no reference to "first"
>> R.match('(first last)') 
=>#<MatchData "(first last)" atom:"last" quote:nil expr:"(first last)"> 

This is actually true also for all the other regexes, without named groups or backreferences calls, e.g.

>> /((\w)*)/.match('abc')
=> #<MatchData "abc" 1:"abc" 2:"c">

except that, in my normal life as a regexp user, I'd solve that specific issue by using #scan and getting rid of the capturing group:

>> 'abc'.scan(/\w/)
=> ["a", "b", "c"]
>> 'abc'.scan(/(\w)/)
=> [["a"], ["b"], ["c"]]

but I can't do it here, as that would mean unrolling the recursion to infinite depth.

I digged for quite a while trying to understand if this was something inherent in the ruby version of oniguruma, or in oniguruma itself, or if I was just dumb.

Turns out, people way smarter than me over at pcre-dev already had a discussion related to this, from which I quote

> Is there a workable approach if the repeating pattern itself has desired
> subpatterns? With the following I find the "b" captured only once:
> /(a(b)c)(\g1)?/ with data abcabc

That's correct. Nobody has invented a scheme where the contents of capturing subpatterns are expressed as vectors. PCRE returns the value from the last time the parens captured something. So if you had /(a(b|d)c)(\g1)?/ with data abcadc you would get "d".

Anyway, it seems to me that it should be possible to hook into the regex engine so that when a match happens it is passed to a user defined routine. I seem to recall perl has something like that, and PCRE's callouts seem to be in the same league.

To be fair, I'd rather use a proper parser than regular expressions in real problems related to nested structures, as regular expressions have a tendency to grow exponentially in complexity. Still, since we can already check if a structure is recursive, it would be cool to be able to get the matched tree structure.

Even better, I'd like regular expressions in ruby to become something akin to pattern matching in perl6, and subsume the issue of basic regular expressions and full blown parsing in one go.

See all comments

Step the picoframework: bigger, smaller

Some time ago I wrote about the toy web framework for scala I had coded to get a feel for the language.

I improved the code a bit after that, but as I did not blog about it and nobody knew it Alan Dipert got the ball and run with it and now the code evolved into a proper project on github, including tests, build management and code that does much more in funny ways :)

Kudos to him who also kindly asked me to keep the name ;)

I had managed to do some things (e.g. path parameters) but he managed to improve it even more, and has already shown me a couple of tricks I did not imagine. For one, I had coded both get and post by hand, and he had done the same adding put and delete, but I just noticed the updated code with the following trick that allows them to be defined programmatically:

  val List(get, post, put, delete) = protocols map routeSetter 
  private def routeSetter(protocol: String): (String) => (=> Any) => Unit = {
    def g(path: String, fun: => Any) = Routes(protocol) += new Route(path, x => fun.toString)
    (g _).curry
  } 

(If I understand correctly this is a change due to paulp's fork. )

As the namespace between functions and values is (mostly) unified in scala the four functions can be defined with a single step. Notice the nifty usage of pattern matching for variable declarations and of currying to have a two-step operation defining the new Route.

I have the feeling the routeSetter function could be shortened by two lines with

private def routeSetter(protocol: String): (String) => (=> Any) => Unit =
  (path: String)=>((fun: => Any) => Routes(protocol) += new Route(path, x => fun.toString))

but it seems scala does not accept the definition of a by-name value in a function literal (yet?)

scala> val f= (x: =>int) => x
<console>:1: error: identifier expected but '=>' found.
       val f= (x: =>int) => x

Anyway, if you have some spare time, maybe you should fork this project and work a bit on it, I know I already did :)

See all comments

method not recognized error with Jersey

Jersey is a neat small framework implementing JSR311, the restful webservices for java spec. I'm posting this article just so that someone may avoid hitting the same problem I hit, wasting fifteen minutes trying to understand what was happening.

This error:

/some/path.foo: javax.servlet.UnavailableException: 
com.sun.jersey.api.container.ContainerException: 

Method, public theMethod(java.lang.String), 
annotated with GET of resource, class TheClass, 

is not recognized as valid Java method annotated with @HttpMethod.

Now, consider that @HttpMethod is defined by the @GET & co annotations, so what is it complaining about?

What this UnavailableException/ContainerException actually means is

you mismatched a method annotated with @GET with arguments annotated as @FormParam, whereas you should have used @QueryParam

I am led to believe that also mixing @POST and @QueryParam would lead to the same ContainerException but I did not check. I still like jersey though, although I guess the diagnostics could be improved a bit :)

See all comments

Step, a scala web picoframework

So, after playing with scala on Google App Engine two days ago, I spent a bit of time yesterday (while waiting for food from girlfriend) and today (while waiting plane to go eat at my parents' easter breakfast, unrelated) writing my hack for a web framework.

I mostly wanted to try scala's cool abstraction features, so I thought I could try to produce something that had more or less the feeling of immediateness of sinatra.

To get ruby-like blocks in scala you only need to define functions with by-name arguments in this way:

1
2
3
def foo(x:String)(fun: =>Any)

foo("something") {block returning something to be evaluated later }

Of course, what was needed at this point was something that allowed me to match the url and the http method. This is the code to allow that, using a simple Map from path+method to anonymous functions

1
2
3
4
5
6
7
8
9
10
11
12

abstract class Step extends HttpServlet {
  val Map = new HashMap[(String,String),(Any=>String)]() 
  override def service(request:HttpServletRequest ,response: HttpServletResponse) {
        val method=request.getMethod
        val path=request.getRequestURI
        response.getWriter.println(Map(method,path)());
    }
  
  def get(path:String)(fun: =>Any) =    Map.put(("GET",path),x=>fun.toString)
  def post(path:String)(fun: =>Any) =    Map.put(("POST",path),x=>fun.toString)  
}

I believe the types could be simplified, as I'd just need a ()=>Any function not Any=>String, but I could not come up with the proper definition yet :)

Anyway, this can be used in a pretty simple way, and thanks to Scala's xml literal you can have code like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
class ScalaHello extends Step {
  get("/") {
    <form action="/post" method="POST">
      <textarea name="key"/>
      <input type="submit"/>
    </form>
  }

  post("/post") {
    "received a post"
  }

  def hello() = "hello scala world!"
  
  get("/hello) {
    hello
  }

  get("/number") { 1 }
  
}

Now, obviously we have the issue of passing arguments. The by-name trick does not allow us to pass parameters and we cannot inject a new variable into the closure scope (I believe), nor evaluate the block in a different scope a-la ruby's instance_eval.

It is possible in scala to define implicit variables that have dynamic scope, but I could not find example on how that would be possible with byname blocks.

What we can do though, is define a method params to access such variable.

Yet, we are defining all the "get"s inside the same object, so even if we can access the method to access the parameters we just shifted the problem of having that shared object, and what if different threads access it at the same time?

Yet, scala already provides what we need, namely dynamic scope, in the form of the DynamicVariable class, which can be used like

1
2
3
4
  private val dynamic = new DynamicVariable[SomeType](init)
   dynamic.withValue(aValue) {
     block where dynamic contains aValue
   }

Having defined the params method properly we have this code working nicely

1
2
3
4
5
6
7
8
9
10
class ScalaHello extends Step {
  get("/") {
    <form action="/post" method="POST">
      <textarea name="key"/>
      <input type="submit"/>
    </form>
  }
  post("/post") {
    "hello, this is your input "+params("key")
  }

I did not implement the pattern matching in the URL as sinatra does, but that should also be simple.

The code is pretty ugly and having started really reading about scala only few days ago it can be probably improved in many ways. The first things that come to my mind

  • I am shuffling around parameters to keep the function closure, but probably cleaning up the types should avoid this
  • I am using java.util.Map and java arrays, which means even if the code is well typed it can still go wrong
  • I have a cast that could probably be avoided
  • there is no error handling
  • the code still needs to me mapped via the evil web.xml to the / path. It should be possible to avoid this, but I have no clue on how to do it :)

Anyway, as the code is 30 lines of which about 9 are "real" code, I am fairly impressed :)

The full code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
package step
import javax.servlet._;
import javax.servlet.http._;
import scala.util.DynamicVariable
import scala.collection.mutable.HashMap

abstract class Step extends HttpServlet {
  type Params = java.util.Map[String,Array[String]]
  val Map = new HashMap[(String,String),(Any=>String)]()
  val paramsMap = new DynamicVariable[Params](null)

  def params(name:String):String = paramsMap.value.get(name)(0)
  override def service(request:HttpServletRequest ,response: HttpServletResponse) {
      try {
        response.setContentType("text/html")
        paramsMap.withValue(request.getParameterMap.asInstanceOf[Params]) {
          response.getWriter.println(Map(request.getMethod,request.getRequestURI)());
        }
      }
       catch {
         case ex:NoSuchElementException => response.getWriter.println("requesting "+request.getMethod+" "+ request.getRequestURI+" but only have "
                                                   +Map)
       }
    }  
  def get(path:String)(fun: =>Any) =    Map.put(("GET",path),x=>fun.toString)
  def post(path:String)(fun: =>Any) =    Map.put(("POST",path),x=>fun.toString)
}

Please post suggestions if you want, but remeber that for heavy lifting it would still better to use a lift ;)

PS this code does not work on google app engine as of now, but I am getting this message

Error: Server Error
The server encountered an error and could not complete your request.

If the problem persists, please report your problem and mention this error message and the query that caused it.

which may mean this is a bug on their side, probably related to some Thread behaviour as I believe DynamicVariable is defined in terms of ThreadLocal. I'll let you know if I sort it out. UPDATE actually, it seems related to scala.collections.mutable.HashMap not being loaded, even thought the jar is in lib as the others. Mh..

UPDATE 2 well, it seems I had just a misconfigured path, everything works like a charm on Google App Engine too, yay!

See all comments

Sorry for the outage

if you happened to stumble on this blog in the last days you may have noticed some problems, or not be able to see anything at all.

I'm sorry, the problems were due to some updated library using too much memory and causing segmentation faults in the blog engine. Now everything should be back to normality, sorry for the incnvenience :)

See all comments

Using Scala with Google App Engine

I just got access yesterday to the new Google App Engine for Java SDK, and I thought it would be fun to check if Scala works on it.

Well, it does.

Fow what is worth, I followed the standard procedure outlined in the google documentation and I was able to run my small servlet from eclipse, in java.

Simple steps to get your scala app on google app engine

  • download the GAE plugin for ecplise
  • download, if necessary, the scala plugin
  • create a new Google Web app project
  • add "scala nature" to the project
  • define a new Scala class as an ancestor to the existing servlet, letting the former implement the servlet behaviour

After this the basic servlet that was managed as the main run configuration from eclipse still worked correctly, meaning I could run it, stop it and deploy everything correctly. Adding a few other Scala classes still let the thing work nicely.

It was obviousl possible at this point to delete the old useless java class and just keep the scala servlet. A pure scala app is now able to run on google app engine.

After that, I'd have loved to build a tiny online REPL (at least I know that nothing bad will happen as google took care of the sandboxing) but while I was trying to use the existing scala.tools.nsc.Interpreter class I hit a problem causes by that same sandbox: it seems that the Scala code accesses the file system (mostly calls to File.isDirectory and similar), and those are caught by the SecurityManager who raises an AccessControlException. Eh, just what I asked for :)

Probably it is possible to work around this issue somehow, but it seems It would require more knowledge than that I currently have, so I'm leaving it until I have more time and experience.

For those not interested in using eclipse a nice writeup of the steps necessary to use Scala with GAE are here, it is again pretty simple, and it is probably a better approach.

See all comments

Seeking Rockstar Programmers

Premise: I just graduated (yay), and so did a friend of mine. The only difference is I already have a job while he's looking for it. I never had to look for a job lately, but I did remember some online thingies existed. What I did not remember was the amount of rockstar programmers required.

I am linking some offers, cause if I did not you may get a wrong picture. I am not questioning the job offer per se, they may be good and I am in no position to judge. I just wonder what the hell rockstar means.

the leading website builder for bands, is looking for a rockstar RoR developer to join our team on a freelance basis.

(here)

They seem to be a music-related company. Oh, I get the joke.

company seeking a freelance, part-time, highly productive, jack-of-all trades "rock-star" developer, software architect, systems engineer and DBA to run all of our development, including client-side, server-side, databases and infrastructure.

(here)

In this other case though.. I do not? It seem to me massively qualified in everything.

.. is seeking a rockstar Interaction Designer to join our growing UX/Product Strategy department

(here)

Here again.. what does rockstar mean in this context? Heavy on drugs? Biting off bats' heads? Having a mass of followers?

It seems that other job boards also have these kind of proposal, for example on career builder

Software Product Comp seeks Sr. .NET Rockstar!

(here)

Hell, yeah! I love the bang.

On monster.com, you can get even more

Rockstar Developers (Java/C++ or .NET!) Sought!!!

(here)

three bangs, this is what rock is all about!

Anyway, not only programming jobs in fact. I noticed a "rockstar secretary". I thought for a moment it meant being a secretary for a rockstar, but no, it actually means being able to juggle appointments and phone calls like Mick Jagger.

At some point, companies that were young, fun and creative (or at least felt so) must have thought that to attract similarly talented people they should use these kind of expressions.

Well, my tiny suggestion is: please don't. If you are cool you don't usually need an extra step to look so.

See all comments

Life polyglot (ruby,python)

dblack recently published a nice article about things rubyists need to think about when the long awaited 1.9 release finally appears (plus, a follow up).

Of course I'm happy of the changes, but I'd like to point out that as of today, 16 january 2009, we are still allowed to play with ruby in a way that is not going to be possible again in the future. For example, the following code implements a tiny game of life engine:

#! dunno how to [python,ruby].pick(random) in sh

ON = 1
OFF= 0
WIDTH          = 40    # WIDTH of "board"
HEIGHT         = 20    # HEIGHT of "board"


"""
# " 
def range(a,b) a...b end
def len(x) x.size end
def sum(x) x.inject{|a,b|a+b} end
def init() Array.new(HEIGHT){[0]*WIDTH} end
def spawn(board, left, top, shape)
"""
end=0
def gets(): raw_input()  
init=lambda :[[0 for x in range(WIDTH)] for x in range(HEIGHT)]
def spawn(board, left, top, shape):
  """
  "
  """
  # puts a new thingy on  board, at position (left, top),
  for y in range(0,len(shape)):
      for x in range(0, len(shape[0])):
          board[top + y][left + x] = shape[y][x]
      end
  end
end

t=init()
n=init()

#blinker
spawn(t,20,12,[[1,1,1]])
#glider
spawn(t,9,3,[[0,0,1],[1,0,1],[0,1,1]])
while 1:
  for y in range(1,HEIGHT-1):
    for x in range(1,WIDTH-1):
      c=sum([t[y-1][x-1],t[y-1][x],t[y-1][x+1],
            t[y][x-1],           t[y][x+1],
            t[y+1][x-1],t[y+1][x],t[y+1][x+1]])
      n[y][x]=OFF
      if c== 2:
        n[y][x]=t[y][x]
      end
      if c== 3:
        n[y][x]=ON
      end
    end
  end
  line = ""
  for y in range(0, HEIGHT):
      for x in range(0, WIDTH):
          if t[y][x]==0:
            line += '-'
          end
          if t[y][x]==1:
            line += 'O'
          end
      end
      line = line + "\n"
  end
  print line, "Life - press button for next generation\n"
  gets()
  t,n=n,t
end

Admittedly, the two effect are not 100% compatible, because python's print behaves differently from ruby's.

Why does it work?

Few reasons:

  • Ruby and python syntax are really similar
  • Even if nobody uses it, ruby allows a colon after if, while and for statements. Yet, it is forbidden after a function signature definition, so I have to conditionally define the function signature line
  • The top block is parsed differently from ruby and python: the latter sees a triple-quoted string, while ruby sees a sequence of normal strings, which are implicitly concatenated. In both cases the other language's code is inside a non used string
  • Some additional function definitions, some reordering (no else is possible) and everything works.

Well, works for some values of the word, since it executes correctly with ruby 1.8.7 and python 2.5, which are the two interpreters I have on my box, and it should work with any ruby 1.8.x and python 2.x.

But this code will not work with either python 3 or ruby 1.9. Well, just enjoy this tiny spot in time when we still were allowed to have multilingual code :)

See all comments

Writing a Shakespeare interpreter with Parrot

During the winter holidays I thought it would be fun to start learning something more about Parrot, and see how hard it would be to write an interpreter using it.

I thought that it could been nice to try implementing something simple but fun, and so I decided to try with the Shakespeare Programming Language (SPL).

Shakespeare is a beautiful language, as you can see from the code for printing fibonacci's numbers and the task proved interesting. Since parrot provides tools to generate a language skeleton it was damn fast to be up and running.

Parsing

Writing the grammar with PGE is pretty easy, for example this is the code to parse constants in the form "a pretty lady"

rule immediate {
  <article>? [<adjective> ]* <noun> {*} 
}
token noun {
  |<positive_noun>
  |<negative_noun>
  |<neutral_noun> 
  |<nothing>
}

token adjective {
  |<positive_adjective>
  |<neutral_adjective>
  |<negative_adjective>
  |<first_person_possessive>
  |<second_person_possessive>
  |<third_person_possessive>
}

(the grammatical parts (nouns, adjectives etc) were easily generated from a script and some wordlists)

The only problem was understanding that rules and tokens in PGE have the "ratchet" property of never backtracking and that longest-token-matching is (still?) not working in PGE. This mean that for example the string "an empty house" would not parse correctly when the adjective rule is ['a' | 'the' | 'an'] because 'a' would match and the rule would never backtrack. Some reordering fixed the issue.

Generating the AST is again pretty easy: you basically associate a callback method in the actions file to each rule that you want to transform and use the standard PAST::* classes to generate it, which can be as simple as this

# generating constants
method immediate($/) {
  my $value := 1;
  if $<noun><negative_noun> {
    $value := -1;
  }
  elsif $<noun><nothing> {
    $value := 0;
  }
  for $<adjective> {
    # "your" and "my" ignored
    unless $_<first_person_possessive> || 
       $_<second_person_possessive> || 
       $_<third_person_possessive> {
      $value:= $value*2;
    }
  }
  make PAST::Val.new( :value( $value ) );
}

method test($/, $k) {
  my $test := PAST::Var.new(:name('condition'));
  make PAST::Op.new( $test, 
                 $($<sentence>),
                 :pasttype($k)); # $k = "if" or "unless"
}

The actions file is written in Not Quite Perl which is a pretty basic language although it lacks sme things that would be useful in general IMHO, such list flattening. If needed though they can be used by associating the NQP classes with classes of another parrot language, such as perl6's or ruby's or lua's.

Another small issue I had was trying to conflate capture of single and multiple occurrences of a string, such as

|'exit'  <character>**{1} {*} #= exit
|'exeunt' <character> ['and' <character>]+ {*} #= exit

without explicit quantification (**{1}), the action method would get a scalar reference for the first case and a list reference for the second, and since I cannot write @($character) (NQP does not know about this) I needed an additional conditional in the action code. By explicitly quantifying I always get a list and everything is nice.

The runtime

Basic runtime functions can be written using the PIR intermediate representation, a form of high level assembly that knows about objects, namespaces, exceptions and more. Some of the code is slightly involuted because of the shakespeare language nature: most operations actually refer to both the speaker and the other character currently on stage, but it is usually quite simple to follow

.sub 'enter'
    .param string char
    get_global $P0, char
    unless null $P0 goto fin
    die "no such character in the cast!"
  fin:
    $P0['onstage'] = 1
.end

I also tweaked a little the main routine (autogenerated) to add an additional stage to the compiler. Namely, since I could not set a global "case insensitive" flag for the grammar, I introduced a pre-parsing pass to transform the input in lower case. There are probably better ways to do this, but it seems to work.

Testing

Obviously, I needed to test this. Parrot provides some tools to manage this, namely checking that the compiler outputs the right things at each stage, or that the program output is the one expected. I decided to go through another route, again wonderfully supported, of supporting TAP natively in the language.

This means you have test support builtin in the shakespeare language, bringing it up to date to the best software engineering practices.

This was admittedly extremely easy: I just added some syntax and a couple of runtime functions to print the plan ("1...10") and the test result ("ok 2", "ok 3" etc). Parrot's test harness knows how to manage that and so everything works fine.

Since I did not want to copy the existing source code I wrote everything from scratch (bar the wordlists) and used the supplied examples as reference. So I also added those to the test suite.

It seems, though, that parrot's Parrot::Test does not manage program input, so I used a ruby script as a driver. Which gets actually executed from perl, who knows about the "#!/usr/bin/env ruby" header better than me.

I was trying to port it to perl, but I'm not satisfied with the outcome, I must investigate a bit more

Conclusions

You can get the code through bitbucket, and it should work with today's (january '09) Parrot. All in all this was fun, people in #parrot and #perl6 are nice and helpful, parrot already offers great support for writing your own language and, well, you should try it too :)

Oh, and I passed my last university exam. too :D

See all comments

Leaving in 3,2,1 (again)

So, your host in this place is going to leave you for a while, again.

Two years ago, it was to move to Hungary, best time of my life.

This time, it's to move to Galway, Ireland where I'll be working with this people on some cool stuff.

I'm a bit afraid about finding a home, and I wish I'll not be completely useless in the project, but well, better to try and fail then never try.

See you, Dear Reader, I can't understand why you're still around this place, but thank you for doing so.

See all comments

Both my languages suck

Over at desperately unenterprise there is a nice article about the things the author dislikes about haskell, and an invitation to share your own quirks about your favourite language. Well, I almost equally like ruby and python. so here is a double rant.

Why ruby sucks

First, i can reopen classes but so can the others. It's all nice, and the usual idea that this will make development impossible is dumb, but it's true that it can get annoying. Matz once used to say this could be fixed by namespaces in ruby2 but I haven't heard anything about it in years and I'm not sure it will happen. Meanwhile, I still got the occasional oh, damn why isn't this working? moments, and it sucks. People may remember that nice chainsaw infanticide logger maneuver.

Modules as namespaces also suck. They rely on a wrong assumption, that everyone else will use modules as namespaces, which of course is not always true, so if you happen to use something poorly designed you may end hupo with their User class conflicting with your. Python got it right, namespaces are implicitly defined by files/directories and you have to explicitly import stuff in the current one.

But not just that, you can also import a single name from a python module. Try that with ruby's require if you can. You're either forced to split every single method in it's own file, like facets does or you impose on the end user namespace pollution.

And it's not even that namespaces are cheap to define. The usual way is

  • create a foo.rbmain file
  • create a foo/ directory
  • create many foo/xyz.rb files
  • in every single file define the module And it gets much worse with nested namespaces. And yeah I know I can do class Module::ClassName except that then I can't load that single file separately.

(I also dislike the thing about constants being lexically scoped to be fair, but probably it's just my disliking of class variables and the fact that I'd like to use constants for that)

What is nice about ruby modules is their usage as mixins, and even though I'd like them a bit different they are cool. But so under used.

The core classes in ruby are so fat that they are almost impossible to subclass. Have you ever tried subclassing String so that you can, for instance have a TextileString where you can do substitution without considering the markup? You'll never get it right enough, just String#[] has 7 different behaviors depending on th arguments, and String#[]= has 8 of them.

The Comparable mixin rocks, everybody loves Enumerable but go figure how to make an Hash-like or File-like object without subclassing

Why python sucks

The problems with python are, well, what makes it pythonesque. Yeah, I understand why you need to explicitly define self in a function, I just believe it's dumb and too verbose. But why can't I have a bit of syntax sugar ? Make .foo be a shortcut for self.foo ? At least that would also enforce the convention!

Python has such a big set of conventions that are not supported by the language. Do not use global variables, but look, there is a useful global keyword, even though you have nested namespaces.

On the other hand, it enforces some ugly conventions, suchs as the avoidance of assignments in conditionals by splitting statements and expressions. That distinction is so annoying that as time passes more and more things become both. Conditions are statements, so here is also the expression version. Functions are statements so here is the crippled lambda expression. Assignment is a statement so here are a handful of setters that avoid using the evil = symbol.

update: sorry, this got published by error, it was still largely incomplete, so I'll juust add some more comments

A lot of things that I dislike are in the "kernel library" design, and not syntax. len, map and so do not make the language less OO, I know, but they are ugly and polluting the top namespace, why can't they be made part of proper abstractions? And I hate not being able to use mutable values as key for dictionaries, even java manages to do that, it should not be that hard.

Something that I miss syntax wise instead is the lack of a case statemente, especially of a destructuring/pattern matching one. Python people will usually tell you that you just use a dictionary. Yeah, but why I have to? I can remove else-if from the language too, but it does not mean it makes sense. case/switch statements are actually a place where the language can provide very useful behaviours once someone get past the C/Java versions of it. Take a look at ruby, SML or Scala.

And, pet peeve: why I have to type ":" in definitions? the parser does not need that, and come on, it adds nothing readibility-wise.

See all comments

ActsAsAuthenticated, insecure by default?

ActsAsAuthenticathed is probably the most widely used (and deployed) plugin for managing user registration/authentication in a rails application.

As everything coming from Rick Olson it's great, giving you a complete authentication system in few steps, with a lot of tweaking points that allows you to make it suitable to your setup.

But I think the default configuration is a bit insecure.

Point is, the default encryption routine (done in #encrypt in model.rb) uses a SHA1 digest to cypher the password. But SHA1 is designed to be a fast digest function, not a password cypher. Moreover, the default password length is restricted in the 4..40 range.

Thus, supposing a user has a password with the standard characters in the [_-a-zA-Z0-9] range, the number of different combination for an acceptable password is 64^4< 17 milions.

This may seem a lot, but if you think about it is not so much, using a dumb ruby script on my old box I can find all the hashes for this kind of password in about three minutes. Yeah, in ruby, with gc kicking in and everything. Having a salt is of no importance, it just avoids finding multiple passwords at once.

Now, if you think that there are things like NSA@Home that will try all the possible passwords of 8 characters in one day (64^8 =~ 3e14), this does not sound really secure.

I'm not a security expert, but I believe this should not be like that. I'd say that to make it a bit more secure you could change the Model#encrypt method to do multiple iterations, say

def self.encrypt(password, salt)
  cypher = password
  1000.times do 
    cypher= Digest::SHA1.hexdigest("#{salt}#{cypher}")
  end
  cypher
end

But even there, I'm not a cryptanalist so I don't know if there is something like "the n iteration of SHA1(string of k elements) may be computed in time O(log(n*k))". Cryptopeople may do stuff like that, all the time, for what I know.

So this is basically a question: is AAA slightly insecure by default? If that is true, what would you use as the encrypt function?

See all comments

ECMAScript 4, the fourth system syndrome

The second system syndrome is that evil effect that often happens when redesign a small, working system so that it becomes a huge leviathan built by piling new features over new features.

Today I was reading the overview of ECMASCript 4 from ecmascript.org, and I got this very bad feeling that maybe the fourth version of ES, is suffering of an extremely strong case of this problem.

I believe that part of the success of ES in its current incarnation is that it is quite simple. Yeah, it has its shortcomings and oddities, but in general it is simple enough that everyone can learn it in no time.

ES4, on the other hand, seems to have included almost everything that came out from programming languages in the last 50 years, save Prolog and APL.

For example, it has prototypes, interfaces, classes, metaclasses, higher order types and generic functions, which for what I can tell entails all the work ever done on object-orientation (maybe predicate classes are missing, but they may be there too).

The argument list of a function may have named arguments, rest/variadic arguments, optional arguments, optional types including: non-nullable arguments, like arguments (this seems the same as in Eiffel, used to link the type of an object to another), wrap arguments that perform automatic conversion of an object of type X into another of type Y,

Whereas many languages I know provide a concept of Module or Package to handle namespacing issues, ES4 provides packages and namespaces and program units to support name hiding and modularity. Not only this, but you can introduce variables with var to have it in the object scope or with let to have it in a block, so even more namespace separation :-)

Finally, ES4 incorporates some cool features such as list comprehensions, a kind of destructuring assignment/pattern matching, python-like slicing, triple-quotes and generators (though AFAICT this are only one way generators like in older python releases, where you can yield but you can't receive values Thanks to Neil Mix for pointing out that they are complete coroutines, and support bidirectional value passing).

And operator overloading, program pragmas, required tail call elimination, a host of interesting keywords suchy as intrinsic to reify operators, dynamic to have classes that allow adding of new fields, static to have class-level functions, final to avoid inheritance and override to use it, internal, public and so on. And of course, 6 different values of falsity, with NaN, "", 0, false, null and undefined, but not {} and [].

ECMAScript 4 seems cool, I want to use a lot of those things (yay for multi method dispatch!), but maybe it is changing the language a little bit too much .

How many of the current user of ES in one of its incarnations have ever heard of product types? Will they grasp at once the logic behind the subtyping relation between functions, with all that covariant and contravariant stuff? Will they need it at all?

I hope the internet pipes will resist to the number of "metaclass vs meta namespace vs prototype vs superclass" questions on usenet.

See all comments

More Obscure Features of Ruby

First go look at Nick Sieger's post about strange perlisms in ruby, cause it may be interesting for you. But talking about strange features in ruby, I believe there are some that may result even stranger, cause they are not even inherited from other languages, but are matz' own creation, I believe.

super on builtin methods

if you're redefining a builtin method you can use super to access the original one:

>> def puts(*args)
>>  super(*args)
>>  super(*args)
>> end
=> nil
>> puts 123
123
123

yet if you try with your own:

>> def foo() puts "foo" end
=> nil
>> def foo() super end
=> nil
>> foo
NoMethodError: super: no superclass method `foo'
        from (irb):8:in `foo'
        from (irb):10

super without arguments

I knew this feature for long, but never used it in real coding, even if I keep thinking it's handy. super, when used without arguments, passes all the arguments, so the above example would become:

>> def puts(a,b)
>>  super
>>  super
>> end
=> nil
>> puts 345, 567
345
567
345
567
=> nil

The evil context-sensitive while and until modifiers:

Can you tell the difference between this two lines?

 puts 'hello' while false
 begin puts 'hello' end while false

Well, the difference is that the latter will print "hello", because the while modifier always does at least one evaluation of the expression when preceded by a begin..end block. The same happens with the until modifier. Luckily, this evil behaviour is supposed to disappear.

do something, handle exceptions.. or else?

Did you know that ruby's begin allows the definition of an else clause? The else clause is actually part of the rescue part of the block,. and is executed whenever an exception is not raised:

>> def check(x)
>>  begin
?>   raise if x==2
>>  rescue
>>   "error"
>>  else
?>   "ok"
>>  end
>> end
=> nil
>> check(1)
=> "ok"
>> check(2)
=> "error"

..and of course, you do not need begin..

because, I believe you know this, you can write:

>> def check(x)
>>   raise if x==2
>>  rescue
>>   "error"
>>  else
?>   "ok"
>> end
=> nil
>> check(1)
=> "ok"
>> check(2)
=> "error"

methods with whitespace, yay!

I find this quite useful, actually. If you define a method bypassing the ruby parser with define_method you can give it any name you want.

>> class K
>>  define_method("method: hard to get") { 'got!' }
>> end
=> #
>> K.new.send 'method: hard to get'
=> "got!"

You won't be able to call the method with normal ruby syntax, which makes it less or more useful depending on your problem space.

And finally, there is a whole slant of strange things that I believe are remnants of matz' experimenting with the language (is alias $global $other_global used somewhere outside of english.rb?), and even if some of them are quite frightening (if /regex/ anyone?) they will be phased out, so it's ok. If I recall correct ruby also had a glob operator ( <*.c> ) that was killed in the early days, so I trust matz to fix the oddities of the language sooner or later :)

See all comments

Spell Corrector in AliceML

It seems that writing a spell correcor is becoming my new post hello world program, so well, after having this in my hard disk for a while I thought I would publish my new solution, wrote using AliceML.

I'm really not good at ML, so this code must not be considered a good example of how to code, but I thing it's a good thing to keep track of my learning :)


import structure MkRedBlackSet  from "x-alice:/lib/data/MkRedBlackSet";
import structure MkRedBlackMap from "x-alice:/lib/data/MkRedBlackMap";

structure Set = MkRedBlackSet String;
structure Map = MkRedBlackMap String;

val wordlist = "/home/rff/wordlist.txt";
val alphabet = String.explode "abcdefghijklmnopqrstuvwxyz";

open List;

fun take2(i1,i2) = (fn cl => (take (cl,i1), drop (cl,i2)))

fun delete cl i = 
  let
    val (first, second) = take2 (i,i+1) cl
  in
    String.implode (first @ second)
  end

fun deletions cl = tabulate ((length cl), (delete cl));


fun substitute cl i = 
  let 
    val (first,second) = take2 (i, i+1) cl
    fun sub_char c = String.implode (first @ [c] @ second)
  in  
    map sub_char alphabet
  end 

fun substitutions cl =  concat( tabulate ((length cl), (substitute cl)));

fun insert cl i = 
  let 
    val (first, second) = take2 (i,i) cl
    fun ins_char c = String.implode (first @ [c] @ second)
  in  
    map ins_char alphabet
  end 

fun insertions cl =  concat( tabulate ((length cl)+1, (insert cl)));

fun transpose cl i = 
  let 
    val (first,second) = take2 (i,i+2) cl
    val c1 = nth (cl,i)
    val c2 = nth (cl,i+1)
  in  
    String.implode (first @ [c2] @[c1] @ second)
  end 

fun transpositions cl =  tabulate ((length cl)-1, (transpose cl));


fun edits1 word = 
  let 
    val w = String.explode word 
    val ws = deletions w @ transpositions w @ insertions w @  substitutions w
  in 
    foldl (Fn.flip Set.insert) (Set.empty) ws
  end 

fun insertOrUpdate map key = 
  case Map.lookup (map, key) of
     NONE => Map.insert (map, key, ref 1)
    |SOME value => (value := (!value +1) ; map)


(* train *)
fun train io =
  let 
    val chop = fn str => String.substring (str, 0, (String.size str) - 1)
  in
    case TextIO.inputLine io of
       NONE => Map.empty
      |SOME line => insertOrUpdate (train io) (chop line)
    end  

val nwords = train ( TextIO.openIn wordlist);

fun known_edits2  word = 
  let 
    fun set_edits set = Set.fold (fn (str,set) => Set.union (set, known (edits1 str))) Set.empty set
  in
    set_edits (edits1 word)
  end

fun look k = !(valOf (Map.lookup  (nwords, k)));


fun correct word =
  let 
    val candidates = find (not o Set.isEmpty) 
                           [ known (Set.fromList [word]),
                             known (edits1 word),
                             known_edits2 word]
     val sort = sort (fn (a,b)=>Int.compare ((look b), (look a))) o Set.toList
   in
     case candidates of
       SOME words => hd(sort words)
      |NONE     => word
   end  

correct "ciao" (*ciao match*);
correct "cao"  (*ciao edit 1*);
correct "mioo" (*miao edit 1*);
correct "miaoo"(*miao edit 1*);
correct "mooo" (*mao  edit 2 max value*);
correct "quux" (*quux no match*);

The worse part is the string manipulation, I think, and maybe the fact that I use references (aka mutable variables) when building the frequency count Map, but in the end I'm quite satisfied. Feel free to point out obvious errors, if you see them.

See all comments

AddThis Social Bookmark Button