This is a serious question. Can someone explain to me why higher-level languages like Python, Ruby, and JavaScript can't share a fast VM? It doesn't have to be the JVM or .NET, but why is everybody duplicating so much work? I know the Perl guys were moving in this direction, but nobody else seems to be on board. Even within Google, they have V8 for JavaScript already.
1. Using a simple prototype VM seemed easier during initial development
2. When the language was developed, no suitable common VM existed
3. The language developers thought they could do it better for their specific case, i.e. the point of the implementation is to showcase new VM strategies
Remember that a language and an interpreter are separate things, and usually an author who wants to try an idea for one has to find some suitable stand-in for the other in order to get started.
It looks like we're at a point now where many dynamic languages are attempting to transition to a more advanced VM. This includes Perl, Python, Ruby, and some implementations of JavaScript. I assume they all want to take advantage of the latest VM techniques, and they're all making major changes anyway, so again I just wonder why they don't seriously consider collaborating now.
I think it's interesting how the success of projects like this sometimes seems to depend completely on a single person having the right combination of interest and skill.
The Jython project was just the coolest darn thing until, from what I can tell, the lead developer got tired of maintaining a mature project with a hairy codebase and left to join (start?) PyPy, an even more ambitious project. So Jython was stuck in a coma with a pre-alpha 2.2 release for like half a decade, while CPython marched on, and no new maintainer was able to bring it back to consciousness -- despite huge demand for it, since Jython at the time was the perfect solution to problems in Java and Python. Finally Frank Wierzbicki came along a couple years ago, squeezed out the stubborn 2.2 release and is on the verge of version 2.5, at which point we can consider Jython back open for business. There's a kind of similar story with Psyco, and I think that one also concludes with PyPy being the "right answer" to the original problem.
But back on topic -- in some ways, the dynamic-language communities are collaborating on common VMs. The JVM team is trying to make things easier for other languages with "invokedynamic", the .NET world is keeping up at least (DLR? dunno), and LLVM has good traction as a compilation target.
However, I doubt we'll see anyone devote significant effort to porting Python or Ruby to Parrot until Perl 6 lands and provides a compelling demonstration. Meanwhile, PyPy is getting close enough to its goal that we might just want to wait and see if their JIT-compiler-generator solves all of these problems in one shot.
Good info, thanks. I think you're right about Perl 6 needing to be finished first. It's too bad, because it looks like some opportunities might be missed by then.
Why not just write a program that can convert any program written in one language to any of the others? It is basically the same problem and about the same level of difficulty.
How much of the source-level semantics do you want to preserve? If you don't care about many of them, then it's not too hard (although quite pointless): just compile to an array of bytes, plus a Turing machine emulator or equivalent in your target language of choice, and you've converted.
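A toy illustration of the byte-array-plus-emulator idea in Python (the opcodes here are invented for the example):

```python
# The "program" is just data; a tiny interpreter in the target
# language runs it. Opcodes: 0 = PUSH <byte>, 1 = ADD, 2 = HALT.
def run(code):
    stack = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0:            # PUSH: next byte is the operand
            i += 1
            stack.append(code[i])
        elif op == 1:          # ADD: pop two, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == 2:          # HALT: result is top of stack
            return stack.pop()
        i += 1

print(run(bytes([0, 2, 0, 3, 1, 2])))  # 2 + 3 -> 5
```

None of the source language's structure survives this, which is exactly the point being made.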
The hard part comes when you want to preserve source-level semantics and provide a natural-feeling translation so that a human can understand, read, and write the source.
However, with a VM you don't have that problem. With a VM, a lossy, one-way transform is sufficient. You can easily throw away information as you compile, since the output isn't meant for human consumption. You can replace, say, overloaded functions with mangled names. You can turn multiple dispatch into decision trees of conditional statements. And so on. With a VM, you would want to keep enough high-level detail to efficiently JIT stuff, and it's a bit of a challenge to decide what detail is sufficiently useful to include, but it's not insurmountable.
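As a hypothetical sketch of that multiple-dispatch transform in Python (the class and function names are invented for illustration):

```python
# Hypothetical source language: area(Circle c) and area(Rect r) are
# separate overloads chosen by argument type. A lossy compile to a
# single-dispatch target collapses them into one mangled function
# containing an explicit type-test decision tree.
class Circle:
    def __init__(self, r):
        self.r = r

class Rect:
    def __init__(self, w, h):
        self.w, self.h = w, h

def area__dispatch(shape):
    if isinstance(shape, Circle):       # was: area(Circle c)
        return 3.14159 * shape.r ** 2
    elif isinstance(shape, Rect):       # was: area(Rect r)
        return shape.w * shape.h
    raise TypeError("no applicable overload")

print(area__dispatch(Rect(3, 4)))  # 12
```

Translating this tree back into the original overloads would be painful for a human reader, but the VM never needs to go in that direction.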
Edit: Were you being sarcastic? I don't think it's exactly the same, because being able to compile many languages down to a single language is easier than also having to go in the other direction.
Original comment: Yeah, that would be one solution, but I think the shared VM helps so you can easily access libraries written in different languages. I remember _why (of Ruby fame) trying to run Ruby code on Google App Engine back when it was Python-only. He discovered that Ruby bytecode is nearly identical to Python bytecode. Like me, he didn't understand why Python and Ruby still need to be on separate runtimes.
No, I wasn't being sarcastic. If you want to use libraries written in Python in Ruby (just as an example), you need to be able to convert all the Pythonisms to Rubyisms and/or Rubyisms to Pythonisms. That is the case whether you are doing source-to-source rewriting or using a shared VM.
Note that the JVM and .NET runtimes do pretty much exactly what was asked--they run programs written in multiple languages on the same VM. Because the "native" VMs for Python and Ruby are so bad, the JVM and .NET VMs can even out-perform the native ones for most tasks.
I've heard there was an attempt to pursue this a few years ago, but there were very serious fundamental incompatibilities, with examples given that went way over my head.
I associate it with the Perl Apocalypses. I think it predated Parrot, whose popularity makes this hard to search for, & the languages I'm surest of are Python & Perl, unfortunately. I think it took place at a conference, possibly OSCON.
There are a number of Parrot languages at various stages of development. As soon as they mature a bit, you might find people taking them pretty seriously. :)
That would be great. But I'm wondering why the authors of other major dynamic languages haven't chosen to collaborate and use Parrot as their primary VM.
LLVM isn't so much a VM for those languages to use directly as a portable assembly toolkit.
It's too low-level to compile to directly. It'd occupy a place in the stack somewhere near assembly, but with an optimizer in place. It's more of a VM toolkit than a VM itself.
I think the most important point here is not the speedup itself, but the fact that Google is officially backing Python. That'd give Big Corps confidence when adopting the technology.
I think hiring Guido and making Python one of the few allowed languages at Google already counts as officially backing the language. But yes, this does seem to be another step in that direction.
The GIL has really frustrated me for multiple years. When you're embedding Python into an application, it's really difficult to take advantage of the best multi-core options Python has because those options are all multi-process tricks.
At Pycon last year, when I reminded Guido about the difficulty for embedded apps (that can't fork 200MB processes every time a thread is needed), he shrugged and recommended Jython or IronPython usage.
It's great to see that we'll have this problem solved in a few years. (I assume it will take that long to be production-worthy.)
I'm sure this LLVM integration and change will pretty much break/invalidate all of the embedding API, but to me it'd be worth the rewrite.
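The "multi-process tricks" mentioned above boil down to something like this stdlib sketch (a minimal illustration; `cpu_bound` is an invented example function):

```python
# The GIL serializes threads within one interpreter, so CPU-bound
# work is farmed out to worker processes instead, each of which has
# its own interpreter and its own GIL.
from multiprocessing import Pool

def cpu_bound(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.map(cpu_bound, [10, 100, 1000]))
        # [285, 328350, 332833500]
```

This is exactly what's hard in the embedding scenario described above: each worker is a whole new process, not a thread inside the host application.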
I'd like to see an optimization involving threads and GC. Some Smalltalk VMs only spawn real threads when there is a system call involved. However, the stack frames involved in the calls are momentarily exempt from GC. This means that the GC never has to wait for threads to synchronize to proceed with operations. But the language gets all of the benefits of real threads.
I don't understand the "stack frames are exempt from GC" thing -- do you mean the stack frames do not contribute reachability roots to the garbage collection or that the stack frames themselves are not garbage-collected?
Nothing directly involved in the call participates in GC. In VisualWorks Smalltalk, these calls are special. (These are only calls out through the DLLCC facility. I believe they are marked by pragmas.) They might contribute reachability roots, though that might be problematic. I seem to remember that objects created in that stack frame are not garbage collected. Since these methods are marked, the compiler can treat these calls specially. I'm pretty sure that the objects are spawned in a different space. I think this would mean that those stack frames can't mutate objects created before the call. (This would make these calls functional in flavor.) They should be able to refer to such objects, as this will only cause an exception if things go wrong.
The only thing that happens in these methods in VisualWorks Smalltalk is marshalling to call out to DLLs implemented in C.
Have you looked at Lua? It's pretty similar to Python (with a bit more Scheme influence), but is very deliberately designed for embedding. Also, all state is contained in the lua_State pointer, so you can concurrently run several Lua VMs when it suits you. You don't need to wait years for Python to work its concurrency issues out.
Tbh what is really needed is optimized message passing a la the actor model. It's a leap forward and avoids a lot of the problems threads face (subtle concurrency bugs).
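For flavor, here's a minimal actor-style sketch in Python using stdlib queues as mailboxes (all names are illustrative, not any particular library's API):

```python
import queue
import threading

def actor(mailbox, results):
    # An "actor" owns its mailbox and touches no shared mutable state;
    # it communicates only by receiving and sending messages.
    while True:
        msg = mailbox.get()
        if msg is None:        # poison pill: shut down
            break
        results.put(msg * 2)   # "process" the message

mailbox, results = queue.Queue(), queue.Queue()
worker = threading.Thread(target=actor, args=(mailbox, results))
worker.start()
for n in (1, 2, 3):
    mailbox.put(n)
mailbox.put(None)
worker.join()
out = [results.get() for _ in range(3)]
print(out)  # [2, 4, 6]
```

Because no state is shared, there's nothing to lock; the runtime is free to schedule actors however it likes.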
Yes it does avoid many problems, but it's unsuitable for other important scenarios. We need both. Message passing should be a design choice, not an excuse for a limitation in some language runtime.
The scenario I deal with most of the time is a large, read-mostly, in memory custom data structure that is accessed by multiple clients. More and more data fits into memory nowadays so that's going to be a very common scenario for many data analysis tasks in the future.
You mean CSP-style, like Alef and Limbo from Bell Labs? Have you come across PyCSP? It has multiple backends, including greenlets which are nice and lightweight. http://code.google.com/p/pycsp/
This is truly very exciting, but the more I see of parallel language development, the more I want to focus on languages like Haskell, which have no side effects and therefore have some substantial implicit parallelization effect. Don't get me wrong, I love Python (and C) but I wouldn't be surprised if we were all using a Haskell derivative in 5 years instead.
More and more languages are acquiring functional-programming features, so maybe we'll see a hybrid of Python and Haskell.
languages like Haskell, which have no side effects
Haskell does have side effects. Most programs have side effects. Functional languages like Haskell are designed to separate side effects from pure functions, which gives the interpreter/compiler the ability to parallelize the pure functions.
I dunno man, my way of thinking is that if you're not screen scraping, regexes shouldn't show up all that much, and if they're not showing up that often, Python's way of doing it isn't so bad.
Not much of a consolation if you're screen scraping, but still.
It's difficult to explain precisely what it is about Python regexes that makes them awkward, and perhaps if I had never encountered the syntactic regex sugar that is Perl, I would never have noticed. But, as a total Python advocate and frequent writer of Python scripts, I still find my brain hitting wait-states that don't occur in Perl when I need to pattern match. I can live with that because of the 50+ things that make Python so much better (for me) than Perl (Hash of Array / Array of Hash at the top of the list) -- but it's sad that I still don't regex in Python as fluidly as in Perl.
I think the first hint that Python wasn't developed with regex users in mind would be that in the 700-page "Learning Python, 3rd Edition", regular expressions aren't even mentioned in the index.
It probably has a lot to do with my lack of programming knowledge (I've never taken any formal programming classes [err...I took java in college but never actually went to the scheduled class because it was a lot of covering what these cRaZy things called variables were])...
A replacement in Python looks like this...
import re
foo = re.compile("<")
bar = foo.sub("&lt;", s)
this works, but it just feels more awkward to me than Perl's s/</&lt;/g.
More likely:
s = "something < other"
s = s.replace("<", "&lt;")
Even more likely:
import cgi
s = cgi.escape(s)
In Python, much string manipulation is done with builtins (especially string methods) and library functions. The latter work especially well when you target a specific domain (such as HTML) and will often be implemented with regexps. Writing one's own regexps is for when you actually need a small custom state machine. Many problems are too small or too large to justify the effort.
One Perl nicety is that the VM will optimize to only compile the expression once. Python's re.sub() has to compile the expression every time, since it's just a string argument rather than a language construct. You can manually cache the result of re.compile() -- it's just not as sugary that way.
Aha -- I almost suggested that they could do some memoization, and indeed they are doing just that. The last 100 unique patterns are compiled and stored in a dictionary for future use. You could still thrash badly if your regex "working set" is more than 100 patterns, but for most people it's probably Good Enough. (Plus you could override re._MAXCACHE if you really want.)
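Both styles end up reusing a compiled pattern, either explicitly via re.compile or through re's internal cache; a quick sketch:

```python
import re

# Explicit: compile once, reuse the pattern object yourself.
digits = re.compile(r"\d+")
print(digits.sub("#", "a1b22c333"))      # a#b#c#

# Implicit: re.sub looks the compiled form up in re's internal
# cache of recently used patterns, so repeated calls with the
# same string pattern don't recompile it either.
print(re.sub(r"\d+", "#", "a1b22c333"))  # a#b#c#
```

The explicit form only really matters when your set of live patterns might overflow that cache.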
Thanks for pointing out my error, and getting me to read the actual code instead of speculating. :)
I disagree. Python regexen are a royal pain compared to how they work in Perl. You know what would be pretty funny is if a rogue group of guerilla programmers hacked Perl style regex match and replace syntax into Python.
Heheheh, boy would that get some bees in some bonnets on the Python dev ML. :)
I'm disappointed that Ruby isn't getting this type of attention. I think that Ruby strikes a good balance by allowing highly readable code like Python, but tolerating denser syntax for when you don't feel like talking to a 3-year-old.
However, it is good that Google is putting its weight behind this project, because anybody who uses anything built on Python (which has weaseled its way into virtually everything nowadays) is going to experience a performance boost.
A programming language is what its developers and users make of it. Python people have worked very, very hard over the last two years to stay ahead while everybody and their uncle were going ga-ga over Rails.
I actually don't have that much experience with Rails. I prefer Ruby over Python because of its flexibility -- Ruby doesn't force you to approach problems in a specific way just because the language designer decided that was the only way to do it.
Let's say your brain groks Ruby better than Python and you feel more productive with its "flexibility". No need to dismiss the hard work of Python developers over the last couple of years as "weaseled".
Maybe this "flexibility" you talk of is not all it's cracked up to be for most programmers. Python flourishes because it too is opinionated in its ways, and a lot of people tend to agree with the set of choices made by the language designer.