This is a serious question. Can someone explain to me why higher-level languages like Python, Ruby, and JavaScript can't share a fast VM? It doesn't have to be the JVM or .NET, but why is everybody duplicating so much work? I know the Perl guys were moving in this direction, but nobody else seems to be on board. Even within Google, they have V8 for JavaScript already.
1. Using a simple prototype VM seemed easier during initial development
2. When the language was developed, no suitable common VM existed
3. The language developers thought they could do it better for their specific case, i.e. the point of the implementation is to showcase new VM strategies
Remember that a language and an interpreter are separate things, and usually an author who wants to try an idea for one has to find some suitable stand-in for the other in order to get started.
It looks like we're at a point now where many dynamic languages are attempting to transition to a more advanced VM. This includes Perl, Python, Ruby, and some implementations of JavaScript. I assume they all want to take advantage of the latest VM techniques, and they're all making major changes anyway, so again I just wonder why they don't seriously consider collaborating now.
I think it's interesting how the success of projects like this sometimes seems to depend completely on a single person having the right combination of interest and skill.
The Jython project was just the coolest darn thing until, from what I can tell, the lead developer got tired of maintaining a mature project with a hairy codebase and left to join (start?) PyPy, an even more ambitious project. So Jython was stuck in a coma with a pre-alpha 2.2 release for like half a decade, while CPython marched on, and no new maintainer was able to bring it back to consciousness -- despite huge demand for it, since Jython at the time was the perfect solution to problems in Java and Python. Finally Frank Wierzbicki came along a couple years ago, squeezed out the stubborn 2.2 release and is on the verge of version 2.5, at which point we can consider Jython back open for business. There's a kind of similar story with Psyco, and I think that one also concludes with PyPy being the "right answer" to the original problem.
But back on topic -- in some ways, the dynamic-language communities are collaborating on common VMs. The JVM team is trying to make things easier for other languages with "invokedynamic", the .NET world is keeping up at least (DLR? dunno), and LLVM has good traction as a compilation target.
However, I doubt we'll see anyone devote significant effort to porting Python or Ruby to Parrot until Perl 6 lands and provides a compelling demonstration. Meanwhile, PyPy is getting close enough to its goal that we might just want to wait and see if their JIT-compiler-generator solves all of these problems in one shot.
Good info, thanks. I think you're right about Perl 6 needing to be finished first. It's too bad, because it looks like some opportunities might be missed by then.
Why not just write a program that can convert any program written in one language to any of the others? It is basically the same problem and about the same level of difficulty.
How much of the source-level semantics do you want to preserve? If you don't care about many of them, then it's not too hard (although quite pointless): just compile to an array of bytes, plus a Turing machine emulator or equivalent in your target language of choice, and you've converted.
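A toy illustration of the byte-array-plus-emulator idea in Python (the opcodes here are invented for the example):

```python
# The "program" is just data; a tiny interpreter in the target
# language runs it. Opcodes: 0 = PUSH <byte>, 1 = ADD, 2 = HALT.
def run(code):
    stack = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0:            # PUSH: next byte is the operand
            i += 1
            stack.append(code[i])
        elif op == 1:          # ADD: pop two, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == 2:          # HALT: result is top of stack
            return stack.pop()
        i += 1

print(run(bytes([0, 2, 0, 3, 1, 2])))  # 2 + 3 -> 5
```

None of the source language's structure survives this, which is exactly the point being made.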
The hard part comes when you want to preserve source-level semantics and provide a natural-feeling translation so that a human can understand, read, and write the source.
However, with a VM you don't have that problem. With a VM, a lossy, one-way transform is sufficient. You can easily throw away information as you compile, since the output isn't meant for human consumption. You can replace, say, overloaded functions with mangled names. You can turn multiple dispatch into decision trees of conditional statements. And so on. With a VM, you would want to keep enough high-level detail to efficiently JIT stuff, and it's a bit of a challenge to decide what detail is sufficiently useful to include, but it's not insurmountable.
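As a hypothetical sketch of that multiple-dispatch transform in Python (the class and function names are invented for illustration):

```python
# Hypothetical source language: area(Circle c) and area(Rect r) are
# separate overloads chosen by argument type. A lossy compile to a
# single-dispatch target collapses them into one mangled function
# containing an explicit type-test decision tree.
class Circle:
    def __init__(self, r):
        self.r = r

class Rect:
    def __init__(self, w, h):
        self.w, self.h = w, h

def area__dispatch(shape):
    if isinstance(shape, Circle):       # was: area(Circle c)
        return 3.14159 * shape.r ** 2
    elif isinstance(shape, Rect):       # was: area(Rect r)
        return shape.w * shape.h
    raise TypeError("no applicable overload")

print(area__dispatch(Rect(3, 4)))  # 12
```

Translating this tree back into the original overloads would be painful for a human reader, but the VM never needs to go in that direction.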
Edit: Were you being sarcastic? I don't think it's exactly the same, because being able to compile many languages down to a single language is easier than also having to go in the other direction.
Original comment: Yeah, that would be one solution, but I think the shared VM helps so you can easily access libraries written in different languages. I remember _why (of Ruby fame) trying to run Ruby code on Google App Engine back when it was Python-only. He discovered that Ruby bytecode is nearly identical to Python bytecode. Like me, he didn't understand why Python and Ruby still need to be on separate runtimes.
No, I wasn't being sarcastic. If you want to use libraries written in Python in Ruby (just as an example), you need to be able to convert all the Pythonisms to Rubyisms and/or Rubyisms to Pythonisms. That is the case whether you are doing source-to-source rewriting or using a shared VM.
Note that the JVM and .NET runtimes do pretty much exactly what was asked--they run programs written in multiple languages on the same VM. Because the "native" VMs for Python and Ruby are so bad, the JVM and .NET VMs can even out-perform the native ones for most tasks.
I've heard there was an attempt to pursue this a few years ago, but there were very serious fundamental incompatibilities, with examples given that went way over my head.
I associate it with the Perl Apocalypses. I think it predated Parrot, whose popularity makes this hard to search for, & the languages I'm surest of are Python & Perl, unfortunately. I think it took place at a conference, possibly OSCON.
There are a number of Parrot languages at various stages of development. As soon as they mature a bit, you might find people taking them pretty seriously. :)
That would be great. But I'm wondering why the authors of other major dynamic languages haven't chosen to collaborate and use Parrot as their primary VM.
LLVM isn't so much a VM for those languages to use directly as a portable assembly toolkit.
It's too low-level to compile to directly. It'd occupy a place in the stack somewhere near assembly, but with an optimizer in place. It's more of a VM toolkit than a VM itself.
I think the most important point here is not the speedup itself, but the fact that Google is officially backing Python. That'd give Big Corps confidence when adopting the technology.
I think hiring Guido and making Python one of the few allowed languages at Google already counts as officially backing the language. But yes, this does seem to be another step in that direction.
The GIL has really frustrated me for multiple years. When you're embedding Python into an application, it's really difficult to take advantage of the best multi-core options Python has because those options are all multi-process tricks.
At Pycon last year, when I reminded Guido about the difficulty for embedded apps (that can't fork 200MB processes every time a thread is needed), he shrugged and recommended Jython or IronPython usage.
It's great to see that we'll have this problem solved in a few years. (I assume it will take that long to be production-worthy.)
I'm sure this LLVM integration and change will pretty much break/invalidate all of the embedding API, but to me it'd be worth the rewrite.
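The "multi-process tricks" mentioned above boil down to something like this stdlib sketch (a minimal illustration; `cpu_bound` is an invented example function):

```python
# The GIL serializes threads within one interpreter, so CPU-bound
# work is farmed out to worker processes instead, each of which has
# its own interpreter and its own GIL.
from multiprocessing import Pool

def cpu_bound(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.map(cpu_bound, [10, 100, 1000]))
        # [285, 328350, 332833500]
```

This is exactly what's hard in the embedding scenario described above: each worker is a whole new process, not a thread inside the host application.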
I'd like to see an optimization involving threads and GC. Some Smalltalk VMs only spawn real threads when there is a system call involved. However, the stack frames involved in the calls are momentarily exempt from GC. This means that the GC never has to wait for threads to synchronize to proceed with operations. But the language gets all of the benefits of real threads.
I don't understand the "stack frames are exempt from GC" thing -- do you mean the stack frames do not contribute reachability roots to the garbage collection or that the stack frames themselves are not garbage-collected?
Nothing directly involved in the call participates in GC. In VisualWorks Smalltalk, these calls are special. (These are only calls out through the DLLCC facility. I believe they are marked by pragmas.) They might contribute reachability roots, though that might be problematic. I seem to remember that objects created in that stack frame are not garbage collected. Since these methods are marked, the compiler can treat these calls specially. I'm pretty sure that the objects are spawned in a different space. I think this would mean that those stack frames can't mutate objects created before the call. (This would make these calls functional in flavor.) They should be able to refer to such objects, as this will only cause an exception if things go wrong.
The only thing that happens in these methods in VisualWorks Smalltalk is marshalling to call out to DLLs implemented in C.
Have you looked at Lua? It's pretty similar to Python (with a bit more Scheme influence), but is very deliberately designed for embedding. Also, all state is contained in the lua_State pointer, so you can concurrently run several Lua VMs when it suits you. You don't need to wait years for Python to work its concurrency issues out.
Tbh what is really needed is optimized message passing a la the actor model. It's a leap forward and avoids a lot of the problems threads face (subtle concurrency bugs).
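For flavor, here's a minimal actor-style sketch in Python using stdlib queues as mailboxes (all names are illustrative, not any particular library's API):

```python
import queue
import threading

def actor(mailbox, results):
    # An "actor" owns its mailbox and touches no shared mutable state;
    # it communicates only by receiving and sending messages.
    while True:
        msg = mailbox.get()
        if msg is None:        # poison pill: shut down
            break
        results.put(msg * 2)   # "process" the message

mailbox, results = queue.Queue(), queue.Queue()
worker = threading.Thread(target=actor, args=(mailbox, results))
worker.start()
for n in (1, 2, 3):
    mailbox.put(n)
mailbox.put(None)
worker.join()
out = [results.get() for _ in range(3)]
print(out)  # [2, 4, 6]
```

Because no state is shared, there's nothing to lock; the runtime is free to schedule actors however it likes.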
Yes it does avoid many problems, but it's unsuitable for other important scenarios. We need both. Message passing should be a design choice, not an excuse for a limitation in some language runtime.
The scenario I deal with most of the time is a large, read-mostly, in memory custom data structure that is accessed by multiple clients. More and more data fits into memory nowadays so that's going to be a very common scenario for many data analysis tasks in the future.
You mean CSP-style, like Alef and Limbo from Bell Labs? Have you come across PyCSP? It has multiple backends, including greenlets which are nice and lightweight. http://code.google.com/p/pycsp/
This is truly very exciting, but the more I see of parallel language development, the more I want to focus on languages like Haskell, which have no side effects and therefore have some substantial implicit parallelization effect. Don't get me wrong, I love Python (and C) but I wouldn't be surprised if we were all using a Haskell derivative in 5 years instead.
More and more languages are acquiring functional-programming features, so maybe we'll see a hybrid of Python and Haskell.
languages like Haskell, which have no side effects
Haskell does have side effects. Most programs have side effects. Functional languages like Haskell are designed to separate side effects from pure functions, which gives the interpreter/compiler the ability to parallelize the pure functions.
I dunno man, my way of thinking is that if you're not screen scraping, regexes shouldn't show up all that much, and if they're not showing up that often, Python's way of doing it isn't so bad.
Not much of a consolation if you're screen scraping, but still.
It's difficult to explain precisely what it is about Python regexes that makes them awkward, and perhaps if I had never encountered the syntactic regex sugar that is Perl, I would never have noticed. But, as a total Python advocate and frequent writer of Python scripts, I still find my brain hitting wait-states that don't occur in Perl when I need to pattern match. I can live with that because of the 50+ things that make Python so much better (for me) than Perl (Hash of Array / Array of Hash at the top of the list) -- but it's sad that I still don't regex in Python as fluidly as in Perl.
I think the first hint that Python wasn't developed with regex users in mind would be that in the 700-page "Learning Python, 3rd Edition", regular expressions aren't even mentioned in the index.
It probably has a lot to do with my lack of programming knowledge (I've never taken any formal programming classes [err...I took java in college but never actually went to the scheduled class because it was a lot of covering what these cRaZy things called variables were])...
A replacement in Python looks like this...
import re
foo = re.compile("<")
bar = foo.sub("&lt;", s)
this works, but it just feels more awkward to me than Perl's s/</&lt;/g.
More likely:
s = "something < other"
s = s.replace("<", "&lt;")
Even more likely:
import cgi
s = cgi.escape(s)
In Python, much string manipulation is done with builtins (especially string methods) and library functions. The latter work especially well when you target a specific domain (such as HTML) and will often be implemented with regexps. Writing one's own regexps is for when you actually need a small custom state machine. Many problems are too small or too large to justify the effort.
One Perl nicety is that the VM will optimize to only compile the expression once. Python's re.sub() has to compile the expression every time, since it's just a string argument rather than a language construct. You can manually cache the result of re.compile() -- it's just not as sugary that way.
Aha -- I almost suggested that they could do some memoization, and indeed they are doing just that. The last 100 unique patterns are compiled and stored in a dictionary for future use. You could still thrash badly if your regex "working set" is more than 100 patterns, but for most people it's probably Good Enough. (Plus you could override re._MAXCACHE if you really want.)
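Both styles end up reusing a compiled pattern, either explicitly via re.compile or through re's internal cache; a quick sketch:

```python
import re

# Explicit: compile once, reuse the pattern object yourself.
digits = re.compile(r"\d+")
print(digits.sub("#", "a1b22c333"))      # a#b#c#

# Implicit: re.sub looks the compiled form up in re's internal
# cache of recently used patterns, so repeated calls with the
# same string pattern don't recompile it either.
print(re.sub(r"\d+", "#", "a1b22c333"))  # a#b#c#
```

The explicit form only really matters when your set of live patterns might overflow that cache.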
Thanks for pointing out my error, and getting me to read the actual code instead of speculating. :)
I disagree. Python regexen are a royal pain compared to how they work in Perl. You know what would be pretty funny is if a rogue group of guerilla programmers hacked Perl style regex match and replace syntax into Python.
Heheheh, boy would that get some bees in some bonnets on the Python dev ML. :)
I'm disappointed that Ruby isn't getting this type of attention. I think that Ruby strikes a good balance by allowing highly readable code like Python, but tolerating denser syntax for when you don't feel like talking to a 3-year-old.
However, it is good that Google is putting its weight behind this project, because anybody who uses anything built on Python (which has weaseled its way into virtually everything nowadays) is going to experience a performance boost.
A programming language is what its developers and users make of it. Python people have worked very, very hard over the last two years to stay ahead while everybody and their uncle were going ga-ga over Rails.
I actually don't have that much experience with Rails. I prefer Ruby over Python because of its flexibility -- Ruby doesn't force you to approach problems in a specific way just because the language designer decided that was the only way to do it.
Let's say your brain groks Ruby better than Python and you feel more productive with its "flexibility". No need to dismiss the hard work of Python developers over the last couple of years as "weaseled".
Maybe this "flexibility" you talk of is not all it's cracked up to be for most programmers. Python flourishes because it too is opinionated in its ways, and a lot of people tend to agree with the set of choices made by the language designer.