PyPy and the future of interpreted languages

How is PyPy compiled and how does it work?

Right off the bat: what is PyPy? PyPy is Python implemented in Python, meaning that the interpreter itself is written in Python. What started as an experimental idea has turned into a fast, efficient Python implementation with a JIT compiler and the potential to host implementations of other dynamic interpreted languages as well.

The Python 2 specification is implemented in RPython, the so-called restricted Python, which is a stripped-down subset of the language. The constraints that RPython places on objects make it easier to infer their types. The PyPy toolchain then has an easy time translating the RPython program to C, JVM bytecode or CIL. It also builds a JIT compiler into the resulting interpreter.
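
To illustrate the kind of constraint meant here, the sketch below contrasts a function whose variables keep a single type with one that mixes types. Both run on ordinary Python. The function names are made up for this illustration, and the comments paraphrase the general RPython rule that a variable should hold values of one type at any given point in the program; they do not describe the toolchain's exact behaviour or error messages.

```python
def sum_of_squares(n):
    # 'total' and 'i' are always ints, so a type inferencer can
    # assign each variable exactly one type.
    total = 0
    for i in range(n):
        total += i * i
    return total


def describe(n):
    # 'result' is an int on one path and a str on the other. This is
    # valid Python, but it is the style of code a restricted subset
    # rules out, because no single type can be inferred for 'result'.
    if n % 2 == 0:
        result = n
    else:
        result = "odd"
    return result
```

The first function is the style a restricted, type-inferable subset favours; the second shows the flexibility that has to be given up in exchange.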

PyPy is capable of running any common Python code.

Why PyPy?

Speed

First and foremost: speed. PyPy's website maintains a page, updated with each PyPy release, that lists various benchmarks and shows how much faster PyPy runs them than CPython. A speedup over CPython is more or less expected, since PyPy has a JIT that compiles bits of code to machine code at runtime. What you would not expect is that PyPy can be even faster than C. This has been shown twice, in edge cases where the dynamic nature of the problem combined with JIT compilation can beat standard static compilation: here and here.
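
A minimal sketch of the kind of code where a JIT tends to help: a tight loop written in pure Python. The function name and the loop size are arbitrary choices for this illustration. Run the same file under CPython and under PyPy and compare the printed timings; no numbers are claimed here, since they depend on the machine and the interpreter versions.

```python
import time


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately naive)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


if __name__ == "__main__":
    start = time.perf_counter()
    result = count_primes(200000)
    elapsed = time.perf_counter() - start
    print("primes found:", result, "in", round(elapsed, 3), "seconds")
```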

Portability

As mentioned earlier, the toolchain that builds PyPy into a JIT-powered C build can also build it in JVM bytecode or CIL flavors, which are rough equivalents of Jython and IronPython (this build process is often called translation). For more information, read the chapter in the second volume of the book The Architecture of Open Source Applications.

Memory optimization

Garbage collection has been improved and is still an active area of development for PyPy; some programs might use less memory when running under it.

Sandboxing

PyPy takes a different approach to running untrusted Python code. Rather than limiting the language and standard library features that could harm the system, PyPy reroutes system calls to an external process that enforces the safety policy.

Numpy

A substantial portion of Python users live in the worlds of academia and data analysis, doing numerical computations. Numpy was created in that world and spawned a multitude of libraries like scipy, matplotlib, scikits and pandas, which replaced Matlab and Mathematica, regarded as slow and/or expensive.

But anyone who has used some of these libraries knows that compiling Numpy requires Fortran, as well as heavily optimized libraries such as BLAS and LAPACK, which are written in Fortran and are de facto standards in numerical computation.

For CPython, there is a tool called f2py for creating wrappers around Fortran libraries, and this is how CPython's Numpy is built. Similarly, f2pypy is a tool that creates PyPy wrappers, but it is not yet complete and has some rough edges.

Numpy remains one of the greatest challenges for the PyPy developers, but great progress is already under way.

What is cffi?

When you write a Python extension that uses some kind of C library, you need to write C, and you additionally need to know Cython, SWIG or ctypes. For Python programmers not properly introduced to C, this can mean a steep learning curve. And on PyPy, this is practically impossible.

Fortunately, cffi came to relieve this situation. It is basically a Python interface to foreign functions written in C. It is not a part of PyPy, meaning that cffi can be used from CPython as well. The user simply calls C code from Python: there is no need to learn a specific mediator API, and no C code needs to be written. The examples can provide an additional demonstration.
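
As a rough sketch of what calling C through cffi looks like, the snippet below declares the standard C function strlen and calls it from Python. It assumes the cffi package is installed and a POSIX system where ffi.dlopen(None) gives access to the C library (this does not work on Windows). The wrapper name c_strlen is made up for this illustration.

```python
from cffi import FFI

ffi = FFI()

# Declare the C function we want to call, using plain C syntax.
ffi.cdef("size_t strlen(const char *s);")

# Assumption: on POSIX systems dlopen(None) exposes the standard C library.
libc = ffi.dlopen(None)


def c_strlen(text):
    """Return the byte length of `text` as computed by C's strlen."""
    return int(libc.strlen(text.encode("utf-8")))


if __name__ == "__main__":
    print(c_strlen("hello from cffi"))
```

The only C that appears is the function declaration passed to cdef; the call itself is ordinary Python.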

Sprawl

A couple of months ago, PyPy saw its first release on the ARM platform, so you can run PyPy on a Raspberry Pi. This leaves the ARM door open for any language implementation as well optimized as PyPy, and some languages have already been implemented with the RPython toolchain.

Most notably, there is Topaz, a PyPy-like implementation of Ruby. PyPy.js is an interesting build of PyPy with JavaScript as the backend. Conversely, lang-js is a JavaScript implementation in Python that was done as a GSoC project, but it is not currently under development. As a pet project, the people from PyPy also wrote an incomplete PHP implementation with the RPython translator. And someone did their bachelor thesis on Pyrolog, a Prolog implementation in RPython.


Originally published at blog.gigavoice.com

published: 44 months ago. tags: cffi, jit, numpy, pypy, python


Python 3.3 released

The long-awaited Python 3.3 is finally here! There are quite a lot of improvements and additions:
  • explicit unicode literals - revived from the Python 2 series, with the intent of making porting from Python 2 to Python 3 easier. [PEP 414]
  • yield from expression - something that eases the writing of coroutines [PEP 380]
  • ChainMap collection - in the collections module of course, see docs
  • ipaddress module - a wonderful addition for working with IP addresses and networks, with support for IPv6 as well, see docs
... and many more, see the release report. Also, there is a healthy discussion on Hacker News.
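
A short sketch that touches each of the additions listed above. It needs Python 3.3 or newer, and the function and variable names are made up for this illustration.

```python
from collections import ChainMap
import ipaddress

# The u"" prefix is accepted again (PEP 414); it is the same str type.
greeting = u"hello"


def flatten(nested):
    """Yield every item of every inner iterable using `yield from` (PEP 380)."""
    for inner in nested:
        yield from inner


# ChainMap looks a key up in each mapping in order; the first hit wins.
defaults = {"color": "blue", "size": "medium"}
overrides = {"size": "large"}
settings = ChainMap(overrides, defaults)

# ipaddress handles both IPv4 and IPv6 addresses and networks.
network = ipaddress.ip_network("192.168.0.0/24")
address = ipaddress.ip_address("192.168.0.42")

if __name__ == "__main__":
    print(greeting == "hello")
    print(list(flatten([[1, 2], [3], [4, 5]])))
    print(settings["size"], settings["color"])
    print(address in network)
    print(ipaddress.ip_address("2001:db8::1").version)
```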
published: 56 months ago. tags: python, python3


Curiosity rover's software

Well, that was an astonishing achievement! All of humanity is so proud. Here is a programmers @ stackexchange thread with some information about what the rover's software was written in (supposedly embedded C); it also says that the tests were written in Python.
published: 58 months ago. tags: c, curiosity, python, stackexchange


Some performance tips when dealing with data in python

One of the main obstacles to Python achieving domination in the machine learning / data mining field is probably the talk of it not being efficient enough. There is, however, a way of achieving better performance if you're careful enough. Below are some excellent suggestions. Some of them I have personally tried (and learned the hard way), such as using namedtuples instead of classes, or parsing CSV with int/float instead of the csv parser from the standard library, as well as some of numpy's more obscure routines for searching in arrays. Also, once you get used to profiling, you easily become addicted.

Expensive lessons in Python performance tuning
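
Two of the tips mentioned above, namedtuples instead of classes and parsing with int/float instead of the csv module, can be sketched roughly as follows. The record layout and all names are assumptions made for this illustration, and the hand-rolled parser only suits simple files without quoting or embedded commas.

```python
from collections import namedtuple

# A namedtuple is a lightweight alternative to a small data-holding class.
Reading = namedtuple("Reading", ["sensor_id", "value"])


def parse_readings(lines):
    """Parse 'sensor_id,value' lines with str.split and int/float directly."""
    readings = []
    for line in lines:
        sensor_id, value = line.strip().split(",")
        readings.append(Reading(int(sensor_id), float(value)))
    return readings


if __name__ == "__main__":
    sample = ["1,0.5", "2,1.25", "3,7.0"]
    for reading in parse_readings(sample):
        print(reading.sensor_id, reading.value)
```

Whether this actually beats the csv module on a given data set is something to measure with a profiler rather than assume.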
published: 58 months ago. tags: data-mining, machine-learning, optimization, pandas, python, scipy


python notes