How PyPy is compiled and how does it work?
Right off the bat: What is PyPy? PyPy is Python implemented in Python. This means that the interpreter is written in Python, but what started as an experimental idea turned into a fast, efficient Python implementation with a JIT compiler that has the potential to implement other dynamic interpreted languages as well.
Implementation of the Python 2 specification is done in RPython – the so-called restricted Python, which is a stripped down version. The constraints that RPython has on objects make it easier to infer their types. The PyPy tool chain then has an easy time translating the RPython program to C, JVM bytecode or CIL. It also builds a JIT compiler in the resulting interpreter.
PyPy is capable of running any common Python code.
First and foremost – speed. PyPy's website maintains a page that's updated at each PyPy version which contains the accomplishments – various benchmarks and how much faster it is than those compared with CPython. These are kinda assumed, since PyPy has a JIT that compiles code bits to machine code at runtime, but what you can't assume is that PyPy can be even faster than C. This has been proven twice in edge cases where the dynamic nature of the problem combined with JIT compilation can be used to improve speed compared with a standard static compilation: here and here.
We mentioned this earlier, but the toolchain used to build PyPy to a JIT-powered C build can also build the same to JVM bytecode or CIL flavors, which are the rough equivalent of Jython and IronPython (this process is often called translation). For more information, read the chapter in the second volume of the book The Architecture of Open Source Applications.
The garbage collection has been improved, and it is still a field of development for PyPy, some programs might take less memory when working under it.
A different approach to running untrusted Python code. Rather than limiting language and standard library features that can possibly harm the system, PyPy reroutes the system calls to an external process handling the safe policy.
A substantial portion of Python users actually live in the worlds of academia and data analyis doing numerical computations. Numpy was created in that world and spawned a multitude of libraries like scipy, matplotlib, scikits and pandas which replaced Matlab and Mathematica as slow and/or expensive.
But…anyone who has used some of these libraries knows that compiling Numpy requires the Fortran language, as well as the immensely optimized libraries such as blas and lapack, written in Fortran which are de facto standards in numerical computations.
For CPython, there is a tool f2py for creating wrappers around Fortran libraries, and this is how the CPython's Numpy is created. Similarly, a f2pypy is a tool that creates PyPy wrappers, but it's still not complete and has some rough edges.
Numpy remains one of the greatest challenges for the PyPy developers, but there is a great progress already on the way.
What is cffi?
When you write extensions for Python that uses some kind of C library, you need to write using C and additionally you need to know Cython, SWIG or ctypes. For Python programmers not properly introduced to C, this can have great learning curve. But for PyPy, this is practically impossible.
Fortunately, cffi came to relieve this situation. It is basically a Python interface for foreign functions in C. It is not a part of PyPy, meaning that cffi can be used from CPython as well. The user simply calls C code from Python, there is no need to learn specific mediator APIs, and there is no C code that needs to be written. The examples can provide an additional demonstration.
A couple of months ago, PyPy saw its first release on the ARM platform. So you can run PyPy on Raspberry Pi. This leaves the ARM door open for any language optimal enough as PyPy, and there are some languages already implemented with the RPython toolchain.
Originaly published at blog.gigavoice.com