PyPy and the future of interpreted languages

How is PyPy compiled, and how does it work?

Right off the bat: what is PyPy? PyPy is Python implemented in Python, meaning its interpreter is itself written in Python. What started as an experimental idea turned into a fast, efficient Python implementation with a JIT compiler, built on a toolchain that can be used to implement other dynamic languages as well.

The interpreter, which implements the Python 2 language specification, is written in RPython ("restricted Python"), a stripped-down subset of Python. The constraints RPython places on objects make their types easy to infer, so the PyPy toolchain can translate an RPython program to C, JVM bytecode or CIL. In the process, it also generates a JIT compiler for the resulting interpreter.
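To make the restriction concrete, here is a minimal sketch, assuming the rules described in the RPython documentation: each variable must hold values of a single type at every point in the program, and lists must be homogeneous. Both functions below run fine on CPython; the comments mark which one RPython's type inference would reject. The `entry_point`/`target` pair follows the convention used in RPython tutorials for a translatable program; the other names are purely illustrative.

```python
# RPython-friendly: `total` and `i` are always ints, and `squares` only ever holds ints.
def sum_of_squares(n):
    squares = []
    i = 0
    while i < n:
        squares.append(i * i)
        i += 1
    total = 0
    for s in squares:
        total += s
    return total


# Valid Python, but not valid RPython: `result` is an int on one path and a
# str on the other, so no single type can be inferred for it.
def describe(n):
    if n > 0:
        result = n
    else:
        result = "non-positive"
    return result


# Translation starts from an entry point that receives argv and returns an exit code.
def entry_point(argv):
    total = sum_of_squares(len(argv))
    return 0 if total >= 0 else 1


# The translator calls `target` to find the entry point of the program.
def target(*args):
    return entry_point, None
```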

PyPy can run most ordinary Python code.

Why PyPy?

Speed

First and foremost: speed. PyPy's website maintains a page, updated with each PyPy release, that lists benchmark results and how much faster PyPy is than CPython on each of them. These gains are more or less expected, since PyPy has a JIT that compiles hot code paths to machine code at runtime. What you might not expect is that PyPy can be even faster than C. This has been demonstrated twice, in edge cases where the dynamic nature of the problem combined with JIT compilation beats standard static compilation: here and here.

Portability

As mentioned earlier, the toolchain that turns PyPy into a JIT-powered C build can also target JVM bytecode or CIL, producing rough equivalents of Jython and IronPython (this process is called translation). For more information, read the PyPy chapter in the second volume of the book The Architecture of Open Source Applications.

Memory optimization

Garbage collection has been improved and remains an active area of development for PyPy; some programs use less memory when running under it.

Sandboxing

PyPy takes a different approach to running untrusted Python code. Rather than restricting the language and standard library features that could harm the system, it reroutes system calls to an external process that enforces the security policy.
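A toy model of that architecture may help. This is a hypothetical sketch, not PyPy's actual sandbox protocol: the class names, the JSON message format and the `read_file` call are made up for illustration. The guest never touches the operating system itself; it serializes every request, and a controller standing in for the external process answers according to its policy.

```python
import json


class Controller:
    """Stands in for the trusted external process that enforces the policy."""

    def __init__(self, virtual_files):
        # The only "files" the guest may read: path -> contents.
        self.virtual_files = virtual_files

    def handle(self, message):
        request = json.loads(message)
        name, args = request["call"], request["args"]
        if name == "read_file" and args and args[0] in self.virtual_files:
            return json.dumps({"ok": True, "result": self.virtual_files[args[0]]})
        # Anything the policy does not explicitly allow is refused.
        return json.dumps({"ok": False, "error": name + " denied by policy"})


class SandboxedGuest:
    """Stands in for the sandboxed interpreter: it forwards every system call."""

    def __init__(self, channel):
        # In a real setup this would be a pipe to another process;
        # here it is simply a function that takes and returns a string.
        self.channel = channel

    def syscall(self, name, *args):
        reply = json.loads(self.channel(json.dumps({"call": name, "args": list(args)})))
        if not reply["ok"]:
            raise PermissionError(reply["error"])
        return reply["result"]
```

With `guest = SandboxedGuest(Controller({"/input.txt": "hello"}).handle)`, reading `/input.txt` returns its virtual contents, while a call such as `guest.syscall("unlink", "/etc/passwd")` raises `PermissionError`, because the policy, not the guest, decides.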

Numpy

A substantial portion of Python users actually live in the worlds of academia and data analysis, doing numerical computations. Numpy was created in that world and spawned a multitude of libraries such as scipy, matplotlib, scikits and pandas, which replaced Matlab and Mathematica, seen as slow and/or expensive.

But anyone who has built some of these libraries knows that compiling them involves Fortran, along with heavily optimized libraries such as BLAS and LAPACK, which are written in Fortran and are de facto standards in numerical computing.

For CPython, there is f2py, a tool shipped with Numpy for creating wrappers around Fortran libraries, and this is how much of the Fortran code in that ecosystem is exposed to CPython. Similarly, f2pypy is a tool that creates PyPy wrappers, but it is still incomplete and has some rough edges.

Numpy remains one of the greatest challenges for the PyPy developers, but a great deal of progress is already under way.

What is cffi?

When you write a Python extension that uses a C library, you have to write C and, additionally, know Cython, SWIG or ctypes. For Python programmers without a proper introduction to C, the learning curve can be steep. And for PyPy, this approach is practically impossible.

Fortunately, cffi came along to remedy this. It is essentially a foreign function interface for calling C from Python. It is not specific to PyPy, so it can be used from CPython as well. You simply call C code from Python: there is no intermediary API to learn and no C glue code to write. The examples in its documentation demonstrate this further.
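A minimal sketch of cffi's ABI mode, assuming a POSIX system where `ffi.dlopen(None)` loads the standard C library: the C prototype is declared as it appears in the header and then called directly, with no extension code written. The `c_strlen` wrapper name is just for illustration.

```python
from cffi import FFI

ffi = FFI()
# Declare the C function we want to call, exactly as in <string.h>.
ffi.cdef("size_t strlen(const char *s);")
# ABI mode: load an already-compiled library at runtime.
# On POSIX, dlopen(None) gives access to the standard C library.
libc = ffi.dlopen(None)


def c_strlen(data):
    """Call C's strlen on a bytes object; cffi converts bytes to `const char *`."""
    return libc.strlen(data)
```

ABI mode is the quickest to try; cffi also offers an out-of-line API mode that compiles a small extension module and is generally faster.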

Sprawl

A couple of months ago, PyPy saw its first release for the ARM platform, so you can now run PyPy on a Raspberry Pi. This opens the ARM door to any language built with the same toolchain as PyPy, and several languages have already been implemented with RPython.

Most notably, there is Topaz, a Ruby implementation written in RPython. Another interesting project, PyPy.js, builds PyPy with JavaScript as the backend. Conversely, lang-js, a GSoC project, is a JavaScript implementation in RPython, though it is no longer in development. As a pet project, the PyPy developers also wrote an incomplete PHP implementation with the RPython translator. And someone even wrote their bachelor thesis on Pyrolog, a Prolog implementation in RPython.


Originally published at blog.gigavoice.com

published: 45 months ago. tags: cffi, jit, numpy, pypy, python


CCC breaks Apple's TouchID

Germany's Chaos Computer Club announced that its biometrics team had successfully bypassed Apple's TouchID biometric login mechanism, arguing that fingerprints are unsuitable for access control.

A quote:

“Biometrics is fundamentally a technology designed for oppression and control, not for securing everyday device access. ... Forcing you to give up your (hopefully long) passcode is much harder under most jurisdictions than just casually swiping your phone over your handcuffed hands.”

Chaos Computer Club breaks Apple TouchID
published: 45 months ago. tags: apple, biometrics, ccc, hacking, touchid


PostgreSQL 9.3 preview

A release candidate for the newest version of Postgres was shipped this week. For me personally, this release is less of a leap than 9.0, 9.1 and 9.2, whose new features marked a new era not just for Postgres but for relational databases in general, especially amid the NoSQL hype. Heck, some Postgres features, like LISTEN/NOTIFY for publishing notifications on events, are still missing from most of the so-called high-performance key-value and document datastores.

What I admire most about the Postgres community is its ability to work on many different features rather than focusing only on what some BDFL thinks is an important goal. As always, the changes in the new version span all fronts: SQL semantics, special datatype features, views, replication and administration.

JSON

Version 9.2 added the JSON datatype, which was seen as a big boost for using Postgres with schema-less data. Of course, we could always just dump serialized JSON into a text field, but this datatype also takes on the job of keeping the values valid JSON, and it opens new possibilities for building functions around them.

Version 9.3 delivers just that: new JSON functions and operators, so instead of writing JSON support functions in your favourite procedural language (perl, python, java, javascript, tcl, ruby, R, php, scheme), you get them natively. Besides extracting specific parts of a value by index, key or complete path, you can also expand the outermost items into a set of rows, convert a row to JSON and vice versa, and more. Because the operators address a given key or path, you can build an index on exactly that key or path (remember: Postgres has expression indexes).
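To illustrate what path extraction and row expansion do, here is a rough Python analogue of the `#>` operator (extract by path) and of `json_each` (one row per top-level key). It models the semantics only, assuming that a missing key or an unusable index yields NULL (`None` here); the helper names are hypothetical.

```python
import json


def json_get_path(doc, path):
    """Rough analogue of `data #> '{user,tags,1}'`: follow keys and array indices."""
    current = doc
    for step in path:
        if isinstance(current, dict):
            if step not in current:
                return None  # a missing key yields NULL
            current = current[step]
        elif isinstance(current, list):
            try:
                index = int(step)  # path elements are text, e.g. '1'
            except ValueError:
                return None
            if not 0 <= index < len(current):
                return None  # this sketch treats negative or out-of-range indexes as NULL
            current = current[index]
        else:
            return None  # cannot descend into a scalar
    return current


def json_each(doc):
    """Rough analogue of json_each(): one (key, value) row per top-level key."""
    return list(doc.items())


doc = json.loads('{"user": {"name": "ana", "tags": ["admin", "dev"]}}')
```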

Note: I was once asked whether one can update a single key in a JSON field. The answer is no: any update must serialize and pass the complete new JSON value. If a field changes frequently, move it out of the JSON field into its own column; for anything wilder, I recommend datastores like Redis or Cassandra.
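In practice this means a read-modify-write in the application: parse the stored value, change the key, and send back the whole re-serialized document. A minimal sketch of that step; `replace_key` is a hypothetical helper, and its result would be passed as the parameter of an ordinary `UPDATE ... SET data = ...` statement.

```python
import json


def replace_key(json_text, key, value):
    """Read-modify-write: the whole document is parsed and re-serialized."""
    doc = json.loads(json_text)
    doc[key] = value
    # The returned string is the complete new value; in 9.3 there is no way
    # to send only the changed key.
    return json.dumps(doc)
```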

SQL Semantics

LATERAL JOIN. This one has a rather esoteric use. Put simply, you join a table with a subquery, and the subquery refers to columns of that same table. Of course, you can achieve the same result by writing more complex subqueries, but people also lived without WINDOW and WITH for a long time, and now they use them without complaint.
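A small in-memory emulation makes the semantics clearer. The tables and columns are hypothetical: the code runs the "subquery" once per outer row and lets it see that row, which is exactly what LATERAL adds over an ordinary subquery in FROM. The equivalent SQL, fetching each customer's two most recent orders, is in the comment.

```python
# Equivalent SQL (for reference):
#   SELECT c.name, o.id
#   FROM customers c,
#        LATERAL (SELECT id FROM orders
#                 WHERE orders.customer_id = c.id
#                 ORDER BY created DESC LIMIT 2) o;

customers = [{"id": 1, "name": "ana"}, {"id": 2, "name": "ben"}, {"id": 3, "name": "cid"}]
orders = [
    {"id": 10, "customer_id": 1, "created": 3},
    {"id": 11, "customer_id": 1, "created": 5},
    {"id": 12, "customer_id": 1, "created": 1},
    {"id": 13, "customer_id": 2, "created": 4},
]


def latest_orders(customer, limit):
    """The subquery: it may reference the current outer row, which LATERAL permits."""
    matching = [o for o in orders if o["customer_id"] == customer["id"]]
    matching.sort(key=lambda o: o["created"], reverse=True)
    return matching[:limit]


def lateral_join(limit=2):
    """Re-evaluate the subquery for every outer row and pair its results with that row."""
    return [(c["name"], o["id"]) for c in customers for o in latest_orders(c, limit)]
```

As with the comma (cross join) form in SQL, a customer without orders ("cid") drops out of the result; `LEFT JOIN LATERAL (...) o ON true` would keep such rows, padded with NULLs.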

Administration and replication

Parallel pg_dump. Given the -j (jobs) parameter, it dumps several tables at once in separate processes. Parallel dumps require the directory output format (-Fd).
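A small sketch that assembles the command line for a parallel dump. The helper name, database name and paths are hypothetical; `--format=directory`, `--jobs` and `--file` are the long forms of pg_dump's `-Fd`, `-j` and `-f` options. Actually running it would mean passing the list to `subprocess.run(..., check=True)` on a machine with pg_dump installed.

```python
def parallel_dump_command(dbname, outdir, jobs):
    """Build a pg_dump invocation that dumps `jobs` tables at a time."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return [
        "pg_dump",
        "--format=directory",  # -Fd: the only format that supports parallel dumps
        "--jobs=%d" % jobs,    # -j: number of tables dumped simultaneously
        "--file=" + outdir,    # -f: with the directory format, the output directory
        dbname,
    ]
```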

Shared memory. I like this one a lot, since every time I install Postgres I have to raise the shared_buffers parameter and also tweak kernel parameters such as SHMMAX. Version 9.3 switched to mmap for most of its shared memory, so you no longer need to touch those kernel settings.

Replication has also been improved: re-mastering (one replica taking over as the master) is now possible in streaming-only mode. Further details on the replication improvements are still to be announced.

Triggers

The DDL workflow is apparently another active field for the Postgres community, and this version adds triggers on DDL (event triggers). There are only three events, ddl_command_start, ddl_command_end and sql_drop, but the trigger function has access to an implicit variable, tg_tag, which holds the name of the DDL command issued (for example, CREATE TABLE).
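The control flow can be modelled in a few lines of Python. This is a hypothetical emulation, not Postgres code: real event trigger functions are written in a procedural language such as PL/pgSQL and attached with `CREATE EVENT TRIGGER`. The idea is the same, though: a `ddl_command_start` handler inspects the tag and raises an error to abort the command, and a `ddl_command_end` handler can announce the change (the list below stands in for NOTIFY).

```python
BLOCKED_TAGS = {"DROP TABLE"}  # hypothetical policy
notifications = []  # stands in for NOTIFY payloads


def block_dangerous_ddl(tg_tag):
    """ddl_command_start handler: raising an error aborts the DDL command."""
    if tg_tag in BLOCKED_TAGS:
        raise PermissionError(tg_tag + " is not allowed")


def announce_ddl(tg_tag):
    """ddl_command_end handler: tell listeners (e.g. a build system) what changed."""
    notifications.append(("ddl_changes", tg_tag))


EVENT_TRIGGERS = {
    "ddl_command_start": [block_dangerous_ddl],
    "ddl_command_end": [announce_ddl],
}


def run_ddl(tg_tag):
    """Fire the handlers around a (pretend) DDL command identified by its tag."""
    for handler in EVENT_TRIGGERS["ddl_command_start"]:
        handler(tg_tag)
    # ... the DDL command itself would run here ...
    for handler in EVENT_TRIGGERS["ddl_command_end"]:
        handler(tg_tag)
```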

This can be very useful for blocking certain DDL commands, or for attaching additional behaviour to any change in the structure of the database. Combined with the NOTIFY command, it can also give build systems more insight when debugging.

Views

Work on views is constant. This version's big feature is materialized views. Basically, they store the result in a physical table instead of running the query on each access; however, the data can become stale, because the user has to issue a refresh manually. I would say the feature still has some way to go before it works as intended.
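A toy model of why the data goes stale. `MaterializedView` is a hypothetical class, not a Postgres API: the query runs once at creation, reads return the stored snapshot, and only an explicit refresh, the analogue of `REFRESH MATERIALIZED VIEW`, picks up changes in the underlying data.

```python
class MaterializedView:
    """Toy model: the query result is stored, not recomputed on each read."""

    def __init__(self, query):
        self.query = query
        self.rows = query()  # CREATE MATERIALIZED VIEW runs the query once

    def select(self):
        return list(self.rows)  # reads only the stored snapshot

    def refresh(self):
        self.rows = self.query()  # REFRESH MATERIALIZED VIEW re-runs the query


sales = [100, 250]
totals = MaterializedView(lambda: [sum(sales)])
```

After `sales.append(50)`, `totals.select()` still returns the old total until `totals.refresh()` is called.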

In another esoteric corner of use cases, you can now create recursive views.
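The classic use is walking a hierarchy. Below is a sketch under assumed data (a hypothetical employees table mapping each employee to a manager): the anchor row comes first, then each iteration adds the rows whose manager appeared in the previous iteration, until nothing new is produced. The SQL in the comment shows the corresponding view definition; like it, the sketch assumes the hierarchy has no cycles.

```python
# Equivalent SQL (for reference):
#   CREATE RECURSIVE VIEW subordinates (id) AS
#       SELECT id FROM employees WHERE id = 1
#     UNION ALL
#       SELECT e.id FROM employees e JOIN subordinates s ON e.manager_id = s.id;

employees = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}  # employee id -> manager id


def subordinates_of(root):
    """Anchor row first, then repeatedly add rows whose manager is in the previous level."""
    result = []
    level = [root]
    while level:  # stops once an iteration produces no new rows
        result.extend(level)
        level = [emp for emp, manager in employees.items() if manager in level]
    return result
```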

Finally, for those who use views extensively and know that the only way to make a view writable is to rig it with insert, update and delete triggers, this version brings updatable views. They only work for simple single-table views, but they can save us a lot of boilerplate triggers.

External data

Foreign data wrappers were added in 9.1, but only now are they really worth using, since in the new version they are no longer read-only but writable. The catch, however, is that the foreign data wrapper feature is basically an API, and each type of foreign datastore needs its own driver. There are quite a lot of wrappers already (including ones for the NoSQL datastores CouchDB, MongoDB, Redis and even Neo4j), but apart from the Postgres wrapper, the only one that supports writes is the Redis wrapper.

Postgres continues to move forward

I know it seems like a lot of new stuff, but apart from the new JSON features, I would say this release brought less than its predecessors. Still, it shows the Postgres community's dedication to adapting our favorite database to new times, which demand faster retrieval, flexible data structures and scalability. That commitment can also be seen in the plans for 9.4, which include background processes, MVCC improvements, partial aggregates and much more.

Originally published at blog.gigavoice.com
published: 46 months ago. tags: postgresql, rdbms


Data analysis extension for PostgreSQL

I was trying to build an in-database recommendation system using collaborative filtering, and PostgreSQL was appealing because of its support for array types. But I quickly found myself needing basic linear algebra functions, even though I only needed summation (both within a row and as an aggregate), scalar multiplication and the dot product. I wrote these in PL/Python just to see whether my concept worked (it did!), but, as you can guess, it was quite slow.
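The operations themselves are simple. Here they are as plain Python functions of the kind one might wrap in PL/Python, where Postgres arrays arrive as Python lists. These are illustrative helpers, not the original code from this experiment; `vec_sum_step` is written as the state-transition function that an aggregate defined with `CREATE AGGREGATE` would call once per row, and `functools.reduce` plays the role of the aggregate at the end.

```python
from functools import reduce


def vec_add(a, b):
    """Element-wise sum of two equally long arrays."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return [x + y for x, y in zip(a, b)]


def vec_scale(a, k):
    """Multiply every element by the scalar k."""
    return [k * x for x in a]


def dot(a, b):
    """Dot product of two equally long arrays."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return sum(x * y for x, y in zip(a, b))


def vec_sum_step(state, value):
    """Aggregate transition function: the state starts as NULL (None)."""
    return list(value) if state is None else vec_add(state, value)


ratings = [[1, 0, 2], [3, 1, 0], [0, 2, 2]]
column_totals = reduce(vec_sum_step, ratings, None)
```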

A quick search revealed MADlib, an extension that can do a lot more than basic linear algebra. It also does descriptive and inferential statistics, linear and logistic regression, k-means clustering and much else.

You can find the code on GitHub, and there is an RPM binary package for CentOS. (I work on Arch Linux, so I just extracted the package with rpmextract and copied it to my root.) After installation, use the bin/madpack utility to deploy MADlib to your database.
published: 48 months ago. tags: data-analysis, machine-learning, postgresql, statistics


Moved to archlinux

So I finally said goodbye to (K)ubuntu. It was a long ride, but lately I didn't like the direction it was heading: pushing Unity and ads are just symptoms of a bigger political problem. Anyway, I decided on Arch Linux, a distribution focused on technical simplicity rather than wide-net usability.

It works incredibly well: it installs only the packages you need and not a single library more. I use KDE, and I was very surprised to see how well it integrates with the rest of the system, even better than on Kubuntu! I am a big fan of apt-based packaging, and I must say that the pacman + AUR combination is at least on par.

Combined with its abundant wiki pages, which probably contain a solution for every common problem you may encounter, I would say this makes Arch a sound choice of Linux distribution for any advanced user.

Download archlinux
Browse through the wiki
published: 57 months ago. tags: apt, archlinux, kde, pacman, ubuntu


Python 3.3 released

The long-awaited Python 3.3 is finally here! It brings quite a lot of improvements and additions:
  • explicit unicode literals - revived from the Python 2 series to make porting from Python 2 to Python 3 easier [PEP 414]
  • yield from expression - delegation to a subgenerator, which eases writing coroutines [PEP 380]
  • ChainMap collection - in the collections module, of course; see docs
  • ipaddress module - a wonderful addition for working with IP addresses and networks, with support for IPv6 as well, see docs
... and many more; see the release notes. There is also a healthy discussion on Hacker News.
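A quick taste of the four features listed above, assuming Python 3.3 or later; all names are only for illustration.

```python
import ipaddress
from collections import ChainMap

# PEP 414: the u prefix is accepted again, so Python 2 style literals still work.
greeting = u"hello"


# PEP 380: `yield from` delegates to a subgenerator and receives its return value.
def inner():
    yield 1
    yield 2
    return "inner done"


def outer():
    result = yield from inner()
    yield result


# ChainMap: look keys up in several mappings, the first mapping winning.
defaults = {"host": "localhost", "port": 5432}
overrides = {"port": 6543}
settings = ChainMap(overrides, defaults)

# ipaddress: addresses and networks, IPv6 included.
network = ipaddress.ip_network("2001:db8::/126")
```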
published: 57 months ago. tags: python, python3


Dropbox integration in KDE Dolphin via service menu

One common criticism of both KDE and Dropbox is that they can't be integrated. That's not true at all: you can configure everything in KDE, including the service menu in Dolphin (the context pop-up menu).

You can download the service menu here, along with installation instructions. I find the shortcut for copying the public URL very handy.
published: 57 months ago. tags: dolphin, dropbox, kde, service-menu


Extensions and options for the GCC compiler

A wonderful article that has aggregated some of the most useful options and extensions of the GCC compiler. It covers:
  • Optimization
  • Compiler warning options
  • GCC-specific extensions to C
  • Standards compliance
  • Options for debugging
  • Runtime checks
  • Additional datatypes
  • Ranges in switch/case
  • Binary literals
... and many more. A must-read for any aspiring C developer.

The most useful GCC options and extensions
published: 58 months ago. tags: c, compilers, gcc

