Data analysis extension for PostgreSQL

I was trying to build an in-database recommendation system using collaborative filtering, and PostgreSQL was appealing because of its support for array types. But I quickly found myself in need of even basic linear algebra functions. I only needed summation (both in-line and aggregate), scalar multiplication and the dot product. I wrote these in PL/Python just to see if my concept was working (it was!), but, as you can guess, it was quite slow.
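As a rough illustration, the operations listed above look something like this in plain Python. This is only a sketch and not the original PL/Python code: the function names are hypothetical, and inside the database each one would be the body of a PL/Python function operating on array columns.

```python
from typing import Iterable, List, Sequence


def vec_add(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """In-line summation: element-wise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


def vec_sum(rows: Iterable[Sequence[float]]) -> List[float]:
    """Aggregate summation: element-wise sum over many vectors (one per row)."""
    total: List[float] = []
    for row in rows:
        # The first row initialises the running total; later rows are added to it.
        total = list(row) if not total else vec_add(total, row)
    return total


def vec_scale(a: Sequence[float], k: float) -> List[float]:
    """Scalar multiplication: multiply every element by k."""
    return [k * x for x in a]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))
```

Pure-Python loops like these process one element at a time, which is one plausible reason such functions are slow when called for every row.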

A quick search revealed MADlib, an extension that goes well beyond basic linear algebra. It also covers descriptive and inferential statistics, linear and logistic regression, k-means clustering and more.

You can check the code on GitHub, and there is an RPM binary package for CentOS. (I work on Arch Linux, so I just extracted the package with rpmextract and copied the contents into my root filesystem.) After installation, look for the bin/madpack binary, which handles deployment to your database.

published 48 months ago

tags: data-analysis, machine-learning, postgresql, statistics
