Some performance tips when dealing with data in Python

One of the main obstacles to Python dominating the machine learning / data mining field is probably its reputation for not being efficient enough. There are, however, ways of achieving better performance if you're careful. The article linked below has some excellent suggestions, some of which I have personally tried (and learned the hard way), such as using namedtuples instead of classes, or parsing CSV with int/float directly instead of the csv parser from the standard library, as well as some of numpy's more obscure routines for searching in arrays. Also, once you get used to profiling, you easily become addicted.
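To make a few of these ideas concrete, here is a minimal sketch using only the standard library. It is not taken from the linked article. The record type `Reading`, its fields `sensor_id` and `value`, and the helpers `parse_line`, `parse_lines` and `profile_call` are all hypothetical names made up for illustration. The hand-rolled parser assumes a very simple file (two columns, no quoting, no embedded commas), and whether any of this is actually faster depends on your data and Python version, which is exactly why the profiling helper is there.

```python
import cProfile
import io
import pstats
from collections import namedtuple

# Hypothetical record type: a lightweight, immutable alternative to a
# hand-written class. The field names are invented for this example.
Reading = namedtuple("Reading", ["sensor_id", "value"])


def parse_line(line):
    """Parse one 'int,float' line by hand instead of using csv.reader.

    Assumption: no quoting and no embedded commas, so a plain split is safe.
    """
    sensor_id, value = line.rstrip("\n").split(",")
    return Reading(int(sensor_id), float(value))


def parse_lines(lines):
    """Parse an iterable of lines into a list of Reading tuples."""
    return [parse_line(line) for line in lines]


def profile_call(func, *args):
    """Run func(*args) under cProfile.

    Returns the function's result together with a text report of the
    five entries with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()
```

A typical use would be `rows, report = profile_call(parse_lines, open_file)` followed by `print(report)`, then repeating the same measurement with a `csv.reader` based version to see which one wins on your own data.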

Expensive lessons in Python performance tuning
published: 56 months ago. tags: data-mining, machine-learning, optimization, pandas, python, scipy


optimization notes