This post is about the Python ecosystem for scientific/technical computing. Generally, when someone says they are using Python for technical computing, we should read it as the “Python ecosystem for scientific/technical computing”. Vanilla Python, a versatile general-purpose language, was not designed for technical computing (linear algebra, symbolic computing, vectorized operations, etc.) and is not well suited to it on its own. However, the language provided just the right set of tools, and a framework within which scientists and engineers could easily implement their ideas. Python was quickly embraced by the scientific community, which built several Python packages that are well suited to technical computing. Currently there are hundreds of different Python-based libraries. This post is meant to be a basic introduction to a core set of scientific packages in the Python ecosystem, for someone new to Python (though I highly doubt I have any such audience).
Python is a very powerful language for doing all sorts of things at all stages of research: general computing, system programming, design of experiments, building device interfaces, connecting and controlling multiple hardware/software tools, heavy scientific number crunching, data analysis and visualization. Python is an interpreted, general-purpose, object-oriented, high-level scripting language that supports multiple programming paradigms: procedural, object-oriented, and functional. The core design principles of Python are simplicity, code readability, and expressiveness.
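To make the "multiple paradigms" point concrete, here is a minimal sketch that computes the same sum of squares in procedural, object-oriented, and functional style. The names (values, SquareAccumulator, and so on) are made up for illustration only.

```python
from functools import reduce

values = [1, 2, 3, 4]

# Procedural style: an explicit loop that updates a running total.
total_procedural = 0
for v in values:
    total_procedural += v * v


# Object-oriented style: the running total lives inside an object.
class SquareAccumulator:
    """Hypothetical helper class, used only for this illustration."""

    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value * value
        return self


acc = SquareAccumulator()
for v in values:
    acc.add(v)
total_oo = acc.total

# Functional style: fold the list with reduce, no mutable state in sight.
total_functional = reduce(lambda running, v: running + v * v, values, 0)

print(total_procedural, total_oo, total_functional)
```

All three approaches give the same answer; which one reads best depends on the problem at hand.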
Python is easy to learn. It is intuitive and simple, yet it is powerful, beautiful and expressive.
What makes Python particularly attractive for scientists and engineers is that it is open-source, highly portable, intuitive to use, and dynamically yet strongly typed. Like MATLAB, it provides both interactive and script-based programming environments. It also features automatic memory management and garbage collection, enabling scientists and engineers (with or without a strong computer science background) to direct their time and energy at their algorithms and let the interpreter handle the low-level details. All of these qualities, together with strong support from a large scientific community, allow greater opportunity for code-sharing and open, collaborative research, and thus support the philosophy of reproducible research.
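For readers wondering what "dynamic and strong typing" means in practice, here is a small sketch: a name can be rebound to a value of a different type at run time (dynamic), but Python will not silently convert between unrelated types (strong). The variable names are illustrative only.

```python
# Dynamic typing: the same name can refer to values of different types.
x = 42
kind_before = type(x).__name__
x = "forty-two"
kind_after = type(x).__name__

# Strong typing: mixing a string and an integer raises an error
# instead of guessing a conversion.
try:
    result = "1" + 1
except TypeError as err:
    result = None
    message = str(err)

# The conversion has to be spelled out explicitly.
explicit = int("1") + 1

print(kind_before, kind_after, result, explicit)
```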
Python itself is a full-featured programming language with a large set of tools in the standard library (surely you have heard “batteries included”!). What is even more attractive is that there is a whole technical computing ecosystem around Python, built by different scientific communities. Most of the tools are built on top of NumPy, which itself is built on top of Python. NumPy extends Python with capabilities such as vectorization, homogeneous multi-dimensional arrays, fast element-wise operations, broadcasting and universal functions that are essential for scientific/technical computing. The figure below shows a basic landscape of the scientific Python ecosystem. Please note that it is not meant to be complete, but rather a very basic reference to the most common tools around Python for scientific computing.
The ease and interactivity of the language, coupled with good community support and specialized scientific libraries, enable a newcomer to quickly learn and do meaningful work using Python. The interested reader can read more about the advantages of Python and how it compares to other languages here.
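As a rough illustration of the NumPy capabilities named above (homogeneous arrays, vectorized element-wise operations, broadcasting and universal functions), here is a minimal sketch. It assumes NumPy is installed, and the array contents and variable names are arbitrary.

```python
import numpy as np

# A homogeneous, two-dimensional array: every element is a float64.
a = np.arange(6, dtype=float).reshape(2, 3)

# Vectorized element-wise operation: no explicit Python loop needed.
scaled = a * 2.0

# The same computation with plain Python lists, for comparison.
doubled_loop = [[2.0 * v for v in row] for row in a.tolist()]

# Broadcasting: the column means have shape (3,), and NumPy stretches them
# across both rows of the (2, 3) array during the subtraction.
col_means = a.mean(axis=0)
centered = a - col_means

# Universal function (ufunc): np.sqrt is applied to every element.
roots = np.sqrt(a)

print(scaled)
print(centered)
print(roots)
```

The list-comprehension version gives the same numbers as the vectorized one, but on large arrays the NumPy form is typically much faster and easier to read.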
When I started learning Python I collected a number of resources related to the use of Python for scientific and numerical computation. I think these resources may help someone who is just getting started with Python for scientific computing, so I have decided to share the list here in this blog post.
Before listing the resources, I would recommend that someone new to Python download one of the free distribution packages (especially if you are using Windows), such as Anaconda, Enthought Python Distribution (EPD, now known as Canopy), Python(x,y) and Pyzo. Also, for Windows systems there is WinPython, which is great because multiple instances of WinPython can be used without interfering with each other. Anaconda from Continuum Analytics also has this capability. Additionally, WinPython may be used as a portable environment. These distributions come pre-loaded with all the common packages and toolboxes required for scientific computing, IPython (an advanced Python shell and highly interactive editor), and plotting libraries like Matplotlib (a powerful 2D plotting library with limited 3D plotting capability) and Mayavi (a 3D plotting library that uses VTK underneath), so one can get started quickly without having to worry about which packages to install and how.
For someone interested in using an integrated development environment, I can highly recommend the free and open-source IDE called Spyder. It is built specifically for scientific computing with Python. The EPD package comes with the Scite IDE and not the Spyder IDE. Although Scite is great, I like Spyder more. One can install Spyder separately alongside the EPD package. The new Enthought Canopy comes with the Canopy IDE, which is very similar to MATLAB’s IDE. (Personally, I use the IPython notebook 80% of the time for developing initial code, as my work involves a lot of experimentation. The few times I need an IDE, I use Spyder. Other times, I use the Sublime Text editor. I use the EPD, Anaconda and WinPython distributions on my different machines.)
In addition to the above, there is Christoph Gohlke’s Unofficial Windows Binaries for Python Extension Packages. That is an excellent source of packages. Hats off to him for maintaining such a large repository. (For example, if you want to install Numba on top of the EPD/Canopy distribution, you will need these Numba binaries.)
Finally, without any further holdup, let me list the resources that I have collected:
General python resources (not oriented towards scientific or numeric computing):
- A Byte of Python by Swaroop C H (Basic, highly rated) [Updated: 08/03/2015]
- How to Think Like a Computer Scientist – Learning with Python (Complete interactive online textbook)
- The Hitchhiker’s Guide to Python! (Intermediate) [Updated: 08/03/2015]
- For getting started, the “fundamental” tagged video lectures from Marakana are great. [Updated: 08/03/2015]
- Online Python Tutor is a great way to understand the guts of Python by actually “seeing” how python programs get executed.
- pyvideo.org has a huge index of Python-related videos on the internet (links to the videos, not the videos themselves), including most of the Python conferences.
- Doug Hellmann’s Python Module of the Week (PyMOTW) series is a tour of the Python standard library using short examples. This resource is really useful to quickly understand and put to use any of the modules in the standard library.
Scientific python resources (oriented towards scientific and numeric computing):
- Python Scientific Lecture Notes (Scipy Lecture notes), edited by Valentin Haenel, Emmanuelle Gouillart and Gael Varoquaux (❗ First stop, if you are new). This is a wonderful resource and it can also act as a quick reference. It is certainly worth bookmarking this page.
- Lectures on Scientific Computing with Python by J.R. Johansson. A set of lectures on scientific computing with Python, using IPython notebooks.
- Python for MATLAB Users — Promoting Open Source Computer Vision Research (a CVPR 2012 tutorial)
- Videos (including tutorial videos)
- IPython in-depth: high-productivity interactive and parallel python (PyCon US 2012 by Fernando Perez, Brian Granger and Min Ragan-Kelley). Someone new to IPython can quickly get a sense of what it is all about, and its powerful features.
- Introduction to NumPy and Matplotlib using IPython by Eric Jones (SciPy 2012). This is a tutorial video, and quite an enlightening one. Slicing and indexing of ndarrays are lucidly explained. A brief overview of the low-level memory layout of NumPy arrays is presented towards the end.
- Matplotlib: Lessons from middle age (SciPy 2012 by John Hunter). This video is not actually a tutorial. It is a keynote speech given by the late John Hunter in which he reflects upon, and gives advice on, developing successful open-source projects based on his experience of developing matplotlib. There is not much to learn about using matplotlib itself in this talk.
- Plotting with Matplotlib (PyCon US 2012 by Mike Müller). I think the tutorial is good, but for a quick overview and reference, use Matplotlib plotting @ Scipy Lecture notes, written by Nicolas Rougier, Mike Müller and Gaël Varoquaux.
- Advanced matplotlib from the library’s author John Hunter (Tutorial at 2012 PyData Workshop). Truly wonderful tutorial on customization and event-handling, with a few great tips and tricks. A must watch!
- Advanced Matplotlib by Ryan May (SciPy 2012). Covers advanced features of the Matplotlib library.
- Scipy2011 tutorials including recordings of Introduction to SciPy: Optimization, linear algebra, statistics, and more …, Guide to Symbolic Mathematics with SymPy, and Statistical Learning with scikit-learn.
- Scipy2011 videos @ Internet Archive (includes a lot more talks than the Scipy2011 tutorials link)
- Scipy2012 videos @ pyvideo.org (62 videos). The tutorials are priceless!
- Scipy2013 videos @ pyvideo.org (139 videos). The tutorials are of great value.
- SciPy Central (something similar to MATLAB central)
- Cookbook @ Scipy.org (collection of algorithms in Python)
- Numpy & Scipy Documentation
- Numpy Example List With Doc
- Numpy for Matlab users (nicely delineates the similarities and differences between Matlab and NumPy)
- Migration from Matlab to Python-based systems using NumPy and SciPy (rough list of tips and resources for migrating from Matlab to Python)
- Guide to symbolic mathematics with SymPy: SymPy tutorial (SciPy 2011 conference)
- Using Python for Scientific/Engineering software development
- The Glowing Python – A Python snippet almanac for scientific computing and data visualization.
- Python for Signal Processing – Using Python to investigate signal processing concepts.
- Pythonic Perambulations – A blog by Jake Vanderplas.
- Technical Discovery – A blog by Travis Oliphant.
- Scientific Computing with Python – A blog by Greg von Winckel.
- Boom! Python, matplotlib, astropy and other tidbits – A blog by Michael Droettboom.
- Views on Python, Computational Science, … – A blog by Gaël Varoquaux.
- Fabian Pedregosa’s Blog and website
- Peekaboo Andy’s Computer Vision and Machine Learning Blog
- Topical Software collection at SciPy.org lists a large number of popular packages classified according to their application domain.
I have also found another excellent reference list of Python learning resources (suitable for scientists) maintained by Gael Varoquaux called Improving your programming style in Python. Bookmark that link too!
Of course, this is just the beginning. I have purposely not included links to many other powerful scientific Python tools such as Cython (or Numba) for speeding up code, in favorable cases by up to 1000-fold, Cython for wrapping C code, Pandas for data science, or PyTables/h5py for using HDF5. This post is meant to be a guide for someone starting off with Python for scientific applications.
This is mostly what my collection looks like. I have surely missed some great tutorials, blogs, websites and videos related to the use of Python for scientific and numerical computation. So, I would like to ask my readers to please let me know about any such resource that you may be aware of. I would be more than happy to update the above list. Thank you very much.
CAUTION: DO NOT USE
If you are in the habit of starting the IPython notebook with the --pylab inline option, stop using it immediately. If you are new, you will find many places that recommend the --pylab inline option, but please don’t pick up this bad habit. Use %matplotlib [inline|qt|osx|gtk] instead. For details see No Pylab Thanks.
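One commonly cited reason for avoiding pylab-style star imports is name shadowing: NumPy functions such as sum can end up replacing Python built-ins of the same name, and they do not always interpret their arguments the same way. The sketch below does not use pylab at all; it simply calls the two functions explicitly, so you can see how the same call would change meaning if one silently replaced the other. It assumes NumPy is installed.

```python
import builtins

import numpy as np

data = [0, 1, 2, 3, 4]

# Built-in sum: the second argument is the START value of the running total.
builtin_result = builtins.sum(data, -1)

# NumPy sum: the second argument is the AXIS to sum over.
numpy_result = np.sum(data, -1)

print("built-in sum:", builtin_result)
print("numpy sum:   ", numpy_result)
```

With explicit namespaces (for example import numpy as np and import matplotlib.pyplot as plt, after %matplotlib inline in the notebook) it is always clear which function is being called.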