Python and Data Science

Python is very mature and robust and has a large and vibrant community. One of the great things about Python is that libraries exist for pretty much anything. Within UK education, Python is used for:

  • Web development
  • Databases
  • Text processing
  • Scientific computing
  • Image processing
  • Machine Learning
  • System management
  • Gaming…

One of the things I have seen over the past year is the growth of Python within scientific computing.

We have a set of great resources, Introduction to Python and Fundamentals of Data Science with Python. These are interactive Jupyter Notebooks hosted on Azure at https://Notebooks.azure.com

Python strives to be simple in design and implementation

Dynamically typed: as we all know, this can be a curse or a blessing at times
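
As a small illustration of both sides (a hypothetical sketch, not taken from the original slides): one function can accept values of several types without any declarations, but a type mistake is only discovered when the offending line actually runs. The function name `double` is made up for this example.

```python
def double(value):
    # No declared types: works for anything that supports "* 2"
    return value * 2

print(double(21))       # an int is multiplied
print(double("ab"))     # a str is repeated
print(double([1, 2]))   # a list is repeated

# The curse: the mistake below is only caught when the line executes
try:
    double(None)
except TypeError as error:
    print("TypeError:", error)
```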

Expressive and succinct: list comprehensions and generators
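
A minimal sketch of what those two features look like in practice. The names `even_squares`, `countdown` and `total` are invented for this illustration.

```python
# List comprehension: squares of the even numbers below 10, built in one go
even_squares = [n * n for n in range(10) if n % 2 == 0]

def countdown(start):
    # Generator function: yields one value at a time instead of building a list
    while start > 0:
        yield start
        start -= 1

# Generator expression: values are produced lazily as sum() consumes them
total = sum(n * n for n in range(10))

print(even_squares)
print(list(countdown(3)))
print(total)
```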

Extensible - Operators can be overloaded (but not created, unlike F#)
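
To show roughly what overloading an existing operator looks like, here is a small sketch. The `Vector` class is hypothetical and exists only for this example; Python maps `+` and `*` onto the special methods `__add__` and `__mul__`.

```python
class Vector:
    """A tiny 2D vector, for illustration only."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for: vector + vector
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        # Called for: vector * number
        return Vector(self.x * scalar, self.y * scalar)

    def __eq__(self, other):
        # Called for: vector == vector
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"

print(Vector(1, 2) + Vector(3, 4))
print(Vector(1, 2) * 3)
```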

Pedagogical and Python Progressions?

I recently met with Nicholas Tollervey (@ntoll) from the Python Software Foundation. Nicholas recently spoke at Microsoft Future Decoded on the use of Python. I asked Nicholas for his thoughts on Python, and on the impact devices such as the BBC micro:bit and Raspberry Pi are having on how school children are now being introduced to Python.

Can you give me a short introduction to Python?

Python (https://python.org) is one of the world's most popular programming languages.

You inadvertently use Python every day when searching with Google, watching videos on YouTube, posting photos on Instagram, networking via LinkedIn, sharing files with Dropbox or when browsing innumerable other web-based services.

Python isn't just about the web. Python is used by the likes of Pixar, Disney and Lucasfilm to assemble their films. Financial institutions from the likes of J.P. Morgan and Bank of America to small hedge funds use Python as an essential part of their infrastructure. Science makes heavy use of Python, from testing the Mars rover at NASA to controlling equipment to detect gravitational waves at LIGO.

It's easy to learn yet incredibly powerful and has been applied to most areas of human endeavour.

This also includes the world of education.

Guido van Rossum (the inventor of Python) has stated that, "Python is for everyone" and even ran a project in the late 1990s researching how to improve Python in an educational context. This work has borne fruit since Python is one of the most popular languages for teaching computing: it's the language of choice at MIT, the One Laptop Per Child (OLPC) code was all written in Python and the Raspberry Pi (perhaps the world's most successful computing-in-education project) was named "Pi" because it could run Python and many of the educational resources for the Raspberry Pi use Python.

Another educational project that has had a lot of attention recently is the BBC micro:bit. It's a small embedded device about the size of a credit card. A million of these were handed out, one to every eleven and twelve year old in the UK. Both Microsoft and the Python Software Foundation (who maintain and support the Python programming language) were partners in this project (see https://www.microbit.org).

Python is one of the official languages for the BBC micro:bit, but how?

Thanks to the amazing work of Damien George, an Australian physicist working in his spare time, a version of Python called MicroPython (https://micropython.org/) is able to run on the small and highly constrained microcontrollers at the heart of devices like the BBC micro:bit. An international community of volunteers have worked on the version of MicroPython for the BBC micro:bit and what follows is an exercise in continuity.

It is, quite literally, a map for how an eleven year old beginner programmer can progress from simple Python scripts on the BBC micro:bit to running cloud based infrastructure with Python on Microsoft's Azure cloud.

The modus operandi for programming the BBC micro:bit is easy - just plug it into your computer via a USB cable. Use a beginner friendly code editor such as Mu (https://codewith.mu) to write your code and then "flash" (copy) your program onto the device and watch it run.

OK, so I know a lot about the BBC micro:bit. The device has a 5x5 matrix of lights (LEDs) to show pictures and display text. How can you use Python to make the device scroll a friendly message?

from microbit import display

display.scroll("Hello, World!")

That's it!

The first line tells Python we're going to use the micro:bit's display, and the second says "use the display to scroll the message 'Hello, World!'".

It's also possible to create animations:

from microbit import display, Image

display.show(Image.ALL_CLOCKS, delay=100, loop=True)

So how does this work?

Once again the first line tells Python what it'll need to work and the second line uses the display to show a list of images called ALL_CLOCKS (the hands of a clock pointing at each hour), with a delay of 100 milliseconds between each frame and doing so in a continuous loop.

The effect is something like a very simple radar screen.

A more advanced script turns the device into a fun toy:

from microbit import *
import random

# Pictures to choose from when button A is pressed
images = [Image.HAPPY, Image.SILLY, Image.GHOST, Image.SKULL,
          Image.DUCK, Image.UMBRELLA, Image.GIRAFFE, Image.RABBIT,
          Image.HEART, Image.STICKFIGURE]

while True:
    sleep(20)
    # Sparkle: light a random pixel at a random brightness
    x = random.randint(0, 4)
    y = random.randint(0, 4)
    brightness = random.randint(0, 9)
    display.set_pixel(x, y, brightness)
    # Button A: show a random picture for half a second
    if button_a.was_pressed():
        display.show(random.choice(images))
        sleep(500)
    # Button B: scroll a message
    if button_b.was_pressed():
        display.scroll("Hello, World!", delay=100)
    # Shake: show an angry face for a second
    if accelerometer.was_gesture('shake'):
        display.show(Image.ANGRY)
        sleep(1000)

The display sparkles in a repeatedly random manner. The device has two buttons (labelled A and B) and an accelerometer. If button A was pressed a random picture is displayed for half a second. If button B was pressed the message 'Hello, World!' is scrolled across the display. If the accelerometer detects the device was shaken, an angry face is displayed for a second.

If you read the code, it's almost as if you're reading an English summary of what's going on. This is one of the reasons Python is so powerful: it's very easy to express precisely what you want in an intuitive manner. It's also why Python is easy to learn: Python is a "high level" language, making it very close to how we humans think. Another aspect of Python is its flexibility. Many programs need to be compiled before you can run them and once compiled cannot be changed. Python is a dynamic language: there is no separate compile step and it's possible to interact with your code while it runs.

Can you program the micro:bit directly using Python?

It is possible to "talk Python" directly with your computer using something called a REPL (an interface that Reads, Evaluates, Prints and then Loops over the code you interactively enter into it). This is built into the BBC micro:bit too. It means it's possible to "live code" with the device and experiment to see what works.

In the Mu editor you just need to plug in your micro:bit and click the "REPL" button. You'll see three chevrons (>>>) and a blinking cursor.

This is Python waiting for you to type something.

For example:

>>> print("Hello, World!")

This command is read by Python and evaluated (i.e. Python works out what you want to do). The result is printed (i.e. we told Python to print something):

Hello, World!

...and the three chevrons appear again as the REPL loops back to await your next instructions:

>>>

Simple!

Sometimes, there is no result to print after evaluating your command.

Other times the chevrons won't immediately appear as Python is busy evaluating what it is you want it to do (and this may take some time).

This is evident when you type in the following:

>>> display.scroll("Hello")

The display will, indeed, scroll "Hello", but the chevrons won't appear until Python has finished scrolling things for you.
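
For the curious, the read, evaluate, print, loop cycle described above can be sketched in a few lines of ordinary Python. This is a toy illustration only, not how the MicroPython REPL is actually implemented, and the function name `tiny_repl` is made up for the example.

```python
def tiny_repl(lines):
    """Read each line, evaluate it, and collect what a REPL would print."""
    namespace = {}
    printed = []
    for line in lines:                      # Loop over what was typed
        try:
            result = eval(line, namespace)  # Read and Evaluate an expression
        except SyntaxError:
            exec(line, namespace)           # Statements (e.g. x = 6) have no value
            result = None
        if result is not None:              # Print only when there is a result
            printed.append(repr(result))
    return printed

print(tiny_repl(["x = 6", "x * 7", "'Hello, ' + 'World!'"]))
```

Note how the assignment produces nothing to print, just as described above.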

Another compelling aspect of the BBC micro:bit is the row of GPIO (general purpose input / output) pins that run along the bottom of the device. These are how you plug things into and onto the device. For example, if you connect a speaker to Pin0 and GND you can make the device play music:

>>> import music

>>> music.play(music.WEDDING)

Alternatively, if you attach a speaker to Pin0 and Pin1 you can make the device talk:

>>> import speech

>>> speech.say("Hello there!")

The most important point is we've made the cute looking BBC micro:bit come alive in a compelling and easy to understand way. Our intention is simple: to inspire a new generation of coders.

Kids should feel inspired to take the very simple speech synthesiser and do something amazing like create a Dalek poetry recital program. Kids should feel inspired to take the display and built in radio to make a simple two player PONG clone. Kids should feel inspired to explore their digital world with tools that allow them to graduate to the next level (be it a Raspberry Pi or something else).

What could that something else be?

Remember the REPL? It turns out that Python is very popular with data scientists and many of them use a Python project called Jupyter (https://jupyter.org) to display their results. Jupyter is a way of presenting code, text and other digital assets in a sort of interactive notebook. Such notebooks provide an interactive and engaging record of the author's movement of thought. They're a sort of interactive REPL on steroids. If you're not technical, imagine if Leonardo's notebooks were interactive and reacted to a reader's modifications: that's what reading a Jupyter notebook feels like. I hope the educational potential of this tool is obvious. The scientists from the LIGO project certainly realised this, since their results announcing the discovery of gravitational waves were published in the form of Python code, diagrams and text embedded in a Jupyter notebook. Colleagues could take the raw data and follow the proof of discovery by reading through the notebook and interactively playing with the proof themselves.

It's a very small step from using the REPL on the BBC micro:bit to using the REPL built into a Jupyter notebook for presenting a repeatable movement of ideas.

If you're interested in Jupyter notebooks, one of the easiest ways to try them out is on Microsoft's Azure platform (https://notebooks.azure.com/). There is a wealth of resources and demonstrations for all levels of programmer.

Who knows, perhaps when the current crop of eleven year olds graduate from using Python on the micro:bit they'll progress to Python in Jupyter.

All of us in the micro:bit partnership will know we've achieved our goals when, for example, kids start using services like Jupyter notebooks on Azure to present data from their Geography projects. Such continuity from first steps to data scientist is essential if we're to help the next generation of programmers flourish. Mind you, it's not just about the next generation of programmers: we need doctors, teachers, barristers, architects, musicians, scientists and other professions to have the coding skills to adapt and adopt the digital world to their needs.

Exciting times are ahead, and this is why Python for Data Science and Machine Learning is so important.

So, in conclusion:

  • Python is very expressive and easy to read and write
  • Refer back to the Zen of Python
  • Very large and active Machine Learning community, with a standard “stack” emerging…
  • Common coding style and tools
  • Most statistical and Machine Learning models and techniques exist
  • Comprehensive visualisation, parallelisation…
  • Very fast… (alternative VMs, Cython, C/C++ integration)
  • The Jupyter notebook is invaluable for reproducibility, and is now available maintenance free at https://notebooks.azure.com
  • Visual Studio support for Python via Python Tools for Visual Studio

Loads of libraries and resources

NumPy / SciPy

  • Efficient multi-dimensional array operations with Python syntax
  • MATLAB-like complement of tools (linear algebra, numerical integration, optimization, etc.)
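
As a rough sketch of those two points, assuming NumPy is installed (the array values are made up for illustration):

```python
import numpy as np

# Element-wise arithmetic on whole arrays, with no explicit loops
grid = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
doubled = grid * 2
column_sums = grid.sum(axis=0)

# Linear algebra: solve the system A @ x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

print(doubled)
print(column_sums)
print(x)
```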

Data Processing

  • Pandas: R-like DataFrame with split-apply-combine operations and much more (a very fast groupby())
  • PyTables: loading large hierarchical data sets
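
A minimal sketch of split-apply-combine with groupby(), assuming pandas is installed. The rainfall figures and column names are invented purely for illustration.

```python
import pandas as pd

# Made-up rainfall readings, purely for illustration
readings = pd.DataFrame({
    "city": ["Leeds", "Leeds", "York", "York", "York"],
    "rain_mm": [4.0, 6.0, 1.0, 2.0, 3.0],
})

# Split the rows by city, apply mean() to each group,
# then combine the results into a single Series indexed by city
mean_rain = readings.groupby("city")["rain_mm"].mean()

print(mean_rain)
```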

SciKits

  • Packages built on top of NumPy/SciPy
  • For Machine Learning & Stats, there are scikit-learn, Statsmodels and Stan

Plotting

  • Matplotlib: a MATLAB-like plotting interface with an OOP API for ultimate flexibility
  • ggplot (R-inspired grammar of graphics) and seaborn
  • bokeh: interactive visualization

Deep learning

  • CNTK has Python bindings as of 2.0 (open source, created by Microsoft)
  • TensorFlow has a first-class Python interface
  • Theano is written in Python
  • Keras wraps TensorFlow/Theano

Other tools

  • SymPy: a symbolic solver
  • Pretty much everything… PyMC, Bayesian belief network library (eBay), Gradient Boosting, etc.

Jupyter + IPython

  • The glue environment making all of the above nicely integrated

Jupyter Notebooks https://notebooks.azure.com

  • Intended as a step towards reproducible analysis
  • Similar to knitr and Mathematica notebooks
  • Runs as a web application, with user interaction happening in the browser
  • Runs a local Python server but can be hosted on the Internet
  • Python code is executed from the browser and the server renders results
  • Pictures, tables and LaTeX work out of the box

The Jupyter Notebook

  • Renders a “kernel”, for us it’s IPython
  • The same power accessible from the browser
  • Data visualization support
  • Multiple viewers (some interactive)
  • Connectivity with other languages: R, Perl, Ruby…

Python for Visual Studio

IPython

  • Completes the Machine Learning stack
  • Focus on reproducible analysis and productivity
  • Better command line shell
  • “Magic” commands, pretty printing, colour output, better stack traces, debug support, save state on exit…

https://microsoft.github.io/PTVS/