Python 3 is Winning Library Developer Support


Launch Notebook Now!

https://notebooks.azure.com/library/rJUgQ81mnpo

In 3 months, Python 3 will be better supported than Python 2.

Are you using Python 3 for your development? It has been out for more than 7 years at this point, so if you aren’t using it, why not? Since its initial release in December of 2008, Python 3 seems to have lived in the shadow of Python 2. And here we are, 7 years later, still looking at a world where people use Python 2 and talk about how Python 3 doesn’t work for them.

This made me wonder: is Python 3 really inferior to Python 2? If not, why aren’t people moving? There has to be a reason people are still clinging to the older technology. After some thought, it seemed to me that the most common reason given for staying on Python 2 was “the packages I need just aren’t on Python 3”. In an attempt to get a statement I believed I could measure, I framed this as “is Python 3 supported by library developers well enough for me to move?”

So, I wrote a worker role in Azure, collected a bunch of data from PyPI, and analyzed all of it in a Jupyter Notebook (Cortana Analytics Gallery). Is the Python 3 library space being maintained well, or is Python 2 still the kingpin? Python 3 has around 4 years to become the only supported version of Python, because Python 2 support is being discontinued when that version reaches EOL (End of Life). Does PyPI indicate that we will be ready? Let’s examine the data and determine what the future may hold.

To see the full analysis you can get the Jupyter Notebook for this post from the Cortana Analytics Gallery.

Getting the Data: PyPI, and Trove Classifiers, and an Azure Service

PyPI allows package maintainers to mark their releases with trove classifiers to indicate what they support. I wrote an Azure Cloud Service to gather the data from PyPI and store it in Azure Table Storage for later analysis. The service monitors the PyPI RSS feed and uses Requests and the Azure SDK for Python. If you are interested in the collection process, you can find the source on GitHub.
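To make the classifier idea concrete, here is a minimal sketch of how a release's trove classifiers could be turned into Python 2 and Python 3 flags and then into a bucket. The function names are hypothetical and this is not the code from the collection service; it assumes that any classifier under "Programming Language :: Python :: 2" or ":: 3" (including minor versions and ":: Only") counts as a mark for that major version.

```python
# Sketch only: python_support() and bucket() are illustrative names,
# not part of the author's service or of any PyPI library.

PY_PREFIX = "Programming Language :: Python :: "


def python_support(classifiers):
    """Return (marks_py2, marks_py3) for a list of trove classifier strings."""
    py2 = py3 = False
    for classifier in classifiers:
        if not classifier.startswith(PY_PREFIX):
            continue
        # The remainder looks like "2", "2.7", "3 :: Only", "3.5", ...
        version = classifier[len(PY_PREFIX):].strip()
        if version.startswith("2"):
            py2 = True
        elif version.startswith("3"):
            py3 = True
    return py2, py3


def bucket(classifiers):
    """Map a release's classifiers to one of the buckets used in the charts."""
    py2, py3 = python_support(classifiers)
    if py2 and py3:
        return "Polyglot (Both)"
    if py2:
        return "Python 2 (Only)"
    if py3:
        return "Python 3 (Only)"
    return "Unclassified"


example = [
    "Programming Language :: Python :: 2.7",
    "Programming Language :: Python :: 3",
]
print(bucket(example))
```

Classifiers such as "Programming Language :: Python :: Implementation :: CPython" say nothing about the major version, so the sketch ignores them.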

The versions/releases show the activity of any package, which admittedly means that if a single package made a huge proportion of the releases within a month it could skew the data. Since I think this is unlikely, and no worse than a single package being downloaded a disproportionate amount (which is the problem with most analysis on the consumption side), I ran with it. Once collected, the data can indicate an overall trend of adoption, based on what package authors are releasing support for.

A variety of blogs use download trends from the Python Package Index (PyPI) to judge the success of Python 3. To figure out whether Python 3 is better supported, we will look at package maintenance and publishing activity instead of download counts. This way, we get to look at a leading indicator of adoption, the package creation, instead of a trailing indicator, the package download. Using package creation, we find that a world where Python 3 is better supported isn’t far away.

Analyzing the Data

Once the data is collected, we can investigate it using Jupyter and Pandas and chart our findings. The question we hope to answer is whether we are approaching the goal of the vast majority of active Python packages supporting Python 3 fast enough to meet the 2020 EOL of Python 2.

Charting Python 2 and Python 3 package development this way lets us see what trends may exist. If you want to see further charts and analysis, you can view the complete Jupyter Notebook. For this post, though, I want to focus on two charts: uploaded packages over the past five years, and the trend lines found for Python 2 and Python 3 package version uploads.

Takeaway #1: Uploaded packages increasingly contain Python 3 support and decreasingly contain Python 2 support.

pypi-chart-1

Above is a chart showing the past 5 years of package versions. It shows that Python 2 (Only) packages are decreasing while Python 3 support is gaining. Soon enough, we will see a larger number of Python 3 packages than Python 2 packages.

Takeaway #2: Python 3 support is set to overtake Python 2 support and it isn’t as far away as you might think.

Below is one chart from this analysis. It shows the percentage of packages published to PyPI that are marked with Python 3 or Python 2 support. Since a package can be marked with both, the two percentages don’t add up to 100%. The red line represents Python 2 and the blue line represents Python 3, charted from September 28, 2012 (1.35e9) to the present day (1.46e9) in Epoch Time.

Percentage of Python 2 and Python 3 Packages over Time (Epoch)

pypi-chart-2
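As a rough illustration of why the two percentages can exceed 100% together, here is a small sketch that computes the share of releases marked for each major version per month. The record layout and the function name are assumptions made for this example, the sample records are made up, and the denominator is assumed to be every release in the month.

```python
from collections import defaultdict


def monthly_support_share(releases):
    """releases: iterable of (month, marks_py2, marks_py3) tuples.

    Returns {month: (pct_py2, pct_py3)}; a release marked with both
    versions is counted in both percentages.
    """
    totals = defaultdict(int)
    py2_counts = defaultdict(int)
    py3_counts = defaultdict(int)
    for month, py2, py3 in releases:
        totals[month] += 1
        py2_counts[month] += bool(py2)
        py3_counts[month] += bool(py3)
    return {
        month: (
            100.0 * py2_counts[month] / totals[month],
            100.0 * py3_counts[month] / totals[month],
        )
        for month in sorted(totals)
    }


# Made-up sample records, not data from this post.
sample = [
    ("2016-01", True, False),
    ("2016-01", True, True),
    ("2016-01", False, True),
    ("2016-01", True, True),
]
shares = monthly_support_share(sample)
print(shares)
```

In this made-up month, three of four releases are marked Python 2 and three of four are marked Python 3, so each line would sit at 75%.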

What you can see in the chart is that Python 3 support is converging with Python 2 support. If you project those fit lines out and do some simple algebra, you find that they cross around May of this year.
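The "simple algebra" amounts to fitting a straight line to each series and solving for the x value where the two lines are equal. Below is a minimal sketch using an ordinary least-squares fit; the helper names are hypothetical and the sample points are made up for illustration, so the date it prints will not match the one from the notebook.

```python
from datetime import datetime, timezone


def fit_line(xs, ys):
    """Ordinary least-squares straight-line fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crossing(line_a, line_b):
    """Solve a1*x + b1 == a2*x + b2 for x."""
    (a1, b1), (a2, b2) = line_a, line_b
    if a1 == a2:
        raise ValueError("parallel lines never cross")
    return (b2 - b1) / (a1 - a2)


# Made-up points (epoch seconds, percent of releases); not the post's data.
epochs = [1.35e9, 1.40e9, 1.45e9]
py2_pct = [80.0, 70.0, 60.0]
py3_pct = [20.0, 35.0, 50.0]

x_cross = crossing(fit_line(epochs, py2_pct), fit_line(epochs, py3_pct))

# Convert the crossing point from Epoch Time back to a calendar date.
print(datetime.fromtimestamp(x_cross, tz=timezone.utc).date())
```

Working in Epoch Time keeps the x axis numeric for the fit; the conversion back to a date happens only at the end.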

Conclusion: The state of things

More precisely, when run with the data available as of February 22, 2016, the analysis returned a date of May 21, 2016 for Python 3 to be better supported than Python 2. Now, this doesn’t include packages that aren’t classified with trove classifiers. It also gives higher weight to actively maintained packages, since every release is counted. This date seems optimistic, but the data is certainly pointing at Python 3 taking over in the very near future.

It also seems safe to say that, by 2020, there will be plenty of Python 3 packages and that dependencies blocking one’s transition to Python 3 should not be a significant problem for most people.

Analyze the data yourself!

I think it would be unfair though to just give you two charts. You can find the complete data and analysis in a Jupyter Notebook published to the Cortana Analytics Gallery. It goes into more depth about the data analysis and contains many more charts/plots than what is shown in this post. I encourage you to run the notebook, experiment with the data yourself, and come to your own conclusions. If you find something interesting feel free to share!


Comments (18)
  1. James says:

    Great post, it’s nice to see this data. In the spirit of the question “If not, why aren’t people moving?” and “the packages I need just aren’t on Python 3” – when will Microsoft’s own Azure Machine Learning support Python 3? Currently it seems it’s only available in version 2.7.7.

    3, and especially the latest 3.5, have brought some great features and improvements that would be great to use.

  2. set_trace says:

    The fact that it’s being put in these terms says more than the article ever could. Python 3 “winning” is a loss for the community.

    A trove classifier says nothing about the quality of running a polyglot library on Python 2 or Python 3. I say this as someone who maintains polyglot libraries where I’ve had Python 3 bugs persist for months, because everyone actually uses Python 2. Industry people know that it’s about which packages and how they’re used, not just the number of packages.

    1. I think that is an interesting point. The goal here was to look at a leading indicator of package maintenance. You are correct, it is hard to measure quality of packages themselves. You could say downloads are a synthetic metric of that but we know that PyPI download counts are rather unreliable. I wanted to take a different approach to how we were looking at the problem.
      I believed when I wrote this, as I do now, that package maintenance and support is a leading indicator for later usage.
      As for maintaining polyglot packages I agree that it can be a burden for library developers. It limits you to a subset of both languages and increases maintenance costs. And your issue with existing users not hitting bugs in Python 3 sounds expected. While you wait for Python 3 usage to pick up you will see low usage and it may take longer to find bugs. I would expect that to improve over time as users begin using your library on Python 3.
      The issue is, until you write the support at all, how can you expect users to adopt your library on Python 3?

  3. Tejinder says:

    Are you guys developing for python in windows? How is the support for python packages in windows? How do you compile pycrypto or pillow in virtualenv? Any tips are really appreciated

    1. HPCToolsGuy says:

      @Tejinder

      For more complex pkgs we recommend using the Anaconda distro from continuum.io.

    2. Short answer for installing Python libs with embedded C code :

      – Install mingw32
      – Add path to mingw32 executables to the PATH environment variable
      – (perhaps) copy then rename the copy of mingw32-make.exe to make.exe
      – Find and edit the “distutils.cfg” file from your Python installation root :

      [build]
      compiler = mingw32

      Note that Pillow has a binary Wheel distro for most Windows / Python combinations.

      1. I wish it were that short, but so many packages have other dependencies you need to pull down somehow (and there are a few choices of mingw32, some of which will not work reliably).

        Best recommendation on Windows is getting the version of MSVC that matches your Python version. I have a post on this coming up soon. Until then, the best short answer is at https://packaging.python.org/en/latest/extensions/#building-binary-extensions

  4. Brian Bien says:

    It’s about time. I’m looking forward to the day when I can reply to my fellow devs, “no, actually Python 3 now has greater library support!”

  5. pat_bk says:

    Python should be first class language for Win 10 UWP apps.

    This will be a win win !

  6. Tim says:

    Nice analysis.

    What do “Polyglot (both)”, “Polyglot + Python 3”, and “Polyglot + Python 2” mean?

    1. Hey Tim. Here is a brief description of the different buckets I made. You can see how I group these in the notebook as I just look at columns for Python 2 Marked and Python 3 Marked.

      Polyglot (Both) – All packages that have Python 2 and Python 3 marked
      Polyglot + Python 3 – All packages marked Python 3
      Polyglot + Python 2 – All packages marked Python 2
      Python 2 (Only) – Packages marked Python 2 but not marked Python 3
      Python 3 (Only) – Packages marked Python 3 but not marked Python 2

      1. Kevin Broch says:

        Great article and analysis Christopher! You went into more detail in the Jupyter notebook about trove classifiers which helped my understanding of your categories better.
        I’m curious if you considered graphing the “Unclassified category”?
        Also got me thinking about trove classifier validation. For example I believe no package developer should make their package both:
        “Programming Language :: Python :: 2 :: Only” & “Programming Language :: Python :: 3 :: Only”

        But if a developer tagged it just: “Programming Language :: Python :: 2”
        That might just imply it hasn’t been tested on py3?

        1. That is a good point Kevin. This analysis relies heavily on the package developer to accurately mark their library. As far as ‘Python 2’ vs ‘Python 2: Only’ labels I am unsure everyone uses them in the same way. I am sure if it says Python 2 it works on Python 2 at least which is what I rely on here.

  7. When will we see Python3 bundled by default in Windows?

    As a migrant from .NET -> Python it’s great to see Microsoft getting into Python. Keep it up. 🙂

  8. William Payne says:

    It would be good to remove from consideration “dead” packages that are not actively being maintained, as well as, perhaps, providing a figure that is weighted by package popularity — number of downloads over the past 12 months, if that number is available.

    I feel that this would more accurately reflect the degree to which popular, actively maintained packages (i.e. the ones that we care about) – are moving to support Python 3.

    I suspect that the results of this analysis will paint a picture that is even more in Python 3’s favor.

  9. dajolt says:

    My biggest issue with python 2 vs 3 is that I currently have no safe way to install python3 on a windows 7 machine that needs to keep running python2.7 in production at the same time. Last time I tried installing python 3 it broke the python 2 installation so I had to first deinstall python 3 and then deinstall and reinstall python 2. I don’t mind trying python 3 for new projects, but the old code should continue to run.

    1. I’m interested in how Python 3 broke your Python 2 installation? I am currently responsible for the Python installers on Windows, so if there is an issue there I’d like to fix it. Did you report the problem on http://bugs.python.org/?

  10. John Fabiani says:

    Let me know when a stable version of wxPython can support python3 and I’ll move. You see there always seems to be some package or module missing.

