In 3 months, Python 3 will be better supported than Python 2.
Are you using Python 3 for your development? It has been out for more than 7 years at this point, so if you aren’t using it, why not? Since its initial release in December 2008, Python 3 seems to have lived in the shadow of Python 2. And here we are, 7 years later, still looking at a world where people use Python 2 and talk about how Python 3 doesn’t work for them.
This made me wonder: is Python 3 really inferior to Python 2? If not, why aren’t people moving? There has to be a reason people are still clinging to the older technology. After some thought, it seemed to me that the most common reason people gave for staying on Python 2 was “the packages I need just aren’t on Python 3”. To turn that into something I could measure, I framed it as the question “is Python 3 supported by library developers well enough for me to move?”
So, I wrote a worker role in Azure, collected a bunch of data from PyPI, and analyzed all of it in a Jupyter Notebook (Cortana Analytics Gallery). Is the Python 3 library space being maintained well, or is Python 2 still the kingpin? Python 3 has around 4 years to become the only supported version of Python, since Python 2 reaches EOL (End of Life) in 2020 and support for it is being discontinued. Does PyPI indicate that we will be ready? Let’s examine the data and see what the future may hold.
To see the full analysis you can get the Jupyter Notebook for this post from the Cortana Analytics Gallery.
Getting the Data: PyPI, Trove Classifiers, and an Azure Service
PyPI allows package maintainers to mark their releases with trove classifiers that indicate what they support. I wrote an Azure Cloud Service that gathers this data from PyPI and stores it in Azure Table Storage for later analysis. The service monitors PyPI’s RSS feed and is built with Requests and the Azure SDK for Python. If you are interested in the details of the collection process, you can find the source on GitHub.
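For illustration, here is a minimal sketch of how a release’s trove classifiers could be reduced to Python 2/Python 3 support flags. It is not taken from the collection service’s source; it assumes the classifiers arrive as a list of strings such as "Programming Language :: Python :: 2.7", and python_support is a hypothetical helper name.

```python
# Prefix shared by every Python version classifier, e.g.
# "Programming Language :: Python :: 3.5" or "Programming Language :: Python :: 3 :: Only".
PY_PREFIX = "Programming Language :: Python :: "


def python_support(classifiers):
    """Return (supports_py2, supports_py3) for one release's trove classifiers.

    Hypothetical helper: only classifiers whose first segment after the
    prefix is a version number ("2", "2.7", "3", "3.5", ...) are counted.
    """
    py2 = py3 = False
    for classifier in classifiers:
        if not classifier.startswith(PY_PREFIX):
            continue  # license, topic, framework classifiers, etc.
        # First segment after the prefix: a version, or something like "Implementation".
        version = classifier[len(PY_PREFIX):].split(" :: ")[0]
        major = version.split(".")[0]
        if major == "2":
            py2 = True
        elif major == "3":
            py3 = True
    return py2, py3
```

A release listing both "Programming Language :: Python :: 2.7" and "Programming Language :: Python :: 3.5" comes back as supporting both versions, while classifiers such as "Programming Language :: Python :: Implementation :: CPython" are ignored because they name no version.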
Uploaded versions/releases show the activity of each package. Admittedly, this means that if a single package made a huge proportion of the releases within a month, it could skew the data. I think this is unlikely, and it is not really worse than a single package being downloaded a disproportionate amount (the problem with most analysis on the consumption side), so I ran with it. Once collected, the data can indicate an overall trend of adoption based on what package authors are releasing support for.
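Because every uploaded version counts, a very prolific package weighs more than one that releases rarely. One hedged way to check that concern is to compute the Python 3 share both per release and per package. The sketch below does both; py3_share and the (package, month, supports_py3) record layout are hypothetical, not the notebook’s code.

```python
from collections import defaultdict


def py3_share(releases):
    """Return (release_weighted, package_weighted) dicts mapping month -> Python 3 share.

    releases: iterable of (package, month, supports_py3), one tuple per upload.
    release_weighted counts every upload once; package_weighted counts each
    package once per month (True if any of its uploads that month supported Python 3).
    """
    release_total = defaultdict(int)
    release_py3 = defaultdict(int)
    packages = defaultdict(dict)  # month -> {package: any Python 3 upload that month}
    for package, month, supports_py3 in releases:
        release_total[month] += 1
        release_py3[month] += int(supports_py3)
        packages[month][package] = packages[month].get(package, False) or supports_py3

    release_weighted = {m: release_py3[m] / release_total[m] for m in release_total}
    package_weighted = {m: sum(flags.values()) / len(flags) for m, flags in packages.items()}
    return release_weighted, package_weighted
```

With eight Python 2-only uploads from one busy package and one Python 3 upload from each of two other packages in the same month, the release-weighted share is 0.2 while the package-weighted share is about 0.67; a gap like that between the two measures would be a sign of skew.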
There are a variety of blogs that use download trends from the Python Package Index (PyPI) to judge the success of Python 3. To figure out whether Python 3 is better supported, we will look at package maintenance and publishing activity instead of download counts. This way, we look at a leading indicator of adoption, package creation, instead of a trailing indicator, package downloads. Using package creation, we find that a world where Python 3 is better supported isn’t far away.
Analyzing the Data
Once the data is collected, we can investigate it with Jupyter and Pandas and chart our findings. The question we hope to answer is whether the vast majority of active Python packages will support Python 3 soon enough to meet the 2020 EOL of Python 2.
Using a Jupyter Notebook, Pandas, and the data collected earlier, we can chart Python 2 and Python 3 package development to see what trends exist. If you want to see further charts and analysis, you can view the complete Jupyter Notebook. For this post, though, I want to focus on two charts: uploaded packages over the past five years, and the trend lines found for Python 2 and Python 3 package version uploads.
Takeaway #1: Uploaded packages increasingly include Python 3 support and less often include Python 2 support.
Above is a chart showing the past 5 years of package versions. It shows that Python 2-only packages are decreasing while Python 3 support is growing. Soon enough, we will see more Python 3 packages than Python 2 packages.
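The chart groups package versions by the Python versions they declare. A minimal sketch of that bucketing, assuming each upload has already been reduced to a year plus a pair of Python 2/Python 3 flags (support_category, counts_by_year, and the category labels are hypothetical; the notebook’s exact categories may differ):

```python
from collections import Counter


def support_category(py2, py3):
    """Map a release's Python 2/3 support flags to a single chart category."""
    if py2 and py3:
        return "Python 2 and 3"
    if py2:
        return "Python 2 only"
    if py3:
        return "Python 3 only"
    return "Unclassified"  # no Python version classifier at all


def counts_by_year(releases):
    """Count uploads per (year, category); releases yields (year, py2, py3) tuples."""
    return Counter((year, support_category(py2, py3)) for year, py2, py3 in releases)
```

Plotting the count for each category against the year would give a chart in the spirit of the one above.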
Takeaway #2: Python 3 support is set to overtake Python 2 support and it isn’t as far away as you might think.
Below is one chart from this analysis. It shows the percentage of packages published to PyPI marked with Python 3 support and with Python 2 support. Since a package can be marked with both, the two percentages don’t add up to 100%. The red line represents Python 2 and the blue line represents Python 3, charted from September 28, 2012 (about 1.35e9) to the present day (about 1.46e9), with time on the x-axis in Unix epoch seconds.
Percentage of Python 2 and Python 3 Packages over Time (Epoch)
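A sketch of how percentages like those in this chart could be computed from raw uploads, assuming each upload is an (epoch_seconds, py2, py3) tuple. The 30-day bin width, the helper name support_percentages, and the choice to leave uploads with no Python version classifier out of the denominator are all assumptions, not details taken from the notebook. Because a release marked with both versions counts toward both lines, the two percentages in a bin add up to at least 100.

```python
from collections import defaultdict

# Roughly monthly bins in epoch seconds (an assumed bin width, not the notebook's).
BIN_SECONDS = 30 * 24 * 60 * 60


def support_percentages(uploads, bin_seconds=BIN_SECONDS):
    """Return sorted (bin_start_epoch, pct_py2, pct_py3) tuples.

    uploads: iterable of (epoch_seconds, py2, py3), one per uploaded version.
    Uploads with no Python version classifier are skipped (an assumption).
    """
    totals = defaultdict(lambda: [0, 0, 0])  # bin start -> [classified uploads, py2, py3]
    for epoch_seconds, py2, py3 in uploads:
        if not (py2 or py3):
            continue
        bin_start = int(epoch_seconds // bin_seconds) * bin_seconds
        counts = totals[bin_start]
        counts[0] += 1
        counts[1] += int(py2)
        counts[2] += int(py3)
    return [
        (bin_start, 100.0 * n_py2 / n, 100.0 * n_py3 / n)
        for bin_start, (n, n_py2, n_py3) in sorted(totals.items())
    ]
```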
The chart shows Python 3 support converging with Python 2 support. If you project those fit lines forward and solve for where they meet, they cross around May of this year.
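The crossing-point calculation can be sketched as follows, assuming each series is fit with an ordinary least-squares straight line (the notebook may use a different fit). Modeling Python 2 as a2 + b2*x and Python 3 as a3 + b3*x, setting them equal gives x = (a3 - a2) / (b2 - b3), which is then converted from epoch seconds to a calendar date. The function names are hypothetical.

```python
from datetime import datetime, timezone


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def crossing_epoch(xs, py2_pct, py3_pct):
    """Epoch time where the two fitted lines intersect, or None if they are parallel."""
    a2, b2 = linear_fit(xs, py2_pct)
    a3, b3 = linear_fit(xs, py3_pct)
    if b2 == b3:
        return None
    # a2 + b2*x == a3 + b3*x  =>  x = (a3 - a2) / (b2 - b3)
    return (a3 - a2) / (b2 - b3)


def to_utc_date(epoch_seconds):
    """Convert Unix epoch seconds to a UTC calendar date."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date()
```

Feeding the chart’s epoch times and its two percentage series into crossing_epoch, then passing the result to to_utc_date, yields a projected date; this sketch has not been checked against the notebook’s May 21, 2016 figure.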
Conclusion: The state of things
More precisely, when run with the data available as of February 22, 2016, the analysis returned May 21, 2016 as the date when Python 3 becomes better supported than Python 2. Note that this doesn’t include packages that aren’t marked with trove classifiers. It also gives more weight to actively maintained packages, since every new release is counted. This date seems optimistic, but the data certainly points to Python 3 taking over in the very near future.
It also seems safe to say that, by 2020, there will be plenty of Python 3 packages and that dependencies blocking one’s transition to Python 3 should not be a significant problem for most people.
Analyze the data yourself!
It would be unfair, though, to give you just two charts. You can find the complete data and analysis in a Jupyter Notebook published to the Cortana Analytics Gallery. It goes into more depth on the analysis and contains many more charts than this post. I encourage you to run the notebook, experiment with the data yourself, and come to your own conclusions. If you find something interesting, feel free to share!