In the article "Manage your open source usage and security in your pipeline", we introduced you to WhiteSource and how it can be used to help manage open source libraries within your projects. We covered the core features of WhiteSource, the primary benefits of using it, and how to implement it within your CI/CD processes.
Where Are We Now?
We have had several months now to gain experience with WhiteSource since we published the above article. Here are some quick stats that reflect the products that we are scanning. Today, there are:
- A total of 21 products being actively scanned by WhiteSource, as part of our automated build process.
- A total of 31 different OSS licenses being utilized across those 21 products.
- Over 1,600 open source libraries utilized across these 21 products.
As we've gained experience we've also run into a few challenges that we've been working to solve. We want to share these challenges with you, and what we've done to work toward overcoming them, in this post.
There are two primary challenges that we've been facing: how to deal with the myriad OSS licenses identified by WhiteSource, and the craziness that is npm.
Let's start by looking at OSS licenses. As mentioned above, we currently have 31 distinct OSS license types being used across the 21 products that we are scanning today. You might be asking yourself, What's the big deal? They're open source licenses… who cares?
Well, in some cases, it simply might not matter at all. However, in an organization that relies on its software and related intellectual property to generate revenue, the exact OSS licenses being utilized might make all the difference in the world.
For example, should we be concerned about copyleft licenses? In WhiteSource the GPL 3.0 license has the following notice:
Imagine this scenario… Your company pours millions of dollars into creating software that revolutionizes the financial portfolio analysis tools market space. Several months later, a competitor gets word that you've made liberal use of GPL-based components within your software, some of which you've modified heavily. They instruct their legal team to request the related source code from your company.
If you think this is a far-fetched scenario, head on over and check out "Lawsuit threatens to break new ground on the GPL and software licensing issues". While I do not know what the outcome of this lawsuit was, it resulted in multiple lawsuits being filed and was no doubt very expensive.
If your organization has a general policy of open sourcing the software that you create to begin with, then the exact OSS licenses you choose might not be so important. However, it's still worth keeping an eye on the licenses that you make use of in case an opportunity arises to convert one or more of your products to commercial offerings (which is not an unusual business practice).
To help with this issue, it's important that your organization creates an Approved OSS License Cheat Sheet. This might be a simple table that lists each OSS license that you might come across with a simple yes, no or maybe next to it. A "yes" implies the OSS license is pre-approved for use within the organization whereas a "no" indicates that the license in question should not be utilized. A "maybe" would imply that the legal team needs to provide guidance on whether the license can be used.
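To make the cheat sheet idea concrete, here is a minimal sketch of how such a matrix could be represented and queried in code. The license verdicts, library names, and function names below are all hypothetical and purely illustrative; they are not our actual matrix, not legal advice, and not part of WhiteSource. The one design choice worth noting is that any license missing from the matrix is treated as a "maybe", so that unknown licenses are routed to the legal team rather than silently approved.

```python
from enum import Enum
from typing import Dict, List


class Verdict(Enum):
    YES = "yes"      # pre-approved for use within the organization
    NO = "no"        # should not be utilized
    MAYBE = "maybe"  # the legal team needs to provide guidance


# Hypothetical cheat sheet. The verdicts are illustrative assumptions only.
LICENSE_MATRIX: Dict[str, Verdict] = {
    "MIT": Verdict.YES,
    "Apache-2.0": Verdict.YES,
    "GPL-3.0": Verdict.NO,
    "LGPL-3.0": Verdict.MAYBE,
}


def check_license(license_id: str) -> Verdict:
    """Look up a license; anything not listed is sent to legal review."""
    return LICENSE_MATRIX.get(license_id, Verdict.MAYBE)


def review(libraries: Dict[str, str]) -> Dict[Verdict, List[str]]:
    """Group library names (library -> license id) by verdict."""
    grouped: Dict[Verdict, List[str]] = {verdict: [] for verdict in Verdict}
    for name, license_id in sorted(libraries.items()):
        grouped[check_license(license_id)].append(name)
    return grouped


if __name__ == "__main__":
    # Hypothetical scan result: library name -> detected license.
    scan = {"lib-a": "MIT", "lib-b": "GPL-3.0", "lib-c": "Some-Unlisted-License"}
    for verdict, names in review(scan).items():
        print(verdict.value, names)
```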
Within WhiteSource you can set up policies based on this matrix that can reject offending OSS licenses during a scan. For example, to exclude GPL licenses, your policy configuration might look something like this:
Once the policies have been set up to match your license matrix, the rest is automatic. Knowing that you're complying with the guidance of your company's legal team can be reassuring to all involved.
WhiteSource has provided us with a ton of information regarding OSS usage within our projects. That's a good thing! However, our experience has been that at times there is almost too much information. This is not necessarily WhiteSource's fault but rather just how it is.
In our case, this excessive "noise" is caused almost entirely by npm library references. If you've worked with npm then you are likely familiar with the sheer number of extraneous libraries that are downloaded when a single npm package is installed.
As an example, one of our projects that we've configured to be scanned with WhiteSource has 16 npm libraries configured within its package.json file. However, when the project is built within VSTS and npm has finished downloading all referenced npm packages, there are a total of 687 libraries being scanned by WhiteSource. 687!
Let’s put this into perspective and look at two of the extension projects we’re scanning:
- Folder Management extension with 0 known OSS libraries (according to the team)
- Countdown Widget with 5 known OSS libraries (according to the team)
After scanning, we are presented with the following picture in WhiteSource:
Less than 1% of the scanned OSS libraries were expected. That’s potentially a lot of noise … “we cannot see the forest for the trees”.
Having to manage this many libraries (per project) is not for the faint-hearted! It takes time and patience. Ideally, we would reduce this number of libraries to the bare minimum that needs to be reviewed. There are a couple of options here.
Separate Your Dependencies
Within your package.json file be sure you split out your npm dependencies between devDependencies and (production) dependencies. The key part is that you must then make use of the --production flag when installing the npm packages. The --production flag will exclude all packages defined in the devDependencies section.
Here's an example package.json file:
As you can see in the above example, the dependencies have been split between devDependencies (eight of them) and dependencies (the remaining three).
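For readers who want to check how their own package.json is split, the following sketch parses a manifest and lists which declared packages a `--production` install would include and which it would skip. The embedded package.json is hypothetical (it is not the file from our project, and the version ranges are made up for illustration). Note that this only looks at the packages you declare directly, not at the libraries they pull in. Newer npm releases express the same behavior as `--omit=dev`.

```python
import json

# Hypothetical package.json content, for illustration only.
PACKAGE_JSON = """
{
  "name": "example-extension",
  "version": "1.0.0",
  "dependencies": {
    "lodash": "^4.17.0",
    "moment": "^2.18.0"
  },
  "devDependencies": {
    "mocha": "^3.4.0",
    "typescript": "^2.3.0",
    "webpack": "^2.6.0"
  }
}
"""


def split_dependencies(package_json_text):
    """Return (production, development) package names declared in the manifest."""
    manifest = json.loads(package_json_text)
    production = sorted(manifest.get("dependencies", {}))
    development = sorted(manifest.get("devDependencies", {}))
    return production, development


production, development = split_dependencies(PACKAGE_JSON)
print("installed with --production:", production)
print("skipped with --production:", development)
```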
This is a good first step in reducing some of the noise. However, when scanning your production dependencies, you must still deal with the additional libraries that were downloaded because they were dependencies of the initial list of defined (production) packages.
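The way a short list of declared packages grows into a much longer list of scanned libraries can be sketched as a walk over a dependency graph. The graph and package names below are entirely hypothetical, and this is a simplified model rather than how npm or WhiteSource actually resolves packages; it only illustrates why the scanned count exceeds the declared count.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it depends on.
DEPENDENCY_GRAPH = {
    "app-lib-a": ["util-x", "util-y"],
    "app-lib-b": ["util-y", "util-z"],
    "util-x": ["tiny-1"],
    "util-y": ["tiny-1", "tiny-2"],
    "util-z": [],
    "tiny-1": [],
    "tiny-2": [],
}


def resolve_all(direct, graph):
    """Breadth-first walk returning every package reachable from `direct`."""
    seen = set(direct)
    queue = deque(direct)
    while queue:
        package = queue.popleft()
        for dependency in graph.get(package, []):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen


direct = ["app-lib-a", "app-lib-b"]
everything = resolve_all(direct, DEPENDENCY_GRAPH)
transitive = everything - set(direct)
print(f"{len(direct)} declared, {len(everything)} scanned, "
      f"{len(transitive)} pulled in indirectly")
```

Even in this toy graph, two declared packages lead to seven scanned ones, which is the same effect we see at a much larger scale in our builds.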
Dealing with the additional dependencies might not be a terrible thing, but you might also want to scan only what you have explicitly referenced. While there is no easy answer for this just yet, WhiteSource has just released an NPM Plugin that should alleviate this issue. Once the NPM Plugin is working for our projects, we will publish another article showing how to configure your builds to utilize the new plugin for a better scanning experience.
A second option is to exclude folders from the scan. While we did not pursue this avenue, it is possible to pass in a list of folders that you would like WhiteSource to exclude when scanning your code base. If your code base is somewhat static and there are only a few folders that you'd like to exclude, this might be a viable option for you. However, in our case, the number of folders is quite large and would be a veritable nightmare to manage.
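To give a feel for why the exclusion list gets unwieldy, here is a rough sketch that derives such a list from the folders an install produced and the packages that were explicitly referenced. The folder and package names are hypothetical, and the output is a plain list of names, not WhiteSource's actual exclusion format (which we do not attempt to reproduce here).

```python
# Hypothetical folder names found under node_modules after an install.
INSTALLED_FOLDERS = ["lodash", "moment", "ms", "debug", "inherits", "once"]

# Packages declared explicitly in package.json (illustrative).
EXPLICITLY_REFERENCED = ["lodash", "moment"]


def folders_to_exclude(installed_folders, explicitly_referenced):
    """Return installed folder names that were not explicitly referenced, sorted."""
    keep = set(explicitly_referenced)
    return sorted(folder for folder in installed_folders if folder not in keep)


exclusions = folders_to_exclude(INSTALLED_FOLDERS, EXPLICITLY_REFERENCED)
print(len(exclusions), "folders to exclude:", exclusions)
```

Assuming roughly one folder per library, the project mentioned earlier (687 scanned libraries, 16 declared) would need an exclusion list of around 671 entries, and that list would have to be regenerated every time a dependency changes.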
Scanning our projects with WhiteSource has been an eye-opening experience and has provided us with some very valuable information. That said, it has also presented us with a few challenges, as you have seen above. While it is taking us a bit of time to fully work through these challenges, we still see value in utilizing WhiteSource to ensure we are providing the best products possible while making use of the vast array of open source libraries at our disposal. I have no doubt that we'll see even more improvements in our processes as we continue to gain further experience.
In the next post, we’ll continue the discussion on how we deal with the myriad OSS licenses identified by WhiteSource and the craziness created by npm. Watch this space!