The Open Source Security Foundation (OpenSSF), as its name plainly states, aims to help make open source software more secure, but improvements flowing from its efforts are hard to find.
Computer scientists at North Carolina State University have put one of its tools to the test by evaluating software package registries npm and PyPI using OpenSSF Scorecards.
As security expert Bruce Schneier observed two decades ago, “Security is a process, not a product.”
Yet the processes used appear not to have worked very well – certainly not well enough to prevent the supply chain attack on SolarWinds’ Orion software, the ransomware attack on the Colonial Pipeline network, or the exploitation of the Apache Log4j bug. Nor have the blizzard of responses – executive orders, government initiatives, academic engagement, and private sector efforts – made malicious hacking much more of a challenge.
Packages could achieve a score of 10 even if they had multiple unpinned dependencies
The open source ecosystem, upon which so much public and private sector code depends, has been a remedial focal point. The Linux Foundation’s August 3 2020 establishment of the OpenSSF, backed by IBM, GitHub, Google, JPMorgan Chase, Microsoft, NCC Group, and Red Hat, among others, represents one recent attempt at amelioration.
The OpenSSF Scorecard project was launched in November 2020 to provide an automated tool that determines whether particular security practices are followed. Scorecard rates repositories against 18 different heuristics, or checks, each scored from 0 to 10. These include things like Binary Artifacts, Branch Protection, and Dangerous Workflow, to name a few.
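Scorecard rolls those per-check scores into an overall rating, with checks weighted by how risky a failure is. The sketch below illustrates that kind of weighted aggregation; the check names are real, but the weights and the example scores are illustrative assumptions, not the project's actual values:

```python
# Sketch of a Scorecard-style weighted aggregate score.
# The risk weights below are illustrative assumptions, not Scorecard's
# actual weighting.
RISK_WEIGHTS = {"Critical": 10, "High": 7.5, "Medium": 5, "Low": 2.5}

def aggregate_score(checks):
    """checks: list of (name, risk_level, score_0_to_10) tuples."""
    total_weight = sum(RISK_WEIGHTS[risk] for _, risk, _ in checks)
    weighted = sum(RISK_WEIGHTS[risk] * score for _, risk, score in checks)
    return round(weighted / total_weight, 1)

# Hypothetical results for one repository.
example = [
    ("Dangerous-Workflow", "Critical", 10),
    ("Branch-Protection", "High", 3),
    ("Binary-Artifacts", "High", 10),
    ("License", "Low", 0),
]
print(aggregate_score(example))
```

The point of the weighting is that a failing Critical check drags the overall score down far more than a failing Low-risk one.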
In a preprint paper distributed via ArXiv, NCSU researchers Nusrat Zahan, Parth Kanakiya, Brian Hambleton, Shohanuzzaman Shohan, and Laurie Williams applied the OpenSSF Scorecard to software packages within npm and PyPI in order to see what security practices could be identified among the developers using those registries.
“Our study shows a gap in security practices for both ecosystems,” said Nusrat Zahan, corresponding author of the study and a doctoral student at North Carolina State University, in an email to The Register. “Code-Review, Maintained, Binary Artifacts, License, and Branch Protection are practices that assess a repository’s security posture. Aside from Binary Artifacts, you will notice that both ecosystems failed to implement these practices at scale.”
“On the contrary, practices like Dangerous Workflows and Token Permission scan GitHub workflows to verify the presence of good practices. But what would happen if a repository did not contain GitHub workflows? The tool would still give a high score to that package because it could not detect any bad practices. Hence, even if these metrics had a high percentage of packages with good practices, it also opens up the debate about whether Scorecard should check for the existence of GitHub workflows before verifying the good or bad practices for accurate results.”
The researchers’ results – which they expect to update later this week in a revised draft – show both the value and limitations of automated security testing.
Both npm and PyPI scored well in the “Dangerous Workflow” check, the only metric rated “Critical” in terms of importance. This check looks for untrusted code checkout and for script injection with untrusted context variables in packages’ GitHub workflows as a result of misconfigured GitHub Actions (automation scripts).
“More than 99 percent of packages passed the check,” the researchers’ paper says. “However, we found 1,938 npm packages and 508 PyPI packages where Scorecard found vulnerable code patterns.”
An attacker could abuse a vulnerable package, for example, by crafting a malicious GitHub issue title that injects code and opens a reverse shell connection. The fact that 99 percent of packages dealt with this risk is heartening, but as security types often observe, “Defenders have to be right 100 percent of the time and attackers have to be right once…”
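The vulnerable pattern is straightforward to spot mechanically. The snippet below is a simplified approximation of what a Dangerous-Workflow-style check looks for, namely untrusted event data (an issue title, a pull request body) interpolated directly into a `run:` script; the list of untrusted contexts and the matching logic are assumptions for illustration, not Scorecard's actual implementation:

```python
import re

# Simplified sketch of a Dangerous-Workflow-style scan: flag ${{ ... }}
# expressions that pull untrusted event data into a run: script.
# The context list and parsing here are illustrative, not Scorecard's code.
UNTRUSTED_CONTEXTS = (
    "github.event.issue.title",
    "github.event.issue.body",
    "github.event.pull_request.title",
    "github.event.pull_request.body",
    "github.head_ref",
)

def find_injections(workflow_text):
    """Return untrusted ${{ ... }} contexts used inside run: scripts."""
    hits = []
    in_run = False
    for line in workflow_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("run:", "- run:")):
            in_run = True
        elif stripped and not line.startswith((" ", "\t")):
            in_run = False  # left the indented run: block
        if in_run:
            for expr in re.findall(r"\$\{\{\s*([^}]+?)\s*\}\}", line):
                if expr in UNTRUSTED_CONTEXTS:
                    hits.append(expr)
    return hits

vulnerable = """\
name: greet
on: issues
jobs:
  greet:
    runs-on: ubuntu-latest
    steps:
      - run: echo "New issue: ${{ github.event.issue.title }}"
"""
print(find_injections(vulnerable))
```

The safe idiom is to pass such values through an `env:` variable rather than interpolating them into the shell command, so the shell never parses the attacker-controlled string.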
“It shows that we can use the Scorecard tool to detect open vulnerabilities in a potential dependency,” explained Zahan. “The statistic might not accurately reflect the number of packages with open vulnerabilities. OSV [Open Source Vulnerabilities database] only contains a list of vulnerabilities that have been reported.”
“Packages may contain more vulnerabilities than are listed. For example, Elder et al. showed in a study that they found 95 times more vulnerabilities than reported. Hence, if we do more in-depth studies to detect vulnerabilities, we might find more than we know, and in that case, our finding shows evidence that we need to focus on secure coding. Note that the scorecard tool gives us a way to measure these security practices, but it is up to the practitioners to determine how they can improve package security.”
The “Maintained” check underscores just how much open source software is not attentively maintained. The researchers found “more than 85 percent of packages in npm and 75 percent of PyPI packages were unmaintained in GitHub.”
The “Code Review” check also revealed a useful finding: Only 30 percent of npm packages and 34 percent of PyPI packages declared code review practices in their repositories.
The researchers say that’s to be expected given that, particularly in npm, packages often have only a single maintainer. They point to a study published last year, conducted by some of the same computer scientists, that found 1.5 million npm packages had an average of 1.7 maintainers.
But given the solo nature of so many of these software libraries, those involved in securing the open source ecosystem may want to explore whether cost-effective code reviews can be made available for popular single-person projects.
Both npm and PyPI scored poorly on checks like “Security-Policy,” “Packaging,” “Signed Releases,” and “Fuzzing.” While none of these gaps represents an urgent problem, they show where these package ecosystems and participating developers could take security more seriously.
Another heuristic, “Pinned Dependencies,” seems to show npm and PyPI in a good light, with more than 99 percent of packages having at least one pinned dependency. Of these, 81 percent of npm packages and 66 percent of PyPI packages scored 10 – they had no unpinned dependencies, which is generally considered safer.
But those high scores masked frailties. “We found packages could achieve a score of 10 even if the package had multiple unpinned dependencies in the JavaScript package’s package.json file, indicating Scorecard findings do not indicate the accurate status of pinned dependencies in an ecosystem,” the paper explains. “We also observed that Scorecard does not verify the presence of Dockerfiles, shell scripts, and GitHub workflows files in a repository.”
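For readers unfamiliar with npm version specifiers: a dependency is pinned when it names an exact version, while range specifiers such as `^`, `~`, or `*` let a future (possibly compromised) release slip in automatically. The sketch below is a simplified pinned-dependency check over a package.json, an illustrative approximation rather than Scorecard's actual logic; the sample manifest is hypothetical:

```python
import json
import re

# Simplified pinned-dependency check for an npm package.json.
# Only an exact semver like "1.2.3" counts as pinned; ranges such as
# "^1.2.3", "~1.0.0", or "*" are flagged. Illustrative only, not
# Scorecard's actual implementation.
EXACT = re.compile(r"^\d+\.\d+\.\d+$")

def unpinned_deps(package_json_text):
    """Return (name, specifier) pairs that are not pinned to one version."""
    manifest = json.loads(package_json_text)
    bad = []
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT.match(spec):
                bad.append((name, spec))
    return bad

# Hypothetical manifest: one pinned and two unpinned dependencies.
sample = """{
  "dependencies": {"left-pad": "1.3.0", "lodash": "^4.17.21"},
  "devDependencies": {"mocha": "*"}
}"""
print(unpinned_deps(sample))
```

A thorough check would also need to cover the Dockerfiles, shell scripts, and GitHub workflow files the paper mentions, which is exactly the gap the researchers observed.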
What this suggests is that automation alone isn’t enough. Automated tools have to make accurate measurements, and those tools remain works in progress.
“Scorecard provides a head start for practitioners to measure package security practices,” said Zahan. “The Scorecard project is evolving based on the findings and recommendations from practitioners. The software industry seeks to standardize supply chain security procedures through initiatives including Scorecard, Alpha-Omega, OSV, and OSI.”
“Research like ours helps to understand how a package performs against other OSS packages in an ecosystem and how the scorecard can improve automated testing. The Scorecard team welcomed our research and agreed to work on these findings to enable automated testing to run more effectively. But it will require community efforts to standardize and implement these tests.” ®