Adds Package Scoring v1 #11884

Open
wants to merge 6 commits into
base: dev
219 changes: 219 additions & 0 deletions proposed/2022/package-scoring-v1.md
@@ -0,0 +1,219 @@
# Package Scoring v1

- [Jon Douglas](https://github.com/JonDouglas)
- Start Date (2022-05-16)
- [dotnet/designs#216](https://github.com/dotnet/designs/pull/216)

## Summary

<!-- One-paragraph description of the proposal. -->
Having packages of high quality is crucial to the .NET ecosystem. As a first step to increase awareness of the characteristics that make a package high quality, we are proposing the first iteration of an effort known as package scoring.

This proposal and initial iteration will focus on common additions to a NuGet package that benefit the entirety of the .NET developer ecosystem. For reference, the majority of developers we survey and chat with describe five key characteristics of a high quality or "healthy" package:

- It is actively maintained, either with recent commits or an annual update/notice that the package is up-to-date.
- It is documented. It provides enough documentation to install and get started, and has public API documentation on its members.
- It has a way to report bugs. It provides a centralized location or repository in which issues are regularly triaged & resolved in future releases.
- It resolves security flaws quickly. It fixes known vulnerabilities & releases an unaffected release quickly.
- It is not deprecated. It is in an active state meeting all the criteria above.

This proposal introduces the first iteration of package scoring. While it derives from the original proposal [last year on this topic](https://github.com/dotnet/designs/pull/216), it provides the minimum requirements for a v1.0 of implementing this on NuGet.org.

Review comment (Member):

I assume "on NuGet.org" means that Visual Studio and dotnet CLI are out of scope. Therefore, there are also no proposed changes to the NuGet protocol (that clients use to talk to servers).

Is it worth making any of this more explicit?


## Motivation

<!-- Why are we doing this? What pain points does this solve? What is the expected outcome? -->

Anyone browsing NuGet.org should be able to quickly evaluate the security and health of any NuGet package. Bringing package scoring to NuGet helps deter software supply chain attacks before they happen and gives consumers and package maintainers comprehensive protection by surfacing potential issues with their package(s) before they become a problem. This is a proactive approach to security rather than a reactive one: we assume that open source code, and really any code, may in fact be malicious. With helpful scoring indicators, package scoring helps you audit and make the best trust decisions when taking on new dependencies.

## Explanation

### Functional explanation

<!-- Explain the proposal as if it were already implemented and you're teaching it to another person. -->
<!-- Introduce new concepts, functional designs with real life examples, and low-fidelity mockups or pseudocode to show how this proposal would look. -->

Package scoring will contain four unique categories to start off with. The categories are the following:

- Popularity - How popular and recognized a package is in the ecosystem.
- Quality - The completeness of a package following best practices & providing documentation.
- Maintenance - The state of maintenance for a package based on its update cadence.
- Security - The sense of trust based on a package being free of known security vulnerabilities, license risk, and current supply chain risk.

Each category has the potential to total up to 100 points.

- Categories in which the package is **in control** can always reach the maximum amount of points.
- Categories in which the package is **not in control** will be determined by the state of the ecosystem.

EX: A package is in control of providing ample metadata to ensure the quality of the package reaches the full score. A package is not in control of how popular it may be however.

To scale package scoring for the future, each score will be comprised of an analysis of package performance and potential package/dependency issues. Each of these issues will be categorized into one of these four categories and empower the author and consumer to resolve these issues to the best of their ability and control.

#### Issues

To let package scoring grow to its potential through continuous iteration, each category will have its own set of issues that a user can see when browsing the package. The same idea of **being in control** or **not in control** applies to package issues.

For issues which can be addressed, an issue will include an empowering message regarding how one can take action to help address the issue.

EX: A package might be missing a README, which makes it hard for a developer to get started with the package. The developer might then open a new issue, or even contribute the README file to the NuGet package if it's open source.

**Issue Examples:**

Review comment (Member):

This is a great list.

Wonder if we should take a look at how feasible all the individual bullet points are


- Missing dependency
- Missing README
- No example

Review comment (Member):

Can you elaborate on this?

- Not enough public API documentation
- Bad semver

Review comment (Member):

What does bad semver mean?

- Not v1+
- No website

Review comment (Member):

does this refer to the projecturl metadata?

- No repository
- Unmaintained
- No bug tracker

Review comment (Member):

We don't have dedicated metadata for a bug tracker URL. And for the life of me I couldn't find an issue tracking such a feature request. We could potentially render an "issues" URL if a GitHub repository URL is provided, but this isn't even perfect. You can turn off issues at the repo level in GitHub (see NuGet/NuGet.Client repo) so we can't even assume {GitHub repo}/issues is a valid link.

- Has CVE (Critical, High, etc)
- Deprecated license
- Missing license
- Non SPDX license
- Unsafe copyright

Review comment (Member):

what would this be?

- Deprecated
- Empty package
- etc
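
As a rough illustration of how issues could be modelled, here is a minimal Python sketch. The `PackageIssue` type, its field names, the category assigned to each issue, and the "Low download count" entry are all hypothetical; the proposal does not define a data model.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    POPULARITY = "Popularity"
    QUALITY = "Quality"
    MAINTENANCE = "Maintenance"
    SECURITY = "Security"


@dataclass(frozen=True)
class PackageIssue:
    name: str
    category: Category
    in_control: bool    # can the package author address this directly?
    guidance: str = ""  # hypothetical "empowering message" shown with the issue


def actionable(issues):
    """Group the issues the author is in control of by category."""
    grouped = {}
    for issue in issues:
        if issue.in_control:
            grouped.setdefault(issue.category, []).append(issue)
    return grouped


# Illustrative data only; the category assignments are assumptions.
issues = [
    PackageIssue("Missing README", Category.QUALITY, True, "Add a README to the package."),
    PackageIssue("Has CVE (High)", Category.SECURITY, True, "Release an unaffected version."),
    PackageIssue("Low download count", Category.POPULARITY, False),
]

for category, found in actionable(issues).items():
    print(category.value, [issue.name for issue in found])
```

Keeping `in_control` on the issue itself makes it easy to show guidance only where the author can act on it.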

#### Popularity

Popularity differs from the other three categories, which are more issue based. Popularity is calculated from how many projects depend on a package over the past 30 days. This is scored as a percentile from 100% (most used) down to 0% (least used).

Since this is not possible with current data pipelines, we will work on providing such an experience in the future. Until then, we will use two proxy values, total weekly downloads and the total count of packages depending on the package, each scored as a percentile of up to 100%.

- Total Weekly Downloads

Review comment (Contributor, @erdembayar, Jun 28, 2022):

Why weekly downloads? They can be gamed too. For example, during the Christmas downtime someone could run a bunch of bots to generate 10k downloads for their own package; it would then be placed higher in the ranking and get more downloads in the following weeks. To reduce the weight of this kind of spike, finance uses a moving average; here it could cover the last 2-3 months. Implementing a moving average is not complicated, just simple arithmetic.

- Number of Dependents

Review comment (Member):

Could this be tuned to dependents owned by other owners? Or perhaps the number of distinct owners depending on the package with their own package?


Total Score - 100
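
The proxy-based popularity score could be sketched as follows. This is only an illustration: the function names, the equal weighting of the two percentiles, and the sample numbers are assumptions, and `moving_average` reflects the smoothing suggested in the review comment rather than anything in the proposal itself.

```python
from bisect import bisect_left


def moving_average(weekly_downloads, window=12):
    """Average of the most recent `window` weekly download counts."""
    recent = weekly_downloads[-window:]
    return sum(recent) / len(recent)


def percentile_rank(value, population):
    """Percentile of `value` within `population` (which must contain it).

    The least-used entry maps to 0 and the most-used entry to 100.
    """
    ordered = sorted(population)
    if len(ordered) < 2:
        return 100.0
    return 100.0 * bisect_left(ordered, value) / (len(ordered) - 1)


def popularity_score(downloads, dependents, all_downloads, all_dependents):
    """Average of the two proxy percentiles; equal weighting is an assumption."""
    return (percentile_rank(downloads, all_downloads)
            + percentile_rank(dependents, all_dependents)) / 2


# Made-up ecosystem of four packages.
all_downloads = [10, 200, 3_000, 40_000]
all_dependents = [0, 2, 15, 120]
print(popularity_score(40_000, 120, all_downloads, all_dependents))  # 100.0
print(moving_average([100, 10_000, 120, 110], window=4))  # 2582.5
```

Feeding the smoothed download figure into `percentile_rank` instead of a single week's count would dampen one-off spikes such as the 10,000-download week above.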

#### Quality

Quality is the combination of:

Review comment (Member):

What about:

  • sources available
  • symbols available
    (likely SourceLink)

i.e. the package is easily debuggable.

Review comment (@maxkoshevoi, Jun 21, 2022):

What about "Opted in to Nullable reference types"?

https://pub.dev has this as a metric and even adds this cool badge to the package

image

Review comment (Member):

This is a great idea, and a huge win for package consumers if implemented more broadly.

Review comment (Member):

/cc @loic-sharma 😃

Review comment:

Nullable reference types require C# 8, which is not supported on .NET Framework. If NRT is scored, then it will lure developers to an unsupported scenario.

Review comment:

Well, that's not entirely true. You can use C# 8+ features in .NET Framework as long as they don't require new types from the SDK.

NRT only requires a handful of attributes, which can be created manually or added via something like https://www.nuget.org/packages/Nullable#readme-body-tab

After that, NRT can be fully used even in .NET Framework


- Follow NuGet conventions
  - Provides a valid README

Review comment (Member):

What would a valid readme mean? Something that renders correctly?

Review comment (Member):

/cc @chgill-MSFT, @lyndaidaii - we've talked a bit about README quality in other contexts. I think any valid, linked readme file should meet the criteria. Perhaps we can basically do a string.IsNullOrWhiteSpace. Once authors have onboarded to the feature, it's very easy to incrementally improve it. It's a better baseline to be in than just a plain text description.

Imagine if you see someone else's repo you want to open a PR against -- it's certainly more approachable to edit a single Markdown file in the PR than do all of the onboarding and potential plumbing to set it in the first place.

Review comment (Contributor):

Perhaps this could validate that the README uses HTTPS for all links and images?

- Provide documentation
  - Provides an example
  - 20% or more of the public API has XML doc comments

Review comment (Member):

This may be fixed soon (per #5926) but I believe XML docs don't work with today. It's unclear whether this is a useful quality metric while the E2E is not working. Perhaps @zivkan or @heng-liu can correct me here.

- Platform support

Review comment (Member):

We should be mindful about how we handle this for packages that are dedicated to a specific platform only, example mac only.
There could be an android matching one that's a separate package potentially and they have the same dependencies.

Not sure if we can easily capture any of my comments in automation, but just wanted to make sure we don't miss out on that perspective :)

  - Supports all possible modern platforms, i.e. [.NET 5+/.NET Standard](https://docs.microsoft.com/en-us/dotnet/standard/net-standard?tabs=net-standard-1-0#net-5-and-net-standard) (iOS, Android, Windows, macOS, Linux, etc.)
- Pass static analysis
  - Code has no errors, warnings, lints, or formatting issues.

Review comment (Member):

Static analysis would be performed on compiled assemblies, not the input source, right? Or would we need to correlate source and run static analysis on that?

Review comment (Member):

I think some of these are better applicable to eco-systems where the code is shipped rather than compiled assemblies.
Example formatting.

Review comment (Contributor):

+1. Not sure what static analysis means exactly.

- Support up-to-date dependencies
  - All of the package dependencies are supported in the latest version

Review comment (Member):

Not sure I understand this.
Because of how compatibility works, if a package supports a certain framework, then all future versions of that framework are likely to be supported as well.

If a package claims to support a framework but their dependencies don't then that's a problem package.
Maybe that's what we should focus on instead: ensure that the package is installable in all of its declared frameworks.

  - Package supports the latest stable .NET SDK

Review comment (Member):

How would this work?
.NET SDK support is not really a package construct right now.

Review comment:

We know this with the compiler flags metadata included in the pdb's

Review comment (Member):

I'm not sure that answers the question.

Supports SDK is not really a thing. Supporting certain frameworks is what's expressed in NuGet.

Metadata about what SDK was used to build probably isn't helpful either.


Each of these issues would be assigned an arbitrary number of points to make a total of 100. This definition of quality is debatable, and thus we will create a minimal set of quality practices.

Total Score - 100
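
A point-based quality score could look like the sketch below. It groups the listed checks under five headings (NuGet conventions, documentation, platform support, static analysis, up-to-date dependencies) and gives each 20 points; that weighting, the check names, and the helper functions are assumptions, since the proposal only says the points total 100. `has_readme` mirrors the blank-check idea raised in review, and `documented_enough` encodes the 20% XML doc threshold.

```python
# Hypothetical point values; the proposal only states that the total is 100.
QUALITY_CHECKS = {
    "follows_nuget_conventions": 20,
    "provides_documentation": 20,
    "platform_support": 20,
    "passes_static_analysis": 20,
    "up_to_date_dependencies": 20,
}
assert sum(QUALITY_CHECKS.values()) == 100


def has_readme(readme_text):
    """Treat any non-blank README as valid (a deliberately low bar)."""
    return bool(readme_text and readme_text.strip())


def documented_enough(documented_members, public_members, threshold=0.20):
    """True when at least `threshold` of the public API has XML doc comments."""
    if public_members == 0:
        return True  # assumption: nothing public means nothing to document
    return documented_members / public_members >= threshold


def quality_score(passed_checks):
    """Sum the points of every check whose name appears in `passed_checks`."""
    return sum(points for name, points in QUALITY_CHECKS.items()
               if name in passed_checks)


print(quality_score({"follows_nuget_conventions", "platform_support"}))  # 40
```

Because the weights live in one table, they can be tuned between iterations without touching the scoring logic.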

#### Maintenance

Maintenance is the simplest of them all as it pertains to whether the package has been updated in a reasonable timeframe. Packages that have not been updated in more than a year may be unmaintained and any problems with the package may go unaddressed.

- Package updated in the last year.

The score may also be shown as a value that decays over the year as the package approaches the one-year mark, at which point it goes stale.

Review comment (Member):

I wonder how a package owner can attest "this package is still maintained but it hasn't needed an update in over 1 year". Or is this a case we don't think is common?

Review comment (Member):

I know that different people have different preferences, but I'm on the more risk-averse side. I highly value mature and stable packages. Packages with high churn increase the risk of bugs, so I strongly dislike punishing (reducing the score of) "older" packages.

Consider Newtonsoft.Json for example. Excluding the recent prerelease version, the current stable version (13.0.1) was published more than 12 months ago. The time between it and the previous stable version (12.0.3) was more than 12 months. Are we really willing to say that this package is unmaintained and deserves a reduced score?

I understand the intent is to encourage packages that have bug fixes, but I think a simple "how long since the last publish date" is too simplistic and will have worse unintended consequences than the benefits it hopes to bring.

Review comment:

Heard a great quote about the Rust packages (crates) today: https://youtu.be/Z3xPIYHKSoI?t=145

They are not abandoned, they are done

I feel like this should be something to strive towards, instead of expecting packages to be constantly updated


Total Score - 100
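
One way to read the decayed maintenance value is a linear fall-off from 100 at publish time to 0 at the one-year mark. The sketch below assumes exactly that; the linear shape, the 365-day cutoff parameter, and the function name are illustrative choices, not part of the proposal.

```python
from datetime import date, timedelta


def maintenance_score(last_published, today, stale_after_days=365):
    """Linearly decay from 100 (just published) to 0 (stale)."""
    age_days = (today - last_published).days
    remaining = 1 - age_days / stale_after_days
    return round(100 * min(1.0, max(0.0, remaining)))


today = date(2022, 5, 16)
print(maintenance_score(today, today))                        # 100
print(maintenance_score(today - timedelta(days=73), today))   # 80
print(maintenance_score(today - timedelta(days=400), today))  # 0
```

A score based only on publish date cannot tell an abandoned package from a stable one, which is the concern raised in the review comments above; any real implementation would likely need an additional signal.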

#### Security

Finally, Security covers the trust indicators for the package given the current state of supply chain risk. This includes known vulnerabilities, license risk, and any other supply chain security issues.

- No known security vulnerabilities (Critical, High, Moderate, Low) in top-level or transitive dependencies.
- Specifies a valid license (SPDX, embedded)
- Not deprecated.

Review comment (Contributor):

Should packages with installation scripts get flagged for security purposes?

Review comment (Member):

I think the concept of "flagged" in this scoring conversation is very interesting. We can have "clearly good" things like having a license and "clearly bad" things like having no content or dependencies. But what about "watch out!" things? Scripts/MSBuild tasks are not necessarily bad but they should probably be noted to the consumer.

But if we do that, then we have to think about transitivity... a sneaky package author could hide a flagged attribute in a dependency package. It gets complicated fast.

Review comment:

How about making scripts/msbuild tasks more visible on a package details page? Like have a tab with its text, so one could evaluate the contents without downloading a package?


An emphasis would be placed on known security vulnerabilities and license restrictions.

Total Score - 100
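
A deduction-based security score is one possible shape for this category. In the sketch below every penalty value is hypothetical and chosen only to show vulnerabilities and licensing weighing more heavily than deprecation; the proposal does not specify any numbers.

```python
# Hypothetical deductions per known vulnerability, keyed by severity.
SEVERITY_PENALTY = {"Critical": 50, "High": 30, "Moderate": 15, "Low": 5}
LICENSE_PENALTY = 30     # assumption: no valid SPDX or embedded license
DEPRECATED_PENALTY = 20  # assumption


def security_score(vulnerabilities, has_valid_license, deprecated):
    """Start at 100 and subtract penalties, never going below 0.

    `vulnerabilities` lists the severity of each known vulnerability across
    top-level and transitive dependencies.
    """
    score = 100 - sum(SEVERITY_PENALTY[severity] for severity in vulnerabilities)
    if not has_valid_license:
        score -= LICENSE_PENALTY
    if deprecated:
        score -= DEPRECATED_PENALTY
    return max(0, score)


print(security_score([], True, False))        # 100
print(security_score(["High"], True, False))  # 70
```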

### Technical explanation

<!-- Explain the proposal in sufficient detail with implementation details, interaction models, and clarification of corner cases. -->

## Drawbacks

<!-- Why should we not do this? -->
- This is a novel area and hasn't been done from the security perspective by any major package ecosystem. There are working groups trying to solve this problem today.
- This can paint packages in an unfair light for issues they may not actually be able to control but are perceived to control.
- This can be gamed to a certain degree.
- This is one model of package issues and scores. There are other models such as all-up scorecards based on best practices.

## Rationale and alternatives

<!-- Why is this the best design compared to other designs? -->
<!-- What other designs have been considered and why weren't they chosen? -->
<!-- What is the impact of not doing this? -->

As mentioned in the [original spec](https://github.com/dotnet/designs/pull/216), regular surveys show that addressing the following user needs adds significant value:

- It's hard to tell if a package is of high quality and actively maintained. (5.27 Score)
- It has insufficient package documentation (i.e. Readme, changelog, examples, API reference). (4.81 Score)
- It is hard to tell if a package is secure. (4.61 Score)

In addition, in recent 2022 surveys we have seen other themes come up, such as:

- Clear metrics that help me evaluate package quality (3rd most popular averaging 16/100 points spent on this area)
- Special badges for key ecosystem projects, trusted packages that significantly contribute to the .NET ecosystem (4th most popular averaging 14/100 points spent on this area)

We then asked what people weigh when installing a package, and they ranked the following in order of emphasis:

1. Package solves my problem.
2. Package has enough downloads.
3. Package is open source.
4. Package has high quality documentation.
5. Package is maintained by a notable author or organization.
6. Package has been updated recently and is updated regularly.
7. Package has few dependencies.

Review comment (Member):

I wonder if it's worth analyzing or surfacing the total size of a package's dependency graph, or just considering the top level. If the latter, maybe a score based on this could be gamed/unintentionally obfuscated by having a single meta-package dependency.

8. Package is mentioned in blog posts and platforms like StackOverflow.
9. Package is code signed by the author.
10. Manual code inspection of the package’s repository.
11. Package has a blue checkmark icon.
12. Approval by component governance team or other stakeholder.

Finally, we know the sources of dissatisfaction when browsing for packages to be the following, in order of priority:

1. Evaluating the overall quality of a package.
2. Evaluating if I can trust a package or publisher.
3. Discovering new packages.
4. Evaluating if I can legally use a package.
5. Finding necessary package documentation.

With all this said, we believe there is significant impact in doing this work for the security and evolution of the .NET ecosystem.

## Prior Art

<!-- What prior art, both good and bad are related to this proposal? -->
<!-- Do other features exist in other ecosystems and what experience have their community had? -->
<!-- What lessons from other communities can we learn from? -->
<!-- Are there any resources that are relevant to this proposal? -->

- https://socket.dev/
- https://npms.io/
- https://deps.dev/
- https://pub.dev/
- https://www.npmjs.com/

## Unresolved Questions

<!-- What parts of the proposal do you expect to resolve before this gets accepted? -->
<!-- What parts of the proposal need to be resolved before the proposal is stabilized? -->
<!-- What related issues would you consider out of scope for this proposal but can be addressed in the future? -->
- What happened to the community score?
  - At this time, it would require a significant amount of work to support GitHub, GitLab, BitBucket, and other git providers for a community score. Thus the focus is on the package for these first iterations. In the future this can be revisited with the evolution of security for OSS repositories.

## Future Possibilities

<!-- What future possibilities can you think of that this proposal would help with? -->
- The .NET ecosystem can use exposed score APIs & metadata to create new tooling & experiences.
- [Package Validation](https://docs.microsoft.com/en-us/dotnet/fundamentals/package-validation/overview) can be added to a future issue check.
- [Reproducible Builds](https://github.com/dotnet/reproducible-builds) can be added to a future issue check.
- [OSSF Scorecard](https://github.com/ossf/scorecard) can be added to a future issue check.
- Score distributions on different issues will vary with implementation and weighting. We will not get this right the first or second time. We will need to iterate constantly to find values that make sense.
- Scoring can be iterated and improved in future versions as we learn more about the overall health of the .NET ecosystem.