Anatomy of a Trust Score: What 23 Signals Tell You About an Open Source Contributor

In early 2024, a contributor using the name Jia Tan planted a backdoor in xz-utils. It was not a smash-and-grab. It was a two-year campaign: legitimate patches, earned trust, commit access, then a carefully obfuscated payload that shipped in the testing and rolling releases of several Linux distributions before anyone noticed.

Every tool in the open source security ecosystem (SCA scanners, container scanners, SBOM generators) could flag the vulnerability only after the fact, once it had been disclosed as CVE-2024-3094. None of them evaluated the contributor before it happened.

This is the problem DevTrace tries to address. Not by claiming it would have prevented xz-utils (we honestly do not know), but by asking a different question: instead of scanning the package, what if you scored the person?

What a trust score actually measures

DevTrace computes a numerical trust score between 0.0 and 1.0, mapped to a letter grade (A+ through F). The score is built from 23 individual signals grouped into five categories:

| Category | What it captures |
| --- | --- |
| Code Provenance | Are commits cryptographically signed? How mature is the account? |
| Identity | How old is the account? What is their role in the project? Is the profile populated? |
| Engagement | How much of the work in this repo is theirs? How recently? Are their PRs getting merged? |
| Community | Do other developers follow this person? Do they maintain their own projects? |
| Behavioral | Is their activity consistent over time? Do they review code? Do they contribute across repos? |

None of these categories alone tells you much. A brand-new account is not inherently suspicious. A sparse profile does not mean bad intent. But the categories interact. A new account with no profile, no code reviews, commits only to one repo, and a sudden burst of merged PRs — that pattern is worth looking at more closely.

The 23 signals

Here is what actually feeds the model, organized by category.

Code Provenance

This category only applies when DevTrace has repository context (i.e., you are scoring a contributor in the context of a specific repo).

Identity

Engagement

Community

Behavioral

This is where things get interesting. These signals come from GH Archive data, which gives DevTrace a longitudinal view of contributor activity across all of GitHub — not just the repo in question.

Plus one hard gate: if an account is suspended by GitHub, the score is automatically zero regardless of other signals.

How the math works

Each signal is normalized to a 0-to-1 range using one of three functions depending on the signal type:

The normalized signal values are multiplied by their category weights and summed. When a contributor is scored without repository context (no specific repo provided), the repo-dependent weights — code provenance, commit proportion, recency, and author association — are excluded and the remaining weights are rescaled so they still sum to 1.0.
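As a rough illustration of the weighting and rescaling described above, here is a minimal Python sketch. The signal names, the weight values, and the `composite_score` function are all hypothetical and chosen only so the arithmetic is easy to follow; they are not DevTrace's actual weights or API. The sketch assumes each signal has already been normalized to the 0-to-1 range, and it folds in the suspended-account hard gate.

```python
from typing import Dict

# Hypothetical weights for illustration only (they sum to 1.0).
# DevTrace's real signal names and weights are not given in this article.
WEIGHTS: Dict[str, float] = {
    "code_provenance": 0.20,
    "commit_proportion": 0.10,
    "recency": 0.10,
    "author_association": 0.10,
    "account_age": 0.15,
    "profile_completeness": 0.10,
    "follower_ratio": 0.10,
    "activity_consistency": 0.15,
}

# Weights that only make sense when a specific repo is provided.
REPO_DEPENDENT = {
    "code_provenance",
    "commit_proportion",
    "recency",
    "author_association",
}


def composite_score(
    signals: Dict[str, float],
    has_repo_context: bool,
    suspended: bool = False,
) -> float:
    """Weighted sum of already-normalized signals (each in [0, 1])."""
    # Hard gate: a suspended account scores zero regardless of other signals.
    if suspended:
        return 0.0

    # Drop repo-dependent weights when no repository was provided.
    active = {
        name: weight
        for name, weight in WEIGHTS.items()
        if has_repo_context or name not in REPO_DEPENDENT
    }

    # Rescale the remaining weights so they still sum to 1.0.
    total = sum(active.values())
    return sum(
        signals.get(name, 0.0) * weight / total
        for name, weight in active.items()
    )
```

With these made-up weights, a contributor scored without repo context whose only strong signal is account age gets 0.15 / 0.5 = 0.3, because the four remaining weights are scaled up to fill the space left by the excluded ones.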

The final number maps to a letter grade. A 0.73 is a B. A 0.57 is a C. Below 0.37 is an F.
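The grade lookup can be sketched as a simple threshold table. Only three facts come from this article: 0.73 is a B, 0.57 is a C, and anything below 0.37 is an F. Every other cutoff in the table below is a placeholder assumption picked to be consistent with those three points, and the real scale may include grades that this sketch leaves out.

```python
from typing import List, Tuple

# Hypothetical cutoffs, highest first. Only "below 0.37 is an F" is stated
# in the article; the other boundaries are illustrative guesses.
GRADE_CUTOFFS: List[Tuple[float, str]] = [
    (0.95, "A+"),
    (0.85, "A"),
    (0.70, "B"),
    (0.55, "C"),
    (0.37, "D"),
]


def letter_grade(score: float) -> str:
    """Map a trust score in [0, 1] to a letter grade."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"
```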

What this looks like on the xz-utils timeline

I want to be careful here. DevTrace did not exist during the xz-utils incident, and I am not going to fabricate numbers. But I can walk through which signals would have surfaced concerns based on what we know publicly about the “Jia Tan” account.

Identity signals that would have flagged:

The Jia Tan GitHub account was created specifically for the xz-utils campaign. At the time commit access was granted, the account was roughly two years old — which actually clears the account age signal. This is exactly why no single signal is dispositive. But the profile was sparse: no company, no website, no other visible community presence. The profile completeness signal would have scored low.

Engagement signals that would have flagged:

The account’s PR activity was concentrated in a single project. The commit proportion in xz-utils would have been high relative to the small contributor base, which is not inherently bad — but in combination with limited activity elsewhere, it paints a picture of a single-purpose account.

Community signals that would have flagged:

The account had minimal followers and no meaningful following network. No public repos of their own beyond the target project. The follower ratio and public repo count signals would both have been low.

Behavioral signals that would have flagged:

This is where the model is most relevant. The Jia Tan account showed a pattern that DevTrace’s behavioral category is specifically designed to detect:

What would NOT have flagged:

Account age would have looked reasonable: at two years, the account sits on the flat part of the logarithmic curve, where additional age adds little. The PR acceptance rate would have been high, because the patches were legitimate and useful. This is the hard part: social engineering works precisely because the work is real. No scoring model catches everything.

The aggregate picture:

No single signal would have raised an alarm. But the composite score for a single-project account with a sparse profile, no community footprint, no code review activity outside the target repo, and low cross-repo diversity would have landed well below the scores of typical long-term maintainers. That is not an automatic rejection, but it is a strong signal that this contributor deserved more scrutiny before being granted commit access.

That is the point. DevTrace does not make access control decisions. It provides data so that maintainers can make better-informed ones.

AI sensing: the newer layer

On top of the 23 core signals, DevTrace includes an AI sensing layer that looks for signs of AI-generated or AI-assisted contributions. This is a separate concern from trust scoring, but it is increasingly relevant.

The first tier is metadata analysis: checking for co-authored-by trailers from known AI tools, bot-associated PRs, and known tool signatures. The second tier, available on higher plans, computes behavioral heuristics — velocity anomaly ratios, active hour spread, and a burst-vanish score that flags contributors who appear in a sudden burst of activity and then disappear.

These are not about penalizing AI-assisted development. Copilot-assisted PRs are normal. The concern is synthetic contributor accounts — manufactured identities with fabricated histories. AI lowers the cost of creating these, and the behavioral heuristics are designed to distinguish a real developer who uses AI tools from an account that only exists because of them.
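To make the two tiers more concrete, here is a minimal Python sketch of one check from each. Everything in it is an assumption for illustration: the `AI_TOOL_MARKERS` list, the function names, the window sizes, and the burst-vanish formula itself (concentration of activity in the densest window, multiplied by how long the account has since been silent). The article does not describe how DevTrace actually computes these.

```python
import re
from datetime import date
from typing import List

# Hypothetical substrings that identify an AI tool in a trailer identity.
# Plain substring matching is crude and can produce false positives.
AI_TOOL_MARKERS = ("copilot", "claude", "cursor")

# Matches git trailer lines such as "Co-authored-by: Name <email>".
TRAILER_RE = re.compile(
    r"^co-authored-by:\s*(?P<name>.*?)\s*<(?P<email>[^>]*)>\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def ai_coauthors(commit_message: str) -> List[str]:
    """Tier 1: return names of co-authors that look like known AI tools."""
    hits = []
    for match in TRAILER_RE.finditer(commit_message):
        identity = f"{match.group('name')} {match.group('email')}".lower()
        if any(marker in identity for marker in AI_TOOL_MARKERS):
            hits.append(match.group("name"))
    return hits


def burst_vanish_score(
    event_days: List[date],
    today: date,
    window_days: int = 14,
    silence_days: int = 90,
    min_events: int = 5,
) -> float:
    """Tier 2: score in [0, 1]; high when activity is one burst, then silence."""
    if len(event_days) < min_events:
        return 0.0  # too little data to call anything a burst
    days = sorted(event_days)

    # Sliding window: largest number of events inside any window_days span.
    best = 0
    start = 0
    for end in range(len(days)):
        while (days[end] - days[start]).days >= window_days:
            start += 1
        best = max(best, end - start + 1)
    concentration = best / len(days)

    # How long the account has been quiet, capped at silence_days.
    silence = max(0, (today - days[-1]).days)
    vanish = min(1.0, silence / silence_days)
    return concentration * vanish
```

Under this formula, a contributor with ten events packed into ten days and nothing for the next four months scores 1.0, while a contributor with one event a month for a year and recent activity scores close to zero.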

NIST SSDF mapping

For teams that need to demonstrate compliance with NIST SP 800-218 (the Secure Software Development Framework), DevTrace maps its signals to eight SSDF practices across three practice groups. These include code access controls, release integrity verification, contributor identity provenance, code review participation, activity consistency, anomaly detection, and contributor maturity assessment.

The mapping is intentionally conservative — DevTrace describes where its signals are “relevant to” a practice, never that they “satisfy” it. Compliance is a judgment call for your organization, not something a tool can declare on your behalf. But having the data organized against the framework saves time when the auditors come around.

Full methodology is documented at devtrace.thingz.io/compliance.

What the score is not

A trust score is not a background check. It does not tell you whether someone is trustworthy as a person. It tells you whether their observable behavior on a platform matches the patterns of established, embedded contributors — or whether it diverges in ways that warrant closer attention.

A low score does not mean “reject this contributor.” A high score does not mean “trust without review.” The score is one input to a decision, not the decision itself.

We got into this because we watched the xz-utils story unfold and realized that the open source ecosystem had robust tooling for scanning code and packages, but almost nothing for evaluating the people who write them. DevTrace is our attempt to close that gap. It is an imperfect attempt — 23 signals cannot fully capture the complexity of human behavior — but it is a starting point.

Try it

DevTrace is free to use. Score any public GitHub contributor at devtrace.thingz.io. If you want contributor scoring integrated into your PR workflow, there is a GitHub Action that runs on every pull request.

If you are also thinking about the project-level side of this, DevPulse tracks the health metrics (bus factor, contributor retention, review ratios) that often precede the kind of maintainer burnout that made xz-utils vulnerable in the first place. The two tools are complementary: DevPulse evaluates the project, DevTrace evaluates the people.