Thingz Blog

Cross-Platform Identity: When the Same Private Key Signs Two Forges

2026-05-03T00:00:00+00:00

Anyone can register a popular username like torvalds on GitLab. Or Codeberg. Or Sourcehut. The GitHub account belongs to who you think it does. The others are first-come, first-served. Bios are free-text. The “linked accounts” section on a profile is whatever the contributor typed in. None of these are proof of anything beyond what the contributor chose to claim.

The same problem applies in reverse. When DevTrace is asked whether a GitHub contributor also has a GitLab presence, name matching is a guess. Profile-link extraction is a declaration. Neither is verifiable in the cryptographic sense.

There is one signal that is. If the same SSH public key appears on a contributor’s GitHub account and on their GitLab, Codeberg, or Sourcehut account, then the same private key can authenticate on each platform. That private key is held by exactly one person or, at most, exactly one breach. Producing it on two systems means controlling it.

This post walks through how DevTrace surfaces cross-VCS identity matches, what the signal proves, and where its limits are.

The hierarchy of identity claims

Most open source identity tooling treats one of three signals as “verified”:

Declared — the contributor put a link to their GitLab profile in their GitHub bio. Anyone can claim this. Worth treating as a hint, not a fact.
Asserted by a third party — a Keybase proof, a verified email domain, an OIDC issuer. Stronger, because the assertion depends on a system that has its own checks. But the chain is only as strong as the attester, and most of the historically popular attesters are out of scope or defunct.
Cryptographic — the same private key signs on both platforms. This is the only signal that requires the contributor to demonstrate control, not just claim it.

DevTrace’s cross-VCS check sits in the third category. It is not a name match dressed up as a fingerprint comparison. It is the actual fingerprint comparison.

How `.keys` endpoints work

Every major code-hosting platform except Bitbucket exposes a public endpoint that returns the SSH public keys a user has uploaded to their account. The keys exist so the forge can authenticate that user’s ssh and git connections, but the endpoints are unauthenticated and machine-readable:

https://github.com/{user}.keys
https://gitlab.com/{user}.keys
https://codeberg.org/{user}.keys
https://meta.sr.ht/~{user}.keys

Each returns one line per key in OpenSSH format:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIK... user@host
ssh-rsa AAAAB3NzaC1yc2EAAAA... id_rsa

DevTrace fetches each platform’s .keys response for the contributor’s handle, computes the SHA-256 fingerprint of every public key, and compares the sets. A match is a match — the same SHA256:... value appearing in two different platforms’ responses. No platform involvement, no API key, no rate-limit dependency on a vendor’s terms.

The result is cached weekly per platform, so the marginal cost of re-scoring a contributor whose cross-VCS surface has not changed is a single cache lookup, not four HTTP calls.
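
The fetch, fingerprint, and compare step described above can be sketched in a few lines of Python. This is a minimal illustration, not DevTrace’s implementation: the function names are hypothetical, the HTTP fetch is left out, and the input is assumed to be the raw text body of each platform’s .keys response. The fingerprint format (SHA-256 over the decoded key blob, base64 without padding) is the one OpenSSH prints.

```python
import base64
import hashlib


def ssh_fingerprint(key_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint for one public key line."""
    # A line looks like: "<type> <base64 blob> [comment]"
    blob = base64.b64decode(key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH shows the digest as base64 with the trailing "=" padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def fingerprints(keys_text: str) -> set[str]:
    """Fingerprint every well-formed line of a .keys response body."""
    return {
        ssh_fingerprint(line)
        for line in keys_text.splitlines()
        if len(line.split()) >= 2
    }


def matched_fingerprints(responses: dict[str, str]) -> dict[str, list[str]]:
    """Map each fingerprint seen on two or more platforms to those platforms.

    `responses` maps a platform label (hypothetical, e.g. "github") to the
    text body of that platform's .keys response.
    """
    seen: dict[str, list[str]] = {}
    for platform, body in responses.items():
        for fp in fingerprints(body):
            seen.setdefault(fp, []).append(platform)
    return {fp: sorted(names) for fp, names in seen.items() if len(names) > 1}
```

An empty result means no key is shared between any two of the supplied responses. A non-empty result lists, per fingerprint, the platforms on which that key appears.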

Why fingerprint matches are categorically stronger

Three reasons, in order of importance.

The signal is not declared, it is demonstrated. A bio link is a string the user typed. A fingerprint match required the user to upload the same public key on each platform. To do that, they held the corresponding private key when they did it.

Spoofing requires compromise, not registration. To fake a GitHub–GitLab match, an attacker would have to either obtain the target’s private key — equivalent to the target reusing a compromised key — or upload their own key to both accounts, which means they already control both accounts, which is a different problem entirely. There is no cheap trick.

The check is reproducible by anyone. The .keys endpoints are public. A reviewer who wants to verify the DevTrace match by hand can run curl https://github.com/USER.keys and curl https://gitlab.com/USER.keys and compare the SHA-256 fingerprints themselves (ssh-keygen -lf prints them for a saved key file). The signal is not DevTrace’s claim. DevTrace just consolidates it into one place.

What a strong cross-VCS profile looks like

A typical long-tenured open source contributor — someone who has been writing code in public for a decade — will often have keys on three or four forges. The kernel mailing-list contributors, the longtime KDE developers, the people who maintain infrastructure projects across multiple ecosystems. Their GitHub .keys matches at least one entry on GitLab and frequently on Codeberg as well.

In the scorecard, this surfaces as a Cross-VCS section showing each platform with a checkmark next to the matched fingerprint. Three matches is a strong signal. Two matches is solid. One match is meaningful but circumstantial — the contributor has at least proven control of the same key on two systems, which is more than a declaration but less than a cross-ecosystem footprint.

A contrasting profile shows zero matches. Either the contributor has no presence on forges other than GitHub, has not uploaded a key to those forges, or rotated keys recently enough that previous matches are gone. None of those is automatically suspicious, but combined with a sparse profile, low repo diversity, and no community footprint, the absence becomes part of the picture.

What this signal does not prove

A cross-VCS match is not a real-name identity check. It does not tell you whether the person behind the keys is who they claim to be in the social sense — only that the same person controls all the matched accounts. The Jia Tan account in the xz-utils incident was a single coherent identity across the surface the attacker chose to maintain. A cross-VCS match between Jia Tan on GitHub and a hypothetical Jia Tan on GitLab would have been real, in the cryptographic sense. It would not have been correct, in the social-engineering sense.

Three other limits are worth naming explicitly:

Bitbucket has no public .keys endpoint. Atlassian gates SSH key access behind an authenticated API. DevTrace cannot include Bitbucket in the cross-VCS check today. If the endpoint ever becomes public, it goes in.
The handle is assumed to be the same. DevTrace v1 fetches .keys for the same handle across platforms. A contributor who is alice on GitHub and alice-h on Codeberg will not show a match even if the underlying keys are identical. This is a known limitation that handle-mapping will address in a later release.
GPG fingerprint correlation is not yet included. GitHub exposes signing keys at github.com/USER.gpg. The same approach (fetch, fingerprint, compare) applies. It is on the roadmap but not in the v1 cross-VCS surface.

How DevTrace exposes the signal

Cross-VCS identity is a Pro-tier capability. Every Pro scorecard for an authenticated contributor includes the section automatically when the relevant handles exist. The matches are cached weekly per platform, the call budget is bounded, and the unauthenticated .keys lookup does not depend on any platform’s API quota.

For programmatic use, the same data is returned in the JSON response from /api/v1/score/{username} on a Pro plan, in a cross_vcs field with one entry per platform and the matched fingerprints listed alongside.

The reason to surface this at all is straightforward. “Verified identity” in open source has been treated as a marketing concept for too long. The strongest signal that does not require a human in the loop already exists, in the public infrastructure of every major forge. DevTrace consolidates it into the scorecard so that a reviewer reading a PR can see “matched on three forges” alongside the rest of the trust signal — without opening four tabs and computing fingerprints in a terminal.

Try it

DevTrace is free to score any public GitHub contributor at devtrace.thingz.io. The cross-VCS identity signal sits on the Pro tier alongside the behavioral signals and synthetic-contributor flags — but the rest of the scorecard, including the 25+ trust signals, AI risk narrative, and enrichment surface, is available on every authenticated plan.

If you are evaluating the projects those contributors maintain rather than just the contributors themselves, DevPulse tracks the project-level signals that complement contributor trust. A project whose top maintainers verify across multiple forges is a different proposition from one whose maintainers do not, and the two views are most useful when read together: DevTrace tells you who, DevPulse tells you how the project is holding up around them.

AI-Generated PRs and the New Shape of Contributor Risk

2026-04-30T00:00:00+00:00

AI-generated code has a 2.7x higher vulnerability density than human-written code. That figure — from CodeRabbit’s analysis of 470 real-world pull requests, and consistent with Veracode’s 2025 GenAI Code Security Report across more than 100 LLMs — gets quoted a lot in supply chain conversations. It changes the threat model on its own.

But the code is the easier half of the problem. The harder half is the contributors.

AI lowers the cost of manufacturing a convincing fake contributor history at scale. Plausible commit messages. Reasonable-looking patches. Activity patterns that mimic a real developer over months instead of hours. The same tools that make legitimate developers more productive make social engineering campaigns cheaper to execute. A campaign that took two years for xz-utils could, in principle, be parallelized across a hundred targets by one person with a script and an API key.

This is not an argument against AI-assisted development. AI in the PR pipeline is normal and net positive. It is an argument that contributor vetting has to evolve alongside the tools contributors are using — because the cheap, plentiful version of “looks like a real developer” is now widely available.

What changed

Three things shifted in the last 24 months that matter for contributor risk:

The cost of plausible activity dropped. Generating a year’s worth of believable commit history — spread across multiple repos, with appropriate language, with realistic PR descriptions — is no longer labor-intensive. It is a prompt and a loop.

The cost of plausible identity dropped. Bios, READMEs, blog posts, and even conversational issue replies can be produced at volume. The textual artifacts that used to take time to fabricate now take minutes.

Detection got harder for humans. Reviewers cannot reliably distinguish AI-generated code from human-written code in a single PR. Pattern-matching at scale across a contributor’s entire history is the only reliable place to look, and humans do not do that pattern-matching by default.

What still holds up: longitudinal behavioral signals. The thing AI is bad at faking is the texture of how a real developer interacts with the open source ecosystem over years — the rhythm of activity across timezones, the messy inconsistency of real human attention, the fingerprints of being embedded in multiple unrelated communities.

That is where DevTrace’s AI sensing layer focuses.

Two tiers of detection

DevTrace’s AI sensing operates in two tiers. Tier 1 is metadata — explicit and implicit signatures that an AI tool was involved. Tier 2 is behavioral — patterns that distinguish a real developer (with or without AI assistance) from a synthetic account.

The two tiers answer different questions. Tier 1 asks “was this PR produced with AI help?” Tier 2 asks “is this contributor a real person?” Those questions are not the same, and conflating them is how teams end up either flagging legitimate Copilot users or missing actual synthetic accounts.

Tier 1: metadata analysis

Tier 1 runs on every contributor scan. It looks at the artifacts that AI tooling leaves behind in commits, PRs, and account configuration.

Bot detection. GitHub exposes an account type field that distinguishes user accounts from bot accounts. Known bots — Dependabot, Renovate, GitHub Apps — are flagged automatically and excluded from contributor trust analysis. This is table stakes, but it matters because dashboards that ignore bot status tend to inflate contributor counts.

Commit trailers. AI coding assistants increasingly add Co-authored-by: trailers identifying themselves. Claude Code, Cursor agents, GitHub Copilot Workspace, and several others use predictable trailer formats. DevTrace parses commit messages for these trailers and tracks the proportion of a contributor’s commits that were co-authored by an AI tool.

This is informational, not punitive. A contributor whose commits are 80% co-authored by Claude Code is not less trustworthy; they are just transparent about their workflow. The signal becomes interesting when combined with the Tier 2 signals.
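
As a rough sketch of the trailer check, the snippet below parses Co-authored-by: trailers out of commit messages and reports the share of commits that name an AI tool. The marker list and function names are assumptions made for illustration; the exact trailer text each tool emits varies and is not taken from DevTrace.

```python
import re

# Matches one "Co-authored-by: Name <email>" trailer line, case-insensitively.
TRAILER_RE = re.compile(
    r"^co-authored-by:[ \t]*(.+?)[ \t]*<[^>\n]*>[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

# Hypothetical marker list: substrings assumed to identify AI tools by name.
AI_MARKERS = ("claude", "copilot", "cursor")


def ai_coauthored(message: str) -> bool:
    """True if any co-author trailer in the message names a known AI tool."""
    names = TRAILER_RE.findall(message)
    return any(marker in name.lower() for name in names for marker in AI_MARKERS)


def ai_coauthor_ratio(messages: list[str]) -> float:
    """Fraction of commit messages carrying an AI co-author trailer."""
    if not messages:
        return 0.0
    return sum(ai_coauthored(m) for m in messages) / len(messages)
```

Matching on the co-author name rather than the email keeps the sketch simple. A production parser would likely key on the full trailer identity per tool.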

Tool signatures. Beyond trailers, AI tools leave fingerprints in commit message formatting, branch naming conventions, and PR description structure. These signatures are weak individually but useful as a population-level indicator: a contributor whose PRs all match a single AI tool’s default output format is using that tool heavily, which is a fact worth knowing.

PR authenticity classification. This combines the above into a per-PR classification: human-authored, AI-assisted, or AI-generated. The classification is probabilistic, not absolute. The point is to give reviewers context, not to gate merges.

Tier 2: behavioral analysis

Tier 2 is the harder problem and the part that actually distinguishes real from synthetic. These signals come from GH Archive data, which gives DevTrace longitudinal visibility into contributor activity across all of GitHub — not just the repo in question.

Velocity anomaly ratio. Real developers have rhythm. They commit in bursts during focused work, go quiet during meetings, sleep, take weekends. Their velocity over a 30-day window has texture — peaks, valleys, patterns that correlate with calendar days.

A synthetic account often does not. Either the velocity is too smooth (an automated loop producing one PR per day at consistent intervals) or it is too spiky in suspicious ways (zero activity for weeks, then a flood of PRs across multiple repos in 48 hours). DevTrace computes the ratio of actual velocity to expected velocity for that account’s age and activity level. Ratios well outside the typical range are flagged.
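
One way to picture the ratio is shown below. The post does not say how DevTrace derives expected velocity, so this sketch assumes the simplest baseline: the account’s lifetime average, scaled to the window. The function name and parameters are hypothetical.

```python
def velocity_anomaly_ratio(
    recent_events: int,
    lifetime_events: int,
    account_age_days: int,
    window_days: int = 30,
) -> float:
    """Recent-window event count divided by what the lifetime average predicts.

    Assumption: "expected velocity" is the lifetime event rate scaled to the
    window. A ratio near 1.0 means the window looks like the account's norm.
    """
    if account_age_days <= 0 or lifetime_events <= 0:
        return 0.0
    # Accounts younger than the window are compared against their full age.
    effective_window = min(window_days, account_age_days)
    expected = lifetime_events * effective_window / account_age_days
    return recent_events / expected
```

Under this baseline, an account averaging 15 events per 30 days that suddenly produces 150 in one window gets a ratio of 10. Catching the too-smooth case needs a second measure, such as the variance of gaps between events, which this sketch leaves out.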

Active hour spread. A real developer working a normal job tends to commit during a recognizable subset of the 24-hour day — their working hours, plus or minus their personal habits. The histogram of their commit timestamps has a shape: usually two or three peaks (morning, afternoon, occasional evening) with quiet zones for sleep and lunch.

A synthetic account often has either a uniformly flat distribution (commits at every hour, because the script does not sleep) or a too-narrow distribution (commits only in a 2-hour window, because that is when the operator runs the script).

This signal is noisier than it sounds — contributors travel, change jobs, work across timezones with collaborators — so DevTrace uses it as one input among many. But the failure modes (flat or narrow distributions) are distinctive enough to be useful.
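
The two failure modes can be illustrated with an hourly histogram and its normalized entropy. This is a sketch under stated assumptions: timestamps are already in a single timezone, and the thresholds (flat_above, narrow_hours) are placeholder values chosen for the example, not DevTrace’s.

```python
import math
from collections import Counter
from datetime import datetime


def hour_histogram(timestamps: list[datetime]) -> list[int]:
    """Count commits per hour of day (index 0 to 23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def normalized_entropy(hist: list[int]) -> float:
    """Shannon entropy of the hourly distribution scaled to [0, 1].

    A value near 1.0 means commits are spread evenly over all 24 hours.
    """
    total = sum(hist)
    if total == 0:
        return 0.0
    probs = [count / total for count in hist if count > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(24)


def classify_spread(
    hist: list[int], flat_above: float = 0.97, narrow_hours: int = 2
) -> str:
    """Label a histogram as "flat", "narrow", "shaped", or "no-data".

    Both thresholds are hypothetical illustration values.
    """
    active_hours = sum(1 for count in hist if count > 0)
    if active_hours == 0:
        return "no-data"
    if active_hours <= narrow_hours:
        return "narrow"
    if normalized_entropy(hist) >= flat_above:
        return "flat"
    return "shaped"
```

Entropy is used here because it summarizes how evenly activity is spread in a single number. Counting peaks would be closer to the morning, afternoon, and evening description above, but it requires smoothing choices that are out of scope for a short example.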

Burst-vanish score. This is the signal most directly aimed at the xz-utils playbook: an account that appears, builds visible activity rapidly, then either vanishes or pivots to a single high-value target. DevTrace computes a burst-vanish score that captures the ratio of active days to total account age, weighted by the concentration of activity in specific time windows.

A real developer with five years of patchy activity gets a low burst-vanish score. An account that is six months old, was active for the last 60 days, has commits in 12 unrelated repos, and is now requesting commit access to a critical library — that account gets a high score.
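
A minimal sketch of how such a score could behave on these two examples follows. The formula is an assumption made for illustration (the post does not give DevTrace’s weighting): it multiplies the share of active days that fall inside the densest window by the fraction of the account’s life with no activity.

```python
from datetime import date


def max_window_share(days: list[date], window_days: int) -> float:
    """Largest share of active days that falls inside any single window."""
    days = sorted(days)
    best = 0
    start = 0
    for end in range(len(days)):
        # Shrink the window until it spans fewer than window_days days.
        while (days[end] - days[start]).days >= window_days:
            start += 1
        best = max(best, end - start + 1)
    return best / len(days)


def burst_vanish_score(
    active_days: set[date], created: date, today: date, window_days: int = 60
) -> float:
    """Hypothetical score in [0, 1]: concentration times inactivity.

    High when most activity sits in one short window of a mostly idle account.
    """
    age_days = (today - created).days
    if age_days <= 0 or not active_days:
        return 0.0
    concentration = max_window_share(list(active_days), window_days)
    inactivity = max(0.0, 1.0 - len(active_days) / age_days)
    return concentration * inactivity
```

With these definitions, an account active on 45 consecutive days out of a 180-day life scores 0.75, while one active every sixth day across five years scores below 0.05.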

Synthetic contributor flag. When tier 2 signals combine in ways consistent with a manufactured account (low diversity, high burst-vanish, narrow active hour spread, no review participation, sparse profile), DevTrace surfaces a synthetic contributor flag. The flag is not a verdict. It is a “this contributor warrants closer review before granting elevated trust” signal.

What a real high-velocity contributor looks like

The hardest population to score correctly is the legitimate high-velocity contributor. These are real developers — often working full-time on open source, or using AI assistants aggressively — who can produce more PRs in a week than a typical contributor produces in a quarter.

They share several characteristics that distinguish them from synthetic accounts:

Long account history with consistent rhythm. The high velocity is part of a multi-year pattern, not a recent spike.
Activity across many repositories. They review code, file issues, and comment in unrelated communities. They are embedded.
Realistic active hour distribution. A shape, with quiet zones, that holds up over months.
Other signals of human presence. Followers, profile, blog posts, conference talks, real-world identity that can be cross-referenced.
Conversational depth. Their issue comments and PR replies show familiarity with project history, with the maintainers, with prior decisions. Synthetic accounts struggle here because the context window is too small to fake.

DevTrace’s behavioral category is specifically tuned to surface this distinction. A contributor with 50 PRs in 30 days but a five-year account history, broad repo diversity, consistent review participation, and a normal active-hour shape will score well. The same 50 PRs from a 6-month-old account with no review participation, no repo diversity, and a flat hourly distribution will score poorly — and trigger the synthetic contributor flag.

What still slips through

No detection layer catches everything. The honest list of what does not work yet:

Hybrid accounts. A real developer with a mature account who decides to run a malicious campaign is the hardest case. Their behavioral fingerprint looks normal because it is normal. This is where DevTrace’s repo-context signals — code provenance, commit signing, organizational role — carry more weight than the behavioral category.

Slow-burn synthetic accounts. A patient adversary willing to invest two years in building authentic-looking history can defeat the burst-vanish signal by simply not bursting. The xz-utils attack worked partly because the timeline was patient. The trade-off is that this kind of attack does not scale — it is back to being expensive, which is itself a deterrent.

Coordinated networks. A cluster of synthetic accounts that interact with each other to fake community signals (followers, code reviews, mutual mentions) can defeat per-contributor analysis. Detecting these requires graph-level analysis that DevTrace does not yet do at the public-tier level. It is on the roadmap.

This is the same dynamic as any other detection problem. The cheap, common attacks get caught. The expensive, patient attacks remain hard. The point is to raise the floor.

How to use this in practice

For most teams, the practical question is not “did AI write this code?” It is “is this contributor someone whose elevated access I should trust?”

A reasonable workflow:

For first-time contributors: run a DevTrace scan as part of PR review. The trust score and any synthetic contributor flags surface in the GitHub Action output. Most first-time contributors are fine. The point is to catch the cases that warrant a second look before merging.

For contributors requesting commit access: run a deeper scan that includes the behavioral signals. Anyone whose burst-vanish score is high or whose active hour spread looks synthetic should not get commit access on the strength of a few good PRs alone, regardless of how good those PRs are.

For your own contributor base: periodically audit the contributors with elevated access against current trust scores. Account behavior changes. A contributor who scored well three years ago may have changed accounts, gone dormant, or been compromised. Trust is not a one-time check.

What this is not

DevTrace’s AI sensing is not a tool for penalizing AI-assisted development. The proportion of legitimate PRs touched by AI tools is high and rising, and any detection system that treats AI involvement as suspicious by default will produce mostly false positives.

It is also not a substitute for code review. The point is to surface contributors whose pattern of behavior diverges from what real developers look like — so that human reviewers know where to spend their attention. The reviewer still does the review.

The asymmetry that matters: synthetic contributor accounts are cheap to create and expensive to vet manually. Behavioral signals are the only way to keep the vetting cost from blowing up as the creation cost falls. That is the gap DevTrace is trying to close.

Try it

DevTrace is free to use. Score any public GitHub contributor at devtrace.thingz.io — the AI sensing layer runs automatically as part of the trust score. Tier 1 metadata signals are available on the free tier. Tier 2 behavioral signals are available on higher plans. The GitHub Action integrates the same scoring into your PR workflow.

If you are also watching the project-level side of this, DevPulse detects sudden shifts in project contribution patterns that can signal automated activity at the repository level: contributor mix changes, velocity spikes, review ratio drops. Project-level anomalies and contributor-level anomalies are usually correlated. Looking at both at once is how you tell the difference between “the project just got popular” and “something is off.”

Bus Factor 1: The Metric Your Dependency Review Is Missing

2026-04-25T00:00:00+00:00

Stars, forks, and last-commit-date. These are the three numbers most developers check before adopting a dependency. None of them answer the question that actually matters: what happens if one person stops contributing?

Bus factor — the number of contributors who account for 50% or more of recent work — is the single strongest predictor of project fragility. A project with 10,000 stars and a bus factor of 1 is one resignation, one burnout, one job change away from becoming unmaintained.

This is not hypothetical. It is the pattern behind most of the open source failures that make the news.

The metric nobody checks

When engineering teams evaluate a new dependency, the review process typically looks like this: check the star count (social proof), check the last commit date (is it alive?), maybe glance at the issue tracker (is anyone responding?). Some teams go further and check the license, the test coverage, or the SBOM.

Almost nobody checks how many people are actually doing the work.

The problem is that GitHub’s default metrics create a misleading picture. A repository can show 200 contributors in its history while only 1 of them has committed anything in the last 90 days. The contributor list is a cumulative count. It tells you who has ever touched the project, not who is keeping it alive right now.

Bus factor corrects this by measuring concentration of recent work. The computation is straightforward: rank contributors by their share of recent activity, then count how many you need before you cross 50% of total contributions. If the answer is 1, your dependency has a single point of failure.

How DevPulse computes bus factor

DevPulse computes bus factor from actual contribution events (pull requests, reviews, issues, and comments) over a configurable time window (default: 90 days). The query ranks each contributor by event count, computes a running cumulative sum, and returns the number of top-ranked contributors needed before that sum first exceeds 50% of all activity.

This is not an approximation or a heuristic. It is a direct measurement: how many people would need to leave before more than half the project’s recent work output disappears?

DevPulse also computes a parallel metric called pony factor, which applies the same 50% threshold at the organization level. A bus factor of 3 sounds reasonable until you realize all three contributors work at the same company. If that company deprioritizes the project, the effect is the same as a bus-factor-1 project losing its one maintainer.
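
The rank, accumulate, and count procedure is short enough to write out. The sketch below is an illustration rather than DevPulse’s query: it assumes events arrive as (contributor, organization) pairs already filtered to the time window, and the function names are made up for the example.

```python
from collections import Counter


def concentration_factor(keys: list[str], threshold: float = 0.5) -> int:
    """Smallest number of top-ranked keys whose events exceed the threshold.

    Returns 0 when there are no events.
    """
    counts = Counter(keys)
    total = sum(counts.values())
    running = 0
    # most_common() yields keys ranked by event count, highest first.
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running > threshold * total:
            return rank
    return 0


def bus_factor(events: list[tuple[str, str]]) -> int:
    """Concentration by contributor; events are (contributor, org) pairs."""
    return concentration_factor([who for who, _ in events])


def pony_factor(events: list[tuple[str, str]]) -> int:
    """Same computation, grouped by organization instead of by person."""
    return concentration_factor([org for _, org in events])
```

For example, three contributors from one company with 4, 3, and 3 events give a bus factor of 2 and a pony factor of 1. The strict greater-than comparison follows the “first exceeds 50%” wording; a project split exactly 50/50 between two people therefore gets a bus factor of 2.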

Both numbers appear on the Health tab alongside an overall health grade that combines them with demand, throughput, and responsiveness scores.

What bus factor 1 actually looks like

A project transitioning from healthy to fragile rarely announces itself. The pattern is slow and the symptoms are easy to miss if you are not watching the right metrics. Here is what to look for:

Contributor retention drops first. Before a project becomes a one-person show, it loses its returning contributors. DevPulse tracks new versus returning contributors over time. A healthy project converts newcomers into repeat contributors. When returning contributor count trends downward while new contributor count stays flat, the community is not retaining people.

Time to first response climbs. When fewer people are doing the work, incoming issues and PRs wait longer for attention. DevPulse measures the hours until a PR gets its first review and until an issue gets its first comment. This is the most direct signal for contributor experience — slow responses drive external contributors away, which accelerates the retention problem.

PR review ratio deteriorates. Healthy projects maintain a review culture where PRs get reviewed before merge. When the reviewer pool shrinks, either PRs go unreviewed or the remaining maintainer becomes the bottleneck. DevPulse tracks reviews per PR over time. A declining ratio often precedes the bus factor dropping to 1.

Lead time inflates. Days from PR creation to merge stretch out. External contributors submit work, wait, and eventually stop contributing. DevPulse’s velocity charts show this clearly: lead time and first-response time trend upward together.

The pattern reinforces itself. Fewer maintainers leads to slower responses, which leads to fewer contributors, which leads to fewer maintainers. By the time bus factor hits 1, the project has usually been in decline for months.

Stars hide fragility

It is tempting to treat star count as a health indicator. It is not. Stars are a measure of historical interest, not current maintenance capacity.

Consider the dynamics: a project gets popular, accumulates stars over years, and becomes a transitive dependency for thousands of other projects. The original maintainer gets a new job, starts a family, or simply burns out. Stars keep accumulating because people keep discovering the project through blog posts, tutorials, and Stack Overflow answers that were written when the project was actively maintained.

The star count says 15,000. The bus factor says 1. The last response to an external PR was 4 months ago. The project is technically alive (there was a commit last month) but functionally unmaintained for anyone outside the single remaining contributor.

Forks tell a similar misleading story. A rising fork count without rising PRs often means people are forking to work around issues they cannot get addressed upstream. DevPulse tracks forks alongside activity events specifically to surface this divergence.

The xz-utils connection

The xz-utils incident (CVE-2024-3094) is primarily discussed as a social engineering attack, and it was. But the precondition for the attack was a project health problem: a critical compression library, depended on by most Linux distributions, maintained by a single burned-out developer.

The attacker did not need to compromise a large team. They needed to earn the trust of one overwhelmed person. The bus factor was already 1 before the attack began. The social engineering succeeded in part because the maintainer needed help and someone showed up offering it.

This is where project-level health metrics and contributor-level trust scoring are complementary. DevPulse would have shown the fragility: bus factor 1, no meaningful contributor retention, a single-person bottleneck. DevTrace would have provided the per-contributor analysis of the new person stepping in. Neither tool alone tells the full story. Together, they surface the combination of conditions that makes a project vulnerable.

How to use bus factor in dependency decisions

Bus factor is most useful as a triage metric — a fast way to separate dependencies that need deeper investigation from those that are reasonably healthy.

Before adopting a dependency:

Open the project on DevPulse and check the Health tab. Bus factor and pony factor tell you the concentration risk. But do not stop there. Check contributor retention (are people sticking around?), time to first response (are external contributors getting attention?), and PR review ratio (is there a review culture?). A bus factor of 3 with strong retention and fast response times is a healthy project. A bus factor of 3 where all three contributors are from the same org and retention is declining is a project heading toward fragility.

For dependencies you already use:

DevPulse tracks metrics over time, so you can watch for trends. Set up your portfolio to include your critical dependencies and check the Health Scorecard periodically. The scorecard grades three categories — Demand (community interest and growth), Throughput (how efficiently work moves), and Responsiveness (how quickly the team reacts) — each on an A through F scale. A project whose responsiveness grade drops from B to D over two quarters is telling you something, even if the star count keeps climbing.

At the organizational level:

For OSPO leads and engineering managers, the portfolio view shows bus factor across all tracked repositories in one table. Sort by bus factor to identify which dependencies carry the most key-person risk. Cross-reference with your dependency graph to understand blast radius: a bus-factor-1 library that appears in 3 services is a different risk than one that appears in 300.

What the metric is not

Bus factor measures concentration of recent work. It does not measure code quality, security posture, or whether the project’s architecture is sound. A project with a bus factor of 5 can still have unreviewed PRs, no test suite, and known vulnerabilities.

Bus factor also does not account for contribution quality. A maintainer who reviews every PR and triages every issue contributes more to project health than someone who commits frequently but only touches CI configuration. DevPulse weights all event types equally in the bus factor computation because distinguishing “important” from “routine” contributions requires judgment that a metric cannot automate.

Treat bus factor as one input to your risk assessment, not the assessment itself. It answers a specific question — how concentrated is the work? — and that question matters more than most teams realize. But it is not the only question.

Check your dependencies

DevPulse is free to use. Track any public GitHub repository at devpulse.thingz.io and see bus factor, pony factor, contributor retention, velocity, and 30 other metrics across the Health, Activity, Velocity, Quality, and Community dashboards.

If you are also evaluating the contributors behind the bus factor number, DevTrace scores individual contributors across 23 signals covering identity, engagement, community standing, and behavioral patterns. The two tools are complementary: DevPulse tells you the project has a bus factor of 1, DevTrace helps you evaluate the one person it depends on.

Anatomy of a Trust Score: What 23 Signals Tell You About an Open Source Contributor

2026-04-22T00:00:00+00:00

In early 2024, a contributor named Jia Tan planted a backdoor in xz-utils. It was not a smash-and-grab. It was a two-year campaign: legitimate patches, earned trust, commit access, then a carefully obfuscated payload that shipped in pre-release and rolling-release distributions before anyone noticed.

Every tool in the open source security ecosystem — SCA scanners, container scanners, SBOM generators — detected the vulnerability after the fact. None of them evaluated the contributor before it happened.

This is the problem DevTrace tries to address. Not by claiming it would have prevented xz-utils (we honestly do not know), but by asking a different question: instead of scanning the package, what if you scored the person?

What a trust score actually measures

DevTrace computes a numerical trust score between 0.0 and 1.0, mapped to a letter grade (A+ through F). The score is built from 23 individual signals grouped into five categories:

Category	What it captures
Code Provenance	Are commits cryptographically signed? How mature is the account?
Identity	How old is the account? What is their role in the project? Is the profile populated?
Engagement	How much of the work in this repo is theirs? How recently? Are their PRs getting merged?
Community	Do other developers follow this person? Do they maintain their own projects?
Behavioral	Is their activity consistent over time? Do they review code? Do they contribute across repos?

None of these categories alone tells you much. A brand-new account is not inherently suspicious. A sparse profile does not mean bad intent. But the categories interact. A new account with no profile, no code reviews, commits only to one repo, and a sudden burst of merged PRs — that pattern is worth looking at more closely.

The 23 signals

Here is what actually feeds the model, organized by category.

Code Provenance

This category only applies when DevTrace has repository context (i.e., you are scoring a contributor in the context of a specific repo).

Verified commit ratio — what fraction of this contributor’s commits are cryptographically signed. Signing does not prove good intent, but it ties commits to a verifiable identity. The xz-utils attacker’s commits were not signed.
Account maturity — how old the account is, applied as a maturity factor on the verification signal. A two-year-old account with signed commits carries more weight than a two-week-old one.
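To make the interaction between these two signals concrete, here is a minimal sketch. It assumes commit records shaped like the GitHub REST API's commit objects, where `commit.verification.verified` is a boolean. The function names and the choice to multiply the ratio by a 0-to-1 maturity factor are illustrative assumptions, not DevTrace's actual implementation.

```python
def verified_commit_ratio(commits: list[dict]) -> float:
    """Fraction of commits whose signature was verified.

    Assumes each item looks like a GitHub REST commit object:
    {"commit": {"verification": {"verified": True}}}
    """
    if not commits:
        return 0.0
    verified = sum(
        1
        for c in commits
        if c.get("commit", {}).get("verification", {}).get("verified") is True
    )
    return verified / len(commits)


def provenance_signal(commits: list[dict], maturity: float) -> float:
    """Hypothetical combination: scale the signed ratio by a 0..1 maturity factor."""
    maturity = max(0.0, min(1.0, maturity))  # clamp to the expected range
    return verified_commit_ratio(commits) * maturity


sample_commits = [
    {"commit": {"verification": {"verified": True}}},
    {"commit": {"verification": {"verified": True}}},
    {"commit": {"verification": {"verified": False}}},
]
```

With this shape, a fully signed history from a very young account still scores low, which matches the intent described above: signing counts for more once the account has some age behind it.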

Identity

Account age — days since account creation, mapped through a logarithmic curve that saturates around two years. This means the difference between 30 days and 180 days matters more than the difference between 3 years and 5 years.
Author association — the contributor’s relationship to the repository: owner, member, collaborator, prior contributor, or first-time contributor. Each maps to a different trust level.
Org membership — whether the contributor belongs to the repository’s organization, and whether they are in a trusted role.
Profile completeness — five boolean signals: bio, company, location, website, and public email. Each present field adds to the score. This is a weak signal individually (anyone can fill in a profile), but its absence in combination with other red flags becomes meaningful.

Engagement

Commit proportion — what fraction of the repo’s total commits belong to this contributor, adjusted for the number of contributors (a 10% share in a 3-person project means something different than 10% in a 200-person project).
Commit recency — days since last commit, modeled as exponential decay with a half-life that adjusts based on project size. Larger projects tolerate longer gaps.
PR acceptance rate — the ratio of merged PRs to total closed PRs. A high merge rate with a reasonable sample size suggests the contributor’s work is consistently accepted by reviewers.

Community

Follower ratio — followers divided by following, capped at a saturation point. This is a rough proxy for whether other developers know and trust this person. Not a strong signal by itself, but a useful data point.
Public repositories — how many repos the contributor maintains. More original repos (not forks) suggest someone who builds and ships their own work, not just drive-by patches.

Behavioral

This is where things get interesting. These signals come from GH Archive data, which gives DevTrace a longitudinal view of contributor activity across all of GitHub — not just the repo in question.

Consistency score — how regular is this contributor’s activity over time? Steady, sustained contribution patterns score higher than erratic bursts followed by silence.
Review participation — how many code reviews has this contributor performed in the last 30 days? Contributors who review others’ code tend to be embedded in a project’s community, not just passing through.
Repo diversity — how many distinct repositories has this contributor been active in over the last 90 days? Broader engagement suggests a real developer with real interests, not a single-purpose account.
Burst rate — recent PR activity relative to account age. A six-month-old account that suddenly opens PRs across 15 repos is not the same as a five-year-old account doing the same thing.
Fork ratio — what fraction of the contributor’s public repos are forks versus original work? Accounts that consist almost entirely of forks may not have a meaningful development history of their own.

Plus one hard gate: if an account is suspended by GitHub, the score is automatically zero regardless of other signals.
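A rough sketch of how a few of these could be derived from an event stream follows. The flattened event shape (`type`, `repo`, `created_at`) is a simplification of GH Archive records chosen for illustration, and the function names are hypothetical. Only the 30-day and 90-day windows and the zero-on-suspension rule come from the descriptions above.

```python
from datetime import datetime, timedelta, timezone


def behavioral_counts(events: list[dict], now: datetime) -> dict:
    """Count reviews (last 30 days) and distinct repos (last 90 days).

    Each event is assumed to look like:
    {"type": "PushEvent", "repo": "owner/name", "created_at": datetime}
    """
    review_cutoff = now - timedelta(days=30)
    diversity_cutoff = now - timedelta(days=90)
    reviews = sum(
        1
        for e in events
        if e["type"] == "PullRequestReviewEvent" and e["created_at"] >= review_cutoff
    )
    repos = {e["repo"] for e in events if e["created_at"] >= diversity_cutoff}
    return {"review_count": reviews, "repo_diversity": len(repos)}


def fork_ratio(fork_repos: int, total_repos: int) -> float:
    """Share of public repos that are forks; 0.0 when there are no repos."""
    return fork_repos / total_repos if total_repos else 0.0


def apply_hard_gate(score: float, suspended: bool) -> float:
    """A suspended account scores zero regardless of other signals."""
    return 0.0 if suspended else score
```

The raw counts here would still need to be normalized before they feed a score.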

How the math works

Each signal is normalized to a 0-to-1 range using one of three functions depending on the signal type:

Logarithmic curve for signals with diminishing returns (account age, repo count). The first year of account age matters a lot; the tenth year barely moves the needle.
Exponential decay for freshness signals (commit recency). Recent activity scores high; stale activity fades.
Linear ratio for bounded signals (consistency, review count, diversity). Capped at a saturation ceiling so outliers do not distort the score.
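The three shapes can be sketched in a few lines. The exact formulas, parameter names, and the saturation and half-life values you would pass in are assumptions for illustration; the post only commits to the general shape of each curve.

```python
import math


def log_curve(value: float, saturation: float) -> float:
    """Diminishing returns: rises quickly at first, capped at 1.0 at `saturation`."""
    if value <= 0:
        return 0.0
    return min(1.0, math.log1p(value) / math.log1p(saturation))


def exp_decay(days_since: float, half_life_days: float) -> float:
    """Freshness: 1.0 for activity today, halving every `half_life_days`."""
    return 0.5 ** (max(0.0, days_since) / half_life_days)


def linear_ratio(value: float, ceiling: float) -> float:
    """Bounded signal: proportional up to `ceiling`, then flat at 1.0."""
    return max(0.0, min(1.0, value / ceiling))
```

With a two-year saturation (730 days), `log_curve` moves far more between 30 and 180 days than between three and five years, where it is already pinned at 1.0. That is the diminishing-returns behavior described for account age.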

The normalized signal values are multiplied by their category weights and summed. When a contributor is scored without repository context (no specific repo provided), the repo-dependent weights — code provenance, commit proportion, recency, and author association — are excluded and the remaining weights are rescaled so they still sum to 1.0.
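A minimal sketch of that rescaling step, assuming one weight per signal (the post describes category weights, so treat the granularity as a simplification). The signal names and weight values below are hypothetical.

```python
# Signals that only make sense when a specific repository is provided.
REPO_DEPENDENT = {
    "code_provenance",
    "commit_proportion",
    "commit_recency",
    "author_association",
}


def composite_score(
    signals: dict[str, float],
    weights: dict[str, float],
    has_repo_context: bool,
) -> float:
    """Weighted sum of normalized signals, with weights rescaled to sum to 1.0."""
    active = {
        name: w
        for name, w in weights.items()
        if has_repo_context or name not in REPO_DEPENDENT
    }
    total = sum(active.values())
    if total == 0:
        return 0.0
    return sum(signals[name] * (w / total) for name, w in active.items())


# Hypothetical weights and signal values, for illustration only.
example_weights = {
    "code_provenance": 0.2,
    "commit_recency": 0.2,
    "account_age": 0.3,
    "consistency": 0.3,
}
example_signals = {
    "code_provenance": 0.0,
    "commit_recency": 0.0,
    "account_age": 1.0,
    "consistency": 0.5,
}
```

In this example the same contributor scores 0.45 with repository context and 0.75 without it, because the two zero-valued repo-dependent signals drop out and the remaining weights are scaled up. Rescaling keeps the two modes on the same 0-to-1 range, but scores from the two modes are not directly comparable.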

The final number maps to a letter grade. A 0.73 is a B. A 0.57 is a C. Below 0.37 is an F.

What this looks like on the xz-utils timeline

I want to be careful here. DevTrace did not exist during the xz-utils incident, and I am not going to fabricate numbers. But I can walk through which signals would have surfaced concerns based on what we know publicly about the “Jia Tan” account.

Identity signals that would have flagged:

The Jia Tan GitHub account was created specifically for the xz-utils campaign. At the time commit access was granted, the account was roughly two years old — which actually clears the account age signal. This is exactly why no single signal is dispositive. But the profile was sparse: no company, no website, no other visible community presence. The profile completeness signal would have scored low.

Engagement signals that would have flagged:

The account’s PR activity was concentrated in a single project. The commit proportion in xz-utils would have been high relative to the small contributor base, which is not inherently bad — but in combination with limited activity elsewhere, it paints a picture of a single-purpose account.

Community signals that would have flagged:

The account had minimal followers and no meaningful following network. No public repos of their own beyond the target project. The follower ratio and public repo count signals would both have been low.

Behavioral signals that would have flagged:

This is where the model is most relevant. The Jia Tan account showed a pattern that DevTrace’s behavioral category is specifically designed to detect:

Low repo diversity — activity concentrated in one or two repositories
No review participation outside the target project — a contributor embedded in a real community typically reviews code across multiple projects
Low consistency — the activity pattern was purpose-driven, not the organic rhythm of someone who codes across different projects over years

What would NOT have flagged:

Account age would have looked reasonable (two years is roughly where the logarithmic curve saturates). The PR acceptance rate would have been high, because the patches were legitimate and useful. This is the hard part: social engineering works precisely because the work is real. No scoring model catches everything.

The aggregate picture:

No single signal would have raised an alarm. But the composite score — a single-project account with a sparse profile, no community footprint, no code review activity outside the target repo, and low diversity — would have landed well below the scores of typical long-term maintainers. Not an automatic rejection, but a strong signal that this contributor deserved more scrutiny before being granted commit access.

That is the point. DevTrace does not make access control decisions. It provides data so that maintainers can make better-informed ones.

AI sensing: the newer layer

On top of the 23 core signals, DevTrace includes an AI sensing layer that looks for signs of AI-generated or AI-assisted contributions. This is a separate concern from trust scoring, but it is increasingly relevant.

The first tier is metadata analysis: checking for co-authored-by trailers from known AI tools, bot-associated PRs, and known tool signatures. The second tier, available on higher plans, computes behavioral heuristics — velocity anomaly ratios, active hour spread, and a burst-vanish score that flags contributors who appear in a sudden burst of activity and then disappear.
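As an illustration of the first tier, the sketch below scans a commit message for `Co-authored-by` trailers and flags those whose name or email contains a known marker. The marker list is a hypothetical, non-exhaustive placeholder and the function name is made up for this example; it is not DevTrace's detection list.

```python
import re

# Hypothetical, non-exhaustive markers; a real list would be curated and maintained.
AI_COAUTHOR_MARKERS = ("copilot", "claude", "cursor")

# Matches lines such as "Co-authored-by: Name <email@example.com>".
TRAILER_RE = re.compile(
    r"^co-authored-by:[ \t]*(?P<name>[^<\n]*)<(?P<email>[^>\n]*)>[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def ai_coauthors(commit_message: str) -> list[str]:
    """Return the names of co-author trailers that match a known AI marker."""
    found = []
    for match in TRAILER_RE.finditer(commit_message):
        identity = f"{match.group('name')} {match.group('email')}".lower()
        if any(marker in identity for marker in AI_COAUTHOR_MARKERS):
            found.append(match.group("name").strip())
    return found


example_message = (
    "Fix parser edge case\n"
    "\n"
    "Co-authored-by: Copilot <copilot@example.com>\n"
    "Co-authored-by: Jane Doe <jane@example.com>\n"
)
```

A match is evidence of declared tool use, nothing more. Trailers are voluntary, so their absence says nothing, which is one reason a behavioral tier exists alongside the metadata tier.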

These are not about penalizing AI-assisted development. Copilot-assisted PRs are normal. The concern is synthetic contributor accounts — manufactured identities with fabricated histories. AI lowers the cost of creating these, and the behavioral heuristics are designed to distinguish a real developer who uses AI tools from an account that only exists because of them.

NIST SSDF mapping

For teams that need to demonstrate compliance with NIST SP 800-218 (the Secure Software Development Framework), DevTrace maps its signals to eight SSDF practices across three practice groups. These include code access controls, release integrity verification, contributor identity provenance, code review participation, activity consistency, anomaly detection, and contributor maturity assessment.

The mapping is intentionally conservative: DevTrace says only that its signals are “relevant to” a practice, never that they “satisfy” it. Compliance is a judgment call for your organization, not something a tool can declare on your behalf. But having the data organized against the framework saves time when the auditors come around.

Full methodology is documented at devtrace.thingz.io/compliance.

What the score is not

A trust score is not a background check. It does not tell you whether someone is trustworthy as a person. It tells you whether their observable behavior on a platform matches the patterns of established, embedded contributors — or whether it diverges in ways that warrant closer attention.

A low score does not mean “reject this contributor.” A high score does not mean “trust without review.” The score is one input to a decision, not the decision itself.

We got into this because we watched the xz-utils story unfold and realized that the open source ecosystem had robust tooling for scanning code and packages, but almost nothing for evaluating the people who write them. DevTrace is our attempt to close that gap. It is an imperfect attempt — 23 signals cannot fully capture the complexity of human behavior — but it is a starting point.

Try it

DevTrace is free to use. Score any public GitHub contributor at devtrace.thingz.io. If you want contributor scoring integrated into your PR workflow, there is a GitHub Action that runs on every pull request.

If you are also thinking about the project-level side of this, DevPulse tracks the health metrics (bus factor, contributor retention, review ratios) that often precede the kind of maintainer burnout that made xz-utils vulnerable in the first place. The two tools are complementary: DevPulse evaluates the project, DevTrace evaluates the people.

How `.keys` endpoints work