Neynar Scores: Tackling Onchain Reputation

Sep 30, 2025

by

Neynar

Neynar User Score is quietly one of our most used products. It's a value between 0 and 1.0 that we assign to every account on Farcaster, and it has become the primary way developers leverage reputation analysis in their mini apps, agents, and other products. If you're on Farcaster, you've likely seen people on the feed asking our @neynar agent what their score is, or had your own score evaluated by an app or product on the network.

Agents like @bracky and our own @neynar bot use the score to decide who to reply to, since it's infeasible to respond to every single user account. Apps with built-in incentives like Eggs and QR use it to decide who to allocate tokens to, ensuring distribution is fair rather than overwhelmingly captured by sybils and farmers. Clients like the Base app also use our score to filter notifications, replies, and engagement metrics.
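To make that concrete, here's a minimal sketch of threshold-based gating on the score. Everything in it is hypothetical for illustration: the 0.7 cutoff, the fid-to-score map, and the helper names are stand-ins, not values or APIs that Neynar provides or recommends.

```python
# A minimal sketch of gating rewards on the Neynar score.
# The 0.7 cutoff and the candidates map are hypothetical illustrations.

MIN_SCORE_FOR_REWARDS = 0.7  # hypothetical cutoff an app might choose

def eligible_for_rewards(score: float) -> bool:
    """Gate token allocation on a user's 0.0-1.0 Neynar score."""
    return score >= MIN_SCORE_FOR_REWARDS

# Example: filter a batch of (fid, score) pairs before distributing rewards.
candidates = {101: 0.95, 102: 0.42, 103: 0.88}  # hypothetical fid -> score map
recipients = [fid for fid, s in candidates.items() if eligible_for_rewards(s)]
```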

Today, we want to share a bit more about how the Neynar Score works under the hood. While the specific algorithm that determines the score is proprietary (if we revealed this, the score could get gamed and thus defeat its own purpose), we do want to highlight some of the interesting engineering and machine learning work that enables the score in the first place. 

Before diving in, it’s worth noting that the score’s purpose is not proof-of-humanity - meaning, the score makes no assessment on the likelihood of a user being a human or a bot/agent. There are agents with high scores (like Bankr, Gina, Bracky, and Neynar) and humans with low scores. The purpose of the score is primarily to assess spam - in other words, the quality of a user in terms of the value they add to the network. 

Quality and spam are nuanced topics that don't have a one-size-fits-all solution. Generally we might think a user is high quality if other users find their content useful or entertaining, but this also has high variance across the network. One user might be posting quality content about their pet fish, but much of the rest of the network may not be interested - that doesn't necessarily mean this content is spam. Another user might be reposting memes from X with the explicit intention of engagement farming - some users might find this spam, but others might find the memes to be high quality.

Spammy users are also constantly adapting and iterating on the latest meta. Because the cost to spam is so low and the reward, especially in crypto, can be very high, there's a large incentive for spammy users to game algorithms so they can build up their reputation and become eligible for rewards, airdrops, and more.

Because quality and spam are dynamic and can't be measured linearly, we leverage machine learning to identify deeper relationships in the data and build a reliable score across all the vectors that might indicate whether a user is high quality or spammy. Here's more on how we do it:

  1. Training the model

The first step is to establish the initial dataset for the model to train on. This begins with a proprietary dataset in which users are assigned baseline scores based on existing public signals, protocol data, and onchain data. The dataset does *not* include any features tied to a user's demographics or opinions about their actual cast content. This dataset serves as the baseline for the model to train on and is also what we use to evaluate its accuracy.

We then use a machine learning algorithm trained on the aforementioned data to assign each user a score between 0 and 1.0, with new users starting with a default score of 0.5. 
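As an illustration only (the actual algorithm and features are proprietary), a generic regression setup for this kind of problem might look like the sketch below, where `features` and `labels` are random stand-ins for per-user feature vectors and baseline scores.

```python
# Illustrative only: a generic regression setup for predicting a 0-1
# reputation score from behavioral features. The real features, labels,
# and algorithm are proprietary and not shown here.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
features = rng.random((1_000, 8))   # stand-in for per-user feature vectors
labels = rng.random(1_000)          # stand-in for baseline 0-1 scores

model = GradientBoostingRegressor()
model.fit(features, labels)

# Predictions are clipped to keep scores in the documented 0-1 range.
scores = np.clip(model.predict(features), 0.0, 1.0)
```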

  2. Evaluating the model

The next step is evaluating the model for accuracy. We use Root Mean Squared Error (RMSE) to check how close the model’s output is to the initial scores we identified in the human-labeled training dataset, and R-squared lets us know how much of the variation in scores can be attributed to the selected features. We also use cross-validation on different subsets of the data to ensure the model is robust and generalizable instead of just “memorizing” what to assign scores to. 
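Continuing the stand-in sketch from the training step, the evaluation might look roughly like this, using scikit-learn's standard RMSE, R-squared, and cross-validation utilities; the data and model are the same illustrative placeholders, not our production setup.

```python
# Illustrative evaluation of the stand-in model from the previous sketch.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score

predictions = model.predict(features)

# RMSE: how far predictions fall from the baseline scores, on average.
rmse = mean_squared_error(labels, predictions) ** 0.5
# R-squared: how much of the score variation the features explain.
r2 = r2_score(labels, predictions)

# 5-fold cross-validation guards against the model memorizing the data.
cv_rmse = -cross_val_score(
    GradientBoostingRegressor(), features, labels,
    cv=5, scoring="neg_root_mean_squared_error",
)
print(f"RMSE={rmse:.3f}  R2={r2:.3f}  CV RMSE={cv_rmse.mean():.3f}")
```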

Scores are recalculated every week for most users, and every hour for new users (<30 days old) to allow rapid iteration. Because many apps gate behavior based on Neynar scores, it's particularly critical for quality new users to ramp up quickly; otherwise they risk churning because they can't use different apps on the network.
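A simplified sketch of that cadence is below; `recalc_interval` and the 30-day window constant are illustrative names, and the actual scheduling infrastructure is not shown.

```python
# Illustrative sketch of the recalculation cadence described above.
from datetime import datetime, timezone

NEW_USER_WINDOW_DAYS = 30

def recalc_interval(created_at: datetime) -> str:
    """Return how often a user's score should be recalculated.

    New accounts (< 30 days old) are rescored hourly so quality users
    ramp up quickly; everyone else is rescored weekly. Expects a
    timezone-aware UTC creation timestamp.
    """
    age_days = (datetime.now(timezone.utc) - created_at).days
    return "hourly" if age_days < NEW_USER_WINDOW_DAYS else "weekly"
```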

  3. Updating the model

Farcaster is a dynamic network where user behaviors are constantly shifting and adapting. The meta has changed a lot over the last year and a half, from tipping to Frames to Clankers to mini apps to in-app trading, to everything in between and more.

Constantly changing behaviors also mean the model needs to stay up to date. A model that was effective in January 2025 could be outdated by October 2025. So while we update scores weekly, we're also always monitoring for new reputation signals and features we can introduce to the machine learning model itself to keep it accurate as the network evolves.

Understanding user quality and reputation is an infinite game. We'll likely never have a "final" model, but we can always focus on building systems that are able to quickly adapt and grow to meet the needs of the network. See one such model update here.

As always, if you find any inconsistencies or have feedback on the Neynar score, don't hesitate to reach out to us via Farcaster or our developer Slack - we're always looking to improve the Score's accuracy and value it can offer to the Farcaster community.
