  • January 23, 2026
  • Arth Data Solutions

How Each Bureau Scores You Differently (And Why It Matters More Than You Admit)

The tension usually shows up in a room that wasn’t meant to talk about scorecards.

It’s a collections strategy review.

On the screen: a slide titled “Roll Forward Behaviour vs Score Bands”.

The analytics lead points to a chart:

·         “For this unsecured retail book, we’re seeing higher roll rates in the 750–780 band than we expected.”

Someone asks the obvious question:

“Which score is this?”

The answer:

“CIBIL, for accounts onboarded in the last 18 months.”

A business head frowns:

“But our origination engine also uses Experian in some journeys. And the co-lending pool is on Equifax. Are we seeing the same issue there?”

Silence for a second.

Then the risk team uses the line everyone falls back on:

“Broadly yes. Each bureau has a different scale, but they rank risk in a similar way. If you’re good in one, you’ll be good in the others. Differences are mostly noise.”

The room accepts it, because it sounds reasonable and nobody has time to unpack the details.

Three months later:

·         A co-lending partner pushes back on losses in a segment where their bureau of choice was less optimistic.

·         A fintech partner claims their customers “score fine on Experian, you’re over-rejecting based on CIBIL”.

·         An RBI team asks, in passing, whether the bank has observed any systematic differences in scores across CICs for certain borrower types.

Inside the institution, the working assumption is still:

“Scores differ a bit, but rank order is the same. As long as we set the right cut-offs, it all averages out.”

That belief is more fragile than it looks.

 

The belief: “Different scales, same story – scores are basically equivalent”

When you strip away the technical language, the internal belief in many lenders sounds like this:

“Each bureau has its own scorecard, but they all rank risk in roughly the same way.

If a customer is good, they’ll be good everywhere.

Once we calibrate cut-offs for each CIC, we don’t need to think about it too much.”

You can see why this belief survives:

·         Each CIC sells “generic risk scores” with familiar ranges.

·         Internal decks show neat mappings:

o   “CIBIL 750 ≈ Experian X ≈ Equifax Y ≈ CRIF Z”.

·         Validation reports talk about “good rank-ordering” for all four.

·         Multi-bureau policy slides compress it into two bullets:

o   “Use bureau A as primary.”

o   “Use bureau B/C as secondary with equivalent cut-offs.”

Under time pressure, this is comforting.

It lets everyone behave as if:

·         Scores are interchangeable labels on the same risk.

·         Differences are statistical noise, not structural.

·         Most problems can be solved by moving a cut-off by 10–20 points.

What actually happens in live books is more awkward.

Each bureau’s score:

·         Sees slightly different data about the same borrower.

·         Uses different modelling choices and training histories.

·         Interacts differently with your product mix and sourcing channels.

The result isn’t chaos.

But it is enough to make “scores are basically equivalent” a risky simplification.

Early on, the gap is invisible because:

·         Validation decks are written to show pass/fail, not nuance.

·         Dashboards bucket everything into “<700 / 700–750 / 750+”.

·         Partner and regulator questions are infrequent.

The cost shows up later, in places that don’t have “score” in the title:

·         Portfolios that behave differently from what your score bands implied.

·         Co-lending disputes about “who should have known better”.

·         Questions on whether your multi-bureau usage is conscious or accidental.

 

What actually changes when each bureau scores the same borrower

If you sit with raw files instead of slides, three practical differences start to matter.

1. The input tapes are not identical

On paper, all four CICs are part of the same ecosystem.

In practice, their view of the same borrower is not always the same.

In one internal back-testing exercise we watched, a bank did the following:

·         Took a sample of 100,000 existing customers.

·         Pulled reports from two CICs for each: the primary and a secondary.

·         Matched them on PAN + name + date of birth as closely as possible (a minimal sketch of this matching step follows this list).
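A minimal sketch of what that matching and comparison step can look like, assuming two flat extracts (one per CIC) with hypothetical column names such as pan, dob, score and open_tradelines; the exact-match merge is a simplification of the fuzzier name matching a real exercise would need.

```python
import pandas as pd

# Hypothetical extracts: one row per customer per CIC pull.
# Column names (pan, dob, score, open_tradelines) are illustrative assumptions.
bureau_a = pd.read_csv("bureau_a_extract.csv", dtype={"pan": str})
bureau_b = pd.read_csv("bureau_b_extract.csv", dtype={"pan": str})

# Conservative match on PAN + date of birth; real matching also needs
# fuzzier name logic, deliberately left out of this sketch.
matched = bureau_a.merge(
    bureau_b,
    on=["pan", "dob"],
    how="outer",
    suffixes=("_a", "_b"),
    indicator=True,
)

# Customers visible to one bureau but not the other (hit vs thin/no hit).
hit_mismatch = matched[matched["_merge"] != "both"]

# For customers both bureaus see: gaps in open tradelines and in score.
both = matched[matched["_merge"] == "both"].copy()
both["tradeline_gap"] = (both["open_tradelines_a"] - both["open_tradelines_b"]).abs()
both["score_gap"] = (both["score_a"] - both["score_b"]).abs()

print(f"Matched to both bureaus: {len(both):,}")
print(f"Hit/no-hit or unmatched: {len(hit_mismatch):,}")
print(both[["tradeline_gap", "score_gap"]].describe())
```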

They found:

·         A meaningful minority where one bureau had a hit and the other returned a thin file or no hit at all.

·         Cases where the number of open tradelines differed by one or two accounts.

·         Small timing differences in DPD streaks – a 30+ DPD that appeared in one CIC a cycle earlier than in the other.

None of this was dramatic.

But when they overlaid the bureau scores from both CICs:

·         Some customers sat in “safe” bands in one bureau and borderline bands in the other.

·         The differences were not random: certain channels and geographies showed bigger gaps.

From the model’s point of view, this is not mysterious:

·         If the input view of a borrower is different, the score will differ.

·         Even with good rank-ordering, classification around your cut-offs can shift.

If your internal line is “they all see the same data”, you’ll struggle to explain these patterns when someone eventually asks.

2. The modelling history is not shared

Every bureau has its own:

·         Data partners and anchors.

·         Time horizon of performance data.

·         Portfolio mix used to train and refresh generic scores.

From the outside, these models are sold under similar labels.

From the inside, they are products of different histories.

We’ve seen one lender’s validation deck where the analytics team had quietly written, in a comment box never shown to the Board:

·         “Bureau X score shows stronger discrimination in our salaried urban portfolios.”

·         “Bureau Y score performs better in MFI / JLG segment; likely due to deeper past coverage.”

·         “For new-to-credit digital journeys, both scores are usable but exhibit different stability over time.”

The official slide in the Credit Policy Committee meeting was softer:

·         “Both bureau scores demonstrate acceptable rank-ordering; cut-offs calibrated accordingly.”

Technically true.

Not very helpful when, a year later, someone asks:

“Why did this segment behave worse when we switched primary bureau in the new origination flow?”

3. Your own design choices interact with each bureau differently

Scores don’t live in isolation.

They sit inside:

·         Your origination engine logic.

·         Your policy override habits.

·         Your pricing and limit-setting rules.

·         Your EWS and collection treatment strategies.

In one NBFC, we watched a Product Approval Committee discussion on a new unsecured loan journey:

·         The digital team proposed using Bureau A’s score in the front-end pre-approval.

·         Risk preferred Bureau B for final underwriting due to “more stable behaviour in our vintage curves”.

·         Tech wanted to reuse an existing integration with Bureau C for operational simplicity.

The compromise was messy:

·         Use Bureau A for eligibility,

·         Bureau B for final cut-off and pricing,

·         Bureau C only for specific exception cases.

Nobody in the room paused to ask:

“What does this cocktail do to our understanding of how each score behaves over time?”

A year later, in a vintage performance review, the analytics team struggled to cleanly explain which score had actually driven decisions in different cohorts.

From the portfolio’s point of view, “scores” had become a blur.

From the regulator’s and partner’s point of view, the institution looked less in control of its own tools than its slide decks suggested.

 

Why this remains invisible in most dashboards

If each bureau’s score behaves differently in practice, why doesn’t it show up earlier?

Because most institutions don’t track the differences in a way that makes them visible.

Validation reports are written to close an item, not open a conversation

When a new score or bureau is adopted, a validation report is produced.

The report usually shows:

·         Gini / KS statistics

·         Bad rate by score band

·         Population stability indices

·         A few charts with smooth downward curves

The Model / Score Validation Committee minutes typically say:

“Validation completed. Score demonstrates adequate discriminatory power and stability. Approved for continued use.”

What that report rarely does, in a way that reaches decision-makers, is:

·         Compare two bureaus’ scores on the same population side by side.

·         Show where classification around the cut-off differs.

·         Highlight segments where score behaviour diverges between CICs (a minimal sketch of this kind of side-by-side view follows this list).
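For illustration only, here is a minimal sketch of that kind of side-by-side view, assuming a matched population with hypothetical columns score_a, score_b and a 0/1 twelve-month bad flag bad_12m; the cut-off values are placeholders, not recommendations.

```python
import pandas as pd

def ks_statistic(score: pd.Series, bad_flag: pd.Series) -> float:
    """Two-sample KS between the score distributions of bads and goods."""
    order = score.sort_values().index
    bads = bad_flag.loc[order].cumsum() / bad_flag.sum()
    goods = (1 - bad_flag.loc[order]).cumsum() / (1 - bad_flag).sum()
    return float((bads - goods).abs().max())

# One row per matched customer, with both bureau scores and the observed outcome.
df = pd.read_csv("matched_population.csv")
CUTOFF_A, CUTOFF_B = 730, 720  # illustrative cut-offs only

# Rank-ordering of each bureau's score on the *same* population.
for col in ("score_a", "score_b"):
    print(col, "KS =", round(ks_statistic(df[col], df["bad_12m"]), 3))

# Where do the two scores classify the same customer differently at cut-off?
df["pass_a"] = df["score_a"] >= CUTOFF_A
df["pass_b"] = df["score_b"] >= CUTOFF_B
print(pd.crosstab(df["pass_a"], df["pass_b"], margins=True))
```

The off-diagonal cells of that last table are the point: customers one bureau would approve at cut-off and the other would not.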

Those pieces may sit in an appendix.

They rarely become part of the institution’s working understanding.

Dashboards bucket scores into broad bands

In regular portfolio and risk MIS, you’ll see:

·         Exposure by score band: “<700”, “700–750”, “750–800”, “800+”.

·         Sometimes, a split by bureau: “CIBIL score band vs GNPA”.

This compresses a lot of behaviour into a few bins.

Missing:

·         Accounts that are 750+ in one CIC and 710–730 in another.

·         Score drift over time for the same borrowers across CICs (see the sketch after this list).

·         Interaction with source channel (DSA vs digital vs branch) and product.
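One of those missing views, drift in the gap between two CICs’ scores for the same borrowers, is cheap to produce once monthly snapshots exist. A minimal sketch, assuming hypothetical columns as_of_month, channel, score_cic_a and score_cic_b:

```python
import pandas as pd

# Hypothetical monthly snapshots of both bureau scores for the same borrowers.
snapshots = pd.read_csv("monthly_score_snapshots.csv", parse_dates=["as_of_month"])

# Gap between the two CIC scores for each borrower in each month.
snapshots["score_gap"] = snapshots["score_cic_a"] - snapshots["score_cic_b"]

# Drift view: how the average gap and its spread move over time, split by channel.
drift = (
    snapshots.groupby(["as_of_month", "channel"])["score_gap"]
    .agg(["mean", "std", "count"])
    .round(1)
)
print(drift.tail(12))
```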

At that resolution, the natural conclusion is:

“They’re all behaving similarly enough. Let’s not complicate this.”

Until a partner or RBI shows you a sharper view and asks why you haven’t looked at it that way.

No one is mandated to own “cross-bureau score behaviour”

Responsibility is split:

·         The risk modelling team owns validations.

·         The credit policy team owns cut-offs and usage.

·         The tech and operations teams own integrations.

·         Procurement owns commercials.

There is rarely a named task like:

“Once a year, tell us how each bureau’s score sees our portfolio compared to the others, and what that implies for decisions.”

So nobody does it in a structured way.

The belief that “scores are basically equivalent after calibration” remains unchallenged, not because it was tested and passed, but because it was never tested properly.

 

How more experienced teams quietly deal with score differences

The institutions that look less surprised in these conversations don’t have secret bureau models.

They just accept that:

·         Scores are not identical lenses,

·         Their own choices amplify differences,

·         And they need a deliberate view of what that means.

A few behaviours stand out.

They build a simple cross-bureau view for at least one important book

In one bank, the CRO asked the analytics team for something very specific:

·         “Take one meaningful retail book.

·         For a decent sample, pull scores from two CICs.

·         Show me, on one page, how the bands line up and where we’re seeing materially different classification.”

The resulting internal note (never shown outside risk) contained:

·         A scatterplot of Bureau A score vs Bureau B score for the same accounts.

·         A table showing, for the band Bureau A called “safe”, how many accounts Bureau B placed in borderline territory.

·         A few comments on which channels and segments drove most of the disagreement (a rough sketch of this kind of one-pager follows this list).
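A rough sketch of how the tabular part of such a one-pager might be assembled, assuming a matched sample with hypothetical columns score_a, score_b and channel; the band edges below are illustrative, not anyone’s actual policy bands.

```python
import pandas as pd

sample = pd.read_csv("matched_sample.csv")  # hypothetical matched extract

# Illustrative band edges; a real note would use each CIC's own bands.
edges = [300, 700, 750, 780, 901]
labels = ["<700", "700-749", "750-779", "780+"]
sample["band_a"] = pd.cut(sample["score_a"], bins=edges, labels=labels, right=False)
sample["band_b"] = pd.cut(sample["score_b"], bins=edges, labels=labels, right=False)

# "How do the bands line up?" - one table, row-normalised.
print(pd.crosstab(sample["band_a"], sample["band_b"], normalize="index").round(2))

# Accounts Bureau A calls "safe" that Bureau B places in borderline territory,
# and which sourcing channels drive most of that disagreement.
safe_in_a = sample[sample["band_a"].isin(["750-779", "780+"])].copy()
safe_in_a["borderline_b"] = safe_in_a["band_b"].isin(["<700", "700-749"])
print(
    safe_in_a.groupby("channel")["borderline_b"]
    .mean()
    .sort_values(ascending=False)
)
```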

The conclusion wasn’t dramatic:

·         “Rank-ordering is acceptable.

·         There is a non-trivial pocket of customers who are treated differently depending on which bureau we lean on.”

The important thing was not the numbers.

It was that the bank could now answer, with some honesty, when a partner or supervisor asked:

“Have you observed any systematic differences in score behaviour across bureaus?”

They are explicit about which score is in control for a decision

Instead of letting architecture decide by accident, some teams document things clearly:

·         For each major segment, they state:

o   Primary bureau score used for onboarding.

o   Any secondary bureau being used, and how conflicts are resolved.

o   Whether score overrides are allowed, and on what basis.

This shows up in:

·         Credit policy documents.

·         Product notes.

·         Sometimes, the sanction note template, where a field indicates which CIC’s score was relied on (a minimal sketch of such a record follows this list).
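A minimal sketch of the kind of record that makes this auditable, with hypothetical field names; the only point is that the controlling CIC and score are captured per decision, however your sanction note template or decision engine actually stores them.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class CreditDecisionRecord:
    """Illustrative per-decision record; field names are assumptions."""
    application_id: str
    decision_date: date
    segment: str
    controlling_cic: str                  # e.g. "CIBIL", "Experian", "Equifax", "CRIF"
    controlling_score: int
    secondary_cic: Optional[str]          # secondary bureau consulted, if any
    secondary_score: Optional[int]
    conflict_rule_applied: Optional[str]  # how a disagreement between scores was resolved
    override_reason: Optional[str]        # basis for any score override

# Hypothetical usage inside an origination flow.
record = CreditDecisionRecord(
    application_id="APP-000123",
    decision_date=date(2026, 1, 23),
    segment="unsecured_retail_digital",
    controlling_cic="CIBIL",
    controlling_score=742,
    secondary_cic="Experian",
    secondary_score=768,
    conflict_rule_applied="primary_cutoff_prevails",
    override_reason=None,
)
```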

It doesn’t make the models perfect.

It makes the institution less likely to discover, after a problem, that nobody quite knew which score was driving decisions in a contested case.

They treat score differences as a diagnostic, not an embarrassment

In one NBFC, the analytics team produced a small, recurring “score mismatch” report:

·         For a sample of accounts, they compared scores from two CICs.

·         They flagged accounts where one score was above the internal cut-off and the other significantly below.

·         They looked at how those accounts actually performed (see the sketch after this list).
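A minimal sketch of what such a recurring mismatch report might compute, assuming hypothetical columns score_primary, score_secondary and ever_30dpd_12m; the cut-off and the 40-point “significantly below” margin are assumptions for the sketch, not the NBFC’s actual rule.

```python
import pandas as pd

accounts = pd.read_csv("scored_accounts.csv")  # hypothetical matched sample
CUTOFF = 730   # illustrative internal cut-off
MARGIN = 40    # "significantly below" threshold - an assumption for this sketch

# Accounts the primary score passes but the secondary score places well below cut-off.
accounts["mismatch_flag"] = (accounts["score_primary"] >= CUTOFF) & (
    accounts["score_secondary"] < CUTOFF - MARGIN
)

# How did the mismatch pocket actually perform versus the rest of the approved book?
approved = accounts[accounts["score_primary"] >= CUTOFF]
performance = approved.groupby("mismatch_flag")["ever_30dpd_12m"].agg(["count", "mean"])
performance.columns = ["accounts", "observed_30dpd_rate"]
print(performance)
```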

Sometimes this reinforced confidence:

·         “Even when CIC scores differ, our internal behaviour score caught the risk.”

Sometimes it raised questions:

·         “This pocket of customers looks fine in our primary bureau but worse in secondary; we should check our reporting and exposure to that pocket.”

The report was not used to criticise any CIC.

It was used to refine their own understanding — and occasionally, their reporting quality.

 

A quieter way to think about “how each bureau scores you”

It’s tempting to keep the belief that:

“Each bureau’s score is just a different scale on the same risk.

Once we set cut-offs per CIC, we can treat them as equivalent.”

If you stay with that, your processes will continue to:

·         Bundle scores into broad bands.

·         Use whatever score the integration wiring makes easiest.

·         Defend differences as “noise” when challenged.

You will still pass most validations.

You will still sound convincing in short meetings.

But you will also be living with blind spots:

·         Portfolios where another bureau’s view of the same customers would have told you something you didn’t want to see.

·         Partnerships where the other side is quietly using a different mirror and trusting it more.

·         Conversations with supervisors where they assume you’ve thought about cross-bureau behaviour more deeply than you actually have.

If you accept that:

·         Each CIC’s score is a different lens shaped by its own data and history.

·         Your own integrations, policies, and overrides interact with those lenses in specific ways.

·         It is your job, not the bureau’s, to understand what that means for your book.

then the question changes.

It stops being:

“Have we calibrated the cut-offs for each bureau correctly?”

and turns into something a little less comfortable:

“If we took an honest look at how each bureau scores the same customers we already have,

would we still be as confident that ‘it all averages out’ —

or would we see patterns we’ve been too busy to admit are there?”