Virtue and value in mobile operator big data

Marissa Dean

This guest blog post was jointly authored by Marissa Dean (FiDA) and Jake Kendall. Jake Kendall is the Director of the Digital Financial Services Lab (DFS Lab), an early-stage accelerator delivering innovative fintech solutions to the developing world.

Mobile network operator (MNO) data has been the first bloom of big data’s potential on the African continent. Widespread adoption and use of MNO services underpin the belief that patterns in data can reveal ability and willingness to pay for financial services, as well as demographic and other segmentation signals.

With this in mind, the Mastercard Foundation Partnership for Finance in Digital Africa (FiDA) interviewed 30 leading financial services organizations to understand how they think about big data and analytics and how they use (and don’t use) it. Of the 30 organizations that use big data, more than half agreed to share details about the types of big data they use in their business. This blog post focuses on how these organizations use and perceive MNO data, alongside lessons learned from the DFS Lab’s three cohorts of fintech startups, many of which also use this kind of mobile data.

The key learnings from the interviews with the organizations and startups are:

  • The risk of fraud likely outweighs credit risk in unsecured lending;
  • Mobile behavior data has been heavily hyped but has significant limitations (discussed in detail below);
  • Mobile operators don’t share behavior data freely, and for good reasons;
  • Credit reference bureaus are adapting to data from digital lenders, but with the unintended consequence of blacklisting many people for trivial default amounts.

Fraudsters, not farmers, are the real risk when it comes to unsecured lending.

Can data points like the frequency and amount of mobile voice or data top-ups actually demonstrate an applicant’s ability to pay? Arguably, providers that use mobile behavior data this way are not really combing through hundreds of micropayments to assess whether the noise indicates that an individual can pay back 100 Kenyan Shillings. Rather, they are performing Know Your Customer (KYC) analytics to suss out whether the applicant acts like a real person or like a fake account created by a fraudster. Indeed, premeditated fraud (especially “at scale fraud” conducted by criminal syndicates) may be a greater threat to the business model of digital lending than default by applicants who intended to pay but ran into financial difficulties after taking the loan, particularly for small loan amounts. As discussed in FiDA’s Focus Note, one digital lender relying on MNO KYC data encountered a fraud ring that took out more than 100 loans using SIM cards that were all registered under the name of the same popular fictional character. Another research participant noted that as more people become aware that mobile money, data, and voice behaviors are factored into credit decisions, the data becomes distorted as people figure out ways to game the system.

As algorithms based on behavioral data become more prevalent, providers will find themselves in a constant game of cat and mouse with fraudsters who exploit the all-digital nature of these products to scale successful fraud schemes to high volume (the flip side of scale, which is normally thought of as a way to take successful products to high volume). There is money to be made, so fraud likely can’t be eliminated entirely.
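To make the KYC angle concrete, here is a minimal Python sketch of one simple check that could have surfaced a fraud ring like the one described above: counting how many distinct SIM cards are registered under the same name. The record fields (`registered_name`, `sim_id`), the helper names, and the threshold of 3 are hypothetical assumptions for illustration, not any lender’s actual rules. A real system would combine many such signals, and common real names would need more careful handling.

```python
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(name.lower().split())


def flag_shared_registration_names(applications, threshold=3):
    """Return {name: number of distinct SIMs} for names registered on at least
    `threshold` distinct SIM cards.

    `applications` is assumed (hypothetically) to be an iterable of dicts with
    `registered_name` and `sim_id` keys.
    """
    sims_by_name = defaultdict(set)
    for app in applications:
        sims_by_name[normalize_name(app["registered_name"])].add(app["sim_id"])
    return {
        name: len(sims)
        for name, sims in sims_by_name.items()
        if len(sims) >= threshold
    }


# Usage with made-up records: one name reused across several SIMs.
applications = [
    {"registered_name": "Captain Example", "sim_id": "SIM-001"},
    {"registered_name": "captain  example", "sim_id": "SIM-002"},
    {"registered_name": "Captain Example", "sim_id": "SIM-003"},
    {"registered_name": "Jane Wanjiru", "sim_id": "SIM-004"},
]
print(flag_shared_registration_names(applications))  # {'captain example': 3}
```

Grouping on a normalized name rather than the raw string means small spelling or spacing variations do not let a ring slip under the threshold; fraudsters who vary names more deliberately would require fuzzier matching.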

Despite the excitement, there are many limitations to mobile behavior data.

There are a number of reasons why mobile behavior data does not give a holistic picture of an individual’s spending or device behavior.

One factor is that many low-income customers carry multiple SIM cards to take advantage of different promotions and to reduce the cost of off-network calls. They also turn off mobile data, Wi-Fi, or GPS to save battery power; delete heavy apps to conserve limited storage; or use lite versions of apps, such as Facebook Lite, that capture less data. In addition, low-income people, especially women in rural areas, often share phones, further limiting the data’s usefulness. For example, although 93% of Kenyans had access to a mobile phone in 2016, only 78% owned one, implying that many were using phones owned by others.

Second, mobile behavior offers only a limited view of someone’s financial life. Most transactions in the informal sector are still conducted in cash, even in countries with successful mobile money schemes such as Kenya. Relying solely on mobile money data therefore means missing significant information about an individual’s income and expenditures. Daniel Goldfarb of Lendable described these challenges in an interview with FiDA in August 2017: “Just looking at M-PESA only shows you a small percentage of someone’s total transactions. So M-PESA only gives you a small band of information. This works for MNOs who are offering very small loans, but the second you try to give out larger, meaningful sized loans you can’t just rely on M-PESA data to understand someone’s cash flows.”

Third, changes in the environment, or in MNOs’ own systems or marketing schemes, will change the relationship between the variables lenders actually want to predict (e.g., income or free cash flow) and the mobile behavior signals that might be used to build such models (e.g., top-up behavior or numbers of contacts or calls). As a result, models need to be rebuilt continually.
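One common way to notice that a model’s inputs are drifting is to monitor them against the data the model was trained on. The Python sketch below computes a population stability index (PSI) for a single input, such as monthly top-up amounts. This is an illustrative assumption rather than a method named by the interviewees: PSI only detects a shift in an input’s distribution, not a change in its relationship to income or repayment, so it is best treated as a trigger for reviewing or rebuilding a model. The bin edges, the sample data, and the often-quoted 0.25 alert level are all assumptions.

```python
import bisect
import math


def bin_shares(values, edges, eps=1e-6):
    """Share of `values` in each bin defined by the sorted cut points `edges`
    (len(edges) + 1 bins, open-ended at both extremes). Empty bins get a tiny
    floor `eps` so the logarithm in the PSI stays finite."""
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]


def population_stability_index(expected, actual, edges):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    e = bin_shares(expected, edges)
    a = bin_shares(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical monthly top-up amounts (KES) at training time vs. today.
training = [10, 20, 30, 40] * 25
recent = [40, 50, 60, 70] * 25
edges = [25, 45, 65]

psi = population_stability_index(training, recent, edges)
if psi > 0.25:  # commonly used rule of thumb; an assumption here
    print(f"PSI={psi:.2f}: input distribution has shifted; review the model")
```

The `eps` floor keeps empty bins from producing an infinite PSI, at the cost of shares that no longer sum to exactly one; for a monitoring alert that trade-off is usually acceptable.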

Finally, models built on machine-learning algorithms can use data on mobile activity, airtime top-ups, and online behavior to estimate, for example, the statistical probability of credit default. However, even well-intentioned machine learning algorithms may discriminate against a particular group, such as women or an ethnic minority. These algorithms are trained to recognize and exploit statistical patterns in the data, so if the training data contains bias or reflects historical discrimination, the model will carry it into future predictions, potentially creating an unfair system that perpetuates the same bias and discrimination.

Machine learning in financial services is a nascent but growing field, and identifying algorithmic unfairness should be a priority in the development of future models. Recently, the Federal Reserve System in the US offered fintechs an evaluation framework to guide their early thinking on the use of alternative data and fair lending risk. It argues that alternative data without an obvious link to creditworthiness is more likely to pose fair lending risk, and that unfairness in algorithms can only be detected through very careful analysis of the data and outcomes.
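As one starting point for the kind of careful outcome analysis described above, a lender might compare approval rates across groups. The Python sketch below computes an adverse impact ratio: the protected group’s approval rate divided by a reference group’s. The 0.8 cutoff borrows the US “four-fifths” rule of thumb from employment discrimination screening; applying it to lending here, along with the group labels and the made-up data, is an assumption for illustration and is not taken from the Federal Reserve framework. A low ratio is a prompt for deeper investigation, not proof of unfairness, and a ratio near 1 does not rule unfairness out.

```python
from collections import Counter


def approval_rates(decisions):
    """Approval rate per group from (group, approved) pairs."""
    totals = Counter()
    approved = Counter()
    for group, was_approved in decisions:
        totals[group] += 1
        approved[group] += int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}


def adverse_impact_ratio(decisions, protected, reference):
    """Approval rate of `protected` divided by that of `reference`."""
    rates = approval_rates(decisions)
    return rates[protected] / rates[reference]


# Made-up decisions: 30% of women vs. 50% of men approved.
decisions = (
    [("women", True)] * 30 + [("women", False)] * 70
    + [("men", True)] * 50 + [("men", False)] * 50
)
ratio = adverse_impact_ratio(decisions, protected="women", reference="men")
if ratio < 0.8:  # four-fifths rule of thumb; assumed threshold
    print(f"Adverse impact ratio {ratio:.2f} is below 0.8; investigate")
```

Here the ratio is about 0.6, so the check fires. In practice this would be only the first step, followed by looking at which input features drive the gap and whether they have a plausible link to creditworthiness.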

MNOs haven’t been proactively sharing data with the ecosystem.

The Cambridge Analytica data fiasco, combined with Facebook’s complicity and complacency regarding how its software treats personal data, has brought concerns about privacy and personal data usage into living rooms and boardrooms alike. What was legal yesterday is considered unethical today. The spectre of being sued or shut down looms globally as consumers become more aware of what using free apps and services actually means.

To date, MNOs have been very careful about sharing data. In interviews, they said they were prohibited by regulation from sharing any kind of raw data: in their interpretation, they could not give individualized information to third parties without consent from both the individual and the regulator. They also believe that selling raw customer data would erode hard-earned trust with customers, especially in light of recent events.

Often the primary generators of data (i.e., customers) don’t know what kind of data they are generating or how it is being used, as the Facebook episode demonstrated. Moreover, the proliferation of relatively accessible digital credit providers (more than 20 in Kenya), coupled with the digital data trails created by smartphones, has given rise to consumer protection movements, such as the SMART Campaign, that advocate for practices that protect consumer data, including clear customer opt-in and opt-out options for the use, mining, and reuse of data by third-party services. Such campaigns also push regulators to establish and enforce legal frameworks that safeguard financial consumers’ welfare.

Credit reference bureaus are starting to receive information from alternative digital lenders.

Whether credit reference reporting by digital lenders is helping or hurting low-income individuals is debatable. Certainly, if managed properly, positive reporting in particular could help previously excluded people build credit records. However, anecdotal reports indicate that many digital lenders are not reporting positive data to credit reference bureaus (CRBs), although it is not always easy to determine which lenders are reporting and what they are reporting.

Additionally, there is a growing risk that many individuals will be blacklisted for very small defaults. Yet such defaults arguably should be treated differently and forgiven more quickly, because they usually stem from circumstances other than intentional default or fraud. A case in point: over 400,000 Kenyans are blacklisted with the CRBs for outstanding mobile loans of less than 200 Kenyan Shillings (about $2).

Where does the digital finance community go from here?

Despite obvious privacy concerns, the sector shouldn’t be over-regulated. Digital lending on the back of alternative data is still a relatively new experiment, and many are still hopeful that the business incentives will drive providers to target farmers and small businesses with attractively priced credit. A light-touch approach, such as personal data privacy standards and recourse for people who have been blacklisted for low value loans, would help to make the market more predictable and tell providers exactly what they need to do to comply.

For MNOs, it would be interesting to see more providers explore how they might employ differential privacy techniques, so that insights from mobile behavior patterns can be shared ethically and responsibly with the DFS community for product development or market research, and with the development sector more broadly, without exposing customers’ private information. While mobile behavior data will never reveal a complete picture of people’s lives, it can still generate learning that helps providers tailor products better. MNOs should also focus on transparency with customers, particularly clear terms and conditions for customers who do want to share individual data with credit providers. More broadly, the sector would benefit from MNOs agreeing on standards for data access and data sharing with third parties to assess credit risk.
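To illustrate what a differential privacy technique involves, here is a minimal Python sketch of the Laplace mechanism applied to aggregate counts, for example the number of customers per region who topped up in a given week. It assumes each customer is counted in exactly one bin, so adding or removing one customer changes the histogram by at most 1 in total; Laplace noise with scale 1/epsilon per bin then gives epsilon-differential privacy. The function names, the epsilon value, and the regional counts are illustrative assumptions, and a real deployment would also need to track the total privacy budget spent across every release.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_histogram(counts, epsilon, rng=None):
    """Release noisy counts for disjoint bins under epsilon-differential privacy.

    Assumes each customer contributes to exactly one bin, so the histogram's
    L1 sensitivity is 1 and every bin can use noise of scale 1 / epsilon.
    Rounding and clamping at zero are post-processing and do not weaken the
    guarantee.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng if rng is not None else random.Random()
    scale = 1.0 / epsilon
    return {
        key: max(0, round(count + laplace_noise(scale, rng)))
        for key, count in counts.items()
    }


# Hypothetical weekly counts of customers who topped up, by region.
weekly_top_ups = {"Nairobi": 1200, "Kisumu": 340, "Mombasa": 515}
print(dp_histogram(weekly_top_ups, epsilon=0.5, rng=random.Random(42)))
```

Smaller epsilon values add more noise and give stronger privacy; for large aggregates like these, noise of a few units barely affects their usefulness for market research, whereas counts for very small groups would be swamped, which is part of the protection.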

For fintech digital lenders, there are several considerations: first, how much effort to expend on complex techniques that distinguish default risk due to lack of income from fraud; second, finding the right threshold and pattern for reporting positive and negative information to credit bureaus; and third, building algorithmic models without statistical bias.