It’s not rocket science, it’s data science


I’m Habib, and I work as a Data Scientist in our New York office. In this blog post, I’ll give you a feel for what working in analytics is like at TransferWise, some general problems common to the FinTech world, and share some of my experiences here that highlight how we use data to drive solutions to some of our most important and challenging problems.

I joined TransferWise three years ago as our first Data Scientist.

Since then, I’ve been trying to figure out how to help solve some of our biggest problems with data. Over the past eight years, millions of our customers have made many more millions of money transfers, which makes us very ‘data-rich’.

With this data, we’ve realised we’re able to predict things about our customers’ transfer behaviour that help us optimise our service. For example, we’re able to reasonably predict when a customer will need to transfer their money, the amount they’ll want to send, and where they might be sending it. Understanding this behaviour helps us prepare for our customers’ future activity. For instance, we can make sure the money we hold in each market today covers where our customers will ask us to send their money tomorrow.

 

These are the kind of problems I spend my days getting stuck into: problems that, if solved, can help us create a better experience for our customers. When we’re ‘data-rich’, meaning we have loads of data that is directly relevant to the question at hand, the data acts as fuel for our solutions.

Although my role often gets technical, there’s plenty of creative thinking involved in identifying problems that can be solved effectively with data, designing the solutions, and finally in figuring out what information is relevant to the problem. I often describe my work as running a school where the students are all computers, the learning algorithms are the teachers, and data is the curriculum.

When it’s all a bit random.

There are plenty of opportunities for data-powered solutions at TransferWise, yet it’s the ones with a degree of randomness that machine learning helps us with the most.

Continuing the example above, each of our customers has different needs and uses our service in their own specific way. This makes it trickier to predict transfer behaviour, although there are still patterns in the data we have. The nice thing about being ‘data-rich’ is that we can use machine learning to detect underlying patterns, and to pick up new ones as they appear in the data, which often forms the basis for solving some of our most significant problems. We’re generally trying to find patterns that link specific outcomes to the context from which they arose, allowing us to predict future outcomes based on the current (observable) context.

Solving problems using machine learning.

We’ve found several applications for machine learning at TransferWise that support our product pillars of convenience, speed, coverage and price.

Using our data to recognise and predict patterns in customer behaviour has directly helped us solve a number of problems, including:

  • Predicting future customer behaviour
  • Estimating transfer delivery speed
  • Detecting system abuse
  • Automated verification
  • Identifying customer confusion
  • Designing chatbots for contact resolution

Project overview: building a customer retention model.

Problem overview

One of the most interesting projects I’ve worked on to date involved building a customer retention model that helps us determine which customers have likely stopped using our service, meaning they’ve ‘churned’ (and which customers are still actively using us for their international money transfers).

This problem is often very clear-cut, especially for subscription-based services. For example, in telecommunications, providers are notified when a customer decides to cancel their account(s), or when they stop paying their subscription fee. This gives a clear signal to telecommunications companies that the customer is no longer interested in using their service.

The challenge for TransferWise is that when a customer stops using our service, they don’t inform us of their decision to do so (for example, by deactivating their account). Instead, the only signal we typically receive is an absence of transfers in our data from that point onwards. In addition to this, because each of our customers has a unique need when it comes to transferring money, we don’t want to force-fit a definition relating to how regularly an ‘active’ customer (i.e. a customer we believe to still be using our service) should/would have made money transfers in the past. Some customers need to transfer money every day, while others only need to do so every now and then. Our goal is to be the service our customers choose to use when the need to transfer money internationally arises.

Investigating customer churn to understand net growth

When we say at TransferWise that we’re growing our customer base, we want to be sure that the growth is real. What we truly care about is net growth: in other words, whether the new customers we’ve acquired outnumber the customers who’ve stopped using our service.

It’s simple to see how many new customers started using our service, but we can only be sure that we’re growing our ‘active’ customer base if we can quantify how many of them have stopped using us. The secondary benefit of solving this problem is that being able to identify the churned customers puts us in a position to try to uncover the main reasons that they no longer require our services.

This project aimed to accurately flag customers who’ve stopped using our service as quickly as possible. The rationale was that if we could solve this, we’d get a clearer view of our growth, and could take measures to improve it by addressing the reasons that cause our customers to use another provider for their money transfer needs.

Understanding customer churn

We designed a solution involving a model that predicts the time until a customer’s next transfer attempt. More specifically, the model predicts the probability distribution of when a customer’s next transfer attempt will arrive. To do so, it estimates the parameters of a Weibull distribution, which control the shape of the distribution function. The resulting distribution serves as our best estimate of the probability distribution of the time to the next transfer attempt.
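To make the idea concrete, here is a minimal sketch of how a predicted Weibull distribution turns into a probability of seeing a transfer attempt within a given number of days. It assumes the common two-parameter form (shape and scale). The function name and the parameter values are hypothetical and chosen purely for illustration; they are not taken from our model.

```python
import math


def prob_transfer_within(days: float, shape: float, scale: float) -> float:
    """Weibull CDF: probability that the next transfer attempt arrives within `days`."""
    if days <= 0:
        return 0.0
    return 1.0 - math.exp(-((days / scale) ** shape))


# Hypothetical predicted parameters for two customers (illustrative values only).
frequent = {"shape": 1.5, "scale": 10.0}     # expects to transfer again soon
occasional = {"shape": 0.8, "scale": 200.0}  # transfers every now and then

for name, params in [("frequent", frequent), ("occasional", occasional)]:
    p30 = prob_transfer_within(30, **params)
    print(f"{name}: P(next transfer attempt within 30 days) = {p30:.2f}")
```

The two parameters are all that is needed to answer "how likely is a transfer attempt in the next N days?" for any N, which is what makes the distribution useful for reasoning about churn.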

[Image: distribution approach infographic]

The intuition behind using this to predict churn is that we can use the model to identify customer accounts where we don’t expect a transfer attempt in the foreseeable future, and we can use the predicted distribution to figure this out for any customer at any time.

The varying levels of customer churn

With our model, we can assess the ‘degree’ of churn (the event in which a customer stops using TransferWise) of any customer at any point in time.

[Image: customer churn graphic]

This chart displays the varying levels of customer churn – the shading of each block represents how ‘churned’ a customer (each row is a customer profile) is in a given month.

  • The diagram shows churn classification status (each blue tile) of a small sample of individual profiles (each row)
  • The dots show the last observed event (registration or latest transfer attempt)
  • The t_25_window (which is reflected by the colour of the tiles) indicates the number of days into the future within which the model believes there’s a 25% chance of a customer’s next transfer attempt occurring. For example, if t_25_window = 10 days, the model believes there’s a 25% chance that the user will make a transfer attempt between now and 10 days’ time (conversely, there’s a 75% chance that they will not make one during that time frame)
  • The lower the t_25_window value is, the sooner we expect the next transfer attempt to arrive and vice-versa
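As a rough illustration of where a value like t_25_window could come from, the sketch below inverts a two-parameter Weibull distribution to find the number of days at which the cumulative probability of a transfer attempt reaches 25%. The helper name `t_window` and the shape and scale values are assumptions made for this example, not details of the production model.

```python
import math


def t_window(q: float, shape: float, scale: float) -> float:
    """Weibull quantile: days until the cumulative probability of a transfer attempt reaches q."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    return scale * (-math.log(1.0 - q)) ** (1.0 / shape)


# Hypothetical parameters, picked so that the 25% window lands at roughly 10 days.
t_25_window = t_window(0.25, shape=1.0, scale=34.76)
print(f"t_25_window is about {t_25_window:.1f} days")

# A customer whose predicted distribution is stretched further into the future
# (larger scale) gets a larger t_25_window, i.e. a later expected next attempt.
print(t_window(0.25, shape=1.0, scale=347.6) > t_25_window)
```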

At this point, we can choose between a variety of churn definitions and try to find one that is optimal based on our criteria (recall that we want it to be accurate, but also time-sensitive). Most definitions have their merits, as you generally need to sacrifice time-sensitivity for accuracy. Overly aggressive definitions flag many customers as churned too soon (erroneously), while overly cautious definitions require us to wait a long time before we’re confident enough to consider a customer churned. Thankfully, we can analyse how each definition would have performed on past data (backtesting), at which point it’s up to us to choose the definition that best suits our needs.
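A backtest of a single definition could be sketched along these lines. The record layout and the metric definitions are assumptions for illustration: here an "error" means a customer who was flagged as churned but later made another transfer attempt, and "delay" means the number of days between the customer’s last observed event and the day they were flagged. The sample history is made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FlaggedCustomer:
    last_event_day: int          # day of registration or latest transfer attempt
    flagged_day: int             # day the definition marked the customer as churned
    returned_day: Optional[int]  # day of a later transfer attempt, if there was one


def backtest(flags: List[FlaggedCustomer]) -> Tuple[float, float]:
    """Return (error_rate, average_delay_in_days) for one churn definition."""
    if not flags:
        raise ValueError("need at least one flagged customer")
    errors = sum(1 for f in flags if f.returned_day is not None)
    total_delay = sum(f.flagged_day - f.last_event_day for f in flags)
    return errors / len(flags), total_delay / len(flags)


# Made-up history for one hypothetical definition.
history = [
    FlaggedCustomer(last_event_day=0, flagged_day=120, returned_day=None),
    FlaggedCustomer(last_event_day=30, flagged_day=90, returned_day=200),
    FlaggedCustomer(last_event_day=10, flagged_day=250, returned_day=None),
    FlaggedCustomer(last_event_day=5, flagged_day=65, returned_day=None),
]

error_rate, avg_delay = backtest(history)
print(f"error rate: {error_rate:.0%}, average delay: {avg_delay:.0f} days")
```

Running the same function over several candidate definitions gives one (error rate, delay) pair per definition, which is the trade-off we then choose between.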

[Image: cutoff and confidence stats]

Using the output above, the 10_pct, 50-day cutoff definition achieves an error rate of 46% with an average delay of 225 days. This definition, in plain English, would read:

“We define a customer as churned the moment our model suggests there’s a 90% chance that they will not make a transfer attempt in the next 50 days”.
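In code, a definition of this kind might look like the sketch below. It assumes the model’s prediction for a customer is summarised by two Weibull parameters (shape and scale) describing the time from now until the next transfer attempt. The function name, the argument names and the example parameter values are all hypothetical.

```python
import math


def is_churned(shape: float, scale: float,
               horizon_days: float = 50.0, min_prob: float = 0.10) -> bool:
    """Flag a customer as churned when the predicted chance of a transfer
    attempt within `horizon_days` falls below `min_prob` (10% by default)."""
    prob_attempt = 1.0 - math.exp(-((horizon_days / scale) ** shape))
    return prob_attempt < min_prob


# Illustrative parameters only.
print(is_churned(shape=1.0, scale=1000.0))  # about a 5% chance within 50 days
print(is_churned(shape=1.0, scale=100.0))   # about a 39% chance within 50 days
```

Changing `horizon_days` or `min_prob` produces a different definition, so the same function can represent the whole family of cutoffs being compared.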

The result: a new way to look at growth

Once we’ve classified users into ‘churned’ and ‘active’ groups, we can calculate net growth, which reflects the growth of our active customer base. Based on the example definition, the model suggests that our active customer base has grown over time. In contrast to the standard view of customer growth (where we simply consider the total number of customers), we can compare how many of our customers are considered ‘active’ vs ‘churned’.
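The arithmetic behind net growth is simple enough to sketch in a few lines. The monthly counts and the starting base below are invented for illustration, and the sketch ignores customers who return after being flagged as churned.

```python
from typing import List, Tuple


def active_base_over_time(
    start_active: int, monthly: List[Tuple[str, int, int]]
) -> List[Tuple[str, int, int]]:
    """For each (month, new_customers, newly_churned) entry, return
    (month, net_growth, active_base_at_end_of_month)."""
    active = start_active
    rows = []
    for month, new_customers, newly_churned in monthly:
        net = new_customers - newly_churned
        active += net
        rows.append((month, net, active))
    return rows


# Invented numbers: (month, new customers, customers newly flagged as churned).
monthly_counts = [
    ("month 1", 1000, 300),
    ("month 2", 1200, 500),
    ("month 3", 900, 950),
]

for month, net, active in active_base_over_time(10_000, monthly_counts):
    print(f"{month}: net growth {net:+d}, active base {active}")
```

In the third month of this made-up example the total number of customers still goes up by 900, yet the active base shrinks, which is exactly the difference between the standard view of growth and net growth.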

 

Empowered by autonomy.

Compared to the rest of my career, what sets working at TransferWise apart is the autonomy I experience here. At TransferWise, we work in autonomous teams – meaning that each team is responsible for deciding which problems they focus on solving, and the order in which they do so.

It’s not always easy. Although working in autonomous teams allows me to prioritise the problems I’m focusing on, the same goes for other teams. At times it can be challenging to justify why another team should support you on your project and spend time implementing your models or even using them in the first place. Thankfully, being a data-driven business means that people are quite data-savvy, and receptive to data – so being able to show supporting data is very helpful when trying to enlist support from other teams.

So, while working in this way might be challenging at times, I find it a huge advantage, especially in an analytics function. I can explore our data, size up various problems and prioritise them, and design tailored solutions to them.

Similarly, at TransferWise, you have the freedom to choose the technology you want to work with in building out your solutions: Python, Java, Spark, R, essentially whatever you’re most comfortable with. Recently we began hitting the limits of our previous database, so we’ve just migrated to Snowflake, which processes large amounts of data very quickly. So it’s safe to say, we’ve evolved based on our needs and we value making data easily accessible throughout the business.

Challenged by hyper-growth.

As with many fast-growing businesses, we occasionally experience growing pains as we scale. It’s a nice problem to have, but it creates some challenges. The customer retention model outlined above was built at a time when TransferWise was very different – our customers just used our ‘send money’ product to send money across borders. Today, we have a suite of products, including our newly released Borderless accounts and debit cards – allowing customers to store balances in a myriad of currencies and get a great exchange rate wherever and whenever they spend money around the world. As a result, the behaviour of our customers has changed, and there’s plenty of new data (that previously didn’t exist) that we should be considering.

The old model was not built to know about these new behaviours, yet it now has to make sense of them. So we’re constantly adjusting our models as the business grows.

It takes a lot of work to keep up with our business and product changes, and to come up with flexible solutions and models that can be adjusted promptly as and when the need arises. But we really value our data here, and are afforded the time and resources needed to build out effective solutions with a long shelf life. At TransferWise, we believe data makes us wiser, and the wiser we get, the more wisely we use our data to power the solutions that cement our position as the world’s best money transfer service.

Interested in helping us to build the future of finance? We’re hiring Analysts in Budapest, London and Tallinn. Click here to find out more.