Embarking on a Machine Learning Journey – The 5 things you must look out for from a Data Scientist's point of view
When introducing Machine Learning into your organization’s processes, it is imperative to have a thorough implementation strategy, as well as a calculated data management strategy.
GDS Link’s Chief Data Scientist, Florian Lyonnet, has compiled 5 Best Practices to help ensure your organization turns its machine learning efforts into tangible action plans.
1. Focus on what you want to achieve
ML models have applications all around the credit life cycle. At GDS Link we have built several ML models covering loan origination fraud, default, improved policy decisioning, customer management, and collections. Another area where we have found benefit is using ML techniques to build derived attributes and reduce the number of data points. However, if you are looking to get started with machine learning, I recommend concentrating on the low-hanging fruit. For instance, starting with originations might not be the quickest way to get internal buy-in, as uncertainties from the regulator around adverse actions will most likely slow acceptance. On the other hand, areas such as fraud or customer management might be better options, as they don’t require the same level of acceptance.
2. Invest in Your Infrastructure
It is critical to think about the infrastructure before embarking on the ML journey. For one thing, you want to make sure your model governance, deployment, monitoring, re-training and so on are efficiently set up, so that you won’t have any major slowdowns in the process. The benefit of traditional multivariate models lies in the simplicity of the coding, but the lift in performance from ML creates a strong business case to support the investment.
Compared with traditional methods, Machine Learning models tend to deteriorate over time, so it is important to re-train them as soon as changes are detected. Now, the good news is that ML models are easier to re-tune than traditional scorecards, which means it takes less time. This gives you an opportunity, provided you have the right infrastructure in place, to iterate through model versions at a much quicker pace. For instance, using GDS Link makes deploying new models as easy as a drag-and-drop operation.
Being able to detect these changes in a timely fashion is also crucial, so that the decision to re-tune a model can be taken as soon as the first signs of degradation are observed. Having proper analytical tools for monitoring your models will pay for itself in the long run, because the losses from underperforming models add up over time.
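To make the monitoring point concrete, here is a minimal sketch of one common way to spot score drift: the Population Stability Index (PSI), which compares the score distribution at model build time with a recent sample. This is an illustration under stated assumptions, not a description of GDS Link's tooling: the function name `psi`, the equal-width binning, the sample data and the 0.25 alert level (a widely quoted rule of thumb) are all choices made for this example.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline and a recent score sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant baseline

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Clamp so values outside the baseline range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: scores at build time versus scores after a shift.
baseline = [i / 100 for i in range(100)]
shifted = [min(s + 0.2, 0.99) for s in baseline]

RETRAIN_THRESHOLD = 0.25  # rule-of-thumb alert level, an assumption here

if psi(baseline, shifted) > RETRAIN_THRESHOLD:
    print("Score distribution has shifted: consider re-tuning the model")
```

Run on a schedule against production scores, a check like this gives an early, label-free signal, which matters in credit because actual defaults can take months to be observed.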
3. Select an Agile Language
Though some might argue that the language you use to develop your models does not matter, it is quite important, and there are a few questions that you must ask before selecting one. Technological choices are too often guided by no other argument than “these are the skills/software we have in house or have been using for the past decade.” Machine Learning model building requires a language that is flexible and powerful, with a large community supporting it and producing high-quality libraries. The main points to keep in mind are the following:
– What ML algorithm am I going to use? For instance, if you plan on using neural networks rather than tree ensembles, you might have different needs.
– How do I plan to productionize these models? Here, ask yourself whether you are going to use the native language to deploy these models or export them in a standard format like PMML. If the latter, you need to think about the export capabilities of the language/library to this format.
– Where will I execute the code that creates the variables that feed these models? The algorithm is only one part of the problem. Often, data scientists rely on libraries that can manipulate multidimensional arrays and that implement functions like sorting, filtering and array operations. Translating this logic into another language can be time consuming and error prone, and can limit your capacity to quickly create new models and deploy them. Again, choosing an infrastructure that allows you to run these transformation functions natively will be a huge boost to your productivity.
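The last point can be illustrated with a short sketch in which one feature function is used both to prepare training data and to score a new applicant, so nothing has to be translated between languages. Everything in it is hypothetical: the field names (`monthly_balances`, `debt`, `income`), the derived variables and the sample records were invented for this example. It uses only the Python standard library so that it runs anywhere; in practice an array library would usually do the sorting and filtering.

```python
from statistics import mean
from typing import Any, Dict, List


def build_features(applicant: Dict[str, Any]) -> Dict[str, float]:
    """Turn one raw applicant record into model inputs (hypothetical fields)."""
    balances: List[float] = sorted(applicant["monthly_balances"])  # sorting
    positive = [b for b in balances if b > 0]                      # filtering
    return {
        "min_balance": balances[0],
        "avg_balance": mean(balances),
        "share_positive_months": len(positive) / len(balances),
        "debt_to_income": applicant["debt"] / applicant["income"],
    }


# Training time: build the model inputs from historical records.
history = [
    {"monthly_balances": [1200.0, -50.0, 300.0, 800.0], "debt": 500.0, "income": 2000.0},
    {"monthly_balances": [100.0, 150.0, 90.0, 60.0], "debt": 900.0, "income": 1800.0},
]
training_rows = [build_features(record) for record in history]

# Production time: the very same function prepares a new applicant.
new_applicant = {"monthly_balances": [400.0, 420.0, -10.0, 380.0],
                 "debt": 300.0, "income": 3000.0}
scoring_row = build_features(new_applicant)
```

Because training and scoring share `build_features`, a change to a variable definition is made once and cannot drift between the two environments.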
4. Build/Acquire/Outsource Expertise
With the right infrastructure, the bar to entry for creating ML models has been lowered over the past few years. That does not mean that anyone who can build models should build them for critical applications such as credit decisioning. As easy as it is to use modern libraries in Python or R, there are a multitude of subtleties that need to be mastered and understood in order to make informed decisions and build models that perform well in production and deliver the expected results. We strongly believe that the initial setup of ML model building is an involved process where human critical thinking is key to achieving robust results; re-tuning, however, should lessen the burden and speed up deployment. Therefore, we recommend building expertise in house or partnering with domain experts who bring the experience of having built hundreds of models.
5. Data vs Algorithms
Across the different projects that we have delivered at GDS Link, and based on our years of experience working with ML models, we have concluded that in most cases data is more important than algorithms. By this we mean that the potential lift you can get by adding more data points to describe your applicant, for instance, is substantially higher than the gain you will get by fine-tuning your hyperparameters to the third decimal or by using ensembles of models (where explainability becomes a challenge as well). Said differently, your time is better spent engineering features that uncover the nature of your problem than scanning hundreds of models.
A good example of this is the product GDS Link and TVD Partners launched recently, which consists of an extensive library of bank transaction attributes that can be leveraged in a multitude of different situations. The lift provided by adding these advanced engineered features to your models can be as large as 20% in some cases. To learn more, read our white paper on the impact of Bank Transaction Attributes in credit decisioning.
This also highlights the importance of having a platform that gives you direct access to hundreds of data sources as well as partnering with data experts.
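For readers who have not worked with this kind of data, the sketch below shows how attributes can be derived from raw bank transactions. It is purely illustrative and does not reproduce the attribute library mentioned above: the `Transaction` type, the attribute names and the sample transactions are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Transaction:
    posted: date
    amount: float  # positive = credit (inflow), negative = debit (outflow)


def transaction_attributes(txns: List[Transaction]) -> Dict[str, float]:
    """Summarize a list of transactions into a few hypothetical attributes."""
    inflows = [t.amount for t in txns if t.amount > 0]
    outflows = [-t.amount for t in txns if t.amount < 0]
    total_in, total_out = sum(inflows), sum(outflows)
    return {
        "total_inflow": total_in,
        "total_outflow": total_out,
        "net_cash_flow": total_in - total_out,
        # Ratio is undefined without inflows; infinity marks that case.
        "outflow_to_inflow": total_out / total_in if total_in else float("inf"),
        "largest_outflow": max(outflows, default=0.0),
        "active_days": float(len({t.posted for t in txns})),
    }


# Hypothetical month of activity for one applicant.
sample = [
    Transaction(date(2024, 1, 5), 2000.0),
    Transaction(date(2024, 1, 7), -500.0),
    Transaction(date(2024, 1, 7), -250.0),
    Transaction(date(2024, 1, 20), -1000.0),
]
attributes = transaction_attributes(sample)
```

Each attribute becomes one more data point describing the applicant, which is the kind of addition that, in our experience, tends to move model performance more than further algorithm tuning.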
ML has come a long way over the past decade, and it is easier than ever to leverage this market-changing technology to improve all stages of credit decisioning. GDS Link offers analytics as a service with full support for the complete model lifetime, from model building to deployment, monitoring and re-tuning when necessary.
About the Author
Florian Lyonnet, Chief Data Scientist
Florian holds a PhD in Theoretical Physics and has 8 years of experience building analytical solutions using standard analytics, machine learning and optimization. Florian has developed predictive models for a whole range of business problems, with a strong focus on the credit life cycle.