Interview with HACE Data Scientist: Andy Lau
Data Scientist Andy Lau speaks to blog contributor Elle Bennett about the work he does with HACE and how data can be used in the fight to eliminate child labour.
Hi Andy, can you give me a brief overview of what a Data Scientist does?
The term ‘Data Scientist’ is thrown around a lot nowadays, and it is quite a generic term. What one data scientist does at one company may be completely different from what another does at another company.
I see a Data Scientist’s role as solving business problems using data. Depending on the type of data you are working with or the industry you are working in, this could range from building something that predicts what you might buy next given your shopping basket to something that can scan a PDF document to pick out key words and subjects.
So, in a nutshell, a Data Scientist is just someone who works with data and builds creative solutions to answer questions.
What do you do with Data Science at HACE?
Here at HACE my main function is to take data that has been extracted by our Data Researchers and provide insight. This is almost always in the form of some sort of visual representation of the data.
Choosing how to display the data is more art than science. Do I display this data in the form of a bar graph, geographically or something else entirely? Do I split graphs into different segments to highlight a point that my team is trying to make?
Data Scientists also have other roles at HACE. They work on building systems and structures with Data Engineers for the data we collect and collate. We also have some Data Science projects working on building models using technology such as Natural Language Processing (NLP) and predictive models.
Plus we have a team of Data Scientists, Data Analysts and Statisticians all working together on solving complex problems of missing data. Data is so broad that it really encompasses a whole range of different disciplines.
How did you get involved working with HACE?
I got involved by chance when I had a conversation with a networking contact of Eleanor’s. He asked me if I was interested in working with child labour data, as he knew I was interested in the concept of using data for good. This led to a chat with Eleanor and then a presentation given by me and, as the saying goes, the rest is history.
Did you know much about the issue of child labour before joining HACE?
I knew of its existence, but I unfortunately did not know much more than that. I believe many people are in the same situation that I was: they could make a positive difference but simply do not know how. So here at HACE we are raising awareness and informing these very sorts of people.
What insights can data give us on child labour?
The goal here at HACE is to determine the links between other factors and child labour, to ultimately explore the causes of child labour.
For example, do higher education levels correlate with a decrease in child labour levels? If this is the case, why do parents not simply enrol their children in schools more? What reasons are there for children not to go to school? Does electricity availability in the school or even their home affect schooling? There is potentially a myriad of factors linked to child labour, and HACE’s job is to find out which are the most influential.
How can we use data to work toward eliminating child labour?
It is not enough to simply describe the situation of child labour; we also need to make an impact using the data and insights that HACE produces. This means providing the right information to the relevant contacts who can use our insights to make impactful decisions.
For example, we can supply government bodies with information so they can concentrate budgets on the at-risk factors that we have determined are increasing child labour, or supply major retailers with information regarding the presence of child labour in their supply chain.
Since working with HACE, what has been the most eye-opening thing you’ve learned about child labour?
The eye-opening thing for me is that data regarding child labour is not kept in a central location. It can be contained in multiple PDF files across different years or .csv downloads provided by official bodies.
There also seems to be no logic as to how often data is collected: we have seen data in Tanzania for 2001, 2014 and 2018 with nothing in between. Both of these cause difficulties. Our Data Researchers have to meticulously extract data across several sources, each with its own nuances, and then we have only a handful of years’ worth of data to work with.
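To make the gaps Andy describes concrete, here is a minimal sketch in Python that measures how sparse a set of survey years is. The only data used are the three Tanzania years mentioned above; the function names `gaps_between` and `coverage` are hypothetical helpers written for this illustration and are not part of any HACE tooling.

```python
# Survey years quoted in the interview for Tanzania.
survey_years = [2001, 2014, 2018]


def gaps_between(years):
    """Return (start, end, missing) for each pair of consecutive survey years.

    `missing` is the number of calendar years strictly between the two
    surveys for which no data exists.
    """
    ordered = sorted(set(years))
    return [(a, b, b - a - 1) for a, b in zip(ordered, ordered[1:])]


def coverage(years):
    """Fraction of calendar years, from first to last survey, that have data."""
    ordered = sorted(set(years))
    span = ordered[-1] - ordered[0] + 1
    return len(ordered) / span


if __name__ == "__main__":
    for start, end, missing in gaps_between(survey_years):
        print(f"{start} to {end}: {missing} years with no data")
    print(f"Share of years covered: {coverage(survey_years):.0%}")
```

A summary like this may help decide early on whether year-on-year trends can be estimated at all, or whether the analysis should be limited to comparing the few snapshots that exist.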