In this interview, Raiha Khan, who recently took up the role of Project Coordinator for the Data for Good Scholars Rights CoLab team, talks about her experience on the project. A Columbia University graduate student in Computer Science concentrating in machine learning, Raiha was selected as a 2021 Summer Fellow and became a Graduate Student Researcher in Fall 2021. 

Rights CoLab: Why did you decide to join the Rights CoLab data science project?

Raiha: I am passionate about applying data science to social good initiatives, and so I jumped at the opportunity when I saw it advertised on the Columbia University Data Science Institute website back in July. Being able to use data science to highlight worker voice and agency, and to help investors make informed decisions around company practices is extremely rewarding work as a researcher. So that’s something that I’m really excited to be doing.

Rights CoLab: What are you working on now?

Raiha: Currently, I’m concentrating on filling the gaps in SASB’s materiality map, an effort we call “the Addition Workstream,” by working with a database of global news events known as FactSet’s Truvalue Spotlight Events. We are using this dataset to search for events that demonstrate the financial impact, positive or negative, of corporate labor practices across a wide variety of industries. You might expect to find insights like this in financial reports, but news coverage can often surface them from a different perspective.

Rights CoLab: Tell us more about the data set and how you are using it.

Raiha: FactSet’s Truvalue Spotlight Events (what we call TVL, for short) is a global news database of ESG “spotlight events” from 2016 to the present. It’s a good source for us because it provides a broad view of issues that should be on investors’ radars around the world. I worked on the topic of labor conditions in global supply chains, using unsupervised natural language processing (NLP) techniques to build out a list of relevant keywords. We then use those keywords to detect co-occurrences of labor practices, such as “wages” or “precarious work,” and financially impactful outcomes, such as consumer protests, lawsuits, or compensation.

Unsupervised learning means applying algorithms to datasets that are neither classified nor labeled in order to identify patterns; here, those patterns are keywords that surface from the topics the algorithms discover. For example, applying such an algorithm to articles that contain the word “lawsuit” may return other words that occur frequently in those articles, such as “impoundment” or “penalty.” We can add these to our dictionary to discover other supply chain labor incidents tied to legal risk.

We are concerned with labor practices whose financial impacts are widespread; that is, they are not one-off events but affect multiple companies in a particular industry. This is how we can demonstrate that a practice is financially material in that industry.

We also tried other news article databases, including one called GDELT, supported by Google Jigsaw. It’s a huge database, which is a plus, but getting the corpus into the shape we need, such as mapping our findings to specific industries and filtering articles down to a topic like labor conditions in supply chains, would be a heavy lift. TVL makes this really easy because its news articles are categorized by SASB’s 77 industry standards and organized around SASB’s General Issue Categories (GICs).
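For technically minded readers, the keyword-expansion and co-occurrence steps Raiha describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the team’s actual pipeline: the article fields (`company`, `industry`, `text`), the toy articles, the keyword lists, and the `min_companies` threshold are all hypothetical, and plain word-frequency counting stands in for the unsupervised topic-discovery algorithms mentioned in the interview.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy articles invented for illustration; the field names are
# assumptions, not the actual TVL schema.
ARTICLES = [
    {"company": "A Corp", "industry": "Apparel",
     "text": "Workers filed a lawsuit over unpaid wages; a penalty followed."},
    {"company": "B Corp", "industry": "Apparel",
     "text": "A lawsuit alleges precarious work at suppliers and seeks compensation."},
    {"company": "C Corp", "industry": "Apparel",
     "text": "Customs impoundment of goods after reports of unsafe working conditions."},
    {"company": "D Corp", "industry": "Software",
     "text": "Quarterly earnings beat expectations."},
]

# Hypothetical keyword dictionaries: labor practices and financial outcomes.
PRACTICES = ["wages", "precarious work", "unsafe working conditions"]
OUTCOMES = ["lawsuit", "penalty", "compensation", "impoundment", "consumer protest"]

STOPWORDS = {"a", "an", "and", "the", "of", "at", "over", "after", "for", "to", "in", "on", "by"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def expand_keywords(articles, seed, top_n=5):
    """Return frequent non-stopword terms from articles that contain `seed`.

    Plain frequency counting stands in here for an unsupervised topic model.
    """
    counts = Counter()
    for article in articles:
        tokens = tokenize(article["text"])
        if seed in tokens:
            counts.update(t for t in tokens if t != seed and t not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_n)]


def mentions(text, keywords):
    """True if any keyword or multi-word phrase appears as whole words in text."""
    padded = " " + " ".join(tokenize(text)) + " "
    return any(" " + kw + " " in padded for kw in keywords)


def flag_industries(articles, practices, outcomes, min_companies=2):
    """Map industries to companies whose articles mention both a practice and an outcome.

    Industries with fewer than `min_companies` distinct flagged companies are
    dropped, mirroring the idea that a material practice affects multiple
    companies rather than being a one-off event.
    """
    flagged = defaultdict(set)
    for article in articles:
        if mentions(article["text"], practices) and mentions(article["text"], outcomes):
            flagged[article["industry"]].add(article["company"])
    return {ind: cos for ind, cos in flagged.items() if len(cos) >= min_companies}


if __name__ == "__main__":
    print(expand_keywords(ARTICLES, "lawsuit"))
    print(flag_industries(ARTICLES, PRACTICES, OUTCOMES))
```

Setting `min_companies` to 1 would also flag one-off incidents, which is exactly the noise the “affects multiple companies in an industry” criterion is meant to filter out.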

Rights CoLab: What’s been the most interesting aspect of the Rights CoLab project for you?

Raiha: I’d say it’s investigating company labor practices that can potentially be financially material. I had never worked with a news dataset before, so it’s been a huge learning experience for me. I wasn’t aware that the private sector’s social impacts are being covered in the news all the time. So being able to surface the connections between company practices and social impact by industry (for example, in the financial services and technology industries) is really, really interesting.

Rights CoLab: Can you give us one major discovery so far that you are proud of? 

Raiha: I recently discovered that the number of companies covered in the news for financially material labor practices in their supply chains grew eightfold between 2016 and 2020. This strongly suggests that the fallout from the coronavirus pandemic, which laid bare the poor supply chain management practices of so many companies, played a role. By applying our techniques to news articles to detect co-occurrences of company practices and risks, I have found concrete evidence that unsafe working conditions and lack of transparency pose legal, reputational, and modern slavery risks. These findings can hopefully support the development of accounting metrics under SASB’s “Labor Conditions in the Supply Chain” (LCSC) topic for at least 11 industries that currently have no such standard in place. These discoveries would not have been possible without continuous subject matter guidance from Joanne and Paul and without technical guidance from our full project team!
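A growth figure like this rests on counting distinct companies per year, since a single company can appear in many articles. The short sketch below illustrates that counting step under assumptions: the `(year, company)` records are invented toy data (not TVL results), and `companies_per_year` and `growth_factor` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical flagged records: (year, company) pairs for articles that already
# matched a practice/outcome co-occurrence. Invented for illustration only.
FLAGGED = [
    (2016, "A Corp"),
    (2016, "A Corp"),  # repeat coverage of the same company counts once
    (2020, "A Corp"),
    (2020, "B Corp"),
    (2020, "C Corp"),
]


def companies_per_year(records):
    """Count distinct companies flagged in each year."""
    by_year = defaultdict(set)
    for year, company in records:
        by_year[year].add(company)
    return {year: len(companies) for year, companies in sorted(by_year.items())}


def growth_factor(counts, start, end):
    """Ratio of distinct companies flagged in the `end` year to the `start` year."""
    return counts[end] / counts[start]


counts = companies_per_year(FLAGGED)
print(counts)                             # {2016: 1, 2020: 3}
print(growth_factor(counts, 2016, 2020))  # 3.0
```

Deduplicating by company before comparing years keeps a burst of coverage about one firm from inflating the trend.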


On April 5, Raiha and Isha Shah gave a presentation on the Rights CoLab project at the Truvalue Labs Academic Roundtable on ESG research, titled “Using Data Science to Surface Evidence for Better Integrating Human Rights into SASB Human Capital Standards.” Recordings and slides from the presentation are available.

About the photo: Raiha says, “I like to travel, and here is a picture of me on a cruise on the Bosphorus, a Turkish waterway that forms the continental boundary between Asia and Europe!”