How we built the ESG/Coronavirus Semantic Map

August 20, 2020

“In greater numbers and at greater speed, environmental, social and governance (ESG) issues are becoming financially material. Thirty years ago, when climate change was established as a scientific fact, most investors did not consider it as a substantial investment risk. Today, in light of mounting evidence, activism and regulation, investors are including climate considerations in their investment decision‑making.

[…]

This paper argues that, in the coming decade, identifying the issues that are not material today that could become so tomorrow will be a capability investors cannot do without. To win in the coming decade, investors and companies must equip themselves with forward‑looking and proactive approaches to materiality. This paper offers a framework that provides investors with guidance on the signals to look for to better identify dynamic ESG issues and to incorporate them into the process of portfolio construction, security selection and stewardship.”[1]

World Economic Forum and BCG

On April 2nd we published a map built from 200,000 tweets scraped from March 2020; you can learn more about why we built it here. This article describes how we went about collecting the data and building our coronavirus semantic web.

Data Gathering

We began our data search by looking for the hashtag “#coronavirus” and then identifying the other tags in the tweets we obtained. We refer to the other tags that appear in the same tweet as “#coronavirus” as the neighbouring tags of “#coronavirus”, or its neighbouring nodes.

Example:

Input text: “#coronavirus is a global #pandemic posing #health threats.”

Output graph:

[Image: output graph for the example tweet]
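The example above can be sketched in a few lines of Python. This is a minimal illustration rather than our production code: the function names `extract_hashtags` and `cooccurrence_edges` are hypothetical, and the sketch assumes that every pair of tags in the same text is linked, which may differ from how the original graph was drawn.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the unique hashtags in a text, lowercased, in order of first appearance."""
    seen = []
    for tag in HASHTAG_RE.findall(text.lower()):
        if tag not in seen:
            seen.append(tag)
    return seen


def cooccurrence_edges(texts):
    """Count how many texts contain each unordered pair of hashtags."""
    edges = Counter()
    for text in texts:
        # Sorting gives every pair a canonical (a, b) order.
        tags = sorted(extract_hashtags(text))
        for a, b in combinations(tags, 2):
            edges[(a, b)] += 1  # assumption: all tags in one text are pairwise linked
    return edges


tweet = "#coronavirus is a global #pandemic posing #health threats."
edges = cooccurrence_edges([tweet])
for (a, b), count in edges.items():
    print(f"#{a} -- #{b} (seen in {count} text)")
```

Each edge carries a count, so feeding in more texts turns the same structure into a weighted graph.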

We then searched for the neighbours of each neighbouring tag. We repeated this for several iterations over multiple days throughout March 2020, forming a raw graph in which the co-occurrence relations between hashtags link tweets that share semantic context or meaning[4]. We believe that such a graph allows us to understand social media’s view of ESG in relation to Coronavirus, because hashtag networks produce results similar to those of all-word networks. Hashtags therefore capture the semantic context of tweets, so it is relevant to study their structure and relations[5].
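The iterative search can be pictured as a breadth-first expansion from the seed tag. The sketch below is only an illustration under stated assumptions: `search_tweets` is a hypothetical stand-in that looks tags up in a tiny in-memory corpus instead of calling a real Twitter search, and `snowball` is a name chosen for this example.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")

# Hypothetical stand-in for the scraped tweets.
CORPUS = [
    "#coronavirus is a global #pandemic posing #health threats.",
    "#pandemic response needs #healthcare funding",
    "#health and #fitness at home",
    "#football is cancelled",
]


def search_tweets(tag):
    """Return the corpus texts that contain the given hashtag (hypothetical search)."""
    return [t for t in CORPUS if tag in HASHTAG_RE.findall(t.lower())]


def snowball(seed, iterations):
    """Expand outwards from the seed tag, one ring of neighbours per iteration."""
    visited = {seed}
    frontier = {seed}
    neighbours = {}
    for _ in range(iterations):
        next_frontier = set()
        for tag in frontier:
            found = set()
            for text in search_tweets(tag):
                found.update(HASHTAG_RE.findall(text.lower()))
            found.discard(tag)  # a tag is not its own neighbour
            neighbours[tag] = found
            next_frontier |= found - visited
        visited |= next_frontier
        frontier = next_frontier
    return neighbours


graph = snowball("coronavirus", iterations=2)
for tag, found in graph.items():
    print(tag, "->", sorted(found))
```

Tags that never co-occur with anything reachable from the seed, such as #football in this toy corpus, never enter the graph.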

Grouping

To map the raw social media graph to ESG keywords, we analysed the hashtags we had found (the nodes) and grouped them by topic, using topic classification and spell-checking techniques to clean up the raw data. We then displayed a graph of coalesced tags, in which all tags falling under the same topic are merged into one node; e.g. #coronavirus, #covid19, #coronaoutbreak and #fightcovid19 all fall under the topic Coronavirus. Every link between a coalesced tag and one of its neighbours becomes a link between the topic node and that neighbour, so the topic node inherits the neighbours of all the tags merged into it. Self-referential links (e.g. a link to a neighbour coalesced into the same topic) are excluded.
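As a rough sketch of the coalescing step, assume the topic classification has already produced a lookup from tag to topic. The `TAG_TO_TOPIC` table, the `coalesce` function and the edge counts below are all made up for illustration; summing the weights of merged links and leaving unmapped tags as their own nodes are assumptions of this sketch.

```python
from collections import Counter

# Hypothetical lookup; in practice this comes out of topic classification
# and spell checking.
TAG_TO_TOPIC = {
    "coronavirus": "Coronavirus",
    "covid19": "Coronavirus",
    "coronaoutbreak": "Coronavirus",
    "fightcovid19": "Coronavirus",
    "nhs": "Healthcare",
    "doctors": "Healthcare",
    "masks": "Healthcare supplies",
}


def coalesce(tag_edges, tag_to_topic):
    """Merge tag-level links into topic-level links, dropping self-referential ones."""
    topic_edges = Counter()
    for (a, b), weight in tag_edges.items():
        topic_a = tag_to_topic.get(a, a)  # assumption: unmapped tags stay as they are
        topic_b = tag_to_topic.get(b, b)
        if topic_a == topic_b:
            continue  # both ends fall in the same topic: exclude the link
        key = tuple(sorted((topic_a, topic_b)))
        topic_edges[key] += weight  # assumption: weights of merged links add up
    return topic_edges


raw_edges = {
    ("coronavirus", "covid19"): 5,   # same topic, so this link disappears
    ("covid19", "nhs"): 2,
    ("coronaoutbreak", "doctors"): 1,
    ("fightcovid19", "masks"): 4,
}
topic_edges = coalesce(raw_edges, TAG_TO_TOPIC)
for (a, b), weight in topic_edges.items():
    print(f"{a} -- {b}: {weight}")
```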

Below are the keywords included for E, S, and G:

  • Environmental: airpollution, airtravel, cleantech, climate, green, oceans, onlineclimatestrike, renewable, scientists, smoking
  • Social: basicnecessities, childcare, dailylife, delivery, education, fitness, healthcare, humanrights, mentalhealth, personalsafety, pets, publichealth, sustainability
  • Governance: advocacy, cybersecurity, employees, financialsystem, government, grants, healthcaresupplies, healthcareworkers, innovation, job, nationalemergency, outbreak, politics, socialdistancing, work, workers, workingfromhome

Below are three examples of groups of keywords coalesced together, from the Social and Governance themes:

  • “Employees”: employee, labor, layoffs, pay, sick, strike, time off, unemployment, union, worker
  • “Healthcare”: cancer, cdc, doctors, health, healthcare, helpthenhs, hospitals, hospitalsnhsheroes, medical, medicalimaging, nhs, nhsheroes, protectthenhs, standwithhealthworkers, telemedicine, universalhealthcare, vaccine, vaccines, who
  • “Healthcare supplies”: emergency services, facemask, gowns, healthcare supplies masks, kits, mask, masks, medical equipment, sanitizer, shortage, shortages, ventilators

Filtering

We then filtered out topics unrelated to ESG keywords (this removed fewer than 50% of the keywords left after topic classification and node coalescing). We also set a display filter that shows only the links of significant weight, so that the graph is not too large to be readable. Link weight was estimated from the number of texts that contain each link, with a bias added for ESG topics in order to reveal the most prevalent ESG issues being discussed in relation to coronavirus on social media.
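One way to picture the weighting and display filter is sketched below. The exact form of the bias is not spelled out here, so the multiplicative `ESG_BIAS`, the `MIN_WEIGHT` threshold, the `ESG_TOPICS` set and the counts are all illustrative assumptions rather than the values we used.

```python
# All constants below are assumptions made for this illustration.
ESG_TOPICS = {"Healthcare", "Employees", "Climate"}
ESG_BIAS = 1.5      # assumed multiplicative boost for links touching an ESG topic
MIN_WEIGHT = 3.0    # assumed display threshold


def biased_weight(edge, text_count):
    """Weight a link by the number of texts containing it, boosted for ESG topics."""
    a, b = edge
    touches_esg = a in ESG_TOPICS or b in ESG_TOPICS
    return text_count * ESG_BIAS if touches_esg else float(text_count)


def visible_links(edge_counts):
    """Keep only the links whose biased weight reaches the display threshold."""
    weighted = {edge: biased_weight(edge, n) for edge, n in edge_counts.items()}
    return {edge: w for edge, w in weighted.items() if w >= MIN_WEIGHT}


edge_counts = {
    ("Coronavirus", "Healthcare"): 2,
    ("Coronavirus", "Football"): 2,
    ("Coronavirus", "Employees"): 1,
}
links = visible_links(edge_counts)
print(links)
```

With these made-up numbers, the Healthcare link passes the threshold only because of the bias, while a non-ESG link with the same raw count stays hidden.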

The font size of each keyword scales with the number of tweets and retweets of hashtags related to that topic, and the color legend shown above the graph explains which issue a topic maps to: environmental, social or governance. Some topics span two letters; for example, “Healthcare Workers” maps to Social and Governance, and “Online Climate Strike” maps to Environmental and Governance.
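A simple way to derive font sizes from tweet counts is a linear rescaling between a minimum and a maximum size. The sketch below assumes exactly that; the point sizes, the `font_sizes` function and the counts are hypothetical, and the map itself may use a different scaling.

```python
MIN_PT, MAX_PT = 10.0, 36.0  # assumed smallest and largest font sizes


def font_sizes(topic_counts):
    """Linearly rescale tweet-plus-retweet counts into the [MIN_PT, MAX_PT] range."""
    lowest = min(topic_counts.values())
    highest = max(topic_counts.values())
    span = highest - lowest
    sizes = {}
    for topic, count in topic_counts.items():
        # If every topic has the same count, give them all the largest size.
        fraction = (count - lowest) / span if span else 1.0
        sizes[topic] = MIN_PT + fraction * (MAX_PT - MIN_PT)
    return sizes


topic_counts = {"Coronavirus": 1000, "Healthcare": 550, "Pets": 100}
sizes = font_sizes(topic_counts)
print(sizes)
```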

Conclusion

The texts we scraped, and the resulting map, represent a small sample of the discussions on social media about ESG issues in relation to Coronavirus. We believe that this data aggregation technique gives us a good level of confidence that the map reflects those discussions, as it aligns with major publications[1],[2],[3] from early April showing that ESG scoring was an indicator of better net returns in Q1 of 2020.

The WEF/BCG report[1] in particular highlights stakeholder activism fuelled by social media as one of the four core components of its framework for assessing the financial materiality of ESG issues. Analysing social media discussions on ESG issues can therefore provide insight into financial materiality. Building on [1], [4] and [5], we can continue towards a prototype for contextual ESG data analysis. For example, from this initial mapping we can construct a Hashtag Graph-based Topic Model[4] and apply link prediction techniques[5].

You can view our ESG/Coronavirus Semantic Web map here. To learn more about what we are building visit our website.

We appreciate claps ;P

[1] WEF/BCG: “Embracing the New Age of Materiality: Harnessing the Pace of Change in ESG”

[2] Bloomberg: “Coronavirus Is Shifting the Focus of Leading ESG Investors”

[3] Financial Times: “Big data shows Covid-19 reshaping ESG”

[4] Y. Wang, J. Liu, Y. Huang and X. Feng, “Using Hashtag Graph-Based Topic Model to Connect Semantically-Related Words Without Co-Occurrence in Microblogs,” in IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 7, pp. 1919–1933, 1 July 2016

[5] Martinčić-Ipšić S, Močibob E, Perc M (2017) Link prediction on Twitter. PLOS ONE 12(7): e0181079. https://doi.org/10.1371/journal.pone.0181079