In this exploration we analyze the free-text summary written for each incident. To measure report similarity, we first computed TF-IDF (Term Frequency, Inverse Document Frequency) vectors for the reports and used cosine similarity to score every pair of incident reports. For our analysis, we consider two reports similar only if their similarity score is 50% or higher. To gather important words across report clusters, we first find the three most important words in each report, ranked by TF-IDF score. Then, for each cluster of nodes, we gather the important words from every node and keep the three highest-scoring words. The font size of each word is determined by how often it occurs in the cluster.
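The pipeline described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code behind the visualization: the whitespace tokenizer, the smoothed IDF formula, the use of connected components to form clusters, and all function names and sample reports are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return one {term: TF-IDF weight} dict per document."""
    # Assumption: lowercase whitespace tokenization, no stop-word removal.
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumption: relative term frequency times a smoothed IDF.
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_terms(vector, k=3):
    """The k highest-weighted (term, score) pairs; ties broken alphabetically."""
    return sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def cluster_reports(vectors, threshold=0.5):
    """Link reports whose similarity meets the threshold, then return the
    connected components of that graph (an assumed clustering rule)."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(vectors)):
        groups[find(i)].append(i)
    return list(groups.values())


def cluster_keywords(cluster, vectors, k=3):
    """Pool each report's top-k words, keep the k best-scoring words, and
    report how many reports contributed each one (drives the font size)."""
    best_score = {}
    frequency = Counter()
    for index in cluster:
        for term, score in top_terms(vectors[index], k):
            best_score[term] = max(best_score.get(term, 0.0), score)
            frequency[term] += 1
    top = sorted(best_score, key=lambda t: (-best_score[t], t))[:k]
    return [(term, frequency[term]) for term in top]


# Hypothetical sample reports, invented for this example.
reports = [
    "hoax bomb threat metro station",
    "hoax bomb threat metro station evacuated",
    "armed attack on convoy",
]
vectors = tfidf_vectors(reports)
clusters = cluster_reports(vectors, threshold=0.5)
for cluster in clusters:
    print(sorted(cluster), cluster_keywords(cluster, vectors))
```

Raising `threshold` splits the graph into smaller, tighter clusters, while lowering it merges more reports into the same cluster.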

By clustering the incident reports this way, we can see that most of the similar incidents reported in metro stations and administration buildings were hoaxes or false reports that led to no casualties.

Click on the words, nodes, or legend to show the text reports for the incidents.