Explore Google Search Data
Explore Results of Google Search API Queries
Each document in the search results corresponds to one of the four topics, and the hope is that the vocabulary associated with each topic can be used to tell the documents apart. Before processing, the raw search data contained all sorts of noise, including stop words, multiple versions of the same word, punctuation, and numeric characters. After processing (see Data Cleaning), each set of documents can be visualized with a wordcloud, a simple tool that draws a word in larger text the more often it appears. For Covid, besides "Covid" itself, the words pandemic, global, nutrition, and supply appear in larger text, which implies they show up more often. Under the locusts tab, several important words stand out, including harvest, swarm, desert, and crop, all of which point to the impact locusts have. Even with a simple wordcloud, it is clear that the processed data is cleaner than the raw data and can be used in applications like clustering.
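The idea behind a wordcloud can be sketched in a few lines of Python: count how often each word occurs across a set of cleaned documents, then scale the text size with the count. This is a minimal illustration only, not the code used to produce the wordclouds on this page. The sample documents, the function names, and the linear size scaling are all assumptions made for the example.

```python
from collections import Counter


def word_frequencies(documents):
    """Count how often each word occurs across already-cleaned documents."""
    counts = Counter()
    for doc in documents:
        # Assumes cleaning is done: lowercase text, no punctuation or stop words.
        counts.update(doc.split())
    return counts


def font_sizes(counts, min_size=10, max_size=60):
    """Map each word to a text size that grows linearly with its count.

    Linear scaling is an assumption for illustration; wordcloud tools
    may scale sizes differently.
    """
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }


# Hypothetical cleaned documents for the locusts topic (not the real data).
locust_docs = [
    "locust swarm desert crop",
    "swarm crop harvest",
    "desert locust swarm",
]

counts = word_frequencies(locust_docs)
sizes = font_sizes(counts)

# Print the most frequent words with their counts and rounded text sizes.
for word, count in counts.most_common(3):
    print(word, count, round(sizes[word]))
```

Words that occur most often receive the largest size, which is why terms such as swarm or crop dominate a wordcloud built from locust-related documents.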