The goal of this project was to demonstrate how extractive summarization can make it easier for UX researchers to analyze feedback, by condensing interview transcripts into a handful of key sentences that are likely to be important.
It is expensive to do post-interview analysis.
How can we speed up research analysis so that researchers can spend more time collecting findings and less time analyzing them?
We initially used a naive approach: count the length of each string. The idea is that the longer a quote is, the more likely it is to be significant. We also filter the strings to keep only those that mention relevant terms, such as “app” or “wellness”.
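The length-based approach can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `rank_quotes_by_length`, the `top_n` parameter, and the sample quotes are all hypothetical, and the keyword check is assumed to be a simple case-insensitive substring match.

```python
def rank_quotes_by_length(quotes, keywords, top_n=5):
    """Keep quotes that mention any keyword, then return the longest ones.

    Assumption: matching is a case-insensitive substring test, so "app"
    would also match inside a word such as "happy".
    """
    lowered_keywords = [k.lower() for k in keywords]
    # Filter step: only quotes that mention at least one relevant term.
    relevant = [
        q for q in quotes
        if any(k in q.lower() for k in lowered_keywords)
    ]
    # Ranking step: longer quotes are treated as more significant.
    return sorted(relevant, key=len, reverse=True)[:top_n]


if __name__ == "__main__":
    sample_quotes = [
        "I like it.",
        "The app helps me track my wellness goals every day.",
        "The app is fine.",
        "My commute is really long and tiring in the mornings.",
    ]
    for quote in rank_quotes_by_length(sample_quotes, ["app", "wellness"], top_n=2):
        print(quote)
```

Length is only a rough proxy for significance, which is why the keyword filter runs first: a long quote that never mentions the product is dropped before ranking.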
Our next approach uses the Gensim library, a mature NLP library that includes an extractive summarization module. It is very simple to implement and use, and there is room to fine-tune the model if necessary. Learn more about Gensim here.
Our last approach was to use spaCy. spaCy does not come with a built-in summary function, so we build one ourselves by counting word frequencies and scoring each word by how often it appears. We then extend those word scores to sentences to generate our summarization. The idea is that if a sentence is related or similar to a large number of other sentences, then it is a good candidate to summarize those sentences. Each sentence receives a score that is the sum of the frequency scores of its words. The higher the score, the more likely the sentence is similar to all the others, making it a candidate for a summary sentence. We then take the 50 highest-scoring sentences and insert them into a list. This essentially selects the 50 sentences that are most similar to the rest of the transcript.
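The frequency-based scoring described above can be sketched as follows. To keep the example self-contained it uses regular expressions for tokenization and sentence splitting instead of spaCy, so it illustrates the scoring idea rather than the project's implementation. The function name, the small stop-word set, the normalization by the most frequent word, and returning sentences in transcript order are all assumptions made for illustration.

```python
import heapq
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word set; a real pipeline would
# more likely use the stop-word list that ships with its NLP library.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "for", "i", "in",
    "is", "it", "of", "on", "or", "that", "the", "to", "was", "with",
}

WORD_PATTERN = re.compile(r"[a-z']+")


def summarize_by_word_frequency(text, top_n=50):
    """Return the top_n sentences ranked by summed word-frequency score."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()
    ]
    words = [
        w for w in WORD_PATTERN.findall(text.lower()) if w not in STOP_WORDS
    ]
    if not words:
        return []

    # Word score: count divided by the count of the most frequent word
    # (an assumed normalization; raw counts would give the same ranking).
    counts = Counter(words)
    max_count = max(counts.values())
    weights = {word: count / max_count for word, count in counts.items()}

    # Sentence score: sum of the scores of the words it contains.
    scores = {
        index: sum(
            weights.get(w, 0.0) for w in WORD_PATTERN.findall(sentence.lower())
        )
        for index, sentence in enumerate(sentences)
    }

    # Pick the top_n highest-scoring sentences, then restore transcript order.
    top_indices = heapq.nlargest(top_n, scores, key=scores.get)
    return [sentences[i] for i in sorted(top_indices)]


if __name__ == "__main__":
    transcript = (
        "The app is great. I use the app for wellness. Weather was bad."
    )
    print(summarize_by_word_frequency(transcript, top_n=1))
```

Because scores are summed rather than averaged, longer sentences tend to score higher; dividing each score by the sentence's word count would be one way to offset that if it becomes a problem.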