by Dr. Steve Poulin

Text analytics can be extremely valuable for companies. Previously we discussed how applying text analytics to survey questions can be particularly useful. Text analytics techniques can reduce the time it takes to analyze responses and produce more consistent, reliable results. By converting unstructured data into fields that can be used for data analysis, text analytics extracts new value from open-ended survey questions, which in the past have been used to improve survey questions or collect customer anecdotes.

Recently we worked with a financial lending company that conducted a monthly survey to help find insights in their data. The survey included two key open-ended questions:

  • What can we do better?
  • What were the primary reasons that you were dissatisfied?

One of the company’s staff members reviewed these text responses each month for frequently occurring words or phrases, which was a time-consuming and inconsistent process. To solve these problems, we implemented a text analytics solution, which proceeds in four key phases:

[Graphic: Introduction to Text Analytics Part 2]

To aid the text analytics process, we used a specialized software product, IBM SPSS Text Analytics for Surveys (STAfS). This allowed us to automatically extract words and phrases from the fields and convert them into categories. In turn, these categories became a new set of flag fields for analysis.

First, we began the "text tokenization" process, which occurs as part of the data cleaning step. During the initial extraction for both fields, several words were extracted separately even though they shared the same meaning. Despite the spell-checking process built into the STAfS software, some spelling variations were not recognized as the same word. Equivalent sentiments could be expressed in alternative ways, such as “satisfied” and “not dissatisfied”. Different inflections of a word, such as “sell”, “selling”, and “sold”, were extracted separately.
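The kind of cleanup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how STAfS works internally: the `CANONICAL` table and the `normalize` function are hypothetical names, and the misspelling "intrest" is an invented example.

```python
import re

# Hypothetical lookup table: each variant maps to one canonical term.
CANONICAL = {
    "selling": "sell",                # inflection
    "sold": "sell",                   # inflection
    "intrest": "interest",            # invented spelling variant
    "not dissatisfied": "satisfied",  # equivalent sentiment
}

def normalize(text):
    """Lowercase, tokenize, and map known variants to canonical terms."""
    text = text.lower()
    # Replace multi-word variants first so they are not split into tokens.
    for variant, canonical in CANONICAL.items():
        if " " in variant:
            text = text.replace(variant, canonical)
    tokens = re.findall(r"[a-z']+", text)
    return [CANONICAL.get(tok, tok) for tok in tokens]

print(normalize("Not dissatisfied, but they sold my loan"))
```

With a table like this, “sold” and “selling” are both counted under “sell” rather than as separate terms.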

Another common problem in the initial extraction is that one or two words of particular interest may be extracted as part of a longer phrase. For example, “interest rate” may be extracted as “interest rate is too high” or “unhappy with interest rate”. It is typically best practice to extract single words or short phrases, which are better suited for building categories in the next phase of the text analytics process.
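As a rough sketch of isolating a short phrase from a longer one, the snippet below checks each extracted phrase against a list of short target phrases. The `TARGET_PHRASES` list and the `isolate_phrases` function are assumptions made for illustration and are not part of any particular product.

```python
import re

# Hypothetical short phrases that should be isolated from longer ones.
TARGET_PHRASES = ["interest rate", "help desk", "fees"]

def isolate_phrases(extracted):
    """Return the target phrases found inside a longer extracted phrase."""
    lowered = extracted.lower()
    found = []
    for phrase in TARGET_PHRASES:
        # Word boundaries keep "fees" from matching inside another word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.append(phrase)
    return found

print(isolate_phrases("interest rate is too high"))
print(isolate_phrases("unhappy with interest rate"))
```

Both longer phrases reduce to the same short phrase, “interest rate”, which is easier to use as a building block for categories.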

Some words or phrases, such as a company’s name or acronyms, may fail to be extracted at all. In those cases, any word, phrase, or group of letters can be explicitly forced into the extraction. Words and phrases can also be automatically excluded from it.

Text analytics software enables the user to improve the extraction process by identifying equivalent words or phrases, isolating words or phrases from longer phrases, and forcing or suppressing words or phrases.  In the STAfS software, these changes are saved as a “library”.  Libraries can be customized for a single field or for a survey, and are transportable across surveys, projects and computers.  Improvements to the extraction process can be maddeningly small, but the ability to save these improvements in a library ensures that the extraction process is always getting better.
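To make the idea of a reusable library concrete, here is a minimal sketch that stores the three kinds of adjustments (equivalent terms, forced terms, and suppressed terms) as plain data that can be saved to a file and loaded elsewhere. The structure, the function names, and the toy rule that drops very short tokens by default are all assumptions for illustration. STAfS uses its own library format, which is not shown here.

```python
import json

# Hypothetical stand-in for an extraction "library".
library = {
    "synonyms": {"sold": "sell", "selling": "sell"},
    "force": ["apr"],       # always extract, e.g. an acronym
    "exclude": ["thanks"],  # always suppress
}

def apply_library(tokens, lib, min_length=4):
    """Apply synonyms, keep forced terms, drop excluded or very short ones."""
    kept = []
    for tok in tokens:
        tok = lib["synonyms"].get(tok, tok)
        if tok in lib["force"]:
            kept.append(tok)  # kept even though the toy rule would drop it
        elif tok in lib["exclude"] or len(tok) < min_length:
            continue          # suppressed, or dropped by the toy default rule
        else:
            kept.append(tok)
    return kept

def save_library(lib, path):
    """Write the library to a JSON file so it can be reused elsewhere."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(lib, f, indent=2)

def load_library(path):
    """Read a previously saved library back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

print(apply_library(["the", "apr", "sold", "thanks", "loan"], library))
```

Because the library is just data, each small improvement made for one survey carries over to the next one that loads the same file.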

An initial review of the responses to the two questions cited above revealed that the responses to one question could be the mirror image of the other.  For instance, a recommendation for improvement such as “lower interest rates” could also be a complaint such as “interest rates are too high”.  As a result, only one library was necessary for the two questions.

The final phase of the process is the creation of categories from the “building blocks” of words and phrases extracted.  The following categories were created for each question:

What can we do better?

  • Help Desk
  • Lower Interest
  • Generally Positive Comment (e.g. “great job!”)

What were the primary reasons that you were dissatisfied?

  • Fees
  • High Interest
  • Not Dissatisfied
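A simple way to picture how categories turn into flag fields is shown below for the first question. The `CATEGORY_TERMS` rules and the `flag_response` function are hypothetical. The actual categories in this project were built from the extracted terms in STAfS, and the rules behind them are not reproduced here.

```python
# Hypothetical rules: a response is flagged for a category if it
# contains any of that category's terms.
CATEGORY_TERMS = {
    "help_desk": ["help desk"],
    "lower_interest": ["interest rate", "interest"],
    "generally_positive": ["great job"],
}

def flag_response(text):
    """Return one 0/1 flag per category for a single text response."""
    lowered = text.lower()
    return {
        category: int(any(term in lowered for term in terms))
        for category, terms in CATEGORY_TERMS.items()
    }

print(flag_response("Lower the interest rate and fix the help desk"))
```

Each key in the returned dictionary corresponds to one new column in the survey data, with 1 meaning the response falls into that category.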

These categories can be used for further analysis alongside other questions asked in the survey. Each of these categories became a new flag field in the company’s survey data. This company’s survey also asked, “How likely is it that you would recommend the company to a friend or colleague?” This question is commonly used to calculate the Net Promoter Score (NPS) for a company. NPS "measures customer experience and predicts business growth" by comparing the number of "promoters" (those who rate a company highly) to "detractors" (those who rate the company poorly).
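For reference, NPS is conventionally computed as the percentage of promoters minus the percentage of detractors. The sketch below assumes the common 0 to 10 rating scale, with 9 and 10 counted as promoters and 0 through 6 as detractors. The article does not state which scale this survey used, so treat those cutoffs as an assumption.

```python
def net_promoter_score(ratings):
    """NPS = % promoters (9-10) minus % detractors (0-6) on a 0-10 scale."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)

# Made-up ratings for illustration only.
print(net_promoter_score([10, 10, 9, 8]))
```

The score ranges from -100 (all detractors) to 100 (all promoters). Ratings of 7 and 8 count toward the total but toward neither group.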

NPS can be calculated for different subgroups of a company's customers in order to better understand different customer experiences. As expected, the NPS was lower for customers who made a recommendation for improvement or expressed dissatisfaction. Among customers who recommended an improvement, the NPS was lowest for those who asked for a better help desk, which suggests that efforts to communicate better with customers will have the most impact on their NPS. Among customers who gave a reason for dissatisfaction, the NPS was lowest for those who complained about fees, which suggests that lowering fees would increase the NPS.
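The subgroup comparison can be sketched by splitting respondents on a flag field and scoring each group separately. The rows below are made up purely for illustration and are not the company's data. The field names `recommend` and `help_desk` are hypothetical, and the 0 to 10 scale with the usual promoter and detractor cutoffs is assumed.

```python
def nps(ratings):
    """% promoters (9-10) minus % detractors (0-6), assuming a 0-10 scale."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)

def nps_by_flag(rows, flag):
    """Return (NPS of flagged respondents, NPS of everyone else)."""
    flagged = [row["recommend"] for row in rows if row[flag] == 1]
    others = [row["recommend"] for row in rows if row[flag] == 0]
    return (
        nps(flagged) if flagged else None,
        nps(others) if others else None,
    )

# Made-up rows for illustration only.
rows = [
    {"recommend": 10, "help_desk": 0},
    {"recommend": 9, "help_desk": 0},
    {"recommend": 4, "help_desk": 1},
    {"recommend": 8, "help_desk": 1},
]

print(nps_by_flag(rows, "help_desk"))
```

Running the same comparison for every flag field gives one pair of scores per category, which is what makes it possible to see which category is associated with the lowest NPS.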

In this case, the analysis of NPS fell in line with our expectations. Further analysis over time would allow the company to see trends and gain a better understanding of how different policy decisions directly lead to changes in NPS, ultimately providing customers with a better experience. 

Author: Ryan Harrington