by Ryan Harrington & Dr. Steve Poulin

Your company is deeply committed to better serving its customers. Because of this, you and your team built out a survey. This lets you quickly get customer feedback and respond accordingly. While building out the survey, you decided that one of the best ways to get customer feedback would be to use an open-ended question. Your questions probably look something like one of these:

  • What do you like most about our new product?
  • What changes would most improve our new product?
  • If you do not like our new product, why not?

SurveyMonkey outlines some benefits of using open-ended questions:

  • Letting respondents answer in their own words can be empowering
  • Your respondents will usually surprise you - sometimes giving you information that you didn’t even know you needed

This all sounds great. The problem? Open-ended questions have become a burden for you and your company to use, and you don’t feel like you’re getting the full benefit from them. As valuable as the information from open-ended questions can be, at least two issues keep you from using them to their fullest extent:

It is time consuming to analyze the results.
Oftentimes, a team member will need to dedicate hours to reading and analyzing the text of a survey question. Depending on the number of responses to the survey, this could be a serious drain on your company’s resources.

The results are not consistently reliable.
Humans are not perfectly consistent. When we perform a task such as analyzing text multiple times, we will often have a different takeaway between attempts. This means that when we analyze open-ended survey responses, perhaps on a month-to-month basis, we’ll get slightly different results each time. That makes it difficult to get an accurate picture of customer responses over time.

To make full use of open-ended responses, these two issues must be solved. Employing a text analytics solution allows your company to quickly and reliably analyze unstructured text. Let’s take a look at what that looks like.

The text analytics process converts text data into categories, which are represented either as numeric codes within a single field or as a series of numeric flag fields, one for each category. Text is a type of “unstructured” data, and the categories the process produces represent the patterns found in that data. Typically, the regular expression (regex) language underlies the pattern recognition in the text analytics process.
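To make the idea of flag fields concrete, here is a minimal sketch in Python using its built-in regex module. The category names and patterns are hypothetical, chosen only for illustration; they are not the rules any particular tool ships with.

```python
import re

# Hypothetical categories and patterns, invented for this illustration.
CATEGORY_PATTERNS = {
    "price": re.compile(r"\b(price|pricing|expensive|cheap|cost)\w*\b", re.IGNORECASE),
    "support": re.compile(r"\b(support|customer service)\b", re.IGNORECASE),
    "usability": re.compile(r"\b(easy|hard|confusing|intuitive)\b", re.IGNORECASE),
}

def flag_categories(response: str) -> dict:
    """Return one 0/1 flag field per category for a single response."""
    return {
        name: int(bool(pattern.search(response)))
        for name, pattern in CATEGORY_PATTERNS.items()
    }

responses = [
    "Too expensive, and support never answered.",
    "Very easy to set up!",
]
flags = [flag_categories(r) for r in responses]
```

Each response becomes a row of numeric fields that can be counted or charted like any other structured data.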

Just like any other analytics problem, text analytics must begin with a clear set of objectives to achieve appropriate results. For example, the objective might be to identify:

  • Customer complaints or recommendations
  • Positive, negative, and neutral sentiments expressed
  • Skills and experience found in resumes
  • Companies or products mentioned for advertising purposes
  • Benefits requested by employees

When thinking about how to apply text analytics to a survey question, any of the above objectives might be appropriate. Whatever the objective, it should be deeply rooted in a business need for the organization.

Specialized tools help to make the text analytics process possible. There are a variety of these tools on the market, some more custom than others. At CompassRed, we typically use the IBM SPSS statistical program, which has a long history of text analytics to create numeric data for statistical analysis. Each tool will have a slightly different process, though they are all rooted in the same concepts.

Once an objective is established, the process of “text tokenization” can begin. This process includes “normalizing” the text by correcting errors such as commas in the middle of words, misspellings, and periods not followed by a space, and by handling accents on words (like resume as opposed to résumé). Using a base dictionary, words are also systematically identified as parts of speech, such as nouns, verbs, pronouns, adjectives, and adverbs.
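A few of these normalization steps can be sketched with Python's standard library. This is a simplified illustration, not what SPSS or any other tool does internally; the function names are our own, and spelling correction and part-of-speech tagging are left out because they need a dictionary.

```python
import re
import unicodedata

def normalize(text: str) -> str:
    # Strip accents: decompose characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove a stray comma sitting between two letters ("prod,uct").
    # Simplification: this would also join "red,green".
    text = re.sub(r"(?<=[A-Za-z]),(?=[A-Za-z])", "", text)
    # Add the missing space after a sentence-ending period ("great.But").
    text = re.sub(r"(?<=[a-z])\.(?=[A-Z])", ". ", text)
    return text

def tokenize(text: str) -> list:
    # Lowercase word tokens; apostrophes inside a word are kept.
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", normalize(text).lower())

cleaned = normalize("My résumé is great.But the prod,uct is slow.")
tokens = tokenize("My résumé is great.But the prod,uct is slow.")
```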

Once words have been identified and punctuation errors are corrected, the extraction process becomes more probabilistic. Words that are commonly part of a phrase, such as “sports car” or “birds of a feather,” are extracted together rather than as individual words. The following variations in compound words and groups of words are recognized as being equivalent:

  • Words with different separators, such as stress free, stressfree, and stress-free
  • Words in alternative orders, such as officials of the companies and company officials
  • Words with different inflections, such as words that differ by tense (analyze, analyzed) or are either singular or plural (word, words)
  • Phrases with optional elements (SPSS; SPSS, Inc.; IBM SPSS)
  • Spelling variants (similar to spell check, although most text analytic programs are not as robust as word processing programs)
  • Geographic variants (color, colour)
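One way to picture this equivalence is a function that maps every variant to a single canonical form. The sketch below is a toy version in Python: the lookup tables are hypothetical and tiny, whereas a real tool relies on much larger dictionaries. It does not handle alternative word orders or misspellings.

```python
import re

# Hypothetical lookup tables, invented for this illustration.
GEOGRAPHIC_VARIANTS = {"colour": "color"}
INFLECTIONS = {"analyzed": "analyze", "words": "word"}
OPTIONAL_ELEMENTS = {"ibm", "inc"}

def canonical(term: str) -> str:
    # Lowercase, then split on spaces, hyphens, commas, and periods.
    words = [w for w in re.split(r"[\s\-.,]+", term.lower()) if w]
    # Drop optional elements such as "IBM" or "Inc."
    words = [w for w in words if w not in OPTIONAL_ELEMENTS]
    # Map geographic variants and inflections to a base form.
    words = [GEOGRAPHIC_VARIANTS.get(w, w) for w in words]
    words = [INFLECTIONS.get(w, w) for w in words]
    # Join without a separator so "stress free" equals "stressfree".
    return "".join(words)

stress_forms = {canonical(t) for t in ["stress free", "stressfree", "stress-free"]}
spss_forms = {canonical(t) for t in ["SPSS", "SPSS, Inc.", "IBM SPSS"]}
```

Both sets end up with a single element, which means all three spellings in each group are treated as the same term.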

Based on the rules of the extraction process, a list of words and phrases is produced. At this point, the analyst must review the list to ensure that it is appropriate for the objectives of the analysis. This may include identifying words, phrases, or acronyms that are unique to an organization. The improvements the analyst makes to the extraction process are by far the most time-consuming part of the process.

Once a satisfactory list of words and phrases is produced, these become the building blocks of categories, which in turn will become the final fields generated by the text analytics process.  The text analytics process is very labor intensive initially, but the use of text analytics software enables the analyst to save their improvements to the extraction process and the production of categories from the extraction results.  This means that as additional text is added, text analytics software can be used to automatically produce categories once the new text has been processed.
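The payoff of saving that work can be sketched in a few lines of Python. Here the category definitions are hypothetical, and JSON stands in for whatever format a given text analytics tool uses to store its rules.

```python
import json
import re

# Hypothetical categories: name -> terms the analyst approved.
category_terms = {
    "pricing": ["price", "expensive", "cost"],
    "support": ["support", "customer service"],
}

def categorize(response: str, rules: dict) -> list:
    """Return the names of every category whose terms appear in the response."""
    text = response.lower()
    return sorted(
        name
        for name, terms in rules.items()
        if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms)
    )

# Save the rules once (this string could be written to a file)...
saved = json.dumps(category_terms, indent=2)

# ...then reload and reuse them when next month's responses arrive.
rules = json.loads(saved)
new_responses = [
    "The cost is too high.",
    "Customer service was slow to reply.",
    "Love the new design!",
]
results = [categorize(r, rules) for r in new_responses]
```

Because the same saved rules are applied every time, two runs over the same text give the same categories, which addresses the consistency problem described earlier.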

Next week, we’ll cover a sample case that utilized text analytics to extract insights from a survey for a company. Their survey data was extremely powerful, but it was time consuming for the company to analyze the results in a meaningful way. The introduction of the text analytics process helped them to find insights and build meaningful business solutions.
