Every new product or feature release requires a comprehensive set of technical help responses, which ensures users can get their questions answered, troubleshoot problems, or even modify the application to suit their needs.
With a vast, global user base, some queries are asked far more often than others. Classifying technical queries and separating niche questions from common issues reveals the help users typically need, so it can be addressed as early as possible.
Challenges with Technical Help Queries
With every new product, there are different layers and variations of help content. Help content associated with the latest version of Microsoft Edge is a good example. Users might have ‘How to’ questions about certain features or settings, such as ‘how to launch a new tab through the keyboard’ or ‘how to adjust the firewall’ in the browser. They could also have ‘Troubleshooting’ queries when dealing with a crash or system error. Developers, for instance, may need to know how to switch off ‘natural metrics’ to render fonts properly.
Users could also need explicit help, downloadable documentation, or deeper information about certain features or aspects of Edge.
Edge and all other Microsoft products collect a constant stream of user queries and store them in logs. These anonymized data logs serve as training data for a machine learning algorithm that can eventually classify the data into groups, so that similar and popular queries are easy to answer.
Help content typically falls into the following categories: troubleshooting, how-to, information (what is), explicit help, configuration, and download intents.
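As an illustration, a minimal keyword-based tagger could assign these intent categories to incoming queries. The keyword lists below are invented for the sketch and are not the production taxonomy:

```python
# Minimal sketch of intent tagging for help queries.
# The category keywords below are illustrative, not the real model.
INTENT_KEYWORDS = {
    "troubleshooting": ["crash", "error", "not working", "fix"],
    "how to": ["how to", "how do i"],
    "information": ["what is", "what does"],
    "download": ["download", "installer"],
    "configuration": ["settings", "configure", "adjust", "disable", "enable"],
}

def tag_intent(query: str) -> str:
    """Return the first intent category whose keywords appear in the query."""
    q = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in q for k in keywords):
            return intent
    return "explicit help"  # fallback bucket

print(tag_intent("How to launch a new tab through the keyboard"))  # how to
print(tag_intent("Edge crash on startup"))                         # troubleshooting
```

In practice this lookup would be replaced by a learned classifier, but the category set is the same one listed above.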
To dissect and categorize the data effectively, a machine learning algorithm needs to be able to operate through multiple phases:
- Phase 1: Identify the product or feature concepts
- Phase 2: Classify as per product-related topics
- Phase 3: Classify based on hierarchy
- Phase 4: Cluster content with the same or similar solutions
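The four phases above can be sketched as a simple pipeline. The function bodies here are stubs with invented vocabularies; real implementations would be ML models:

```python
# Illustrative four-phase pipeline over raw help queries.
# Each phase is a stub standing in for a learned component.

def identify_concepts(query: str) -> list[str]:
    """Phase 1: extract product/feature concepts (stubbed as token match)."""
    known = {"edge", "tab", "firewall", "fonts"}
    return [t for t in query.lower().split() if t in known]

def classify_topic(concepts: list[str]) -> str:
    """Phase 2: map concepts to a product-related topic (stubbed)."""
    return "browser" if "edge" in concepts or "tab" in concepts else "general"

def classify_hierarchy(concepts: list[str]) -> str:
    """Phase 3: niche vs. generic, based on how many concepts matched."""
    return "generic" if len(concepts) >= 2 else "niche"

def cluster_key(query: str) -> tuple:
    """Phase 4: queries sharing the same key fall into the same cluster."""
    concepts = identify_concepts(query)
    return (classify_topic(concepts), classify_hierarchy(concepts), tuple(sorted(concepts)))

print(cluster_key("open new tab in Edge"))
```

Queries that produce the same key end up in the same cluster, which is the grouping the later phases prioritize and answer.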
An effective solution needs to differentiate between explicit and implicit intent in the logs. Search queries in Bing, for example, are explicit (How to disable this feature?), while logs in Windows 10 may be based on user actions, which are implicit.
The algorithm also needs to identify the right version and model of each product to ensure queries are relevant. Data is also scrubbed with Azure ML Studio’s Preprocess Text module. Finally, the algorithm must be able to measure and prioritize the resulting clusters so that the most relevant (depending on priority, severity, number of impressions per market, etc.) and useful answers are quickly matched to each type of query.
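One way to rank clusters on severity and impressions is a weighted score. The weights, the 1–5 severity scale, and the normalization cap below are assumptions made for the sketch:

```python
# Sketch of cluster prioritization by severity and impression count.
# Weights and the severity scale are illustrative assumptions.

def priority_score(severity: int, impressions: int, weight_sev: float = 0.7) -> float:
    """Blend a 1-5 severity rating with a normalized impression count."""
    sev_norm = severity / 5.0
    imp_norm = min(impressions / 100_000, 1.0)  # cap so huge counts don't dominate
    return weight_sev * sev_norm + (1 - weight_sev) * imp_norm

clusters = [
    ("browser crash on launch", 5, 80_000),   # (cluster label, severity, impressions)
    ("change default font", 2, 120_000),
]
ranked = sorted(clusters, key=lambda c: priority_score(c[1], c[2]), reverse=True)
print(ranked[0][0])  # the high-severity crash cluster ranks first
```

A production system would fold in per-market impression counts and other signals, but the principle of a blended score is the same.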
Unpacking and classifying complex data like this is a challenge. Here’s an effective solution: a probabilistic knowledge base built on the Probase architecture.
The Probase architecture is a prototype knowledge base created using data from search queries. Combining this architecture with a probabilistic model can help create relevant and actionable clusters from help content.
Initially, the web-domain click data for each specific product is used to identify the tech queries and potential concepts for that product. The newly identified concepts then help fetch more tech domains related to the product, and the process is iterated until there is good coverage of both concepts and web domains.
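The iteration between concepts and domains can be sketched as a fixed-point loop. The lookup tables below are toy stand-ins for the click data:

```python
# Sketch of the concept/domain expansion loop over toy click data.
# Both lookup tables are hypothetical examples, not real domains.
domain_to_concepts = {
    "support.example.com": {"tab", "firewall"},
    "forum.example.com": {"firewall", "fonts"},
    "docs.example.com": {"fonts"},
}
concept_to_domains = {
    "tab": {"support.example.com"},
    "firewall": {"support.example.com", "forum.example.com"},
    "fonts": {"forum.example.com", "docs.example.com"},
}

def expand(seed_domains: set) -> tuple:
    """Alternate domain->concept and concept->domain lookups until both sets stop growing."""
    domains, concepts = set(seed_domains), set()
    while True:
        new_concepts = set().union(*(domain_to_concepts.get(d, set()) for d in domains))
        new_domains = domains | set().union(*(concept_to_domains.get(c, set()) for c in new_concepts))
        if new_concepts == concepts and new_domains == domains:
            return domains, concepts
        domains, concepts = new_domains, new_concepts

doms, cons = expand({"support.example.com"})
print(sorted(cons))  # all three concepts are reached from one seed domain
```

Starting from a single seed domain, the loop pulls in related concepts, which pull in further domains, until coverage stabilizes.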
Probability of a product given a feature = (appearances of the feature with that product) / (total appearances of the feature). For example: P(Product A | Feature) = count(Feature with Product A) / count(Feature), and P(Product B | Feature) = count(Feature with Product B) / count(Feature).
For example, ‘device manager’ in Bing logs mostly refers to the Windows product (probability = 0.8), but also belongs to other products (probability = 0.2). However, the probability for the Windows product is higher when the query is fetched from USB logs.
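A minimal version of that computation, with co-occurrence counts invented so the ratios reproduce the 0.8 / 0.2 split from the example:

```python
from collections import Counter

# Toy co-occurrence counts of the feature "device manager" with products.
# The counts are made up so the ratios match the 0.8 / 0.2 example.
feature_product_counts = Counter({
    ("device manager", "Windows"): 80,
    ("device manager", "Other"): 20,
})

def product_probability(feature: str, product: str) -> float:
    """P(product | feature) = count(feature with product) / total count of feature."""
    total = sum(c for (f, _), c in feature_product_counts.items() if f == feature)
    return feature_product_counts[(feature, product)] / total

print(product_probability("device manager", "Windows"))  # 0.8
print(product_probability("device manager", "Other"))    # 0.2
```

Conditioning on the log source (Bing queries vs. device logs) would simply mean keeping a separate counter per source.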
The model works by asking, “what are the chances?” For example, based on the data, the model can estimate the probability that a query relates to something specific. It can identify particular products from a range and associate them with features, then associate the query with the user’s intent. In other words, the model asks itself: “What are the chances of this product having this feature? What are the chances that the user intends to disable or modify the feature?”
This helps classify queries by product, feature, and user intent. The model must then classify by topic (for example, ‘Windows Phone’ or ‘Skype calls’) and hierarchy (niche or generic).
In this way, the Probase model works on different levels and can effectively group technical queries by product, feature, intent, topic, and relevance. The concepts from the Probase model output, along with the clicked web domains, are fed to a decision tree algorithm as features for product classification; the model’s accuracy exceeds 95%. Bringing similar topics and solutions together makes it easier for users to troubleshoot errors and use the products more effectively.

By capturing all possible variations of a concept and building full concept hierarchies, the data can support classification and clustering at any level. Since the shortest possible concept is used, multiple queries can match a single concept, which reduces the training set substantially: with around 1K Windows concepts, millions of clusters could be matched.
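The shortest-concept idea can be illustrated as follows; the concept list and queries here are hypothetical:

```python
# Sketch: map queries onto their shortest matched concept so that many
# query variants collapse onto one cluster. Concepts here are invented.
concepts = ["tab", "new tab", "firewall", "natural metrics"]

def match_concept(query):
    """Return the shortest concept contained in the query, or None."""
    hits = [c for c in concepts if c in query.lower()]
    return min(hits, key=len) if hits else None

queries = [
    "how to open a new tab",
    "keyboard shortcut for tab",
    "tab not responding",
]
# All three variants map to the same concept, hence the same cluster.
print({q: match_concept(q) for q in queries})
```

Because ‘tab’ is shorter than ‘new tab’, every variant above lands on the same key, which is why a small concept inventory can cover a very large number of clusters.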
Creating clusters is a continuous process. Trending issues are monitored and collected daily, and final clusters are selected every week based on impression count and severity. Because the algorithm runs daily, speed and robustness are achieved by matching Bing queries against the already-learned Probase. Probase itself is updated every month from queries that matched no existing concepts.
With technological advancement, devices and new products are bound to become more complicated in the future. Supporting documentation and technical help content is likely to become unwieldy without proper classification tools.
Combining machine learning with product logs to understand trending queries and derive actionable insights is the solution. The knowledge base creates actionable clusters of common solutions based on the type of product, the intent of the user, and the topic of the query. This is the key to making technical assistance more intuitive, effective, and future-ready.