Artificial intelligence (AI) is one of the most-discussed technology topics among consumers and enterprises alike. While diverse AI experiments are underway, the area that has shown the most maturity is conversational AI. With the widespread adoption of AI-powered smart devices (e.g. Alexa and Google Home) and virtual assistants (e.g. Siri, Cortana and Google Assistant), more and more organizations are moving toward conversation-driven interfaces to better engage their customers and drive efficiency through automation. The simplest form of conversational AI is a chatbot. But what exactly is a chatbot, and how is one built?

 

Chatbots – The Simplest Form of Conversational AI

Chatbots have come a long way from their humble origins at an MIT lab in 1966. At that time, “Eliza” was a way to demonstrate the superficiality of human-to-machine communication by matching user prompts with simple scripted responses. Today, chatbots have grown into a full-blown industry, with constant innovation narrowing the human-to-machine communication gap. Far from being limited to simple scripted responses, bot engines can now be trained with natural language processing (NLP) to understand different types of dialog and respond in an almost human-like way.

 

Some chatbot deployments are still simple knowledge-based conversational agents that match a query to a predefined set of answers. Others are much more advanced: they can learn to understand a user’s need, context and mood, and respond accordingly using personalization and sentiment analysis. Yet regardless of their complexity, chatbot solutions share the same key building blocks and core technologies.
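The simplest kind of agent described above, one that matches a query to a predefined set of answers, can be sketched in a few lines. This is a minimal illustration only: the FAQ entries, the keyword-overlap scoring and the `answer` function are assumptions made for the example, not a description of any particular product.

```python
import re

# Hypothetical FAQ table: each entry pairs trigger keywords with a canned answer.
FAQ = [
    ({"reset", "password"}, "You can reset your password from the login page."),
    ({"office", "hours"}, "Our office is open 9am to 5pm, Monday to Friday."),
]
FALLBACK = "Sorry, I don't know that one yet."


def answer(query: str) -> str:
    """Return the canned answer whose keywords overlap the query the most."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    best_score, best_answer = 0, FALLBACK
    for keywords, reply in FAQ:
        score = len(keywords & words)  # number of shared keywords
        if score > best_score:
            best_score, best_answer = score, reply
    return best_answer
```

Calling `answer("How do I reset my password?")` returns the password entry, while a query with no matching keyword falls through to `FALLBACK`. A bot like this has no notion of context or mood, which is the gap the more advanced deployments try to close.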

 

Anatomy of a Chatbot

Delivering a chatbot experience that feels like an intelligent conversation requires three technology layers to work in harmony: the Bot Engine, Channels, and Enterprise Systems. The Bot Engine is the heart of the chatbot. It conducts conversations using information drawn from Enterprise Systems (existing data sources) and delivers them through the appropriate communication Channel(s).

 

 

A Bot Engine comprises four components: the AI Engine, the Conversation Builder, Domains, and Analytics & Reporting. Taken together, these components allow developers to design and implement conversations while helping the bot learn from past interactions.

 

The AI Engine is the brain of the overall solution. An intelligent chatbot leverages multiple AI technologies to understand, learn and respond to a user, chiefly three: natural language processing (NLP), machine learning (ML), and voice-to-text/text-to-voice conversion.

 

  • NLP enables the bot to understand requests that users make in their natural language. NLP can interpret the important elements of a sentence, identify aspects that might correspond to specific features in a data set, and return an answer.
  • ML is predominantly used to train the chatbot to respond in a certain way. Responses may be based on past user interactions and queries, or on defined interactions that improve over time using data captured from new user interactions. A training model consumes the refined data much as children learn to identify colors: first they are told about the colors, then the lesson is reinforced until they give the right answer. Throughout this process, whenever the chatbot encounters an ambiguous input, the correct response must be provided and reinforced to teach it the new information. To extend the analogy, a child who has learned only yellow and red will not know how to respond to a shade of orange. The child needs to be taught that another color, called orange, exists in order to give the correct answer in the future.
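The teach-then-reinforce loop in the color analogy can be illustrated with a toy classifier. This is a deliberately simple bag-of-words stand-in for a real ML model, and everything in it is an assumption for the sake of the example: the `IntentClassifier` class, the `min_score` threshold and the intent labels.

```python
import re
from collections import Counter, defaultdict


class IntentClassifier:
    """Toy bag-of-words classifier; a stand-in for a real ML model."""

    def __init__(self, min_score: float = 0.3):
        self.min_score = min_score               # below this, input is "ambiguous"
        self.word_counts = defaultdict(Counter)  # intent -> words seen in training

    @staticmethod
    def _tokens(text: str) -> list:
        return re.findall(r"[a-z]+", text.lower())

    def teach(self, text: str, intent: str) -> None:
        """Add one labeled example (the 'this color is called orange' step)."""
        self.word_counts[intent].update(self._tokens(text))

    def predict(self, text: str):
        """Return the best-matching intent, or None if nothing matches well."""
        tokens = self._tokens(text)
        if not tokens:
            return None
        best_intent, best_score = None, 0.0
        for intent, counts in self.word_counts.items():
            # Fraction of the input's words already seen for this intent.
            score = sum(1 for t in tokens if t in counts) / len(tokens)
            if score > best_score:
                best_intent, best_score = intent, score
        return best_intent if best_score >= self.min_score else None


clf = IntentClassifier()
clf.teach("block my access card", "blockCard")
clf.teach("I want to apply for sick leave", "applyLeave")

before = clf.predict("book a flight to Paris")      # unseen kind of request
clf.teach("book a flight to Paris", "bookTravel")   # human supplies the answer
after = clf.predict("book a flight to Paris")
```

Here `before` is `None` because the bot has only learned two intents, just as the child knows only yellow and red. Once the new example is taught, `after` becomes `"bookTravel"`.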

 

While NLP and ML do the heavy lifting behind the scenes, voice-to-text and text-to-voice engines are required mainly for voice-based channels, where users interact with the chatbot verbally. As the name suggests, the key role of this component is to convert voice input to text and pass the text to the NLP engine for analysis and response. The process is then reversed: the text response from the NLP engine is converted to voice so that the channel can speak it to the user.
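The voice round trip is easiest to see as a three-stage pipeline. The sketch below uses stub functions in place of real speech and NLP engines; the names `fake_speech_to_text`, `fake_nlp_engine`, `fake_text_to_speech` and `handle_voice_turn` are hypothetical, and the stubs simply treat the audio bytes as UTF-8 text.

```python
def fake_speech_to_text(audio: bytes) -> str:
    # Stub: pretends the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def fake_nlp_engine(text: str) -> str:
    # Stub: a real engine would detect the intent and build a response.
    return f"You said: {text}"


def fake_text_to_speech(text: str) -> bytes:
    # Stub: a real engine would synthesize audio.
    return text.encode("utf-8")


def handle_voice_turn(audio_in: bytes,
                      stt=fake_speech_to_text,
                      nlp=fake_nlp_engine,
                      tts=fake_text_to_speech) -> bytes:
    """One conversational turn: voice -> text -> NLP -> text -> voice."""
    text_in = stt(audio_in)    # voice-to-text
    text_out = nlp(text_in)    # analysis and response
    return tts(text_out)       # text-to-voice
```

Because each stage is passed in as a function, a text-only channel could call the NLP stage directly and skip the two conversion steps.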

 

The second core component of a Bot Engine, the Conversation Builder, is used to design relevant conversations and define the path a conversation takes depending on user input. One of its key subcomponents is Dialog Management, which defines how the user moves from one step in the conversation to the next, and what happens when the chatbot encounters something for which it has not been trained. The other key subcomponent is the Domain Modeler, which extracts the meaning and context of a user input and defines the supported interactions through Intents and Entities. Intents are the user intentions that a chatbot can support, whereas Entities are objects that add context to an intent. For example, if a user wants to block an access card or apply for sick leave, the intents would be something like blockCard or applyLeave, and the entities could be cardType and leaveType. The Intents and Entities are then used to build a domain dictionary that defines all supported interactions.
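A domain dictionary and a single dialog-management step might look like the following sketch. The intent and entity names (blockCard, applyLeave, cardType, leaveType) come from the example above. The trigger words, entity values, prompts and the `parse` and `next_step` functions are assumptions made for illustration, and real platforms use trained models rather than keyword lookups.

```python
import re

# Hypothetical domain dictionary: intent -> how to spot it and what it needs.
DOMAIN = {
    "blockCard": {
        "triggers": ["block"],
        "entity": "cardType",
        "values": ["access", "credit", "debit"],
        "prompt": "Which card would you like to block?",
    },
    "applyLeave": {
        "triggers": ["leave"],
        "entity": "leaveType",
        "values": ["sick", "annual"],
        "prompt": "What type of leave would you like to apply for?",
    },
}


def parse(utterance: str):
    """Domain Modeler step: return (intent, entities) for an utterance."""
    words = re.findall(r"[a-z]+", utterance.lower())
    for intent, spec in DOMAIN.items():
        if any(trigger in words for trigger in spec["triggers"]):
            value = next((v for v in spec["values"] if v in words), None)
            entities = {spec["entity"]: value} if value else {}
            return intent, entities
    return None, {}


def next_step(utterance: str) -> str:
    """Dialog Management step: decide what the bot says next."""
    intent, entities = parse(utterance)
    if intent is None:
        return "Sorry, I did not understand that."   # untrained input
    spec = DOMAIN[intent]
    if spec["entity"] not in entities:
        return spec["prompt"]                        # ask for the missing entity
    return f"OK, running {intent} with {entities}."
```

For instance, `parse("I want to apply for sick leave")` yields the applyLeave intent with leaveType set to "sick", whereas `next_step("Please block my card")` asks which card, because the cardType entity is missing.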

 

Domains, functional blocks that group related conversations, are the third core Bot Engine component. Depending on organizational requirements, there may be multiple domains. For employee self-service bots, domains could be HR, Payroll and Travel, whereas a customer-service commerce bot might have domains for product search/information, returns and tracking.

 

The final core Bot Engine component is Analytics & Reporting. This module analyzes the conversation between the chatbot and user and captures performance metrics such as how many times the bot provided a satisfactory response, how many times it had to pass the conversation to a live agent, or how many times the bot failed to respond. Based on these reports, the ML training model can be modified to enhance the chatbot’s performance and improve the quality and context of its responses. 
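The three metrics mentioned above can be tracked with a small counter. This is a minimal sketch; the outcome labels and the `ConversationStats` class are hypothetical names chosen for the example.

```python
from collections import Counter

# Hypothetical outcome labels for a finished bot interaction.
OUTCOMES = ("answered", "handed_over", "failed")


class ConversationStats:
    """Counts how each bot interaction ended."""

    def __init__(self):
        self.counts = Counter()

    def record(self, outcome: str) -> None:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        self.counts[outcome] += 1

    def rate(self, outcome: str) -> float:
        """Share of all recorded interactions that ended with this outcome."""
        total = sum(self.counts.values())
        return self.counts[outcome] / total if total else 0.0


stats = ConversationStats()
for outcome in ["answered", "answered", "answered", "handed_over"]:
    stats.record(outcome)
```

A rising `rate("handed_over")` or `rate("failed")` is the kind of signal that would prompt a review of the training data.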

 

The second of the three technologies that must work in harmony to deliver an intelligent conversation is a group called Channels. Channels are the interfaces where humans and machines communicate, the places where users submit queries and chatbots provide responses. Depending on the requirements, use cases and coverage it wants to provide, an organization may enable single or multiple channels for chatbot-driven conversations. Enterprises often deploy chatbots into one or more of four primary channels: Smart Devices, Virtual Assistants, Traditional Channels and Social Media.

 

  • Smart devices (Amazon Echo, Google Home, etc.) often allow users to interact through voice commands and hence meet the strictest definition of “conversational AI.” Most smart devices have a built-in voice-to-text/text-to-voice converter and an out-of-the-box bot engine, provided by the device vendor, for designing conversations. However, developers and organizations may use their own bot platforms or engines to implement custom chatbot applications that leverage some or all of the capabilities these devices provide (voice input, voice-to-text, bot engine, etc.).
  • Virtual assistants (e.g. Siri, Google Assistant, Cortana) are third-party applications that run on a user’s personal device. Like smart devices, virtual assistants provide key capabilities including audio input (voice commands), voice-to-text/text-to-voice conversion and chatbot engines, allowing developers and organizations to build their own conversations using a vendor-provided ecosystem of tools. Though powerful, they can act merely as another channel if integrated with a third-party or proprietary bot engine to gather data.
  • Traditional channels include organization-owned digital properties such as websites, mobile applications or IVR systems that can be integrated with a chatbot engine to engage visitors better. This involves developing a custom module within the existing application (website, mobile app or IVR) that leverages the complete capability of the Bot Engine.
  • Some social media sites provide a framework that allows developers to write their own chatbot applications that run within the social media environment. For example, an ecommerce company may allow users to check the status of their order through a Facebook chatbot. The application can either use the framework offered by the social media provider to build the complete bot, or it can use the platform only as a channel while relying on a core bot engine outside the social media platform.
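The idea that a channel is only an interface in front of a shared bot engine can be sketched as follows. The `bot_engine` stand-in, the two channel classes and the payload shapes are all assumptions for illustration; real channels each define their own message formats.

```python
def bot_engine(text: str) -> str:
    """Stand-in for the shared bot engine; a real one would run NLP here."""
    return f"You asked about: {text.strip()}"


class WebsiteChannel:
    """Text chat widget on an organization-owned website."""

    def deliver(self, reply: str) -> dict:
        return {"type": "text", "body": reply}


class VoiceChannel:
    """Voice-based channel such as an IVR line or a smart device."""

    def deliver(self, reply: str) -> dict:
        # A real channel would hand this string to a text-to-voice engine.
        return {"type": "speech", "say": reply}


def handle(channel, user_text: str) -> dict:
    """Run the same engine for any channel, then format for that channel."""
    return channel.deliver(bot_engine(user_text))
```

Adding a new channel then means writing one more class with a `deliver` method, while the conversation logic stays in one place.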

 

The third and final technology that must work in harmony with the others to deliver an intelligent conversation is Enterprise Systems. Enterprise Systems are the data sources or backend applications on which a chatbot relies to get the information requested by the user. For example, a chatbot used to apply for leave might access a backend SAP system, which contains all the information and workflows defined for leave applications. Depending on the use case and type of transaction, a chatbot might connect with one or multiple enterprise systems to provide the right response. The Bot Engine needs to integrate with different backend systems through APIs, and it should also integrate with a Live Agent System so that conversations can be handed over smoothly to human support staff. When it hands over a query that the chatbot couldn’t resolve itself, it must pass along the complete context of the conversation.
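Backend integration and live-agent handover can be sketched together. In this minimal example the backend connector is a plain function standing in for an API call, and the handover queue is an ordinary list. The names `Conversation`, `BACKENDS`, `apply_leave_backend` and `fulfil` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    user_id: str
    transcript: list = field(default_factory=list)  # (intent, entities) pairs


def apply_leave_backend(entities: dict) -> str:
    # Stand-in for an API call to an HR backend system.
    return f"Leave request filed: {entities['leaveType']}"


# Hypothetical registry mapping each supported intent to a backend connector.
BACKENDS = {"applyLeave": apply_leave_backend}


def fulfil(conv: Conversation, intent: str, entities: dict,
           handover_queue: list) -> str:
    """Call the matching backend, or hand over to a human with full context."""
    conv.transcript.append((intent, entities))
    connector = BACKENDS.get(intent)
    if connector is None:
        # No backend can serve this request: pass the whole conversation on.
        handover_queue.append(
            {"user_id": conv.user_id, "transcript": list(conv.transcript)}
        )
        return "Connecting you to a live agent."
    return connector(entities)
```

The handover record carries the whole transcript, so the live agent does not need to ask the user to repeat what has already been said.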

 

Bringing it All Together

We have come a long way since a 1966 MIT lab in our ability to create meaningful conversations using AI. Yet there is still a long way to go before chatbots can replace humans. Even designing a simple conversation requires many different technologies to work in tandem to ensure a delightful user experience. There are many different approaches and tools, each with varying capabilities, that can be used to create chatbots that meet enterprise and user requirements. Much depends on the training model and data used to train a chatbot, and the requisite course corrections in training methodologies require human intervention to ensure the right learning. The second part of this blog series will cover the key criteria organizations should focus on while choosing a chatbot platform.

Ashutosh Uniyal


Ashutosh Uniyal is a digital evangelist with over a decade of experience in helping customers adopt digital technologies and ways of working. He leads presales for Wipro’s Mobility and User Experience practice focused on mobility driven digital transformations.
