Plenary Lecture
Addressing Big Data Problems Using Natural Language Understanding: Applications in Intelligent Search, Question & Answer System, Business Intelligence and More
Professor Emdad Khan
InternetSpeech, USA
Southern University, USA
Imam University, Saudi Arabia
E-mail: emdad@ccis.imamu.edu.sa
Abstract: Solving the key problems of Big Data in a practical and effective way is becoming increasingly important as data volumes grow rapidly, already exceeding the exabyte range. The Internet is a classic example: its data growth over the last 15 years has been phenomenal. Other key sources of data growth include scanners, sensors, mobile phones, smart meters, social media platforms, credit cards, digital medical records, satellite imagery and the like. These sources generate both unstructured and structured data, which are increasingly integrated on the Internet and on intranets. As data grows, the nature of its usage is changing fast. For example, astronomy is changing from a field where taking pictures of the sky was a large part of an astronomer’s job to one where the pictures are already in a database and the astronomer’s task is to find interesting objects and phenomena in it. In the biological sciences, there is now a well-established tradition of depositing scientific data into public repositories and of creating public databases for use by other scientists. In the business world, Business Intelligence (BI) has already become an important field for extracting key business data from large volumes of data.
Big data poses multiple problems, including storage, search, transfer, sharing, analysis, processing, viewing, and deriving meaning (semantics). These problems are mainly due to the 4 Vs, i.e. Volume, Velocity, Variety and Variability. We propose a Semantic Engine (SE) and an associated Natural Language Understanding (NLU) based approach to address the key problems of big data. Our approach uses Brain-Like and Brain-Inspired algorithms, modeled on the way humans significantly compress data: they represent it with a few words or sentences using the semantics of the information, while preserving the meaning of the content.
While current approaches to Natural Language Understanding (NLU) produce good results in some specific domains, NLU in general remains a complex open problem. Its complexity is mainly related to semantics: abstraction, representation, real meaning, and computational complexity. We argue that while existing approaches work well for some specific problems, they do not seem to address key Natural Language problems in a practical and natural way. We propose a Semantic Engine using Brain-Like Approach (SEBLA) that uses Brain-Like algorithms to solve the key NLU problem (i.e. the semantic problem) as well as its sub-problems.
Humans apply hierarchical, multi-level compression to sentences, paragraphs and pages using semantics. Our approach in SEBLA addresses big data problems in a similar way. The first main theme in SEBLA is to treat each word as an object with all of its important features, most importantly its semantics. In human natural language communication, we understand the meaning of every word even when it stands alone, i.e. without any context. Sometimes a word has multiple meanings, which are resolved by the context of the sentence. The second main theme is to use the semantics of each word to develop the meaning of a sentence, as humans do in natural language understanding. Similarly, the semantics of sentences are used to derive the semantics, or meaning, of a paragraph. The third main theme in SEBLA is to use natural semantics as opposed to the existing “mechanical semantics” of predicate logic, ontologies or the like. This lecture will describe in detail how we are addressing Big Data problems for both unstructured and structured data (with more emphasis on unstructured data) using Semantics and Natural Language Understanding. As mentioned above, humans do a very good job of processing unstructured data, in particular by using semantics to compress it into a few words or sentences while keeping the core meaning. Thus, semantics and NLU help humans with processing, summarization, knowledge discovery and inference, deriving intelligence, and significantly compressing data. Semantics also plays a very important role in processing structured data and deriving knowledge and intelligence from it.
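The word-to-sentence-to-paragraph composition described above can be illustrated with a toy sketch. This is not SEBLA's actual algorithm, which the abstract does not specify; the lexicon, the feature sets, and the names `Sense`, `Word`, `resolve_sentence`, `sentence_meaning` and `paragraph_meaning` are all hypothetical, and simple feature overlap stands in for whatever semantic model SEBLA really uses. The sketch only shows the general shape of the idea: each word is an object carrying its own meanings, context selects among them, and meanings are aggregated upward into a compressed summary.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    gloss: str           # human-readable meaning of this sense
    features: frozenset  # hypothetical semantic features


@dataclass(frozen=True)
class Word:
    text: str
    senses: tuple        # one or more Sense objects


# Tiny hand-made lexicon (an assumption, for illustration only).
LEXICON = {
    "bank": Word("bank", (
        Sense("financial institution", frozenset({"money", "institution"})),
        Sense("river edge", frozenset({"water", "land"})),
    )),
    "loan": Word("loan", (Sense("borrowed money", frozenset({"money", "debt"})),)),
    "deposit": Word("deposit", (Sense("money paid in", frozenset({"money", "transfer"})),)),
    "river": Word("river", (Sense("stream of water", frozenset({"water", "flow"})),)),
}


def resolve_sentence(tokens):
    """Pick one sense per known word, using the other words as context."""
    words = [LEXICON[t] for t in tokens if t in LEXICON]
    resolved = []
    for word in words:
        # Context = every feature of every sense of the *other* words.
        context = set()
        for other in words:
            if other is not word:
                for sense in other.senses:
                    context |= sense.features
        # Largest overlap wins; on a tie max() keeps the first listed sense.
        best = max(word.senses, key=lambda s: len(s.features & context))
        resolved.append((word.text, best))
    return resolved


def sentence_meaning(tokens):
    """Sentence-level meaning: feature counts over the resolved senses."""
    counts = Counter()
    for _, sense in resolve_sentence(tokens):
        counts.update(sense.features)
    return counts


def paragraph_meaning(sentences, top_k=2):
    """Paragraph-level meaning: the top_k features across all sentences."""
    total = Counter()
    for sentence in sentences:
        total += sentence_meaning(sentence.lower().split())
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [feature for feature, _ in ranked[:top_k]]


if __name__ == "__main__":
    print(dict((t, s.gloss) for t, s in resolve_sentence("we sat on the river bank".split())))
    print(paragraph_meaning(["The bank approved the loan",
                             "She made a deposit at the bank"]))
```

In this sketch the ambiguous word "bank" is resolved differently depending on its neighbours, and a two-sentence paragraph is reduced to a couple of dominant features, a crude stand-in for the semantic compression the abstract describes.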
Our SEBLA and NLU based approach can be used in various applications, including Intelligent Information Retrieval, Intelligent Search, Question & Answer (Q&A) Systems, Summarization, and Business Intelligence. This lecture will briefly discuss how SEBLA addresses Big Data problems in such applications.
Brief Biography of the Speaker: Dr. Emdad Khan is the Founder of InternetSpeech. He founded the company in 1998 with the vision to develop innovative technology for accessing information on the Internet anytime, anywhere, using just an ordinary telephone and the human voice.
As a pioneer in the Internet voice space, Khan is a frequent speaker at academic and industry conferences and trade shows on Natural Language, Big Data, voice recognition, Internet applications, and bridging the Digital and Language Divides. He holds 23 patents and has published more than 50 journal and conference papers on the advent of voice technology on the Internet, content rendering, Natural Language Processing/Understanding, Big Data, neural nets, fuzzy logic, intelligent systems, VLSI and optics. Khan’s acute technical knowledge and keen understanding of emerging markets have played an important role in the development of InternetSpeech’s first product/service, netECHO, the only product available today that delivers complete Internet access using voice and any telephone.
During his career, Khan invented, defined, developed and deployed worldwide new intelligent software products for microcontroller-based home appliances. He has also created and deployed speech-recognition-based Internet applications. He has 20 years of experience with large semiconductor companies, including Intel and National.
Khan is active in research. His current major interest is using brain-like and brain-inspired algorithms to solve open problems, especially Natural Language Understanding (NLU) and Big Data. This work is closely aligned with InternetSpeech’s next-generation products and services, which aim to let users (especially people at the bottom of the pyramid) interact with the Internet in their natural language and thereby support their economic, social and other development.
He is the author of the book “Internet for Everyone: Reshaping the Global Economy by Bridging the Digital Divide”.
He holds a doctorate in computer science, master of science degrees in electrical engineering and engineering management, and a bachelor of science degree in electrical engineering.
Khan is currently on leave from InternetSpeech and is a faculty member in the Computer Science department of Imam University, Riyadh, Saudi Arabia. He is also a visiting Research Professor at Southern University in Baton Rouge, Louisiana, USA.