Wednesday, December 10, 2014

On Buzzwords

If my time at an investment bank has taught me anything, it's that abbreviations and jargon are the biggest barriers to entry in the financial services industry. Why in the world would anyone say "real estate" instead of "space"? Hey Jane, can I put my books on your table? I don't have enough real estate in my drawer.

Now it seems the tech industry has taken a leaf out of good ol' Wall Street's book. Even Peter Thiel recently warned venture capitalists to be wary of buzzwords like 'big data' and 'cloud'.

So what are the buzzwords in tech today? "Disruptive", "Analytics", "Big Data", "hacking life/problems/laundry/literally anything", "Scalable", "the Uber for <x>"... and for the greedy ones among us: SO LO MO. Forget Jesus, apparently SO, LO and MO make up the holy trinity in today's startup land:


SO (Social)
+ LO (Local)
+ MO (Mobile)
= $$$$$$$

Sarcasm aside, Lifehacker has a more comprehensive list of buzzwords. And go to this website at your own risk for more. Another buzzword that I see emerging is Machine Learning. Now before you go on to claim that your product harnesses Machine Learning technology to solve <insert a generic problem here> on your website, it would be good to learn what exactly Machine Learning (ML) is.



Saturday, November 15, 2014

Watson! - Deducing IBM's Answer to Jeopardy

Watson, an IBM-built Question Answering computer, is best known for its extraordinary performance on the popular TV show Jeopardy. More recently, it has been in the news for its applications in financial services and healthcare. Watson was built as part of the DeepQA project, and an overview of the project was published in a 2010 paper. That paper was part of an AI course I took in Fall 2013. Below is a quick summary of how Watson solves Jeopardy questions, which I wrote (rather hastily) as part of an assignment.

Building Watson: An Overview of the DeepQA Project

Question Answering (QA) technology can help professionals make timely and sound decisions in areas such as compliance, legal, and healthcare. With this in mind, Ferrucci et al. started a project to build Watson, a QA computer system that would compete on Jeopardy. Jeopardy is a TV competition that poses natural language questions from a broad domain to its competitors. To answer a question and earn points, a competitor must be accurate, confident, and fast. To compete at human level, Watson would have to answer around 70% of the questions asked, in less than 3 seconds each, with 80% precision.

To meet this goal, Watson incorporates QA technologies such as parsing, question classification, and knowledge representation. Watson's decision to press or not press the buzzer depends on its confidence in the answer it has generated. Since most questions in Jeopardy are complex, different components of a question need to be evaluated separately; the confidences of the individual components are then combined to produce the answer's overall confidence. Confidence estimation is a central part of QA technology.
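To make this concrete, here is a toy Python sketch of combining per-component confidences into one answer confidence. The component names and weights are entirely made up for illustration; DeepQA's real combination model is learned from training data, not hand-set like this.

def combined_confidence(component_scores, weights):
    """Weighted average of per-component confidence scores in [0, 1]."""
    total_weight = sum(weights[name] for name in component_scores)
    return sum(score * weights[name]
               for name, score in component_scores.items()) / total_weight

# Hypothetical components of one question and their confidences.
scores = {"type_match": 0.9, "passage_support": 0.7, "puzzle_solver": 0.4}
weights = {"type_match": 2.0, "passage_support": 1.5, "puzzle_solver": 1.0}
print(combined_confidence(scores, weights))  # ~0.72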

An answer to a Jeopardy question might not be generated from the question as a whole. Subclues may come from two different parts of the question sentence, or one subclue may be hidden inside another. Similarly, puzzles in the question can be categorized for special processing.

For Watson, certain kinds of Jeopardy questions were excluded: questions with audio-visual components were left out as being outside the project's scope.

The domain: A Lexical Answer Type (LAT) is a word in the clue that indicates the type of the answer. In a sample of 20,000 questions, 2,500 distinct LATs were discovered, and about 12% of the questions had no distinct LAT at all. The 200 most frequent LATs covered less than 50% of the questions. Since figuring out the LAT of a question is a complex process, a system like Watson must have sound Natural Language Processing and Knowledge Representation and Reasoning technology.

The metrics: In playing Jeopardy, factors such as betting strategy and luck of the draw are important as well, but for building Watson the main factors investigated are speed, confidence, and correctness. Precision is the number of correct answers divided by the number of questions Watson chose to answer. Whether Watson chooses to answer a question depends on a confidence threshold: any confidence score above the threshold means Watson presses the buzzer to answer. This threshold controls the tradeoff between precision and percent answered. The data collected show that human winners usually achieve 85%-95% precision while answering 40%-50% of the questions.
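Here is a small illustration of that tradeoff with invented numbers: given (confidence, correct?) pairs for a batch of questions, raising the threshold raises precision but lowers the percentage answered.

def precision_and_coverage(results, threshold):
    """results: list of (confidence, was_correct) for every question."""
    attempted = [correct for conf, correct in results if conf >= threshold]
    if not attempted:
        return 0.0, 0.0
    precision = sum(attempted) / len(attempted)
    coverage = len(attempted) / len(results)
    return precision, coverage

# Made-up outcomes for six questions.
results = [(0.95, True), (0.90, True), (0.80, False),
           (0.60, True), (0.40, False), (0.30, False)]
for t in (0.3, 0.6, 0.9):
    p, c = precision_and_coverage(results, t)
    print(f"threshold={t}: precision={p:.2f}, answered={c:.0%}")
# threshold=0.3: precision=0.50, answered=100%
# threshold=0.6: precision=0.75, answered=67%
# threshold=0.9: precision=1.00, answered=33%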

Watson uses the DeepQA technology, which is a massively parallel, probabilistic, evidence-based architecture. More than 100 techniques are used for analyzing natural language, identifying sources, generating hypotheses, and finding and ranking evidence. The main principles implemented are: massive parallelism to consider multiple interpretations, the combination of many expert components over a broad domain, the contribution of every component of a question to its confidence estimate, and the integration of deep and shallow semantics.

Content Acquisition: This is the process of collecting the content used for answer generation; it combines a manual process with an automatic one. First, it is important to analyze example questions to find out what type of answer is required (manual) and, where possible, the category of the question, possibly characterized by a LAT (automatic). The sources of content are literary works, dictionaries, encyclopedias, etc. After question evaluation, content evaluation begins in four high-level steps: identify seed source documents from the web, extract text nuggets from those documents, score each nugget on how informative it is with respect to its source document, and merge the nuggets to give one result to be evaluated.
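As a rough sketch (not IBM's actual pipeline), the four steps might look like this, with made-up "documents" and a placeholder scoring rule standing in for large web and encyclopedia corpora:

def identify_seed_documents():
    # Step 1: seed documents (stand-ins for web/encyclopedia sources).
    return ["Mount Everest is the highest mountain above sea level. "
            "It lies in the Himalayas.",
            "Everest rises about 8,849 metres. Many climbers attempt it."]

def extract_nuggets(document):
    # Step 2: split the document into answer-sized text nuggets.
    return [s.strip() for s in document.split(".") if s.strip()]

def score_nugget(nugget, document):
    # Step 3: placeholder informativeness score relative to the source.
    return len(nugget) / len(document)

def acquire_content():
    # Step 4: merge all nuggets into one ranked result set.
    scored = []
    for doc in identify_seed_documents():
        for nugget in extract_nuggets(doc):
            scored.append((score_nugget(nugget, doc), nugget))
    return [nugget for _, nugget in sorted(scored, reverse=True)]

print(acquire_content())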

Question Analysis: Question analysis uses many techniques, mainly question classification, LAT detection, relation detection, and decomposition. Question classification identifies the type of a question from words or phrases, considering their syntactic, semantic, or rhetorical function; some example question types are puzzles, math questions, and definition questions. LAT detection determines whether the answer to a question can be an instance of a LAT; this can be done by replacing a component of the question with a candidate LAT and then gathering evidence for the candidate's correctness. Similarly, many Jeopardy questions contain relations, such as syntactic subject-verb-object predicates or semantic relationships, and Watson uses relation detection throughout its QA process, from LAT detection to answer scoring. Because of Jeopardy's large knowledge domain, detecting the most frequent relations helps in only about 25% of the questions.
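A toy sketch of question classification and LAT guessing follows; the regex patterns here are invented stand-ins for the full parsing Watson actually uses.

import re

def classify(clue):
    # Invented rule: arithmetic expressions -> math, otherwise factoid.
    if re.search(r"\d+\s*[-+*/]\s*\d+", clue):
        return "math"
    return "factoid"

def guess_lat(clue):
    # "This country ..." -> LAT "country"; a stand-in for real parsing.
    m = re.match(r"(?:this|these)\s+(\w+)", clue, re.IGNORECASE)
    return m.group(1).lower() if m else None

clue = "This country gave us the words 'sauna' and 'sisu'."
print(classify(clue), guess_lat(clue))  # factoid country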

Hypothesis generation: After question analysis, the QA system generates candidate answers by searching the system's sources and producing answer-sized snippets. This stage consists of two processes: primary search and candidate answer generation. The goal of primary search is to find as much answer-bearing content as possible. Many search techniques are used, mainly multiple text-search engines, document search as well as passage search, and knowledge-base search using SPARQL.
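A toy version of primary search and candidate generation might look like this; the three-sentence "corpus" and the capitalized-token heuristic are stand-ins for real search engines and named-entity extraction.

def primary_search(question, passages, k=2):
    # Rank passages by word overlap with the question; keep the top k.
    q_words = set(question.lower().split())
    ranked = sorted(passages,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def candidate_answers(passages):
    # Pull capitalized tokens out of the top passages as candidates.
    candidates = set()
    for p in passages:
        for token in p.split():
            word = token.strip(".,")
            if word and word[0].isupper():
                candidates.add(word)
    return candidates

passages = ["Helsinki is the capital of Finland.",
            "Finland borders Sweden, Norway and Russia.",
            "Oslo is the capital of Norway."]
hits = primary_search("capital of Finland", passages)
print(candidate_answers(hits))  # e.g. {'Helsinki', 'Finland', 'Oslo', 'Norway'}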

Soft filtering: After candidate generation, lightweight (less resource-intensive) algorithms prune the large candidate set down to a smaller one, usually capped at a threshold number of candidates. Candidates that are pruned skip the heavier evaluation and are carried straight to the final merging step.
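A minimal sketch of soft filtering; the cheap score here (string length) is a deliberately silly stand-in for a real lightweight scorer.

def soft_filter(candidates, cheap_score, limit=3):
    # Keep the top `limit` candidates by the cheap score.
    ranked = sorted(candidates, key=cheap_score, reverse=True)
    passed, pruned = ranked[:limit], ranked[limit:]
    return passed, pruned  # pruned candidates go straight to final merging

cands = ["Helsinki", "Finland", "Oslo", "Norway", "Sweden"]
passed, pruned = soft_filter(cands, cheap_score=len)
print(passed, pruned)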

Hypothesis and evidence scoring: After soft filtering, the surviving candidate answers go through a stronger evaluation process consisting of two main steps: evidence retrieval and scoring. Evidence retrieval is the process in which content that provides evidence for a candidate answer is gathered, using search techniques like passage search with a question component plus the candidate answer as the query. Scoring: after evidence gathering, each candidate answer is given confidence scores based on different factors. Watson itself employs more than 50 scoring components that range from formal probabilities to categorical features; these scorers use logical, geospatial, and relational reasoning. After these scores have been generated, they are merged for each candidate answer, and the merged scores are then ranked by a system that has previously been trained on questions whose answers are known.
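Finally, a sketch of score merging and ranking: a logistic-regression-style merge whose weights stand in for parameters learned on known-answer training questions. The scorer names and numbers are invented.

import math

def merged_confidence(evidence_scores, weights, bias=-1.0):
    # Logistic-regression-style merge of per-scorer outputs into [0, 1].
    z = bias + sum(weights[name] * score
                   for name, score in evidence_scores.items())
    return 1.0 / (1.0 + math.exp(-z))

weights = {"passage": 2.0, "type": 1.5, "popularity": 0.5}
candidates = {
    "Helsinki": {"passage": 0.8, "type": 0.9, "popularity": 0.6},
    "Oslo":     {"passage": 0.2, "type": 0.9, "popularity": 0.7},
}
ranked = sorted(candidates,
                key=lambda c: merged_confidence(candidates[c], weights),
                reverse=True)
print(ranked)  # ['Helsinki', 'Oslo']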






Wednesday, October 29, 2014

Trees - A Non Linear Data Structure

Trees are undirected (without direction) graphs with exactly one path between any two nodes. They are also non-linear data structures whose nodes have relationships: parent nodes, sibling nodes, and child nodes.

Trees are helpful in implementing search algorithms like Breadth-First and Depth-First Search. For a graph to be a tree, it must have no cycle, i.e. there shouldn't be more than one path between any two nodes. A graph G is a tree if it satisfies any of the following equivalent criteria:
  • G is connected (there is a path between every pair of nodes) and has no cycles.
  • G has no cycles, and a cycle is formed if any edge is added to G.
  • G is connected, and it is no longer connected if any edge is removed from G.
A tree G with n nodes has n - 1 edges.
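As a quick sanity check of these criteria, here is a minimal Python sketch (my own, not from any textbook) that tests whether a graph, given as an adjacency list, is a tree: connected with exactly n - 1 edges. The example graphs are made up.

def is_tree(adjacency):
    n = len(adjacency)
    # Each undirected edge appears twice in an adjacency list.
    edges = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    if edges != n - 1:
        return False
    # Traverse from an arbitrary node to confirm connectivity.
    start = next(iter(adjacency))
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for nb in adjacency[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append(nb)
    return len(seen) == n

tree = {1: [2, 3], 2: [1, 4, 5], 3: [1], 4: [2], 5: [2]}
cycle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(is_tree(tree), is_tree(cycle))  # True False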

An example of a graph (taken from here):


I can spot 3 cycles, made of nodes:
- 2, 3, 4, 5
- 1, 2, 5
- 1, 2, 3, 4, 5

Now, an example of a tree (taken from here):

In this tree, n (the number of nodes) = 7, and the number of edges = n - 1 = 7 - 1 = 6.


Thursday, September 11, 2014

BFS and DFS - The Simplest of Search Algorithms

Every algorithms and artificial intelligence class starts with two simple search algorithms: Breadth First Search (BFS) and Depth First Search (DFS). They are used on data structures like trees, which are basically networks/graphs of nodes that have values or states associated with them. What makes a tree a tree? Find out here.

BFS and DFS are two kinds of search algorithms in which you follow a certain path to go through each node until you find the goal node. They are uninformed searches, meaning you do not know whether one non-goal node is better than another non-goal node; consequently, you do not know whether the path you are following is better than any other path you could follow.

Now onto search strategies. Both strategies start by expanding the parent/start node s, i.e. recording its successor nodes.

Assume the following relationships:
- s is the starting node
- a and b are successor nodes of s
- x is a successor of a

In BFS, every direct successor of s is checked for the solution before moving on to those successors' own successors. So a and b are checked for the solution before x, and the expansion order looks like this: s → a → b → x.
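Here is a minimal BFS sketch in Python matching this example. The graph dictionary and the goal node are made up to mirror the s/a/b/x relationship above.

from collections import deque

def bfs(graph, start, goal):
    frontier = deque([start])
    visited = {start}
    while frontier:
        node = frontier.popleft()  # FIFO: shallow nodes first
        if node == goal:
            return node
        for successor in graph.get(node, []):
            if successor not in visited:
                visited.add(successor)
                frontier.append(successor)
    return None

graph = {"s": ["a", "b"], "a": ["x"]}
print(bfs(graph, "s", "x"))  # checks s, a, b, then x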




In DFS, after expanding s, its first direct successor is checked for the solution. If it's not the goal node, we move on to the successors of s's first direct successor. So a and x are checked for the solution before b, and the expansion order looks like this: s → a → x → b.
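And a DFS sketch of the same example, using a stack instead of a queue so the search commits to one branch before trying its siblings.

def dfs(graph, start, goal):
    frontier = [start]
    visited = set()
    while frontier:
        node = frontier.pop()  # LIFO: deepest node first
        if node == goal:
            return node
        visited.add(node)
        # Reverse so the first-listed successor is expanded first.
        for successor in reversed(graph.get(node, [])):
            if successor not in visited and successor not in frontier:
                frontier.append(successor)
    return None

graph = {"s": ["a", "b"], "a": ["x"]}
print(dfs(graph, "s", "b"))  # checks s, a, x, then b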





Performance-wise, one search strategy is not necessarily better than the other. Detailed comparisons here.