MIT OpenCourseWare http://ocw.mit.edu

6.006 Introduction to Algorithms
Spring 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

Lecture 1

Introduction and Document Distance

6.006 Spring 2008

Lecture 1: Introduction and the Document Distance Problem

Course Overview
• Efficient procedures for solving problems on large inputs (Ex: entire works of Shakespeare, human genome, U.S. Highway map)
• Scalability
• Classic data structures and elementary algorithms (CLRS text)
• Real implementations in Python ⇔ Fun problem sets!
• β version of the class - feedback is welcome!

Pre-requisites
• Familiarity with Python and Discrete Mathematics

Contents
The course is divided into 7 modules - each of which has a motivating problem and problem set (except for the last module). Modules and motivating problems are as described below:
1. Linked Data Structures: Document Distance (DD)
2. Hashing: DD, Genome Comparison
3. Sorting: Gas Simulation
4. Search: Rubik’s Cube 2 × 2 × 2
5. Shortest Paths: Caltech → MIT
6. Dynamic Programming: Stock Market
7. Numerics: √2

Document Distance Problem
Motivation
Given two documents, how similar are they?
• Identical - easy?
• Modified or related (Ex: DNA, Plagiarism, Authorship)


• Did Francis Bacon write Shakespeare’s plays?

To answer the above, we need to define practical metrics. Metrics are defined in terms of word frequencies.

Definitions
1. Word: Sequence of alphanumeric characters. For example, the phrase “6.006 is fun” has 4 words.
2. Word Frequencies: Word frequency D(w) of a given word w is the number of times it occurs in a document D. For example, the words and word frequencies for the above phrase are as below:

   Count : 1    0    1    1    0     1
   Word  : 6    the  is   006  easy  fun

   In practice, while counting, it is easy to choose some canonical ordering of words.
3. Distance...
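
The word and word-frequency definitions above can be sketched in Python, the language the course uses. This is a minimal illustration and not the course's own code: the function names `get_words` and `word_frequencies` are hypothetical, and treating "alphanumeric" as ASCII letters and digits is an assumption.

```python
import re
from collections import Counter

def get_words(text):
    """Return the words in text, where a word is a maximal run of
    alphanumeric characters (assumed here to mean ASCII letters and digits)."""
    return re.findall(r"[A-Za-z0-9]+", text)

def word_frequencies(text):
    """Map each word w to D(w), the number of times it occurs in text.
    A Counter reports 0 for words that never occur."""
    return Counter(get_words(text))

phrase = "6.006 is fun"
words = get_words(phrase)        # the "." splits "6.006" into "6" and "006"
freq = word_frequencies(phrase)

# Fix one canonical ordering of words, then read off the counts in that order.
vocabulary = ["6", "the", "is", "006", "easy", "fun"]
counts = [freq[w] for w in vocabulary]

print(words)   # ['6', '006', 'is', 'fun']
print(counts)  # [1, 0, 1, 1, 0, 1]
```

The list `counts` reproduces the Count row above: words that are absent from the phrase, such as "the" and "easy", get frequency 0.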
