
MIT OpenCourseWare http://ocw.mit.edu

6.006 Introduction to Algorithms

Spring 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

Lecture 1

Introduction and Document Distance

6.006 Spring 2008

Lecture 1: Introduction and the Document Distance Problem

Course Overview

• Efficient procedures for solving problems on large inputs (Ex: entire works of Shakespeare, human genome, U.S. Highway map)

• Scalability

• Classic data structures and elementary algorithms (CLRS text)

• Real implementations in Python ⇔ Fun problem sets!

• β version of the class - feedback is welcome!

Prerequisites

• Familiarity with Python and Discrete Mathematics

Contents

The course is divided into 7 modules - each of which has a motivating problem and problem set (except for the last module). Modules and motivating problems are as described below:

1. Linked Data Structures: Document Distance (DD)
2. Hashing: DD, Genome Comparison
3. Sorting: Gas Simulation
4. Search: Rubik’s Cube 2 × 2 × 2
5. Shortest Paths: Caltech → MIT
6. Dynamic Programming: Stock Market
7. Numerics: √2

Document Distance Problem

Motivation

Given two documents, how similar are they?

• Identical - easy?

• Modified or related (Ex: DNA, Plagiarism, Authorship)


• Did Francis Bacon write Shakespeare’s plays?

To answer the above, we need to define practical metrics. Metrics are defined in terms of word frequencies.

Definitions

1. Word: Sequence of alphanumeric characters. For example, the phrase “6.006 is fun” has 4 words.

2. Word Frequencies: Word frequency D(w) of a given word w is the number of times it occurs in a document D. For example, the words and word frequencies for the above phrase are as below:

    Word:   6   the   is   006   easy   fun
    Count:  1   0     1    1     0      1

Words that do not occur in the phrase, such as “the” and “easy”, have frequency 0. In practice, while counting, it is convenient to choose some canonical ordering of words.

3. Distance...
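The first two definitions can be sketched in Python. This is a minimal sketch under stated assumptions, not the course's own code: the helper names `get_words`, `word_frequencies`, and `canonical_table` are hypothetical, "alphanumeric" is taken to mean ASCII letters and digits, and lowercasing words (so "Fun" and "fun" count as one word) is an assumption the definitions above do not state. A `collections.Counter` plays the role of D: it returns 0 for any word that never occurs, which matches the zero counts for “the” and “easy” in the table. Sorting the vocabulary is just one possible canonical ordering.

```python
import re
from collections import Counter


def get_words(text):
    """Split text into words: maximal runs of alphanumeric characters.

    Assumptions (not from the lecture): only ASCII letters and digits count
    as alphanumeric, and text is lowercased before splitting.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def word_frequencies(text):
    """Return a Counter mapping each word w to D(w), its number of occurrences.

    Looking up a word that never occurs returns 0 without adding it.
    """
    return Counter(get_words(text))


def canonical_table(freqs, vocabulary):
    """List (word, count) pairs for a fixed vocabulary in sorted order.

    Sorting is one hypothetical choice of canonical ordering; words absent
    from the document get count 0.
    """
    return [(w, freqs[w]) for w in sorted(vocabulary)]


if __name__ == "__main__":
    phrase = "6.006 is fun"
    print(get_words(phrase))  # ['6', '006', 'is', 'fun']
    D = word_frequencies(phrase)
    print(canonical_table(D, ["6", "the", "is", "006", "easy", "fun"]))
    # [('006', 1), ('6', 1), ('easy', 0), ('fun', 1), ('is', 1), ('the', 0)]
```

Because the period in “6.006” is not alphanumeric, the phrase splits into the four words “6”, “006”, “is”, and “fun”, agreeing with the example above.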
