Laboratory #4
Data Analysis and Pattern Recognition

1. Pattern Recognition Toolbox

1.1 Introduction

The experiments in Lab 4 use a MATLAB program by Michael Heinz called the Pattern Recognition and Feature Extraction Toolbox. (The manual provides a systematic description of all of its features.) The purpose of this part of Lab 4 is to familiarize you with the program and to review pattern classification concepts. Be aware of the following facts about the program:

It is designed for 2-category classification.
It is designed for 2-dimensional (x-y) input data. In the n-by-2 data matrix (n data points), each column is a feature and each row is a different data point.
It reads data from files, not MATLAB variables. To use your own data, store it in a variable named data and save it to a file with the MATLAB command save <filename> data
It provides several tools, but you can use only one tool at a time.
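
As an illustration of the data format described above, here is a minimal sketch that builds a small two-feature data matrix and saves it. The file name mydata and the values themselves are hypothetical; only the variable name data and the one-row-per-point, one-column-per-feature layout come from the facts listed above.

```matlab
% Hypothetical example data: 20 points on a straight line.
x = linspace(0, 1, 20)';      % feature 1 (column vector)
y = 2*x + 0.5;                % feature 2 (column vector)

% The toolbox expects a variable named "data":
% each row is one data point, each column is one feature.
data = [x, y];

% Functional form of the command "save mydata data";
% this writes mydata.mat in the current directory.
save('mydata.mat', 'data');

% Read the file back to confirm what was stored.
S = load('mydata.mat');
disp(size(S.data));
```

The file name you type into the toolbox's popup would then be mydata, in the same way that temp is used in Section 1.3.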

1.2 Getting Started

0. Make a local copy of the cs436/Lab4 directory in your own file area.

1. Launch MATLAB.

2. Change to your directory (e.g., at the MATLAB prompt, type cd my-dir)

3. Type lab4

4. Look at Command-Bar Menus. You should see:

Graph -> Lab Data ----- Purpose: Plot 1-d data from 1 or 2 files

Analysis -> Envelope ----- Purpose: Extract waveform features (not for Lab 4)
Analysis -> Covariance ----- Purpose: Do statistical analysis
Analysis -> DFT ----- Purpose: Extract spectral features (not for Lab 4)

Cluster -> k-Means ----- Purpose: Apply k-means procedure
Cluster -> Nearest Neighbor ----- Purpose: Apply nearest-neighbor procedure

Link -> Real Time Links ----- Purpose: Acquire real-time data (not for Lab 4)
Exit -> Close HCI Lab ----- Purpose: The clean way to quit

1.3 Exploring the Graph tool

1. At the MATLAB prompt, generate and save a sine wave by typing

data = sin(0:0.1:300)';
save temp data;

2. Reselect the HCI Lab window

3. Choose Graph -> Lab Data
(A window should appear along with a new "Options" menu.)

4. Choose Options -> Plot Data -> Single File
A popup menu will ask for the file name; type temp and select "Continue"
A popup menu will show the number of data points and will ask for the range of points; just select "Continue"

5. Repeat 4, but this time set the range to 1 to 300

Record what you see.

6. Repeat 4, but this time choose Options -> Plot Data -> Two Files; type temp as the name of each file; use your choice for the data ranges -- but keep them in the legal range!

Record the data ranges you chose and what you see.

7. Choose Options -> Exit
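
If you want to check the Graph tool's output against plain MATLAB, the following sketch rebuilds the same sine wave and plots the same range of points at the command line. The variable names t, idx, numPoints and periodsShown are introduced here for illustration; the Graph tool itself may label or scale its axes differently.

```matlab
% Rebuild the sine wave saved in step 1, keeping the sample times as well.
t = (0:0.1:300)';             % sample times, step 0.1
data = sin(t);                % same values as sin(0:0.1:300)'

numPoints = numel(data);      % total number of samples in the file

% Points 1 to 300, the range used in step 5.
idx = 1:300;
periodsShown = (t(idx(end)) - t(idx(1))) / (2*pi);   % periods of the sine in that range

fprintf('%d points in file, %.2f periods in points 1 to 300\n', ...
        numPoints, periodsShown);

% Command-line version of the plot.
plot(idx, data(idx));
xlabel('point number');
ylabel('data');
```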

1.4 Exploring the Analysis -> Covariance tool

1. Reselect the HCI Lab window

2. Choose Analysis -> Covariance

3. Choose Options -> Covariance Example
(You should see a plot with red and blue data points, asterisks at the means; note the different scales for the two axes.)

4. If the box at the bottom right does NOT say "Click for Euclidean Distances", click on "Toggle Distance Type" at the top center so that it does.

5. Position the cursor at a point halfway between the means and click. (You should see roughly equal distances; press any key to dismiss the message.)

Record the two distances

6. Position the cursor at the top-most red point in Cluster 1 and click.

Record the two distances
How would a minimum-Euclidean-distance classifier classify this point?

7. On the upper-left pulldown (saying "Hide Contours"), choose "Euclidean Contours"; then select "Cluster 1". You should see ellipses.

Why aren't the contours circular?
(Reshape the window until they are roughly circular.)

8. On the upper-left pulldown (saying "Euclidean Contours"), choose "Hide Contours".

9. On the upper-right pulldown (saying "Hide Separator"), choose "Euclidean Separator" (You should see the decision boundary based on Euclidean distance.)

Estimate the percentage of points that are misclassified.

10. Repeat 9, but choose "Mahalanobis Separator"

Estimate the percentage of points that are misclassified.

11. Click on "Toggle Distance Type" to get Mahalanobis distances. By positioning the cursor and clicking:

Measure the Mahalanobis distances to a point halfway between the means
Measure the Mahalanobis distances to the uppermost point in Cluster 1
How would a minimum-Mahalanobis-distance classifier classify each point?

12. On the upper-left pulldown, choose "Mahalanobis Contours"; then select "Cluster 1".

Describe the difference between the Mahalanobis contours and the Euclidean contours.

13. Under "Options" choose "Exit"
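
The two distance types used in this section can also be computed directly. The sketch below uses two small synthetic clusters (made up for illustration, not the toolbox's example data) and assumes that the Mahalanobis distance to a cluster is measured with that cluster's own sample covariance matrix; the toolbox may estimate the covariance differently.

```matlab
% Synthetic clusters, assumed for illustration only.
X1 = [0 -4; 0 4; -1 0; 1 0];   % cluster 1: much wider spread in y than in x
X2 = [2  0; 4 0;  3 -1; 3 1];  % cluster 2: equal spread in x and y

m1 = mean(X1, 1);  m2 = mean(X2, 1);   % cluster means (1-by-2 row vectors)
C1 = cov(X1);      C2 = cov(X2);       % 2-by-2 sample covariance matrices

p = [2 3];                             % point to classify

% Euclidean distance from p to each mean.
dE = [norm(p - m1), norm(p - m2)];

% Mahalanobis distance: sqrt((p - m) * inv(C) * (p - m)').
% The expression (p - m) / C computes (p - m) * inv(C) without forming inv(C).
dM = [sqrt((p - m1) / C1 * (p - m1)'), ...
      sqrt((p - m2) / C2 * (p - m2)')];

% Minimum-distance classification under each distance type.
[~, classE] = min(dE);
[~, classM] = min(dM);

fprintf('Euclidean:   d1 = %.3f, d2 = %.3f, class %d\n', dE(1), dE(2), classE);
fprintf('Mahalanobis: d1 = %.3f, d2 = %.3f, class %d\n', dM(1), dM(2), classM);
```

On this synthetic data the two rules disagree about the point p, which is the kind of case steps 6 and 11 ask you to look for.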

1.5 Exploring the Cluster -> K-means tool

1. Choose Cluster -> K-means

2. Choose Options -> K-means Example 2

Describe how you would divide the data points into two groups

3. Using the popup menu, select 2 clusters and "Continue"; watch as the K-means procedure clusters the data

Note whether or not the K-means procedure produced a clustering similar to yours; describe any major differences

4. Repeat 2 and 3 using 4 clusters

Would you call the resulting clustering "reasonable"? Comment.

5. Choose Options -> K-means Example 1

Describe how you would divide the data points into two groups

6. Select 2 clusters and "Continue"; watch as the K-means procedure clusters the data

Note whether or not the K-means procedure produced a clustering similar to yours; describe any major differences

7. Repeat 5 and 6 using 4 clusters

Would you call the resulting clustering "reasonable"? Comment.

8. Choose Options -> Exit

9. Choose Cluster -> Nearest Neighbor

10. Choose Options -> Nearest Neighbors Example 1

11. Using the popup menu, select t=1.0 and "Continue"; watch as the Nearest-Neighbor procedure clusters the data

Note whether or not the Nearest-Neighbor procedure produced a clustering similar to yours; describe any major differences

12. Repeat 10 and 11 using t=0.6

Would you call the resulting clustering "reasonable"? Comment.

13. Choose Options -> Nearest Neighbors Example 2

14. Try all 4 values of t

Were any of the clusterings "reasonable"?
In general terms, describe why Nearest Neighbor is better for Example 1 and K-means is better for Example 2

15. Choose Options -> Exit, and then choose Exit -> Close HCI Lab.
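
To make the two clustering procedures concrete, here is a minimal sketch of each on a tiny synthetic data set. Both parts rest on assumptions: the k-means part is the standard assign-then-update iteration with the first k points as initial centers, and the nearest-neighbor part assumes a sequential rule in which a point joins the cluster of its nearest earlier point if that distance is at most the threshold t, and otherwise starts a new cluster. The toolbox's own procedures, initialization and meaning of t may differ; see the manual.

```matlab
% Toy data (assumed): two tight groups of three points each.
X = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];
n = size(X, 1);

% ---- k-means sketch ----
k = 2;
centers = X(1:k, :);                  % initial centers: first k points (arbitrary choice)
kmLabels = zeros(n, 1);
for iter = 1:100
    % Squared distance from every point to every center (n-by-k).
    D = zeros(n, k);
    for j = 1:k
        diffs = X - repmat(centers(j, :), n, 1);
        D(:, j) = sum(diffs.^2, 2);
    end
    [~, newLabels] = min(D, [], 2);   % assign each point to its nearest center
    if isequal(newLabels, kmLabels)
        break;                        % assignments stopped changing
    end
    kmLabels = newLabels;
    for j = 1:k
        % Move each center to the mean of its points.
        % (A cluster with no points would give NaN; that does not occur on this toy data.)
        centers(j, :) = mean(X(kmLabels == j, :), 1);
    end
end

% ---- threshold nearest-neighbor sketch ----
t = 1.5;                              % distance threshold (hypothetical value)
nnLabels = zeros(n, 1);
nClusters = 0;
for i = 1:n
    dmin = Inf;
    if i > 1
        % Distance from point i to every earlier point.
        diffs = X(1:i-1, :) - repmat(X(i, :), i-1, 1);
        [dmin, nearest] = min(sqrt(sum(diffs.^2, 2)));
    end
    if dmin <= t
        nnLabels(i) = nnLabels(nearest);   % join the nearest point's cluster
    else
        nClusters = nClusters + 1;         % too far from everything: new cluster
        nnLabels(i) = nClusters;
    end
end

disp([kmLabels, nnLabels]);
```

On this toy data both sketches put the first three points in one cluster and the last three in another. Lowering t to 0.5 in the second part makes every point its own cluster, since no two points are that close; this shows how strongly the threshold controls the result.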
