Data Analysis and Pattern Recognition

The experiments in Lab 4 use a MATLAB program by Michael Heinz called the Pattern Recognition and Feature Extraction Toolbox. (The manual provides a systematic description of all of its features.) The purpose of this part of Lab 4 is to familiarize you with the program and to review pattern classification concepts. Be aware of the following facts about the program:

- It is designed for 2-category classification.

- It is designed for 2-dimensional (x-y) input data. In the d-by-2 data
matrix, each column is a feature and each row is a different data point.

- It reads data from files, not MATLAB variables; whenever you want
to create your own data, you must use data as the variable name
and save it to a file with the MATLAB command save <filename> data

- It provides several tools, but you can use only one tool at a time.
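As a minimal sketch of the data format described above, the following MATLAB commands build a 2-column matrix and save it under the required variable name. The file name mydata, the number of points, and the random features are illustrative assumptions, not part of the toolbox.

```matlab
% Hypothetical example: 100 data points with two features each.
n = 100;
x = randn(n, 1);              % feature 1 (goes in column 1)
y = 0.5 * x + randn(n, 1);    % feature 2 (goes in column 2)

data = [x, y];                % one row per data point; the variable must be named "data"
save mydata data              % writes mydata.mat containing only the variable "data"

% Optional check: read the file back and confirm its shape.
S = load('mydata');
disp(size(S.data));           % rows = data points, columns = features
```

The resulting file, mydata.mat, should then be usable wherever a tool prompts for a file name.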

0. Make a local copy of the cs436/Lab4 directory in your own file area.

1. Launch MATLAB.

2. Connect to your directory (e.g., at the MATLAB prompt, type cd my-dir).

3. Type lab4

4. Look at Command-Bar Menus. You should see:

Graph -> Lab Data ----- Purpose: Plot 1-d data from 1 or 2 files

Analysis -> Envelope ----- Purpose: Extract waveform features (not for Lab 4)

Analysis -> Covariance ----- Purpose: Do statistical analysis

Analysis -> DFT ----- Purpose: Extract spectral features (not for Lab 4)

Cluster -> k-Means ----- Purpose: Apply k-means procedure

Cluster -> Nearest Neighbor ----- Purpose: Apply nearest-neighbor procedure

Link -> Real Time Links ----- Purpose: Acquire real-time data (not for Lab 4)

Exit -> Close HCI Lab ----- Purpose: The clean way to quit

1. At MATLAB prompt, generate and save a sine wave by typing

data = sin(0:0.1:300)';

save temp data;

2. Reselect the HCI Lab window

3. Choose Graph -> Lab Data

(A window should appear along with a new "Options" menu.)

4. Choose Options -> Plot Data -> Single File

A popup menu will ask for the file name; type

A popup menu will show the number of data points and will ask for the range of points; just select "Continue"

5. Repeat 4, but this time make the range be 1 to 300

- Record what you see.

6. Repeat 4, but this time choose Options -> Plot Data -> Two Files; type

- Record the data ranges you chose and what you see.

7. Choose Options -> Exit
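To check your recorded observations outside the toolbox, the same sine wave can be plotted with plain MATLAB commands. This is only a sketch: it uses the standard plot function, not the toolbox's Graph tool, and the two-panel layout is an arbitrary choice.

```matlab
data = sin(0:0.1:300)';       % column vector, as typed at the prompt above
save temp data                % writes temp.mat

S = load('temp');             % read the data back from the file
figure;
subplot(2, 1, 1);
plot(S.data);                 % the full range of points
title('All points');
subplot(2, 1, 2);
plot(S.data(1:300));          % only points 1 to 300
title('Points 1 to 300');
xlabel('Sample index');
```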

1. Reselect the HCI Lab window

2. Choose Analysis -> Covariance

3. Choose Options -> Covariance Example

(You should see a plot with red and blue data points, asterisks at the means; note the different scales for the two axes.)

4. If the box at the bottom right does

5. Position the cursor at a point half-way between the means and click. (You should see the roughly equal distances; press any key to dismiss the message.)

- Record the two distances

6. Position the cursor at the top-most red point in Cluster 1 and click.

- Record the two distances
- How would a minimum-Euclidean-distance classifier classify this point?

7. On the upper-left pulldown (saying "Hide Contours"), choose "Euclidean Contours"; then select "Cluster 1". You should see ellipses.

- Why aren't the contours circular?

(Reshape the window until they are roughly circular.)

8. On the upper-left pulldown (saying "Euclidean Contours"), choose "Hide Contours".

9. On the upper-right pulldown (saying "Hide Separator"), choose "Euclidean Separator" (You should see the decision boundary based on Euclidean distance.)

- Estimate the percentage of points that are misclassified.

10. Repeat 9, but choose "Mahalanobis Separator"

- Estimate the percentage of points that are misclassified.

11. Click on "Toggle Distance Type" to get Mahalanobis distances. By positioning the cursor and clicking:

- Measure the Mahalanobis distances to a point halfway between the means
- Measure the Mahalanobis distances to the uppermost point in Cluster 1
- How would a minimum-Mahalanobis-distance classifier classify each point?

12. On the upper-left pulldown, choose "Mahalanobis Contours"; then select "Cluster 1".

- Describe the difference between the Mahalanobis contours and the Euclidean contours.

13. Under "Options" choose "Exit"
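The two distance types explored above can be sketched in plain MATLAB. This is an illustration under stated assumptions, not the toolbox's code: the two synthetic clusters, the query point p, and the helper names eucl2 and mahal2 are all made up for the example. The squared Mahalanobis distance from a point x to a cluster with mean m and covariance S is (x - m) * inv(S) * (x - m)'; the squared Euclidean distance is the special case in which S is the identity matrix.

```matlab
rng(0);                                        % reproducible random numbers
n  = 200;
% Two synthetic clusters, both wide along x (std 3) and narrow along y (std 0.5).
c1 = [3*randn(n,1),     0.5*randn(n,1)];       % true mean (0, 0)
c2 = [3*randn(n,1) + 2, 0.5*randn(n,1) + 2];   % true mean (2, 2)

m1 = mean(c1);  S1 = cov(c1);                  % sample mean and covariance
m2 = mean(c2);  S2 = cov(c2);

% Squared distances from each row of X to a cluster (hypothetical helpers).
eucl2  = @(X, m)    sum(bsxfun(@minus, X, m).^2, 2);
mahal2 = @(X, m, S) sum((bsxfun(@minus, X, m) / S) .* bsxfun(@minus, X, m), 2);

% Distances from one query point to both clusters.
p   = [3, 0.5];
dE1 = sqrt(eucl2(p, m1));       dE2 = sqrt(eucl2(p, m2));
dM1 = sqrt(mahal2(p, m1, S1));  dM2 = sqrt(mahal2(p, m2, S2));
fprintf('Euclidean:   %.2f to cluster 1, %.2f to cluster 2\n', dE1, dE2);
fprintf('Mahalanobis: %.2f to cluster 1, %.2f to cluster 2\n', dM1, dM2);

% Percentage of points misclassified by each minimum-distance rule.
X      = [c1; c2];
labels = [ones(n,1); 2*ones(n,1)];
predE  = 1 + (eucl2(X, m2)      < eucl2(X, m1));        % 1 or 2
predM  = 1 + (mahal2(X, m2, S2) < mahal2(X, m1, S1));   % 1 or 2
errE   = 100 * mean(predE ~= labels);
errM   = 100 * mean(predM ~= labels);
fprintf('Misclassified: %.1f%% (Euclidean), %.1f%% (Mahalanobis)\n', errE, errM);
```

For clusters shaped like these, p lies nearer to the mean of cluster 2 in Euclidean terms but nearer to cluster 1 in Mahalanobis terms, because the Mahalanobis distance discounts displacement along the direction of large variance.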

1. Choose Cluster -> K-means

2. Choose Options -> K-means Example 2

- Describe how you would divide the data points into two groups

3. Using the popup menu, select 2 clusters and "Continue"; watch as the K-means procedure clusters the data

- Note whether or not the K-means procedure produced a clustering similar to yours; describe any major differences

4. Repeat 2 and 3 using 4 clusters

- Would you call the resulting clustering "reasonable"? Comment.

5. Choose Options -> K-means Example 1

- Describe how you would divide the data points into two groups

6. Select 2 clusters and "Continue"; watch as the K-means procedure clusters the data

- Note whether or not the K-means procedure produced a clustering similar to yours; describe any major differences

7. Repeat 5 and 6 using 4 clusters

- Would you call the resulting clustering "reasonable"? Comment.

8. Choose Options -> Exit

9. Choose Cluster -> Nearest Neighbor

10. Choose Options -> Nearest Neighbors Example 1

11. Using the popup menu, select t=1.0 and "Continue"; watch as the Nearest-Neighbor procedure clusters the data

- Note whether or not the Nearest-Neighbor procedure produced a clustering similar to yours; describe any major differences

12. Repeat 10 and 11 using t=0.6

- Would you call the resulting clustering "reasonable"? Comment.

13. Choose Options -> Nearest Neighbors Example 2

14. Try all 4 values of t

- Were any of the clusterings "reasonable"?
- In general terms, describe why Nearest Neighbor is better for Example 1 and K-means is better for Example 2.

15. Choose Options -> Exit, and then choose Exit -> Close HCI Lab.
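For reference, the two clustering procedures used above can be sketched in plain MATLAB. The handout does not show the toolbox's implementation, so treat this as an illustration built on assumptions: the k-means part is the standard assign-and-update iteration, and the nearest-neighbor part is interpreted here as threshold-based chaining (a point joins a cluster if it lies within distance t of any member), which may differ from what the toolbox does. The data, the initialization, and all variable names are hypothetical.

```matlab
rng(1);                                        % reproducible random numbers
% Two compact synthetic blobs (not the toolbox's example data).
X = [0.3*randn(50,2); bsxfun(@plus, 0.3*randn(50,2), [4 0])];
n = size(X, 1);

% ---- k-means: alternate between assigning points and updating centers ----
k       = 2;
centers = X([1, n], :);                        % simple init: first and last point
labels  = zeros(n, 1);
for iter = 1:100
    D = zeros(n, k);                           % squared distance to each center
    for j = 1:k
        D(:, j) = sum(bsxfun(@minus, X, centers(j, :)).^2, 2);
    end
    [~, newLabels] = min(D, [], 2);            % index of the nearest center
    if isequal(newLabels, labels)              % no change: converged
        break;
    end
    labels = newLabels;
    for j = 1:k
        if any(labels == j)
            centers(j, :) = mean(X(labels == j, :), 1);
        end
    end
end

% ---- threshold-based nearest-neighbor chaining (assumed interpretation) ----
t         = 1.0;                               % distance threshold
nnLabels  = zeros(n, 1);                       % 0 means "not yet assigned"
nClusters = 0;
for i = 1:n
    if nnLabels(i) ~= 0
        continue;
    end
    nClusters   = nClusters + 1;               % start a new cluster at point i
    nnLabels(i) = nClusters;
    queue = i;                                 % points whose neighbors we still check
    head  = 1;
    while head <= numel(queue)
        q    = queue(head);
        head = head + 1;
        d    = sqrt(sum(bsxfun(@minus, X, X(q, :)).^2, 2));
        nbrs = find((d < t) & (nnLabels == 0));    % unassigned points within t
        nnLabels(nbrs) = nClusters;
        queue = [queue; nbrs];
    end
end
fprintf('k-means used %d clusters; chaining found %d clusters\n', k, nClusters);
```

Changing t, or stretching one blob into a long thin shape, is a quick way to see how differently the two procedures respond to the same data.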
