Laboratory #4
Data Analysis and Pattern Recognition
1. Pattern Recognition Toolbox
1.1 Introduction
The experiments in Lab 4 use a MATLAB program by Michael Heinz called
the Pattern Recognition and Feature Extraction Toolbox. (The
manual provides a systematic description of
all of its features.) The purpose of this part of Lab 4 is to familiarize
you with the program and to review pattern classification concepts. Be aware
of the following facts about the program:
- It is designed for 2-category classification.
- It is designed for 2-dimensional (x-y) input data. In the d-by-2 data
matrix, each column is a feature and each row is a different data point.
- It reads data from files, not MATLAB variables; whenever you want
to create your own data, you must store it in a variable named data
and save that variable to a file with the MATLAB command
save <filename> data
- It provides several tools, but you can use only one tool at a time.
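As a minimal sketch of the file convention described above, the following MATLAB commands build a small two-feature data set and save it under the required variable name. The point values and the file name mydata are made up for illustration, and the sketch assumes that a file holding only the variable data is all the program needs.

```matlab
% Made-up example data: two groups of 50 points each, two features per point.
groupA = randn(50, 2);          % rows = data points, columns = features
groupB = randn(50, 2) + 3;      % same spread, shifted by 3 in both features

% The program expects the variable to be named "data".
data = [groupA; groupB];        % 100 rows, 2 columns

% "mydata" is a hypothetical file name; this writes mydata.mat
% in the current directory.
save mydata data;
```

The file name (without extension) can then presumably be typed wherever the program asks for a file, in the same way that temp is used in Section 1.3.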
1.2 Getting Started
0. Make a local copy of the cs436/Lab4 directory
in your own file area.
1. Launch MATLAB.
2. Change to your copy of the directory (e.g., at the MATLAB prompt, type
cd my-dir)
3. Type lab4
4. Look at the command-bar menus. You should see:
Graph -> Lab Data ----- Purpose: Plot 1-d data from 1 or 2 files
Analysis -> Envelope ----- Purpose: Extract waveform features (not for Lab 4)
Analysis -> Covariance ----- Purpose: Do statistical analysis
Analysis -> DFT ----- Purpose: Extract spectral features (not for Lab 4)
Cluster -> k-Means ----- Purpose: Apply k-means procedure
Cluster -> Nearest Neighbor ----- Purpose: Apply nearest-neighbor procedure
Link -> Real Time Links ----- Purpose: Acquire real-time data (not for Lab 4)
Exit -> Close HCI Lab ----- Purpose: The clean way to quit
1.3 Exploring the Graph tool
1. At the MATLAB prompt, generate and save a sine wave by typing
data = sin(0:0.1:300)';
save temp data;
2. Reselect the HCI Lab window
3. Choose Graph -> Lab Data
(A window should appear along with a new "Options" menu.)
4. Choose Options -> Plot Data -> Single File
A popup menu will ask for the file name; type temp and select "Continue".
A popup menu will show the number of data points and will ask for the range
of points; just select "Continue".
5. Repeat 4, but this time set the range to 1 to 300.
6. Repeat 4, but this time choose Options -> Plot Data -> Two Files;
type temp as the name of each file; choose your own data
ranges, but keep them within the legal range!
- Record the data ranges you chose and what you see.
7. Choose Options -> Exit
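To check what the Graph tool displays, the same data can be previewed directly at the MATLAB prompt. This is only a rough sketch using the standard plot command; it is not how the program itself draws the data, and the variable name firstPart is made up here.

```matlab
% Same sine wave as in step 1.
data = sin(0:0.1:300)';

% Number of samples; this is the count the range popup should report.
nPoints = numel(data);

% First 300 samples, corresponding to the range 1 to 300 used in step 5.
firstPart = data(1:300);

% Plot the full signal on top and the first 300 samples below it.
subplot(2, 1, 1); plot(data);      title('All samples');
subplot(2, 1, 2); plot(firstPart); title('Samples 1 to 300');
```

The vector 0:0.1:300 has 3001 elements, so nPoints is 3001. The first 300 samples cover 0 to 29.9 radians, which is a little under five periods of the sine wave.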
1.4 Exploring the Analysis -> Covariance tool
1. Reselect the HCI Lab window
2. Choose Analysis -> Covariance
3. Choose Options -> Covariance Example
(You should see a plot with red and blue data points, asterisks at the means;
note the different scales for the two axes.)
4. If the box at the bottom right does NOT say "Click for Euclidean
Distances", click on "Toggle Distance Type" at the top center
so that it does.
5. Position the cursor at a point half-way between the means and click.
(You should see roughly equal distances; press any key to dismiss the
message.)
6. Position the cursor at the top-most red point in Cluster 1 and click.
- Record the two distances
- How would a minimum-Euclidean-distance classifier classify this point?
7. On the upper-left pulldown (saying "Hide Contours"), choose
"Euclidean Contours"; then select "Cluster 1". You should
see ellipses.
- Why aren't the contours circular?
(Reshape the window until they are roughly circular.)
8. On the upper-left pulldown (saying "Euclidean Contours"), choose
"Hide Contours".
9. On the upper-right pulldown (saying "Hide Separator"), choose
"Euclidean Separator" (You should see the decision boundary based
on Euclidean distance.)
- Estimate the percentage of points that are misclassified.
10. Repeat 9, but choose "Mahalanobis Separator"
- Estimate the percentage of points that are misclassified.
11. Click on "Toggle Distance Type" to get Mahalanobis distances.
By positioning the cursor and clicking:
- Measure the Mahalanobis distances to a point halfway between the means
- Measure the Mahalanobis distances to the uppermost point in Cluster 1
- How would a minimum-Mahalanobis-distance classifier classify each point?
12. On the upper-left pulldown, choose "Mahalanobis Contours";
then select "Cluster 1".
- Describe the difference between the Mahalanobis contours and the Euclidean
contours.
13. Under "Options" choose "Exit"
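The two distance types used in this section can be computed by hand as a cross-check. The sketch below uses made-up means and covariance matrices, not the values from the Covariance Example, and it assumes that each cluster's Mahalanobis distance is based on that cluster's own covariance matrix; the program may make a different choice.

```matlab
% Made-up cluster statistics (not taken from the Covariance Example).
m1 = [0 0];  C1 = [1 0; 0 16];   % cluster 1: elongated along the y axis
m2 = [4 0];  C2 = [1 0; 0 1];    % cluster 2: round

p = [2.5 4];                     % point to classify

% Euclidean distances from p to the two means.
dE = [norm(p - m1), norm(p - m2)];

% Mahalanobis distances: sqrt((p - m) * inv(C) * (p - m)').
% The "/" operator applies inv(C) without forming the inverse explicitly.
dM = [sqrt((p - m1) / C1 * (p - m1)'), ...
      sqrt((p - m2) / C2 * (p - m2)')];

% Minimum-distance classification under each measure.
[dminE, labelE] = min(dE);
[dminM, labelM] = min(dM);

fprintf('Euclidean:   d1 = %.3f, d2 = %.3f -> cluster %d\n', dE(1), dE(2), labelE);
fprintf('Mahalanobis: d1 = %.3f, d2 = %.3f -> cluster %d\n', dM(1), dM(2), labelM);
```

With these made-up numbers the Euclidean distances are about 4.717 and 4.272, so the Euclidean rule picks cluster 2, while the Mahalanobis distances are about 2.693 and 4.272, so the Mahalanobis rule picks cluster 1.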
1.5 Exploring the Cluster -> K-means and Nearest Neighbor tools
1. Choose Cluster -> K-means
2. Choose Options -> K-means Example 2
- Describe how you would divide the data points into two groups
3. Using the popup menu, select 2 clusters and "Continue"; watch
as the K-means procedure clusters the data
- Note whether or not the K-means procedure produced a clustering similar
to yours; describe any major differences
4. Repeat 2 and 3 using 4 clusters
- Would you call the resulting clustering "reasonable"? Comment.
5. Choose Options -> K-means Example 1
- Describe how you would divide the data points into two groups
6. Select 2 clusters and "Continue"; watch as the K-means procedure
clusters the data
- Note whether or not the K-means procedure produced a clustering similar
to yours; describe any major differences
7. Repeat 5 and 6 using 4 clusters
- Would you call the resulting clustering "reasonable"? Comment.
8. Choose Options -> Exit
9. Choose Cluster -> Nearest Neighbor
10. Choose Options -> Nearest Neighbors Example 1
11. Using the popup menu, select t=1.0 and "Continue"; watch as
the Nearest-Neighbor procedure clusters the data
- Note whether or not the Nearest-Neighbor procedure produced a clustering
similar to yours; describe any major differences
12. Repeat 10 and 11 using t=0.6
- Would you call the resulting clustering "reasonable"? Comment.
13. Choose Options -> Nearest Neighbors Example 2
14. Try all 4 values of t
- Were any of the clusterings "reasonable"?
- In general terms, describe why Nearest Neighbor is better for Example 1
and K-means is better for Example 2.
15. Choose Options -> Exit, and then choose Exit -> Close HCI Lab.
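For reference, the procedure animated by the K-means tool can be sketched in a few lines of plain MATLAB. This is the generic textbook iteration (assign each point to the nearest center, then move each center to the mean of its points), written under the assumption that the program does something similar; it is not the program's own code, and the data are made up rather than taken from the K-means examples.

```matlab
% Made-up data: two well-separated 2-D clouds of 100 points each.
data = [randn(100, 2); randn(100, 2) + 6];
k = 2;                               % number of clusters requested
n = size(data, 1);

% Initialize the centers with k distinct, randomly chosen data points.
idx = randperm(n);
centers = data(idx(1:k), :);

labels = zeros(n, 1);
for iter = 1:100
    % Assignment step: squared Euclidean distance from every point
    % to every center, then pick the nearest center for each point.
    d2 = zeros(n, k);
    for j = 1:k
        diffs = data - repmat(centers(j, :), n, 1);
        d2(:, j) = sum(diffs .^ 2, 2);
    end
    [dmin, newLabels] = min(d2, [], 2);

    % Stop when no point changes cluster.
    if isequal(newLabels, labels)
        break;
    end
    labels = newLabels;

    % Update step: move each center to the mean of its assigned points.
    for j = 1:k
        members = data(labels == j, :);
        if ~isempty(members)
            centers(j, :) = mean(members, 1);
        end
    end
end
```

Because the centers move toward cluster means, this procedure tends to favor compact, roughly round groups, which is worth keeping in mind when comparing its results with those of the Nearest Neighbor tool.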