References:
An organization collected a list of physicochemical test result about red wine samples from the north of Portugal. The goal is to determine the wine quality and category based on the test result of the samples.
(Reference: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.)
In the template input file of this assignment, you should be able to find two worksheets. In the worksheet named "Sample Testing Data", you can find the test result of 1599 wine samples. And the title row specifies the name of each attribute tested on the sample wine.
Note that the "Sample ID" is not part of the test result. Values in the "Sample ID" column are used to identify the samples.
When the researchers establish the model to predict the wine quality,
they use standardized data (between 0 and 1) instead of the raw testing
data. To standardize the data,
you treat data in each column as a sequence, find the maximum number (max_v)
and the minimum number (min_v) in this sequence, then for each data item
(data_v) in this sequence, the calculated result of the following formula
is the standardized data of this data item (standardized):
standardized = (data_v - min_v) / (max_v - min_v)
If the maxumum and minimum numbers are the same, assign
the standardized data as 1 to that attribute of all the samples.
The model the researchers has established is a Linear Regression Model. This model calculates the quality of each sample as the sum of the product result of each attribute's value (of the sample) times the attribute's coefficient. You can find the coefficients of the attributes in the worksheet named "Coefficients".
If the calculated quality result of a sample is 100 or above,
this wine sample is categorized as very good.
If the calculated quality result of a sample is between 50 and 100,
this wine sample is categorized as good.
If the calculated quality result of a sample is between 20 and 50,
this wine sample is categorized as average.
If the calculated quality result of a sample is below 20,
this wine sample is categorized as bad.
Extra requirement:
In order to make the main subroutine easier to understand, a divide and conquer approach must be adopted in developing your application. That is, you must develop and use procedures to complete at least the following tasks: finding the maximum number and the minimum number in a sequence, calculating the quality of a sample, and assign the category based on the sample's quality result.
Data is an asset to a company. We are supposed to leave the raw sample testing data intact always and only read values from that worksheet. Therefore, your program needs to perform the following tasks:
Because the coefficients of the attributes (in the worksheet named "Coefficients" will be used many times in the next step, your program must read the coefficient values into an array first, and then use the array data in the later calculations.
How to submit:
To submit your macro enabled solution Excel file, first login to your VIU Learn account, find the CSCI 112 course page, click on the "Assessment" drop-down menu, click on the "Assignments" item, then click on the folder named "A8". Then you can click on the "Add a File" button to browse and upload your solution file.
If you have difficulties to make the VIU Learn submission work, please email me to let me know and attach your Excel solution file with your email.