- The iris flower data is available in a spreadsheet. The spreadsheet is most useful for manipulating the data and creating sets to be plotted. You can create the histograms in the spreadsheet, but most of the data visualization will be easier with Desmos. You can copy-paste columns or pairs of columns into Desmos.
- Familiarize yourself with the data.
- Split your data into training and testing sets by randomly choosing five samples from each iris species and setting them aside in a second sheet as the testing set. The remaining samples form the training set.
- For each of the four features, create a histogram to get a sense of its distribution. How well does each feature, on its own, distinguish the three species?
- In Desmos, create a scatter plot of the Petal Length versus the Sepal Length for all three species. The three species should be in three separate tables so that you can distinguish them by color.
- Using your scatter plot, define classification boundaries. These boundaries should be equations that partition the points into their respective classes (as much as possible).
- Define a classifier (with your equations and possibly if/else statements). Your classifier should take four values corresponding to the four features and return a species of iris.
- Only after you are confident in your final classifier, validate it by testing it on the testing data you set aside at the start.
- Read "How Eugenics Shaped Statistics." As you read, consider the following:
    - Pearson and others sought to legitimize their bigotry as scientific. What biases can influence science, and how do these biases emerge and propagate?
    - How is viewing data as purely objective both a technical and an ethical pitfall?
    - Particularly if you are taking or have taken a statistics course, do you recognize any of the statistics mentioned?
    - Recently, there were multiple reports of swastikas carved into surfaces in our school. Bigotry will always resurface. What factors in education can help future scientists avoid both the ethical and technical pitfalls of previous generations?
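The split step in the iris activity (randomly setting aside five samples of each species for testing) can also be sketched in Python. This is a minimal sketch, not part of the original activity. It assumes each row is stored as a `(features, species)` pair, with features in the order (sepal length, sepal width, petal length, petal width). The function name `train_test_split_per_species` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def train_test_split_per_species(samples, n_test=5, seed=None):
    """Randomly set aside n_test samples of each species as a testing set.

    Assumed (hypothetical) row format: (features, species), for example
    ((sepal_length, sepal_width, petal_length, petal_width), "setosa").
    Returns a (training, testing) pair of lists.
    """
    rng = random.Random(seed)  # a seed makes the split reproducible

    # Group rows by species so every species contributes exactly n_test rows.
    by_species = defaultdict(list)
    for sample in samples:
        by_species[sample[1]].append(sample)

    training, testing = [], []
    for group in by_species.values():
        # Pick n_test distinct row indices, without replacement.
        held_out = set(rng.sample(range(len(group)), n_test))
        for i, sample in enumerate(group):
            (testing if i in held_out else training).append(sample)
    return training, testing
```

Grouping by species first guarantees that each species contributes exactly five testing samples. Drawing five rows from the whole table would not.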
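The classifier and validation steps in the iris activity can also be sketched in Python. In this minimal sketch, two boundaries in the Petal Length versus Sepal Length plane are combined with if/else statements. The names `classify` and `accuracy` are hypothetical. The constants `SETOSA_MAX_PETAL`, `SLOPE`, and `INTERCEPT` are made-up placeholders, not boundaries derived from the iris data. You would replace them with the equations you read off your own scatter plot.

```python
# Hypothetical placeholder boundaries: NOT fitted to the iris data.
# Replace these with the equations from your own Desmos scatter plot.
SETOSA_MAX_PETAL = 2.5        # boundary: petal_length = 2.5
SLOPE, INTERCEPT = 0.5, 1.8   # boundary: petal_length = SLOPE * sepal_length + INTERCEPT


def classify(sepal_length, sepal_width, petal_length, petal_width):
    """Return a species name given the four features.

    Only sepal_length and petal_length are used, matching the scatter plot;
    the other two features are accepted so the classifier takes all four.
    """
    if petal_length < SETOSA_MAX_PETAL:
        return "setosa"
    # Points above the sloped boundary line are labeled virginica.
    if petal_length > SLOPE * sepal_length + INTERCEPT:
        return "virginica"
    return "versicolor"


def accuracy(classifier, samples):
    """Fraction of (features, species) pairs that the classifier labels correctly."""
    correct = sum(1 for features, species in samples if classifier(*features) == species)
    return correct / len(samples)
```

Call `accuracy(classify, testing)` only once, after the boundaries are final. If you adjust the boundaries to improve the test score, the testing set stops being an independent check.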