MIS 655 Grand Canyon University Supervised and Unsupervised Tasks Responses
Discussion 1: Tyler
Supervised learning is typically characterized by its use of labeled datasets to train (supervise) an algorithm so that it can classify data or predict outcomes (typically through regression) more accurately. Unsupervised learning approaches use machine learning algorithms to analyze and cluster unlabeled data (Delua, 2021). These approaches are valuable because they can often find patterns in data without the programmer telling them which patterns to look for. The typical tasks in unsupervised learning are clustering, association, and dimensionality reduction.
Supervised learning algorithms learn from labeled past data, where the outcome is already known, in order to predict the outcome for future data, as in regression. These algorithms are often more accurate than unsupervised learning models, but they require much more upfront knowledge about your data and often more time and resources to reach the level of accuracy you’re looking for. Simply put, you need to train your algorithm to work with the data, which can be complicated and time consuming but may end up being worth the investment (Delua, 2021). We can use exoplanet detection and classification as an example. NASA has collected data on exoplanets for a while and has positively classified what it believes to be exoplanets. This data can be used with supervised machine learning algorithms to determine whether a newly found object shares enough characteristics with positively identified exoplanets to be considered one. This can be difficult, because you first need enough domain knowledge to make sure the data includes all the characteristics required to actually classify an object as an exoplanet.
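To make the exoplanet example concrete, here is a minimal sketch of a supervised classifier (k-nearest neighbors) in plain Python. The feature names, the values, and the `knn_predict` helper are all invented for illustration; this is not NASA data or NASA's method, just one simple way a labeled dataset can be used to label a new object.

```python
import math
from collections import Counter

# Hypothetical labeled examples. Each point is
# (signal_strength, transit_shape_score), already scaled to 0-1.
# The numbers are made up for illustration only.
TRAINING = [
    ((0.8, 0.9), "exoplanet"),
    ((0.7, 0.8), "exoplanet"),
    ((0.9, 0.7), "exoplanet"),
    ((0.2, 0.1), "false_positive"),
    ((0.3, 0.2), "false_positive"),
    ((0.1, 0.3), "false_positive"),
]


def knn_predict(training, query, k=3):
    """Label `query` by majority vote among its k nearest labeled neighbors."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# A newly observed object is compared against the labeled examples.
print(knn_predict(TRAINING, (0.75, 0.85)))
print(knn_predict(TRAINING, (0.15, 0.20)))
```

The point of the sketch is that the labels in `TRAINING` do the supervising: without someone having classified those past objects, the algorithm would have nothing to vote with.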
Unsupervised machine learning models often require much less human intervention up front. However, this is not to say that you can create this type of model, pass its results directly to your stockholders, and call it a day. These models still require a fair amount of human intervention to validate the outputs the algorithm produces. Unsupervised machine learning models are often used to gain insight into large volumes of new data, since the algorithm surfaces interesting patterns in the data without you having to take the time to find those relationships yourself. The downside is that these models can produce wildly inaccurate results, so you have to consider whether the results make intuitive sense. For example, you may find a correlation between purchases of rubber bands and car tires. Intuitively, it may not make much sense to group these two products together when making business decisions, whereas a correlation between purchases of milk and cereal makes a lot more sense. Since the results of these models can vary widely, one must always question their output.
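The rubber bands and car tires example can be sketched with a small association calculation. The baskets below are invented, and `pair_stats` is a hypothetical helper, not a library function. It counts how often each pair of items is bought together and computes lift (how much more often the pair co-occurs than it would if the two items were independent).

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
BASKETS = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "cereal", "rubber bands"},
    {"car tires", "rubber bands"},
    {"bread", "milk"},
    {"cereal", "milk", "eggs"},
]


def pair_stats(baskets):
    """Return {(item_a, item_b): (baskets_with_both, lift)} for every co-purchased pair."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    stats = {}
    for (a, b), both in pair_counts.items():
        # lift = P(a and b) / (P(a) * P(b)), written with raw counts.
        lift = both * n / (item_counts[a] * item_counts[b])
        stats[(a, b)] = (both, lift)
    return stats


stats = pair_stats(BASKETS)
print(stats[("car tires", "rubber bands")])  # one shared basket, lift 3.0
print(stats[("cereal", "milk")])             # four shared baskets, lift 1.2
```

In this toy data the tires and rubber bands pair gets the higher lift even though it rests on a single basket, while milk and cereal appear together four times. That is the kind of output a person still has to sanity check before acting on it.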
References
Delua, J. (2021, March 12). Supervised vs. unsupervised learning: What’s the difference? IBM. Retrieved October 22, 2021, from https://www.ibm.com/cloud/blog/supervised-vs-unsupervised-learning.
Discussion 2: Arcelia
Supervised and unsupervised learning are two types of data analysis conducted in machine learning (Analytics Vidhya, 2021). The key difference between the two is the type of data (labeled vs. unlabeled) used in the machine learning process. In supervised learning, labeled datasets are used to train algorithms that classify and predict outcomes by using methods like regression and decision trees. As noted by IBM Cloud Education (2020), this type of machine learning is often used in the real world to solve problems by classifying large volumes of data. For example, supervised machine learning could be used to train an algorithm that classifies incoming email as spam, which is then acted upon (labeled, placed in a different folder, deleted, etc.).
In unsupervised learning, unlabeled data is used to train the algorithm. In this case, the algorithm “discovers patterns that help solve for clustering or association problems” (IBM Cloud Education, 2020, Unsupervised section). Unsupervised learning is especially useful when patterns are not easily surmised and must be discovered (Analytics Vidhya, 2021). Methods used in unsupervised learning include Gaussian mixture, hierarchical clustering, and k-means models. Examples of unsupervised learning using hierarchical clustering include market segmentation, where consumers may be segmented into clusters based on demographics, behaviors, and geographic information. The results from this analysis may then be used by businesses to conduct targeted marketing campaigns catered to the identified groups.
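As a rough illustration of the market segmentation example, the sketch below runs a very small agglomerative (bottom-up hierarchical) clustering with single linkage in plain Python. The customers, the two features, and the `agglomerate` helper are assumptions made up for this example; a real analysis would use more features, scale them first, and most likely rely on a statistics library.

```python
import math

# Hypothetical customers: (age, annual_spend_in_thousands). Invented values.
CUSTOMERS = {
    "A": (22, 1.0), "B": (25, 1.5), "C": (24, 1.2),
    "D": (48, 9.0), "E": (52, 8.5), "F": (50, 9.5),
}


def single_linkage(cluster_1, cluster_2, points):
    """Distance between two clusters = distance between their closest members."""
    return min(math.dist(points[a], points[b]) for a in cluster_1 for b in cluster_2)


def agglomerate(points, n_clusters):
    """Start with one cluster per point and merge the closest pair until n_clusters remain."""
    clusters = [[name] for name in points]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j], points)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


print(agglomerate(CUSTOMERS, 2))
```

No customer is labeled as belonging to a segment here; the two groups come only from how close the customers are to each other, which is what makes this unsupervised.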
References
Analytics Vidhya. (2021, April 28). Clustering in R: Beginner’s guide to clustering in R. Retrieved October 19, 2021, from https://www.analyticsvidhya.com/blog/2021/04/beginners-guide-to-clustering-in-r-program/
IBM Cloud Education. (2020, August 19). Supervised learning. IBM. Retrieved October 19, 2021, from https://www.ibm.com/cloud/learn/supervised-learning
Discussion 3: Cornilus
The techniques of data mining come in two main forms: supervised and unsupervised. Supervised learning is a predictive technique, whereas unsupervised learning is a descriptive technique. Although both approaches are widely used to accomplish different data mining tasks, it is important to understand the difference between the two.
As Harsh Gupta, founder at proton AutoML, explains, supervised learning is typically done in the context of classification, when we want to map the input to output labels, or regression, when we want to map the input to a continuous output. Common algorithms in supervised learning include logistic regression, naive Bayes, support vector machines, artificial neural networks, and random forests. When conducting supervised learning, the main considerations are model complexity and the bias-variance tradeoff. Note that these two considerations are interrelated.
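To show what "mapping the input to a continuous output" looks like, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The data points and the `fit_line` helper are invented for illustration and are not taken from the answer quoted above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled training data: each input x comes with a known output y.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
# Predict a continuous value for an input that was not in the training data.
print(slope * 5 + intercept)
```

A classifier would return a label from a fixed set instead of a number; the training step, learning from inputs paired with known outputs, is the same idea in both cases.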
Unsupervised learning algorithms, such as K-means clustering, assume that the input dataset is distributed according to some unknown underlying statistical distribution. The goal of unsupervised learning algorithms is to learn the true nature of this distribution. Clustering algorithms attempt to find clusters or groups in the data without any labels for what constitutes a “cluster” or “group”. Unsupervised learning is also very useful in exploratory analysis because it can automatically identify structure in data. In situations where it is impossible for a human to propose trends in the data, unsupervised learning can provide initial insights that can then be used to test individual hypotheses.
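A bare-bones version of K-means can be sketched in a few lines of plain Python. The points and starting centroids below are invented, and passing the starting centroids in by hand is a simplification assumed here to keep the run repeatable; practical implementations usually pick them randomly and stop when the assignments no longer change.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(group) for coord in zip(*group))
            if group else centroids[i]
            for i, group in enumerate(groups)
        ]
    return centroids, groups


# Hypothetical unlabeled points forming two loose groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, groups = kmeans(POINTS, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(groups)
```

Nothing in `POINTS` says which group a point belongs to. The groups come out of the distances alone, and a person still has to decide whether they mean anything.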
Supervised techniques attempt to identify causal relationships between dependent and independent variables, isolate the degree of correlation for each set of variables, and develop a model showing the web of dependencies. The model is then applied to data for which the target value is unknown. Unsupervised learning seeks to identify unknown patterns in a data set with no predetermined labels and with little or no human supervision. The goal of unsupervised data mining techniques is to find patterns in a data set based on the relationships between the data points themselves.
Scalability is one of the major issues with mining large data sets, since it is not practical to parse the entire data set more than once. Supervised data mining tends to be highly scalable, meaning it can handle huge volumes of data in time frames that do not increase unreasonably, and it is generally fast. Unsupervised learning methods, on the other hand, often run into scalability issues unless some form of parallel evaluation is used. Unlike supervised learning, they are relatively slow, although they can converge toward multiple sets of solution states.
References
https://www.quora.com/What-is-the-difference-between-supervised-and-unsupervised-learning-algorithms