University of the Cumberlands Week 11 Data Mining Discussion
Accuracy is one of the key characteristics of data: information is supposed to be accurate and correct. To determine whether the information they are using is correct, one should ask whether that data reflects the real-world situation. Inaccurate data in any business or organization causes problems that hinder the organization from meeting its goals and objectives. Data should also be relevant. There must be a valid reason why the data is being collected, and one should consider whether that information is actually needed. Data should also be up to date. This is a crucial characteristic because outdated details lead to wrong decisions (Hamel et al., 2017).
In prototype-based clustering, each observation is assigned to the nearest prototype, typically a centroid or a medoid. This is unlike density-based clustering, in which the unsupervised learning method recognizes groups as dense regions of data points separated by regions of lower density. Graph-based clustering, on the other hand, consists of unsupervised algorithms that represent the data as a graph and group its vertices according to the edges that connect them.
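The prototype-based assignment described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a full clustering algorithm: the function names `assign_to_nearest` and `recompute_centroids`, the sample points, and the starting centroids are all assumptions made for the example, and Euclidean distance is assumed as the distance measure.

```python
import math


def assign_to_nearest(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    labels = []
    for point in points:
        distances = [math.dist(point, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def recompute_centroids(points, labels, k):
    """Move each centroid to the mean of the points currently assigned to it."""
    centroids = []
    for cluster in range(k):
        members = [p for p, label in zip(points, labels) if label == cluster]
        if not members:
            raise ValueError(f"cluster {cluster} has no members")
        dims = len(members[0])
        centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
        )
    return centroids


# Hypothetical data: two obvious groups in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
initial_centroids = [(0.0, 0.0), (10.0, 10.0)]

labels = assign_to_nearest(points, initial_centroids)
new_centroids = recompute_centroids(points, labels, k=2)
print(labels)
print(new_centroids)
```

Repeating these two steps until the labels stop changing is the basic idea behind centroid-based methods such as k-means. A medoid-based variant would pick an actual data point as the prototype instead of the mean.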
A scalable clustering algorithm is one that can identify records with similar descriptions across large groups of data on a profile basis. Scalable clustering algorithms rely on distance metrics to decide which partition each data point most closely resembles (Vishwasrao & Sangaiah, 2017).
To choose the correct algorithm, one needs to consider the number of features. In addition, the accuracy required of the output should be considered. One should also collect enough data to get reliable predictions. Linearity is also essential to consider, since some algorithms assume a linear relationship in the data. The number of parameters also helps in selecting the right algorithm: algorithms with a large number of parameters need many trials to find the right combination.
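To make the point about parameters concrete, the sketch below counts how many trials an exhaustive search would need for two hypothetical algorithms. The parameter names and candidate values are assumptions chosen only for illustration, and exhaustive (grid) search is just one possible way of trying combinations.

```python
from itertools import product


def count_trials(param_grid):
    """Number of combinations an exhaustive search over the grid would try."""
    total = 1
    for values in param_grid.values():
        total *= len(values)
    return total


def enumerate_trials(param_grid):
    """List every parameter combination as a dictionary."""
    names = list(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical algorithm with a single parameter.
small_grid = {"k": [2, 3, 4]}

# Hypothetical algorithm with three parameters.
large_grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "depth": [2, 4, 8],
    "n_estimators": [50, 100],
}

print(count_trials(small_grid))  # 3 trials
print(count_trials(large_grid))  # 3 * 3 * 2 = 18 trials
```

The number of trials is the product of the number of candidate values per parameter, so each extra parameter multiplies the search effort rather than adding to it.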