Cloud data storage is a service in which data is maintained, managed, and backed up remotely. The service allows users to store files online so that they can access them from anywhere via the Internet. Many users expect cloud computing to reshape computing processes. A huge amount of data is stored in the cloud and needs to be retrieved efficiently, but retrieving information from the cloud takes a long time when the data is not stored in an organized manner. Data mining is therefore important in cloud computing. Integrating the two, as Integrated Data Mining and Cloud Computing (IDMC), provides agility and rapid access to technology.

With cloud computing, users employ a variety of devices, including PCs, laptops, smartphones, and PDAs, to access programs, storage, and application development platforms over the Internet through services offered by cloud computing providers. The benefits of cloud computing include cost savings, high availability, and easy scalability. This work therefore presents a survey of data storage in the cloud and of cluster analysis of that data for use in various business intelligence applications. It also proposes a new data cluster analysis model that provides clustering as a service.

Data clustering is a technique for analyzing data and extracting meaningful patterns from raw data sets. Here, "meaningful" refers to the patterns or knowledge extracted from training samples, which are then used to identify new data that matches a learned pattern.
In data mining, two main types of learning techniques are used: supervised learning and unsupervised learning. These learning models evaluate the data and build a mathematical model that can be used to recognize similar patterns in newly arriving data and assign them to groups. In supervised learning, the data is processed together with its class labels, and the class labels act as a teacher for the learning algorithm. In unsupervised learning, by contrast, the data carries no class labels to act as a teacher, so it is categorized using the similarity and dissimilarity of the training samples. Supervised learning is therefore known as data classification, while unsupervised learning supports cluster analysis. Because this work analyzes unlabeled data, cluster analysis is the technique used.

Clustering is the unsupervised classification of input patterns or samples: observations, data items, or feature vectors are grouped into clusters. In data mining, this is known as cluster analysis. The problem is to group a given set of unlabeled patterns into meaningful clusters. In a sense, labels are also associated with clusters, but these category labels are data-driven; that is, they are obtained solely from the data.

Clustering technique. Clustering is one of the most popular data mining techniques for finding useful, previously unknown patterns in large data repositories. It groups data into clusters such that items in the same cluster are similar to one another, while items in different clusters are dissimilar. Clustering methods fall into two broad categories: i) hard clustering and ii) soft clustering.
In hard clustering, each document can belong to only one cluster; it is also known as exclusive clustering. In soft clustering, the same document can belong to more than one cluster; it is also known as overlapping clustering.

Figure: raw data and clustered data.

This section has introduced data clustering and the domain selected for study, data storage. The next section describes different types of clustering algorithms in order to explain the techniques behind cluster analysis.

Types of clustering techniques. A large number of clustering algorithms and methods are available; some essential techniques are described here.

Partitioning method. Given n data objects, this approach constructs k partitions of the data, where k ≤ n. The partitioning algorithm generates k groups that satisfy the following conditions:
a. Each group contains at least one object.
b. Each object is a member of exactly one group.

Hierarchical methods. A hierarchical method organizes clusters into a hierarchy. This can be done in two ways:

Agglomerative approach. This follows a bottom-up strategy. It starts by placing each data object in its own group and then merges the groups that are most similar. The process repeats until all groups have been merged into one or a termination condition holds.

Divisive approach. This follows a top-down strategy. It starts with a single cluster containing all data objects and then keeps splitting larger clusters into smaller ones until a termination condition holds.

Hierarchical methods are rigid: once a merge or split has been performed, it can never be undone.

Density-based methods. This technique is based on the notion of density.
The main idea is to keep growing a cluster as long as the density of its neighborhood exceeds a certain threshold; that is, for each data point within a cluster, the neighborhood of a given radius must contain at least a minimum number of points.

Grid-based method. This method quantizes the object space into a finite number of cells that together form a grid. The method has the following advantages:
• Its main advantage is fast processing time.
• Its processing time depends only on the number of cells in the quantized object space.

Model-based methods. In the model-based scheme, a model is hypothesized for each cluster, and the data that best fits that model is identified. This method provides a way to determine the clusters automatically from standard statistics while accounting for outliers or noise. As a result, it yields robust clustering methods.

Constraint-based method. This method performs clustering subject to application- or user-oriented constraints. These constraints express the user's expectations or the properties of the desired clustering results.
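As a concrete illustration of the partitioning method described earlier, the following is a minimal sketch of k-means, one common partitioning algorithm. The text does not name a specific algorithm, so the choice of k-means, the function name `kmeans`, the seeding rule (the first k distinct points), and the sample data are all assumptions made for illustration. The sketch produces a hard clustering: each object receives exactly one group label.

```python
import math


def kmeans(points, k, max_iter=100):
    """Partition points (a list of equal-length tuples) into k groups, k <= n."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")

    # Assumption: seed the centroids with the first k distinct points,
    # which keeps the sketch deterministic.
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("need at least k distinct points")

    labels = [None] * n
    for _ in range(max_iter):
        # Assignment step: each object joins exactly one group,
        # the one whose centroid is nearest.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so stop
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
            # An emptied group keeps its old centroid; this sketch does not
            # re-seed it, so a non-empty group is not guaranteed in general.

    return labels, centroids


# Hypothetical sample data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Condition (b), that each object belongs to exactly one group, holds by construction because every point gets a single label. Condition (a), that each group is non-empty, holds for this sample data but would need explicit handling, such as re-seeding an empty group, in a more careful implementation.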