There are two common types of unsupervised learning settings:
- Self-taught learning
- Semi-supervised learning
The semi-supervised learning setting assumes that the unlabeled data comes from exactly the same distribution as the labeled data. In this setting, most of the unlabeled data belongs to one of the classes. The self-taught learning setting is more general: it does not assume that the unlabeled data comes from the same distribution as the labeled data.
Both methods are most powerful in problems where we have a lot of unlabeled data and a smaller amount of labeled data.
Self-Taught Learning (Unsupervised Learning):
Data: Relies solely on unlabeled data. This means the data points don't have pre-defined categories or labels associated with them.
Goal: The goal is to discover underlying patterns or structures within the unlabeled data. Techniques like clustering, dimensionality reduction, and anomaly detection are commonly used.
Example: Imagine training a model to identify patterns in customer purchase history data. The data might include items purchased, but not pre-defined categories like "electronics" or "clothing." The model might learn to group similar purchase patterns on its own.
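The purchase-history example could be sketched with k-means, one of the clustering techniques mentioned above. This is only a minimal illustration: the two-number feature vectors, the `kmeans` helper, and the farthest-first initialisation are all assumptions made for the sketch, not something the example prescribes.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means; returns (centroids, cluster index of each point)."""
    # Farthest-first initialisation (an assumption, chosen to be deterministic):
    # start from the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    assignments = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


# Hypothetical data: (gadget purchases, apparel purchases) per customer.
# No category labels are given to the algorithm.
purchases = [(9, 1), (8, 0), (10, 2), (1, 9), (0, 8), (2, 10)]
centroids, groups = kmeans(purchases, k=2)
print(groups)  # [0, 0, 0, 1, 1, 1]
```

The cluster indices carry no meaning on their own. The algorithm only discovers that the first three customers buy alike and the last three buy alike; naming those groups is left to a person.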
Semi-supervised Learning:
Data: Uses a combination of labeled data (data with pre-defined categories) and unlabeled data. The labeled set is typically much smaller than the unlabeled set.
Goal: The model leverages the labeled data to learn the underlying structure and relationships between features and labels. It then uses this knowledge to make predictions on the unlabeled data.
Example: Consider training a model to classify images as cats or dogs. You might have a limited dataset of labeled images (both cats and dogs), but a much larger dataset of unlabeled images. The model can learn from the labeled data and then use that knowledge to classify the unlabeled images.
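One simple way to realise the cats-and-dogs example is self-training (pseudo-labeling) with a nearest-centroid classifier. This is only one of several semi-supervised approaches and is chosen here for brevity. The two-number "image features", the data values, and the helper names are hypothetical; a real image classifier would work on learned features instead.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids_of(examples):
    """Mean feature vector per class from (features, label) pairs."""
    by_class = {}
    for x, y in examples:
        by_class.setdefault(y, []).append(x)
    return {
        y: tuple(sum(col) / len(xs) for col in zip(*xs))
        for y, xs in by_class.items()
    }


def predict(cents, x):
    """Label of the class centroid closest to x."""
    return min(cents, key=lambda y: sq_dist(x, cents[y]))


def self_train(labeled, unlabeled, rounds=3):
    """Fit on labeled data, pseudo-label the unlabeled data, refit, repeat."""
    pseudo = []
    for _ in range(rounds):
        cents = centroids_of(labeled + pseudo)
        pseudo = [(x, predict(cents, x)) for x in unlabeled]
    return centroids_of(labeled + pseudo), [y for _, y in pseudo]


# Hypothetical 2-D features standing in for images.
labeled = [((0.9, 0.2), "cat"), ((0.2, 0.8), "dog")]  # small labeled set
unlabeled = [                                          # larger unlabeled set
    (0.8, 0.3), (1.0, 0.1), (0.1, 0.9),
    (0.3, 0.7), (0.7, 0.1), (0.25, 0.95),
]

cents, labels = self_train(labeled, unlabeled)
print(labels)  # ['cat', 'cat', 'dog', 'dog', 'cat', 'dog']
```

Because the pseudo-labels feed back into training, early mistakes can reinforce themselves. Practical versions usually keep only the confident predictions in each round, which this sketch omits.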