
Sunday, March 4, 2012

Difference between Self-taught learning and semi-supervised learning settings

There are two common types of learning settings that make use of unlabeled data:
  • Self-taught learning
  • Semi-supervised learning
   The self-taught learning setting is more versatile and broadly applicable: it does not assume that your unlabeled data is drawn from the same distribution as your labeled data. In the self-taught learning setting it is not necessary that most of the unlabeled data belongs to one of the target classes; an appreciable amount of it may not belong to any class at all.

   The semi-supervised learning setting assumes that the unlabeled data comes from exactly the same distribution as the labeled data. In this setting, most of the unlabeled data therefore belongs to one of the target classes.

These two settings are most powerful on problems where we have a lot of unlabeled data and only a small amount of labeled data.
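As a concrete illustration, here is a minimal self-taught-learning sketch, assuming scikit-learn and synthetic data (the pool sizes, the PCA dimensionality, and the toy labels are all illustrative assumptions): a feature representation is learned from an unlabeled pool that need not match the labeled data's distribution, and a supervised classifier is then trained in that feature space.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Self-taught learning: the unlabeled pool need NOT come from the
# same distribution as the labeled task data.
unlabeled = rng.normal(size=(1000, 20))        # large unlabeled pool
X_labeled = rng.normal(size=(50, 20))          # small labeled set
y_labeled = (X_labeled[:, 0] > 0).astype(int)  # toy binary labels

# Step 1: learn a feature representation from the unlabeled pool alone.
pca = PCA(n_components=5).fit(unlabeled)

# Step 2: train a supervised classifier on the labeled data,
# expressed in the learned feature space.
clf = LogisticRegression().fit(pca.transform(X_labeled), y_labeled)
print(clf.score(pca.transform(X_labeled), y_labeled))
```

In a real self-taught pipeline the unsupervised step would typically be something richer than PCA (e.g. sparse coding or an autoencoder), but the two-stage structure is the same.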

1 comment:

  1. Self-Taught Learning (unsupervised feature learning):

    Data: Learns its representation from unlabeled data alone. The data points carry no pre-defined categories or labels, and they need not come from the same distribution as the task's labeled data.
    Goal: Discover underlying patterns or structure within the unlabeled data. Techniques like clustering, dimensionality reduction, and anomaly detection are commonly used.
    Example: Imagine training a model on customer purchase-history data. The data records which items were purchased, but no pre-defined categories like "electronics" or "clothing"; the model learns to group similar purchase patterns on its own.
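That purchase-pattern example can be sketched with k-means clustering, assuming scikit-learn and a hypothetical synthetic purchase-count matrix (the item counts and group structure below are invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Hypothetical purchase-count matrix: rows = customers, columns = items.
group_a = rng.poisson([8, 8, 0, 0], size=(30, 4))  # mostly buys items 0 and 1
group_b = rng.poisson([0, 0, 8, 8], size=(30, 4))  # mostly buys items 2 and 3
purchases = np.vstack([group_a, group_b])

# No labels are given: KMeans groups similar purchase patterns on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(purchases)
print(km.labels_)
```

The model recovers the two buying patterns without ever being told categories like "electronics" or "clothing".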
    Semi-supervised Learning:

    Data: Utilizes a combination of labeled data (data with pre-defined categories) and unlabeled data, where the labeled set is typically much smaller than the unlabeled one.
    Goal: The model leverages the labeled data to learn the relationship between features and labels, then uses that knowledge to make predictions on the unlabeled data.
    Example: Consider training a model to classify images as cats or dogs. You might have a limited set of labeled images of both classes but a much larger set of unlabeled images. The model can learn from the labeled data and then use that knowledge to classify the unlabeled images.
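That workflow can be sketched with scikit-learn's LabelSpreading, which propagates a handful of known labels (unlabeled points are marked -1) through a nearest-neighbor graph. The two Gaussian blobs below are a synthetic stand-in for cat/dog image features, not real data:

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(2)

# Two well-separated blobs standing in for "cat" and "dog" feature vectors.
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Keep only four labels; -1 marks the unlabeled points.
y_partial = np.full(200, -1)
labeled_idx = [0, 1, 100, 101]
y_partial[labeled_idx] = y[labeled_idx]

# Propagate the few known labels through a k-nearest-neighbor graph.
model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
accuracy = (model.transduction_ == y).mean()
print(accuracy)
```

Note the semi-supervised assumption at work: propagation only makes sense because the unlabeled points are drawn from the same two class distributions as the labeled ones.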
