[Artificial Intelligence/AI] Natural Language Processing (NLP) (1): Basic Concepts and Introduction to PyTorch

by newstellar 2021. 10. 4.

Introduction to Natural Language Processing (NLP)


1. One-Hot Representation and Term Frequency-Inverse Document Frequency (TF-IDF) Representation

  • Term Frequency (TF) representation, starting from its binary (one-hot) form

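from sklearn.feature_extraction.text import CountVectorizer
import matplotlib.pyplot as plt
import seaborn as sns


corpus = ['Time flies like an arrow.',
          'Fruit flies like a banana.']

# binary=True records only whether a word occurs in a sentence
# (a "collapsed" one-hot vector), not how many times it occurs
one_hot_vectorizer = CountVectorizer(binary=True)
one_hot = one_hot_vectorizer.fit_transform(corpus).toarray()
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
vocab = one_hot_vectorizer.get_feature_names_out()

# Rows are sentences, columns are vocabulary terms
sns.heatmap(one_hot, annot=True, cbar=False, xticklabels=vocab,
            yticklabels=['Sentence 1', 'Sentence 2'])
plt.show()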


