반응형
자연어 처리/NLP 소개
1. 원-핫(One-hot) 표현 및 문서빈도-역문서빈도(Term Frequency - Inverse Document Frequency, TF-IDF) 표현
- Term Frequency(TF) 표현
from sklearn.feature_extraction.text import CountVectorizer
import seaborn as sns
corpus = ['Time flies like an arrow.',
'Fruit flies like a banana.']
one_hot_vectorizer = CountVectorizer(binary=True)
one_hot = one_hot_vectorizer.fit_transform(corpus).toarray()
vocab = one_hot_vectorizer.get_feature_names()
sns.heatmap(one_hot, annot=True, cbar=False, xticklabels=vocab,
yticklabels=['Sentence 1', 'Sentence 2'])
반응형
댓글