English stop words with Python NLTK

Natural language processing

2020.01.14

English stop words with Python NLTK

import re

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Run once if the data is missing (newer NLTK releases look for 'punkt_tab' instead of 'punkt'):
# nltk.download('punkt')
# nltk.download('stopwords')

NLP = "Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data."

#===== Cleaning =====
def clean(text):
    """Strip commas, periods, and parenthesized asides such as "(NLP)"."""
    text = re.sub(r',', '', text)
    text = re.sub(r'\.', '', text)
    text = re.sub(r'\(.*?\)', '', text)
    return text

NLP = clean(NLP)

#===== Tokenization =====
NLP = word_tokenize(NLP)

#===== Stop words =====
stop_words = stopwords.words('english')
# NLTK's English list is lowercase, so capitalized words such as "The" are kept by this comparison.
NLP = [word for word in NLP if word not in stop_words]
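
When the NLTK data is not at hand, the same filtering idea can be tried with only the standard library. The following is a minimal sketch, not NLTK's implementation: `SAMPLE_STOP_WORDS` and `remove_stop_words` are hypothetical names made up for illustration, the stop word set is a tiny hand-picked subset rather than NLTK's list, and the regex tokenizer is a rough stand-in for `word_tokenize`. It also lowercases each word before the comparison, so capitalized stop words are dropped too.

```python
import re

# Hypothetical, hand-picked subset for illustration only;
# NLTK's English stop word list is much longer.
SAMPLE_STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "with"}


def remove_stop_words(text, stop_words=SAMPLE_STOP_WORDS):
    """Return the words of text that are not in stop_words (case-insensitive)."""
    # Rough tokenizer: runs of letters, digits, and apostrophes; punctuation is dropped.
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    # Compare in lowercase so "The" and "the" are treated alike.
    return [token for token in tokens if token.lower() not in stop_words]


sentence = "The cat is in the garden with a ball."
print(remove_stop_words(sentence))  # prints ['cat', 'garden', 'ball']
```

Passing `stopwords.words('english')` converted to a `set` as the second argument should give the same kind of filtering with NLTK's full list, and a set keeps each membership check fast.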

Reference
