English Stop Words with Python NLTK

Natural Language Processing

2020.01.14

English stop words with Python NLTK

import re

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Run once to fetch the data these functions need.
#nltk.download('punkt')      # tokenizer data for word_tokenize (newer NLTK releases name it 'punkt_tab')
#nltk.download('stopwords')  # stop-word lists

NLP = "Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data."

#===== Cleaning =====
def clean(text):
    text = re.sub(r',', '', text)        # remove commas
    text = re.sub(r'\.', '', text)       # remove periods
    text = re.sub(r'\(.*?\)', '', text)  # remove parenthesized text such as "(NLP)"
    return text

NLP = clean(NLP)

#===== Tokenization =====
NLP = word_tokenize(NLP)  # split the cleaned string into a list of word tokens

#===== Stop words =====
stop_words = stopwords.words('english')  # English stop words, all lowercase
NLP = [word for word in NLP if word not in stop_words]
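
The filtering step above compares each token with the stop-word list exactly as written, so a capitalized token such as "The" would not be removed by an all-lowercase list. The following is a minimal sketch of a case-insensitive variant, not the original author's method. The names `STOP_WORDS` and `remove_stop_words` are hypothetical, and the small hand-written set only stands in for `stopwords.words('english')` so that the example runs without downloading any NLTK data.

```python
# Hypothetical stand-in for stopwords.words('english'); the real NLTK list is much longer.
STOP_WORDS = {"is", "a", "of", "and", "the", "with", "in", "to", "how", "between"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lowercase form is in stop_words, keeping the original order."""
    return [tok for tok in tokens if tok.lower() not in stop_words]


# A plain whitespace split is used here instead of word_tokenize to keep the sketch self-contained.
tokens = "The interactions between computers and human languages".split()
filtered = remove_stop_words(tokens)
print(filtered)
```

Using a `set` rather than a list makes each membership test a constant-time lookup on average, which may matter for longer texts. With NLTK installed, `set(stopwords.words('english'))` could be passed as `stop_words`.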
