TypeError: No matching overloads found for kr.lucypark.okt.OktInterface.tokenize(float,java.lang.Boolean,java.lang.Boolean): a KoNLPy morphological analysis error

Data Analysis/Python 2021. 11. 13. 15:50

I loaded a CSV file into a pandas DataFrame, ran a Korean morphological analyzer over it, and tried to build a term-document matrix (TDM) limited to the 1,000 most frequent morphemes.
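To make the goal concrete: `CountVectorizer(max_features=1000)` keeps only the terms with the highest counts across the whole corpus and then counts those terms per document. Below is a minimal pure-Python sketch of that idea, not scikit-learn's implementation. The helper name `top_k_dtm` and the sample sentences are hypothetical, `str.split` stands in for the Okt tokenizer, and tie-breaking and the output format (plain lists instead of a sparse matrix) differ from CountVectorizer.

```python
from collections import Counter


def top_k_dtm(docs, tokenize, k):
    """Count tokens per document, keeping only the k most frequent terms corpus-wide.

    Returns (vocab, rows): vocab is the alphabetically sorted list of kept
    terms and rows[i][j] is how often vocab[j] occurs in docs[i].
    """
    tokenized = [tokenize(doc) for doc in docs]
    totals = Counter(tok for toks in tokenized for tok in toks)
    # most_common keeps the k highest corpus-wide counts (ties: first seen wins)
    vocab = sorted(term for term, _ in totals.most_common(k))
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok in toks:
            j = index.get(tok)
            if j is not None:  # terms outside the top k are ignored
                row[j] += 1
        rows.append(row)
    return vocab, rows


docs = ["상담 요청 요청", "요청 처리 완료", "처리 지연"]
vocab, rows = top_k_dtm(docs, str.split, k=2)
print(vocab)  # ['요청', '처리']
print(rows)   # [[2, 0], [1, 1], [0, 1]]
```

Rows are documents and columns are terms, which is also the layout `fit_transform` returns.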

First, load the data and define a tokenizer that keeps only morphemes of two or more characters.

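import pandas as pd
from konlpy.tag import Okt  # Twitter was renamed Okt in KoNLPy 0.5.0

df = pd.read_csv('C:\\Users\\consultation_preprocessing_ver0.3.csv', encoding='utf8')
df  # in a notebook, this displays the DataFrame

cxt = df['CONTENT']
tagger = Okt()

# Keep only morphemes of two or more characters
# (morphs() returns every morpheme; tagger.nouns() would keep nouns only)

def kor_morphs(text):
    words = []
    for w in tagger.morphs(text):
        if len(w) > 1:
            words.append(w)
    return words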
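The same length filter can be written as a list comprehension. In the sketch below, the hypothetical `kor_morphs_safe` takes the tokenizer as a `morphs` parameter (an assumption made only so the filter can be tried without a Java runtime; in practice you would pass `Okt().morphs`) and returns an empty list for any non-string input, such as a missing CSV cell, which pandas represents as a float NaN.

```python
def kor_morphs_safe(text, morphs):
    """Return the tokens of `text` that are at least two characters long.

    `morphs` is any callable mapping a string to a list of tokens, e.g.
    Okt().morphs. Non-string input (a NaN cell is a float) yields [].
    """
    if not isinstance(text, str):
        return []
    return [w for w in morphs(text) if len(w) > 1]


print(kor_morphs_safe("상담 이 요청", str.split))  # ['상담', '요청']
print(kor_morphs_safe(float("nan"), str.split))     # []
```

CountVectorizer calls its tokenizer with a single argument, so the parameter would be bound first, e.g. `functools.partial(kor_morphs_safe, morphs=tagger.morphs)`. With this guard a missing cell becomes an all-zero row instead of an exception.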

Creating a CountVectorizer with this tokenizer and turning the cxt data into a TDM raised the error below:

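import time

from sklearn.feature_extraction.text import CountVectorizer

stopwords = []  # placeholder: the post does not show the actual stopword list

# Build a CountVectorizer that uses the morpheme tokenizer
cv = CountVectorizer(tokenizer=kor_morphs, max_features=1000, stop_words=stopwords)

start = time.time()  # record the start time

# Feed the whole column in at once
tdm = cv.fit_transform(cxt)
print("time :", time.time() - start)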

TypeError: No matching overloads found for kr.lucypark.okt.OktInterface.tokenize(float,java.lang.Boolean,java.lang.Boolean), options are:
public java.util.List kr.lucypark.okt.OktInterface.tokenize(java.lang.String,java.lang.Boolean,java.lang.Boolean

Solution

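The CONTENT column of consultation_preprocessing_ver0.3.csv contained NaN values. pandas stores a missing cell as NaN, which is a Python float, so tagger.morphs() received a float where Okt's Java tokenize method only accepts a java.lang.String. That is why the first argument in the error shows up as float.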
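A quick way to confirm this is to count and locate the missing cells before tokenizing. A minimal sketch using a small in-memory CSV; the `ID` column and the sample rows are made up for illustration and only stand in for the real file.

```python
import io

import pandas as pd

# A tiny stand-in for the real CSV: row ID 2 has an empty CONTENT cell.
raw = "ID,CONTENT\n1,상담 내용\n2,\n3,처리 완료\n"
df = pd.read_csv(io.StringIO(raw))

missing = df['CONTENT'].isna()           # True where the cell was empty
print(missing.sum())                     # 1
print(df.loc[missing, 'ID'].tolist())    # [2]
print(type(df.loc[1, 'CONTENT']))        # <class 'float'>
```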

After removing the NaN values and rerunning the code, the error was gone.
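Two common ways to handle the missing cells with pandas are sketched below, again on a made-up in-memory CSV. `dropna` removes the rows, which matches what this post did; `fillna('')` keeps every row and hands the tokenizer an empty string instead.

```python
import io

import pandas as pd

raw = "ID,CONTENT\n1,상담 내용\n2,\n3,처리 완료\n"
df = pd.read_csv(io.StringIO(raw))

# Option 1: drop rows whose CONTENT is missing.
cxt = df['CONTENT'].dropna()
print(cxt.tolist())         # ['상담 내용', '처리 완료']

# Option 2: keep every row, replacing missing text with an empty string.
cxt_filled = df['CONTENT'].fillna('')
print(cxt_filled.tolist())  # ['상담 내용', '', '처리 완료']
```

Dropping changes the row count, so the matrix rows no longer line up one-to-one with `df`; `df = df.dropna(subset=['CONTENT'])` keeps the other columns in step. `fillna('')` preserves the alignment at the cost of an all-zero row per missing cell.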

ABOUT ME

Hunt for Data
