Thinking Notes: December 2018

Correspondence analysis in Python

I run this on a Mac.

$ python3 plain_mca3.py cross_table_for_mca.csv

Contents of plain_mca3.py

import sys

# Import mca and pandas
import mca
import pandas as pd

# Read the CSV data
# index_col: number of the column to use as the row index (default: None)
df = pd.read_csv(sys.argv[1], index_col=0)

# Correspondence analysis
# ncol = df.shape[1]
# benzecri=False turns the Benzécri correction off
# mca_ben = mca.MCA(df, ncols=ncol, benzecri=False, TOL=1e-8)
mca_ben = mca.MCA(df, benzecri=False, TOL=1e-8)


# Print the row scores (coordinates)
result_row = pd.DataFrame(mca_ben.fs_r(N=2))
result_row.index = list(df.index)
print("Rows:")
print(result_row)
print()

# Print the column scores (coordinates)
result_col = pd.DataFrame(mca_ben.fs_c(N=2))
result_col.index = list(df.columns)
print("Columns:")
print(result_col)
print()



# Number of components N (number of eigenvalues):
# one less than the smaller of the column count and the row count
cnt_column = len(df.columns)
cnt_index = len(df.index)

cnt_eigenvalue = min(cnt_column, cnt_index) - 1


# Eigenvalues and contribution ratios (explained variance of the eigenvectors)
data = {'value': pd.Series(mca_ben.L),
        'ratio': mca_ben.expl_var(greenacre=False, N=cnt_eigenvalue)}
columns = ['value', 'ratio']
table2 = pd.DataFrame(data=data, columns=columns).fillna(0)
table2.index += 1
table2.loc['Σ'] = table2.sum()
table2.index.name = 'Factor'
print("Principal inertias(eigenvalues):")
print(table2)
print()



# Plotting libraries
import matplotlib.pyplot as plt
import matplotlib
# import random as rnd  # used when adding labels

# To display the plot inside Jupyter, put %matplotlib inline at the top of the program.
# The plot is then shown inline (otherwise a separate window opens).
# %matplotlib inline


# Set the figure size
plt.rcParams["figure.figsize"] = [7, 7]

fig, ax = plt.subplots()

# print(matplotlib.colors.cnames)  # check the available colors

# Plot the column categories (table header) and label each point
result_col.plot(0, 1, kind='scatter', ax=ax, color='C0', s=20, marker="o")
for k, v in result_col.iterrows():
    ax.annotate(k, v)

# Plot the row categories (table stub) and label each point
result_row.plot(0, 1, kind='scatter', ax=ax, color='#FFA500', s=20, marker='.')
for k, v in result_row.iterrows():
    ax.annotate(k, v)

# plt.rcParams['font.family'] = 'IPAexGothic'  # font for the whole figure
# plt.rcParams['font.size'] = 12  # font size (matplotlib's default is 10)
# plt.rcParams['xtick.labelsize'] = 10  # font size of the x-axis tick labels
# plt.rcParams['ytick.labelsize'] = 10  # font size of the y-axis tick labels
# matplotlib.font_manager._rebuild()

# Reference lines at zero and labels for the X and Y axes
plt.axhline(0, color='gray')
plt.axvline(0, color='gray')
plt.xlabel('Factor 1')
plt.ylabel('Factor 2')

# Optional (figure settings)
# plt.figure(figsize=(4,4))  # figure settings
# plt.rcParams["font.size"] = 10  # base font size


# Show the figure
plt.show()

----

Contents of cross_table_for_mca.csv

----

,yamamoto,instant,twice,facecare,linestone
strong,80,30,50,90,70
beautiful,20,80,60,30,50
cute,20,90,70,10,60
clever,60,20,50,70,40
big,80,10,50,90,40
smart,60,30,20,70,40
charming,30,60,80,10,90
lovely,40,90,70,50,60
fresh,10,80,40,20,30
traditional,70,10,20,50,40

----
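As a quick check on the table above, the following sketch parses the same CSV text with the standard library and derives the number of components the way the script does, as one less than the smaller of the row and column counts. The names `CSV_TEXT` and `count_components` are mine, for illustration only, and pandas is deliberately left out so the snippet runs on its own.

```python
import csv
import io

# The cross table from the post, pasted as text so the sketch is self-contained.
CSV_TEXT = """\
,yamamoto,instant,twice,facecare,linestone
strong,80,30,50,90,70
beautiful,20,80,60,30,50
cute,20,90,70,10,60
clever,60,20,50,70,40
big,80,10,50,90,40
smart,60,30,20,70,40
charming,30,60,80,10,90
lovely,40,90,70,50,60
fresh,10,80,40,20,30
traditional,70,10,20,50,40
"""


def count_components(text):
    """Return (n_rows, n_cols, n_components) for a cross table given as CSV text."""
    # Skip blank lines so a double-spaced paste still parses.
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    header, body = rows[0], rows[1:]
    n_cols = len(header) - 1  # the first header cell is the empty corner
    n_rows = len(body)
    return n_rows, n_cols, min(n_rows, n_cols) - 1


n_rows, n_cols, n_components = count_components(CSV_TEXT)
print(n_rows, n_cols, n_components)
```

For this table the smaller dimension is the column count, so the script ends up asking for four eigenvalues.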

Output

----

Rows:
                    0         1
strong      -0.345675 -0.033748
beautiful    0.418075  0.078905
cute         0.584282 -0.040474
clever      -0.356028 -0.006797
big         -0.541367  0.010732
smart       -0.401739  0.149244
charming     0.390651 -0.354082
lovely       0.244705  0.090889
fresh        0.596563  0.282539
traditional -0.550051 -0.081841

Columns:
                  0         1
yamamoto  -0.514181 -0.033323
instant    0.624395  0.192199
twice      0.251629 -0.113170
facecare  -0.523083  0.164357
linestone  0.110477 -0.198570

Principal inertias(eigenvalues):
           value     ratio
Factor
1       0.197552  0.856996
2       0.023801  0.103250
3       0.007040  0.030540
4       0.002124  0.009214
Σ       0.230517  1.000000

----
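The `ratio` column is each eigenvalue divided by the sum of the eigenvalues, and in simple correspondence analysis that sum (the total inertia) equals the chi-square statistic of the table divided by its grand total. The sketch below checks both against the numbers printed above using only the standard library. The helper name `total_inertia` is hypothetical, and the chi-square identity assumes that the `mca` call in the script amounts to plain correspondence analysis of the cross table.

```python
# The cross table from the post (rows = words, columns = brands).
TABLE = [
    [80, 30, 50, 90, 70],  # strong
    [20, 80, 60, 30, 50],  # beautiful
    [20, 90, 70, 10, 60],  # cute
    [60, 20, 50, 70, 40],  # clever
    [80, 10, 50, 90, 40],  # big
    [60, 30, 20, 70, 40],  # smart
    [30, 60, 80, 10, 90],  # charming
    [40, 90, 70, 50, 60],  # lovely
    [10, 80, 40, 20, 30],  # fresh
    [70, 10, 20, 50, 40],  # traditional
]

# Eigenvalues as printed in the "value" column of the output.
EIGENVALUES = [0.197552, 0.023801, 0.007040, 0.002124]


def total_inertia(table):
    """Chi-square statistic of the table divided by its grand total."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2 / n


inertia = total_inertia(TABLE)
ratios = [value / sum(EIGENVALUES) for value in EIGENVALUES]

print(inertia)  # compare with the Σ row of the "value" column
print(ratios)   # compare with the "ratio" column
```

Because the first two factors carry about 96% of the inertia between them, the two-dimensional plot drawn by the script loses little of the table's structure.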

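When the cross table is about the size of this CSV, the code above produces proper results.

For some tables, however, the output does not come out correctly.

In that case the TOL threshold in the following line needs to be loosened.

mca_ben = mca.MCA(df, benzecri=False, TOL=1e-8)

As for the value I finally settled on, I think I either removed this argument altogether or gave it even more decimal places. Probably...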
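To make the role of the threshold concrete, here is a toy sketch of a tolerance cutoff applied to a list of eigenvalues. It assumes that `TOL` works as a floor below which eigenvalues are dropped, which is my understanding of the `mca` package rather than something this post states. The function `keep_above_tol` and the second eigenvalue list are made up for illustration.

```python
def keep_above_tol(eigenvalues, tol):
    """Keep the leading eigenvalues until the first one that falls below tol."""
    kept = []
    for value in eigenvalues:
        if value < tol:
            break
        kept.append(value)
    return kept


# Eigenvalues printed in the post.
from_post = [0.197552, 0.023801, 0.007040, 0.002124]
# Made-up example with one very small trailing eigenvalue.
made_up = [0.15, 0.02, 0.003, 0.00002]

for tol in (1e-4, 1e-8):
    n_post = len(keep_above_tol(from_post, tol))
    n_made_up = len(keep_above_tol(made_up, tol))
    print(tol, n_post, n_made_up)
```

With the made-up values, a cutoff of 1e-4 leaves fewer components than the four the script requests through `cnt_eigenvalue`, which is one way the output could come out incomplete. Whether that is what happened with the tables mentioned above is a guess.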

Saturday, December 29, 2018

MCA (correspondence analysis) in Python