Introduction

MNIST is a simple dataset: “If it doesn’t work on MNIST, it won’t work at all. Well, if it does work on MNIST, it may still fail on others.” We therefore further tested the reconstruction ability on the more complex F-MNIST dataset. The results show that AggMap is unable to restore the original images completely.

However, pixels that are related to each other are still clustered together and form specific “patches”. For example, the strap on the “Bag” image and the sleeves on the “Dress” image still have the same shape patterns as in the original images. Although the reconstruction ability of AggMap on F-MNIST is not perfect, in general it still has excellent agglomeration capabilities.

[ ]:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.utils import shuffle
import matplotlib.pyplot as plt
from aggmap import AggMap

[19]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data() #load data
[20]:
_, w, h = x_train.shape
orignal_cols = ['p-%s' % str((i+1)).zfill(len(str(w*h))) for i in range(w*h)]
x_train_df = pd.DataFrame(x_train.reshape(x_train.shape[0], w*h), columns=orignal_cols)
x_test_df = pd.DataFrame(x_test.reshape(x_test.shape[0], w*h), columns=orignal_cols)
[21]:
ax = plt.imshow(x_train_df.iloc[0].values.reshape(w,h))
../_images/_example_PoC_fmnist_5_0.png
[22]:
ax = plt.imshow(x_test_df.iloc[0].values.reshape(w,h))
../_images/_example_PoC_fmnist_6_0.png

Step1: F-MNIST pixel random permutation

[23]:
shuffled_cols = shuffle(orignal_cols, random_state=111)
x_train_df_shuffled = x_train_df[shuffled_cols]
x_test_df_shuffled = x_test_df[shuffled_cols]
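The shuffle above is a fixed permutation of the pixel columns, so no pixel values are lost; only their spatial arrangement is. The following minimal sketch illustrates this on a toy 4x4 array using NumPy only. The toy array, the seed, and the variable names are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np

# Illustrative seed; any fixed seed gives a reproducible permutation.
rng = np.random.default_rng(111)

# A toy 4x4 "image", flattened into 16 pixel features.
image = np.arange(16).reshape(4, 4)
flat = image.reshape(-1)

# Shuffle the pixel positions with one fixed permutation.
perm = rng.permutation(flat.size)
shuffled = flat[perm]

# argsort of a permutation yields its inverse permutation.
inverse = np.argsort(perm)
restored = shuffled[inverse].reshape(4, 4)

print(np.array_equal(restored, image))  # True
```

AggMap is not given this inverse permutation. It has to recover a spatial arrangement from the relationships between pixel columns alone, which is why an exact restoration is not guaranteed.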
[24]:
ax = plt.imshow(x_train_df_shuffled.iloc[0].values.reshape(w,h))
../_images/_example_PoC_fmnist_10_0.png
[25]:
ax = plt.imshow(x_test_df_shuffled.iloc[0].values.reshape(w,h))
../_images/_example_PoC_fmnist_11_0.png

Step2: AggMap pre-fitting on training set

[26]:
mp = AggMap(x_train_df_shuffled, metric='correlation')
mp = mp.fit(cluster_channels=1, var_thr=0, verbose=0, densmap=True)


2021-11-10 11:23:08,059 - INFO - [bidd-aggmap] - Calculating distance ...
2021-11-10 11:23:08,096 - INFO - [bidd-aggmap] - the number of process is 16
100%|##########| 306936/306936 [01:04<00:00, 4785.58it/s]
100%|##########| 306936/306936 [00:00<00:00, 2771348.63it/s]
100%|##########| 784/784 [00:01<00:00, 406.46it/s]


/home/shenwanxiang/anaconda3/envs/molmap/lib/python3.6/site-packages/umap/umap_.py:1736: UserWarning: using precomputed metric; transform will be unavailable for new data and inverse_transform will be unavailable for all data
  "using precomputed metric; transform will be unavailable for new data and inverse_transform "
2021-11-10 11:24:14,519 - INFO - [bidd-aggmap] - applying hierarchical clustering to obtain group information ...


2021-11-10 11:24:18,224 - INFO - [bidd-aggmap] - Applying grid assignment of feature points, this may take several minutes(1~30 min)


2021-11-10 11:24:18,895 - INFO - [bidd-aggmap] - Finished
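The `metric='correlation'` setting suggests that pixel features are compared by correlation distance, and the count of 306936 in the log above matches the number of unordered pairs among 784 pixel features. The sketch below shows that calculation on synthetic data with plain NumPy. It is an illustration under those assumptions, not AggMap's internal code, and `correlation_distance_matrix` is a hypothetical helper.

```python
import numpy as np


def correlation_distance_matrix(X):
    """Pairwise correlation distance (1 - Pearson r) between the columns of X.

    Hypothetical helper for illustration; AggMap's own routine may differ.
    """
    corr = np.corrcoef(X, rowvar=False)
    return 1.0 - corr


rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))

# Columns 0 and 1 follow the same signal; column 2 is independent noise.
X = np.hstack([
    base,
    base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])
D = correlation_distance_matrix(X)

# Related features end up much closer than unrelated ones.
print(D[0, 1] < D[0, 2])  # True

# Number of unordered feature pairs for 28 * 28 = 784 pixels.
n_features = 28 * 28
n_pairs = n_features * (n_features - 1) // 2
print(n_pairs)  # 306936
```

Pixels that vary together across the training images get a small distance, which is what would let them be placed near each other and form the "patches" described in the introduction.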

Step3: AggMap transformation on training and test set

[27]:
x_train_restructured = mp.batch_transform(x_train_df_shuffled.values[:10])
x_test_restructured = mp.batch_transform(x_test_df_shuffled.values[:10])
100%|##########| 10/10 [00:01<00:00,  5.14it/s]
100%|##########| 10/10 [00:01<00:00,  5.23it/s]
[28]:
ax = plt.imshow(x_train_restructured[0].reshape(*mp.fmap_shape))
../_images/_example_PoC_fmnist_18_0.png
[29]:
ax = plt.imshow(x_test_restructured[0].reshape(*mp.fmap_shape))
../_images/_example_PoC_fmnist_19_0.png
[ ]:
UMAP()
[18]:
from umap import UMAP
[58]:
mp = mp.fit(cluster_channels=1, var_thr=0.5, verbose=0, densmap=True, #force_approximation_algorithm=True,
            dens_lambda=1.0,
            dens_frac=0.9,
            dens_var_shift=0.5, )


/home/shenwanxiang/anaconda3/envs/molmap/lib/python3.6/site-packages/umap/umap_.py:1736: UserWarning: using precomputed metric; transform will be unavailable for new data and inverse_transform will be unavailable for all data
  "using precomputed metric; transform will be unavailable for new data and inverse_transform "
2021-11-10 11:33:38,441 - INFO - [bidd-aggmap] - applying hierarchical clustering to obtain group information ...


2021-11-10 11:33:42,743 - INFO - [bidd-aggmap] - Applying grid assignment of feature points, this may take several minutes(1~30 min)


2021-11-10 11:33:43,351 - INFO - [bidd-aggmap] - Finished
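The refit above raises `var_thr` from 0 to 0.5. Assuming `var_thr` acts as a variance threshold that drops low-variance features (the scatter-plot file name logged afterwards mentions 781 feature points rather than 784, which would be consistent with this), the idea can be sketched as follows. `variance_filter` is a hypothetical helper, and whether the real comparison is strict is also an assumption.

```python
import numpy as np


def variance_filter(X, var_thr):
    """Keep only the columns of X whose variance exceeds var_thr.

    Hypothetical helper for illustration; not taken from AggMap.
    """
    variances = X.var(axis=0)
    keep = variances > var_thr  # strict comparison is an assumption
    return X[:, keep], keep


# Toy data: a constant column, a nearly constant column, a varying column.
X = np.array([
    [0.0, 1.0, 10.0],
    [0.0, 1.2, 20.0],
    [0.0, 0.8, 30.0],
])

X_kept, keep = variance_filter(X, var_thr=0.5)
print(keep)          # only the third column passes the threshold
print(X_kept.shape)  # (3, 1)
```

For image data, such a filter would mainly remove pixels that are almost always the same value, such as border pixels that are nearly always background.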
[59]:
mp.plot_scatter(radius=5)


2021-11-10 11:33:43,371 - INFO - [bidd-aggmap] - generate file: ./feature points_781_correlation_umap_scatter
2021-11-10 11:33:43,386 - INFO - [bidd-aggmap] - save html file to ./feature points_781_correlation_umap_scatter
[60]:
x_train_restructured = mp.batch_transform(x_train_df_shuffled.values[:10])
x_test_restructured = mp.batch_transform(x_test_df_shuffled.values[:10])
100%|##########| 10/10 [00:01<00:00,  5.16it/s]
100%|##########| 10/10 [00:01<00:00,  5.29it/s]
[61]:
ax = plt.imshow(x_train_restructured[0].reshape(*mp.fmap_shape))
../_images/_example_PoC_fmnist_25_0.png
[76]:
ax = plt.imshow(x_test_restructured[2].reshape(*mp.fmap_shape))
../_images/_example_PoC_fmnist_26_0.png
[62]:
ax = plt.imshow(x_test_restructured[0].reshape(*mp.fmap_shape))
../_images/_example_PoC_fmnist_27_0.png