Introduction

In this example, we demonstrate how AggMap restructures features on randomized MNIST data. Specifically, we first randomly permute the pixels to generate unordered data, which breaks the structured MNIST images into randomly permuted pixels. We then use AggMap to reconstruct the images from these random pixels.

To randomize the MNIST data, we first reshaped each 28x28 image into 784 feature points (FPs), permuted these 784 FPs randomly, and then reshaped them back into shuffled 28x28 images. The same permutation is applied to every image. In the randomly permuted images (OrgRP1), the spatial correlation of the original images is completely destroyed.
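The randomization can be written directly with NumPy. The minimal sketch below uses random integers instead of real MNIST images (an assumption made only to keep it self-contained) and shows two points: every image is scrambled with the same permutation, and because of that no pixel value is lost and the scrambling can be undone exactly with the inverse permutation. Only the spatial arrangement is lost, and that arrangement is what AggMap has to recover without being told the permutation.

```python
import numpy as np

# Assumption: random integers stand in for real 28x28 MNIST images
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28))

n_samples, h, w = images.shape
flat = images.reshape(n_samples, h * w)             # each image -> 784 feature points (FPs)

perm = rng.permutation(h * w)                       # ONE random order, shared by all images
shuffled = flat[:, perm].reshape(n_samples, h, w)   # scrambled 28x28 images

# Because the same permutation is used for every sample, it can be inverted exactly
inverse = np.argsort(perm)
restored = shuffled.reshape(n_samples, h * w)[:, inverse].reshape(n_samples, h, w)
```

A per-image permutation would destroy the shared structure across samples; a single shared permutation keeps the pixel-to-pixel correlations intact, just at unknown positions.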

After AggMap learns from these randomized MNIST data in an unsupervised manner, the scrambled MNIST images can be reconstructed into well-structured images that are very similar to the original images. Moreover, the reconstruction ability of AggMap depends on the number of randomized samples used for the unsupervised pre-fitting. You can try fitting AggMap with different numbers of randomized samples, transforming the randomized data, and comparing the results.

Step0: Import AggMap and the original MNIST data

[2]:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.utils import shuffle
import matplotlib.pyplot as plt
from aggmap import AggMap

[3]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data() #load data
[4]:
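_, w, h = x_train.shape  # x_train holds 60000 images of 28x28 pixels
# Name each pixel column 'p-001' ... 'p-784'
orignal_cols = ['p-%s' % str((i+1)).zfill(len(str(w*h))) for i in range(w*h)]
# Flatten each 28x28 image into one row of 784 feature points
x_train_df = pd.DataFrame(x_train.reshape(x_train.shape[0], w*h), columns=orignal_cols)
x_test_df = pd.DataFrame(x_test.reshape(x_test.shape[0], w*h), columns=orignal_cols)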
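The column names built above are zero-padded with `zfill`. A small check using the 28x28 MNIST shape illustrates what the padding gives: the names sort lexicographically in the same order as the pixels, which no longer holds without padding.

```python
# Reproduce the naming scheme for a 28x28 image
w, h = 28, 28
n_features = w * h                      # 784 pixels
width = len(str(n_features))            # 3 digits
cols = ['p-%s' % str(i + 1).zfill(width) for i in range(n_features)]
print(cols[:3], cols[-1])               # ['p-001', 'p-002', 'p-003'] p-784

# Without padding, lexicographic order no longer matches pixel order
unpadded = ['p-%s' % (i + 1) for i in range(n_features)]
print(sorted(unpadded)[:3])             # ['p-1', 'p-10', 'p-100']
```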
[5]:
ax = plt.imshow(x_train_df.iloc[0].values.reshape(w,h))
[Figure: first training image, original pixel order]
[6]:
ax = plt.imshow(x_test_df.iloc[0].values.reshape(w,h))
[Figure: first test image, original pixel order]

Step1: MNIST pixel random permutation

[7]:
# One fixed (seeded) random column order, applied identically to the training and test sets
shuffled_cols = shuffle(orignal_cols, random_state=111)
x_train_df_shuffled = x_train_df[shuffled_cols]
x_test_df_shuffled = x_test_df[shuffled_cols]
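Permuting the DataFrame columns is equivalent to the reshape-and-permute procedure described in the introduction. The toy example below (a synthetic table of 4 samples with a 3x3 "image", not MNIST) illustrates that `sklearn.utils.shuffle` only reorders the list of column names, and that selecting the original column order restores the data unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.utils import shuffle

# Synthetic data: 4 samples x 9 "pixels" (a 3x3 image); values are illustrative only
cols = ['p-%d' % (i + 1) for i in range(9)]
df = pd.DataFrame(np.arange(36).reshape(4, 9), columns=cols)

# Same idea as Step1: permute the column order once, with a fixed seed
shuffled_cols = shuffle(cols, random_state=111)
df_shuffled = df[shuffled_cols]

# Values are untouched, only their column positions move;
# re-selecting the original column order undoes the permutation
df_restored = df_shuffled[cols]
```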
[8]:
ax = plt.imshow(x_train_df_shuffled.iloc[0].values.reshape(w,h))
[Figure: first training image after random pixel permutation]
[9]:
ax = plt.imshow(x_test_df_shuffled.iloc[0].values.reshape(w,h))
[Figure: first test image after random pixel permutation]

Step2: AggMap pre-fitting on the training set

[10]:
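# Learn the feature layout from the shuffled training data only, using correlation distance between pixels
mp = AggMap(x_train_df_shuffled, metric='correlation')
mp = mp.fit(cluster_channels=1, var_thr=0, verbose=0)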
2022-08-01 15:00:08,013 - INFO - [bidd-aggmap] - Calculating distance ...
2022-08-01 15:00:08,041 - INFO - [bidd-aggmap] - the number of process is 16
100%|#######################################################################################################################################| 306936/306936 [00:35<00:00, 8665.40it/s]
100%|####################################################################################################################################| 306936/306936 [00:00<00:00, 5186856.19it/s]
100%|##############################################################################################################################################| 784/784 [00:00<00:00, 961.65it/s]
2022-08-01 15:00:44,524 - INFO - [bidd-aggmap] - applying hierarchical clustering to obtain group information ...
2022-08-01 15:00:46,360 - INFO - [bidd-aggmap] - Applying grid assignment of feature points, this may take several minutes(1~30 min)
2022-08-01 15:00:46,799 - INFO - [bidd-aggmap] - Finished
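The log above mentions a distance calculation followed by hierarchical clustering. The first progress bar runs 306936 iterations, which equals 784 * 783 / 2, the number of distinct pixel pairs; this suggests a pairwise feature-to-feature distance. The sketch below is not AggMap's code: it uses SciPy's `pdist` with the `'correlation'` metric (1 minus the Pearson correlation between two features across samples) on synthetic data, and the `'average'` linkage method is our own assumption.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

# Assumption: synthetic data, 200 samples x 10 features
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n_samples)  # features 0 and 1 strongly correlated

# Feature-feature correlation distance: features are the rows of X.T
d = pdist(X.T, metric='correlation')     # condensed form with n*(n-1)/2 entries
D = squareform(d)                        # full 10x10 distance matrix
print(len(d), 784 * 783 // 2)            # 45 306936

# Hierarchical clustering on the condensed distances ('average' linkage is an assumption)
Z = linkage(d, method='average')
```

The first merge in `Z` joins the two correlated features, which is the kind of grouping that lets correlated pixels end up next to each other.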

Step3: AggMap transformation on the training and test sets

[11]:
# Apply the layout learned in Step2 to every training and test sample
x_train_restructured = mp.batch_transform(x_train_df_shuffled.values)
x_test_restructured = mp.batch_transform(x_test_df_shuffled.values)
100%|#########################################################################################################################################| 60000/60000 [00:10<00:00, 5493.30it/s]
100%|#########################################################################################################################################| 10000/10000 [00:01<00:00, 7557.54it/s]
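Once a layout has been learned, transforming a sample should only require placing each feature value into its assigned grid cell. The helpers below are a hypothetical illustration of that idea, not AggMap's implementation: `to_feature_map`, `batch_to_feature_maps` and the `grid_rows`/`grid_cols` layout are names made up for this sketch, and the trailing channel axis is an assumption mirroring `cluster_channels=1`.

```python
import numpy as np

def to_feature_map(x, grid_rows, grid_cols, fmap_shape):
    """Hypothetical helper: place one 1D feature vector onto a 2D grid.

    grid_rows, grid_cols give the (row, col) cell assigned to each feature;
    cells that receive no feature stay 0.
    """
    fmap = np.zeros(fmap_shape, dtype=float)
    fmap[grid_rows, grid_cols] = x
    return fmap

def batch_to_feature_maps(X, grid_rows, grid_cols, fmap_shape):
    # Same fixed layout for every sample; add a trailing channel axis (assumption)
    maps = [to_feature_map(x, grid_rows, grid_cols, fmap_shape) for x in X]
    return np.stack(maps)[..., np.newaxis]

# Usage: two samples with 3 features each on a 2x2 grid (one cell left empty)
X = np.arange(6).reshape(2, 3)
maps = batch_to_feature_maps(X, grid_rows=[0, 0, 1], grid_cols=[0, 1, 1], fmap_shape=(2, 2))
print(maps.shape)   # (2, 2, 2, 1)
```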
[12]:
ax = plt.imshow(x_train_restructured[0].reshape(*mp.fmap_shape))
[Figure: first training image restructured by AggMap]
[13]:
ax = plt.imshow(x_test_restructured[0].reshape(*mp.fmap_shape))
[Figure: first test image restructured by AggMap]
[14]:
ax = plt.imshow(x_test_restructured[1].reshape(*mp.fmap_shape))
[Figure: second test image restructured by AggMap]
[15]:
ax = plt.imshow(x_test_restructured[2].reshape(*mp.fmap_shape))
[Figure: third test image restructured by AggMap]

Step4: AggMap visualization

The scatter plot and the grid plot show the final optimized position that AggMap assigns to each pixel, even though the pixels were supplied in an arbitrary (shuffled) order.
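The output file names mention UMAP, and the fitting log mentions a grid assignment of feature points. One standard way to map points from a 2D embedding onto a regular grid, one point per cell, is to solve a linear assignment problem that minimizes the total squared distance between points and cells. The sketch below illustrates that technique with SciPy's `linear_sum_assignment` on random 2D points standing in for an embedding; we assume AggMap does something along these lines, but this is not its code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n_side = 4
n_points = n_side * n_side                 # one feature point per grid cell in this toy

# Assumption: random points stand in for a 2D (e.g. UMAP) embedding of the features
embedding = rng.uniform(size=(n_points, 2))

# Regular grid of cell centres on the same [0, 1] range as the embedding
gy, gx = np.meshgrid(np.linspace(0, 1, n_side), np.linspace(0, 1, n_side), indexing='ij')
grid = np.column_stack([gx.ravel(), gy.ravel()])

# Cost = squared distance from each point to each cell; solve the one-to-one assignment
cost = cdist(embedding, grid, metric='sqeuclidean')
point_idx, cell_idx = linear_sum_assignment(cost)

rows, cols = np.divmod(cell_idx, n_side)   # cell index -> (row, col) on the grid
```

Each point gets exactly one cell and no cell is shared, so neighbouring points in the embedding tend to become neighbouring pixels on the grid.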

[20]:
# the scatter plot
mp.plot_scatter(radius = 6, enabled_data_labels=True)
2022-08-01 15:04:55,078 - INFO - [bidd-aggmap] - generate file: ./feature points_717_correlation_umap_scatter
2022-08-01 15:04:55,084 - INFO - [bidd-aggmap] - save html file to ./feature points_717_correlation_umap_scatter
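The file names above report 717 feature points rather than 784. A plausible explanation, which we have not verified against the AggMap source, is that `var_thr=0` discards pixels whose variance across the training set is not above 0, i.e. border pixels that are black in every MNIST training image. Dropping constant pixels would also suit the correlation metric, since the Pearson correlation of a constant vector is undefined. Such a filter is easy to express in pandas; the toy table below is illustrative only.

```python
import pandas as pd

# Toy data: 5 samples x 4 pixels; 'p-1' and 'p-4' are always 0, like MNIST border pixels
df = pd.DataFrame({'p-1': [0, 0, 0, 0, 0],
                   'p-2': [0, 3, 9, 1, 0],
                   'p-3': [5, 5, 0, 7, 2],
                   'p-4': [0, 0, 0, 0, 0]})

# Assumption: keep only features whose variance exceeds the threshold
var_thr = 0
kept = df.columns[df.var().gt(var_thr).to_numpy()]
print(list(kept))   # ['p-2', 'p-3']
```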
[21]:
# the regular 2D grid plot
mp.plot_grid(enabled_data_labels=True)
2022-08-01 15:05:06,872 - INFO - [bidd-aggmap] - generate file: ./feature points_717_correlation_umap_mp
2022-08-01 15:05:06,885 - INFO - [bidd-aggmap] - save html file to ./feature points_717_correlation_umap_mp