This tool allows you to train and visualize a Self-Organizing Map (SOM) based on the data and parameters you provide. A SOM is a type of neural network used for dimensionality reduction and clustering, creating a (typically 2D) map where similar input data points activate neurons that are close to each other on the map.
Sets the number of neurons (cells) horizontally and vertically in the SOM grid.
A larger grid contains more neurons, allowing the map to represent finer distinctions between clusters, but it takes longer to train.
Defines the total number of times the entire input dataset will be used for training.
One epoch means every input vector has been presented to the SOM once.
More epochs generally lead to a more refined map but increase training time.
Controls how much the neurons' weight vectors are adjusted during each update early in the training.
A higher value means larger initial adjustments.
This rate decreases automatically over the epochs, allowing for finer adjustments later on.
Typical values are between 0.1 and 1.0.
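The exact decay schedule is not specified here; as a minimal sketch (in Python, assuming a simple linear decay toward zero), the effective learning rate at a given epoch might look like this:

```python
def learning_rate(initial_rate: float, epoch: int, total_epochs: int) -> float:
    """Illustrative only: linear decay of the learning rate over training.

    The tool may use a different schedule (e.g. exponential decay); the key
    point is that the rate starts at `initial_rate` and shrinks each epoch.
    """
    return initial_rate * (1.0 - epoch / total_epochs)

# e.g. with initial_rate=0.5 and 100 epochs:
#   epoch 0  -> 0.500 (large, coarse adjustments)
#   epoch 50 -> 0.250
#   epoch 99 -> 0.005 (small, fine adjustments)
```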
What it is: This parameter defines the initial size of the "neighborhood" around the Best Matching Unit (BMU) on the SOM grid. The BMU is the neuron whose weight vector is closest to the current input data vector.
How it works: When an input vector is presented, and its BMU is found, not only the BMU's weights are updated, but also the weights of the neurons within its neighborhood. The Initial Radius sets how large this neighborhood is at the beginning of the training (Epoch 0).
Why it matters: A larger initial radius means that at the start, many neurons surrounding the BMU are influenced by a single input vector. This encourages neurons across a wide area of the map to learn similar features, promoting the global ordering of the map. It helps the SOM form its large-scale structure first.
Dynamic Behavior: Crucially, this radius shrinks over the training epochs. It starts large (based on Initial Radius) and gradually decreases towards zero (or a very small value). This shrinking allows the map to transition from organizing its overall structure to refining the local details and creating finer distinctions between neighboring neurons later in the training.
Effect:
Too large: It might smooth the map too much initially, potentially losing detail (the tool may automatically adjust it if it's larger than half the grid size).
Too small: It might prevent the map from ordering globally, potentially resulting in disconnected clusters.
Guideline: A common starting point is a value around half the width or height of the grid, but experimentation is often needed.
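As a rough illustration of the shrinking neighborhood described above, here is a Python sketch assuming the classic Kohonen-style exponential radius decay and Gaussian neighborhood, and an initial radius greater than 1; the tool's actual formulas may differ:

```python
import numpy as np

def radius_at_epoch(initial_radius: float, epoch: int, total_epochs: int) -> float:
    # Exponential shrink from the initial radius toward roughly one cell by the
    # final epoch (the tool may decay it closer to zero instead).
    # Assumes initial_radius > 1 so the time constant is well defined.
    time_constant = total_epochs / np.log(initial_radius)
    return initial_radius * np.exp(-epoch / time_constant)

def neighborhood_influence(grid_distance: float, radius: float) -> float:
    # Gaussian falloff: the BMU itself gets influence 1.0, while neurons near
    # the edge of the neighborhood get a value close to 0.
    return np.exp(-(grid_distance ** 2) / (2.0 * radius ** 2))

# A single neuron's weight update then looks roughly like:
#   w_new = w_old + learning_rate * influence * (input_vector - w_old)
```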
Selects the mathematical method used to calculate the "distance" or "similarity" between an input data vector and each neuron's weight vector. This determines which neuron becomes the BMU.
Euclidean: The standard straight-line distance in multi-dimensional space (sqrt(sum((v1_i - v2_i)^2))). Good for general-purpose clustering where the magnitude of vector components matters.
Informational*: A distance based on information theory concepts (entropy, mutual information) derived from component histograms. May capture different types of relationships, potentially useful if correlations or distributions within vectors are more important than absolute values.
Cosine: Calculates distance based on the angle between vectors (1 - Cosine Similarity). It ignores vector magnitude and focuses only on orientation. Useful for high-dimensional data like text document vectors, where vector length can vary greatly but direction indicates topic similarity.
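To make the first and last options concrete, here are minimal Python sketches of Euclidean and cosine distance (illustrative only; the tool's own implementation may differ in details such as zero-vector handling):

```python
import numpy as np

def euclidean_distance(v1: np.ndarray, v2: np.ndarray) -> float:
    # sqrt(sum((v1_i - v2_i)^2)): sensitive to the magnitude of differences.
    return float(np.sqrt(np.sum((v1 - v2) ** 2)))

def cosine_distance(v1: np.ndarray, v2: np.ndarray) -> float:
    # 1 - cosine similarity: compares orientation only, ignores vector length.
    similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(1.0 - similarity)
```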
*Informational Distance:
Add reference
Key Concepts Used
Entropy (H): Measures the "uncertainty" or "surprise" associated with a random variable or probability distribution. Zero entropy means no uncertainty (always the same value); high entropy means high uncertainty (many equally likely values). In a vector context, we can think about the entropy of the distribution of its component values.
Mutual Information (I): Measures how much information one random variable contains about another. How much does knowing X reduce uncertainty about Y? Zero if independent.
Conditional Entropy (H(Y|X)): Measures the uncertainty remaining about Y after observing X. H(Y|X) = H(X, Y) - H(X), where H(X, Y) is the joint entropy (uncertainty of the X,Y pair).
Core Calculation: Instead of operating directly on the raw values of vectors vec1 and vec2, this method first constructs a 2D histogram.
2D Histogram: Imagine plotting each corresponding component pair (vec1[i], vec2[i]) as a point on a 2D plane. The 2D histogram divides this plane into a grid of "bins" (like squares; the code uses 5x5 bins) and counts how many points fall into each bin.
What it Represents: This histogram captures the joint distribution of the component values from the two vectors. It shows how values in vec1 tend to pair up with values in vec2. For example:
If high values in vec1 tend to occur with high values in vec2, bins along the main diagonal will be highly populated.
If high values in vec1 tend to occur with low values in vec2, bins along the anti-diagonal will be populated.
If there's no relationship, points will be scattered more randomly across the bins.
Derivation: From this 2D histogram (normalized to represent joint probabilities P(X, Y)), marginal probabilities (P(X), P(Y)), and subsequently entropies (H(X), H(Y), H(X, Y)) and conditional entropies (H(X|Y), H(Y|X)) can be calculated.
The Informational Distance (in this specific code): H(X|Y) + H(Y|X)
This metric is related to "Variation of Information".
It measures the sum of information that is not shared between the two vectors.
H(X|Y): How much uncertainty remains about vec1 even after knowing vec2?
H(Y|X): How much uncertainty remains about vec2 even after knowing vec1?
If the two vectors are strongly related (predictable from each other based on their joint histogram), H(X|Y) and H(Y|X) will be low, resulting in a small distance.
If the vectors are independent or chaotically related, H(X|Y) will be close to H(X), H(Y|X) close to H(Y), and the distance will be large.
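Putting the steps above together, here is a minimal Python sketch of the calculation, assuming 5x5 bins and base-2 logarithms (the tool's exact binning and log base may differ):

```python
import numpy as np

def informational_distance(vec1: np.ndarray, vec2: np.ndarray, bins: int = 5) -> float:
    # 1. Joint histogram of corresponding component pairs (vec1[i], vec2[i]).
    counts, _, _ = np.histogram2d(vec1, vec2, bins=bins)
    p_xy = counts / counts.sum()          # joint probabilities P(X, Y)
    p_x = p_xy.sum(axis=1)                # marginal P(X)
    p_y = p_xy.sum(axis=0)                # marginal P(Y)

    def entropy(p: np.ndarray) -> float:
        p = p[p > 0]                      # empty bins contribute nothing
        return float(-np.sum(p * np.log2(p)))

    # 2. Entropies derived from the histogram.
    h_xy = entropy(p_xy.ravel())          # joint entropy H(X, Y)
    h_x = entropy(p_x)
    h_y = entropy(p_y)

    # 3. Distance = H(X|Y) + H(Y|X), using H(X|Y) = H(X, Y) - H(Y).
    return (h_xy - h_y) + (h_xy - h_x)
```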
Different from Euclidean: Euclidean distance focuses on the absolute difference between component values, so two vectors that follow a similar pattern but have very different values appear far apart.
Pattern Capture: Informational distance, being based on the joint histogram, is more sensitive to patterns of co-occurrence and the shape of the value distributions. It can detect correlations (linear or non-linear) or structural relationships between vector components, even if their absolute values differ significantly.
Example:
Consider vecA = (0.1, 0.2, 0.3), vecB = (0.6, 0.7, 0.8) (which is vecA + 0.5), and vecC = (0.8, 0.7, 0.6) (inversely related to vecB). Euclidean distance finds B and C closer than A and B, because their component values differ less in absolute terms. Informational distance, however, might find A and B very close (B is perfectly predictable from A despite the offset) and would also treat the inverse relationship between B and C as highly predictable (a clear pattern in the histogram, low conditional entropy), giving a relatively small distance in both cases. The metric seeks this predictability in the relationship, not closeness of the raw values.
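Using the sketch functions above (illustrative only, and somewhat degenerate with just three components per vector), the contrast can be reproduced numerically:

```python
import numpy as np

vecA = np.array([0.1, 0.2, 0.3])
vecB = np.array([0.6, 0.7, 0.8])    # vecA + 0.5
vecC = np.array([0.8, 0.7, 0.6])    # vecB reversed

euclidean_distance(vecA, vecB)      # ~0.87: large, because the raw values differ
euclidean_distance(vecB, vecC)      # ~0.28: small, because the raw values overlap
informational_distance(vecA, vecB)  # 0.0 with this sketch: B is fully predictable from A
informational_distance(vecB, vecC)  # 0.0 with this sketch: inverse, but still fully predictable
```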
Use Cases: This might be useful when the absolute magnitude isn't the main focus but rather how values vary together. Examples include time series analysis (do they rise and fall together?), biological data (correlated gene expression), or any data where distribution shape or interdependence is key.
Summary: Informational distance transforms vectors into a 2D histogram reflecting how their values pair up, then uses information theory concepts (entropy) to measure how "predictable" or "structured" that relationship is. A strong, predictable relationship (regardless of absolute values) results in a small distance.
The text area where you provide your input data.
Each line should represent one data vector.
Values within a vector should be separated by commas (CSV format).
All vectors must have the same number of dimensions (the same number of comma-separated values per line).
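For example, a hypothetical 3-dimensional dataset with four vectors could look like this:

```
0.12,0.85,0.33
0.90,0.10,0.45
0.05,0.02,0.98
0.76,0.64,0.21
```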
Note: The visualization uses the first 3 dimensions of the neuron weights to create an RGB color. If your data has more or fewer than 3 dimensions, the coloring will adapt (e.g., using the first dimension for grayscale if 1D, or ignoring dimensions beyond the 3rd).
Normalization Note: Normalization is crucial for the SOM to work effectively, especially when features have vastly different numerical ranges. The code automatically normalizes your input data (scaling each feature to a [0, 1] range) after parsing and before training begins. Even so, if your columns have vastly different numerical ranges, it is still recommended to normalize the data yourself before providing it.
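A minimal Python sketch of the per-feature min-max scaling described above (how the tool handles constant columns is an assumption here):

```python
import numpy as np

def normalize_minmax(data: np.ndarray) -> np.ndarray:
    # Scale each feature (column) independently to the [0, 1] range.
    mins = data.min(axis=0)
    maxs = data.max(axis=0)
    # Constant columns map to 0 instead of causing a division by zero.
    ranges = np.where(maxs > mins, maxs - mins, 1.0)
    return (data - mins) / ranges
```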