Seeking the soul of words. The mathematical world of language, from Word2Vec to Language Models and Neural Networks.
Word Embeddings · Language Models · Neural Networks · Attention
Transforming discrete words into continuous, dense, multi-dimensional floating-point vectors (Digital DNA).
Rule: *"You shall know a word by the company it keeps."* (J.R. Firth, 1957; the idea behind Mikolov et al.'s Word2Vec, 2013)
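The distributional idea can be seen with plain counting. This is a minimal sketch that assumes a made-up three-sentence corpus and a ±2-word window. It counts each word's neighbors, then compares those count vectors with cosine similarity (the cosine of the angle between two vectors). It illustrates the hypothesis with counts and does not reproduce Word2Vec's training procedure.

```python
from collections import Counter
import math

# Hypothetical toy corpus (invented for illustration)
corpus = [
    "the king rules the kingdom",
    "the queen rules the kingdom",
    "the man eats an apple",
]

def context_counts(target, sentences, window=2):
    """Count the words appearing within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

king, queen, apple = (context_counts(w, corpus) for w in ("king", "queen", "apple"))
print(f"king vs queen: {cosine(king, queen):.2f}")
print(f"king vs apple: {cosine(king, apple):.2f}")
```

"king" and "queen" occur in identical contexts, so their count vectors point in the same direction. "apple" shares no neighbors with "king".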
vector("King") - vector("Man") + vector("Woman") ≈ vector("Queen")
(Understanding word relationships using Numpy arrays)
import numpy as np
# Hypothetical 2-D word embeddings (real Word2Vec vectors typically have hundreds of dimensions)
# Dimensions: [Royalty_Score, Masculinity_Score]; a negative masculinity score means feminine
vec_king = np.array([0.9, 0.9])
vec_man = np.array([0.0, 0.9])
vec_woman = np.array([0.0, -0.9])
# The Word2Vec analogy arithmetic
vec_result = vec_king - vec_man + vec_woman
print("Computed Vector (King - Man + Woman):")
print(vec_result)
# Ideal vector for Queen: [high Royalty, negative Masculinity (i.e., feminine)]
vec_queen = np.array([0.9, -0.9])
print("\nTarget Vector (Queen):")
print(vec_queen)
print("\n💡 In this toy space the arithmetic lands exactly on 'Queen'; with real embeddings it only lands near it.")
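With real embeddings the arithmetic almost never lands exactly on a stored vector, which is why the equation uses ≈. The usual answer is the vocabulary word whose vector has the highest cosine similarity to the result, with the three input words excluded. This is a minimal sketch over an invented six-word vocabulary. The "queen" vector is deliberately not exactly [0.9, -0.9].

```python
import numpy as np

# Hypothetical 2-D vocabulary: [Royalty_Score, Masculinity_Score], values invented
vocab = {
    "king":     np.array([0.9, 0.9]),
    "queen":    np.array([0.8, -0.7]),
    "man":      np.array([0.0, 0.9]),
    "woman":    np.array([0.0, -0.9]),
    "princess": np.array([0.4, -0.8]),
    "apple":    np.array([-0.5, 0.0]),
}

def analogy(a, b, c, vocab):
    """Return the word closest (by cosine) to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = vocab[a] - vocab[b] + vocab[c]
    best_word, best_score = None, -np.inf
    for word, vec in vocab.items():
        if word in (a, b, c):
            continue  # the input words are not accepted as answers
        score = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score

word, score = analogy("king", "man", "woman", vocab)
print(f"king - man + woman ≈ {word} (cosine {score:.3f})")
```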
(The mathematical backbone of semantic search and NLP distance metrics)
import numpy as np
from numpy.linalg import norm
def cosine_similarity(vec1, vec2):
    """Calculates the cosine of the angle between two vectors. Range: -1 to 1."""
    return np.dot(vec1, vec2) / (norm(vec1) * norm(vec2))
# Hypothetical embeddings
vec_king = np.array([0.9, 0.8, 0.1])
vec_queen = np.array([0.9, -0.8, 0.1])
vec_man = np.array([0.1, 0.8, 0.0])
vec_apple = np.array([0.0, 0.0, 0.9])
print("Cosine Similarity Scores:")
print("-" * 30)
print(f"King & Queen : {cosine_similarity(vec_king, vec_queen):.3f}")
print(f"King & Man : {cosine_similarity(vec_king, vec_man):.3f}")
print(f"King & Apple : {cosine_similarity(vec_king, vec_apple):.3f}")
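print("\nConclusion: King & Man score highest (≈0.75): both are dominated by the same large second dimension.")
print("King & Queen score low (≈0.12): they match on the first dimension, but opposite signs on the second cancel most of the dot product.")
print("King & Apple is near 0 (≈0.08): almost orthogonal, i.e., unrelated.")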
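Semantic search applies the same formula to one query and many stored vectors at once. This is a minimal sketch that assumes invented 3-D document and query embeddings. Once every vector is normalized to unit length, cosine similarity reduces to a plain dot product, and a single matrix-vector multiply scores the whole collection.

```python
import numpy as np

# Hypothetical document embeddings (one row per document) and query, values invented
doc_names = ["royal wedding", "fruit salad", "palace history"]
doc_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9],
    [0.8, 0.3, 0.1],
])
query = np.array([1.0, 0.2, 0.0])  # stands in for an embedded query such as "kings and queens"

# L2-normalize once; after that, cosine similarity is just a dot product
doc_unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)
scores = doc_unit @ query_unit  # one cosine score per document

# Rank documents from most to least similar
ranking = [doc_names[i] for i in np.argsort(-scores)]
for name, s in sorted(zip(doc_names, scores), key=lambda p: -p[1]):
    print(f"{name:<15} {s:.3f}")
```

Production vector databases usually store the normalized vectors, so each query costs only dot products.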
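The answer to Word2Vec's "static" limitation: Word2Vec assigns each word a single fixed vector, so "bank" gets the same embedding in "river bank" and "bank account".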
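The static limitation shows up clearly in code. A static model such as Word2Vec stores one vector per word, so a lookup ignores the sentence around it. This is a minimal sketch with an invented 2-D embedding table. The static lookup returns the same "bank" vector in both sentences. A crude context-mixing rule produces different vectors. That rule averages each word with its immediate neighbors and is a hypothetical stand-in chosen only for illustration, not how real contextual models work.

```python
import numpy as np

# Hypothetical static embedding table: one fixed vector per word
# Dimensions: [Water_Feature, Financial_Feature], values invented
table = {
    "river":   np.array([0.9, 0.0]),
    "bank":    np.array([0.5, 0.5]),   # a single vector must cover both senses
    "account": np.array([0.0, 0.9]),
    "the":     np.array([0.1, 0.1]),
}

def static_embedding(sentence, i):
    """Static lookup: the surrounding words are ignored entirely."""
    return table[sentence[i]]

def toy_contextual_embedding(sentence, i):
    """Hypothetical context mixing: average the word with its immediate neighbors."""
    lo, hi = max(0, i - 1), min(len(sentence), i + 2)
    return np.mean([table[w] for w in sentence[lo:hi]], axis=0)

s1 = ["the", "river", "bank"]
s2 = ["the", "bank", "account"]

print("static     :", static_embedding(s1, 2), static_embedding(s2, 1))
print("contextual :", toy_contextual_embedding(s1, 2), toy_contextual_embedding(s2, 1))
```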
Markov Chains: Predicting the next state based exclusively on the current state, ignoring all previous history.
P(next word | current word)
Example: "The cat sat on the [___]".
The model looks only at the word 'the' and checks its statistical history to see how often 'mat' or 'chair' followed it.
Transition matrix: a (potentially massive) lookup table storing, for each word, the probability distribution over which word comes next.
P(word₂ | word₁) = Count(word₁, word₂) / Count(word₁)
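The counting formula can be computed directly with the standard library. This is a minimal sketch on the same toy sentence as the NLTK example. As an assumption, Count(word₁) counts only positions that have a successor, so every row of the resulting transition table sums to 1.

```python
from collections import Counter, defaultdict

tokens = "I love AI . I love Python . AI is amazing .".split()

# Count(word1) over positions that have a successor, and Count(word1, word2)
unigram_counts = Counter(tokens[:-1])
bigram_counts = Counter(zip(tokens, tokens[1:]))

# P(word2 | word1) = Count(word1, word2) / Count(word1)
transition = defaultdict(dict)
for (w1, w2), c in bigram_counts.items():
    transition[w1][w2] = c / unigram_counts[w1]

print(transition["love"])  # {'AI': 0.5, 'Python': 0.5}
print(transition["AI"])    # {'.': 0.5, 'is': 0.5}
```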
*Historical note: Andrey Markov famously applied this idea in 1913, analyzing the sequence of vowels and consonants in Alexander Pushkin's verse novel Eugene Onegin.*
(Building a Transition Matrix using NLTK N-grams)
from nltk import ngrams, ConditionalFreqDist
text = "I love AI . I love Python . AI is amazing .".split()
# Extract Bigrams (Pairs of consecutive words)
bigrams = list(ngrams(text, 2))
print("Sample Bigrams:", bigrams[:4], "...\n")
# Build a Conditional Frequency Distribution: raw bigram counts, one (unnormalized) transition-matrix row per word
cfd = ConditionalFreqDist(bigrams)
print("Next-word counts after 'AI':", dict(cfd['AI']))
print("Next-word counts after 'love':", dict(cfd['love']))
# Next Word Prediction
current_word = "I"
# .max() returns the most frequent follower, i.e., the most probable next word under the bigram model
next_word = cfd[current_word].max()
print(f"\nPrediction: After the word '{current_word}', the model predicts '{next_word}'.")
To predict the next word, the model uses a sliding window to look at the two preceding words (Trigrams) instead of just one.
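Conditioning on two words can resolve ties that one word cannot. This is a minimal sketch of a trigram (second-order Markov) predictor on an invented corpus. After "the" alone, "cat" and "mat" are tied at two occurrences each. The pair "on the" clearly favors "mat".

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus
tokens = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat sat on the mat .").split()

# Trigram model: condition on the two preceding words
trigram_counts = defaultdict(Counter)
for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
    trigram_counts[(w1, w2)][w3] += 1

def predict_next(w1, w2):
    """Most frequent word observed after the pair (w1, w2), or None if the pair is unseen."""
    counts = trigram_counts.get((w1, w2))
    if not counts:
        return None  # unseen context: a practical model would back off to bigrams or apply smoothing
    return counts.most_common(1)[0][0]

print(trigram_counts[("on", "the")])  # Counter({'mat': 2, 'rug': 1})
print(predict_next("on", "the"))      # mat
```

The trade-off is sparsity. Each extra word of context multiplies the number of possible contexts, so many pairs are never observed in training.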
The computational study of extracting subjective information, identifying whether the underlying emotional tone of a text is Positive, Negative, or Neutral.
(Implementing VADER - Valence Aware Dictionary and sEntiment Reasoner)
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
# VADER is a lexicon- and rule-based model specifically attuned to sentiment in social media text
analyzer = SentimentIntensityAnalyzer()
text = "I absolutely love this NLP course, it is incredibly engaging! But the homework is awful."
# Generate polarity scores
scores = analyzer.polarity_scores(text)
print("Input Text:", text)
print("\nRaw Scoring Metrics:", scores)
# 'compound' sums the valence of every lexicon word and normalizes the total to [-1, 1]
compound = scores['compound']
# Conventional thresholds from the VADER documentation
if compound >= 0.05:
    print("\nOverall Sentiment: Positive 😊")
elif compound <= -0.05:
    print("\nOverall Sentiment: Negative 😠")
else:
    print("\nOverall Sentiment: Neutral 😐")
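As we understand VADER's reference implementation, the compound score is the sum of the rule-adjusted word valences, squashed into [-1, 1] by x / √(x² + α) with α = 15. This is a minimal sketch of only that final step. It assumes an invented three-word mini-lexicon, not VADER's real lexicon values. It also omits VADER's rules for negation, intensifiers such as "absolutely", and the contrastive "but".

```python
import math

# Hypothetical mini-lexicon (invented values on VADER's -4..+4 valence scale)
LEXICON = {"love": 3.2, "engaging": 2.2, "awful": -3.1}

def normalize(total, alpha=15):
    """Squash an unbounded valence sum into (-1, 1), as VADER's compound score does."""
    return total / math.sqrt(total * total + alpha)

def toy_compound(text):
    """Sum lexicon valences of matching words, then normalize (no VADER rules applied)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return normalize(sum(LEXICON.get(w, 0.0) for w in words))

print(round(toy_compound("I love this course, it is engaging!"), 3))
print(round(toy_compound("The homework is awful."), 3))
```

The normalization saturates. Adding more positive words pushes the score closer to 1 but never past it.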
The Need for Sequence: How do we capture word order and grammatical structure as a sentence unfolds over time?
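A recurrent network reads one word at a time and carries a hidden state forward between steps. This is a minimal sketch of the standard (vanilla) RNN update. It assumes invented sizes and random, untrained weights. Feeding the same word vectors in a different order yields a different final state, which an order-blind average of word vectors cannot do.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and random (untrained) weights, for illustration only
input_dim, hidden_dim = 2, 3
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))   # input -> hidden
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))  # hidden -> hidden (the recurrence)
b_h = np.zeros(hidden_dim)

def rnn_forward(inputs):
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), one step per word."""
    h = np.zeros(hidden_dim)
    states = []
    for x_t in inputs:  # strictly sequential: step t needs h_{t-1}
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.array(states)

# The same three toy word vectors in two different orders
a, b, c = np.array([0.9, 0.1]), np.array([0.1, 0.1]), np.array([0.8, 0.2])
h1 = rnn_forward([a, b, c])[-1]
h2 = rnn_forward([c, b, a])[-1]
print("final state, order a-b-c:", np.round(h1, 3))
print("final state, order c-b-a:", np.round(h2, 3))
```

The loop is also the weakness. Step t cannot start until step t-1 has finished.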
An advanced RNN architecture specifically engineered to carry long-term dependencies across long sequences.
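Assuming the architecture meant here is the LSTM (Long Short-Term Memory), one cell step can be sketched as follows. The weights are random and untrained, and the gate order (input, forget, output, candidate) inside the stacked matrix is an arbitrary choice for this sketch. The key difference from a vanilla RNN is the separate cell state c_t, which is updated additively and controlled by gates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_dim, hidden_dim = 2, 3
# Hypothetical untrained weights: one stacked matrix for the 4 gates (i, f, o, g)
W = rng.normal(scale=0.5, size=(4 * hidden_dim, input_dim + hidden_dim))
b = np.zeros(4 * hidden_dim)

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM cell step on input x_t given the previous hidden and cell states."""
    z = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_t = f * c_prev + i * g   # forget part of the old memory, write part of the new candidate
    h_t = o * np.tanh(c_t)     # hidden state exposed to the next step / layer
    return h_t, c_t

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in [np.array([0.9, 0.1]), np.array([0.1, 0.1]), np.array([0.8, 0.2])]:
    h, c = lstm_step(x_t, h, c)
print("final hidden state:", np.round(h, 3))
print("final cell state:  ", np.round(c, 3))
```

When the forget gate is close to 1 and the input gate close to 0, c_t ≈ c_{t-1}. Information can then pass through many steps almost unchanged, which is what lets the cell hold long-term dependencies.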
How do we capture the "essence" of an input sequence and generate an entirely new output sequence from it (e.g., Machine Translation, Summarization)?
A mathematical mechanism allowing a model to calculate how strongly every word in a sequence relates to every other word simultaneously.
"The animal didn't cross the street because it was too tired."
How does the machine know what 'it' refers to: the street or the animal? A trained Self-Attention layer can assign a large weight between 'it' and 'animal', helping resolve the coreference.
(Using Matrix Multiplication to calculate focus weightings)
import numpy as np
# Sequence: "Bank", "of", "River"
words = ["Bank", "of", "River"]
# Simplified Embeddings: [Water_Feature, Financial_Feature]
vectors = np.array([
[0.9, 0.1], # Bank (Assuming riverbank context here)
[0.1, 0.1], # of
[0.8, 0.2] # River
])
# Core of Self-Attention: Q · K^T. Simplification: Q = K = the raw embeddings (no learned projections, no scaling, no softmax)
attention_scores = np.dot(vectors, vectors.T)
print("Raw Attention Scores for 'Bank' against all words:")
print(np.round(attention_scores[0], 2))
# Identify which other word 'Bank' attends to most (index 0, 'Bank' itself, is skipped)
highest_attention_idx = np.argmax(attention_scores[0][1:]) + 1
print(f"\nThe word 'Bank' attends most strongly to: '{words[highest_attention_idx]}'")
print("💡 'Bank' attends most to 'River', pulling its meaning toward the water sense!")
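The raw scores above are only the first step. The full formula from the 2017 Transformer paper is softmax(QKᵀ / √d_k) V: it scales the scores by √d_k, turns each row into weights that sum to 1, and uses those weights to mix the value vectors. This is a minimal sketch with the same toy vectors. It makes the simplifying assumption that there are no learned projection matrices, so Q = K = V = the embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # each row sums to 1
    return weights @ V, weights

words = ["Bank", "of", "River"]
X = np.array([[0.9, 0.1], [0.1, 0.1], [0.8, 0.2]])

# Simplifying assumption: no learned projections, so Q = K = V = X
output, weights = scaled_dot_product_attention(X, X, X)
print("Attention weights for 'Bank':", np.round(weights[0], 3))
print("Contextualized 'Bank' vector:", np.round(output[0], 3))
```

Without projections, 'Bank' still gives its largest weight to itself. In a trained Transformer, the learned Q, K and V projections decide where attention goes.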
By discarding recurrence entirely and relying solely on attention ("Attention Is All You Need", Vaswani et al., 2017), the Transformer processes all positions of a sequence at once instead of step by step, unlocking massively parallel training.
① Data Processing
② Feature Engineering
③ Semantic Representation
④ Deep Learning Architectures
💡 Key Takeaway: There is no ML without clean data. There is no comprehension without embeddings. Modern AI is the culmination of this entire pipeline.