2025 | AI | ML Researcher

ContextoSolver.

Reproducible NLP research framework benchmarking Word2Vec, fastText, GloVe, and SVD+PPMI on word similarity and beam-search navigation. GloVe achieves 99.8% navigation success rate.

01 Overview

ContextoSolver is a reproducible experimental framework that trains static word embeddings on the text8 corpus and evaluates them in two ways. The first is intrinsic: word-similarity correlation against human judgments (WordSim-353, SimLex-999). The second is a navigation task inspired by the word game Contexto: starting from one word, reach a target word by stepping through high cosine-similarity neighbors using beam search.

Four embedding families are implemented with aligned hyperparameters (50-dimensional vectors, window 5, min count 5): Word2Vec (Gensim skip-gram), fastText (subword-enriched), GloVe (vendored Stanford C reference), and count-based SVD on a harmonic-weighted PPMI co-occurrence matrix. Each model is trained with 5 random seeds to support variance reporting and paired statistical tests.

Key finding: GloVe achieves the strongest navigation metrics (99.8% success rate, 5.0 median steps), while Word2Vec scores highest on SimLex-999 yet fails more often on navigation. This tension between lexical-similarity benchmarks and geometric navigability motivates the dual-evaluation design.
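The navigation task can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration rather than the project's actual code: the function names (`cosine`, `nearest_neighbors`, `navigate`), the parameters `k`, `beam_width`, and `max_steps`, the choice to rank candidates by cosine similarity to the target, and the 2-dimensional toy vectors. A real run would use the 50-dimensional trained embeddings and vectorized NumPy operations.

```python
import math

# Toy 2-D "embeddings" (made up for illustration, not trained vectors).
VECTORS = {
    "cat": (1.0, 0.0),
    "dog": (0.9, 0.3),
    "wolf": (0.7, 0.6),
    "forest": (0.4, 0.9),
    "tree": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, vectors, k):
    """Return the k words most cosine-similar to `word`, best first."""
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return others[:k]


def navigate(start, target, vectors, k=2, beam_width=2, max_steps=10):
    """Beam search from start to target; returns a path or None on failure."""
    beam = [[start]]
    visited = {start}  # shared across the beam so no word is expanded twice
    for _ in range(max_steps):
        candidates = []
        for path in beam:
            for nxt in nearest_neighbors(path[-1], vectors, k):
                if nxt == target:
                    return path + [nxt]
                if nxt not in visited:
                    visited.add(nxt)
                    candidates.append(path + [nxt])
        if not candidates:
            return None  # dead end: every neighbor was already visited
        # Keep the partial paths whose last word is closest to the target.
        candidates.sort(
            key=lambda p: cosine(vectors[p[-1]], vectors[target]), reverse=True
        )
        beam = candidates[:beam_width]
    return None


print(navigate("cat", "tree", VECTORS, k=2, beam_width=2))
print(navigate("cat", "tree", VECTORS, k=1, beam_width=1))
```

With `k=2` the search finds the path cat, wolf, forest, tree (3 steps). With `k=1` it gets stuck bouncing between cat and dog and returns `None`, which is the kind of failure a success-rate metric counts. Under these assumptions, the number of steps is `len(path) - 1`.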

02 Problem & Solution

Problem

Word embedding models are typically evaluated only on intrinsic word-similarity benchmarks, which may not reflect their practical utility for multi-hop reasoning tasks. There is no standard extrinsic benchmark for geometric navigability in embedding space.

Solution

Designed a beam-search navigation task as a computable analogue to Contexto: move from start to target through cosine-similarity neighbors. Automated paired significance tests (McNemar, Wilcoxon, permutation) formalize model comparisons across both intrinsic and extrinsic dimensions.
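As an illustration of one of the paired tests named above, here is a minimal sketch of an exact two-sided McNemar test on per-trial success flags for two models run on the same start/target pairs. Applying McNemar to navigation success/failure is an assumption about how the project uses it; the function name `mcnemar_exact` and the outcome lists are hypothetical, and the numbers are invented toy data, not project results.

```python
from math import comb


def mcnemar_exact(successes_a, successes_b):
    """Exact two-sided McNemar test on paired boolean outcomes.

    Returns (b, c, p_value), where b counts trials only model A solved
    and c counts trials only model B solved. Concordant trials are ignored.
    """
    if len(successes_a) != len(successes_b):
        raise ValueError("outcome lists must be paired and of equal length")
    b = sum(1 for x, y in zip(successes_a, successes_b) if x and not y)
    c = sum(1 for x, y in zip(successes_a, successes_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence of a difference
    # Under H0 each discordant pair favors A or B with probability 1/2,
    # so min(b, c) follows Binomial(n, 0.5); double the lower tail.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return b, c, min(1.0, 2 * tail)


# Invented toy outcomes for 10 shared trials (True = target reached).
model_a = [True] * 10
model_b = [True] * 4 + [False] * 6
print(mcnemar_exact(model_a, model_b))  # (6, 0, 0.03125)
```

The exact binomial form avoids the chi-square approximation, which is unreliable when there are few discordant pairs.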

03 Highlights

01 Trained 4 embedding families (Word2Vec, fastText, GloVe, SVD+PPMI) on text8 with aligned hyperparameters across 5 random seeds
02 Beam-search navigation task: move from start→target through cosine-similarity neighbors in embedding space
03 GloVe: 99.8% success rate, 5.0 median steps; Word2Vec: 81.3%. The gap reveals a tension between navigation and intrinsic benchmarks
04 Automated paired significance tests (McNemar, Wilcoxon, permutation) across all model pairs and metrics
05 Full reproducibility: YAML configs, hashed run IDs, multi-seed variance reporting, pytest suite
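One common way to derive a hashed run ID is to hash a canonical serialization of the run's configuration, so that the same settings always map to the same identifier. The sketch below assumes that approach. The function name `run_id`, the config keys, the SHA-256 choice, and the 12-character truncation are all hypothetical, not taken from the project.

```python
import hashlib
import json


def run_id(config, length=12):
    """Derive a short deterministic ID from a configuration dictionary."""
    # sort_keys makes the serialization independent of key insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


# Hypothetical config mirroring the hyperparameters listed in the overview.
config = {
    "model": "glove",
    "corpus": "text8",
    "dim": 50,
    "window": 5,
    "min_count": 5,
    "seed": 0,
}
print(run_id(config))
print(run_id({**config, "seed": 1}))  # a different seed gives a different ID
```

Because the ID depends only on the configuration contents, rerunning an identical config lands on the same ID, which makes cached or duplicated runs easy to detect.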

04 Metrics

99.8% GloVe Success Rate
4 Embedding Models
5 Random Seeds
200+ Trial Pairs

08 Stack

backend

Python
NumPy / SciPy
Gensim
scikit-learn
NLTK

other

pandas
matplotlib
pytest

09 Links

Source on GitHub