2025 · AI · ML Researcher

ContextoSolver.

Reproducible NLP research framework benchmarking Word2Vec, fastText, GloVe, and SVD+PPMI on word similarity and beam-search navigation. GloVe achieves a 99.8% navigation success rate.

01 Overview

ContextoSolver is a reproducible experimental framework that trains static word embeddings on the text8 corpus and evaluates them in two ways. The intrinsic evaluation correlates model similarity scores with human judgments (WordSim-353, SimLex-999). The extrinsic evaluation is a navigation task inspired by the word game Contexto: move from a start word to a target word through high cosine-similarity neighbors using beam search.

Four embedding families are implemented with aligned hyperparameters (50-dimensional vectors, window 5, min count 5): Word2Vec (Gensim skip-gram), fastText (subword-enriched), GloVe (vendored Stanford C reference implementation), and count-based SVD on a harmonic-weighted PPMI co-occurrence matrix. Each model is trained with 5 random seeds to support variance reporting and paired statistical tests.

Key finding: GloVe achieves the strongest navigation metrics (99.8% success rate, 5.0 median steps), while Word2Vec scores highest on SimLex-999 yet fails more often on navigation (81.3% success rate). This tension between lexical similarity benchmarks and geometric navigability motivates the dual-evaluation design.
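The count-based baseline mentioned above (SVD on a harmonic-weighted PPMI matrix) can be illustrated with a small NumPy sketch. This is not the project's code: the function names are hypothetical, the dense matrix only suits toy vocabularies, and two details are assumptions made for illustration, namely that "harmonic-weighted" means a pair at distance d contributes 1/d, and that singular vectors are scaled by the square root of the singular values.

```python
import numpy as np

def harmonic_cooccurrence(tokens, vocab, window=5):
    """Symmetric co-occurrence counts; a pair at distance d adds 1/d (assumed weighting)."""
    index = {w: i for i, w in enumerate(vocab)}
    # Simplification: out-of-vocabulary tokens are dropped before windowing.
    ids = [index[t] for t in tokens if t in index]
    counts = np.zeros((len(vocab), len(vocab)))
    for pos, center in enumerate(ids):
        for d in range(1, window + 1):
            if pos + d < len(ids):
                other = ids[pos + d]
                counts[center, other] += 1.0 / d
                counts[other, center] += 1.0 / d
    return counts

def ppmi(counts):
    """Positive pointwise mutual information: max(0, log(p(w,c) / (p(w) p(c))))."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0   # zero counts give -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)

def svd_embeddings(matrix, dim):
    """Keep the top `dim` singular directions, scaled by sqrt of the singular values."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])

# Toy corpus for illustration only (the project trains on text8).
tokens = "the cat sat on the mat while the dog sat on the rug".split()
vocab = sorted(set(tokens))
counts = harmonic_cooccurrence(tokens, vocab, window=5)
weights = ppmi(counts)
vectors = svd_embeddings(weights, dim=2)
print(vectors.shape)  # (8, 2)
```

At text8 scale a sparse co-occurrence matrix and a truncated SVD routine would be needed in place of the dense operations used here.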

02 Problem & Solution

Problem

Word embedding models are typically evaluated only on intrinsic word-similarity benchmarks, which may not reflect their practical utility for multi-hop reasoning tasks. There is no standard extrinsic benchmark for geometric navigability in embedding space.

Solution

Designed a beam-search navigation task as a computable analogue to Contexto: move from start to target through cosine-similarity neighbors. Automated paired significance tests (McNemar, Wilcoxon, permutation) formalize model comparisons across both intrinsic and extrinsic dimensions.
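One plausible reading of the navigation task is sketched below. The page does not specify the search details, so several choices are assumptions: each step expands every word in the beam to its k nearest cosine neighbors, candidates are ranked by cosine similarity to the target, the top `beam_width` paths survive, and a trial fails when `max_steps` is exhausted. The function name `navigate` and its parameters are hypothetical.

```python
import numpy as np

def navigate(vectors, vocab, start, target, beam_width=3, k=5, max_steps=10):
    """Return a word path from start to target, or None if the step budget runs out."""
    index = {w: i for i, w in enumerate(vocab)}
    # Assumes no zero vectors, so every row can be normalized.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T                     # all-pairs cosine similarity (toy scale only)
    goal = index[target]
    if index[start] == goal:
        return [start]
    beam = [[index[start]]]                  # each beam entry is a path of word ids
    visited = {index[start]}
    for _ in range(max_steps):
        candidates = []
        for path in beam:
            order = np.argsort(-sims[path[-1]])
            neighbors = [int(j) for j in order if j != path[-1]][:k]
            for j in neighbors:
                if j == goal:
                    return [vocab[i] for i in path + [j]]
                if j not in visited:
                    visited.add(j)
                    candidates.append(path + [j])
        if not candidates:
            return None
        # Keep the paths whose last word is most similar to the target.
        candidates.sort(key=lambda p: -sims[p[-1], goal])
        beam = candidates[:beam_width]
    return None

# Toy 2-D vectors spaced 20 degrees apart, for illustration only.
angles = np.deg2rad([0, 20, 40, 60, 80])
toy_vectors = np.stack([np.cos(angles), np.sin(angles)], axis=1)
toy_vocab = ["a", "b", "c", "d", "e"]
path = navigate(toy_vectors, toy_vocab, "a", "e", beam_width=1, k=2)
print(path)
```

Under these assumptions, the success rate is the fraction of start/target pairs for which a path is returned, and the step count of a trial is `len(path) - 1`.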

03 Highlights
  • 01 Trained 4 embedding families (Word2Vec, fastText, GloVe, SVD+PPMI) on text8 with aligned hyperparameters across 5 random seeds
  • 02 Beam-search navigation task: move from start to target through cosine-similarity neighbors in embedding space
  • 03 GloVe reaches a 99.8% navigation success rate with 5.0 median steps; Word2Vec reaches 81.3%, revealing a tension between navigation and intrinsic benchmarks
  • 04 Automated paired significance tests (McNemar, Wilcoxon, permutation) across all model pairs and metrics
  • 05 Full reproducibility: YAML configs, hashed run IDs, multi-seed variance reporting, pytest suite
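As an illustration of how a paired test applies to navigation outcomes, the sketch below computes an exact two-sided McNemar test for two models run on the same trial pairs. Only the discordant trials (one model succeeds, the other fails) enter the statistic. The page does not say which McNemar variant the project uses, so the exact binomial form is an assumption, the function name is hypothetical, and the outcome lists are made-up toy data, not project results.

```python
from math import comb

def mcnemar_exact(success_a, success_b):
    """Exact two-sided McNemar test on paired boolean outcomes.

    Returns (b, c, p_value), where b counts trials only model A solved
    and c counts trials only model B solved.
    """
    b = sum(1 for x, y in zip(success_a, success_b) if x and not y)
    c = sum(1 for x, y in zip(success_a, success_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0   # no discordant pairs: no evidence of a difference
    # Under the null, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Made-up outcomes for 10 shared trials (True = target reached).
model_a = [True] * 8 + [False] * 2
model_b = [True] * 3 + [False] * 5 + [True, False]
b, c, p_value = mcnemar_exact(model_a, model_b)
print(b, c, p_value)  # 5 1 0.21875
```

McNemar fits binary success/failure outcomes; paired numeric metrics such as step counts are the kind of data that Wilcoxon or permutation tests handle.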
04 Metrics
  • 99.8% GloVe Success Rate
  • 4 Embedding Models
  • 5 Random Seeds
  • 200+ Trial Pairs
08 Stack

backend

  • Python
  • NumPy / SciPy
  • Gensim
  • scikit-learn
  • NLTK

other

  • pandas
  • matplotlib
  • pytest