This repository contains a preliminary prototype for the Markov-Transport Correlation Coefficient (MTCC).
While current representation-learning frameworks such as OT-CPCC excel at embedding symmetric hierarchies, sequential data exhibits strict directional asymmetry: the probability of one symbol following another generally differs from the probability of the reverse transition. This project proposes a novel information-geometric metric that correlates the Wasserstein distances between contextual distributions in a model's latent space with the empirical Markovian transition costs between the corresponding symbols.
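A minimal sketch of how such a coefficient could be computed. It rests on assumptions this README does not state: each sign's contextual distribution is a 1-D sample of latent values (for example, a projection of LSTM hidden states), the latent distance is the exact 1-D Wasserstein-1 distance, the directed transition cost is c(a→b) = -log P(b | a) from add-alpha smoothed bigram counts, and the coefficient is Pearson's r over ordered sign pairs. The function names (`transition_costs`, `wasserstein_1d`, `mtcc`) and the toy data are hypothetical; this is not the repository's implementation.

```python
import math
from bisect import bisect_right
from collections import Counter, defaultdict


def transition_costs(sequences, alpha=1.0):
    """Directed Markov cost c(a -> b) = -log P(b | a), add-alpha smoothed (assumed form)."""
    vocab = sorted({s for seq in sequences for s in seq})
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    costs = {}
    for a in vocab:
        total = sum(counts[a].values()) + alpha * len(vocab)
        for b in vocab:
            if a != b:
                costs[(a, b)] = -math.log((counts[a][b] + alpha) / total)
    return costs


def wasserstein_1d(u, v):
    """Exact W1 between two 1-D empirical distributions: integral of |F_u - F_v|."""
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        gap = bisect_right(u, left) / len(u) - bisect_right(v, left) / len(v)
        total += abs(gap) * (right - left)
    return total


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mtcc(latents, sequences, alpha=1.0):
    """Correlate latent W1 distances with directed transition costs over ordered pairs."""
    costs = transition_costs(sequences, alpha)
    pairs = [p for p in costs if p[0] in latents and p[1] in latents]
    w = [wasserstein_1d(latents[a], latents[b]) for a, b in pairs]
    c = [costs[p] for p in pairs]
    return pearson(w, c)


# Hypothetical toy sign sequences and per-sign latent samples.
sequences = [["A", "B", "C"], ["A", "B", "D"], ["B", "C", "A"]]
latents = {"A": [0.0, 0.1, 0.2], "B": [0.5, 0.6], "C": [1.0, 1.1, 1.3], "D": [2.0]}
print(mtcc(latents, sequences))
```

Because W1 is symmetric, direction enters this sketch only through c(a→b); the actual MTCC may define its directed variant differently.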
In tests on the undeciphered Indus script, the latent space of a standard LSTM exhibited a directed OT-correlation of 0.2211 (p = 8.08e-06). The correlation is statistically significant but modest, which suggests that the network implicitly learns some grammatical flow without fully aligning its latent geometry with it; explicit OT regularization (e.g., FastFT) is proposed as a way to maximize this alignment.
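The reported p-value is best read alongside the effect size. A minimal sketch, assuming the test is the usual one for Pearson's r (the README does not say which test produced 8.08e-06): the Fisher z approximation below shows that, for a fixed r, the p-value shrinks as the number of pairs grows, so a very small p-value is compatible with a modest correlation. The helper names and the sample sizes in the loop are hypothetical, not taken from the experiment.

```python
import math


def pearson_p_value(r, n):
    """Approximate two-sided p-value for Pearson's r via the Fisher z-transform.

    Under H0 (rho = 0), atanh(r) * sqrt(n - 3) is approximately standard normal.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def explained_variance(r):
    """Share of variance in one variable linearly accounted for by the other."""
    return r * r


r = 0.2211  # directed OT-correlation reported above
print(f"r^2 = {explained_variance(r):.4f}")
for n in (50, 200, 1000):  # hypothetical numbers of pairs, not from the experiment
    print(n, pearson_p_value(r, n))
```

With r = 0.2211, r² is about 0.049, i.e. under 5% of shared variance, however small the p-value.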
The repository includes indus_dataset_anonymized.json, a representative sample of the 6,000+ inscription corpus I curated for this research. The results reported in my proposal were derived from the full dataset; this sample is provided to illustrate the data structure and to demonstrate that the MTCC pipeline runs reproducibly end to end.