Unsupervised Sequence Discovery via Directed Optimal Transport

This repository contains a preliminary prototype for the Markov-Transport Correlation Coefficient (MTCC).

Abstract

Current representation-learning frameworks such as OT-CPCC excel at embedding symmetric hierarchies, but sequential data is directionally asymmetric: the cost of moving from symbol A to symbol B generally differs from the cost of moving from B to A. This project proposes an information-geometric metric, the MTCC, that correlates latent Wasserstein distances between contextual distributions with their empirical Markovian transition costs.
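To make the idea concrete, the following is a minimal sketch of how such a coefficient could be computed. It is not the repository's implementation, and several choices are assumptions made purely for illustration: the transition cost is taken to be -log P(b | a) from add-alpha smoothed bigram counts, the directed latent distance from a to b compares the hidden states recorded just after a (`out_states`) with those recorded just before b (`in_states`), the Wasserstein distance is replaced by a sliced approximation using random 1-D projections, and the correlation is Pearson's r. All function and variable names are hypothetical.

```python
import math
import random
from collections import Counter


def transition_costs(sequences, alpha=1.0):
    """Directed cost c(a -> b) = -log P(b | a) from add-alpha smoothed bigrams (assumed form)."""
    vocab = sorted({s for seq in sequences for s in seq})
    bigrams = Counter((a, b) for seq in sequences for a, b in zip(seq, seq[1:]))
    outgoing = Counter()
    for (a, _b), n in bigrams.items():
        outgoing[a] += n
    costs = {}
    for a in vocab:
        denom = outgoing[a] + alpha * len(vocab)
        for b in vocab:
            costs[(a, b)] = -math.log((bigrams[(a, b)] + alpha) / denom)
    return costs


def wasserstein_1d(xs, ys):
    """Exact 1-D Wasserstein-1 distance: integral of |F_x - F_y| over the real line."""
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total, i, j = 0.0, 0, 0
    for left, right in zip(points, points[1:]):
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total


def sliced_wasserstein(X, Y, n_proj=50, seed=0):
    """Sliced approximation (a swapped-in technique): average 1-D distance over random directions."""
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_proj):
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]
        px = [sum(a * b for a, b in zip(x, d)) for x in X]
        py = [sum(a * b for a, b in zip(y, d)) for y in Y]
        total += wasserstein_1d(px, py)
    return total / n_proj


def pearson(u, v):
    """Pearson correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_u * var_v)


def mtcc(out_states, in_states, costs):
    """Correlate directed latent distances with transition costs over ordered symbol pairs."""
    dists, cs = [], []
    for (a, b), c in costs.items():
        if a in out_states and b in in_states:
            dists.append(sliced_wasserstein(out_states[a], in_states[b]))
            cs.append(c)
    return pearson(dists, cs)


# Toy usage with made-up symbols and 2-D "hidden states" (not Indus data).
toy_sequences = [["A", "B", "C"], ["A", "B", "A"], ["B", "C", "A"]]
toy_out = {"A": [[0.0, 0.1], [0.2, 0.0]], "B": [[1.0, 1.1], [0.9, 1.0]], "C": [[2.0, 0.0], [2.1, 0.2]]}
toy_in = {"A": [[2.0, 0.1], [1.9, 0.0]], "B": [[0.1, 0.2], [0.0, 0.0]], "C": [[1.0, 0.9], [1.1, 1.0]]}
score = mtcc(toy_out, toy_in, transition_costs(toy_sequences))
print(f"toy MTCC: {score:.4f}")
```

Because the outgoing and incoming state clouds differ, the latent distance from a to b need not equal the distance from b to a, which is what lets a symmetric Wasserstein distance be compared against asymmetric transition costs in this sketch.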

Key Result

In tests on the undeciphered Indus script, the latent space of a standard LSTM showed a directed OT-correlation of 0.2211 (p = 8.08e-06). The correlation is modest but statistically significant, which suggests that networks implicitly learn some grammatical flow and that explicit OT regularization (e.g., FastFT) may be needed to maximize this alignment.
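The text above does not state how the p-value was obtained. One common way to test a correlation between two pairwise matrices is a Mantel-style permutation test, sketched below as an assumption rather than as the method used here. `D` stands for a hypothetical matrix of directed latent distances and `C` for the matching matrix of transition costs, both indexed by symbol. Symbol labels of `D` are shuffled jointly over rows and columns, and the p-value is the share of shuffles whose correlation is at least as extreme as the observed one.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_u * var_v)


def offdiag(M, order):
    """Flatten the off-diagonal entries of M after relabelling symbols by `order`."""
    n = len(order)
    return [M[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]


def mantel_test(D, C, n_perm=999, seed=0):
    """Return (observed r, two-sided permutation p-value) for square matrices D and C."""
    identity = list(range(len(D)))
    c_flat = offdiag(C, identity)
    observed = pearson(offdiag(D, identity), c_flat)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # same permutation applied to rows and columns
        if abs(pearson(offdiag(D, perm), c_flat)) >= abs(observed):
            hits += 1
    # The +1 terms count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)
```

The smallest p-value this procedure can report is 1 / (n_perm + 1), so resolving a value near 8.08e-06 would take well over 100,000 permutations.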

The repository includes indus_dataset_anonymized.json, a representative sample of the corpus of more than 6,000 inscriptions that I curated for this research. The results reported in my proposal were derived from the full dataset; this sample is provided to demonstrate the data structure and the reproducibility of the MTCC pipeline.

