AKAPhilipD/README.md

Hi, I'm Shihe "Philip" Dong 👋

I am interested in deep learning, speech emotion recognition, computer vision deployment, and efficient graph learning systems.
My work focuses on building practical AI systems from model design to engineering implementation, including neural network architecture design, feature extraction, model training, and deployment on real platforms.


🔬 Research & Technical Interests

  • Speech Emotion Recognition (SER)

    • Multimodal and spatial-temporal representation learning
    • Mamba / Transformer-based sequence modeling
    • Cross-attention and feature fusion for emotional speech understanding
  • Efficient Temporal Graph Learning

    • Streaming dynamic graph training
    • Insert-delete graph update semantics
    • System-level optimization for large-scale temporal GNN training
  • Computer Vision & Edge Deployment

    • Object detection with YOLO
    • Android-based AI application deployment
    • Lightweight model conversion and inference
  • Digital Image Processing & Hardware Design

    • MATLAB-based image enhancement, filtering and histogram processing
    • FPGA / Verilog-based VGA display and interactive game design

🚀 Highlighted Work

🎙️ CMT-Net for Speech Emotion Recognition

I participated in the development of CMT-Net: A Collaborative Mamba-Transformer Network with Spatial-Temporal Cross-Fusion for Speech Emotion Recognition.

This project focuses on speech emotion recognition by combining Mamba-style sequence modeling, Transformer-based attention mechanisms, and spatial-temporal cross-fusion strategies.
The repository provides code for feature extraction, model training, cross-validation, and SER experiments.
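
To illustrate the cross-attention idea mentioned above, here is a minimal pure-Python sketch of scaled dot-product attention in which queries come from one feature stream and keys/values come from another. It is not the CMT-Net implementation: the function names and the toy inputs `branch_a` and `branch_b` are hypothetical, and a real model would use learned projections and batched tensors in PyTorch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention across two feature streams.

    queries: list of d-dimensional vectors from stream A
    keys:    list of d-dimensional vectors from stream B
    values:  list of vectors from stream B (same length as keys)
    Returns one fused vector per query.
    """
    d = len(queries[0])
    dim_v = len(values[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        )
    return fused


# Hypothetical toy features: 2 frames in one branch, 3 in the other.
branch_a = [[1.0, 0.0], [0.0, 1.0]]
branch_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(branch_a, branch_b, branch_b)
```

Each output row is a convex combination of the `branch_b` vectors, weighted toward the frames most similar to the corresponding `branch_a` frame.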

Keywords: Speech Emotion Recognition, PyTorch, Mamba, Transformer, Cross-Attention, WavLM


📱 YOLOv5 TFLite Android Application

I developed an Android application based on YOLOv5 and TFLite to deploy object detection models on mobile devices.

This project includes model training, model conversion, and Android Studio-based application development.
It helped me gain practical experience in bridging deep learning models with real-world mobile deployment.
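
One step that typically sits between raw detector output and the boxes drawn on screen is non-maximum suppression (NMS). The sketch below is a class-agnostic version in plain Python, shown only to illustrate the idea. The app itself is written in Java, and the function names, the `(box, score)` tuple layout, and the 0.45 threshold are assumptions rather than values taken from the repository.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.45):
    """Keep the highest-scoring box in each cluster of overlapping boxes.

    detections: list of (box, score) tuples.
    Returns the kept detections, highest score first.
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        remaining = [d for d in remaining if iou(best[0], d[0]) < iou_threshold]
    return kept


# Hypothetical detections: two overlapping boxes and one separate box.
detections = [
    ((0.0, 0.0, 10.0, 10.0), 0.9),
    ((1.0, 1.0, 11.0, 11.0), 0.8),
    ((20.0, 20.0, 30.0, 30.0), 0.7),
]
kept = non_max_suppression(detections)
```

Here the second box overlaps the first with an IoU of about 0.68, so it is suppressed and two detections remain.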

Keywords: YOLOv5, TFLite, Android, Java, Model Deployment


🖼️ MATLAB Image Processing System

I built a MATLAB-based image processing project covering basic and classical image processing operations, including:

  • Image enhancement
  • Image filtering
  • Histogram processing
  • Basic image transformation and visualization

This project strengthened my understanding of low-level image representation and traditional computer vision methods.
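
As an example of the histogram processing listed above, the following is a minimal sketch of histogram equalization. It is written in Python rather than MATLAB so that it runs without a toolbox, and it is not taken from the project: the function name and the flat-list image representation are assumptions made for illustration.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels)."""
    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Only one gray level is present, so there is nothing to spread out.
        return list(pixels)

    # Map each level so that the occupied range stretches to [0, levels - 1].
    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lookup[p] for p in pixels]


# A hypothetical low-contrast strip of four pixels.
equalized = equalize_histogram([100, 101, 102, 103])
```

The four nearly identical input levels are spread across the full 0 to 255 range, which is the contrast-stretching effect equalization is meant to produce.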

Keywords: MATLAB, Image Processing, Filtering, Histogram, Enhancement


🎮 FPGA DE2-115 FlappyBird

I implemented a simple FlappyBird game on the DE2-115 FPGA board, using Verilog and VGA display control.

The project involved hardware description, VGA timing control, game logic design, and FPGA platform debugging.
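
VGA timing control comes down to two counters and a handful of constants. The Python sketch below models the arithmetic for the common 640x480 at 60 Hz mode with a 25.175 MHz pixel clock. The display mode is an assumption, since the project description does not state the resolution, and the class and variable names are hypothetical. The actual design is in Verilog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisTiming:
    """Timing of one scan axis, in pixel clocks (horizontal) or lines (vertical)."""

    visible: int
    front_porch: int
    sync_pulse: int
    back_porch: int

    @property
    def total(self):
        return self.visible + self.front_porch + self.sync_pulse + self.back_porch

    def sync_active(self, counter):
        """True while the counter is inside the sync pulse window."""
        start = self.visible + self.front_porch
        return start <= counter < start + self.sync_pulse

    def in_visible_area(self, counter):
        return counter < self.visible


# Assumed mode: standard 640x480 @ 60 Hz timing.
H = AxisTiming(visible=640, front_porch=16, sync_pulse=96, back_porch=48)
V = AxisTiming(visible=480, front_porch=10, sync_pulse=2, back_porch=33)
PIXEL_CLOCK_HZ = 25_175_000

# One frame takes H.total * V.total pixel clocks.
refresh_hz = PIXEL_CLOCK_HZ / (H.total * V.total)
```

In hardware, the horizontal counter wraps at `H.total` and increments the vertical counter, and pixel colors are driven only while both counters are in their visible ranges.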

Keywords: FPGA, Verilog, VGA, DE2-115, Digital Logic Design


🛠️ Tech Stack

Programming Languages

Python, Java, Verilog, MATLAB

Deep Learning & AI

PyTorch, Transformers, YOLO, TFLite

Tools & Platforms

Android Studio, Git, FPGA


📌 Featured Projects

| Project | Description | Tech |
| --- | --- | --- |
| CMTNET_for_SER | Collaborative Mamba-Transformer network for speech emotion recognition | Python, PyTorch, SER |
| Yolov5tflite-Android-App-Java | YOLOv5-based Android object detection application | Python, Java, Android, TFLite |
| Matlab-Image-Processing | MATLAB image processing course project | MATLAB, Image Processing |
| FPGA_DE2-115_FlappyBird | FPGA-based FlappyBird game with VGA display | Verilog, FPGA, VGA |

📚 Current Focus

I am currently focusing on:

  • Designing more effective neural network structures for speech emotion recognition
  • Exploring quaternion representation, attention mechanisms, and multimodal fusion
  • Optimizing temporal graph neural network training under streaming dynamic graph scenarios
  • Improving the engineering efficiency and reproducibility of deep learning systems
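
To make the streaming dynamic graph setting concrete, here is a small sketch of a graph that is updated by a time-ordered stream of edge insert and delete events. It is only an illustration of the update semantics, not code from my research: the class name, the `"insert"`/`"delete"` labels, and the choice to treat deletion of a missing edge as a no-op are all assumptions.

```python
from collections import defaultdict


class StreamingGraph:
    """Adjacency sets updated by a time-ordered stream of edge events."""

    def __init__(self):
        self.neighbors = defaultdict(set)
        self.last_timestamp = None

    def apply(self, timestamp, op, src, dst):
        """Apply one event; op is "insert" or "delete" (hypothetical labels)."""
        if self.last_timestamp is not None and timestamp < self.last_timestamp:
            raise ValueError("events must arrive in non-decreasing time order")
        self.last_timestamp = timestamp
        if op == "insert":
            self.neighbors[src].add(dst)
        elif op == "delete":
            # Deleting a missing edge is treated as a no-op in this sketch.
            self.neighbors[src].discard(dst)
        else:
            raise ValueError(f"unknown op: {op}")

    def edges(self):
        """Current directed edges as a sorted list of (src, dst) pairs."""
        return sorted((s, d) for s, ds in self.neighbors.items() for d in ds)


# A hypothetical event stream: (timestamp, op, src, dst).
events = [
    (1, "insert", "a", "b"),
    (2, "insert", "a", "c"),
    (3, "delete", "a", "b"),
    (4, "insert", "b", "c"),
]
graph = StreamingGraph()
for event in events:
    graph.apply(*event)
current_edges = graph.edges()
```

A training system built on such a stream has to decide which snapshot of the graph each batch sees, which is where most of the system-level optimization effort goes.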

I believe that good research should not only propose new ideas, but also be implemented, tested, and improved through real engineering practice.
