I am interested in deep learning, speech emotion recognition, computer vision deployment, and efficient graph learning systems.
My work focuses on building practical AI systems from model design to engineering implementation, including neural network architecture design, feature extraction, model training, and deployment on real platforms.
- Speech Emotion Recognition (SER)
  - Multimodal and spatial-temporal representation learning
  - Mamba / Transformer-based sequence modeling
  - Cross-attention and feature fusion for emotional speech understanding
- Efficient Temporal Graph Learning
  - Streaming dynamic graph training
  - Insert-delete graph update semantics
  - System-level optimization for large-scale temporal GNN training
- Computer Vision & Edge Deployment
  - Object detection with YOLO
  - Android-based AI application deployment
  - Lightweight model conversion and inference
- Digital Image Processing & Hardware Design
  - MATLAB-based image enhancement, filtering and histogram processing
  - FPGA / Verilog-based VGA display and interactive game design
I participated in the development of CMT-Net: A Collaborative Mamba-Transformer Network with Spatial-Temporal Cross-Fusion for Speech Emotion Recognition.
The model combines Mamba-style sequence modeling, Transformer-based attention, and spatial-temporal cross-fusion to recognize emotion from speech.
The repository provides code for feature extraction, model training, cross-validation, and SER experiments.
Keywords: Speech Emotion Recognition, PyTorch, Mamba, Transformer, Cross-Attention, WavLM
I developed an Android application that runs YOLOv5 object detection on mobile devices through TFLite.
The project covers model training, model conversion, and application development in Android Studio.
It gave me practical experience in taking a deep learning model from training to on-device mobile inference.
Keywords: YOLOv5, TFLite, Android, Java, Model Deployment
I built a MATLAB-based image processing project covering classical image processing operations, including:
- Image enhancement
- Image filtering
- Histogram processing
- Basic image transformation and visualization
This project strengthened my understanding of low-level image representation and traditional computer vision methods.
Keywords: MATLAB, Image Processing, Filtering, Histogram, Enhancement
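To illustrate the histogram processing item above, here is a minimal sketch of histogram equalization. It is written in Python rather than MATLAB and is not taken from the repository; the function name `equalize_histogram` and the flat-list image layout are assumptions made for this example.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels)."""
    if not pixels:
        return []

    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Build the cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest level present
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return list(pixels)

    # Map each level so the output CDF is approximately linear.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lut[p] for p in pixels]


if __name__ == "__main__":
    # A low-contrast "image" is stretched across the full 0..255 range.
    print(equalize_histogram([50, 50, 100, 150]))
```

The lookup table is built once per image, so the per-pixel cost is a single list index, which is the usual reason for phrasing equalization as a level-to-level mapping.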
I implemented a simple FlappyBird game on the DE2-115 FPGA board, using Verilog and VGA display control.
The project involved hardware description, VGA timing control, game logic design, and FPGA platform debugging.
Keywords: FPGA, Verilog, VGA, DE2-115, Digital Logic Design
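The VGA timing control mentioned above can be sketched as a small Python model of the horizontal and vertical counters. This is an illustration only, not the Verilog from the repository: it assumes the common 640x480 @ 60 Hz mode with a 25.175 MHz pixel clock, and the names `VgaAxis` and `refresh_rate_hz` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VgaAxis:
    """Timing of one scan direction, in pixels (horizontal) or lines (vertical)."""
    visible: int
    front_porch: int
    sync: int
    back_porch: int

    @property
    def total(self):
        # Full counter period, including the blanking interval.
        return self.visible + self.front_porch + self.sync + self.back_porch

    def in_visible(self, counter):
        # True while the counter addresses a displayable pixel or line.
        return 0 <= counter < self.visible

    def sync_active(self, counter):
        # The sync pulse starts right after the front porch.
        start = self.visible + self.front_porch
        return start <= counter < start + self.sync


# Assumed mode: standard 640x480 @ 60 Hz timing values.
H = VgaAxis(visible=640, front_porch=16, sync=96, back_porch=48)
V = VgaAxis(visible=480, front_porch=10, sync=2, back_porch=33)
PIXEL_CLOCK_HZ = 25_175_000


def refresh_rate_hz(h, v, pixel_clock_hz):
    """Frames per second: one frame takes h.total * v.total pixel clocks."""
    return pixel_clock_hz / (h.total * v.total)


if __name__ == "__main__":
    print(H.total, V.total)
    print(refresh_rate_hz(H, V, PIXEL_CLOCK_HZ))
```

In hardware the same logic becomes two counters that wrap at `H.total` and `V.total`, with the sync outputs and the pixel-enable signal derived from comparisons like those in `sync_active` and `in_visible`.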
| Project | Description | Tech |
|---|---|---|
| CMTNET_for_SER | Collaborative Mamba-Transformer network for speech emotion recognition | Python, PyTorch, SER |
| Yolov5tflite-Android-App-Java | YOLOv5-based Android object detection application | Python, Java, Android, TFLite |
| Matlab-Image-Processing | MATLAB image processing course project | MATLAB, Image Processing |
| FPGA_DE2-115_FlappyBird | FPGA-based FlappyBird game with VGA display | Verilog, FPGA, VGA |
I am currently focusing on:
- Designing more effective neural network structures for speech emotion recognition
- Exploring quaternion representation, attention mechanisms, and multimodal fusion
- Optimizing temporal graph neural network training under streaming dynamic graph scenarios
- Improving the engineering efficiency and reproducibility of deep learning systems
- GitHub: AKAPhilipD
- Email: dongshihe030@163.com
I believe that good research should not only propose new ideas, but also be implemented, tested, and improved through real engineering practice.