Air Quality Prediction

Overview

This project is a machine learning air quality prediction system built with TensorFlow and Keras. It classifies air quality into discrete categories from a dataset of environmental features. The model combines feature scaling, dropout, batch normalization, and L2 regularization to stabilize training and reduce overfitting.

Features

Data Preprocessing: Feature scaling using MinMaxScaler for normalization.
Neural Network Architecture: Multi-layer perceptron (MLP) with dense layers, batch normalization, dropout, and L2 regularization.
Optimization Techniques: Early stopping and learning rate reduction on plateau (ReduceLROnPlateau) to end training once progress stalls.
Classification: Predicts air quality levels based on input environmental parameters.
Evaluation Metrics: Reports loss and accuracy on test data.
Predictive Capabilities: Accepts new input data for air quality classification.

Dataset

The dataset is expected to be in a CSV file named dataset.csv and should have the following columns:

Feature1, Feature2, Feature3, Feature4, Feature5, Feature6, AirQuality

Feature1 to Feature6: Numerical values representing various environmental factors affecting air quality.
AirQuality: The target variable, categorized into different air quality levels.
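
The expected layout can be turned into model-ready arrays roughly as follows. This is a minimal sketch rather than the project's actual code: the helper name `prepare_data`, the 80/20 stratified split, the fixed random seed, and the assumption that `AirQuality` holds integer labels starting at 1 are all illustrative choices. A small synthetic frame stands in for `dataset.csv` so the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

FEATURES = [f"Feature{i}" for i in range(1, 7)]
TARGET = "AirQuality"


def prepare_data(df, test_size=0.2, seed=42):
    """Split, scale, and one-hot encode a frame with the README's columns."""
    missing = set(FEATURES + [TARGET]) - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")

    X = df[FEATURES].to_numpy(dtype="float32")
    y = df[TARGET].to_numpy(dtype="int64")  # assumed: integer labels 1..K

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Fit the scaler on the training split only, then reuse it everywhere else.
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # One-hot encode labels; subtracting 1 turns labels 1..K into indices 0..K-1.
    num_classes = int(y.max())
    y_train = np.eye(num_classes, dtype="float32")[y_train - 1]
    y_test = np.eye(num_classes, dtype="float32")[y_test - 1]
    return X_train, X_test, y_train, y_test, scaler


# Synthetic stand-in for pd.read_csv("dataset.csv"); the values are made up.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.uniform(500, 2500, size=(100, 6)), columns=FEATURES)
demo[TARGET] = rng.integers(1, 5, size=100)  # labels 1..4

X_train, X_test, y_train, y_test, scaler = prepare_data(demo)
print(X_train.shape, X_test.shape, y_train.shape)
```

Fitting the scaler on the training split alone keeps test-set statistics out of preprocessing, and the same fitted `scaler` is what later normalizes new inputs at prediction time.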

Installation & Dependencies

Ensure you have the required dependencies installed:

```
pip install pandas scikit-learn tensorflow
```

How to Run the Project

Place dataset.csv in the working directory.
Execute the Python script:
```
python air_quality_dataset.py
```
The script performs the following tasks:
- Loads and preprocesses the dataset.
- Splits data into training and testing sets.
- Builds and trains a deep learning model.
- Evaluates the model's accuracy on test data.
- Predicts air quality for a sample input.

Model Architecture

The neural network comprises the following layers:

Input Layer (first dense layer, fed by the six input features):
- 256 neurons with ReLU activation
- L2 regularization to reduce overfitting
- Batch normalization followed by 40% dropout
Hidden Layers:
- 128 and 64 neurons with ReLU activation and L2 regularization
- Batch normalization after each layer to stabilize training
- 30% dropout after each layer to prevent overfitting
Output Layer:
- Softmax activation for multi-class classification

Model Schema

Below is a simple schema representation of the model architecture:

Input Layer (256 neurons, ReLU, L2 Regularization)
        |
Batch Normalization
        |
Dropout (40%)
        |
Hidden Layer (128 neurons, ReLU, L2 Regularization)
        |
Batch Normalization
        |
Dropout (30%)
        |
Hidden Layer (64 neurons, ReLU, L2 Regularization)
        |
Batch Normalization
        |
Dropout (30%)
        |
Output Layer (Softmax Activation)
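
The schema above maps onto a Keras `Sequential` model fairly directly. The sketch below is one possible reading, not the project's actual code: the function name `build_model`, the number of output classes (4), and the L2 factor (0.001) are assumptions, since none of these values is stated here.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers


def build_model(num_features=6, num_classes=4, l2_factor=1e-3):
    """Stack the layers from the schema; num_classes and l2_factor are assumed."""
    reg = regularizers.l2(l2_factor)
    return keras.Sequential(
        [
            keras.Input(shape=(num_features,)),
            # First block: 256 units, then batch norm and 40% dropout.
            layers.Dense(256, activation="relu", kernel_regularizer=reg),
            layers.BatchNormalization(),
            layers.Dropout(0.4),
            # Second block: 128 units, then batch norm and 30% dropout.
            layers.Dense(128, activation="relu", kernel_regularizer=reg),
            layers.BatchNormalization(),
            layers.Dropout(0.3),
            # Third block: 64 units, then batch norm and 30% dropout.
            layers.Dense(64, activation="relu", kernel_regularizer=reg),
            layers.BatchNormalization(),
            layers.Dropout(0.3),
            # Softmax output: one probability per air quality class.
            layers.Dense(num_classes, activation="softmax"),
        ]
    )


model = build_model()
model.summary()
```

Set `num_classes` to the number of distinct `AirQuality` levels in the dataset; the softmax layer needs exactly one unit per class.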

Training Details

The model is compiled with the Adam optimizer at a learning rate of 0.001.
Categorical cross-entropy is the loss function, so the target labels must be one-hot encoded.
EarlyStopping halts training once the monitored metric stops improving, and ReduceLROnPlateau lowers the learning rate when that metric plateaus.
Training runs for up to 200 epochs with a batch size of 32.
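
As a rough illustration of this setup, the sketch below wires the optimizer, loss, and callbacks together. Only the learning rate, loss, epoch limit, and batch size come from the description above; the monitored quantity (`val_loss`), the patience values, the reduction factor, the minimum learning rate, and the 20% validation split are assumptions, and the helper names are hypothetical.

```python
from tensorflow import keras


def compile_model(model, learning_rate=0.001):
    """Compile with Adam and categorical cross-entropy."""
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def make_callbacks():
    """Return both callbacks; every patience/factor value here is assumed."""
    return [
        # Stop when validation loss has not improved for 15 epochs.
        keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=15, restore_best_weights=True
        ),
        # Halve the learning rate after 5 epochs without improvement.
        keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=5, min_lr=1e-6
        ),
    ]


def train(model, X_train, y_train):
    """Fit for up to 200 epochs with batch size 32; validation_split is assumed."""
    return model.fit(
        X_train,
        y_train,
        validation_split=0.2,
        epochs=200,
        batch_size=32,
        callbacks=make_callbacks(),
        verbose=0,
    )
```

Both callbacks watch a validation metric, so `fit` needs either `validation_split` or explicit `validation_data`; without one, `val_loss` is never computed and the callbacks have nothing to monitor.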

Model Evaluation

The script evaluates model accuracy and loss on the test dataset.

Accuracy is printed after model evaluation:

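```
# evaluate() returns the loss followed by each compiled metric (here, accuracy).
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss:.4f}, Test Accuracy: {accuracy:.4f}")
```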
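
For one-hot targets, the accuracy that `model.evaluate` reports is the share of samples whose highest-probability class matches the true class. The sketch below reproduces that calculation with NumPy on made-up arrays; `categorical_accuracy` is a hypothetical helper name, not part of the project.

```python
import numpy as np


def categorical_accuracy(y_true_onehot, y_prob):
    """Fraction of rows where the predicted class equals the true class."""
    true_idx = np.argmax(y_true_onehot, axis=1)
    pred_idx = np.argmax(y_prob, axis=1)
    return float(np.mean(true_idx == pred_idx))


# Made-up example: three samples, three classes.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_prob = np.array([[0.8, 0.1, 0.1], [0.3, 0.4, 0.3], [0.6, 0.3, 0.1]])
print(categorical_accuracy(y_true, y_prob))
```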

Making Predictions

A sample prediction can be made as follows:

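```
# One sample with the six feature values, in the same column order as the training data.
new_data = [[640, 590, 1105, 1608, 1459, 2427]]
# Reuse the scaler fitted during preprocessing; do not refit it on new data.
new_data_normalized = scaler.transform(new_data)
prediction = model.predict(new_data_normalized)
# argmax gives a zero-based index; adding 1 assumes the dataset labels start at 1.
predicted_class = prediction.argmax(axis=1)[0] + 1
print(f"Predicted Air Quality Class: {predicted_class}")
```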
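
Because the output layer is a softmax, each prediction row is a probability distribution over the classes, so the winning probability can be reported alongside the class. The sketch below shows one way to do that with NumPy on made-up probabilities; `decode_prediction` and its `first_label` parameter are hypothetical, and `first_label=1` assumes the labels start at 1.

```python
import numpy as np


def decode_prediction(probabilities, first_label=1):
    """Map softmax rows to (label, confidence) pairs."""
    probabilities = np.asarray(probabilities)
    indices = probabilities.argmax(axis=1)
    confidences = probabilities.max(axis=1)
    return [(int(i) + first_label, float(c)) for i, c in zip(indices, confidences)]


# Made-up softmax output for two samples and four classes.
sample = [[0.05, 0.10, 0.80, 0.05], [0.60, 0.20, 0.10, 0.10]]
for label, confidence in decode_prediction(sample):
    print(f"class {label} (p={confidence:.2f})")
```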

Future Enhancements

Expand Dataset: Incorporate more features and data points to improve accuracy.
Hyperparameter Tuning: Experiment with different architectures and optimizers.
Deploy Model: Convert into an API or web-based interface for real-time predictions.
Visualization: Integrate Matplotlib/Seaborn for data analysis and insights.

License

This project is open-source and available under the MIT License.

Author

Dilyan Grigorov

Acknowledgments

TensorFlow & Keras: For the deep learning framework.
Scikit-learn: For preprocessing and model evaluation.
Pandas: For data manipulation and processing.
