
Intro
This was my first project in the field of machine learning and statistics. I completed it under the supervision of Prof. Saud Afzal Mohammad, Dept. of Civil Engineering. It proved to be my window into the world of machine learning and artificial intelligence.
I have been pursuing this field ever since. I worked on estimating the return period of sea wave heights using different statistical methods and ML.

Return period estimation
May - July 2018
About my project
How I did it
During the summer vacation of my second year, I asked my professor for a research project in machine learning, as I had just completed an online course on ML. After discussing my interests and skills, I was given a project on sea wave height return period estimation and forecasting. I completed this project as part of a summer research internship at IIT Kharagpur. The aim of the project was to replace the current physical simulation-based models for sea wave height generation, as such models require great computational power. Further, the results can be used in advance to plan sea energy generation and to reduce coastal damage.
Problem statement: Given the dataset for sea wave height, estimate the return period of sea waves and forecast the wave height using Machine Learning.
The project can be divided into two major parts:
Data collection and analysis
Distribution model and ML
I worked on this project for about two and a half months, and it helped me gain insights into probability, statistical modeling, and ML.

Skills and Tools
MATLAB
GEVT and GPD distribution function
Linear Regression
Statistics
LaTeX
Data collection and analysis
Currently, physical simulation-based models are used to predict the wave height at seashores. These models require great computational power and are not reliable. One such example is the SWAN model (Fig 1). Such models are built on the hydrological equations of surface waves. Our aim is to reduce the computational cost without compromising the quality of the output.
The test data was the Mehamn Harbour bay sea height data. It contains a total of 18 variables (Fig 4), including wave height, temperature, and humidity.
The readings were taken at 3-hour intervals over 60 years.
The data was plotted and standardized for better visualization. (Fig 3)
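The standardization step can be sketched as follows. The original work was done in MATLAB, so this is only a minimal Python illustration, assuming "standardized" means z-scoring (subtract the mean, divide by the standard deviation). The function name `standardize` and the sample values are hypothetical and are not taken from the Mehamn data.

```python
import statistics


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant series")
    return [(x - mean) / std for x in values]


# Hypothetical wave heights in metres, for illustration only.
wave_heights = [1.2, 0.8, 2.5, 3.1, 1.9, 0.6]
z_scores = standardize(wave_heights)
```

After this transform each variable has zero mean and unit variance, which makes variables with different units (height, temperature, humidity) comparable on one plot.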
Distribution model and ML
To estimate the return period, we used generalized extreme value theory (GEVT) and the generalized Pareto distribution (GPD), chosen on the basis of the distribution obtained by plotting the data.
We calibrated the scale, shape, and threshold parameters to get the best fit of the distribution.
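Once the parameters are calibrated, return levels and return periods follow from the standard closed-form expressions for the GEV distribution (fitted to annual maxima) and the GPD (fitted to exceedances over a threshold). The sketch below is in Python rather than the MATLAB used in the project, and it is not the project's actual code. The function names, the parameter values in the usage lines, and the `exceedances_per_year` rate are all assumptions made for illustration.

```python
import math


def gev_return_level(period_years, loc, scale, shape):
    """Wave height exceeded on average once every `period_years` years (GEV)."""
    if period_years <= 1:
        raise ValueError("return period must be greater than 1 year")
    y = -math.log(1.0 - 1.0 / period_years)
    if abs(shape) < 1e-12:
        # Gumbel limit (shape -> 0)
        return loc - scale * math.log(y)
    return loc + (scale / shape) * (y ** (-shape) - 1.0)


def gev_return_period(height, loc, scale, shape):
    """Return period in years of a given height: T = 1 / (1 - F(height))."""
    if abs(shape) < 1e-12:
        cdf = math.exp(-math.exp(-(height - loc) / scale))
    else:
        t = 1.0 + shape * (height - loc) / scale
        if t <= 0:
            raise ValueError("height lies outside the support of the distribution")
        cdf = math.exp(-(t ** (-1.0 / shape)))
    return 1.0 / (1.0 - cdf)


def gpd_return_level(period_years, threshold, scale, shape, exceedances_per_year):
    """Return level from a peaks-over-threshold GPD fit."""
    m = exceedances_per_year * period_years  # expected exceedances in the period
    if m <= 0:
        raise ValueError("period and exceedance rate must be positive")
    if abs(shape) < 1e-12:
        # Exponential limit (shape -> 0)
        return threshold + scale * math.log(m)
    return threshold + (scale / shape) * (m ** shape - 1.0)


# Hypothetical parameters, not the values calibrated in this project.
level_100 = gev_return_level(100, loc=8.0, scale=1.5, shape=0.1)
period_back = gev_return_period(level_100, loc=8.0, scale=1.5, shape=0.1)
pot_level_100 = gpd_return_level(100, threshold=3.0, scale=1.0, shape=0.2,
                                 exceedances_per_year=5.0)
```

The two GEV functions are inverses of each other, so `period_back` recovers the 100-year period up to floating-point error, which is a quick sanity check on any calibrated parameter set.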
Result
Applying the RNN model, we obtained an RMSE of 1.8975 m.
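For reference, RMSE (root mean squared error) is the square root of the mean squared difference between observed and forecast wave heights, so it carries the same unit (metres) as the data. A minimal Python sketch of the metric follows. The function name `rmse` and the sample values are hypothetical, and this is not the evaluation code used in the project.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed and forecast wave heights in metres.
observed_heights = [2.1, 3.4, 1.8, 4.0]
forecast_heights = [2.5, 3.0, 2.2, 3.1]
error_m = rmse(observed_heights, forecast_heights)
```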