SARIMA model for predicting US auto sales
This data analysis project examines the performance of US auto sales over the past decade. By analyzing the sales data, we seek to identify trends, forecast 2024 sales using the most appropriate ARIMA/SARIMA model, and make data-driven recommendations and conclusions based on the findings.
The main objective of this project is to predict US auto sales using the optimal ARIMA/SARIMA model. The specific objectives are:
U.S. Bureau of Economic Analysis, Motor Vehicle Retail Sales: Domestic Autos, retrieved from FRED, Federal Reserve Bank of St. Louis. To download the file, you can click the links below:
In this phase, the following tasks were performed:
EDA involved exploring the auto sales data to answer key questions, such as:
First, we check the seasonality of the dataset using the stl() function from the stats package, which ships with base R. To forecast auto sales, the auto.arima() function from the forecast package was used to fit the ARIMA/SARIMA models to the US auto sales series.
# Install (once) and load the required package
# stl(), spectrum() and Box.test() come from stats, which ships with base R and needs no installation
install.packages("forecast")
library(forecast)
# Load the data; the column name DAUTONSA is assumed from the FRED series ID
data <- read.csv("DAUTONSA.csv")
# Test for seasonality
auto_ts <- ts(data$DAUTONSA, frequency = 12, start = c(2013, 1))
spectrum(auto_ts) # Periodogram showing the dominant frequencies
stl_component <- stl(auto_ts, s.window = "periodic") # STL decomposition
plot(stl_component, main = "STL Decomposition Plot")
Box.test(auto_ts, lag = 12, type = "Ljung-Box") # Ljung-Box test for autocorrelation up to lag 12
# Fit the ARIMA model
arima_model <- auto.arima(auto_ts, seasonal = FALSE, d = 0)
# Fit the SARIMA model
sarima_model <- auto.arima(auto_ts, seasonal = TRUE, d = 0, allowmean = FALSE) # Accounts for seasonality
There were no missing data points in the dataset, as shown by the plot below.
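A numeric check can complement the plot. The snippet below is a minimal sketch, not the project's own code: `sales_df` is a hypothetical stand-in for the data frame read from DAUTONSA.csv, and both the column names and the values are assumptions made up for illustration.

```r
# Hypothetical stand-in for the data read from DAUTONSA.csv.
# Column names and values are illustrative assumptions, not real sales figures.
sales_df <- data.frame(
  DATE = seq(as.Date("2013-01-01"), by = "month", length.out = 6),
  DAUTONSA = c(100, 110, 120, 115, 125, 130)
)

# Count missing values per column; all zeros means nothing is missing
missing_per_column <- colSums(is.na(sales_df))
print(missing_per_column)

# A regular monthly series should also have no gaps between consecutive dates
month_index <- as.numeric(format(sales_df$DATE, "%Y")) * 12 +
  as.numeric(format(sales_df$DATE, "%m"))
all_consecutive <- all(diff(month_index) == 1)
print(all_consecutive)
```

Checking the date spacing as well as the NA counts matters because ts() assumes equally spaced observations and will not flag a skipped month.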
The time series plot reveals a downward trend in US auto sales from 2013 to 2023, indicating that sales declined over the past decade.
The time series data (DAUTONSA.csv) is seasonal; therefore, the SARIMA model was the most suitable model to fit because it accounts for this seasonality.
The data was confirmed to be stationary using the Augmented Dickey-Fuller (ADF) test, where the test hypotheses are given as:
Null hypothesis (H0): the time series has a unit root (it is non-stationary).
Alternative hypothesis (H1): the time series is stationary.
Conclusion: Reject the null hypothesis if p-value < alpha.
In this case, at the 5% significance level (alpha = 0.05), the p-value of 0.0222 is below alpha, so there is sufficient evidence to reject the null hypothesis and conclude that the time series data is stationary.
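The decision rule above can be sketched in R. This is a minimal sketch under stated assumptions: the README does not say which function produced the p-value, so adf.test() from the tseries package is an assumption, and `demo_ts` is a simulated stand-in series used only so the snippet runs on its own (replace it with the auto sales series).

```r
# Assumption: adf.test() from the tseries package; the original analysis
# does not name the function it used.
library(tseries)

set.seed(123)
# Simulated stationary AR(1) series as a stand-in for the auto sales data
demo_ts <- ts(as.numeric(arima.sim(model = list(ar = 0.5), n = 132)),
              frequency = 12, start = c(2013, 1))

# H0: the series has a unit root (non-stationary)
# H1: the series is stationary
adf_result <- adf.test(demo_ts, alternative = "stationary")
print(adf_result)

# Apply the decision rule at the 5% significance level
alpha <- 0.05
if (adf_result$p.value < alpha) {
  message("Reject H0: evidence that the series is stationary")
} else {
  message("Fail to reject H0: no evidence of stationarity")
}
```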
SARIMA(3, 0, 1)(0, 1, 1)[12] was the optimal model to fit and forecast the 2024 US auto sales, as it had a lower Akaike Information Criterion (AIC) than the other models considered. All of its coefficients were also statistically significant (p-value < 0.05). The plot below shows the forecast of 2024 auto sales using the seasonal ARIMA(3, 0, 1)(0, 1, 1)[12].
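The AIC comparison and the coefficient significance check can be sketched as follows. This is an illustrative sketch, not the project's exact procedure: `demo_ts` is a simulated stand-in series, and the p-values come from an approximate z-test (estimate divided by its standard error), which is an assumption about how significance was assessed. Note also that AIC values are only directly comparable between models fitted with the same order of differencing.

```r
library(forecast)

set.seed(42)
n <- 132
# Simulated stand-in: level + yearly seasonal pattern + AR(1) noise
seasonal_pattern <- rep(10 * sin(2 * pi * (1:12) / 12), length.out = n)
noise <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
demo_ts <- ts(200 + seasonal_pattern + noise, frequency = 12, start = c(2013, 1))

# Fit a non-seasonal and a seasonal candidate, as in the modelling step
arima_fit  <- auto.arima(demo_ts, seasonal = FALSE, d = 0)
sarima_fit <- auto.arima(demo_ts, seasonal = TRUE, d = 0)

# Compare information criteria (lower is better)
aic_table <- data.frame(
  model = c("ARIMA", "SARIMA"),
  AIC = c(AIC(arima_fit), AIC(sarima_fit))
)
print(aic_table)

# Approximate z-test for each coefficient of the seasonal model
est <- coef(sarima_fit)
se <- sqrt(diag(vcov(sarima_fit)))
p_values <- 2 * pnorm(-abs(est / se))
print(data.frame(estimate = est, std_error = se, p_value = p_values))
```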
The forecast predicts an increasing trend in US auto sales, rising from 94.89 (thousands of units) in January 2024 to 246.01 (thousands of units) in December 2024.
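Monthly forecast values like these can be read off a forecast object. The snippet below is a hedged sketch: `demo_ts` is a simulated stand-in ending in December 2023, and the model is whatever auto.arima() selects for that series, so the numbers it prints are not the project's results.

```r
library(forecast)

set.seed(7)
n <- 132  # January 2013 to December 2023
# Simulated stand-in series; replace with the auto sales series
seasonal_pattern <- rep(10 * sin(2 * pi * (1:12) / 12), length.out = n)
noise <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
demo_ts <- ts(200 + seasonal_pattern + noise, frequency = 12, start = c(2013, 1))

fit <- auto.arima(demo_ts, seasonal = TRUE)

# Forecast the next 12 months (January to December 2024)
fc <- forecast(fit, h = 12)

# Collect point forecasts and 95% prediction intervals in one table
fc_table <- data.frame(
  month = month.abb[as.integer(cycle(fc$mean))],
  point_forecast = as.numeric(fc$mean),
  lower_95 = as.numeric(fc$lower[, 2]),
  upper_95 = as.numeric(fc$upper[, 2])
)
print(fc_table)

# Month with the highest point forecast
peak_month <- fc_table$month[which.max(fc_table$point_forecast)]
print(peak_month)

plot(fc, main = "12-month forecast (stand-in data)")
```

Reporting the prediction intervals alongside the point forecasts gives a sense of how much uncertainty surrounds each monthly figure.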
The US auto sales data over the past decade is stationary and seasonal. There is a decreasing trend in US auto sales from 2013 to the end of 2023. The seasonal ARIMA(3, 0, 1)(0, 1, 1)[12] is the optimal model to forecast US auto sales, with this model predicting peak sales of 248.67 (thousands of units) in September 2024. The forecast shows an overall increase in sales from January to December 2024.
Based on the analysis, we recommend the following actions: