🔍 Behind the Scenes: Methodology & Innovation
1. The Helformer Architecture
Helformer integrates three key components to outperform existing models:
- Series Decomposition: Using Holt-Winters smoothing, we break price data into level, trend, and seasonality components (Fig. 1). This step isolates patterns that traditional Transformers might miss.
- Multi-Head Attention: Unlike sequential models (e.g., LSTM), Helformer processes all time steps simultaneously, capturing long-range dependencies efficiently.
- LSTM-Enhanced Encoder: Replacing the standard Feed-Forward Network with an LSTM layer improves temporal feature extraction (see the sketch after Fig. 1).
Fig. 1: Helformer architecture.
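To make these components concrete, here is a minimal sketch, not the paper's exact implementation: an additive Holt-Winters decomposition in NumPy, and a PyTorch encoder block in which an LSTM takes the place of the standard feed-forward sub-layer. The smoothing constants, seasonal period, layer sizes, and layer-norm placement are all illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

def holt_winters_decompose(y, alpha=0.3, beta=0.1, gamma=0.2, m=7):
    """Additive Holt-Winters recursions splitting a series into level,
    trend, and seasonal components. Assumes len(y) >= 2*m; the smoothing
    constants and period are illustrative, not the paper's tuned values."""
    n = len(y)
    level, trend, seas = np.empty(n), np.empty(n), np.empty(n)
    level[0] = y[:m].mean()
    trend[0] = (y[m:2 * m].mean() - y[:m].mean()) / m
    seas[:m] = y[:m] - level[0]                      # initial seasonal indices
    for t in range(1, n):
        s_lag = seas[t - m] if t >= m else seas[t]   # seasonal estimate one period back
        level[t] = alpha * (y[t] - s_lag) + (1 - alpha) * (level[t - 1] + trend[t - 1])
        trend[t] = beta * (level[t] - level[t - 1]) + (1 - beta) * trend[t - 1]
        if t >= m:
            seas[t] = gamma * (y[t] - level[t]) + (1 - gamma) * s_lag
    return level, trend, seas

class HelformerBlock(nn.Module):
    """Encoder block in the spirit of Fig. 1: multi-head self-attention,
    with an LSTM replacing the usual feed-forward sub-layer."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                            # x: (batch, seq_len, d_model)
        attn_out, _ = self.attn(x, x, x)             # all time steps attended at once
        x = self.norm1(x + attn_out)                 # residual connection + norm
        lstm_out, _ = self.lstm(x)                   # temporal feature extraction
        return self.norm2(x + lstm_out)              # residual connection + norm
```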
2. Data & Hyperparameter Tuning
We trained Helformer on Bitcoin (BTC) daily closing prices (2017–2024) and tested its generalization on 15 other cryptocurrencies (e.g., ETH, SOL). To optimize performance, we used Bayesian optimization via Optuna, automating hyperparameter selection (e.g., learning rate, dropout) and pruning underperforming trials early.
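For readers curious about the tuning loop, the sketch below shows the Optuna pattern described above: a search space over hyperparameters such as learning rate and dropout, with median-rule pruning of weak trials. The search ranges and the stand-in training function are assumptions for illustration, not the study's actual configuration.

```python
import random
import optuna

def train_and_validate(lr, dropout, num_heads, epoch):
    # Stand-in for one epoch of Helformer training + validation; in the
    # real study these hyperparameters would configure the model itself.
    return random.random() / (epoch + 1) + 10 * lr + 0.01 * dropout

def objective(trial):
    # Hypothetical search space; the paper's exact ranges may differ.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    num_heads = trial.suggest_categorical("num_heads", [2, 4, 8])
    val_loss = float("inf")
    for epoch in range(20):
        val_loss = train_and_validate(lr, dropout, num_heads, epoch)
        trial.report(val_loss, epoch)        # expose progress to the pruner
        if trial.should_prune():             # stop underperforming trials early
            raise optuna.TrialPruned()
    return val_loss

study = optuna.create_study(direction="minimize",
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=50)
print(study.best_params)
```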
3. Evaluation Metrics
Helformer was benchmarked against RNN, LSTM, BiLSTM, GRU, and vanilla Transformer models using the following metrics (minimal implementations are sketched below):
- Similarity metrics: R², Kling-Gupta Efficiency (KGE), Explained Variance Score (EVS)
- Error metrics: RMSE, MAPE, MAE
- Trading metrics: Sharpe Ratio, Maximum Drawdown, Volatility, Cumulative Returns
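As a reference, here are minimal NumPy versions of the similarity and error metrics, assuming their standard definitions; the KGE follows the original Gupta et al. (2009) formulation, though variants exist.

```python
import numpy as np

def rmse(y, yhat): return np.sqrt(np.mean((y - yhat) ** 2))
def mae(y, yhat):  return np.mean(np.abs(y - yhat))
def mape(y, yhat): return 100 * np.mean(np.abs((y - yhat) / y))   # in percent
def r2(y, yhat):   return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
def evs(y, yhat):  return 1 - np.var(y - yhat) / np.var(y)        # explained variance score

def kge(y, yhat):
    # Kling-Gupta Efficiency: combines correlation, variability ratio,
    # and bias ratio; 1.0 indicates a perfect match.
    r = np.corrcoef(y, yhat)[0, 1]
    alpha = yhat.std() / y.std()     # variability ratio
    beta = yhat.mean() / y.mean()    # bias ratio
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```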
💡 Key Findings & Practical Impact
1. Superior Predictive Accuracy
Helformer achieved near-perfect R² (1.0) and MAPE (0.0148%) on BTC test data, outperforming all baseline models (Table 1). Its decomposition step cut RMSE from 1218.56 to 7.75 versus the vanilla Transformer, an error reduction of over 99%.
Table 1: Model Performance Comparison
| Model | RMSE | MAPE | MAE | R² | EVS | KGE |
|---|---|---|---|---|---|---|
| RNN | 1153.1877 | 1.9122% | 765.7482 | 0.9950 | 0.9951 | 0.9905 |
| LSTM | 1171.6701 | 1.7681% | 737.1088 | 0.9948 | 0.9949 | 0.9815 |
| BiLSTM | 1140.4627 | 1.9514% | 766.7234 | 0.9951 | 0.9952 | 0.9901 |
| GRU | 1151.1653 | 1.7500% | 724.5279 | 0.9950 | 0.9950 | 0.9878 |
| Transformer | 1218.5600 | 1.9631% | 799.6003 | 0.9944 | 0.9946 | 0.9902 |
| Helformer | 7.7534 | 0.0148% | 5.9252 | 1.0000 | 1.0000 | 0.9998 |
2. Profitable Trading Strategies
In backtests, a Helformer-based trading strategy yielded 925% excess returns for BTC, more than triple the Buy & Hold strategy’s 277%, with lower volatility and a far higher Sharpe Ratio (18.06 vs. 1.85), as shown in Fig. 2.
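To illustrate how such trading metrics are computed from model predictions, here is a toy long/flat backtest. The entry rule, the 365-day annualization factor, and the absence of transaction costs are simplifying assumptions, not the paper's exact strategy.

```python
import numpy as np

def backtest_long_flat(prices, predictions):
    """Toy rule: hold the asset on days the model predicts a price rise,
    stay in cash otherwise. Assumes daily prices and next-day predictions."""
    returns = np.diff(prices) / prices[:-1]                 # daily asset returns
    signal = (predictions[1:] > prices[:-1]).astype(float)  # 1 = long, 0 = flat
    strat = signal * returns
    equity = np.cumprod(1 + strat)
    cumulative = equity[-1] - 1
    sharpe = np.sqrt(365) * strat.mean() / strat.std()      # crypto trades year-round
    max_drawdown = np.max(1 - equity / np.maximum.accumulate(equity))
    return cumulative, sharpe, max_drawdown
```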
3. Cross-Currency Generalization
Helformer’s pre-trained BTC weights transferred seamlessly to other cryptocurrencies, achieving R² > 0.99 for XRP and TRX. This suggests broad applicability without retraining—a boon for investors managing diverse portfolios.
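The transfer step itself can be as simple as reusing saved weights. In this PyTorch sketch the stand-in model, file name, and window size are purely hypothetical; only the save-then-reload pattern is the point.

```python
import torch
import torch.nn as nn

# Stand-in model; the real Helformer stacks decomposition, attention,
# and LSTM layers rather than a plain MLP.
model = nn.Sequential(nn.Linear(30, 64), nn.ReLU(), nn.Linear(64, 1))
torch.save(model.state_dict(), "helformer_btc.pt")   # saved after BTC training

# Later: reuse the BTC-trained weights on another coin's scaled windows.
model.load_state_dict(torch.load("helformer_btc.pt"))
model.eval()
eth_windows = torch.randn(8, 30)                     # placeholder 30-day ETH windows
with torch.no_grad():
    eth_preds = model(eth_windows)
```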
🌍 Relevance to the Community
- For Researchers: Helformer’s architecture opens avenues for hybrid time-series models in finance, healthcare, and climate forecasting.
- For Practitioners: The model’s interpretable components (decomposition + attention) make it adaptable to volatile markets beyond crypto.
- For Policymakers: Reliable price forecasts could inform regulations to stabilize crypto markets and protect investors.
🤝 Acknowledgments & Open Questions
This work wouldn’t have been possible without my brilliant co-authors Oluyinka Adedokun, Joseph Akpan, Morenikeji Kareem, Hammed Akano, and Oludolapo Olanrewaju, or the support of The Hong Kong Polytechnic University.
We’d love to hear your thoughts!
- How might Helformer adapt to non-financial time-series data?
- Could integrating sentiment analysis further improve accuracy?
- What ethical considerations arise with AI-driven trading?
🔗 Access the full paper: SpringerLink | ReadCube