Neural Network for ETF Analysis

Data Preprocessing

ETF data was sourced from justetf.com, focusing on the top holdings of each fund. For each holding, the following metrics were collected: Basic EPS, EBITDA, EBIT, Gross Profit, Revenue, Net Income, and Operating Margin. The main challenge was harmonizing these features across funds while handling missing values (NA). Missing values were imputed and the features were normalized so that holdings are comparable across funds.
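The imputation and normalization step can be sketched as follows. This is a minimal illustration and not the repository's actual pipeline: the choice of median imputation and z-score normalization, the function name `impute_and_normalize`, and the lowercase feature keys are all assumptions, since the text above does not say which methods were used.

```python
from statistics import mean, median, pstdev

# Hypothetical feature keys mirroring the metrics listed above.
FEATURES = [
    "basic_eps", "ebitda", "ebit", "gross_profit",
    "revenue", "net_income", "operating_margin",
]


def impute_and_normalize(rows, features):
    """Median-impute missing values (None), then z-score each feature.

    `rows` is a non-empty list of dicts, one per holding. The input is not
    modified; a list of new dicts is returned.
    """
    cleaned = [dict(row) for row in rows]
    for name in features:
        observed = [r[name] for r in rows if r.get(name) is not None]
        # Assumption: fall back to 0.0 when a feature is missing everywhere.
        fill = median(observed) if observed else 0.0
        column = [r[name] if r.get(name) is not None else fill for r in rows]
        mu = mean(column)
        sigma = pstdev(column) or 1.0  # constant column: avoid dividing by zero
        for out, value in zip(cleaned, column):
            out[name] = (value - mu) / sigma
    return cleaned


if __name__ == "__main__":
    holdings = [
        {"ticker": "AAA", "revenue": 120.0, "ebit": None},
        {"ticker": "BBB", "revenue": 80.0, "ebit": 12.0},
        {"ticker": "CCC", "revenue": None, "ebit": 20.0},
    ]
    for row in impute_and_normalize(holdings, ["revenue", "ebit"]):
        print(row)
```

Computing the statistics over all holdings of all funds at once, rather than per fund, is one way to keep the normalized values comparable across funds.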

Figure 1: First Neural Network structure.

First Neural Network: Top Holdings Scoring

The first neural network takes as input the financial metrics of each ETF's top holdings. It scores each holding from those metrics, and the holding scores are aggregated into a single score per ETF. The model was trained by minimizing mean squared error together with a regularization term, so that it generalizes across diverse ETFs.
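To make the scoring and training objective concrete, here is a small sketch in plain Python. It rests on several assumptions not stated in the text: a single hidden layer with ReLU, aggregation of holding scores by portfolio weight, and an L2 penalty as the regularizer. All function names and the parameter layout are hypothetical.

```python
import random


def init_params(n_features, n_hidden, seed=0):
    """Small random weights for a one-hidden-layer scorer (assumed architecture)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.gauss(0.0, 0.1) for _ in range(n_features)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.gauss(0.0, 0.1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def score_holding(params, x):
    """Forward pass: ReLU hidden layer followed by a linear output."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    return sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]


def etf_score(params, holdings, weights):
    """Aggregate holding scores into one ETF score, weighted by holding weight."""
    total = sum(weights)
    scores = (score_holding(params, x) for x in holdings)
    return sum(wt * s for wt, s in zip(weights, scores)) / total


def regularized_mse(params, etfs, targets, l2=1e-3):
    """Mean squared error over ETFs plus an L2 penalty on the weights.

    `etfs` is a list of (holdings, weights) pairs; `targets` holds one
    target score per ETF.
    """
    preds = [etf_score(params, h, w) for h, w in etfs]
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
    penalty = sum(w * w for row in params["W1"] for w in row)
    penalty += sum(w * w for w in params["w2"])
    return mse + l2 * penalty


if __name__ == "__main__":
    params = init_params(n_features=2, n_hidden=4)
    etfs = [([[0.5, -1.0], [1.2, 0.3]], [0.6, 0.4])]
    print(regularized_mse(params, etfs, targets=[0.1]))
```

In practice this loss would be minimized with a gradient-based optimizer from a deep learning framework; the sketch only shows what is being minimized.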

Figure 2: Training and validation of the first neural network.

Second Neural Network: ETF Selection

The second neural network takes the ETF scores produced by the first network, together with additional fund-level features, and uses them to select ETFs. It is trained by minimizing a custom loss function tailored to portfolio optimization, so that the selected ETFs align with specific investment goals.
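The custom loss is not specified in this document, so the following is only one plausible shape for it, offered as an illustration. It assumes the network emits one logit per ETF, that a softmax turns the logits into allocation weights, and that the loss is a mean-variance trade-off (negative average return plus a variance penalty). The single linear layer, the `risk_aversion` parameter, and the example features are hypothetical stand-ins.

```python
import math


def selection_logits(weights, bias, etf_features):
    """Linear layer standing in for the second network: one logit per ETF.

    Each entry of `etf_features` is assumed to hold the first network's
    score followed by fund-level features.
    """
    return [
        sum(w * f for w, f in zip(weights, feats)) + bias
        for feats in etf_features
    ]


def softmax(logits):
    """Numerically stable softmax; the outputs are allocations summing to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def portfolio_loss(allocation, returns, risk_aversion=1.0):
    """Negative mean portfolio return plus a variance penalty.

    `returns` is a list of periods, each a list of per-ETF returns.
    """
    per_period = [
        sum(a * r for a, r in zip(allocation, period)) for period in returns
    ]
    avg = sum(per_period) / len(per_period)
    var = sum((p - avg) ** 2 for p in per_period) / len(per_period)
    return -avg + risk_aversion * var


if __name__ == "__main__":
    # Hypothetical features: [first-network score, expense ratio in percent]
    features = [[0.8, 0.20], [0.5, 0.07], [0.3, 0.45]]
    allocation = softmax(selection_logits([1.0, -1.0], 0.0, features))
    history = [[0.01, 0.00, -0.01], [0.00, 0.02, 0.01]]
    print(allocation, portfolio_loss(allocation, history))
```

A different investment goal would change the loss, for example by penalizing drawdown or turnover instead of variance.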

Figure 3: Neural network architecture for ETF selection.

Conclusion

The dual-network framework addresses ETF analysis by combining holding-level scoring with fund-level selection. Future work will focus on improving the interpretability of the models and incorporating real-time market data to refine predictions. The repository provides the details and scripts needed for replication and further development.