To understand the training data better, I have a few questions.
It seems like the competition is designed for a portfolio that is rebalanced weekly. In other words: you construct the portfolio from Friday's close, use your model to predict the ranking of the 260+ instruments over a one-week holding period, buy the 40 highest-ranked and sell the 40 lowest-ranked on Monday morning, hold the positions for one week, and repeat. Is that right?
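To make the setup I am assuming concrete, here is a minimal Python sketch of the weekly selection step. The function name `select_long_short`, the dict of predicted scores, and the convention that a higher score means a better expected one-week return are my own assumptions, not anything confirmed by the competition.

```python
def select_long_short(predicted_scores, n=40):
    """Return (longs, shorts): the n highest- and n lowest-scored tickers.

    Hypothetical helper. Assumes a higher score means a better expected
    one-week return.
    """
    if 2 * n > len(predicted_scores):
        raise ValueError("need at least 2*n instruments to pick disjoint longs and shorts")
    # Sort tickers from best to worst predicted score.
    ranked = sorted(predicted_scores, key=predicted_scores.get, reverse=True)
    # Buy the top n, sell the bottom n; repeat this every week.
    return ranked[:n], ranked[-n:]


# Usage with made-up scores for 260 instruments:
scores = {f"T{i:03d}": float(i) for i in range(260)}
longs, shorts = select_long_short(scores, n=40)
```

Each Friday, the model's scores would be fed in, and the two lists would be traded on Monday and held for one week.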
If that is so:
Are the rows (dates) in the training data a consecutive sequence of weeks within a given time period?
Or is each row the first day of a week, drawn from many years in no particular time order?
Is Y the performance ranking over the one-week period, or over a different period?
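For reference, this is what I would assume "ranking of the one-week performance" means, as a small Python sketch. The Friday-close-to-Friday-close return and the convention that rank 1 is the best performer are my assumptions; the competition may define Y differently.

```python
def weekly_return_ranks(close_this_friday, close_next_friday):
    """Rank instruments by one-week return; rank 1 = best performer.

    Hypothetical target definition, not confirmed by the competition.
    Instruments missing from either week are skipped.
    """
    returns = {
        t: close_next_friday[t] / close_this_friday[t] - 1.0
        for t in close_this_friday
        if t in close_next_friday
    }
    # Order tickers from highest to lowest one-week return.
    ordered = sorted(returns, key=returns.get, reverse=True)
    return {t: i + 1 for i, t in enumerate(ordered)}


# Usage with made-up prices: A gains 10%, B loses 10%, C gains 5%.
ranks = weekly_return_ranks({"A": 100, "B": 50, "C": 20}, {"A": 110, "B": 45, "C": 21})
```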
Regarding the 460 features in the training data: are they all derived from the instrument's price data as of Friday's close (e.g., technical indicators, volatility, trading volume), or are there also non-price variables specific to each instrument, such as debt rating, dividend yield, goodwill-to-assets ratio, sales growth rate, gross-margin trend, weight in the SPY ETF, debt-to-EBITDA, or CEO rating on glassdoor.com? Or general macro indicators such as Fed interest-rate policy, the spread between 2-year and 10-year US Treasuries, the spread between investment-grade and junk bonds, the DXY index, the VIX index, and so on?