Saturday, January 31, 2015

Is there an optimal way to find the best division of an interval of positive integers?


I am struggling with a conceptual problem. I have the positive integers from the interval [1800, 1850]. For every integer in that interval, say (without loss of generality) 1820, I have about 3000 horses; the number 1820 is the horse's year of birth. Those horses were fed a traditional food, and some of them were fed an experimental food (there were 29 different types of experimental food). For every feeding of every horse, a variable named goodness of sneeze was recorded (the higher the goodness, the better). Let's assume that a horse sneezed after every feeding. Each horse could be fed a different type of food every time it came to a feeding (the food type was drawn from a uniform distribution). Let us assume that the sneeze goodness of horses follows a Poisson distribution with parameter λ = 1.


Now I am looking for the best division of [1800, 1850] into subintervals, such as:


[1800,1810), [1810,1826), [1826,1850], so that I can say: for each subinterval, this or that experimental food (or, in some cases, the traditional food) gave the best average sneeze for horses born in that subinterval.
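One way to represent such a division is by its cut points, where each cut year starts a new subinterval. This is an assumption for illustration, not something from the post. Under it, [1800,1810), [1810,1826), [1826,1850] becomes `cuts = [1810, 1826]`, and `bisect_right` finds the subinterval of a birth year. The record format `(birth_year, food, sneeze_goodness)`, the labelling of food 0 as the traditional food, and the function names are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def segment_of(year, cuts):
    """Index of the subinterval containing `year`; each cut year starts a new subinterval."""
    return bisect_right(cuts, year)

def best_food_per_segment(records, cuts):
    """Return [(best_food, best_mean), ...], one entry per subinterval that has records.

    records: iterable of hypothetical (birth_year, food, sneeze_goodness) triples,
    where food 0 stands for the traditional food (hypothetical labelling).
    Ties are broken in favour of the food seen first.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, food, value in records:
        key = (segment_of(year, cuts), food)
        sums[key] += value
        counts[key] += 1
    best = {}
    for (seg, food), n in counts.items():
        mean = sums[(seg, food)] / n
        if seg not in best or mean > best[seg][1]:
            best[seg] = (food, mean)
    return [best[seg] for seg in sorted(best)]

records = [(1800, 0, 1), (1805, 3, 4), (1805, 3, 2), (1815, 0, 2),
           (1815, 7, 5), (1830, 0, 3), (1830, 2, 1)]
print(best_food_per_segment(records, cuts=[1810, 1826]))  # [(3, 3.0), (7, 5.0), (0, 3.0)]
```

In the last subinterval of this toy example, the traditional food (0) wins, which matches the "or maybe traditional" case in the question.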


I do not know whether it matters, but let's assume that horses do not come to feedings regularly: some of them come more often than others. The experiment lasted 20 days.
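To try any search method, it may help to simulate data that match the assumptions above. Everything in this sketch is hypothetical:

- Food 0 stands for the traditional food and foods 1 to 29 for the experimental foods.
- Each horse attends between 1 and 20 feedings (at most one per day over the 20 days is an assumption).
- The food at each feeding is drawn uniformly.
- Sneeze goodness is Poisson(λ = 1).

Python's `random` module has no Poisson sampler, so the sketch uses Knuth's multiplication method. It also uses far fewer horses per year than the roughly 3000 in the post, so that it runs quickly.

```python
import math
import random

YEARS = range(1800, 1851)   # birth years, inclusive
FOODS = range(30)           # hypothetical: 0 = traditional, 1..29 = experimental

def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(horses_per_year=30, max_visits=20, lam=1.0, seed=0):
    """Return hypothetical (birth_year, horse_id, food, sneeze_goodness) records."""
    rng = random.Random(seed)
    records = []
    horse_id = 0
    for year in YEARS:
        for _ in range(horses_per_year):
            # Irregular attendance: each horse comes a random number of times (assumption).
            for _ in range(rng.randint(1, max_visits)):
                food = rng.choice(FOODS)              # uniform over the 30 food types
                records.append((year, horse_id, food, poisson(lam, rng)))
            horse_id += 1
    return records

records = simulate()
print(len(records), sum(r[3] for r in records) / len(records))
```

In this setup every food has the same Poisson(1) distribution, so the simulated data contain no real food effect. Any "best" food a search reports on it is noise, which makes it a convenient null case for checking a method.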


Is there a good way to find the best division relatively quickly? I tried a loop for i in 1 to 50, where i is the number of cut points dividing [1800, 1850]. For i = 1 I check [1800,1801],(1801,1850]; [1800,1802],(1802,1850]; ...; [1800,1849],(1849,1850]. For each subinterval I check which experimental food gave the biggest mean sneeze, and I answer the problem as in this example:
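The loop over i grows quickly. With 51 birth years there are 50 gaps between consecutive years, and a division with i cut points is a choice of i of those gaps. That gives C(50, i) divisions with i cuts and 2^50 divisions in total. A quick check with the standard library:

```python
from math import comb

N_YEARS = 1850 - 1800 + 1   # 51 birth years
GAPS = N_YEARS - 1          # 50 places between consecutive years where a cut can go

def divisions_with_cuts(i):
    """Number of ways to split [1800, 1850] into i + 1 non-empty contiguous subintervals."""
    return comb(GAPS, i)

total = sum(divisions_with_cuts(i) for i in range(GAPS + 1))
print(divisions_with_cuts(1), divisions_with_cuts(2), total == 2 ** GAPS)  # 50 1225 True
```

Exhaustive enumeration is therefore only practical for small i. Around i = 25 there are already more than 10^14 divisions.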


[1800,1807],(1807,1850] is the best division among the divisions with one cut point. For horses born in [1800,1807] the best food is experimentalFoodnr25, and for horses born in (1807,1850] the best food is experimentalFoodnr14. Compared with the traditional food, they give a 0.04 higher mean sneeze (0.04 is, of course, a mean weighted by the number of horses in both intervals).
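The i = 1 search described above can be sketched as follows. It assumes the same hypothetical `(birth_year, food, sneeze_goodness)` records, with food 0 as the traditional food. Records are aggregated per year once. Then every split point k gives [lo, k] and (k, hi], and each side gets the gain of its best experimental food over the traditional food. The two gains are averaged with weights equal to the number of records on each side. The post weights by horses, so counting records instead is an assumption of this sketch.

```python
from collections import defaultdict

TRADITIONAL = 0  # hypothetical label: food 0 = traditional, 1..29 = experimental

def side_stats(agg, years):
    """(gain, best_food, n_records) for one subinterval, or None if it lacks data.

    gain = mean of the best experimental food minus mean of the traditional food.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for y in years:
        for food, (s, c) in agg.get(y, {}).items():
            sums[food] += s
            counts[food] += c
    experimental = [f for f in counts if f != TRADITIONAL]
    if TRADITIONAL not in counts or not experimental:
        return None
    best_food = max(experimental, key=lambda f: sums[f] / counts[f])
    gain = sums[best_food] / counts[best_food] - sums[TRADITIONAL] / counts[TRADITIONAL]
    return gain, best_food, sum(counts.values())

def best_single_cut(records, lo=1800, hi=1850):
    """Try every split [lo, k], (k, hi]; return (weighted_gain, k, left_food, right_food)."""
    agg = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))  # agg[year][food] = [sum, count]
    for year, food, value in records:
        cell = agg[year][food]
        cell[0] += value
        cell[1] += 1
    best = None
    for k in range(lo, hi):
        left = side_stats(agg, range(lo, k + 1))
        right = side_stats(agg, range(k + 1, hi + 1))
        if left is None or right is None:
            continue
        (g1, f1, n1), (g2, f2, n2) = left, right
        gain = (n1 * g1 + n2 * g2) / (n1 + n2)   # weighted by number of records
        if best is None or gain > best[0]:
            best = (gain, k, f1, f2)
    return best

records = [(1800, 0, 1.0), (1800, 5, 3.0), (1801, 0, 1.0), (1801, 5, 3.0),
           (1802, 0, 2.0), (1802, 9, 4.0)]
print(best_single_cut(records, lo=1800, hi=1802))
```

Aggregating per year first means each split only touches per-year totals rather than every record.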


Then I can go on to i = 2, and so on. However, the higher i is, the fewer horses there are in each subinterval and the larger the standard error of the estimated average sneeze. So I thought of choosing the division of [1800, 1850] with the largest weighted mean of the values a, where a is computed for each subinterval by the formula:


$a = \Phi^{-1}(1 - p)\,\sqrt{\operatorname{Var}(X)/n_X + \operatorname{Var}(Y)/n_Y} + \mu_X - \mu_Y$


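where:

- X are the records for horses fed the experimental food that gave the highest average sneeze in that subinterval.
- Y are the records for horses fed the traditional food in that subinterval.
- $\mu_X$ and $\mu_Y$ are the means of those records, $\operatorname{Var}(X)$ and $\operatorname{Var}(Y)$ are their variances, and $n_X$ and $n_Y$ are the numbers of records.
- $\Phi$ is the standard normal cumulative distribution function.
- p is the probability such that $P(\mu_X - \mu_Y > a) = p$ (where I assume that $\mu_X$ and $\mu_Y$ are normally distributed).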
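The score a can be computed per subinterval with the standard library. This sketch assumes that Var means the sample variance (divisor n − 1) and that Φ is the standard normal CDF, whose inverse is `statistics.NormalDist().inv_cdf`. With p = 0.95 the factor Φ⁻¹(0.05) ≈ −1.645 makes a a one-sided lower confidence bound on the gain. As a result, subintervals with few records are penalised automatically. The function name `a_score` and the sample values are hypothetical.

```python
from statistics import NormalDist, fmean, variance

def a_score(x, y, p=0.95):
    """a such that P(mean(X) - mean(Y) > a) = p under a normal approximation.

    x: sneeze goodness records for the best experimental food in the subinterval
    y: sneeze goodness records for the traditional food in the same subinterval
    """
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5   # sample variances
    return NormalDist().inv_cdf(1 - p) * se + fmean(x) - fmean(y)

x = [2, 4, 6]        # hypothetical records for the best experimental food
y = [1, 1, 3, 3]     # hypothetical records for the traditional food
print(a_score(x, y, p=0.5), a_score(x, y, p=0.95))
```

With p = 0.5 the correction term vanishes and a is just the difference of means (2.0 here). With p = 0.95 the same difference shrinks by about 1.645 standard errors.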


Does anyone have an idea for a relatively fast algorithm for this problem? If the problem is not clear, please tell me what I should specify.
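One possible approach, offered as a sketch rather than a definitive answer, avoids looping over i altogether. If the weights are proportional to the number of records in each subinterval, the weighted mean of a equals (Σ n_k a_k) / N with N fixed. The objective is then a sum over contiguous subintervals, so it can be maximised exactly by dynamic programming over cut points (the standard optimal-segmentation recursion). Prefix sums of counts, sums and sums of squares per year and food let each of the 51·52/2 = 1326 candidate subintervals be scored in time proportional to the number of foods. The following are all assumptions of this sketch:

- the function names;
- the `(birth_year, food, sneeze_goodness)` format, with food 0 as the traditional food;
- the `min_n` threshold;
- weighting by records rather than horses.

```python
from statistics import NormalDist

TRADITIONAL = 0  # hypothetical label: food 0 = traditional, 1..29 = experimental

def optimal_division(records, lo=1800, hi=1850, p=0.95, min_n=2):
    """Maximise the record-weighted mean of a over all contiguous divisions of [lo, hi].

    records: list of hypothetical (birth_year, food, sneeze_goodness) triples.
    Returns (weighted_mean_a, [(first_year, last_year, best_food, a), ...]) or None.
    """
    records = [r for r in records if lo <= r[0] <= hi]
    years = hi - lo + 1
    foods = sorted({food for _, food, _ in records} | {TRADITIONAL})
    idx = {food: k for k, food in enumerate(foods)}
    F = len(foods)
    # Row t of each table holds totals for years lo .. lo + t - 1 (prefix sums).
    cnt = [[0] * F for _ in range(years + 1)]
    tot = [[0.0] * F for _ in range(years + 1)]
    sq = [[0.0] * F for _ in range(years + 1)]
    for year, food, v in records:
        t, k = year - lo + 1, idx[food]
        cnt[t][k] += 1
        tot[t][k] += v
        sq[t][k] += v * v
    for t in range(1, years + 1):
        for k in range(F):
            cnt[t][k] += cnt[t - 1][k]
            tot[t][k] += tot[t - 1][k]
            sq[t][k] += sq[t - 1][k]
    z = NormalDist().inv_cdf(1 - p)

    def score(i, j):
        """(n_records, best_food, a) for years lo+i .. lo+j-1, or None if not scorable."""
        stats = []
        for k in range(F):
            n = cnt[j][k] - cnt[i][k]
            if n < min_n:
                stats.append(None)
                continue
            s = tot[j][k] - tot[i][k]
            var = max(sq[j][k] - sq[i][k] - s * s / n, 0.0) / (n - 1)  # sample variance
            stats.append((n, s / n, var))
        trad = stats[idx[TRADITIONAL]]
        exp = [k for k in range(F) if foods[k] != TRADITIONAL and stats[k]]
        if trad is None or not exp:
            return None
        kb = max(exp, key=lambda k: stats[k][1])   # experimental food with highest mean
        (nx, mx, vx), (ny, my, vy) = stats[kb], trad
        a = z * (vx / nx + vy / ny) ** 0.5 + mx - my
        return sum(cnt[j][k] - cnt[i][k] for k in range(F)), foods[kb], a

    # best[j]: max of sum(n_k * a_k) over divisions of the first j years.
    best = [float("-inf")] * (years + 1)
    back = [None] * (years + 1)
    best[0] = 0.0
    for j in range(1, years + 1):
        for i in range(j):
            if best[i] == float("-inf"):
                continue
            seg = score(i, j)
            if seg is not None and best[i] + seg[0] * seg[2] > best[j]:
                best[j], back[j] = best[i] + seg[0] * seg[2], (i, seg)
    if back[years] is None:
        return None
    parts, j = [], years
    while j > 0:                                   # walk the back-pointers
        i, (_, food, a) = back[j]
        parts.append((lo + i, lo + j - 1, food, a))
        j = i
    parts.reverse()
    return best[years] / sum(cnt[years]), parts

records = [(1800, 0, 0.0), (1800, 0, 0.0), (1800, 1, 5.0), (1800, 1, 5.0),
           (1800, 2, 1.0), (1800, 2, 1.0), (1801, 0, 0.0), (1801, 0, 0.0),
           (1801, 1, 1.0), (1801, 1, 1.0), (1801, 2, 5.0), (1801, 2, 5.0)]
print(optimal_division(records, lo=1800, hi=1801, p=0.5))
# (5.0, [(1800, 1800, 1, 5.0), (1801, 1801, 2, 5.0)])
```

The recursion is exact only for this additive objective. A minimum subinterval size, or a penalty per cut, fits into the same loop. Note also that picking the best of 29 foods inside each subinterval biases the selected mean upward, and the normal-approximation bound in a does not account for that.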




