?

Log in

No account? Create an account
 

[Lazyweb] Statistics question - F*cking with Clusters

About [Lazyweb] Statistics question

Previous Entry [Lazyweb] Statistics question Mar. 16th, 2028 @ 11:53 pm Next Entry
Dear PeopleWhoKnowStatistics.

Say I have a set of continuously arriving outcomes that have 'goodness' ratings attached. I don't know the distribution of said ratings, but I have some idea of the range, and I have a 1-level deep history of past ratings for some of them. I want to know how many outcomes I should wait for before grabbing one and calling it good. In this case, the payoff for picking sooner is fairly high.

I know there is some sort of actual method for this. The problem is sort of akin to Amazon's old Gold Box Model: given a price, you pick it now or you let it go and wait, giving up a known gain for a potential better future deal.

Can anyone tell me what I'm trying to think of here? This is terribly hard to Google search for :-p

Thanks,
-- Avani (proof positive that you can do years of ML without learning basic stats...)

Current Mood: puzzled
Current Music: Serj Tankian -=- Sky is Over
take a penny
[User Picture Icon]
From:ceph
Date:March 17th, 2008 07:19 am (UTC)
(Link)
http://ansuz.sooke.bc.ca/software/sultans-daughters.php

I think this might be a version of the algorithm you want. Took me a while to Google it, too.
[User Picture Icon]
From:jmpava
Date:March 17th, 2008 07:30 am (UTC)
(Link)
Ok, I wanna know where people come up with the NAMES for algorithms. Jeez... I mean, sure, it makes sense in the provided story, but then they become the canonical names for that 'type' of problem, and suddenly you can't find something becuase you don't know its origin story.

ETA: Not that I'VE ever been in a position where I've needed to track down an algorithm, mind you. Just seems high on the not-useful scale ;->

Edited at 2008-03-17 07:31 am (UTC)
[User Picture Icon]
From:rifhutch
Date:March 17th, 2008 08:30 am (UTC)
(Link)
This is better known as the secretary problem, when posed as the somewhat more chaste scenario of picking the best typist out of the secretary pool. Wikipedia can get you started on a few references for that. While this is plenty famous enough to have its own title to go along with the contrived narrative, there are more formalized titles for this class of problems as as well. "Optimal stopping" and "on-line selection" are the ones that spring to mind.
[User Picture Icon]
From:avani
Date:March 17th, 2008 05:17 pm (UTC)
(Link)
Thank You!! 'Online-selection' is exactly the term that was escaping me.
[User Picture Icon]
From:komaheian
Date:March 17th, 2008 12:28 pm (UTC)

That was fun!

(Link)
Ask more Lazy Web questions! That was fun!
[User Picture Icon]
From:shazamaramack
Date:March 17th, 2008 03:23 pm (UTC)
(Link)
23 is always a good number to stop at.
[User Picture Icon]
From:shazamaramack
Date:March 17th, 2008 03:39 pm (UTC)
(Link)
And now that I've read the link to the sultans daugthers problem...

The strategy is interesting in the version where you have to pick the one with the biggest dowry, but after that they just talk about a strategy for getting one in the top 5 or 10 of beauty. But they're all sisters. Is their beauty going to vary that much? And even if it does, I think my strategy would be to marry the first one that I'm attracted to enough to marry a stranger. If there isn't one, then I'm fine with ending up single. If she's not the best looking one, who cares? She's already really hot because I'm marrying a stranger.

So were there 62 total things to choose from in your problem? Cuz then 23 is the right number to stop at.
[User Picture Icon]
From:shazamaramack
Date:March 17th, 2008 03:40 pm (UTC)
(Link)
Oh... continuously arriving... nevermind.
[User Picture Icon]
From:gustavolacerda
Date:March 17th, 2008 04:37 pm (UTC)
(Link)
If the sequence is infinite, I need to know your patience level, or your discount rate. Supposing you're happy being in the top 5%, you can run a non-parametric quantile test to that effect... you pick as soon as it passes the test (i.e. when it's enough of an outlier)

Edited at 2008-03-17 04:37 pm (UTC)
[User Picture Icon]
From:avani
Date:March 17th, 2008 05:20 pm (UTC)
(Link)
I thought about that ... but I'd rather have it be independent of patience since this is implemented as a sort of peer selector on an arbitrarily sized network, and so I don't think my patience levels should be constant.
[User Picture Icon]
From:gustavolacerda
Date:March 17th, 2008 05:25 pm (UTC)
(Link)
ok. For infinite patience, how about this algorithm:
* wait forever, recording the best one
* after that, when you beat the best one, pick that

Expected running time: 2*infinity
[User Picture Icon]
From:gustavolacerda
Date:March 17th, 2008 05:29 pm (UTC)
(Link)
more seriously, if your patience is varying, then you want to vary the parameters of the quantile test accordingly.
[User Picture Icon]
From:avani
Date:March 17th, 2008 05:54 pm (UTC)
(Link)
Its just a level of complication I'd like to avoid: I think adding too many hooks that are dependent on the network size will adversely affect my scalability calculations later on. I'll try it, though, assuming I have time.
[User Picture Icon]
From:gustavolacerda
Date:March 17th, 2008 06:00 pm (UTC)
(Link)
I said "patience", not "network size".
[User Picture Icon]
From:avani
Date:March 17th, 2008 06:49 pm (UTC)
(Link)
In my case the two aren't independent. As my network size grows, I'm probably willing to wait longer, but not necessarily scaled to the size of the network, since that will kill scalability at 100k nodes or so. This is something I need to tweak experimentally, again, given time. Stupid project is due much too soon :-)
[User Picture Icon]
From:hatter
Date:May 18th, 2008 06:09 am (UTC)
(Link)
If it'll only affect it "later on" then chances are you can collect the data and process it both ways now, and when scalability becomes an issue, drop the one whose outcomes feel inferior. It'd seem like a premature optimisation to pick one way now, if you're not likely to be resource-bound for a little while. You can either keep the other dataset invisible from users entirely, present a subset of users view B instead of view A and see how they feel, offer a certainty on any result based on the different between the answers from the two methods; all sorts of bonus metadata.


the hatter
[User Picture Icon]
From:bennj
Date:March 17th, 2008 06:46 pm (UTC)
(Link)
I know they went over this question in MIT's algorithms course. I just need to remember where it was.
[User Picture Icon]
From:bennj
Date:March 20th, 2008 03:46 am (UTC)
(Link)
The Secretary Problem

I knew I'd seen it, but not as the Sultan's Dowry. The standard solution is to wait n/e (number of secretaries over 2.714...) and then stop at the first number that exceeds anything you've seen before, but this can be proven to maximize the probability you'll pick the best, not that you'll pick a "good" one.
(take a penny)
Top of Page Powered by LiveJournal.com