Despina Maliaka
HeroX team

Leaderboard Elements

Solvers,

To help you better navigate the Cultivate Labs leaderboard, we're providing the attached sample leaderboard. Reference the images below for detailed descriptions of each of the leaderboard elements, including filters, rank, percent improvement, and more.
Attachments

22 Replies

Despina Maliaka
HeroX team
Leaderboard elements continued.
Attachments
reza mottaghian
How can I send an idea?
Carolyn Meinel
How do I access Leaderboard Elements? My team captain can see this data, but there is no Elements button on my version. I'm looking at https://discover.iarpagfchallenge.com/leaderboards/brier_scores. I set up a username and password in order to access this site, but I'm wondering if the problem is that the site has no way of knowing that I am on a GF Challenge team unless I do something special. Could anyone enlighten me on what that might be? I'm trying to tag this with "Moderator" but the tagging function won't allow me to do so. So I'll try this: @Moderator
Carolyn Meinel
I read what you wrote on Leaderboard Elements. How do I access Leaderboard Elements? My team captain can see this data, but there is no Elements button on my version. I'm looking at https://discover.iarpagfchallenge.com/leaderboards/brier_scores. I set up a username and password in order to access this site, but I'm wondering if the problem is that the site has no way of knowing that I am on a GF Challenge team unless I do something special. Could anyone enlighten me on what that might be? I'm trying to tag this with "Moderator" but the tagging function won't allow me to do so.
Tagged: Despina Maliaka
Ben Roesch
Hey Carolyn-

Can you post a screenshot of what you're seeing at that URL (https://discover.iarpagfchallenge.com/leaderboards/brier_scores)? Also, what browser are you using?

Thanks,
Ben
Carolyn Meinel
Screenshot from Chromium on Ubuntu. I get the same screen using Chrome on Windows 10, as well.
Attachments
Ben Roesch
Hey Carolyn-

What you're seeing looks right to me. Which part did you think was missing?

Thanks,
Ben
Carolyn Meinel
Above the box labeled "Leaderboard" on my team leader's computer there is a box labeled "Elements." This thread, titled "Leaderboard Elements," is about what you see when you click that "Elements" box. My team leader showed me that his screen displays the Elements box above the Leaderboard box, and when he clicked on it he got the same images that are shown in this thread. However, because my computer does not show the Elements box, I cannot access the Elements features. I would like to be able to do so.
Ben Roesch
If you mean the elements sections in the screenshots that Despina posted, those were added via PowerPoint -- they aren't part of the actual platform. What you're seeing is all there is to the leaderboard.
Lars Ericson
@Ben Roesch
Can we get a button that limits the display to just my methods or just the methods of a selected team? I have to scroll through 800 methods to get to my stuff (I'm way at the bottom!).
Tagged: Ben Roesch
Ben Roesch
Hey Lars-

Sorry, but I believe the leaderboards are feature-complete for this round. If we end up making any changes, I'll keep that in mind as a feature request.

Thanks,
Ben
Lars Ericson
"Feature complete"? Does that mean "budgeted billable hours exhausted"?
Alan Patrick
Hi - Not sure where this question goes, but as it's the leaderboard that spurred it, I'll put it here. I may misunderstand how the scores work, so can I ask whether, for any one of the (up to 40 possible) methods used, (i) is it scored for *all* the IFPs or only the IFPs it is applied to, and (ii) if it is applied several days after an IFP's start date, is the Ignorance Prior daily forecast for that IFP on the days before the method is applied counted as part of the method's score?
Modified on July 3, 2019, 10:34 p.m. PDT
Lars Ericson
My understanding is that each method is applied to all IFPs, and any IFP that a method starts late on gets the Ignorance Prior up to the day it starts. Your score is max(method1, ..., method40), i.e. your best method (there's a rough sketch of this scoring at the end of this post). So you can't mix and match, applying a model in one method only to the questions it suits while ignoring the other questions in that method. You've probably already lost a fair amount of ground. A better strategy is to broadcast the Consensus vote on a method when you don't have a model for that kind of question.

You can also use your method slots to combine models in various combinations. This is often done in deep learning pipelines, for example, to demonstrate sensitivity to various parts of the pipeline by selectively eliminating portions of it to show their impact on the overall score. So you could have Consensus, Financial Model, Election Model, Bet the Farm (0%/100%) Model, and do various stackings like Financial then Consensus, Financial then Best Forecaster then Consensus, Financial + Election, and so on. You get 40 combinations, more than enough for a good paper. Typically in deep learning papers you see 5 or so factors in a pipeline with some sensitivity analysis of the individual factors. Each combination stands on its own, though, and applies to all IFPs.

What's your team name on the Leaderboard? I don't see anything like DataSwarm.

If you're finding the Zeitgeist on GFC2 by actually applying this thing (http://dataswarm.tech/), then that will be a great demo and wonderful advertising if you win. However, we're 14 questions in now. The clock is ticking. If you haven't submitted anything yet, you're at a strong disadvantage and your system won't show well.
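
As an illustration only (not the official scoring code), here is a minimal Python sketch of the scoring scheme described above. It assumes the multi-option Brier score (sum of squared errors over the answer options, so the uniform Ignorance Prior forecast on a binary IFP scores 0.5), that a method's score is its mean daily Brier over all IFPs and all open days, that days before a method's first forecast on an IFP are filled in at the Ignorance Prior, and that "best of your 40 methods" means the lowest mean Brier. All names are illustrative.

```python
# Illustration only -- not the official GFC2 scoring engine.

def brier(forecast, outcome_index):
    """Multi-option Brier score: sum over options of (p - outcome)^2."""
    return sum((p - (1.0 if i == outcome_index else 0.0)) ** 2
               for i, p in enumerate(forecast))

def ignorance_prior(n_options):
    """Uniform forecast, e.g. [0.5, 0.5] for a binary IFP (Brier = 0.5)."""
    return [1.0 / n_options] * n_options

def method_score(forecasts, ifps):
    """Mean daily Brier over every IFP and every open day.

    forecasts[ifp_id][day] -> probability vector (only days actually forecast)
    ifps[ifp_id]           -> (n_options, outcome_index, n_open_days)
    Days with no forecast are scored at the Ignorance Prior.
    """
    scores = []
    for ifp_id, (n_options, outcome, n_days) in ifps.items():
        submitted = forecasts.get(ifp_id, {})
        for day in range(n_days):
            scores.append(brier(submitted.get(day, ignorance_prior(n_options)),
                                outcome))
    return sum(scores) / len(scores)

def team_score(methods, ifps):
    """Best (lowest mean Brier) of the team's method slots."""
    return min(method_score(m, ifps) for m in methods)
```

The point is that a method which never touches an IFP still carries that IFP's Ignorance Prior days in its average.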
Modified on July 4, 2019, 5:19 a.m. PDT
Alan Patrick
Thanks for that, the scores make sense now. We're on the leaderboard; it seems our team name is that of the one of us who got the API keys (andy wise), so we've used the terms "DataSwarm Method 1 - 3" for the various approaches - as you surmised, the different methods work for different problem types. We came in late as we only heard about this a week after it started, and it took more time to get all the API stuff up and running, so all the early question answers (I now realise) have a lot of "Ignorance Prior" overhead in their score. Also thanks for the point re different sub-methods with different odds allocations - I was wondering about testing that too.
Modified on July 4, 2019, 6:46 a.m. PDT
Lars Ericson
Glad to help! Note the tip on broadcasting Consensus on unused methods is pinned at the top of the Forum here: https://www.herox.com/IARPAGFChallenge2/forum/thread/3971. Also, at the moment you're doing very well (7th place), so ignore all that stuff about being at a disadvantage; you're doing fine. Except you've only got a few methods up. You'd give yourself more wiggle room by initializing all the remaining slots to Consensus, or to your favorite method if you have a lot of confidence that it won't crash when a lot of different question types get resolved.
Modified on July 4, 2019, 6:47 a.m. PDT
Alan Patrick
Yes, we saw the Consensus stuff in an email sent out a few weeks back, so we started to use it then. Our first shot was the Oil question, where our 0.95 sure bet disappeared in a 1 1/2 hour market plunge that bounced back straight after :D Re initialising all other slots: does this mean that if you don't use all 40 slots you are carrying a whole load of "Ignorance Prior" results that bring down the average, or (as I think you mean) that if you then use more methods you inherit all the Ignorance Priors to date?
Modified on July 4, 2019, 7:18 a.m. PDT
Lars Ericson
If you find midway through that you want to try another method, and not put it in place of a method in a currently used slot, then all the slots you haven't used (about 35 of them, it looks like) will be Ignorance Prior'ed up to the moment you start using them, unless you prime them now by sending out Consensus every day for all questions on those slots. That is, the scoring engine is going to assume Ignorance Prior rather than Consensus for all unused slots and all questions in used slots that you haven't forecast.
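
As a rough illustration, priming could look like the loop below, run once a day. This is a hypothetical sketch: get_open_ifps, get_consensus_forecast, and submit_forecast are placeholder names, not the real GFC2/Cultivate Labs API calls, and the slot numbers are made up.

```python
# Hypothetical sketch -- the api helpers are placeholders, NOT real GFC2 API calls.

USED_SLOTS = {1, 2, 3}            # slots that already have a real model behind them
ALL_SLOTS = set(range(1, 41))     # 40 method slots in total

def prime_unused_slots(api):
    """Broadcast the consensus forecast on every unused slot for every open IFP."""
    for ifp in api.get_open_ifps():                     # live, non-voided questions
        consensus = api.get_consensus_forecast(ifp.id)  # current crowd probabilities
        for slot in sorted(ALL_SLOTS - USED_SLOTS):
            # Keeps the slot from accumulating Ignorance Prior days
            # before you ever put a real model in it.
            api.submit_forecast(method_slot=slot,
                                ifp_id=ifp.id,
                                probabilities=consensus)
```

Something like this, scheduled daily (e.g. with cron), keeps the unused slots at Consensus until a real model takes them over.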
Alan Patrick
Thanks again - just to check, re the "the scoring engine is going to assume Ignorance Prior rather than Consensus for all unused slots and all questions in used slots that you haven't forecast" point: does that mean each method is judged against all questions, even ones where it makes no forecast at all? (Apologies if this sounds dumb; we read the phrase "Each Solver’s forecasting method will be scored across all relevant IFPs" in the rules as "relevant" meaning a method would only be scored against the IFPs it was used on.)
Lars Ericson
Yes, that's exactly my point; I've been saying it over and over. Last year you got to pick and choose somewhat: you had to answer at least 70% of questions in your highest-scoring method. It happened to be the case that, last year, if you scored people against 100% of questions, I won. Against everybody, including all the HFC Performers. This year, it happens to be the case (and I wouldn't mind taking a little credit) that they are going with the 100% rule: you are scored against every question, whether you answered it or not. For all methods. And they're leaving the Performers out of it this time. (As for the Performers that are left, I think one or two got knocked out.) "Relevant" means that the question is in the live portal (not staging or training) and hasn't been voided. If you're not answering every question, y'all got work to do.
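
For concreteness, a toy calculation of what that rule costs (assuming binary IFPs and the multi-option Brier score, where the 50/50 Ignorance Prior scores 0.5):

```python
# Toy arithmetic only -- illustrative numbers, not anyone's real scores.
answered, skipped = 10, 10       # a method forecasts 10 of 20 IFPs
brier_answered = 0.10            # strong where it does forecast
brier_skipped = 0.50             # Ignorance Prior on the rest

mean_over_all = (answered * brier_answered + skipped * brier_skipped) / (answered + skipped)
print(mean_over_all)             # 0.30 -- beaten by a so-so method scoring 0.25 everywhere
```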

The origin (in my mind) of this idea is to imagine yourself as the analytic component of an intelligence agency: You don't get to pick and choose what questions come in. You have to work on every question that comes in. So the fair measure of a system is not how well it does on the questions it thinks it can handle. The fair measure is against all the questions it is told to handle.

I'm not boasting for this year, by the way. I'm near the bottom of the Leaderboard. But I'm doing my homework. Maybe I'll get back up towards the top when I'm done with it: https://www.linkedin.com/pulse/parsing-ifps-iarpa-gfc2-lars-ericson/
Modified on July 5, 2019, 4:06 a.m. PDT