I would appreciate it if you could provide a rationale for why the peer review evaluation in Crunch 3 should be considered reliable and effective. I have two primary concerns:
Requirement for Expert-Level Knowledge: The peer review process appears to assume that participants possess expert-level knowledge, particularly in the development of gene panels and the related algorithmic processes. Given that Crunch 3 is open to a wide range of participants, many of whom may not have specialized expertise in these areas, it seems unrealistic to expect that all reviewers will have the necessary background to evaluate the work accurately. How can we ensure that non-experts are able to provide meaningful feedback, especially when the tasks demand a deep understanding of technical and scientific concepts?
Conflict of Interest: There seems to be a built-in conflict of interest in the peer review scoring system. If participants are incentivized to rate others highly to receive positive feedback themselves, there may be little motivation to provide honest or critical assessments. This creates a potential for inflated scores that do not accurately reflect the quality of the work. Additionally, if ranking in the system is directly affected by peer review scores, what prevents participants from strategically downgrading others’ work to improve their own ranking?
I think those are good points, but also, most participants in the latest milestone already have some background or familiarity with this field. Incorporating peer review alongside our evaluation of top discriminative genes would actually complement our methodology by adding qualitative insights rather than undermining it.
@raghvendramall As I understand it, they mentioned they are going to use a quantitative and objective approach to rank submissions.
Quoting from the Broad Crunch 3 page:
“Classification Accuracy: We’ll use your top 50 genes to train a model that distinguishes between dysplasia and noncancerous mucosa. The better your genes help the model correctly identify these regions, the higher your accuracy score will be. This is the main factor in determining your ranking.”
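To make this concrete, here is a minimal sketch of what I understand such an accuracy score to involve. The choice of classifier, the data layout, and the use of cross-validation are my assumptions, since the page does not specify any of them.

```python
# Hypothetical sketch: the model, data layout, and validation scheme used by the
# organizers are not stated on the page; everything below is an assumption.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def panel_accuracy(expression: pd.DataFrame, labels: pd.Series, top50: list) -> float:
    """Cross-validated accuracy of a classifier restricted to a submitted 50-gene panel.

    expression: regions x genes expression matrix (assumed layout)
    labels:     1 = dysplasia, 0 = noncancerous mucosa (assumed encoding)
    top50:      the 50 genes named in the submission
    """
    X = expression[top50]  # keep only the submitted panel's columns
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, labels, cv=5, scoring="accuracy").mean()
```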
Unfortunately, there are now additional concerns following the decision to share the top 50 gene lists with reviewers. Doesn't this risk introducing bias, with reviewers prioritizing submissions that align closely with their own gene lists? This could undermine the objectivity of the evaluation and reduce the diversity of selected entries.
To help reviewers provide meaningful feedback, we have added guidance in the review platform on the aspects to cover (the team has already done this). As for conflict of interest or bias, we will evaluate the reviewer reports.
Since the gene list is one of the major concrete results, showing it during the review process is valuable.
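As one illustration of the kind of check that could be run on the reviewer reports (a sketch only, not a committed procedure), one could test whether the score a reviewer gives tracks the overlap between the reviewed panel and the reviewer's own panel:

```python
# Sketch of one possible alignment-bias check on reviewer reports; the data
# structures and the use of Spearman correlation are illustrative assumptions.
from scipy.stats import spearmanr

def jaccard(a: set, b: set) -> float:
    """Overlap between two gene panels as |intersection| / |union|."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def overlap_score_correlation(reviews, panels):
    """reviews: iterable of (reviewer_id, submission_id, score) tuples
    panels:  dict mapping participant id -> set of their 50 genes
    Returns Spearman rho and p-value between panel overlap and the score given."""
    overlaps, scores = [], []
    for reviewer, submission, score in reviews:
        overlaps.append(jaccard(panels[reviewer], panels[submission]))
        scores.append(score)
    rho, pvalue = spearmanr(overlaps, scores)
    return rho, pvalue  # a clearly positive rho would hint at alignment bias
```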
In my opinion, the Evaluation step on page 18 of the challenge doc should determine the final winner, under the general principle that the cards speak. All final submissions, whether or not they score highly on peer review or on Crunch 1 and 2, should have their gene lists assessed against the final panel chosen. This rules out the possibility that the “right” panel was overlooked because of peer review bias or Crunch 1/2 bias.

The scoring rules for the final Evaluation seem a bit astrological: “For the top 5 teams (either by Route 1 or Route 2 above), these 50 genes will be the top 50 genes in their ranking; for other teams, these top 50 genes will likely be ranked differently.” And then: “We will compute an overall ranking by weighting the cell classification and diversity rankings. The ranking will be mainly determined by the classification accuracy as described above and supplemented by diversity rankings.” Phrases like “likely”, “mainly determined by”, and “supplemented by” are weasel words (weasel phrases?). Before the ranking is done, these should be made specific. In other words, the scoring function should be something I can feed into an LLM and get back a reliable piece of Python code that computes the score. For example, here is a run through Perplexity of page 18: https://www.perplexity.ai/search/write-a-python-routine-that-im-6hpgCg4ORE.cjwnp9RqClQ
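For illustration, here is the shape of scoring function I mean. This is a sketch in which the weights are placeholders I made up; the actual weights and tie-breaking rules are exactly the details that should be published before the ranking is computed.

```python
# Sketch of the kind of fully specified rule I mean. The weights are placeholders
# I invented; the real values and tie-breaking rules are what should be published.
def overall_ranking(classification_rank: dict,
                    diversity_rank: dict,
                    w_classification: float = 0.8,  # assumed reading of "mainly determined by"
                    w_diversity: float = 0.2) -> list:
    """Order teams best-first by a weighted combination of their two ranks (1 = best)."""
    combined = {
        team: w_classification * classification_rank[team] + w_diversity * diversity_rank[team]
        for team in classification_rank
    }
    return sorted(combined, key=combined.get)  # lower combined value ranks higher
```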