Confirmation that data is fully de-identified

Hi,

I am trying to use our school’s cluster to work on this task, but the cluster admins require confirmation from the competition organizers that the data is fully de-identified. Could I get that confirmation?

Thanks.

Hi Chris,

I am not 100% sure what you mean by that.

The data still have the real gene names.

I suggest you just open the quickstarter (here is another one) and look at the previous cell executions. You can even run it on Colab and see for yourself.

Hi,

Sorry for the earlier confusion. To clarify: my school’s cluster admins are strictly concerned with Patient Privacy (PHI/HIPAA).

Could you confirm whether this data comes from lab-grown cell lines (like K562, HEK293, etc.) rather than direct clinical samples from human subjects? If you could provide the name of the cell line or a reference to the source study, that would be enough to satisfy our ethics board that the data is de-identified. Thanks.

Hi Chris,

The official specification document makes no reference to such certification.

If you would like, I can contact the organizers directly regarding your question, as I will not be able to answer it myself. However, you may not receive an answer over the weekend.

At the end of the document, there are a lot of references that the organizers recommend reading before jumping into serious work for the competition.

Thanks. Since the cluster admins require formal proof that the data is Non-Human Subjects Research, could you ask the organizers for a quick confirmation of the cell line name or the source study citation?

Again, thanks for your time.

I just sent the email. I will forward their response once they reply.