Join us for our weekly series of short talks: nf-core/bytesize.
In just 15 minutes plus questions, we focus on topics about using and developing nf-core
pipelines. These are recorded and made available at https://nf-co.re,
helping to build an archive of training material. Got an idea for a talk? Let us know on the #bytesize
Slack channel!
This week, Edmund Miller (@edmundmiller) will talk about the newest developments in the nf-core/nascent pipeline.
Video transcription
Note: The content has been edited to make it reader-friendly.
(question continued) Okay, so then the background model is built up, and then the caller will call the peaks based on some random distribution in the genome.

(answer) Yep.

(question) Last question: why not use MACS or other conventional callers? Why use Homer? Homer seems quite primitive, I guess, in terms of peak calling. Why not something more sophisticated like MACS? Are there more false positives?

(answer) Legacy. With Homer, you can also tweak some of the important things, like how it picks up on the… I missed the image, but basically it picks up on the peak and then it picks up on the trailing tail of it. That is actually the piece that’s really important there instead of… Here, I’ll just pull it up. This is what Homer’s actually doing. Whereas in MACS, you might just pick up the peak. You’re actually picking up this downstream transcript, which is why Homer is unique in that respect.

(question) Okay. SICER presumably does something similar because it calls larger peaks as well, right? It’s able to call these sorts of counts?

(answer) Mm-hmm.

(question continued) Okay, cool. Thanks a lot, man.

(host, question) There’s also another question in the chat: why do you use featureCounts and not other quantification methods as in RNA-Seq?

(answer) featureCounts is just what I’ve always used for that. I’m open to other ideas on it. It’s not exactly the same as RNA-Seq, and most of those methods are RNA-Seq specific, which is part of the issue with quantification. So the difference is that we pass in the genes and count with those. Then we also count with the identified transcripts and the identified transcriptional start sites, and give you counts for all of those. That’s the difference. Downstream, you have to do your own math and stats, because it’s not exactly the same as RNA-Seq in terms of how the math works out. Again, it’s also not well defined.

(audience) With RNA-Seq, you’re counting things that overlap spliced transcripts, whereas with GRO-seq, you’re looking at the entire gene body, where splicing isn’t important. featureCounts can do that in this case, whereas with RNA-Seq, as we’ve known and discussed previously, it’s not ideal for the transcript-splicing type of quantification.

(answer continued) Exactly. Exactly. Well said. It can work in a very simple way, which is the reason we’re using featureCounts.

(host) Okay. Thank you. I don’t see any more questions. So with that, I want to thank you, of course, Edmund, but also the Chan Zuckerberg Initiative for funding the bytesize talks. As usual, if there are any questions, you can always go to the nf-core workspace on Slack, find the nascent channel, and ask your questions there. Thank you very much.
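
Editorial illustration of the counting distinction raised in the Q&A: nascent-transcript assays such as GRO-seq count reads over the whole gene body, while an exon-focused count ignores intronic reads. The sketch below is a toy, assuming half-open 0-based intervals; the `Interval` type, the `count_reads` helper, and all coordinates are hypothetical and are not taken from featureCounts or from the nf-core/nascent pipeline.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_reads(reads: List[Interval], features: List[Interval]) -> int:
    """Count reads that overlap at least one feature (each read counted once)."""
    return sum(1 for r in reads if any(overlaps(r, f) for f in features))


# Hypothetical gene with two exons separated by an intron.
gene_body = [Interval("chr1", 100, 1000)]
exons = [Interval("chr1", 100, 300), Interval("chr1", 800, 1000)]

# Hypothetical reads: nascent transcription also covers the intron.
reads = [
    Interval("chr1", 150, 200),    # exon 1
    Interval("chr1", 400, 450),    # intron
    Interval("chr1", 600, 650),    # intron
    Interval("chr1", 900, 950),    # exon 2
    Interval("chr1", 1200, 1250),  # outside the gene
]

gene_body_count = count_reads(reads, gene_body)  # intronic reads included
exon_count = count_reads(reads, exons)           # intronic reads ignored
print(gene_body_count, exon_count)
```

In this toy data the gene-body count keeps the two intronic reads that the exon-only count drops, which is the simple behaviour the speakers describe wanting for nascent data.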