Just Like Stop Me
Here's a mashup for your lazy Sunday.
Warning: contains Morrissey.
Along the lines of one of the last post's items: Mimic.
Mimic converts input text into, well, an identical output. Visually identical output, that is. In reality, ASCII characters get replaced by very similar but non-equivalent Unicode characters, posing a challenge if that text needs to be parsed by anything other than a human.
It could have some use as a testing routine for code which absolutely has to tolerate Unicode strings as input. I mean, don't use this for anything else, even as a prank. Some lines just shouldn't be crossed.
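To make the idea concrete, here is a minimal sketch of that kind of substitution in Python. It is not Mimic's actual code: the tiny HOMOGLYPHS table is hand-picked for illustration, and the function names disguise and find_non_ascii are hypothetical. It swaps a few ASCII characters for look-alikes and then shows how a test could flag them.

```python
import unicodedata

# Hypothetical, hand-picked table (NOT Mimic's real mapping):
# ASCII character -> visually similar non-ASCII character.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    ";": "\u037e",  # GREEK QUESTION MARK
}


def disguise(text):
    """Replace every character that has an entry in HOMOGLYPHS."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def find_non_ascii(text):
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


original = "escape;"
disguised = disguise(original)

# The two strings may look alike on screen but do not compare equal.
print(disguised == original)

# List the characters a parser would trip over.
for index, ch, name in find_non_ascii(disguised):
    print(index, repr(ch), name)
```

A check like find_non_ascii is the sort of thing a test suite could run on input before handing it to a parser that only expects ASCII.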
I found Mimic through esoteric.codes, a wonderful project documenting the stranger and often more philosophical side of coding, including esoteric programming languages (esolangs).
See also: these Unicode toys.
Most bioinformatics databases I know of don't even parse Unicode strings properly, though there's usually a workaround.
It's increasingly bothersome that these kinds of meta-databases are even necessary. I know that most scientists are spoiled for choice when it comes to existing data sets, but when there's more -omics data out there than any human could conceivably process, we've missed the point. Why should we keep all this data around without using it?*
A 2013 paper by Duck et al. found that most (more than 70 percent, in their data set) bioinformatics resources don't get mentioned more than once in the literature. That's quite intimidating! Is it a sign that too many bioinformatics projects are like hammers searching for nails, or are we, as human researchers, just limited in how many different resources we can use at once?
*A disclaimer: I'm not claiming that all that -omics data goes unused. I just think it's underutilized.
Music for this week has been Hammock's Oblivion Hymns.
The whole album can be found here.
It's also here (the deluxe edition, no less):
It's the ideal background for a rainy Friday or possibly for an entire civilization slowly descending into a subterranean chasm.
With just a bit of background, you can probably tell that the material at the link above is:
It'll probably make sense if you've never read The Little Prince but might not if you're not familiar with things like printf. Without spoiling too much, I can tell you that The Little Printf is about why we code, or more broadly, what we get distracted by in the process of using computers.*
I worry about this frequently. I worry about whether my work in bioinformatics will produce anything resembling real biological phenomena. I worry about whether anyone will ever read my work and, if they do, whether they'll find it useful or even usable. I worry about the time required to write and maintain code, and about when new technology (perhaps a new sequencing method, or even a well-curated data set) will render my work obsolete.
OK, I'll spoil a bit here:
Computers don't worry about how relevant their work is or whether they'll become obsolete. The people using them certainly do. Keeping the "human face" behind each problem in view isn't easy, though, especially with bioinformatics, where that human face may be several steps removed from the coding process. My work tends to have more of a bacterial face than anything else. So, instead, it ends up being a bit more about why we study biology in the first place.
What do we hope to find by swimming in all this data? What can we learn from it? How will that learning benefit humanity? Is the learning its own benefit, or is it a distraction? Can it be both?
*I am reminded, as usual, of _why's CLOSURE, though CLOSURE is immeasurably more opaque and potentially about how coding is inherently distancing. The theme of Why We Code remains.
Harry Caufield is a researcher at UCLA developing ways to better understand biomedical text and literature as a data resource. He is interested in information extraction, natural language processing, machine learning, protein-protein interactions, cardiovascular health, and the microbial world. He also appreciates computational creativity and generative methods.
Go ahead and send him an email. It would brighten his day.