ꜱɪʟʟy ᴛᴇxᴛ ɢᴀᴍᴇꜱ

Along the lines of one of the last post's items: Mimic.

Mimic converts input text into, well, an identical output. Visually identical output, that is. In reality, ASCII characters get replaced by very similar but non-equivalent Unicode characters, posing a challenge if that text needs to be parsed by anything other than a human. 
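To make the idea concrete, here's a minimal sketch of that kind of substitution in Python. It is not Mimic's actual code or character table: the `HOMOGLYPHS` mapping and the `disguise` function are hypothetical names I've made up for illustration, and the table covers only five letters with Cyrillic lookalikes.

```python
# Hypothetical, tiny homoglyph table: ASCII letter -> Cyrillic lookalike.
# The real Mimic tool has its own, much larger table; this is only a sketch.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
}

# str.maketrans builds a lookup table that str.translate can apply in one pass.
_TABLE = str.maketrans(HOMOGLYPHS)


def disguise(text: str) -> str:
    """Swap each ASCII letter in the table for its lookalike; leave the rest alone."""
    return text.translate(_TABLE)


original = "escape code"
disguised = disguise(original)

print(disguised)                        # looks like the input in most fonts
print(original == disguised)            # False: the code points differ
print(len(original) == len(disguised))  # True: one character in, one character out
```

The two strings look the same on screen, but any exact string comparison, dictionary lookup, or parser keyed on the ASCII letters will treat them as different.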

It could have some use as a testing routine for code which absolutely has to tolerate Unicode strings as input. I mean, don't use this for anything else, even as a prank. Some lines just shouldn't be crossed.
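As a rough sketch of the testing side, here's one way to spot the stand-ins, assuming all you want is to flag anything outside ASCII. The `find_non_ascii` helper is a hypothetical name of mine, not part of Mimic, and a plain non-ASCII check would also flag perfectly legitimate text (accented names, for instance), so treat it as a starting point only.

```python
import unicodedata


def find_non_ascii(text):
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# "code" with a Cyrillic letter standing in for the Latin "o".
sample = "c\u043ede"

for index, char, name in find_non_ascii(sample):
    print(index, repr(char), name)
```

Feeding disguised strings to a parser and checking that it either handles them or rejects them with a clear error is the sort of test I have in mind.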

I found Mimic through esoteric.codes, a wonderful project documenting the stranger and often more philosophical side of coding, including esoteric programming languages (esolangs).

See also: these Unicode toys.

A few bits to chew on

A shipworm. It's quite simple, physiologically. From Wikimedia Commons.

Most bioinformatics databases I know of don't even parse Unicode strings properly, though there's usually a workaround.

  • Omictools.com - a database of bioinformatics databases. There are literally thousands of bioinformatics tools and resources for some applications. Even when there are only hundreds, it's difficult to tell which resource uses which kind of data, when it was last updated, or whether it's even being maintained anymore. A resource may have been released a year ago, but some funding-related disaster can quickly take it offline (or even worse, leave it in an undead state, still responding to users but providing malformed results). Omictools helps to identify the useful resources.

It's increasingly bothersome that these kinds of meta-databases are even necessary. I know that most scientists are spoiled for choice when it comes to existing data sets, but when there's more -omics data out there than any human could conceivably process, we've missed the point. Why should we keep all this data around without using it?*

A 2013 paper by Duck et al. found that most (more than 70 percent, in their data set) bioinformatics resources don't get mentioned more than once in the literature. That's quite intimidating! Is it a sign that too many bioinformatics projects are like hammers searching for nails, or are we, as human researchers, just limited in how many different resources we can use at once? 

*A disclaimer: I don't think all that -omics data is going unused. I just think it's underutilized.

Distant faces in the code

The Little Printf

With just a bit of background, you can probably tell that the material at the link above is:

  1. Essentially a version of The Little Prince
  2. For programmers/coders/computer people

It'll probably make sense even if you've never read The Little Prince, but it might not if you're unfamiliar with things like printf. Without spoiling too much, I can tell you that The Little Printf is about why we code, or more broadly, what we get distracted by in the process of using computers.*

I worry about this frequently. I worry about whether my work in bioinformatics will produce anything resembling real biological phenomena. I worry about whether anyone will ever read my work, and if they do, whether they'll find it useful or even usable. I worry about the time required to write and maintain code, and about when new technology (perhaps a new sequencing method, or even a well-curated data set) will render my work obsolete.

OK, I'll spoil a bit here: 

In the end though, it is only when you solve problems with a human face that you can feel truly right; What is essential is invisible to the computer.

Computers don't worry about how relevant their work is or whether they'll become obsolete. The people using them certainly do. Keeping the "human face" behind each problem isn't easy, though, especially with bioinformatics, where that human face may be several steps removed from the coding process. My work tends to have more of a bacterial face than anything else. So, instead, it ends up being a bit more about why we study biology in the first place.

What do we hope to find by swimming in all this data? What can we learn from it? How will that learning benefit humanity? Is the learning its own benefit, or is it a distraction? Can it be both?

*I am reminded, as usual, of _why's CLOSURE, though CLOSURE is immeasurably more opaque and potentially about how coding is inherently distancing. The theme of Why We Code remains.