November 08, 2015

Just Like Stop Me

November 08, 2015/ Harry Caufield

Warning: contains Morrissey.

October 27, 2015

ꜱɪʟʟy ᴛᴇxᴛ ɢᴀᴍᴇꜱ

October 27, 2015/ Harry Caufield

Along the lines of one of the items in my last post: Mimic.

Mimic converts input text into, well, an identical output. Visually identical output, that is. In reality, ASCII characters get replaced by very similar but non-equivalent Unicode characters, posing a challenge if that text needs to be parsed by anything other than a human.

It could have some use as a testing routine for code which absolutely has to tolerate Unicode strings as input. I mean, don't use this for anything else, even as a prank. Some lines just shouldn't be crossed.
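To show roughly what that substitution looks like, and why it could work as a Unicode-tolerance test, here is a minimal Python sketch. The lookalike table, `mimic_like`, and `non_ascii_report` are hypothetical names of mine. The table is a small hand-picked assumption, not Mimic's actual mapping.

```python
import unicodedata

# A tiny, hand-picked lookalike table. This is an assumption for
# illustration only; Mimic's real substitution table is much larger.
HOMOGLYPHS = {
    "A": "\u0391",  # GREEK CAPITAL LETTER ALPHA
    "B": "\u0392",  # GREEK CAPITAL LETTER BETA
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    ";": "\u037e",  # GREEK QUESTION MARK
}


def mimic_like(text: str) -> str:
    """Swap each mapped ASCII character for its lookalike; keep the rest."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def non_ascii_report(text: str) -> list[tuple[int, str, str]]:
    """Return (position, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, hex(ord(ch))))
        for i, ch in enumerate(text)
        if not ch.isascii()
    ]


sample = "ACGT; gene coverage"
disguised = mimic_like(sample)
print(disguised == sample)  # False, although the two usually look identical on screen
for position, char, name in non_ascii_report(disguised):
    print(position, name)
```

The disguised string fails a plain equality check. The report lists each swapped code point by position and name, which is the kind of thing a parser test could assert on.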

I found Mimic through esoteric.codes, a wonderful project documenting the stranger and often more philosophical side of coding, including esoteric programming languages (esolangs).

A few bits to chew on

October 24, 2015/ Harry Caufield

This Small Things Considered entry about shipworm microbiomes. A version of this post was in the most recent Microbe. It's certainly relevant to the whole subject of industrial-scale cellulose metabolism.

A shipworm. It's quite simple, physiologically. From Wikimedia Commons.

This example of unexpected Unicode effects. Unicode is a wonderful, evolving thing, but its brute-force approach to providing every possible character can lead to dangerous redundancies. Is it really Unicode's fault, though? Or is it the result of EAFP ("easier to ask forgiveness than permission") programming philosophies?

Most bioinformatics databases I know of don't even parse Unicode strings properly, though there's usually a workaround.
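Here is one well-known source of that redundancy, not necessarily the one in the linked example. The same visible text can be encoded in more than one way: "é" can be a single code point, or an "e" followed by a combining accent. Below is a minimal Python sketch of one possible workaround. It assumes a pipeline that only wants plain ASCII identifiers. The sketch normalizes input with the standard library's `unicodedata.normalize("NFKC", ...)` and then rejects anything still non-ASCII. `clean_identifier` is a hypothetical helper, not part of any database's actual API.

```python
import unicodedata


def clean_identifier(raw: str) -> str:
    """Normalize raw input to NFKC and require that the result is ASCII.

    Hypothetical helper for illustration. NFKC folds canonical equivalents
    (composed vs. decomposed accents) and compatibility forms (such as
    full-width letters) into one representation. Anything still non-ASCII
    afterwards is rejected instead of guessed at.
    """
    normalized = unicodedata.normalize("NFKC", raw).strip()
    if not normalized.isascii():
        leftovers = sorted({ch for ch in normalized if not ch.isascii()})
        names = ", ".join(unicodedata.name(ch, hex(ord(ch))) for ch in leftovers)
        raise ValueError(f"non-ASCII characters remain after NFKC: {names}")
    return normalized


# Two spellings of "café" that render identically but compare unequal.
composed = "caf\u00e9"      # é as a single code point (U+00E9)
decomposed = "cafe\u0301"   # e followed by COMBINING ACUTE ACCENT (U+0301)
print(composed == decomposed)  # False
print(unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed))  # True

# Full-width letters and digits fold to plain ASCII under NFKC.
print(clean_identifier("\uff22\uff32\uff23\uff21\uff11"))  # BRCA1

# A Cyrillic capital A (U+0410) is a distinct letter, so it is rejected.
try:
    clean_identifier("BRC\u0410")
except ValueError as err:
    print(err)
```

Normalization only merges characters that Unicode itself treats as equivalent. A Cyrillic "А" standing in for a Latin "A" survives NFKC, which is why the sketch rejects leftovers instead of trying to repair them.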

Omictools.com: a database of bioinformatics databases. There are literally thousands of bioinformatics tools and resources for some applications. Even when there are only hundreds, it's difficult to tell which resource uses which kind of data, when it was last updated, or whether it's even being maintained anymore. A resource may have been released only a year ago, but a funding-related disaster can quickly take it offline (or, even worse, leave it in an undead state, still responding to users but returning malformed results). Omictools helps to identify the useful resources.

It's increasingly bothersome that these kinds of meta-databases are even necessary. I know that most scientists are spoiled for choice when it comes to existing data sets, but when there's more -omics data out there than any human could conceivably process, we've missed the point. Why should we keep all this data around without using it?*

A 2013 paper by Duck et al. found that most (more than 70 percent, in their data set) bioinformatics resources don't get mentioned more than once in the literature. That's quite intimidating! Is it a sign that too many bioinformatics projects are like hammers searching for nails, or are we, as human researchers, just limited in how many different resources we can use at once?

*A disclaimer: I don't mean that we aren't using all that -omics data at all; I just think it's underutilized.

October 16, 2015

Tiny particles

October 16, 2015/ Harry Caufield

Music for this week has been Hammock's Oblivion Hymns.

The whole album can be found here.

It's also here (the deluxe edition, no less).

It's the ideal background for a rainy Friday or possibly for an entire civilization slowly descending into a subterranean chasm.

October 13, 2015

Distant faces in the code

October 13, 2015/ Harry Caufield

The Little Printf

With just a bit of background, you can probably tell that the material at the link above is:

Essentially a version of The Little Prince
For programmers/coders/computer people

It'll probably make sense even if you've never read The Little Prince, but it might not if you're unfamiliar with things like printf. Without spoiling too much, I can tell you that The Little Printf is about why we code, or more broadly, what we get distracted by in the process of using computers.*

I worry about this frequently. I worry about whether my work in bioinformatics will produce anything resembling real biological phenomena. I worry about whether anyone will ever read my work, and if they do, whether they'll find it useful or even usable. I worry about the time required to write and maintain code, and about when new technology (perhaps a new sequencing method, or even a well-curated data set) will render my work obsolete.

OK, I'll spoil a bit here:

“In the end though, it is only when you solve problems with a human face that you can feel truly right; What is essential is invisible to the computer.”

Computers don't worry about how relevant their work is or whether they'll become obsolete. The people using them certainly do. Keeping the "human face" behind each problem isn't easy, though, especially with bioinformatics, where that human face may be several steps removed from the coding process. My work tends to have more of a bacterial face than anything else. So, instead, it ends up being a bit more about why we study biology in the first place.

What do we hope to find by swimming in all this data? What can we learn from it? How will that learning benefit humanity? Is the learning its own benefit, or is it a distraction? Can it be both?

*I am reminded, as usual, of _why's CLOSURE, though CLOSURE is immeasurably more opaque and potentially about how coding is inherently distancing. The theme of Why We Code remains.

J. Harry Caufield

severalog
