An Essay about Scientific Articles (work in progress)
Oct. 26th, 2006 @ 07:31 pm
George Orwell, "Politics And The English Language"
(i) Never use a metaphor, simile or other figure of speech which you are used to seeing in print.
(ii) Never use a long word where a short one will do.
(iii) If it is possible to cut a word out, always cut it out.
(iv) Never use the passive where you can use the active.
(v) Never use a foreign phrase, a scientific word or a jargon word if you can think of an everyday English equivalent.
(vi) Break any of these rules sooner than say anything outright barbarous.
I would like all of you computer scientists out there to read this list, read it again, and repeat until you've internalized the idea that the point of technical writing is to convey information. A sentence like "It is envisaged that the algorithm is to be applied to real-world problems." should hurt. This sentence was from the first page of the nearest computer science paper I had on hand.
To an outside observer, it seems like the goal of publishing in computer science is to snow your audience as much as possible while sounding plausible enough that the reviewers can at least pretend to understand. Passive voice is everywhere, to make the paper seem professional. Researchers use field-specific (indeed, sometimes even *lab*-specific) jargon instead of common words; reviewers seem to prefer new jargon over accessible writing. Given that many researchers struggle to conform to strict length requirements, the pathological use of the typically less succinct passive voice is even more confusing.
Finally, there is the math. Math is wonderful stuff, not to mention frequently the meat of the paper. However, using complex mathematical notation without explanation or to convey a simple idea is at best obnoxious and at worst criminally arrogant. If your result is an ugly expression, fully define all of your variables. Don't expect a reader to track back through your adviser's c.v. to figure out your parameters.
(An aside, while we're on the subject of variables: There are 26 letters in the Roman alphabet and 24 in Greek. Taking out letters that are obviously bad variable names (in math: capital and lower case pi, capital and lower case sigma, i. You do take these out, don't you?), you are left with 52 + 24 + 8 - 6 = 78 characters to name your variables as you see fit. There is no excuse for writing a paper where B, B, and B are different unrelated variables. When I rule the world, the punishment for this will be hanging by your fingers until they fall off.)
To be continued...
"It is envisaged that the algorithm is to be applied to real-world problems."
"Hey, it looks like this could be a useful algorithm!"
:) ::reads medical papers, which are quite possibly worse::
Also: capital and lower case o and omicron, and a bunch of capital Greek letters that look exactly like their Roman counterparts.
Not that that changes your basic point, with which I agree wholeheartedly.
Date: October 27th, 2006 03:56 am (UTC)
I took out all the Greek caps that look Roman already (hence the 8 added in). o and omicron ::shudder::....
When I rule the world, the punishment for this will be hanging by your fingers until they fall off.
I could be wrong, but it seems like the punished would fall off the nails rather than the other way around.
Date: October 27th, 2006 04:11 am (UTC)
Date: October 27th, 2006 04:07 am (UTC)
Unfortunately, it is easier to complicate an easy thing than it is to actually do something complicated. Thus, people very frequently try to publish things that are pretty easy but explained in a difficult manner. Then, the law of large numbers says that at least a small percentage of these make it through, and since such a large percentage of submissions are this easy stuff...
Date: October 27th, 2006 04:09 am (UTC)
You're close to graduating, aren't you? ;-)
Yeah. One of the things that really annoyed me is that papers deriving something would show most of the fairly trivial or well-understood steps, then hide the actual important step in an 'assumption' that isn't mentioned or is only mentioned in passing.
I ran into that a number of times and got pretty fed up.
I think people like to use lab-specific jargon because they are hoping to coin a term ...
Date: November 1st, 2006 01:06 am (UTC)
That just exacerbates the problem! We have plenty of terms! If there is a ready, non-ambiguous alternative there is no reason to throw a new term into the mix.
Date: October 27th, 2006 07:14 am (UTC)
Let me just add: LABEL YOUR GODDAMN AXES. With units. Or else.
Also, as I was bitching about earlier to amoken, when one paper says that the average foo is 5, and then in your paper you say "... according to [ref], most foo are 5," you are committing (at least) a venial sin.
Date: October 27th, 2006 07:11 pm (UTC)
Units? What are you trying to do, get me to fall in love with a paper? Next, you're going to suggest that they use real data instead of simulations.
Date: October 27th, 2006 11:31 pm (UTC)
sometime how he feels about "results" taken from simulation. You might want to be outside of the range at which he can conveniently throw things at you.
I hear you loud and clear, but I doubt the problem is exclusive to computer science. It's prevalent enough in physics.
Date: October 27th, 2006 07:10 pm (UTC)
I started by writing "science", but I felt that I haven't read papers in enough disciplines to make so broad a claim. That, and molecular biologists actually write pretty well :-P.
Date: October 28th, 2006 10:16 pm (UTC)
What is the purpose of scientific writing? Is it to advance human knowledge? Or is it to show off a big vocabulary, an ability to govern complicated sentence structures, and generally demonstrate one's brilliance? If I'm eventually going to become a research scientist, clarity is not my top priority.
Date: November 1st, 2006 01:07 am (UTC)
Alright, who's baiting me?
Date: November 1st, 2006 04:51 am (UTC)
I'm not trying to bait you. I am agreeing with your statement "To an outside observer, it seems like the goal of publishing in computer science is to snow your audience as much as possible..." It's not just scientific articles, of course. Dressing up trivial and false ideas with fancy language is a useful skill, a skill that will serve these people well as they enter industry and sell software snake-oil to semi-technical decision makers.
Date: October 31st, 2006 12:21 am (UTC)
I have a mild issue with (v)...mostly that we could reduce almost everything to "everyday" words, but then we'd have to spend a lot longer explaining due to contexts, ambiguities, etc. If there's a convenient one- or two-word substitution with acceptably low ambiguity, I'm all for it, but frequently the jargon came about because there wasn't. And, of course, we all see different divisions and tolerances here, which cause us to argue until we're blue in the face about whether something is "equivalent". In other words, "equivalent" is a poor guideline, though it does ask the right question.
Date: November 1st, 2006 01:09 am (UTC)
Fair enough. Maybe a 30 second rule. If it takes your intended audience less than 30 seconds to disambiguate the sentence, leave it as is. (And with that, if it takes them on average more than 30 seconds to look up your arcane language, ditch it.)
Date: November 3rd, 2006 04:52 pm (UTC)
achieving clarity through overlay of narrative arc
"It's because these articles aren't written with conflict and drama. If they had character development and suspense, you'd remember