(a) Getting started.
Many years ago I worked with a team of students who claimed to have little experience with writing and were quite perplexed about how to write a research paper. But they were good at coding. So I asked them if they were comfortable writing in C++ and they said "of course". So I told them we were going to write something new, but instead of writing in C++ we were going to write in "English". How is this helpful? Many STEM students claim to be uncomfortable writers, but they understand how to learn good style and formatting for programming by reading other people's work and borrowing structure from others' work. This same idea can be very helpful for the novice paper-writer.
If you are a novice science writer, find examples you consider to be good papers in the journal of interest. Follow the format of those papers. You will not be copying the paper but you will be paying attention to the different sections, how long they are, how much detail is used for references, methods, equations, proofs, figures, etc. Try to present your own work in a similar style.
(b) Content style. You will probably need several iterations of the content until it reads really well. Avoid long run-on sentences. Avoid using the same word or phrase multiple times in the same sentence or in adjacent sentences. Let the content be your friend in this matter. Simple sentence structure - subject-verb-object - is your friend in the first or second iteration of the content. More complex sentence structure can be worked out later when you are ready to have some nuance in the writing.
(c) Tense. Please make sure your tense is consistent throughout the paper. The simplest thing to do is to use present tense everywhere. You do not need to use past tense when referring to a previous section of the same paper. Work done 20 years ago can have past tense. When proof-reading your paper for tense consistency, you can check for the word "will" and for verb conjugations that end in "ed". Note that these two checks will not find all of the future and past tense. For example, some common verbs such as "to do" have a past tense (e.g. did) that does not end in "ed". So read carefully when proof-reading for tense consistency.
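The two checks described above can be roughly automated. The following is a minimal sketch, not a grammar checker: the function name flag_tense and the sample text are made up for illustration, and the patterns will also flag harmless words such as "need" while missing irregular forms such as "did".

```python
import re

# Heuristic patterns from the text: the word "will" and words ending in "ed".
# These are assumptions for a quick scan, not a reliable tense detector.
FUTURE = re.compile(r"\bwill\b", re.IGNORECASE)
PAST = re.compile(r"\b[A-Za-z]{2,}ed\b", re.IGNORECASE)


def flag_tense(text):
    """Return (line_number, word) pairs for words worth a second look."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for pattern in (FUTURE, PAST):
            for match in pattern.finditer(line):
                hits.append((number, match.group(0)))
    return hits


# Hypothetical sample text; "did" is deliberately included to show a miss.
sample = "We will show the bound.\nWe proved the lemma and did the experiment."
for number, word in flag_tense(sample):
    print(f"line {number}: {word}")
```

The scan only tells you where to look. Each flagged word still has to be read in context, and the rest of the text still has to be read for the irregular verbs the scan cannot see.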
(d) Articles. Non-native English speakers especially - watch your articles and related words, e.g. the, a, those, these, etc. The difference between "a" and "the" can change the connotation of a sentence, thereby altering its intended meaning.
(e) Learn to understand the difference between denotation and connotation. Denotation is a precise statement. Connotation is the use of language to imply something without saying it directly. English is full of both, and both can be used effectively in scientific writing, but they should not be interchanged because they have very different effects. Misusing them can lead to incorrect statements in your paper.
(f) Language models. In the last few years we have suddenly seen the use of language models to help write papers. Be very careful if you want to use this technology! First of all, you have to acknowledge it if you use it. This is imperative. Also do not assume that ChatGPT or similar is going to produce a better style than careful writing the old-fashioned way! Chances are it won't. I have a recent paper with a student, and after the first draft was done with an LLM, I had to rewrite all the text to put it in better style than the LLM did. Also note that these tools hallucinate, so you could end up with false information in your paper if you rely too heavily on automated methods like LLMs.
On the other hand, for a research paper, the background should have a different sort of content. This is not a pedagogical paper. However, you want it to address a certain readership. So it is important to put yourself in the shoes of the broader readership of the journal and ask yourself - what do you need to tell them in order for them to want to keep on reading your paper? Too little info and they will be lost. Too much and they will fall asleep because they won't have time to get to the new original work. Referees are known to dock authors who spend too much time on background info - because they are concerned that the work is just not original enough.
(b) Rigorous proofs - how much detail? How to format? You want the ideas to flow, and sometimes this can be hard when you have some technical lemmas or propositions needed to prove a major theorem. Here you can use appendices to put in details of proofs that are needed but not central to the new argument. A proposition or lemma that is a twist on something that is already in the literature can often go in an appendix, and you can refer to the proof as appearing later in the paper. The details of the main theorem you want to keep in the body of the work, especially if this is a central part of the original research.
When writing a rigorous proof for a journal you want to avoid repeating ad nauseam details that are consistently the same idea. You can punt a bit here by saying "following the same argument as in ... we can also derive this formula ...". At the same time, do not make the mistake of assuming that the reader has the same expert level that you, who did the original work, have. If it is an exercise for a reasonably good graduate student trained in X, some details could be removed in an advanced math paper. If your audience is scientists rather than mathematicians, be careful not to snow the scientists too much - some degree of hand-holding on the math may be warranted in favor of readership for a science paper.
(a) Punctuate your displayed equations correctly!!! Yes in fact your displayed equations are actually complete sentences. E.g.
Ax = b
has the subject Ax, the verb equals (=), and the object b. This means you put a period at the end, in the display, if it is the end of the sentence. Sometimes you want a comma rather than a period and sometimes you want no punctuation. Go through and check all displayed equations for correct punctuation.
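The three cases mentioned above (period, comma, no punctuation) might look as follows in LaTeX source. This is only an illustrative sketch; the surrounding sentences and the residual r are made up for the example, and the same rule applies to whatever display environment your journal's style uses.

```latex
% A displayed equation that ends its sentence takes a period.
The linear system is
\begin{equation}
  Ax = b.
\end{equation}

% A display followed by more of the same sentence takes a comma.
If
\begin{equation}
  Ax = b,
\end{equation}
then $x = A^{-1} b$ whenever $A$ is invertible.

% A display that flows directly into the next words takes no punctuation.
The residual
\begin{equation}
  r = b - Ax
\end{equation}
vanishes at the exact solution.
```

A quick test is to read the sentence aloud with the equation in it and punctuate the display exactly as you would punctuate those words.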
(b) Section and chapter headings. Make sure that the capitalization of these things is consistent across the different section types and consistent across the headings. If you have a table of contents (e.g. in a dissertation) you can easily scan this on one page, but otherwise you have to go and check it by hand.
(c) The use of natural numbers in the text. As in: there are two pets living in my friend's house but my in-laws have 43 birds. Note that the number two is written out and the number 43 is in Arabic form. The old-fashioned rule is that natural numbers with one digit are written out in English and those with two or more digits are written in Arabic format. Also, to the best of my knowledge, my in-laws do NOT have 43 birds; however, that example is a simple one to state and it conveys proper use of the number.
(d) For compound sentences, learn the difference between phrases and clauses and how to punctuate them correctly. Learn when too much compounding makes a run-on sentence that requires altering. These details are too complex to describe here but should be learned by students and used properly in writing. Generally speaking, if your reader has to read the sentence more than once to make sense of it and digest its content in relation to other information in the same paragraph, the sentence needs restructuring.
(e) That vs. Which. These two simple words are commonly misused in scientific writing. The general rule is that "that" refers to specific things and "which" refers to a qualification that is true whether you state it or not. For example - I like to swim in oceans that are warmer than 70 degrees. Note that there is additional content here specified after the word "that". Alternatively we could say - the 2024 Jeep Cherokee, which comes in the colors red, blue and black, is selling very well this year. The information after "which" does provide extra content, but the content is not needed to impart the new information conveyed by the sentence. The reader could go and find out the colors being offered using a different source of information. One of my previous English teachers said that you should use the word "which" if you can insert ", by the way," after "which" and the meaning is still the same. So in the above example we would say "The 2024 Jeep Cherokee, which, by the way, comes in the colors of red, blue and black, is selling very well this year." [Note that the content of this sentence is fiction, made up for the purpose of demonstrating the use of "which" and should not be interpreted as information related to your upcoming car purchase]. The same use of "by the way" does not make sense in the sentence about swimming in warmer oceans.
(f) Possessive. 's is used for the possessive, except when the word doing the possessing is a plural ending in s, in which case the apostrophe goes after the s. The cat's fur is dirty - refers to one cat. The cats' fur is dirty - refers to multiple cats, and presumably the context makes clear which cats you are speaking about. You will go far just adhering to this rule.
(g) Contractions. I won't write any more because you will not stop pestering me about writing. Here the correct use is "will not" for formal writing. In an informal email to someone, go ahead and use contractions. But for formal writing try to avoid the contractions. Note that contractions are not possessives. They are different things. Also note that "it's" is a contraction and means "it is" in certain contexts. Such as - It's going to be a lovely day today. Here you should write: It is going to be a lovely day today - in formal writing.
(h) The gerund. One way to avoid run-on sentences is careful use of the gerund. This is the form of speech with the "ing" at the end. As in - My cousin Bob, while walking to school, found a four-leaf clover. This is a better way than saying - My cousin Bob walked to school. On the way he found a four-leaf clover. In scientific writing, the first way results in several ideas in one sentence that is easy to read. It does not require reading it several times to understand it and it allows the reader to move on to the next sentence with new ideas. Be careful about overuse of the gerund. My personal experience is that my students tend to under-use the gerund in favor of more and longer sentences.
(i) Short sentences. Never underestimate the power of a short sentence. If you have a strong message that you want to convey with a punch - put it in a short sentence. It stands out. At the same time, if you happen to write something in a very short sentence, it WILL stand out whether you want it to or not. So think carefully about sentence length in terms of conveying ideas. In formal writing, sentence length plays a role similar to voice inflection in spoken words.
(j) Spell check. If you are using a tool such as Overleaf to write your LaTeX paper (or are writing in MS Word or similar), the software often comes with an automated spelling checker. Please review all of the text or pre-compiled code to make sure you do not have spelling mistakes. If you are using LaTeX native to your computer, then you have to spell check the source code directly. On Linux machines there is a terminal command called ispell that you can run. You type ispell file.tex to check the spelling of the words in the tex file file.tex. It will come back and show you each questionable word with a list of keyboard prompts asking what to do with the word. Just follow the menu items as it scrolls through your document. You can also use this to edit the source code for html documents on a web page (I just used it to check this web page).
(b) Font in figures. The font for tick labels, tick marks, and axis labels should appear stupidly HUGE when you are editing the figure on the screen. This is because it will shrink down to a smaller figure in the publication. The version that shows up in the paper should have the font no smaller than the font that shows up in the figure caption. Do not assume people are going to be able to read tiny font unless you only want people younger than say 39 to read your paper.
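To see why the on-screen font has to look so large, it helps to do the arithmetic once. The sketch below assumes the figure is scaled uniformly from the width you edit it at down to the width it is printed at; the function name and the numbers are hypothetical, so substitute your journal's column width and caption size.

```python
def on_screen_font_size(caption_pt, screen_width_in, printed_width_in):
    """Font size to set while editing so that, after the figure is scaled
    down to printed_width_in, its text is as large as the caption text."""
    # Fraction of the original size that survives, e.g. 0.4 means 40%.
    shrink = printed_width_in / screen_width_in
    return caption_pt / shrink


# Hypothetical numbers: a 10 pt caption, a figure edited at 8 in wide,
# and printed in a 3.2 in column.
size = on_screen_font_size(10, 8.0, 3.2)
print(f"Set tick and axis labels to at least {size:.0f} pt while editing.")
```

With these made-up numbers the figure shrinks to 40% of its edited size, so the labels need to be about two and a half times the caption size on screen.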
(c) Line width. Use a wide enough line width so that the lines are easy to read when the figure is shrunk down in the actual paper. You may need to make them wider in the original figure.
(d) Save your figure data and figure formatting! If you are working on figures, save everything. The data AND code to generate the figure. If you don't know what I am talking about as far as the latter is concerned - check your software for figure generation. It should allow you to save a formatting file along with the data - this file has information like the tick label sizes, font sizes etc. It will allow you to make small changes to the figures quickly and easily as you edit the paper. And save the revised version again!!! Everyone working on figures for papers should learn how to save this data before doing any substantial work on figures. This will save you a lot of time later on and you will be grateful you did it like this.
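If your plotting software does not offer a formatting file, you can keep one yourself. The following is a minimal sketch using only the Python standard library: it stores the plotted numbers as CSV and the formatting choices as JSON next to each other. The function names, file naming scheme, and style keys are all assumptions made for illustration; your plotting script would read these files back and apply them.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_figure_inputs(folder, name, rows, header, style):
    """Write the plotted data (CSV) and the formatting choices (JSON)
    side by side, so the figure can be regenerated or restyled later."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    data_path = folder / f"{name}_data.csv"
    style_path = folder / f"{name}_style.json"
    with data_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    style_path.write_text(json.dumps(style, indent=2))
    return data_path, style_path


def load_figure_inputs(data_path, style_path):
    """Read back what save_figure_inputs wrote (CSV values are strings)."""
    with Path(data_path).open(newline="") as handle:
        header, *rows = list(csv.reader(handle))
    style = json.loads(Path(style_path).read_text())
    return header, rows, style


# Hypothetical figure: two data points and three formatting choices,
# written to a temporary folder so the demo leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    style = {"tick_label_pt": 24, "axis_label_pt": 28, "line_width_pt": 3}
    paths = save_figure_inputs(tmp, "fig1", [(0.0, 1.0), (0.5, 0.8)],
                               ["t", "error"], style)
    print(load_figure_inputs(*paths))
```

Keeping the style in its own file means a referee's request for larger tick labels becomes a one-line change followed by a re-run, rather than a rebuild of the figure.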
(e) Captioning. The figure should stand alone. This means that the content of the figure should be self-explanatory in the caption. Readers should be able to understand the figure and its purpose without having to read the text. So tell them what the data is in the figure caption. In WORDS. If it refers to a specific equation in the text, cite that equation. Generally you should try to avoid interpretation of the figure in the caption - use the text for that. Keep the figure caption to basic facts about that particular figure. Comparatives and interpretation can be put in the text.
(f) Reproducibility of your data. Respect the gold standard for science, which is reproducibility. So put in enough detail so that someone could read the paper and go and reproduce all your data from scratch. If you generate the data from a computer code - consider putting the code and the output in a public place like GitHub or similar. Some journals now allow you to have auxiliary information included in the paper, and the code and data could be such a thing. This does require extra work but it will result in more robust research that is much more likely to be cited, especially if people are downloading your code and using it.
(a) Is this reference the arXiv version? Then you are not done! You have to go and hunt down, to the best of your ability, a final published version. This means check the website of the authors. Do a Google search on the title (not Google Scholar but Google itself - because Scholar is not updating in real time). Look in several places and look at the second and third page of the Google search, not just the first page. Once you have exhausted all things you can think of - and do not be lazy about this - then you can go ahead and use the arXiv version. But not before that time. A careful author will list the final published version on the arXiv, but many people do not bother to do this, so you have to do your own hunting.
(b) Read every single title for proper names. Like Dirichlet. Gaussian. Laplace. Fourier. Los Angeles. You get the idea. Also acronyms that should be capitalized. Note that BibTeX information downloaded from journal websites is essentially a copy and paste from a general bibliographic database. This means that the titles of articles will not necessarily be formatted correctly for BibTeX. Thus you have to HAND EDIT all the titles of papers to guarantee that the capitalization comes out correctly. Put a {} around the proper name in the title field. So, for example, a paper titled "Machine learning for street data in Los Angeles County" needs to have a BibTeX entry like this:
title={Machine learning for street data in {Los Angeles County}},
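A small script can point you to the titles that need this hand edit, though it cannot decide for you which words are proper names. The sketch below is an assumption-laden helper, not part of BibTeX: the function name is made up, and it expects the text of the title field with the outermost braces already removed. It lists every word after the first that contains a capital letter outside of {}.

```python
def unprotected_capitals(title):
    """Return words (after the first) that contain a capital letter
    outside of braces in the text of a BibTeX title field."""
    depth = 0
    masked = []
    for char in title:
        if char == "{":
            depth += 1
        elif char == "}":
            depth = max(depth - 1, 0)
        elif depth > 0:
            # Hide protected characters but keep word boundaries intact.
            masked.append(" " if char.isspace() else "#")
        else:
            masked.append(char)
    tokens = "".join(masked).split()
    flagged = []
    for token in tokens[1:]:  # the first word keeps its capital anyway
        word = token.strip(".,:;!?()\"'")
        if any(c.isupper() for c in word):
            flagged.append(word)
    return flagged


# The example title from the text, before and after adding the braces.
print(unprotected_capitals("Machine learning for street data in Los Angeles County"))
print(unprotected_capitals("Machine learning for street data in {Los Angeles County}"))
```

An empty list means every capital after the first word is protected. It does not catch a proper name that the database has already lowercased, so you still have to read each title.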
(c) Check every published paper listed in your references to make sure that you have the title, volume, year, pages etc. and that they are printing correctly. By "Check" I mean that you should read the compiled PDF file and check the references printed there line-by-line, looking at each line carefully for missing or incorrect information. Do not assume that if the information is in the BibTeX file, it will compile correctly in the printed version of the manuscript. A simple typo in the BibTeX file could lead to major changes in the output, so you have to read the compiled version WITH YOUR EYES and check it to make sure that the ref is complete. Common mistakes are things like using the incorrect reference type in the BibTeX file - e.g. using preprint instead of article format, once the article is published. You need to change it to article if you want all the fields to compile correctly. Also you need to use the correct syntax for the author names in order for it to compile correctly. This is why you have to read the PDF version carefully to check for details.
(d) Check for duplicate refs. This is easy if the refs are alphabetical. If not then compile a version with alpha refs to check them for duplication. If you need sequential referencing then recompile again with the other formatting.
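Duplicates often hide behind small differences in capitalization, braces, or punctuation, for example when both a preprint and the published version are in the file. As a rough aid alongside the alphabetical read-through, the sketch below groups citation keys whose titles match after normalization. The function names and the sample keys are hypothetical, and you would need to supply the (key, title) pairs from your own file.

```python
import re
from collections import defaultdict


def normalize(title):
    """Lowercase a title and drop braces, punctuation, and extra spaces."""
    lowered = title.lower().replace("-", " ")
    letters = re.sub(r"[^a-z0-9 ]", "", lowered)
    return " ".join(letters.split())


def find_duplicates(entries):
    """entries: iterable of (citation_key, title) pairs.
    Returns lists of keys whose normalized titles are identical."""
    groups = defaultdict(list)
    for key, title in entries:
        groups[normalize(title)].append(key)
    return [keys for keys in groups.values() if len(keys) > 1]


# Hypothetical entries: the first two are the same paper entered twice.
entries = [
    ("smith2020", "Machine learning for street data in {Los Angeles County}"),
    ("smith2020arxiv", "Machine Learning for Street Data in Los Angeles County."),
    ("lee2019", "A different paper"),
]
print(find_duplicates(entries))
```

Matching on titles alone will miss a duplicate whose title changed between preprint and publication, so this supplements the check by eye rather than replacing it.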
(e) Double check whenever you edit a reference. This means that after changing a reference - you have to recompile the paper and go and read the reference section to make sure that the material you added or changed appears correctly. If you do not do this - you run the risk of a small typo suddenly cutting out half of your references!
(f) And finally, last but certainly not least, is the content. Be fairly generous with your citations if you are allowed to include many works. Enemies are made by not citing other people's work. So avoid making enemies. Why start this? What is the point? Also references are used by students and others reading your paper - you are doing them a service by putting in good references. References are also looked at by journal editors when trying to identify appropriate referees. At the same time be reasonable. You are not required to go back two hundred years with prior science that led up to your work. The most recent and most relevant work may be fine for citations. It really depends on what context you want to provide with your paper. These days some authors subscribe to automated email about papers that cite their work. So if you want people in the field to read your paper, one way to accomplish this is to cite their work, and they might be prompted by Google Scholar or similar to look at your preprint or reprint online. But do not do this gratuitously. Use citations judiciously when they coincide with content.
If you have read this tutorial in its entirety, you may feel that this list is too fussy or is going to create a lot of work for you beyond what you expected in writing your first paper. You would be correct. But also this is not a complete list. If you have a research mentor - please check with them about any additional items they suggest that you check. I am also not including scientific details related to content - for example checking the mathematical details in your proof or checking for bugs in your computer code. These are not writing details; they are details in the research and should also be checked. Another takeaway, after reading this, is that good writing takes effort. If you are writing your very first research paper, you should expect that the amount of time you take on the writing, compared to the research and discovery, is quite long. The fun part of the work may be only 10-20% of the total time. At the same time, if you do not write up this work and publish it in the peer-reviewed literature, the work may be forgotten and it is as if you never did the research. So consider the writing part of the work to be integral to the entire job. Also good writing is a craft unto itself. If you can master it, you will be proud of your work and you will be able to teach the next generation of researchers how to do it properly.