Since the goal is a research paper, your first step is to create a skeleton of it in LaTeX. If you don’t know LaTeX yet, read LaTeX: A Document Preparation System by Leslie Lamport (just 242 pages). If you think you already know LaTeX, read this short list of its best practices and Writing for Computer Science by Justin Zobel (just 284 pages).
Now, create a document in Overleaf and share a link with me so that I can also edit the project. Make your skeleton look like this (you should also create an empty main.bib file):
\documentclass[sigplan,nonacm,anonymous,review]{acmart}
\usepackage[utf8]{inputenc}
\usepackage{natbib}
\title{My article}
\author{John Doe}
\orcid{0000-0001-0000-0000}
\email{your email}
\affiliation{\institution{University}\city{City}\country{Country}}
\begin{document}
\begin{abstract}
This paper is about something new.
\end{abstract}
\maketitle
\section{Introduction}
Hello, world!
\bibliographystyle{ACM-Reference-Format}
\bibliography{main}
\end{document}
Now, you are ready to begin your research incrementally, and I will review each step in the following order:
Each step produces a few new paragraphs in the LaTeX document. In this blog post, you can find recommendations for each of the steps. I strongly advise against moving on to the next step unless the previous one is discussed and approved. Doing so may result in greater frustration on your part when you’ve written almost the entire paper, and we both realize that the whole piece must be rewritten, and experiments must be redone.
Before we start, please put a date on each of the steps mentioned above and send me the entire work plan. It’s better to meet every milestone as a disciplined student; otherwise, the risk of failure will be greater.
I believe that you, the reader of this blog post, are an honest and motivated student who not only cares about achieving a passing grade but also values contributing to computer science. However, not every student fits this description. Surprisingly, some may lack motivation or diligence. To prioritize the enthusiastic and dedicated students who require most of my attention, I may halt a research project when I discern a lack of genuine commitment. The use of ChatGPT, plagiarism, and negligence may lead to an unfavorable assessment of your work. I strongly advise avoiding them.
Should students be allowed to use ChatGPT when they write their coursework, diplomas, and research papers? Nature, The Wall Street Journal, The New York Times, and MIT Technology Review believe that despite all the risks, we have no other choice: students will use it, no matter what teachers think about it.
Indeed, why not? What’s wrong with letting kids write those boring documents faster? Nothing, if we ignore the obvious threat: most of them will never read what the robot wrote. They simply prompt a very short description of the task and get back a full-blown piece of text with all the necessary bells and whistles. Moreover, with the next prompt, the text can be made even more academic, sophisticated, smart, and deep. The text, not the student.
But it’s not the threat I worry about. I’m much more concerned about the quality of feedback teachers will provide to students equipped with ChatGPT or a similar paper-writing robot. My relatively short experience in teaching (just three years) tells me that the biggest challenge in teaching is quickly dividing students into smart+enthusiastic (20%) and unmotivated (80%), before the latter category entirely exhausts me, and I classify all students as “pointless waste of time” and give everybody an “A” just to get rid of them.
When students write papers by themselves, without the help of generative AI, they make mistakes that are easy to spot: the grammar is wrong, the structure is messy, the logic of the discussion is weak, and so on. Lazy and/or stupid students reveal themselves in the first round of paper review. I can quickly understand who I’m dealing with and stop paying attention to them. The students who are smart and enthusiastic win, because they get my entire attention. The unmotivated ones lose, … but who cares.
However, with the help of ChatGPT, the situation changes dramatically. Now, the papers I have to review all look perfect: the grammar is spotless, the structure is solid, and the flow of thoughts is logical. In other words, the unmotivated students now look like smart and enthusiastic ones, while they are not. Now, it takes much more time for me to understand who is who. Sometimes I can’t figure it out for weeks, especially if the teaching is remote and I don’t see students but only communicate with them in chats or conference calls.
I keep wasting my time on students who don’t care. All they need from me is a passing grade, but ChatGPT makes them look like promising talents who I should invest my time in. In the end, the students who really need my time don’t get it, … thanks to ChatGPT.
Thus, I see ChatGPT as a big threat to the education process, and I believe that very soon, tools that detect the presence of generative AI in texts will become powerful enough to defend me from it.
Structure your review as plain text in five paragraphs, each answering one question:
First, provide a brief summary of the paper. The main purpose of this paragraph is to ensure that you, the reviewer, have actually read the paper and understood what it’s about. Such summarization helps build rapport between you and the readers of your review—the authors of the paper, whom you intend to criticize. The better your summary, the more they will respect your negative points, taking them constructively.
Then, identify the positive points of the paper, again demonstrating that you have read and appreciated it. Here is a cheat sheet of the most typical merits a good research paper may have (most important at the top):
Next, highlight the major inconsistencies. This list of typical mistakes may help you (the most severe ones are at the top):
Then, mention minor mistakes. The difference between minor and major problems is that a minor problem is not a “show stopper”: a paper with minor mistakes but without major ones may be accepted for publication, while the opposite is not true. A paper with major issues must be rejected with a suggestion for rework by the authors. Here is a list of the most typical minor issues:
Finally, conclude your review: what should be done next? Do you suggest publishing the paper? Do you think the authors are moving in the right direction? Should they continue working on this topic, or would it be better to abandon it for something more meaningful? Be honest and sincere; don’t be afraid of offending them: the review is anonymous anyway.
Obviously, I’m joking. It’s easy to offend an author, especially a young one. Thus, as a good reviewer, you must understand your mission: the review you provide should help the authors by encouraging and educating them. Making them feel miserable is definitely not the purpose of the review, though it is sometimes an unfortunate side effect. Try to minimize it.
Here is a toy example:
In their research, the authors claim that all programmers
are lazy and selfish creatures, grounding their conclusion
on a survey of 150 respondents.
Pros:
- An important topic was addressed.
- The reasoning is clear and concise.
- The conclusion is very true.
Major cons:
- Similar research done by Dean [1] is overlooked.
- It's obvious that they are lazy; why another study?
- Only Java programmers were interviewed.
Minor cons:
- Typos and broken English here and there.
- The font in most figures is too small.
Even though the subject of the research is important,
I believe this paper is not yet ready for publication
and requires significant rework.
[1] Dean et al., Programmers Are Super Lazy, 2022
In the Method section, you’ve already explained how you collected, processed, and analyzed the data. Now, in the Results section, you present the actual data collected and generated. The simpler the method of data representation, the better. Thus, in order of preference (the last being your last resort):
1. \begin{itemize}: a plain list;
2. \begin{tabular}: a table;
3. \begin{figure}: a chart or diagram;
4. If the data is too extensive to show in the paper, you can store it in a GitHub repository and mention its address in the Results section. For example:
\section{Results}
We contacted 135 programmers from three
software companies: ACME Inc, Google, and
Amazon. We asked them kindly to answer
a short questionnaire of just 128 questions.
115 people refused, which is 85\%.
The full list of those who refused, along with
their names and home addresses,
is published in a GitHub repository\footnote{
\url{https://github.com/...}}.
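For comparison, when the data is small enough for a table (the \begin{tabular} option above), it might be sketched like this; the per-company split of the numbers is made up for illustration:

```latex
\section{Results}
\begin{table}
  \caption{Participation in the survey}
  \begin{tabular}{lrr}
    Company & Contacted & Refused \\
    \hline
    ACME Inc & 35 & 30 \\
    Google & 50 & 43 \\
    Amazon & 50 & 42 \\
  \end{tabular}
\end{table}
```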
In the Method section, you posed several Research Questions. Now, in the Discussion section, you answer them using the data you’ve just presented in the Results. This is the time for an opinionated interpretation of the data: be brave and direct, yet careful.
When you’ve answered the Research Questions, you initiate a debate with your readers, imagining them asking difficult and important questions. The answers you provide are your speculation, imagination, improvisation, etc. Also, through the Q&A format, you acknowledge the limitations of your research and suggest potential future research topics.
Consider these questions (re-phrase them for your own context):
I suggest dedicating exactly one paragraph per question, starting with a bold-faced formulation of it, followed by your answer to your imagined opponent. Here’s an example:
\section{Discussion}
\textbf{RQ1: How many programmers are lazy?}
Since 85\% of our respondents refused to complete
our short questionnaire, we strongly believe
that most programmers are lazy.
\textbf{RQ2: Why are programmers lazy?}
Since the majority of programmers refused to complete
the 128-question questionnaire, we believe
they become lazy when confronted with a number
that is a power of two.
\textbf{Is it possible that programmers are
just busy?} Yes, it's possible, but highly
unlikely, as \citet{x2019} previously found
that programmers spend 90\% of their office time
reading jokes on the internet.
The more you overlook in the Discussion section, the greater the chance of your paper being rejected. Reviewers are often knowledgeable individuals with many years of experience in the field; they will certainly have concerns about your Method, Results, and answers to the Research Questions. If you don’t address these concerns explicitly in the Discussion section, they may think you are either concealing the research’s weaknesses or are not astute enough to recognize them. In either case, it could lead to a rejection of your paper.
You may find inspiration in these papers (use Google Scholar to download their PDFs):
These opinions might also be helpful:
Typically, to understand people’s thoughts and feelings, we might ask directly, “What do you think and feel?” This is akin to a doctor inquiring, “What is your disease? What kind of pill should I prescribe?” While straightforward, this method suits a doctor less concerned with patient recovery.
Asking directly also exposes the survey’s intent. Savvy respondents may realize our research goals, potentially skewing results by conforming or sabotaging the study. Some might claim they enjoy their work environment, while others might express dissatisfaction. However, few will be entirely candid, feeling more like researchers than participants.
Here’s an example of an ineffective survey structure:
Q1: Is your work environment comfortable?
- Agree
- Neutral
- Disagree
Q2: Do you feel tired at the end of an office day?
- Agree
- Neutral
- Disagree
Q3: Do you enjoy working in the office?
- Agree
- Neutral
- Disagree
A skilled doctor, rather than directly asking about diseases, inquires about symptoms: “How often do you urinate?” or “Are you thirsty upon waking?” Similarly, in empirical computer science studies, we can engage respondents with hypothetical scenarios.
By asking respondents directly, we inadvertently shift our research responsibilities onto them. Our role is to determine if they enjoy their office space. We should observe their behavior, symptoms, and reactions to draw conclusions. Simply asking, “Do you feel comfortable?” suggests a lazy or inexperienced interviewer. Responding to such a generic question, respondents will have to put together their entire experience of being in the office, analyze it, make some conclusions, and then summarize them for us—this is the work we researchers have to do, not our interviewees.
Consider this revised questionnaire:
Q4: With a looming strict deadline, where would you
prefer to work on a critical software module?
- At home
- In a café
- In the office
Q5: When did you last feel exhausted at the end of
an office day?
- A few days ago
- A few weeks ago
- Don't remember
Q6: How would you rate the office coffee
machine's quality?
- Excellent
- It's OK
- Poor
The first two questions, Q4 and Q5, are situational, placing respondents in specific scenarios. We then interpret their reactions to deduce answers to our primary question: Do they like their work environments? This interpretation method should be clarified in the research paper.
Question Q6, while not situational, is superior to Q1-Q3. It avoids asking respondents to self-diagnose, subtly probing their opinions about office coffee machines. The responses indirectly indicate their overall satisfaction with the work environment.
In summary, avoid directly inquiring about illnesses; instead, ask about symptoms to discreetly pursue your research objective. This approach elicits more honest responses.
When the list of questions is ready, you can draw a table in your research paper, listing all questions on the vertical axis and possible answers on the horizontal one. Under each answer, you mention the impact it has on one of your research questions, for example:
Question | A1 | A2 | A3
---------|----|----|----
Q4: With a looming strict deadline, where would you prefer to work on a critical software module? | At home (RQ1) | In a café | In the office (RQ1)
Q5: When did you last feel exhausted at the end of an office day? | A few days ago (RQ1) | A few weeks ago | Don’t remember (RQ1)
Q6: How would you rate the office coffee machine’s quality? | Excellent (RQ1) | It’s OK | Poor (RQ1)
This table clearly explains to the readers of your research why you asked these questions and how the responses helped you answer your research questions.
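In the LaTeX source, such a table might be sketched as follows (a minimal version; column widths and styling are up to you):

```latex
\begin{table}
  \caption{Mapping of answers to research questions}
  \begin{tabular}{llll}
    Question & A1 & A2 & A3 \\
    \hline
    Q4 & At home (RQ1) & In a caf\'{e} & In the office (RQ1) \\
    Q5 & A few days ago (RQ1) & A few weeks ago & Don't remember (RQ1) \\
    Q6 & Excellent (RQ1) & It's OK & Poor (RQ1) \\
  \end{tabular}
\end{table}
```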
In HTML, we separate paragraphs with the <p> tag, while in LaTeX, we use \par or just an empty line between them. However, some people insert what are called “soft line breaks” inside paragraphs. This is a bad practice that I suggest you stay away from.
This is how a paragraph should look in HTML (no soft breaks, just <p> and </p>):
<p>Tyler gets me a job as a waiter, after
that Tyler's pushing a gun in my mouth and
saying, the first step to eternal life is you
have to die. For a long time though, Tyler
and I were best friends. People are always
asking, did I know about Tyler Durden.</p>
This is how it would look with soft breaks (<br/>) after each sentence:
<p>Tyler gets me a job as a waiter, after
that Tyler's pushing a gun in my mouth and
saying, the first step to eternal life is you
have to die.<br/>
For a long time though, Tyler and I were best
friends.<br/>
People are always asking, did I know about
Tyler Durden.</p>
Don’t do this.
Let the software format paragraphs for you, deciding where lines must break. By injecting line breaks into the body of a paragraph, you express distrust in the document-formatting software, be it an HTML browser, a LaTeX compiler, or a Javadoc generator. In the end, it looks ugly, because we are much worse designers than the creators of LaTeX or browsers.
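The same applies to LaTeX sources, where the \\ command is the equivalent of <br/>. A sketch of the right and wrong ways:

```latex
% Good: one paragraph; newlines in the source are
% harmless, and the compiler decides where the
% printed lines break.
Tyler gets me a job as a waiter, after that
Tyler's pushing a gun in my mouth and saying,
the first step to eternal life is you have to die.

% Bad: a forced break after each sentence.
Tyler gets me a job as a waiter, after that
Tyler's pushing a gun in my mouth and saying,
the first step to eternal life is you have to die.\\
For a long time though, Tyler and I were best friends.\\
People are always asking, did I know about Tyler Durden.
```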
The Method section is the essence of the research. Think of it as a recipe: you tell the reader what ingredients you used, how you mixed them, and—most importantly—why.
You start the section with a paragraph where you state the main objective of the research, then break it down into a few research questions.
Then, you explain the procedures of the method (strictly one procedure per paragraph). In each step, you either collected, combined, or generated data. First, you explain what you did. Second, you highlight how your procedure contributed to one of the research questions. Third, you justify your actions by providing strong enough reasons for why you performed these specific manipulations with the data.
Here is a toy sample of the Method section:
\section{Method}
The goal of this study is to understand whether
cats love fruits. This leads to the following
research questions:
\begin{description}
\item[RQ1] What is the correlation between the color
of a cat's fur and its passion for fruits?
\item[RQ2] Which fruits are preferred by cats:
bananas, apples, or maracujas?
\end{description}
First, we found 15 cats: 2 white, 3 black,
and 10 of mixed color. It is important for RQ1
that they are of different colors. We believe
that 15 is enough because this is a toy research.
Second, we excluded 5 cats: those who were
younger than one year old or older than 8 years
old. This was motivated by RQ2; we believe that
young and old cats may have difficulty cracking
the hard cover of a maracuja.
Third, we gave our cats all three fruits mentioned
in RQ2, left them for an hour, and observed their
behavior. We believe one hour is enough for a hungry
cat to make a decision.
All cat owners agreed to have their cats
participate in the study.
At the end of the section, we mention that all participants in the experiment provided informed consent—this is important if humans (or cats) are involved, so don’t forget about it.
In the “Results” section, which follows the Method, you present the data that were collected, combined, or generated (without giving any opinion or subjective interpretation of it!). Some of this data may have already been mentioned in the Method section, but not the most important details. For example, we’ve already said that we found 15 cats, but we didn’t provide their names, ages, or breeds—this information goes into the Results, in the form of a nicely formatted table. How much “results” to show in the Method and how much in the Results is, I believe, a matter of taste.
In the “Discussion” section, which follows the Results, you engage in a dialogue with yourself, questioning the procedures of the Method. This is where you are allowed to have an opinion about the data collected, combined, and generated. For example, we may discuss whether the results of our research are trustworthy enough, taking into account that we only analyzed the behavior of just 15 cats, while in the Method, we were absolutely sure that we were doing the right thing. In the Discussion, you play the opposite role by doubting every single step of the Method, highlighting its weaknesses and limitations.
You may find inspiration in these papers (use Google Scholar to download their PDFs):
These opinions might also be helpful:
As far as I understand it, a well-crafted “Related Work” section should convey the following message:
Before diving in, let’s clarify that the “Related Work” section is not the place to explain foundational concepts like Deep Learning or Dataflow Architecture. That’s what the “Background” section is for. In “Related Work,” it’s assumed that the reader is already familiar with the subject matter.
To effectively communicate the three-fold message, create a taxonomy of existing studies. In simple terms, classify them. For instance, if your paper focuses on a new type of cat food designed to extend feline longevity and improve happiness, your “Related Work” section might look like this:
There are three categories of research related to our
study: cat food, cat happiness, and cat lifespan.
Earlier studies [2, 13, 8] have suggested that
cat food containing meat [21], potato [11], and
fish [7] results in complaints in only 7.5% of all
cases. However, no experiments have been conducted
with food made from fruits.
The happiness of cats and other pets has been
studied by Johnson [22] and Dickson [17]. They
identified a strong correlation between the mood
of a pet's owner and the mood of the pet. However,
they did not investigate the effect of food on cat
happiness.
It has been observed [15, 18] that cats live
longer when they consume food with fewer carbohydrates
and more protein. However, these experiments were
conducted with cats living in only one city, which
limits the applicability of these studies.
To the best of our knowledge, the method of feeding
cats with fruits to increase their happiness and
prolong their lives has not been studied yet.
Note the use of the word “however” in the last sentence of each paragraph. It highlights gaps in existing research that your study aims to fill. The final paragraph confirms your awareness that your research is unique. While you might be mistaken, explicitly stating that you are not aware of similar work makes it an honest mistake.
In this toy example, we’ve categorized all relevant prior work into three groups. We’ve cited key papers in each category and summarized their findings relevant to our study. We’ve also highlighted areas that our research will address, emphasizing its novelty.
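In the LaTeX source, with natbib (loaded in the skeleton earlier), such citations might be sketched like this; the citation keys are made up:

```latex
Earlier studies \citep{smith2019food,lee2020diet} have
suggested that cat food containing meat rarely causes
complaints. The happiness of pets has been studied
by \citet{johnson2018mood}. However, the effect of
fruit-based food on cats has not been investigated.
```

The difference matters: \citet{} produces a textual citation such as “Johnson [22]”, while \citep{} produces a parenthetical one such as “[22]”.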
When gathering references for the “Related Work” section, you’ll likely encounter many papers worth mentioning. How do you decide which to cite? Consider the following factors:
Lastly, Google Scholar is the best place for finding prior work. If you can’t access a PDF version of a paper, try the Telegram bot: @scihubbot.
These articles and books might also be helpful:
Stay focused on one problem for many years. I literally mean a “problem”—something that bothers people now but will stop bothering them when you solve it. Ideally, first and foremost, it should bother you personally. If you can’t specify in one sentence what the meaning of your office life is—you don’t have a problem to solve. Find one.
A strong multi-year focus on one particular problem will most likely lead to a rather boring office life. People around you will be switching projects, accepting offers from crypto-startups, changing technologies, programming languages, and teams. You, unlike them, will remain focused on one thing for years and years. Imagine how boring it will look to them and to yourself. So be it. Accept it.
Moreover, if you don’t see significant results (and you won’t for years!), you’ll be tempted to switch to something else, where the outcomes seem more promising. Don’t.
Even when you change companies, remain loyal to the problem you chose as “yours” years ago. Don’t betray it. It’s yours. Your lifetime mission is to solve it. Who cares which company you are in? A company is just a temporary sponsor of your mission.
The problem must be as monumental as finding a cure for cancer. Ensure it’s bigger than your team, your company, and even your lifespan. The word “ambitious” certainly fits: it must be an ambitious idea. How do you know it’s big and ambitious enough? Count your enemies. If you have many of them—which could include your bosses, colleagues, spouse, and, of course, your haters on Twitter—you have a solid case. Conversely, if everyone loves your idea and supports you, your challenge might not be big enough.
Think about it: If it is big enough, many people have already tried to solve it. They failed. Naturally, they would love to see you fail too. If you don’t, it could dent their self-respect. It’s basic psychology.
The more enemies, the better! However, you should have a few allies. I’m referring to high-level technical people, like a CTO, VP of Technology, Chief Architect, or Fellow. They might not be technically competent in your particular domain, but that doesn’t matter. Strive to establish an information channel between you and them, and periodically share updates. Keep them informed about your progress and occasionally seek their advice. They will shield you from most of the attacks your enemies might launch.
To clarify, it’s impossible to ascend in a human hierarchy on your own, no matter how bright you are. You need a cadre of supporters within the company—individuals who back you unconditionally. A few are sufficient. They must be personally loyal to you. If you leave the company, they should follow you without hesitation.
It would be ideal for all of these friends to be part of your team. However, that’s not always feasible. Similarly, it would be wonderful if all these friends were technically competent, but that’s not always the case either: loyalty doesn’t often coincide with expertise. Having a friend who is both loyal and intelligent is a luxury.
Finally, maintain a connection with the younger generation that’s succeeding us—students. Engage with them, learn from them, and ensure you understand their needs and aspirations. They represent the industry’s future. If you treat them right, they will work for you with enthusiasm unmatched by any other employee.
Strengthening ties with the academic world will unquestionably reinforce your position within your company.
Your .bib file will contain many typographic, stylistic, and logical mistakes. I’m fairly certain that you won’t find the time to identify and correct them all. As a result, the “References” section in your paper may appear sloppy. I suggest using the bibcop package, which identifies mistakes in the .bib file and auto-fixes some of them.
Here is a practical example. Let’s say you want to cite a famous paper about transformers. First, you find it in Google Scholar and click “Cite”. Then, you put this “bib” item into your main.bib file:
@article{vaswani2017attention,
title={Attention is all you need},
author={Vaswani, Ashish and Shazeer, Noam and
Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and
Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
journal={Advances in neural information processing systems},
volume={30},
year={2017}
}
Then, you write something like this in your paper:
\documentclass{article}
\usepackage[maxbibnames=9]{biblatex}
\addbibresource{main.bib}
\begin{document}
Transformers~\cite{vaswani2017attention}
changed everything!
\printbibliography
\end{document}
This is what you will get:
Looks more or less fine. However, if you go to the website of the publisher of this article, you will see that:
In other words, Google Scholar gave you a citation with a few typographic mistakes. While not fatal, they matter: the quality of the “References” section is sometimes seen as reflective of the quality of the paper as a whole. Simply put, negligence is not forgivable when dealing with information about other authors. We must be accurate down to every letter and every dot.
By adding the bibcop package to the document, the problem can be solved. First, you install it (I assume you are using TeX Live):
$ sudo tlmgr install bibcop
Then, you add this to your document, right before the \addbibresource command:
...
\usepackage{bibcop}
\addbibresource{main.bib}
...
When you compile the document, the following warnings will be printed to the console, together with other logs:
Package bibcop Warning: A shortened name must have
a tailing dot in the 6th 'author', as in 'Knuth, Donald E.',
in the 'vaswani2017attention' entry.
Package bibcop Warning: All major words in the 'title'
must be capitalized, while the 2nd word 'is' is not,
in the 'vaswani2017attention' entry.
Package bibcop Warning: A mandatory 'doi' tag for '@article'
is missing among (author, journal, title, volume, year),
in the 'vaswani2017attention' entry.
Package bibcop Warning: The 'title' must be wrapped
in double curled brackets,
in the 'vaswani2017attention' entry.
You fix them all in the main.bib file and recompile the document.
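For reference, a sketch of what the fixed entry might look like (the DOI below is a placeholder; copy the real one from the publisher’s website):

```latex
@article{vaswani2017attention,
  title={{Attention Is All You Need}},
  author={Vaswani, Ashish and Shazeer, Noam and
    Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and
    Gomez, Aidan N. and Kaiser, {\L}ukasz and Polosukhin, Illia},
  journal={Advances in Neural Information Processing Systems},
  volume={30},
  doi={10.0000/placeholder},
  year={2017}
}
```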
This one looks much better to me (especially with the DOI, which was not provided by Google Scholar).
By the way, some formatting problems may be auto-fixed by bibcop. You can use it from the command line, assuming you have your main.bib file in the current directory:
$ bibcop --fix --in-place main.bib
This command will make as many fixes as possible.
Then, you can run bibcop again from the command line to check which style violations remain:
$ bibcop main.bib
This will print the same errors as you saw earlier in the LaTeX log.
On CTAN, you can find the full PDF documentation.
You are welcome to suggest additional style checkers via GitHub issues.