Read

If you are pursuing a PhD degree like I am, you need to read papers. A frequent problem that I have is that I read and forget in a few days. This might be acceptable for certain papers that are not important; but it is not acceptable for papers that are listed here. First of all, I have to pass the PhD qualifying exam. Since I obviously cannot read all the fifty papers within three days before the exam, I have to distribute the papers to a long period. I am likely to forget some of the earlier ones. Second, these fifty papers and many other papers are important to my research. With a good understanding and memory of the papers, I might be able to develop better/newer ideas to solve problems. Knowing what is old helps to identify what is new.

I want to remember the key ideas of a paper, the problems the authors try to solve and the challenges they meet. When I forget, I want to have something that can help me to recall it. When I need to take an exam/talk the paper with someone, I want to have something that can help me to recall more details of the paper - what is the structure of the paper? What is each paragraph about? Do I understand them? What are the limitations? What can one infer/generalize from the paper?…

Although it is not about memory, recalling the paper is expected (by me) to get a better understanding of the paper, because you can only recall when you understand it. The key of reading a paper is to “re-implement” the paper, or “say it in your own words”. Often times, you think you understood. But you don’t really do until you can say it from your own words, which is equal to “answer an exam question”, “tell someoner you idea”, or “write a paper”. It forces you to truly understand the problem. One side benefit of “say it in your own words” is that you can review the problem by reading your own words, which is much faster than reading the paper again. To force you say it from your own words, you can imagine that you have an exam tomorrow and all the questions are from the paper you are reading.

Example

I take a paper I read recently as an example. I do the following when reading a paper.

  • Put the paper to Evernote and tag it with "paper", "to.read.list". Set an appropriate priority for it.
  • Print and mark on the paper while reading. Holding a paper paper is better than holding a mouse and looking at an e-paper because: 1. easier to draw graphs for understanding, 2. easier to write a few words (in a pdf editor, you need to find the mouse, move to a insert position, switch to keyboard, type, adjust the position. Often you have to use sticky note tool and CLICK to see your comments.) 3. less distraction. you mouse may "accidentally" move to Facebook, Twitter... or some notifications may pop out and drag you away. Shut down/sleep your computer while reading a paper. Regarding marks, somebody says using a highlighter is better than underlining. I will try that.
  • Summarize each important section, paragraph and write it down on the margin. This is a part of "say it in your own words".
    • Be critical. Imagine you are re-implementing the paper, you may find out the limitations, false assumptions/conclusions..
    • Be creative. What are the extensions of this technique? How can it help my research?
  • Scan the paper and put it to Evernote. Tag it with "paper", "have.read". Evernote can search in your scanned PDFs.
  • Write a summary of the whole paper in Evernote. This helps you recall the paper after you forget it. Sometimes, you can write down all your understandings and details if it is really an important paper. When you try to summarize, you are being tested if you have fully understood it. Doing it in Evernote is because typing large paragraphs is faster with a keyboard. And it is easier for Evernote to search.
  • Sometimes, make a mindmap, scan it and put it to Evernote. Mindmap shows the structure of the paper well. Also, you can draw graphs to illustrate..

The paper that has been marked is here. This is a summary that I wrote:


A Fast file system for unix

This is a very important paper in FS field. It presents techniques to improve the Unix file system’s throughput by introducing cylinder group and cluster related data into the same group, by increasing the block size for better locality (larger single transfers), by enabling two block sizes for better fragment management. It compares the old file system and the new file system (presented in this paper). The old file system has a very low utilization at about 2%. The new file system can have up to 47% due to less disk seeks.

Section 2 talks about the old file system, why the old one is bad. It serves as the motivation of the paper. Something needs to remember. Small block size is bad because the blocks are often allocated randomly due to a bad free block list and the allocation algorithm. In addition, when there are two files being written gradually, blocks are allocated for the two files interleavely. The blocks are likely to be not contiguous. Having bigger block size directly increases the contiguousness of the data, at the price of wasting more space potentially. The waste can be mitigated by using two block sizes, at the prices of bigger metadata and management cost. Bigger block size also reduce indirect blocks which introducing more seeks. The design of the things above is talked in 3.1.

The Section 3.2 describes how to design a CPU- and disk-aware file system. Consecutive allocation is NOT always the best! This is because of CPU. If a CPU has I/O channels, then consecutive blocks are good because CPU does not have to interrupt and prepare for a new I/O transfer. If CPUp does not have I/O channel, then blocks should be set apart in a proper distance. After the CPU is busy preparing and disk head moving, the head moves to the right position for the next access and can start accessing right there.

Layout policies include a global one and a local one. The global one tries to cluster related data and spread unrelated data. Basically, inodes of files in the same directory are placed in the same cylinder group. inodes of directories to where there are lots of free inode space and little existing directory. The local one optimizes locally, trying to allocate the next available inode for file. For data, it tries to place data in the same cylinder group.

Performance. In the old file system, write is faster than read because the data is buffered and the block allocation is simple. In the new file system, write is slower because block allocation is more complicated because of the fragment management.

Directory is organized as directory chunks. Chunk size is fixed. A chunk consists of directory entries with varying sizes.

This article describes how to read papers by 3-pass technique.