README Generation Project Post 2
Well, progress has been made on this project, though not as much as I would have liked given that it’s been nearly two months since my last post on it. But progress has been made nevertheless.
Parallelizing Information Retrieval
As noted in my last post, the program suffered on slower connections. Through some informal testing I found that while I have little control over the speed of a single request, I do control how many requests are made at once. Running more requests at a time doesn’t seem to slow down any individual request, but it can greatly reduce the time it takes for all of them to be made and answered. This wasn’t much of a surprise, since I had parallelized this process in a previous iteration of the project, described in my last post. That version used pthreads, which was an interesting experience. This time around, though, I decided to use OpenMP to run the requests in parallel, purely because I had seen examples of it in my Programming Language Design class and wanted to try it. What I did not anticipate was the issue it would cause.
Issues with OpenMP
The issue in question came up when I was testing for memory leaks: for some reason, I had a memory leak whenever I used OpenMP. Thinking there was no way the library could be at fault, I assumed it was something I had done, so I spent a few hours pulling my hair out trying to find the source of the memory leak I had clearly caused. Then I realized that removing OpenMP eliminated the leak. After a little looking, I found this article on the subject. It turns out that valgrind detects a pseudo memory leak because of how OpenMP allocates memory for itself. So that ended that hair-pulling adventure.
What Was Parallelized
Overall, I ended up parallelizing two main parts of this project: retrieving each project’s description and retrieving the links for the projects. Before they ran in parallel, these two tasks took the most time. They can still take a while, but they are greatly improved. For example, in one test without parallelization the whole program took 1:26.83 to run, while with parallelization it took 0:23.32, which is roughly 3.7 times as fast. It isn’t always that large a difference, but the parallel version is more often than not faster. I also only tested this on my laptop, which has 2 cores and 4 threads, so it may perform better on other computers. At this point, though, it works well enough, and in my tests on slower connections it still reduces the time the program takes to execute.
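For reference, the speedup from that test is the ratio of the two run times once they are converted to seconds (1:26.83 is 86.83 s and 0:23.32 is 23.32 s):

```latex
S = \frac{T_{\text{serial}}}{T_{\text{parallel}}}
  = \frac{86.83\ \text{s}}{23.32\ \text{s}} \approx 3.72,
\qquad
1 - \frac{T_{\text{parallel}}}{T_{\text{serial}}}
  = 1 - \frac{23.32}{86.83} \approx 0.73
```

In other words, the parallel run took about 27% as long as the serial one, a reduction of roughly 73% in run time.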
Progress Indication
Another thing I’ve done since the last post is add a progress indicator, which the last post also mentioned as something I wanted to do. It’s simple, but I’m pleased with how it works. It uses the carriage return to repeatedly overwrite the same line in the terminal, with each overwrite printing the name of the task being performed, the progress bar, and a counter showing how many items have been processed out of the total number that need to be processed.
Because some of the tasks run in parallel, though, it presents its own challenges. I solved these by using an OpenMP critical section. It protects a counter variable that tracks the number of completed items, and each time that counter is updated, the progress indicator is printed.
Once the progress indicator reaches its final value, it also adds the word DONE and prints a newline character to start the next line. It’s not the fanciest of progress indicators, but it works well for what I’m doing. I also made it reusable elsewhere if I want: the size of the progress bar is adjustable using a single variable, and the bar portion of the indicator can be disabled by setting that same variable to any number less than 1. That said, I’m not sure how useful a progress bar with only one section would be.
Example of Progress Indicator
File exists
Getting repo names: 63 found
Getting descriptions [=========================] ( 63/ 63) DONE
Getting links [=========                ] ( 25/ 63)
What to Tackle Next
I still haven’t worked on everything I mentioned in the last post that I wanted to add to this program, such as proper command-line arguments and documentation. I’d also like to organize the output into categories or folders, but before doing that I think I’ll need some sort of config file to control and store that kind of information. Either way, that gives me something to work on next. And the annoyances I mentioned in the last post are mostly resolved now, so I’d call that a win.