Showing posts with label Programming. Show all posts

Wednesday, January 12, 2022

Git Commit Messages

I've been using GitHub to host my code for a long time. However, I often get lazy and just put "edits" in my commit messages. This is obviously a bad practice!

I decided to look around to learn more about how to write good commit messages. I came across these resources [1] [2] [3]. 

The following format looks good:

Commit Type(Scope): Subject Line

Body

Commit Types

  1. Feature
  2. Fix
  3. Style
  4. Refactor
  5. Test
  6. Docs
  7. Chores

Subject Line

Short text (less than 50 characters) that summarizes the commit. 

Body

This is an optional detailed description of the commit. Wrap at 72 characters.
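To make the format above concrete, here is a minimal C sketch of a checker for the first line and the body of a commit message. The function names header_ok() and body_ok() are hypothetical. The type names come from the list above; the case-sensitive matching and applying the 50-character limit to the whole first line are assumptions. A check like this could be called from a Git commit-msg hook, which Git runs with the path of the file holding the proposed message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Commit types from the list above (case-sensitive match is an assumption). */
static const char *const COMMIT_TYPES[] = {
    "Feature", "Fix", "Style", "Refactor", "Test", "Docs", "Chores"
};

/* True if `line` has the form "Type(Scope): Subject" or "Type: Subject"
 * and is shorter than 50 characters in total. */
bool header_ok(const char *line)
{
    assert(line != NULL);
    size_t len = strlen(line);
    if (len == 0 || len >= 50)
        return false;

    for (size_t i = 0; i < sizeof COMMIT_TYPES / sizeof COMMIT_TYPES[0]; i++) {
        size_t tlen = strlen(COMMIT_TYPES[i]);
        if (strncmp(line, COMMIT_TYPES[i], tlen) != 0)
            continue;
        const char *p = line + tlen;
        if (*p == '(') {                       /* optional "(Scope)" */
            const char *close = strchr(p, ')');
            if (close == NULL || close == p + 1)
                return false;                  /* unterminated or empty scope */
            p = close + 1;
        }
        /* require ": " followed by a non-empty subject */
        return p[0] == ':' && p[1] == ' ' && p[2] != '\0';
    }
    return false;                              /* unknown commit type */
}

/* True if no line of `body` is longer than 72 characters. */
bool body_ok(const char *body)
{
    assert(body != NULL);
    size_t col = 0;
    for (const char *p = body; *p != '\0'; p++) {
        if (*p == '\n') {
            col = 0;                           /* start of a new line */
            continue;
        }
        if (++col > 72)
            return false;
    }
    return true;
}
```

For example, header_ok("Fix(parser): handle empty input") accepts the line, while header_ok("edits") rejects it because no commit type is given.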


References

[1] https://cbea.ms/git-commit/ 

[2] https://dev.to/wordssaysalot/art-of-writing-a-good-commit-message-56o7 

[3] https://dev.to/thelogeshwaran/how-to-write-good-commit-messages-714


Friday, July 9, 2021

Humans of Computer Systems

Professor Murat has an interesting section on his blog called Humans of Computer Systems (HCS). I've been thinking about documenting my own "history in computing/systems", so I decided to answer some of the HCS questions.

Programming

How did you learn to program?

I first learned to program using programmable calculators that I borrowed from my rich high school classmates. I was amazed at how using variables saved time when computing a formula. Some of my classmates even had graphing calculators. I usually borrowed a calculator and its manual overnight to try it out. I then learned BASIC on my own when my father brought home an IBM PS/2 laptop. I learned other programming languages in school.

Tell us about the most interesting/significant piece of code you wrote.

When I was in college and taking an assembly language programming course, I wrote a text editor in C, which I called ASMEdit. It allowed me to assemble and link programs from inside the editor. For me, this was an interesting project since I learned how to use pointers to functions to implement the menu system. I also learned how to call external programs, TASM.EXE and TLINK.EXE, from inside another program. I also implemented syntax coloring for the assembly instructions. This project was developed for the MS-DOS operating system.
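A menu system built on pointers to functions can be sketched as a table that pairs each label with a handler. This is not ASMEdit's code; the labels, the handlers (which only print a message), and the dispatch() function are hypothetical. The design choice is that adding a menu item means adding a table row rather than another branch in a switch statement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A menu handler takes no arguments and returns a status code. */
typedef int (*menu_action)(void);

struct menu_item {
    const char *label;
    menu_action action;   /* pointer to the function to call */
};

/* Hypothetical handlers; a real editor would assemble, link, and so on. */
static int do_assemble(void) { puts("assembling..."); return 0; }
static int do_link(void)     { puts("linking...");    return 0; }
static int do_quit(void)     { puts("bye");           return 1; }

static const struct menu_item MENU[] = {
    { "Assemble", do_assemble },
    { "Link",     do_link     },
    { "Quit",     do_quit     },
};

/* Find the entry whose label matches and call its handler through the
 * function pointer. Returns the handler's result, or -1 if none matches. */
int dispatch(const char *label)
{
    assert(label != NULL);
    for (size_t i = 0; i < sizeof MENU / sizeof MENU[0]; i++) {
        if (strcmp(MENU[i].label, label) == 0)
            return MENU[i].action();
    }
    return -1;
}
```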

Who did you learn most from about computer systems?

I learned about computer systems in my undergraduate OS class, mostly by reading the dinosaur book by Silberschatz et al. It was in this class that I was able to use a Unix OS called Solaris running on Sun hardware. My undergraduate SP/Thesis adviser was a systems and networks guy, so I also learned a lot from him. I learned even more about systems when I switched to a Linux desktop, starting with Red Hat 7.3.


Who is the greatest programmer you met, and what is impressive about them?

Some of my college classmates were really good programmers. They could easily implement advanced data structures and algorithms, especially graph and network algorithms. There was no Stack Overflow then.

What is the best code you have seen?

Over time, I realized that there is actually no best code. I do admire readable and maintainable code. OS kernel source code is quite messy.

What do you believe are the most important skills to be successful in your field?

Desire to learn new things. Oral and written communication. Working in a team. Navigating academic politics.

What quality or ability do you value most in a computer systems person?

The desire to learn and to experiment or tinker with various things. The ability to "see" the big picture while still working on the specifics. Courage to break things.


Personal

Which of your work/code/accomplishments are you most proud of?

I am proud that I was able to get tenure at the university. This gave me the freedom to work on the areas of computer systems that interest me without worrying too much about job security, despite the low pay. The ICS-OS paper is actually what earned me tenure. I enjoyed working on it and using it in my classes.

What comes to you easy that others find hard? What are your superpowers?

Understanding systems. Connecting/integrating things together.

What was a blessing in disguise for you? What seemed like a failure at the time but led to something better later for you?

I was not accepted by the private company I applied to after graduation. That rejection led me to apply as an Instructor at the university, since I also wanted to pursue graduate studies.

What do you feel most grateful for?

I feel grateful for everything I have right now. 

What does your perfect day look like?

Learning something new. Helping some people. Exercising and playing sports.

What made you most happy in the last year?

I was able to survive the pandemic, though anxiety still kicks in from time to time.


Work

What was your biggest mess up? What was the aftermath?

Some colleagues were pissed when they lost internet access because my private cloud setup had an exposed DHCP server that assigned IP addresses to their machines. We were able to isolate and resolve the problem, but by then it was already late in the afternoon.

What was your most interesting/surprising or disappointing interaction at work?

I had to babysit a colleague's son one weekend because my colleague needed to argue/discuss the "draconian" network access filtering with another colleague.

What do you like most about your job/profession?

The freedom to tinker. The opportunity to share what I know. The chance to mentor and help others. Working with smart people. Playing the publications game. Navigating academic politics.

What would be the single change that would improve your work environment most?

Improving the research culture. Most of my colleagues are great teachers but they disregard the research aspect of the profession. CS is a fast-changing field. We need to keep up with the advances.


Technical

What do you think are the hardest questions in your field?

System reliability and performance. Ethics. Should we build this system because we can?  Is there one operating system to rule them all?

What are you most disappointed about the state-of-the-art in your field?

Sometimes the state of the art is just an incremental step or just scaling up. 

What are the topics that you wish received more attention? What do you think is a promising future direction in your field?

System reliability and performance. Ethics. 

What is your favorite computer systems paper? Why?

  • Xen and the Art of Virtualization
  • A View of Cloud Computing
  • MapReduce: Simplified Data Processing on Large Clusters

I reread these papers from time to time.


Story

Is there an interesting story you like to tell us?

Yes.

Tell us your story.

I wrote a non-overwriting EXE computer virus and bundled it with ASMEdit, the editor I described above. My classmates and instructor who copied the program had no idea the virus was there. The virus only replicated, though; it had no destructive payload. Antivirus software back then was signature-based, so it never detected the virus I wrote.

Rant your heart out.

We are in a research university. Why are we not reading at least one research paper per week? :)


Tuesday, April 14, 2020

ROOTCON's Easter Egg Hunt Event 2020: Power

Since we are under ECQ (enhanced community quarantine), I tried some of the problems. I decided to focus on Power, which is a crypto problem.

The flag is: rc_easter{p0w3r_1s_n07h1n6_w17h0u7_c0ntr0L}

You can read the full writeup here.

Thursday, September 5, 2019

Introduction to debugging C programs using GDB


Instead of just reading the code, you can use a debugger such as GDB to find errors in C programs. GDB is available in most Linux distributions.

Example code, prod.c:

#include <stdio.h>
#include <stdlib.h>

int mul(int x, int y){
   int prod;
   int i;

   prod=0;
   for (i=0;i<y;i++){
      prod=prod+x;
   }
   return prod;
}

int main(){
   int a=4;
   int b=3;

   printf("The product of %d and %d is %d\n",a,b,mul(a,b));
   
   return 0;

} 

The following are the typical activities when debugging C programs:

1.  Create the executable with debug information

$ gcc -g -o prod.exe prod.c

For assembly language programs:

$ nasm -g -F dwarf -felf64 prod.asm
$ ld -o prod.exe prod.o


2.  Load the program in GDB

$ gdb prod.exe

3. View the source code listing

(gdb) list
 
4. Set a breakpoint

(gdb) b * main

5. Execute until breakpoint

(gdb) r

6. Execute next line

(gdb) n

7. View current line being executed

(gdb) frame

8. Step into a function

(gdb) s

9. View local variables

(gdb) info locals

10. Print variables

(gdb) print a

11. Set new values for variables

(gdb) set variable a=5

12. Continue execution until next breakpoint

(gdb) c

13. Quit

(gdb) quit
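A session like the one above can also expose a limitation of mul(): it assumes y is non-negative. For instance, if `set variable b=-3` is used after b has been initialized but before mul() is called, stepping into mul() with `s` shows that the loop condition i<y is false from the start, so the program reports a product of 0 instead of -12. A minimal sketch of one possible fix, written as a separate, hypothetical mul_signed() so the original example stays unchanged:

```c
#include <assert.h>
#include <limits.h>

/* Same repeated-addition approach as mul() in prod.c, extended to a negative
 * y: loop |y| times, then flip the sign. Overflow is still not checked. */
int mul_signed(int x, int y)
{
    assert(y != INT_MIN);          /* -INT_MIN would overflow */
    int n = (y < 0) ? -y : y;      /* number of additions */
    int prod = 0;
    for (int i = 0; i < n; i++) {
        prod = prod + x;
    }
    return (y < 0) ? -prod : prod;
}
```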

Saturday, May 4, 2019

Programming Tips for Student Projects

  • Use git for version control. Follow this simple workflow model.
  • Create separate folders for frontend and backend components especially if using the MERN stack. You can also create a data folder inside the backend. An example application template.
  • Use config files to store database settings such as dbhost, dbuser, dbpass, and dbname.
  • Create .sql files that contain initialization data and stored procedures.
  • Use relative URLs in your app.
  • Write an INSTALL text file that describes how to install your application. Indicate the dependencies (OS version, package names, version number). If possible, create an install.sh or setup.sh to automate the installation process.
  • Use a coding convention for naming variables, functions, methods, etc.
  • Do not store passwords in plaintext.
  • Learn Docker, Docker Compose, and Travis CI. Check out my app template.
  • Write automated tests.
  • (more to follow)
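The config-file tip can be sketched in C as well. This minimal example assumes a plain key=value file format with # comments and reuses the setting names from the tip; the struct, load_db_config(), and the 64-byte field size are hypothetical. Values that are too long are truncated rather than rejected, and unknown keys are ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FIELD_LEN 64   /* arbitrary maximum length for each setting */

/* Database settings named in the tip above. */
struct db_config {
    char dbhost[FIELD_LEN];
    char dbuser[FIELD_LEN];
    char dbpass[FIELD_LEN];
    char dbname[FIELD_LEN];
};

/* Copy `value` into the field of `cfg` named by `key`; unknown keys are ignored. */
static void set_field(struct db_config *cfg, const char *key, const char *value)
{
    char *dst = NULL;
    if (strcmp(key, "dbhost") == 0)      dst = cfg->dbhost;
    else if (strcmp(key, "dbuser") == 0) dst = cfg->dbuser;
    else if (strcmp(key, "dbpass") == 0) dst = cfg->dbpass;
    else if (strcmp(key, "dbname") == 0) dst = cfg->dbname;
    if (dst != NULL)
        snprintf(dst, FIELD_LEN, "%s", value);   /* always NUL-terminated */
}

/* Read "key=value" lines from `fp` into `cfg`. Blank lines and lines that
 * start with '#' are skipped. Returns 0 on success, -1 on a line without
 * '=' or on a read error. */
int load_db_config(FILE *fp, struct db_config *cfg)
{
    char line[256];

    assert(fp != NULL && cfg != NULL);
    memset(cfg, 0, sizeof *cfg);
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';      /* strip the line ending */
        if (line[0] == '\0' || line[0] == '#')
            continue;
        char *eq = strchr(line, '=');
        if (eq == NULL)
            return -1;
        *eq = '\0';                              /* split into key and value */
        set_field(cfg, line, eq + 1);
    }
    return ferror(fp) ? -1 : 0;
}
```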

Thursday, March 21, 2019

Learning Windows 7 Internals



I've been using Linux (the Ubuntu distribution) for a long time and have a reasonably deep understanding of its internals. I guess it is time for me to focus on Windows.


Books
  • Windows Internals (Parts 1 and 2), 6th Ed. by Russinovich et al.
Software

Compiling Code
  • SetEnv.cmd /Debug /x86 /win7

Tuesday, April 17, 2018

Memory and Linux Processes

Physical Memory and Virtual Memory

The CPU of the computer is responsible for executing instructions (machine code). These instructions, as well as the data they use, must be placed in physical memory (the actual memory chips). Each byte in physical memory is accessed through a physical address. The physical address is placed in the memory address register (MAR) when reading from or writing to memory. The sizes of the MAR and the address bus determine the range of addresses that can be used. For example, if the MAR is 32 bits wide, then addresses from 0 to 0xFFFFFFFF (up to 4GB) can be accessed.

Figure: Memory Address Register
(Source: https://archive.cnx.org/contents/6876272b-9b8f-463c-848b-1d388dccf776@1/module-5)
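The arithmetic behind the 4GB figure: an n-bit address can name 2^n distinct bytes, numbered 0 to 2^n - 1. A small C sketch of that calculation (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct byte addresses an n-bit address can express: 2^n. */
uint64_t addressable_bytes(unsigned bits)
{
    assert(bits < 64);                 /* keep the shift well-defined */
    return (uint64_t)1 << bits;
}

/* Highest byte address reachable with an n-bit address: 2^n - 1. */
uint64_t highest_address(unsigned bits)
{
    return addressable_bytes(bits) - 1;
}
```

For a 32-bit MAR, addressable_bytes(32) is 4294967296 bytes (4 GiB) and highest_address(32) is 0xFFFFFFFF.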


Modern computer architectures, however, provide a virtual (or logical) memory view to the CPU. The CPU accesses each byte through a virtual or logical address. The range of virtual addresses is often, but not always, the same as the range of physical addresses, and the amount of physical memory actually installed may be less or more than the addressable range.

Virtual addresses must eventually be translated to physical addresses to access instructions and data in physical memory. This translation/mapping is technically called address binding. The translation is performed by a hardware component called the Memory Management Unit (MMU). Schemes such as segmentation and/or paging are often used to support different features and needs, such as protection. The operating system also performs some operations related to this address translation by invoking specialized CPU instructions.


Figure: MMU principle
(Source: https://upload.wikimedia.org/wikipedia/commons/thumb/d/dc/MMU_principle_updated.png/325px-MMU_principle_updated.png)


Application programmers need only concern themselves with virtual memory. Kernel developers, however, need to be concerned with both virtual and physical memory as part of implementing the memory management component of the operating system.

Having a virtual memory view provides flexibility, especially in multiprogramming and timesharing operating systems. It allows a process to "believe" that it has exclusive and full access to the entire physical memory, even though it does not. Virtual memory also allows processes to access code and data that are in secondary storage (disk) as if they were in physical memory, through a scheme called swapping.

Process Memory Map in Linux (Ubuntu 16.04 x86_64, GCC 5.4.0)

When writing C programs, variables are used to hold data and functions are used to implement operations. Variables and functions have names, which are symbolic; in compiler design, such names are generally called symbols. Consider a payroll program: the variable named age can hold the age of an employee, and the function named compute_salary can perform the operation of computing an employee's salary.

When a C program is compiled and linked, the variables are converted to memory locations/addresses (in virtual memory) and the functions to machine code (object code), respectively. Variable names and function names become memory addresses and are stored in a symbol table. The result of this conversion is stored in an executable file (aka program binary image), which is what is usually run. The executable file format usually depends on the operating system; in Linux, for example, ELF is the standard format for executable files.

Running a program means that the loader reads the executable file and creates a process for it. When the executable file is loaded by the operating system for execution, for example via the exec() system call in Linux, the operating system allocates a portion of memory in an area dedicated to user processes. The data and instructions are read from the executable file and placed in the allocated memory at locations that are also specified in the executable file (in ELF, by its program headers). Again, the memory locations referred to here are in virtual memory. Once the data and instructions are in memory, a process control block (PCB) is created to represent the process. The allocated memory becomes the process' memory map or address space and is usually referenced by a field in the PCB. The process is then scheduled for execution.
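On Linux, a program is usually started by a parent process (a shell, for example) that calls fork() to create a new process and then calls one of the exec() functions in the child, which replaces the child's memory map with one built from the executable file. A minimal sketch of that sequence; the helper run_program() is hypothetical and its error handling is kept short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start the executable at `path` as a new process and wait for it.
 * Returns the child's exit status, or -1 if fork()/waitpid() fails or the
 * child did not exit normally. A failed execv() shows up as status 127. */
int run_program(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();                /* duplicate the calling process */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace this copy's memory map with the new program. */
        execv(path, argv);
        _exit(127);                    /* reached only if execv() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)  /* parent: wait for the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with argv set to {"./binding.exe", NULL}, run_program("./binding.exe", argv) would run the program used below and return its exit status.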

A process' memory map is divided into sections that serve different purposes. A typical memory map is shown below. It contains a text section for instructions, a data section for initialized data, a bss section for uninitialized data, a stack section for function calls (and, traditionally, parameters), and a heap section for dynamically allocated memory (via the function malloc()). Some of these sections are already defined during compilation and linking. Although the memory map below appears contiguous in virtual memory, the corresponding physical memory need not be.

Figure: Memory layout of a C program
(Source: https://www.hackerearth.com/practice/notes/memory-layout-of-c-program/)

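The following is a small, hypothetical C function (not the binding.c used below) that prints the address of one object from each section described above. The names are illustrative; with GCC on Linux the initialized global normally lands in .data and the uninitialized one in .bss, but the exact addresses can change from run to run when address space layout randomization is enabled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int data_global = 42;   /* initialized to nonzero: expected in .data */
int bss_global;         /* uninitialized: expected in .bss (use -fno-common on older GCC) */

/* Print one address per section. Returns 0 on success, 1 if malloc() fails. */
int show_sections(void)
{
    int stack_local = 7;                          /* automatic variable: stack */
    int *heap_int = malloc(sizeof *heap_int);     /* dynamic allocation: heap */

    if (heap_int == NULL)
        return 1;
    *heap_int = stack_local;

    /* bss_global is never assigned, and .bss is zero-filled at load time. */
    assert(bss_global == 0);

    printf(".data  %p\n", (void *)&data_global);
    printf(".bss   %p\n", (void *)&bss_global);
    printf("heap   %p\n", (void *)heap_int);
    printf("stack  %p\n", (void *)&stack_local);

    free(heap_int);
    return 0;
}
```

Calling show_sections() from a main() and running the program a few times is a quick way to see which of these addresses change between runs.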
The example C program below will illustrate in what section of a process' memory map the different symbols are placed. Download binding.c, create an object file and executable file [1]. Run the executable several times and observe which variables change in address. The variables are so named to show in which section they will reside.

Compile time (output is object file):
$ gcc -fno-common -c -o binding.o binding.c

Link time (output is executable file):
$ gcc -fno-common -o binding.exe binding.o

Run time (a process is created) :
$ ./binding.exe

Next, examine the symbol table of the object file and the executable file. The first column refers to the assigned address and the fourth column refers to the assigned section.
$ objdump -t binding.o 
$ objdump -t binding.exe

Compare the entries for some of the symbols in the object file and executable file. Which file contains an address for the symbol, the object file or the executable file?
$ objdump -t binding.o | grep data_global_initialized_nonzero
$ objdump -t binding.exe | grep data_global_initialized_nonzero 

$ objdump -t binding.o | grep -w bss_global
$ objdump -t binding.exe | grep -w bss_global

$ objdump -t binding.o | grep -w text_func
$ objdump -t binding.exe | grep -w text_func

Can the symbols that start with stack_, and the data stored in the heap, be found? No. The stack and heap sections are allocated at run time.

Let us look at where each of the sections starts in memory and at the symbols in each section.
$ objdump -t binding.exe | grep -w .text
$ objdump -t binding.exe | grep -w .data
$ objdump -t binding.exe | grep -w .bss

We will now use GDB to examine the process address space at run time. GDB lets us examine the state of the execution by running one instruction at a time. (Compare the addresses at link time and at run time. Are they the same?)
$ gdb ./binding.exe
(gdb) set disassembly-flavor intel
(gdb) b main+99
(gdb) r
(gdb) disas main
(gdb) info proc mapping
   
Study the memory map. Notice that there is no heap section yet. This is because no call to malloc() has been made yet.
(gdb) ni +6
(gdb) disas main
(gdb) info proc mapping

The heap section is now present. Let us look for the variables in the sections.
(gdb) find 0x601000,+0x1000,"JACH_IN_DATA"
(gdb) find 0x601000,+0x1000,"JACH_IN_BSS"
(gdb) find 0x602000,+0x21000,"JACH_IN_HEAP"
(gdb) find 0x7ffffffde000,+0x21000,"JACH_IN_STACK_LOCAL"

How about the parameter? Is it in the stack?
(gdb) find 0x7ffffffde000,+0x21000,"JACH_IN_STACK_PARAM"

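The string is not in the stack! It is in the read-only data (.rodata) section, which is mapped together with the text section in the region starting at 0x400000. The parameter itself is only a pointer to the string, and on x86-64 Linux the first few arguments are passed in registers. Traditionally, however, parameters are pushed onto the stack.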
(gdb) find 0x400000,+0x1000,"JACH_IN_STACK_PARAM"
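The difference between a string stored in a local array and a string passed as an argument can be sketched in C. The functions below are hypothetical and not part of binding.c: initializing a local char array gives the array its own copy of the characters in the stack frame, while passing a string literal passes only a pointer to characters that stay in read-only data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callee: `s` receives only an address (on x86-64 Linux, in a
 * register for the first few arguments), never a copy of the characters. */
size_t param_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Contrast an array, whose bytes live in this stack frame, with a pointer
 * to a string literal, whose bytes live in read-only data. Returns the size
 * of the local array in bytes. */
size_t local_vs_literal(void)
{
    char stack_copy[] = "JACH_IN_STACK_LOCAL";   /* 19 chars + '\0' in the frame */
    const char *literal = "JACH_IN_STACK_PARAM"; /* only the pointer is local */
    assert(param_length(stack_copy) == param_length(literal));
    return sizeof stack_copy;                    /* array size, not pointer size */
}
```

Here local_vs_literal() returns 20: the 19 characters of the array plus its terminating NUL. That is why a search of the stack region can find a local array's contents but not a literal that was only passed by address.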

Finally, run the process to completion.
(gdb) c
(gdb) quit 

Conclusion

This post discussed some concepts in memory management in relation to C programs and Linux processes.

Figure 1. Sample Memory Map (no heap section yet)