Assignment 3 FAQ, Errata, and Addenda

Due March 23, 2015 6:00pm on-line via sakai

Last update: Monday, March 23, 2015 13:57 EDT

Addenda

Group submission

Submissions from a larger group will be held to far higher standards than individual submissions or submissions from a smaller group. If there are more than three of you in a group, I expect the assignment to be perfect: all tests must pass and the code must look beautiful (nicely commented, functionally partitioned).

If you are doing the assignment individually, be sure to have command processing fully working. Then work on built-in commands, and then get pipes working. You can still get a decent grade if you do not get pipes working properly.

Regardless of the size of the group, you should submit a test report that summarizes your results following the tests presented in the Guidelines for Testing.

Be sure to close your pipes!
If any process has a writing end of a pipe open, the process that is reading from that pipe will never get an end-of-file indication since the process that is holding it open could conceivably write something. You need to ensure that the parent closes the files (pipes) as soon as it no longer needs them ... before it executes wait. By the time wait is called, every pipe should be closed at the parent.
GENERAL NOTE: Don't put too much logic in the main function.
Do not implement the bulk of the program within the main function. You'll lose points for this. Write a bunch of smaller functions. Break your code into bite-sized chunks.
GENERAL NOTE: Do not check for any specific characters to determine an end of file

Important: You detect an end of file when fgets returns NULL (getc and fgetc return EOF, which is typically -1). You should never check for specific characters, such as control-D. If that happens to be your end-of-file keyboard character (that's the default on Linux/Unix/OS X systems), the terminal driver handles it. Moreover, there is no end-of-file character when reading from a file.

FAQ

What if the user types some garbage as an argument for exit built-in command? For example, if the user types a string or a floating point number as an argument instead of an int value. What should my program do in this case?

You want the shell to exit, even if exit is given a bad argument (for example, pretend you're running a shell script - you would not want the shell to continue executing just because you made a mistake). It’s up to you to decide if you want to print a message. If you look at bash, for example, it prints a message such as:


$ exit abcde
bash: exit: abcde: numeric argument required

It then exits with an exit code of 255. You can do something similar.
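
A minimal sketch of this behavior, under a few assumptions: the helper name exit_status_from_arg and the "myshell" message prefix are made up for illustration, and treating a plain exit (no argument) as status 0 is a simplification rather than something the assignment specifies.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: turn the argument of the exit built-in into an
 * exit status. A missing argument yields 0 (an assumption); a malformed
 * one prints a bash-style message and yields 255. The caller exits
 * either way.
 */
int exit_status_from_arg(const char *arg)
{
    char *end;
    long value;

    if (arg == NULL)
        return 0;                   /* plain "exit" */

    errno = 0;
    value = strtol(arg, &end, 10);
    if (end == arg || *end != '\0' || errno == ERANGE) {
        /* not a number, trailing junk (e.g. "2.5"), or out of range */
        fprintf(stderr, "myshell: exit: %s: numeric argument required\n", arg);
        return 255;
    }
    return (int)(value & 0xff);     /* only the low 8 bits reach the parent */
}
```

The built-in would then call exit(exit_status_from_arg(argv[1])), so the shell terminates whether or not the argument was valid.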

How should we treat "" ? E.g., if we have the command


"" -l

or


command ""

The "" will be treated as a separate argument containing the empty string.


	arg[0] = "command"
	arg[1] = ""
	arg[2] = "-l"
	arg[3] = 0

How should the program behave if the user provides the following input:


command "h  -g

or


command h -g"

or


command "

Meaning we do not have open and closing quotes, we have only one quote. How should we treat it?

You should print a message stating that there is a mismatched quote. You’ll know this because your parser needs to keep state of whether you are inside a quote (so spaces and tabs become part of the argument) or outside (where spaces and tabs mark the end of the token).

You can then choose one of two designs: (1) warn about the mismatched quote and complete the string by assuming the missing quote is at the very end; (2) don't execute the command at all. In the case of a real shell, like bash, it would prompt you for more text after you press the return key, waiting for the ending quote character.
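
The quote state described above can be sketched as a small in-place tokenizer. This is only one possible design: the name split_args is hypothetical, it follows design (2) by reporting a mismatch with -1 so the caller can skip the command, it does not split on the pipe character, and it handles quotes in the middle of a token the way sh does, which goes beyond what the assignment requires.

```c
#include <stddef.h>

/*
 * Hypothetical tokenizer: splits line in place into at most max_args
 * arguments. argv receives pointers into line (nothing is copied out)
 * followed by a NULL pointer, so it needs room for max_args + 1 entries.
 * Returns the argument count, or -1 on a mismatched quote or too many
 * arguments.
 */
int split_args(char *line, char **argv, int max_args)
{
    int argc = 0;
    int in_token = 0;       /* currently building an argument? */
    char quote = '\0';      /* '\0' outside quotes, else the open quote */
    char *src, *dst = line; /* dst never gets ahead of src */

    for (src = line; *src != '\0'; src++) {
        char c = *src;
        int is_space = (c == ' ' || c == '\t' || c == '\n');

        if (quote == '\0' && is_space) {    /* outside quotes: end token */
            if (in_token) {
                *dst++ = '\0';
                in_token = 0;
            }
            continue;
        }
        if (!in_token) {                    /* first character of a token */
            if (argc >= max_args)
                return -1;
            argv[argc++] = dst;
            in_token = 1;
        }
        if (quote != '\0') {
            if (c == quote)
                quote = '\0';               /* matching quote closes it */
            else
                *dst++ = c;                 /* spaces and other quotes are data */
        } else if (c == '"' || c == '\'') {
            quote = c;                      /* opening quote is not stored */
        } else {
            *dst++ = c;
        }
    }
    if (quote != '\0')
        return -1;                          /* end of line inside a quote */
    if (in_token)
        *dst = '\0';
    argv[argc] = NULL;
    return argc;
}
```

With this sketch all three inputs in the question return -1, while command "" -l yields three arguments, the second being the empty string.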

What if we have multiple pipes one after another, e.g.,


command -g || -f

How should we treat this case?

I will not test this, so you don't really have to worry about this case. You can either (1) print an error, (2) ignore extra pipes, or (3) treat it as a null command that ignores any input and produces no output.

In shells like bash, the double-pipe is a special directive that tells the shell to execute the next command only if the exit code of the previous one was non-zero (that is, the first command reported an error).

Should we allow commands to contain non-alphabetic characters (except for quotes)? As I understood there is no requirement for that, but just to make sure.
Yes. Commands may have non-alphabetic characters. For example, you can have a command with a space in it (which will have to be quoted) or a control character. Anything that is a valid file name can be a valid command.
What is the maximum number of commands we should expect? In the assignment it is said that we may assume a maximum of 50 arguments for a command, but I do not see any mention of the number of commands.
There should not be a limit to the number of commands. For commands, you may use malloc to allocate a command struct (e.g., argument list, file descriptors for the pipe, argument count, next command) and create a linked list.
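
One possible layout for such a list is sketched below. The struct and function names are illustrative assumptions, not part of the assignment; only the 50-argument limit comes from the specification.

```c
#include <stdlib.h>

#define MAX_ARGS 50     /* per-command argument limit from the assignment */

/* Hypothetical command node; field names are illustrative only. */
struct command {
    char *argv[MAX_ARGS + 1];   /* argument list, NULL-terminated for execvp */
    int argc;                   /* number of arguments in argv */
    int pipefd[2];              /* pipe carrying this command's output */
    struct command *next;       /* next command in the pipeline, or NULL */
};

/* Allocate an empty command and append it to the list (*head, *tail). */
struct command *append_command(struct command **head, struct command **tail)
{
    struct command *cmd = malloc(sizeof *cmd);

    if (cmd == NULL)
        return NULL;
    cmd->argc = 0;
    cmd->argv[0] = NULL;
    cmd->pipefd[0] = cmd->pipefd[1] = -1;   /* no pipe created yet */
    cmd->next = NULL;

    if (*tail != NULL)
        (*tail)->next = cmd;
    else
        *head = cmd;
    *tail = cmd;
    return cmd;
}

/* Free every node. The argv entries are assumed to point into the input
 * line rather than into separately allocated memory, so they are not freed. */
void free_commands(struct command *head)
{
    while (head != NULL) {
        struct command *next = head->next;
        free(head);
        head = next;
    }
}
```

The parser would call append_command each time it starts a new command (at the beginning of the line and after each pipe symbol), so the number of commands is limited only by available memory.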

In the example below


"k", "lm | no"

Do we mean that "k" is a command and "," and "lm | no" are its arguments?

For this assignment, you can assume that quotes are present only at the beginning and end of each argument. In this case, you are correct in your example. In shells such as sh and bash, quotes may appear inside a command, so "k", is equivalent to "k," which is equivalent to k, . As another example, abc"de fg"hij is the same as "abcdefghij". You don't have to implement this behavior.

How should we treat input such as


"he llo"world

Is it a command "he llo" and an argument "world" or one token with a quote mismatch?

It should not give you a quote mismatch.

When you see the first quote ("), you pick up everything from the next character until you come across the next quote ("). While doing so, you should check for the end of string. If you see one, then you have a quote mismatch. Otherwise, everything up to the closing quote becomes part of one token (i.e., an argument). You don't have to handle the case of a non-space character immediately after the token in any specific way but should make sure your program never crashes and handles this case in some determinate manner. For example, you may choose one of these two approaches:

  1. if the next character is not a space (space or tab) or end of line, then turn off your quote flag and continue adding non-space characters to the token. In this case, you will have one argument "he lloworld".
  2. treat the second quote as the end of string and start parsing the next token. In this case, you get two arguments: "he llo" and world.

The assignment isn’t about parsing, so I don't want you to have to spend too much time dealing with these conditions. However, make sure you handle everything in a graceful and well-defined manner.

As I understand if we have input such as


hello""
It is being treated as one token along with 2 quotes (e.g a token is hello"" ). Right?

Again, you don't have to handle the case of embedded quotes. I expect the quotes to surround an argument that has spaces preceding it and/or following it. You can do one of several things here too:

  1. You’re adding characters to your token and get to the first ". Since you're not in a quoted string, this quote does not toggle an in_quote flag, so you continue grabbing all non-space characters and making them part of the argument. In this case, the argument becomes

    
    	hello""
    
  2. You get to the " and decide to now set an in_quote flag and continue grabbing input until the next matching quote. In this case, you grab 0 additional characters and your argument becomes
    
    	hello
    
What is meant by the instruction You may not use the system library function?
Unix/Linux/etc. systems have a library function named system (run man 3 system or look here). It runs a shell command by forking and execing sh -c your_command. You cannot use this. Your assignment must use the basic system calls. You also cannot use the popen library function (not that it will help).
What is the maximum string length for a user input? This is something I should know for the fgets function.
You can assume 512 bytes.
What is the maximum length for command name and an argument?
You need not make any assumptions on the maximum length of a command name and argument since you can create pointers to input strings containing those names (or, if desperate, allocate space).
Should we throw an error if the user starts a command with a pipe?
That would be nice but isn’t critical.

I did not understand first part of specification:

If the standard input is coming from a terminal, then print a prompt ("$ " or anything else you'd like; preferably something different than your normal shell prompt so that you'll remember when you're running this shell). You can detect if the standard input to the shell is a terminal or not via the isatty() library function.

Particularly I do not understand when we need to print the prompt. I thought that when our program is run we immediately should give a prompt to a user.

Let’s say you're running a test. Instead of getting the input from the keyboard, you're giving the shell input from a file:


	./myshell <testfile

If you print a prompt without checking if the input is from a user at a keyboard, you'll see a prompt printed for each command, which is unnecessary because you, as a user, are not entering commands: they are read from the file.

Try something like:


	char line[MAXLINE];
	int give_prompt = isatty(0);

	if (give_prompt) fputs(prompt, stderr);
	while (fgets(line, MAXLINE, stdin)) {
		// parse & run the command in line
		if (give_prompt) fputs(prompt, stderr);
	}

Part 5 of specs states:

"Neither of the built-in commands have to function in a pipeline of commands."

What does that really mean? What should be done if user types a cd or exit command in between other commands separated by pipe?

It does not make sense to put a cd or exit command into a pipe since neither of those commands consumes input or checks output. I won't check this condition. Make sure your program does not crash. You can handle the situation either by printing an error or by running the command locally and ignoring the input/output.
In the pipe and dup2 tutorial, you wrote sample code for a pipe sharing two processes in pipe-exec.c. I am confused as to how you manage to make the parent process, which contains the first command, execute first and the child process execute second without using a locking mechanism like a mutex or a semaphore.

No need to use locking mechanisms. The parent simply forks off one child for each command in the pipeline. All commands run as child processes – the parent is still the shell. There is no ordering involved. The operating system can schedule the commands in whatever order it chooses. If a command is trying to read from a pipe that has nothing in it, it will block.

The example I had in the pipe tutorial has a parent sending data to the child. In your case, you have to do things differently since your parent never execs anything directly since that would overwrite your shell.

If you have one command (cmd1) that needs to pipe its output to another command (cmd2), first create a pipe. For example,


	int pipefd[2];
	pipe(pipefd);

Then you fork the two child processes (the parent remains the shell):


fork:
	child:
		cmd1 sends its output to the pipe, so change the standard output to the pipe
		dup2(pipefd[1], 1)
		execvp(cmd1, ...)
fork again:
	child:
		cmd2 reads its input from the pipe, so change the standard input to the pipe
		dup2(pipefd[0], 0)
		execvp(cmd2, ...)

parent:
	close(pipefd[0])
	close(pipefd[1])
	wait for all child processes to exit
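
The outline above might look like this in C. It is a sketch rather than a reference solution: run_two is a hypothetical name, error handling is minimal, and returning the exit status of the second command is a choice made here for illustration. Note that each child also closes both pipe descriptors once dup2 has made its copy.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "cmd1 | cmd2" and return cmd2's exit status,
 * or -1 if something went wrong. Both arguments are NULL-terminated
 * argument vectors. */
int run_two(char **cmd1, char **cmd2)
{
    int pipefd[2];
    int status = 0;
    pid_t pid1, pid2;

    if (pipe(pipefd) < 0) {
        perror("pipe");
        return -1;
    }

    pid1 = fork();
    if (pid1 == 0) {                /* first child: cmd1 writes to the pipe */
        dup2(pipefd[1], 1);
        close(pipefd[0]);
        close(pipefd[1]);
        execvp(cmd1[0], cmd1);
        perror(cmd1[0]);            /* reached only if execvp failed */
        _exit(127);
    }

    pid2 = (pid1 > 0) ? fork() : -1;
    if (pid2 == 0) {                /* second child: cmd2 reads from the pipe */
        dup2(pipefd[0], 0);
        close(pipefd[0]);
        close(pipefd[1]);
        execvp(cmd2[0], cmd2);
        perror(cmd2[0]);
        _exit(127);
    }

    /* parent (still the shell): close both ends before waiting */
    close(pipefd[0]);
    close(pipefd[1]);

    if (pid1 > 0)
        waitpid(pid1, NULL, 0);
    if (pid2 < 0) {
        perror("fork");
        return -1;
    }
    waitpid(pid2, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the two close calls in the parent are removed, the second command never sees an end of file and the waitpid on it blocks forever.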
How is it possible not to allocate space for a command name and command arguments if we still need to send them to execvp as an argument?
If you cannot come up with a way, then you can use malloc (or strdup). However, it's not really necessary to go and allocate more memory when all the data you need is already in the string and you can create pointers that point to the parts you want. Take a look at this tutorial for an example.

You said that I should have pipe[2] in each command struct. I do not understand why I need that. Can't I use one pipe for all commands?

I am still a bit confused on how to use pipes. I checked several examples, but all of them were showing the case with one pipe. It is not exactly clear for me as of now how to work with multiple pipes.

As I understand we should have one parent process that forks child processes per each command. The child process then reads from pipe[0] and writes to pipe[1]. Correct?

You will have to create a new pipe for each pair of adjacent commands in the pipeline. If you have:

	A | B | C | D | E

then you will create four pipes and fork five child processes. For each child that you fork, you will implement logic along the lines of:

  • If this process gets input from a pipe then dup2() to change the standard input to that pipe.
  • If this process sends output to a pipe, then dup2() to change the standard output to that pipe.

The parent will close all pipes after it forks the children. You’ll need to maintain a list since you may have an arbitrary number of pipes.
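
A sketch of this logic for an arbitrary number of commands follows. It uses a variation on the approach described above: instead of keeping a list of pipes and closing them all after the last fork, the parent closes each descriptor as soon as the children that need it have been forked, so it only has to remember one read end at a time. The function name run_pipeline and the array-of-argument-vectors interface are assumptions made for the example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run cmds[0] | cmds[1] | ... | cmds[n-1] and
 * return the exit status of the last command, or -1 on failure. */
int run_pipeline(char **cmds[], int n)
{
    int prev_read = -1;     /* read end of the pipe feeding the next child */
    int status = 0, st, i;
    pid_t last = -1, pid;

    for (i = 0; i < n; i++) {
        int pipefd[2] = { -1, -1 };

        /* every command except the last one writes to a new pipe */
        if (i < n - 1 && pipe(pipefd) < 0) {
            perror("pipe");
            break;
        }
        pid = fork();
        if (pid < 0) {
            perror("fork");
            if (pipefd[0] != -1) {
                close(pipefd[0]);
                close(pipefd[1]);
            }
            break;
        }
        if (pid == 0) {
            if (prev_read != -1) {      /* input comes from the previous pipe */
                dup2(prev_read, 0);
                close(prev_read);
            }
            if (pipefd[1] != -1) {      /* output goes to the new pipe */
                dup2(pipefd[1], 1);
                close(pipefd[0]);
                close(pipefd[1]);
            }
            execvp(cmds[i][0], cmds[i]);
            perror(cmds[i][0]);         /* reached only if execvp failed */
            _exit(127);
        }
        /* parent: drop the descriptors it no longer needs */
        if (prev_read != -1)
            close(prev_read);
        if (pipefd[1] != -1)
            close(pipefd[1]);
        prev_read = pipefd[0];
        if (i == n - 1)
            last = pid;
    }
    if (prev_read != -1)                /* only after an early break */
        close(prev_read);

    while ((pid = wait(&st)) > 0)       /* every pipe is closed by now */
        if (pid == last)
            status = st;
    return (last > 0 && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

For A | B | C | D | E this creates four pipes and forks five children, and by the time wait is called the parent holds no pipe descriptors.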

I am not sure when we should use close function. As I understand in the case

	cat moby.txt |tr A-Z a-z|tr -C a-z '\n' 

a child for our second command will need to access the read end to read what was produced by the first process and the write end to write its own output. Is that correct or not? This process is not fully clear to me yet.

You have three commands in this example (cat, tr, and tr). What your shell will do is create three child processes (one for each command) and two pipes.

	fork:
		child: // command 1 (cat)
			command sends its output to the pipe, so change the standard output to the pipe
			dup2(firstpipe[1], 1)
			execvp("cat", ...)
	fork:
		child: // command 2 (tr)
			command sends its output to the pipe, so change the standard output to the pipe
			dup2(secondpipe[1], 1)
			command reads its input from the pipe, so change the standard input to the pipe
			dup2(firstpipe[0], 0)
			execvp("tr", ...)
	fork:
		child: //command 3 (tr)
			command reads its input from the pipe, so change the standard input to the pipe
			dup2(secondpipe[0], 0)
			execvp("tr", ...)
	parent:
		close all pipes

What if we encounter commands that do not do any reads/writes? How should we treat them? E.g., we have something like:


	cat moby.txt |cd /src/smth |tr -C a-z '\n' 

How should my program behave in this case?

Don't do anything special but make sure your program does not crash. You don't have to handle the case of built-in commands in a pipe. For regular commands, you have no idea if they generate output or consume input, so treat them as if they do (see previous example).

In the example on the assignment specification:


	'abc' "de f'g" hij| k "lm | no"

you had a single ' after the f. How should we handle a nested sequence such as "ab 'e f' g"?

With nesting, you just treat any other quote as an ordinary character. For example, "a'b'c'd" gives you the token a'b'c'd.
Which shell functions do we have to implement? For example, "echo" is a command. Which other ones are we required to do for this assignment?
You only need to implement cd and exit as built-in shell functions. Every other command is an executable program that you’ll run. For example, echo is a program (/bin/echo).
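
A built-in such as cd has to run inside the shell process itself, since a chdir made in a forked child would not affect the shell. Here is a minimal sketch, assuming a hypothetical builtin_cd function that receives the parsed argument list; falling back to $HOME when no directory is given is borrowed from common shells and is an assumption, not a stated requirement.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical cd built-in: argv[0] is "cd", argv[1] the target directory.
 * Returns 0 on success and -1 on failure. */
int builtin_cd(char **argv)
{
    const char *dir = argv[1];

    if (dir == NULL)
        dir = getenv("HOME");       /* assumption: bare "cd" goes home */
    if (dir == NULL) {
        fprintf(stderr, "cd: HOME not set\n");
        return -1;
    }
    if (chdir(dir) < 0) {
        perror(dir);                /* e.g. "No such file or directory" */
        return -1;
    }
    return 0;
}
```

The shell would compare argv[0] against "cd" and "exit" before forking, and only fork and exec when neither matches.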
The pipe of commands I run never exits.
Chances are that you have not closed your pipes at the parent. If a command is reading its input from a pipe, it will never detect an end of file if the parent still has the writing end of that pipe open (the parent has the option to send data to that pipe as well, not that it needs to).
Can I have an extension? I was out for spring break, my machine crashed, I had other assignments due, etc.
I mentioned there would be no extensions when I posted the assignment 18 days ago. Please submit what you have. If you were stuck at any point, you had ample opportunity to ask questions. This assignment did not involve a tremendous amount of programming (perhaps around 300 lines of code). I’d expect that a single student should be able to get most of it working within an evening of coding (I provided examples for things such as pipe, dup, fork, and exec) but gave you a full week not counting the break and the option to work in a group.