A File's Journey to Paper

Printing is one of the most often used but least understood command suites. The ease with which modern GUI tools let us configure printers often leaves us baffled when things go wrong. By exploring the process of printing, we'll develop the understanding necessary to correct the problems that crop up. Several excellent references, including the Linux Printing HOWTO and the Printing Usage HOWTO, are available on the net, but wading through piles of documentation can be a frustrating experience. Hopefully this will help.

We'll start with the overall process, then move on to a detailed discussion of the programs that do the nuts and bolts work: the lpd daemon, the printcap entries, and the lpr, lprm, and lpc commands. We'll finish up with a discussion of print filters and, as an added bonus, cover printing from Windows.

Overall Process

[Diagram: The overall printing process]

Understanding the overall process is probably the single most important thing this tutorial will provide. The diagram shows a simplification of the printing process as it normally occurs. When a print command is issued, you're not sending the file straight to the printer. You're sending it to the queue, or "spool area". (Note: Yes, I know, the diagram is not complete. The command really sends the file to the lpd daemon, and it's the lpd daemon that puts it in the spool queue, but this is, after all, a simplification of the process.) In the spool area you can find not only the data you sent off to be printed, but control files and various helper files as well. It's the job of programs like lpr to correctly generate these extra files. Then it's the job of the lpd daemon to take it from there. LPD is really the workhorse of the printing world, acting as a traffic director. When it's time for a job to go off to the printer, lpd uses the instructions found in the printcap database to decide whether or not to run the data through a print filter, and to which output device to send the filtered data. The print filter's main job is to convert your file into a format that your printer understands. A very popular filter is Ghostscript, which we'll discuss later.

LPR -- The voyage begins

We use lpr to begin a file's journey to paper. Even programs like Netscape and AbiWord print by invoking the lpr command. A look at the man page tells us that lpr can direct output to different printers, use different print filters, and accept a host of other options. Very versatile.

How does lpr decide which printer to use by default? The first thing the command looks for is the -Pname option. If none is specified, it will look for an environment variable with the name PRINTER (the LPRng version, like the one supplied with RedHat 7, will also look for LPDEST, NPRINTER, and NGPRINTER), and finally for an entry in the printcap called "lp". Whichever one it finds first is the printer it will use by default.
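
The lookup order described above can be sketched as a small shell function. This is only an illustration of the decision, not lpr's actual code: pick_printer is a made-up helper name, and the extra LPRng variables (LPDEST, NPRINTER, NGPRINTER) are left out to keep it short.

```shell
#!/bin/sh
# Sketch of lpr's default-printer decision as described in the text.
# pick_printer is a hypothetical helper, not part of any print suite.
# Usage: pick_printer [name-given-with--P]
pick_printer() {
    if [ -n "${1:-}" ]; then
        # An explicit -Pname always wins.
        printf '%s\n' "$1"
    elif [ -n "${PRINTER:-}" ]; then
        # Next in line is the PRINTER environment variable.
        printf '%s\n' "$PRINTER"
    else
        # Last resort: the printcap entry called "lp".
        printf '%s\n' "lp"
    fi
}
```

So with PRINTER set to a queue of your own, a plain lpr file goes to that queue, while lpr -Pother file still overrides it.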

LPR will then take the data and command line options that you've given it, generate a "control file", and send both files to the lpd daemon. Among the things it sends is the print filter type you've specified using the -Ffiltercode option. There are options for cifplot, tex, plot, troff, raster file formats and more, each with a corresponding entry in the printcap database specifying the path to the filter. If no filter is specified, the entire print run is sent through the standard filter. What a pain! Sooo many options! Back in the bad old days (the early '80s), configuring and maintaining a set of print filters for a medium enterprise could be a full time job. Today a popular solution is to run everything through the default filter (a "magic" filter), and let it decide what type of data you're trying to print and how to handle it.

Once the file has been sent off to lpd, lpr's work is done. That's it.

LP -- an alternate LPR

Some distros include the lp command. This is, for most purposes, the same as the lpr command, although there are some differences in command line options. It is usually included simply for historical reasons and backward compatibility, but the LPRng printing suite has revived it. I won't go into detail on the command, directing you instead to.... you guessed it.... the man page. Suffice it to say, there is some added functionality but it is mostly useless, and having to memorize two sets of command line options instead of just lpr's set never made much sense.

The LPD Daemon

First off, like any other old UNIX head, I'm going to tell you to read the man page. But you can do that later. LPD operates as a service, so it is normally started by your startup scripts. Because lpd is sitting back there in the background, it's tough to see what's going on and to pass instructions, so it communicates via TCP on port 515. If you really get curious about how to talk to it, you could always read RFC 1179. LPD is the job scheduler and print spooler program, starting and stopping print jobs, but it is also the program that manages and reports on the operation of the printing process, processes accounting information, logs print jobs and printing errors, manages the spool area, checks file permissions, and monitors the status of the attached printers. Whew! That's a lot of stuff! Luckily, aside from getting the printcap file configured right, there's nothing else to do.

Normally, lpd will scan its queue periodically looking for files. It sorts them by time of arrival and priority, and then prints them. Printing is usually a straightforward process.

LPD will keep doing this forever... check the queue, print what it finds, wait.... check the queue, print what it finds, wait.... pretty basic.

The Printcap database

The printcap database is about the toughest thing to handle. In the "modern" Linux age, we've got plenty of GUI tools to help insulate us from the horror of printcap, but sometimes there's something we want to do that the cute little point and clicks just don't know how to handle. The format of the database is fairly free-form, and the rules are simple. Whitespace (spaces, tabs, etc.) is pretty much ignored. Any line that does not begin with a ":" or a "|" starts a new printer definition. Aliases to a printer are defined immediately after the new printer name, and are separated by the "|" character. The ":" character starts and ends a database entry. Boolean values are set to TRUE just by naming the entry, and to FALSE by adding a "@" to the entry. And lastly, the "\" character, as usual, denotes a continuation of the line.
(For example, the entry :sh: sets "sh" to ON, suppressing banners, while :sh@: sets "sh" to OFF, enabling banners.)
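
To make the rules concrete, here is a rough sketch of how a script might pull the names and a boolean flag out of a printcap entry. It assumes the entry has already been joined into a single line (continuation backslashes and newlines removed), and both function names are invented for this example.

```shell
#!/bin/sh
# Sketch only: assumes ENTRY is one joined line such as "lp|MD1300:sh:mx=0:".
# printer_names and has_flag are hypothetical helper names.

# printer_names ENTRY: print the printer name and its aliases, one per line.
printer_names() {
    names=${1%%:*}                 # everything before the first ":"
    printf '%s\n' "$names" | tr '|' '\n'
}

# has_flag ENTRY NAME: succeed if boolean NAME is switched on in ENTRY.
has_flag() {
    case "$1" in
        *":$2@:"*) return 1 ;;     # "name@" means explicitly OFF
        *":$2:"*)  return 0 ;;     # a bare name means ON
        *)         return 1 ;;     # not mentioned: treated as OFF in this sketch
    esac
}
```

For instance, has_flag 'lp|MD1300:sh:mx=0:' sh succeeds, while the same call against an entry containing :sh@: fails.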

There are, unfortunately, literally dozens of possible entries in the printcap database, but in most circumstances we only have to deal with a few of them. The following example is the printcap entry for my ALPS MD-1300 MicroDry printer:
lp|MD1300:\
    :sh:ml=0:mx=0:sd=/var/spool/lpd/lp:\
    :lp=/dev/lp0:if=/usr/local/bin/my_magic_filter:
Listed below are the most common and useful printcap entries. It is likely that you will never need to use any more than these, but naturally you can always get more information and definitions by referring to the printcap manpage.

Common Printcap Entries
Entry  Value Type  Default Value  Definition
sh     Boolean     False          Suppress banners and headers
mx     Numeric     0              Maximum size of job (in K). 0=unlimited
ml     Numeric     32             Minimum size of printable file
sd     path        NULL           The spool directory to use for this printer
lp     path        NULL           The device name to which output is sent
if     path        NULL           Filter to invoke on every file sent
lf     path        "log"          Debugging and error log file.
Going back to the printing process outlined in the lpd section, we can now fill in a couple interesting points....

See how it's all starting to come together?

LPRM -- Killing a print job

LPRM is, in a real sense, why I started this tutorial in the first place. Back in June, Jesse Weigert sent an e-mail to KPLUG's user-list saying he had a problem. The answer is really pretty simple, but is often overlooked by "point-and-click sysadmins". That's not a curse word.... it's a sign that the Linux community is moving towards the mass-market, with a few faltering steps at first, but gaining speed.

LPRM signals the lpd daemon to remove a job from the print queue. You can tell it to remove the last job, any given job, or even all jobs, and you can tell it which print queue to remove them from. The examples in the man page spell out exactly how this is accomplished, so I won't go over them here. One little line in the man page says

If the job is active, the LPD server will stop printing the job and then restart printing operations.
What it fails to mention is that when stopping the printing process, there is often an active filter program (or script) in process. If that's the case, lpd will not just kill it, but will instead send it the SIGINT signal. If there is no filter or it is poorly written, this will have the same effect as killing it, but if the filter is smart, it will catch the SIGINT and exit with style and grace. Catching this signal and sending a RESET and FORM FEED to Jesse's printer is one of many possible ways to solve his problem. Another would be to use the ld printcap entry (no... you're right, we didn't talk about that one...) to ensure that the printer is RESET at the start of each print job but, depending on the printer, this might not correctly eject half-printed pages, and if we sent a FORM FEED at the start of every job to make sure, we'd waste a lot of paper. See what we've learned already?
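
Here is a minimal sketch of a filter that catches SIGINT the way the paragraph above describes. The reset sequence is an assumption: ESC E is the reset command on PCL printers, and Jesse's printer may well want something else, so check the printer manual. The function names are invented, and the exit status a suite expects from an interrupted filter varies, so treat the exit 0 as a placeholder too.

```shell
#!/bin/sh
# Sketch of an input filter that tidies up when lpd sends it SIGINT.
# ASSUMPTION: ESC E resets the printer (true for PCL; printer specific).

reset_and_eject() {
    printf '\033E'    # ESC E: printer reset (assumed, see note above)
    printf '\f'       # FORM FEED: eject any half-printed page
}

on_sigint() {
    reset_and_eject
    exit 0            # placeholder status; see your print suite's docs
}

filter_main() {
    trap on_sigint INT
    # A real filter would convert the data here; this one passes it through.
    # Note that the shell runs the trap once the foreground command returns.
    cat
}

# A real filter script would end with:  filter_main "$@"
```

Without the trap, the default action for SIGINT simply terminates the script, which is the "same effect as killing it" case.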

LPC, LPQ, LPSTAT -- Monitoring the process

We often want to see what's going on, or to manipulate the printing process... temporarily shutting down a printer while we replace an ink cartridge, for example.

The way we are supposed to do this is via the lpc command. Because many of lpc's functions are privileged (we wouldn't want just any random user to shut down our printers on a whim), the lpc command is usually located at /usr/sbin/lpc, which means that it is usually not in the normal user path, but is intended as an administrator's command. When run without command line arguments, lpc enters an interactive mode that lets you start or stop the printing process, enable or disable the print queue, set a printer's status to "up" or "down" (a combination of printing and queuing), show the status of a printer or queue, and more. The easiest way to get familiar with lpc is to enter interactive mode and play around with it. Its exact features depend on whether you are running a BSD style print suite (like the one supplied with RedHat prior to 7, SuSE, Slackware, and older Debian distros) or the newer LPRng suite. The best way to find out which commands are available in your particular version is to read your manpage and to issue the "help" command from lpc's interactive mode.

lpq is a useful command, normally available to all users, that lists the contents of the print queue, and lpstat prints the status of a particular printer and its associated queue. When installed from the LPRng suite, these programs clock in at about half a megabyte each (compare that to the measly 11K for the BSD versions found on older distros), so if disk space is at a premium, you might consider replacing them with small custom shell scripts that either redirect the request to lpc or talk to the lpd daemon directly. How to do this is left as an exercise for the student. Between the lpc manpage and the RFC, all the information you need is part of this tutorial. The most original, elegant, and functional scripts will be appended to this tutorial as examples. As a hint, try telnet localhost 515 and, when the client connects, type "Ctrl-D lp " and hit return. Take a look at RFC 1179, section 5.4 to see why this does what it does. Note: This will not work with a BSD lpd daemon, but will with LPRng. I will give a free sheet of used printer paper to the first person who posts the correct reason why to the KPLUG discussion forums.
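
For the curious, the bytes behind that telnet hint can be produced by a one-line function. This is just a sketch of the request format from RFC 1179 section 5.4 (the byte 0x04, the queue name, a space, an empty job list, and a line feed). The function name is made up, and getting the bytes to port 515 is up to you.

```shell
#!/bin/sh
# queue_state_request QUEUE
# Emit the RFC 1179 "send queue state (long)" request for QUEUE:
# byte 0x04 (what Ctrl-D types), queue name, space, empty list, LF.
# queue_state_request is a hypothetical helper name.
queue_state_request() {
    printf '\004%s \n' "$1"
}
```

If you happen to have a netcat-style tool installed (an assumption, it is not part of any print suite), something like queue_state_request lp | nc localhost 515 would send it, subject to the same LPRng-versus-BSD caveat as the telnet trick.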

Print Filters -- Where the magic happens

Print filters are useful for a variety of things. In the old days, we used the pr filter to insert page breaks and add page headers and footers to our multi-page plain-text documents. Today we use filters to convert files to a language our printer understands, like PostScript, HPGL, or PCL. We used to use a variety of filters, each one carefully interfaced and entered into the printcap database, and tried to remember arcane command line switches to convince lpr to enter the correct data into the control file, which it then sent to lpd according to RFC 1179, where it got converted on its way to the printer like we outlined above. It was like "The House That Jack Built". We still use filters, but we're smarter than we used to be. Most types of data can be identified by examining them, either by using the MIME type like HTTP and e-mail do, or by examining the contents of the file. Once the type of file is determined, the filter itself can use that information to decide what needs to be done in order to print it. For historical reasons, such "smart" filters are usually called "magic" filters. With a magic filter we only need one, so we don't pass extra command line switches to lpr and just let everything go through the default filter (the if printcap entry).

Filters are simply programs or scripts that take data and convert, or filter, it to another format. Our concern with print filters is converting incoming data into our printer's native language. The lpd program simply calls the appropriate filter, passes the data to it, and then passes the output of the filter on to the printer. A "magic" filter, in its simplest form, would operate something like this:

START
   Parse command line options
   Send standard input to a temporary file
   Determine filetype of temporary file

   If filetype is GIF data
      Then convert temporary file from GIF to Printer Language
      Print the result to standard output
   End If

   If filetype is Microsoft Word Document
      Then convert temporary file from Word to Printer Language
      Print the result to standard output
   End If

   << Handle more file types here >>

   If filetype is none of the above
      Then assume the file is already in Printer Language
      Print the temporary file to standard output
   End If

   Delete the temporary file
DONE
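
The outline above might look roughly like this as a shell script. It is a sketch under several assumptions: the file type is guessed from the first four bytes only, just PostScript and GIF are recognized (Word documents and everything else are left out), and the two converter functions are stand-ins that merely pass data through. All of the names are invented for the example.

```shell
#!/bin/sh
# Sketch of a "magic" filter. All function names here are hypothetical.

# detect_type FILE: classify FILE by its first bytes ("magic numbers").
detect_type() {
    magic=$(dd if="$1" bs=1 count=4 2>/dev/null)
    case "$magic" in
        '%!'*) echo postscript ;;   # PostScript files start with "%!"
        GIF8*) echo gif ;;          # GIF files start with "GIF87a"/"GIF89a"
        *)     echo unknown ;;
    esac
}

# Stand-in converters: replace these with real conversion programs.
ps_to_printer()  { cat; }
gif_to_printer() { cat; }

# magic_filter: read a job on stdin, write printer data on stdout.
magic_filter() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                    # standard input -> temporary file
    case "$(detect_type "$tmp")" in
        postscript) ps_to_printer  < "$tmp" ;;
        gif)        gif_to_printer < "$tmp" ;;
        *)          cat "$tmp" ;;   # assume it is already printer language
    esac
    rm -f "$tmp"                    # delete the temporary file
}
```

Many systems ship a file command that does a much more thorough job of the type guessing, so a real filter would probably lean on that instead of peeking at four bytes.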

In practice, what usually gets done is to convert everything to PostScript, then convert the PostScript data to the printer language. This useful intermediate step allows us to have lots of smaller programs doing the type-specific conversion, and only one program doing the conversion to our printer language. That way, if we change or upgrade our printer, we only have to change one program, not several. Because of the different ways Windows and UNIX handle the concept of print filtering, this very useful intermediate conversion step makes integrating Windows and Linux printing just a tad difficult, but not impossible, as we'll see later. The most popular program to convert from PostScript to various native printer languages is Ghostscript. If you choose to write your own printer filters, refer to the documentation for your print suite carefully, as LPRng and the BSD suite normally use different command line parameters.

Ghostscript -- The King of Filters

Ghostscript is quite a complicated program to explain in a few brief paragraphs, but I'll provide an overview. On the Ghostscript command line, you specify where the output should go, what the printer type is, how many DPI, what size paper you want, and a host of printer specific options. You feed it PostScript data, and it spits out output suitable to stuff down your printer's cable. That's it in a nutshell. Trying to explain how to do this for each and every possible printer just isn't possible in a short tutorial. There are excellent docs that come with it, such as the Background for new users, the Ghostscript Overview, and of course, the manpage. Ghostscript has quite a few nice uses besides as a printer filter, such as the ability to create PDF documents, so it's well worth your time to explore the documentation and experiment with more of the options than just the printer language output capabilities. Ghostscript has an interesting licensing scheme, which makes new versions available for non-profit use, and older versions available as GPL. If you have a newer model printer and find that it is unsupported by the version of Ghostscript supplied with your distro (most distros have it), you might try looking on the web for the latest and greatest.

For my DeskJet, I used the following:

gs -q -dNOPAUSE -dSAFER -sDEVICE=djet650c -r300x300 -sPAPERSIZE=letter -sOutputFile=- -
Explanation of example switches
Switch               Meaning
-q                   Quiet. Don't print extra information to stdout.
-dNOPAUSE            Sets the NOPAUSE option, which disables the pause and
                     prompt at the end of each page. Useful for piped output.
-dSAFER              Sets the SAFER option. The main use is to ensure that
                     files are opened READ-ONLY, so malicious PostScript is
                     less likely to cause you grief.
-sDEVICE=djet650c    Output device type. In this case, an HP DeskJet 660Cse,
                     but a Color Deskjet 650 was the closest I could get. It
                     works OK, but it's not perfect. We'll see what's
                     available in version 7.
-r300x300            Device resolution in DPI. In this case, 300x300. It can
                     do 600x600, but for high-quality prints I use my dye-sub
                     printer anyway, and just use the DJ660 for drafts.
-sPAPERSIZE=letter   Paper size. Most named sizes are legal arguments.
-sOutputFile=-       Where to direct output. In this case, "-", which means
                     stdout.
-                    Input file. In this case, "-", which means stdin.
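
Since that command reads stdin and writes stdout, it drops straight into a filter script. Below is a small sketch that wraps it in a function so the device, resolution, and paper size can be swapped when the printer changes. The function name is invented, and the device name is simply the one from the example above. Running gs -h lists the devices your own build supports.

```shell
#!/bin/sh
# gs_filter DEVICE RESOLUTION PAPERSIZE
# Reads PostScript on stdin and writes printer-ready data on stdout,
# using the same switches as the example command above.
# gs_filter is a hypothetical helper name.
gs_filter() {
    gs -q -dNOPAUSE -dSAFER \
       -sDEVICE="$1" -r"$2" -sPAPERSIZE="$3" \
       -sOutputFile=- -
}

# Example call, matching the command line above:
#   gs_filter djet650c 300x300 letter < file.ps
```

A script ending in a call like that is the sort of thing the if printcap entry would point at, once the magic filter has turned the job into PostScript.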

Printing from Windows -- Linux as Server

Windows and UNIX have different ideas about how print filtering should be done. Windows believes that the computer that requests the print job should be responsible for filtering and formatting it, while UNIX believes that the computer to which the printer is attached should handle all of these chores (why call it a print server unless it serves?). Each approach has advantages and disadvantages. Windows offloads processing chores from the server to the client, at the cost of higher network bandwidth requirements and increased administrative overhead (each client must have a proper set of drivers and printer information installed for each printer it wishes to access), while UNIX has the disadvantage of increased server processing load.

What this means to the Linux print server is that when it receives a print job from a Windows box, the print job is already formatted and ready to run down the printer cable; no translation to the printer's native language is left to be done. This can be a difficult condition for the print filters to detect, and they might try to format already formatted data, with unpredictable results. Fortunately, there are several workable strategies. The first is to install UNIX printing services on all Windows boxes that will access the Linux print server, and configure each Windows client to treat the Linux printer as a pure PostScript device, accessed via TCP/IP. This is easy for your normal filter to recognize and deal with, allows you to use a single printcap entry for both Linux and Windows, and does not require the SAMBA suite. Details of the configuration process differ from version to version of Windows. The only configuration needed on the Linux server is to add each host to /etc/hosts.lpd.

With SAMBA, other options become available. Prior to the 2.2 release of Samba, only the LanMan printing mechanism was supported, but native Windows NT printing is now implemented via MS-RPC, which brings features such as downloading printer drivers on demand, support for the Add Printer Wizard, and NT Access Control Lists. While advanced Samba configuration to take advantage of all these options is not what this tutorial is about, it is well worth looking at the latest smb.conf manpage and the SAMBA Project Documentation. For those with a working SAMBA configuration who just want to get print sharing working, the simplest configuration that will work is to ensure the following is in your smb.conf file:

[printers]
        path=/var/spool/public
        guest ok = yes
        printable = yes
Typically, the path you use should be world-writable with the sticky bit set. (Use chmod 1777 to set these permissions.) Since we've already discussed the problems you can encounter with print filtering, one way around them is to set up a printcap record with no filter that points to the same printer, configure your Windows clients with the proper print drivers, and have your Windows clients print to the unfiltered queue. With many newer printers not fully supported by Ghostscript, that can sometimes produce superior output quality. You can also configure your Windows clients to simply print PostScript and let your magic filter handle the necessary conversion.

Or you can get tricky, and set up a set of printcap entries that directs your Linux boxes to one print queue, which filters the data and forwards it to a second, raw (or unfiltered) printer, and use the raw printer for Windows. Examples of how to accomplish this are given in Section 11.6 of the Printing HOWTO, though you will have to rework a few of the entries to accommodate incoming rather than outgoing print requests. (This is left as an exercise for the student, as they say.) The advantage of this admittedly complicated setup is that it avoids "race conditions", where two separate print queues compete for a single printer port, so it is generally the preferred method for perfectionists, although most people find that their print services handle race conditions with at least a modicum of grace. It becomes essential in high-volume environments, where experience has shown that print jobs can wind up in the bit bucket instead of getting printed out. Knowing how to accomplish this task sets the pros apart from the not-so-pros. Give it a try, it's an interesting exercise.
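
As a starting point for the simpler of these options, here is a sketch of two helpers: one creates the world-writable, sticky spool directory, and the other prints a filterless printcap record that you could review and then append to your printcap. Both function names, the queue name "raw", and the spool path in the example call are made up. The entry layout just copies the MD-1300 example from earlier, minus the if entry.

```shell
#!/bin/sh
# Sketch only; function names and example values are hypothetical.

# make_public_spool DIR: create DIR world-writable with the sticky bit.
make_public_spool() {
    mkdir -p "$1" && chmod 1777 "$1"
}

# raw_queue_entry NAME DEVICE SPOOLDIR: print an unfiltered printcap
# record (no "if" entry) for the given device on standard output.
raw_queue_entry() {
    printf '%s:\\\n' "$1"
    printf '    :sh:mx=0:sd=%s:\\\n' "$3"
    printf '    :lp=%s:\n' "$2"
}

# Example (inspect the output before adding it to your printcap):
#   raw_queue_entry raw /dev/lp0 /var/spool/lpd/raw
```

Remember that the raw queue needs its own spool directory, and that pointing two queues at one device is exactly the race-condition setup the text warns about.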

Conclusion

While this tutorial is by no means exhaustive, and may raise many more questions than it answers, it provides the basic tools needed to understand what happens during the printing process. Whether you are using BSD style lpd, LPRng, CUPS, or PDQ, most of the essentials remain the same, and for most people the tools provided with the distribution will work just fine. For those cases where problems arise, you can use this tutorial to get you started on solving your problem. Most questions you have can be answered by referring to the copious amounts of other people's fine documentation accompanying this tutorial, by simple experimentation, or by re-reading this tutorial, but feel free to bring any questions you have to KPLUG meetings, preferably accompanied by the computer(s) and printer(s) you're having trouble with, or feel free to donate exact duplicates of your home network to me so I can duplicate your problems, diagnose them, and correct them, in the comfort of my own home. For problems with business networks, standard rates apply ;-)