Montana State University

About File Formats

File Format Basics

As we move further along with the transition from paper-based media to electronic media, the sharing of electronic resources becomes increasingly important. Unfortunately, there is no single standard for electronic files that guarantees they can be shared. In fact, the number of different electronic formats is quite large. In many cases this is because different file types must contain kinds of information not supported by other formats. In some cases it is a matter of software companies seeking competitive advantage. For users needing to share electronic files it can sometimes be a challenge to move them "inter-application," or between different software, and "cross-platform," from a PC to a Mac for instance. A newer challenge is moving from proprietary formats to the Web, which is platform independent through the use of HTML.

Often the request to "just send me an electronic file" can be almost meaningless. There are numerous kinds of electronic files that the requestor may not even be able to open. This is particularly true in the graphics and print production field, where highly specialized software applications may require proprietary formats to support features not available in common applications. The classic example of this is page layout software. QuarkXpress is the industry standard for building and producing publications. But its feature set is so specialized that there is no way to save a Quark file and send it to someone so that they can open it with Microsoft Word. Another common example is the difference between Corel WordPerfect and Microsoft Word formats. Although each tries to be somewhat compatible with the other through a conversion process, there are still enough differences to make conversion unreliable.

More elaborate ways of achieving file exchange are often required. These usually involve trade-offs and limitations. There are stand-alone file translation utilities that can be very useful if you exchange files with users who have different applications or platforms. For most text and a few graphic conversions these include software such as DataViz MacLink Plus for Macintosh users and DataViz Conversion Plus for PC users. For graphical formats, Adobe Photoshop has the ability to read and convert a wide variety of raster formats on either platform. Vector graphic format conversions tend to be more problematic due to the complexity of the formats. For PC users one solution is IMSI HiJaak Pro.

Fortunately, there are some common formats that are supported by most of the major software applications. We'll discuss the categories related to graphics and text below. These all enable some degree of cross-platform or inter-application compatibility and are the most useful for our purposes.

Graphic Formats

Graphic files can be created and saved using two completely different methods. These are called "vector" and "raster" and may exist either singly or together as part of the same file format.

Vector is a method where graphics are created and stored as "objects" using coordinate geometry. Each object has attributes that govern how it will display. Vector images are usually composed of numerous objects that collectively portray the overall image.

One advantage of vector graphics is that they exhibit something called "device independent resolution." This simply means that they will always display or print at the highest-possible resolution since the image is transmitted mathematically to whatever display device is used. It also means that they can be scaled and modified in ways that will always preserve the highest image quality. Fonts are one example of vector graphics. Both Postscript Type 1 and TrueType fonts use vectors, also called outlines, to achieve the best possible display, at any point-size, to any printer. Adobe Illustrator is an example of a vector-based software application. There are numerous others and these are often referred to as "drawing" applications -- as opposed to raster software which is generally referred to as "painting" applications.
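
As a rough illustration of device independent resolution, the Python sketch below stores a shape as coordinates and converts it to pixels only at output time. The names Polygon, scale and to_device_pixels, and the 72 and 600 dpi figures, are assumptions chosen for the example and do not come from any particular drawing application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Polygon:
    """A vector object: corner points in inches plus one display attribute."""
    points: tuple  # ((x, y), ...) measured in inches
    fill: str


def scale(shape, factor):
    """Resize the object by multiplying its coordinates; no detail is lost."""
    scaled = tuple((x * factor, y * factor) for x, y in shape.points)
    return Polygon(scaled, shape.fill)


def to_device_pixels(shape, dpi):
    """Map the geometry onto a device's pixel grid at that device's resolution."""
    return [(round(x * dpi), round(y * dpi)) for x, y in shape.points]


triangle = Polygon(((0.0, 0.0), (1.0, 0.0), (0.5, 1.0)), fill="black")

screen = to_device_pixels(triangle, 72)               # a 72 dpi display
printer = to_device_pixels(triangle, 600)             # a 600 dpi printer
doubled = to_device_pixels(scale(triangle, 2), 600)   # twice the size, same printer
```

The same three points produce a 72-pixel-wide triangle on the display and a 600-pixel-wide triangle on the printer, and doubling the size simply yields new coordinates rather than stretched pixels.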

Raster graphics are created and stored using "pixels" to describe the image. Like the pieces of a jigsaw puzzle, pixels are the individual bits, or dots, that collectively make up the larger image. Each pixel has characteristics that affect how the image will look. These characteristics determine things such as whether the image is color, grayscale or line-art, and how much resolution it has. Pixels are how scanners, computer monitors and digital cameras record and display information. Ultimately, even vector graphics get converted to raster images when they are viewed on screen or printed. This process is called "rasterization." The device independence of vectors ensures that this will always happen at the highest resolution the output device supports.

One strong advantage of raster graphics is that they are much better suited to depicting photographic or highly detailed images. A significant disadvantage is that raster images are much more dependent on resolution, which is tied to their physical size and is often expressed in dots per inch (dpi), and they generally cannot be enlarged without diminishing their quality. Another disadvantage is that very large file sizes are required for high-resolution applications such as printing. Adobe Photoshop is an example of a raster-based software application.
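
The arithmetic behind both disadvantages can be sketched in a few lines of Python. The function names and the 8 x 10 inch, 24-bit RGB example are assumptions made for illustration, and the byte counts are for uncompressed pixel data only.

```python
def raster_size(width_in, height_in, dpi, bytes_per_pixel):
    """Return (pixel width, pixel height, uncompressed bytes) for a print size."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


def effective_dpi(pixels, printed_inches):
    """Resolution that results when a fixed number of pixels is printed larger or smaller."""
    return pixels / printed_inches


# An 8 x 10 inch photograph in 24-bit RGB (3 bytes per pixel).
screen = raster_size(8, 10, 72, 3)    # (576, 720, 1244160)
press = raster_size(8, 10, 300, 3)    # (2400, 3000, 21600000)

# Printing the 2400-pixel-wide image at 16 inches instead of 8 halves its resolution.
enlarged = effective_dpi(2400, 16)    # 150.0
```

At 300 dpi the same photograph needs about 17 times as many bytes as at 72 dpi, and enlarging it on paper spreads the existing pixels over more inches instead of adding detail.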

A note about extensions: file extensions such as the old DOS-style suffixes are not required for Macintosh users but are still very helpful for PC users. It is highly recommended that all users apply this convention in order to aid file compatibility.

Postscript (PS or .ps)
Adobe Postscript is a page description, or graphic imaging, language. It is the primary technology that enabled and advanced the desktop publishing revolution. A Postscript file is basically an ASCII text file containing program code that instructs a graphic interpreter, such as a Postscript-enabled printer, how to create an image. Whether it's a page of type, a vector drawing, a raster image, or any combination thereof, it can be described by Postscript. Very few applications save files in pure Postscript format, but many applications include Postscript code as part of their file information. In many applications Postscript code is created on the fly by the application when needed for printing. It is also useful as a print-to-disk file that can be downloaded to printers or used to distill PDF files. Only a few applications can "parse," or create images directly from, Postscript files. Other applications will simply show the code, or do nothing.
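
To show what "an ASCII text file that contains program code" looks like in practice, here is a minimal Python sketch that writes a one-page Postscript program as plain text. The helper names ps_escape and postscript_page, the page layout and the sample caption are all made up for this example; only the Postscript operators themselves (moveto, rlineto, fill, show and so on) are standard.

```python
def ps_escape(text):
    """Escape the characters that are special inside a Postscript string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def postscript_page(caption):
    """Build a tiny Postscript program: one gray rectangle and one line of type."""
    lines = [
        "%!PS",                          # comment that marks the file as Postscript
        "newpath",
        "72 72 moveto",                  # units are points, 72 to the inch
        "216 0 rlineto",
        "0 144 rlineto",
        "-216 0 rlineto",
        "closepath",
        "0.5 setgray",
        "fill",                          # a vector object: a filled rectangle
        "0 setgray",
        "/Helvetica findfont 24 scalefont setfont",
        "72 250 moveto",
        "(" + ps_escape(caption) + ") show",
        "showpage",
    ]
    return "\n".join(lines) + "\n"


page = postscript_page("File formats (draft)")
```

Opening the resulting text in a word processor would show only these lines of code; sending it to a Postscript interpreter would produce the rectangle and the caption.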

Encapsulated Postscript (EPS or .eps)
You cannot generally view a Postscript formatted image -- it is a text file. However, when importing Postscript files into other applications, it is always useful to be able to see a representation of the image. Encapsulated Postscript is a variation of Postscript that also contains a preview image. This is by far the more common and useful version of Postscript and can be used by many applications as an exchangeable format. It is a highly recommended vector format.

Tagged Image File Format (TIFF or .tif)
TIFF is a high-resolution raster image file format that supports RGB, CMYK, grayscale and bitmap images. It is a very widely used format that is supported by most graphic applications and is ideal for cross-platform and inter-application exchange. It is also a highly favored format for photographs to be printed. TIFF does not use file compression in the base standard. However, it does work well with certain types of lossless compression schemes such as LZW. When in doubt, make it a TIFF.
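
To illustrate what lossless compression means, the sketch below is a textbook LZW coder in Python. It is an assumption-laden teaching version: the LZW variant actually used inside TIFF files also packs the codes into variable-width bit fields and adds special clear and end-of-information codes, all of which are omitted here. The function names and the sample scanline are invented for the example.

```python
def lzw_compress(data):
    """Turn bytes into a list of integer codes using a growing phrase table."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate              # keep extending a known phrase
        else:
            codes.append(table[current])     # emit the longest known phrase
            table[candidate] = len(table)    # remember the new, longer phrase
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes by regrowing the same phrase table."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = previous + previous[:1]  # phrase defined by the step in progress
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


# One scanline of flat color: 200 white pixels followed by 200 black pixels.
scanline = bytes([255] * 200 + [0] * 200)
codes = lzw_compress(scanline)
restored = lzw_decompress(codes)
```

The list of codes is far shorter than the 400 input bytes, and decompressing it returns every byte exactly as it was, which is the property that makes this kind of scheme safe for print images.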

Joint Photographic Experts Group (JPEG or .jpg -- also known as JFIF)
JPEG is a very popular and useful format for raster images on the Internet. It is platform independent and can be exchanged among different computers. It supports both RGB and CMYK color modes and uses compression to minimize file sizes. The level of compression is configurable and can be optimized to maintain better image quality or smaller file sizes. In some cases it may be used for images intended for print. However, this must be done carefully since JPEG's compression is lossy, meaning it physically alters the image permanently by discarding information. This is more critical for print images. Moreover, continual re-saving in the JPEG format can compound the degradation. As a general rule, JPEG is best suited for Internet use.

Graphics Interchange Format (GIF or .gif)
This is the most popular format for raster images on the Internet. It is platform independent and can be exchanged among different computers. It only supports RGB color mode and must index colors to a color table. Consequently, although it does not support the highest image quality, it is an extremely efficient compressed format and displays flat color especially well. Compression is somewhat configurable by modifying the depth of the color table. In no case may the table contain more than 256 colors. It is not a useful format for working in print and is best suited exclusively to the Web.
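
The idea of indexing colors to a table can be sketched as follows. This is only an illustration in Python, with a made-up function name and sample image; a real GIF encoder would typically reduce an image with too many colors to 256 rather than refuse it, and would then compress the resulting indexes.

```python
def index_colors(pixels, max_colors=256):
    """Replace each RGB pixel with a small index into a color table."""
    palette = []   # the color table: one RGB entry per distinct color
    lookup = {}    # RGB value -> position in the table
    indices = []   # the image itself, stored as table positions
    for rgb in pixels:
        if rgb not in lookup:
            if len(palette) >= max_colors:
                raise ValueError("image needs more than %d colors" % max_colors)
            lookup[rgb] = len(palette)
            palette.append(rgb)
        indices.append(lookup[rgb])
    return palette, indices


RED, WHITE = (255, 0, 0), (255, 255, 255)
flat_logo = [RED] * 6 + [WHITE] * 10     # 16 pixels of flat color

palette, indices = index_colors(flat_logo)
# palette holds 2 colors; indices is six 0s followed by ten 1s
```

Sixteen three-byte pixels become a two-entry table plus sixteen small numbers, which is why flat color suits the format so well, while a photograph with thousands of distinct colors does not fit the table at all.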

Device Independent Bitmap (BMP or .bmp -- also known as DIB)
This is the standard Windows format for raster graphics. It is not a preferred production format since it does not support CMYK color. But due to the prevalence of Windows it is a format that is frequently encountered. Professional graphic producers will typically convert these files to other formats, such as TIFF, for final production. BMP does support RGB, grayscale and bitmap modes.

Portable Document Format (PDF or .pdf)
This is a hybrid format based on Postscript that can contain both raster and vector information. It provides highly compatible exchange of all kinds of documents across virtually all platforms. It is based on Adobe's Acrobat system of tools. In order to view PDFs the user must have the Adobe Acrobat Reader and associated operating system files installed. The reader and its Web browser plug-ins are freeware and are available from the Adobe Web site. To create PDFs a special printing driver or separate conversion utility is required. This software is called Acrobat Exchange or Acrobat Distiller and is only available for purchase.

PDF is an extremely useful format for what it does, but because it has to do so many things, it does have limitations. One of these is a limited ability to edit or modify files once created -- a very important aspect of print production workflow. It is a highly configurable format that can provide anything from low to high compression depending on the fidelity required for the exchange. Acrobat uses a font substitution scheme that may allow for some variation in appearance for the sake of file size. When working in professional print production, high fidelity requirements generally result in large file sizes. It is rapidly becoming a new preferred standard for dealing with professional pre-press work. It is not a recommended format when you need someone to be able to edit your file.

Photoshop Document (PSD or .psd)
Although the Photoshop format is proprietary, Photoshop's prevalence as a standard image-editing application makes it an extremely useful format for exchange. The native PSD format is cross-platform and can be used by both Macs and PCs. The PSD format is primarily a raster format that provides extended support for features such as layering, alpha channels, true spot color, vector type and paths, and many others. Converting a PSD to another raster format will "flatten" or disable many of these features. It is a highly recommended format where professional image editing may be required.
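
When a file arrives without an extension, as can happen with Macintosh files, the graphic formats above can usually still be told apart by their opening bytes. The Python sketch below checks the well-known signatures for each format; the function name sniff_format is an assumption for this example, and the check is deliberately shallow, so a match suggests a format but does not prove the file is valid.

```python
def sniff_format(data):
    """Guess a graphic file's format from the first bytes of its contents."""
    if data.startswith((b"II*\x00", b"MM\x00*")):
        return "TIFF"                    # little-endian or big-endian byte order
    if data.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "GIF"
    if data.startswith(b"BM"):
        return "BMP"
    if data.startswith(b"%PDF-"):
        return "PDF"
    if data.startswith(b"8BPS"):
        return "PSD"
    if data.startswith(b"\xc5\xd0\xd3\xc6"):
        return "EPS"                     # binary EPS wrapper holding a preview image
    if data.startswith(b"%!"):
        first_line = data[:128].splitlines()[0]
        return "EPS" if b"EPSF" in first_line else "Postscript"
    return "unknown"


guess_eps = sniff_format(b"%!PS-Adobe-3.0 EPSF-3.0\r%%BoundingBox: 0 0 72 72\r")
guess_gif = sniff_format(b"GIF89a" + bytes(10))
guess_txt = sniff_format(b"Dear colleague,")
```

In use, the bytes would come from reading the start of a file opened in binary mode; the literal byte strings above simply stand in for that.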

Text Formats

Word Document (DOC or .doc)
Due to the prevalence of Microsoft Word, its document format has become a de facto standard for text exchange. This is a cross-platform format that can easily be shared between Mac and PC users. Some consideration must be given to font selection, since fonts are not transported with the file and the document may not look the same for users who do not have the same fonts. Most print production shops will translate it into other formats for final production.

Plain Text (.txt)
Plain text is a bare-bones, text-only format that virtually every word processing application supports. It uses ASCII encoding for text characters and does not support any internal formatting for font selection or styles. It is a workhorse for text file exchange. When in doubt, make it Plain Text.

There are also a couple of variations of Plain Text that can be useful when dealing with certain spreadsheet or database files. These include Tab-delimited Text and Comma-delimited Text. Both of these are still basic Plain Text files.
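
As a small illustration, the Python sketch below writes the same table both ways using the standard csv module. The sample rows are invented for the example. Note how the comma-delimited version has to put quotation marks around a value that itself contains a comma, while the tab-delimited version does not.

```python
import csv
import io

# Made-up sample data: a header row and one record.
rows = [
    ["Name", "Department", "Office"],
    ["Smith, Jane", "Publications", "Room 12"],
]


def to_delimited(table, delimiter):
    """Return the table as delimited Plain Text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(table)
    return buffer.getvalue()


tab_text = to_delimited(rows, "\t")
comma_text = to_delimited(rows, ",")

# Reading the comma-delimited text back recovers the original rows.
round_trip = list(csv.reader(io.StringIO(comma_text)))
```

Either result can be saved with a .txt extension and opened by most spreadsheet or database applications, which is what makes these variations useful for exchange.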

Rich Text Format (RTF or .rtf)
RTF is a more robust ASCII text format that also supports some internal text styles and formatting, such as color, bold, italic, tabs, etc. RTF is not as widely supported as Plain Text.
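
The sketch below, in Python, builds a very small RTF document to show how the styling is carried by ASCII control words mixed in with the text. The helper names rtf_escape and rtf_document, the font choice and the sample words are assumptions for illustration, and the result is a bare minimum rather than what a word processor would actually write.

```python
def rtf_escape(text):
    """Escape the three characters RTF reserves for its own markup."""
    return "".join("\\" + ch if ch in "\\{}" else ch for ch in text)


def rtf_document(plain, bold, italic):
    """Return a minimal RTF file: some plain text, then bold, then italic."""
    return (
        "{\\rtf1\\ansi\\deff0"                      # header: RTF version 1, default font 0
        "{\\fonttbl{\\f0 Times New Roman;}}"        # font table with a single font
        "\\f0\\fs24 " + rtf_escape(plain) +         # \fs24 is 24 half-points, i.e. 12 point
        " {\\b " + rtf_escape(bold) + "}"           # a group limits how far \b applies
        " {\\i " + rtf_escape(italic) + "}"
        "}"
    )


doc = rtf_document("Proofs are", "due Friday", "without exception.")
```

Because the whole file is ASCII, it survives the same kinds of transfer that Plain Text does; an application that does not understand RTF will simply display the control words along with the text.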

WordPerfect (WPD or .wpd)
This is a proprietary word processing format. Due to limited compatibility features between Microsoft Word and WordPerfect, it is still somewhat viable as an exchange format. Most print production shops do not use WordPerfect and will translate it into other formats for final production.

Portable Document Format (PDF or .pdf)
PDF is also a viable text exchange format. For more information, see the PDF description in the above graphic section.

Hypertext Markup Language (HTML or .html -- also .htm)
HTML is the file format for Web pages. It is essentially a Plain Text format with special coded tags that control the display of text and graphics on a Web page. The key to HTML is the tags and understanding how to use them. There are numerous resources for this available on the Web. HTML files can be created manually using text editors or automatically by applications that create Web pages. Making sense of raw HTML can be difficult for non-programmers, though the plain text can be viewed in numerous applications, including the Web browsers themselves. By far the most common use for these files is for Web browsers to parse into formatted pages. Except for a few specialized cases, this is not a very useful format for file exchange.
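
To make the relationship between tags and text concrete, the Python sketch below uses the standard html.parser module to separate the two in a short page. The class name TextExtractor and the sample page are made up for this example, and real Web pages are considerably messier.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the tag names and the readable text of a page separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


page = (
    "<html><body><h1>About File Formats</h1>"
    "<p>HTML is <b>plain text</b> with tags.</p></body></html>"
)

parser = TextExtractor()
parser.feed(page)
parser.close()

tags_seen = parser.tags
plain_text = " ".join(parser.text)
```

The page itself is nothing more than a Plain Text string; the tags tell a browser how to display it, and stripping them away leaves the words a reader would actually see.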

Written August 2001.