Glindra
Command Line File Handling and ASCII Tools

charmap Character Maps

Introduction

The filter command allows us to map 8-bit characters into arbitrary other 8-bit characters. It can also remove certain characters entirely.

To specify which characters should be mapped to what, we create a file that contains a character map (a charmap file), and give that file to the filter command.

Example
The command
> filter in.dat -output=out.dat -map=mymap.charmap
will apply the character mappings specified in the file mymap.charmap to the file in.dat, and write the result to the file out.dat.

The rest of this page describes the syntax for charmap files.


charmap Syntax

Example
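Suppose we want to do the following transformations on an input file: map curly brackets to parentheses and hash marks to dollar signs, convert lower case a-z to upper case, map all printing characters over octal 177 to question marks, and remove all control characters. We can achieve this with the filter command above, and the following charmap parameter file: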
# Example: mymap.charmap

# Map curly brackets to parentheses, and hash marks to dollar signs
"{}#" -> "()$"

# Convert lower case a-z to upper case
"abcdefghijklmnopqrstuvwxyz" ->
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Map all printing characters over octal 177 to question marks.
" ¡¢£¤¥¦§¨©ª«¬-®¯°±²³´µ¶·¸¹º»¼½¾¿" -> "?"   # non-breaking space, soft hyphen
"ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß" -> "?"
"àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ" -> "?"

# Remove all control characters
'\000\001\002\003\004\005\006\a\010\t\n\v\f\r\016\017\020' -> ""
'\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\177' -> ""
'\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217' -> ""
'\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237' -> ""


Comments

The first line in the example is a comment.
# Example: mymap.charmap
Comments start with a # character that is not inside a quoted string, and continue to the end of the line. They do not have to be the first thing that appears on the line, so this is okay as well:
" ¡¢£¤¥¦§¨©ª«¬-®¯°±²³´µ¶·¸¹º»¼½¾¿" -> "?"   # non-breaking space, soft hyphen

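The comment rule can be modeled with a small Python sketch. This is only an illustration of the rule as stated above, not Glindra's parser: the function name strip_comment is hypothetical, and treating a backslash inside a quoted string as an escape for the next character is an assumption, since the exact rules belong to Quoted String Syntax.

```python
def strip_comment(line: str) -> str:
    """Drop a trailing # comment unless the # sits inside a quoted string."""
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is not None:
            if ch == "\\":
                # Assumption: a backslash escapes the next character.
                i += 2
                continue
            if ch == quote:
                quote = None          # closing quote found
        elif ch in ('"', "'"):
            quote = ch                # opening quote found
        elif ch == "#":
            return line[:i].rstrip()  # comment starts here
        i += 1
    return line.rstrip()


# The # inside the first string survives; the trailing comment is dropped.
print(strip_comment('"{}#" -> "()$"   # brackets and hash'))
```

A line that consists only of a comment, such as the first line of the example file, reduces to an empty string in this model.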
Mappings

The first mapping in the example is the line
"{}#" -> "()$"
The two strings on either side of the -> token are the same length. The first character in the left hand string is mapped to the first character of the right hand string, the second to the second, etc.
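The position-by-position rule behaves much like a byte translation table. As a rough model (not the way filter itself is implemented), Python's standard bytes.maketrans and bytes.translate can reproduce it; the helper name build_table is hypothetical.

```python
def build_table(source: bytes, target: bytes) -> bytes:
    """Return a 256-entry table that maps source[i] to target[i]."""
    if len(source) != len(target):
        raise ValueError("both strings must have the same length")
    return bytes.maketrans(source, target)


# Model of the mapping  "{}#" -> "()$"
table = build_table(b"{}#", b"()$")
result = b"f{x} # note".translate(table)
print(result)  # b'f(x) $ note'
```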

The strings can be enclosed in either single or double quotes. See Quoted String Syntax for further details.

Free flowing format
Line breaks can be inserted wherever whitespace is allowed. This allows us to split a mapping across two lines when that improves readability, as in:
"abcdefghijklmnopqrstuvwxyz" ->
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Single target character
If the left hand string contains several characters that should all be mapped to the same target character, we can give a one-character right hand string. The following line says that each of the characters in the left hand string should be mapped to a question mark.
"ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß" -> "?"
Empty target character
To specify that one or more characters should be removed completely, we give an empty right hand string. Example:
'\000\001\002\003\004\005\006\a\010\t\n\v\f\r\016\017\020' -> ""
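The two short forms of the right hand string, a single character and an empty string, can be modeled together in a few lines of Python. This is a sketch under assumptions, not Glindra's code: the function name apply_rule is hypothetical, and the 8-bit input is assumed to be ISO 8859-1.

```python
def apply_rule(data: bytes, source: bytes, target: bytes) -> bytes:
    """Apply one charmap-style rule to data (illustrative model only)."""
    if target == b"":
        # Empty right hand string: delete every source byte.
        return data.translate(None, source)
    if len(target) == 1:
        # One-character right hand string: every source byte maps to it.
        target = target * len(source)
    return data.translate(bytes.maketrans(source, target))


control = bytes(range(0o000, 0o021))       # \000 through \020
upper_latin1 = bytes(range(0o300, 0o340))  # À through ß in ISO 8859-1 (assumed)

print(apply_rule(b"a\tb\x00c", control, b""))      # b'abc'
print(apply_rule(b"caf\xc9", upper_latin1, b"?"))  # b'caf?'
```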
The '\n' character cannot be mapped with filter
Although the previous mapping states that the newline character '\n' should be removed, this will not actually happen when you run the filter command. The reason is that filter reads the input file one line at a time, and applies the charmap to each individual line. This means that it will never see any '\n' character in the input, and therefore will never apply the mapping.
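A small Python model makes the effect visible. It assumes that the line terminator is set aside before the mapping is applied and written back afterwards; the text above only states that filter works line by line and never sees '\n', so the details here (and the name filter_lines) are illustrative guesses.

```python
import io


def filter_lines(stream, table: bytes, delete: bytes = b"") -> bytes:
    """Apply a translation to each line body, leaving line endings alone."""
    out = []
    for line in stream:                # one line per iteration, terminator included
        body = line.rstrip(b"\n")      # the mapping only ever sees the line body
        ending = line[len(body):]      # the terminator is kept aside, untouched
        out.append(body.translate(table, delete) + ending)
    return b"".join(out)


identity = bytes(range(256))
# Ask for tabs and newlines to be removed; only the tab actually disappears.
result = filter_lines(io.BytesIO(b"a\tb\nc\n"), identity, delete=b"\t\n")
print(result)  # b'ab\nc\n'
```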
"Conflicting" mappings
If a charmap specifies different mappings for the same source character, the last specification overrides any previous ones.
Example
"abcdefghijklmnopqrstuvwxyz" ->
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"i" -> "i"
This converts all letters of the English alphabet to upper case except "i", which is left in lower case.
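One way to picture the "last specification wins" rule is a table that is filled in rule by rule, so that later rules simply overwrite earlier entries. The Python sketch below models that behavior; compile_charmap is a hypothetical name, and nothing here is taken from Glindra's source.

```python
import string


def compile_charmap(rules) -> bytes:
    """Build a 256-entry table from (source, target) pairs, in file order."""
    table = bytearray(range(256))      # start with every byte mapped to itself
    for source, target in rules:
        for s, t in zip(source, target):
            table[s] = t               # a later rule overwrites an earlier one
    return bytes(table)


rules = [
    (string.ascii_lowercase.encode(), string.ascii_uppercase.encode()),
    (b"i", b"i"),                      # overrides the i -> I entry set above
]
table = compile_charmap(rules)
print(b"glindra filter".translate(table))  # b'GLiNDRA FiLTER'
```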
Default mapping
Characters that do not appear in any of the left hand strings are left unchanged.

This means that an empty file counts as a valid charmap that leaves all characters as they were.
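In terms of a 256-entry translation table, the default is the identity table. A minimal Python check of that idea, as a model only:

```python
# The table an empty charmap corresponds to in this model: byte n maps to byte n.
identity = bytes(range(256))

every_byte = bytes(range(256))
unchanged = every_byte.translate(identity)
print(unchanged == every_byte)  # True
```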