Basic String Operations

part of Tcl for Web Nerds by Hal Abelson, Philip Greenspun, and Lydia Sandon; updated July 2011
If your program receives data from a Web client, it comes in as a string. If your program sends an HTML page back to a Web client, it goes out as a string. This puts the string data type at the heart of Web page development:
set whole_page "some stuff for the top of the page\n\n"
append whole_page "some stuff for the middle of the page\n\n"
append whole_page "some stuff for the bottom of the page\n\n"
# done composing the page, let's write it back to the user
ns_return 200 text/html $whole_page
If you're processing data from the user, typically entered into an HTML form, you'll be using a rich variety of built-in string-handling procedures. Suppose that a user is registering at your site with the form variables first_names, last_name, email, password. Here's how we might build up a list of exceptions (using the Tcl lappend command, described in the chapter on lists):
# compare the first_names value to the empty string
if { [string compare $first_names ""] == 0 } {
    lappend exception_list "You forgot to type your first name"
}

# see if their email address has the form
#   something at-sign something
if { ![regexp {.+@.+} $email] } {
    lappend exception_list "Your email address doesn't look valid."
}

if { [string length $password] > 20 } {
    lappend exception_list "The password you selected is too long."
}
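The checks above can be gathered into one procedure, which also makes them easy to try from a plain tclsh. This is a minimal sketch, not the toolkit's code: the procedure name registration_exceptions, the last-name check, and the max_password_length parameter are assumptions made for illustration, and the eq operator assumes Tcl 8.4 or later.

```tcl
# Hypothetical helper: return a list of validation problems for the
# registration form described in the text (empty list means no problems).
proc registration_exceptions {first_names last_name email password {max_password_length 20}} {
    # start with an empty list so callers can always use llength on the result
    set exception_list [list]

    # trim first so that a value made only of spaces counts as empty
    if { [string trim $first_names] eq "" } {
        lappend exception_list "You forgot to type your first name"
    }
    if { [string trim $last_name] eq "" } {
        lappend exception_list "You forgot to type your last name"
    }
    # same "something at-sign something" test as in the text
    if { ![regexp {.+@.+} $email] } {
        lappend exception_list "Your email address doesn't look valid."
    }
    if { [string length $password] > $max_password_length } {
        lappend exception_list "The password you selected is too long."
    }
    return $exception_list
}

# made-up input: missing first name, email without an at-sign
set problems [registration_exceptions "" "Greenspun" "user-at-example.com" "secret"]
puts "[llength $problems] problem(s) found"
foreach problem $problems {
    puts "  $problem"
}
```

Initializing the list inside the procedure matters: lappend creates the variable on first use, so a list that never received an exception would otherwise not exist when you later ask for its length.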
If there aren't any exceptions, we have to get these data ready for insertion into the database:
# remove whitespace from ends of input (if any)
set last_name_trimmed [string trim $last_name]

# escape any single quotes with an extra one (since the SQL
# string literal quoting system uses single quotes)
regsub -all ' $last_name_trimmed '' last_name_final

set sql_insert "insert into users (..., last_name, ...) 
values 
(..., '$last_name_final', ...)"
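If you need the same quote doubling in several places, it can be wrapped in a small procedure. The sketch below uses string map instead of regsub, which does the same job for a fixed one-character substitution; the name sql_quote_escape is hypothetical.

```tcl
# Hypothetical helper: double every single quote so the value can sit
# inside a SQL string literal delimited by single quotes.
proc sql_quote_escape {value} {
    # the map list has two elements: the text to find (') and its replacement ('')
    return [string map {' ''} $value]
}

set last_name "  O'Grady "
set last_name_final [sql_quote_escape [string trim $last_name]]
puts $last_name_final
# prints: O''Grady
```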

Looking for stuff in a string

The simplest way to look for a substring within a string is with the string first command. Some users of photo.net complained that they didn't like seeing classified ads that were simply pointers to the eBay auction site. Here's a simplified snippet from the code that inserts ads into the database:
if { [string first "ebay" [string tolower $full_ad]] != -1 } {
    # return an exception
    ...
}
An alternative formulation would be:
if { [regexp -nocase {ebay} $full_ad] } {
    # return an exception
    ...
}
Both implementations will catch any capitalization variant of "eBAY". Both implementations will miss "e-bay" but it doesn't matter because if the poster of the ad includes a link with a URL, the hyperlink will contain "ebay". What about false positives? If you visit www.m-w.com and search for "*ebay*" you'll find that both implementations might bite someone selling rhododendrons or a water-powered mill. That's why the toolkit code checks a "DisalloweBay" parameter, set by the publisher, before declaring this an exception.
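To see the two tests agree, including on a false positive, you can run them side by side. This is a sketch with made-up ad text; the procedure names are hypothetical, and "rosebay" is used only because it happens to contain the letters e-b-a-y.

```tcl
# Hypothetical wrappers around the two tests shown in the text.
proc mentions_ebay_string_first {full_ad} {
    # string first returns -1 when the substring is absent
    return [expr {[string first "ebay" [string tolower $full_ad]] != -1}]
}

proc mentions_ebay_regexp {full_ad} {
    # regexp returns 1 on a match and 0 otherwise
    return [regexp -nocase {ebay} $full_ad]
}

# made-up sample ads
set ads [list \
    "See my auction on eBay" \
    "Details at WWW.EBAY.COM" \
    "Rosebay rhododendron seedlings for sale" \
    "Nikon F5 body, mint condition"]

foreach full_ad $ads {
    puts [format "%-42s string first: %d  regexp: %d" \
        $full_ad \
        [mentions_ebay_string_first $full_ad] \
        [mentions_ebay_regexp $full_ad]]
}
```

Both tests flag the first three ads and pass the fourth. The rhododendron ad is the kind of false positive that a publisher-controlled parameter is there to handle.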

If you're just trying to find a substring, you can use either string first or regexp. If you're trying to do something more subtle, you'll need regexp (described more fully in the chapter "Pattern Matching"):

if { ![regexp {[a-z]} $full_ad] } {
    # no lowercase letters in the ad!
    append exception_text "<li>Your ad appears to be all uppercase.  ON THE INTERNET THIS IS CONSIDERED SHOUTING.  IT IS ALSO MUCH HARDER TO READ THAN MIXED CASE TEXT.  So we don't allow it, out of decorum and consideration for people who may be visually impaired."
    incr exception_count
}

Using only part of a string

In the ArsDigita Community System, we have a page that shows a user's complete history with a Web service, e.g., http://photo.net/shared/community-member.tcl?user_id=23069 shows all of the postings by Philip Greenspun. If a comment on a static page is short, we want to show the entire message. If not, we want to show just the first 1000 characters, which can be accomplished with the string range command. The indices are zero-based and inclusive, so the first 1000 characters run from index 0 through index 999:
if { [string length $message] > 1000 } {
    set complete_message "[string range $message 0 999]... "
} else {
    set complete_message $message
}
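Because string range includes both end indices, asking for the first N characters means stopping at index N-1. The sketch below wraps that arithmetic in a procedure so the limit is written only once; the name truncate_text is hypothetical.

```tcl
# Hypothetical helper: keep at most max_chars characters of text,
# adding a trailing "... " when something was cut off.
proc truncate_text {text max_chars} {
    if { [string length $text] > $max_chars } {
        # indices start at 0 and both ends are included,
        # so the last index to keep is max_chars - 1
        return "[string range $text 0 [expr {$max_chars - 1}]]... "
    }
    return $text
}

puts [string range "abcdefghij" 0 3]
# prints: abcd
puts [truncate_text "abcdefghij" 4]
# prints: abcd...  (followed by a space)
puts [truncate_text "short" 1000]
# prints: short
```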
    

Fortran-style formatting and reading of numbers

The Tcl commands format and scan resemble C's printf and scanf functions. That's pretty much all that any Tcl manual will tell you about these commands, which means that you're kind of S.O.L. if you don't know C. The basic idea of these commands comes from Fortran, a computer language developed by John Backus at IBM in 1954. The FORMAT command in Fortran would let you control the printed display of a number, including such aspects as spaces of padding to the left and digits of precision after the decimal point.

With Tcl format, the first argument is a pattern for how you'd like the final output to look. Inside the pattern are placeholders for values. The second through Nth arguments to format are the values themselves:

format pattern value1 value2 value3 ... valueN

We can never figure out how to use format without either copying an earlier fragment of pattern or referring to the man page (http://www.tcl.tk/man/tcl8.4/TclCmd/format.htm). However, here are some examples for you to copy:
% # format prices with two digits after the point
% format "Price:  %0.2f" 17
Price:  17.00
% # pad some stuff out to fill 20 spaces
% format "%20s" "a long thing"
        a long thing
% format "%20s" "23"
                  23
% # notice that the 20 spaces is a MINIMUM; use string range
% # if you might need to truncate
% format "%20s" "something way longer than 20 spaces"
something way longer than 20 spaces
% # turn a number into an ASCII character
% format "%c" 65
A
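A few more patterns come up often enough on Web pages to be worth having on hand: a minus sign after the percent sign left-justifies, a width before the point pads the whole number, and a precision on %s caps the number of characters printed. The sketch below uses a made-up price list, and the name format_row is hypothetical.

```tcl
# Hypothetical helper: one line of a fixed-width price list.
proc format_row {name price} {
    # %-20s  left-justify the name in 20 columns
    # %8.2f  right-justify the price in 8 columns, two digits after the point
    return [format "%-20s%8.2f" $name $price]
}

# made-up data: name/price pairs in a flat list
set items {"Film, 36 exp." 4.5 "Tripod" 129 "Lens cap" 7.25}
foreach {name price} $items {
    puts [format_row $name $price]
}

# zero-pad an integer to five digits
puts [format "%05d" 42]
# prints: 00042

# a precision on %s is a maximum length, so it truncates
puts [format "%.9s" "something way longer than 20 spaces"]
# prints: something
```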
    
The Tcl command scan performs the reverse operation, i.e., parses an input string according to a pattern and stuffs values as it finds them into variables:
% # turn an ASCII character into a number
% scan "A" "%c" the_ascii_value
1
% set the_ascii_value
65

Notice that the number returned by scan is a count of how many conversions it was able to perform successfully. If you really want to use scan, you'll need to visit the man page: http://www.tcl.tk/man/tcl8.4/TclCmd/scan.htm. For an idea of how useful this is for Web development, consider that the entire 250,000-line ArsDigita Community System does not contain a single use of the scan command.
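The conversion count is what lets you tell a clean parse from a partial one. Here is a small sketch that pulls two integers out of an image-size string; the input strings and variable names are made up for illustration.

```tcl
# parse "WIDTHxHEIGHT"; the x in the pattern must match a literal x in the input
set conversions [scan "640x480" "%dx%d" width height]
puts "$conversions conversions: width=$width height=$height"
# prints: 2 conversions: width=640 height=480

# when the input stops fitting the pattern, scan gives up early and
# the count comes back lower than the number of placeholders
set partial [scan "640 by 480" "%dx%d" other_width other_height]
if { $partial < 2 } {
    puts "could not read both numbers"
}
```

Comparing the returned count against the number of placeholders is the usual way to decide whether the variables are safe to use.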

Reference: String operations