String manipulation in Erlang. (Learning Erlang 9)

Published on Nov 23, 2012

Other articles in the Learning Erlang series.

  1. Learning Erlang.
  2. Heads and tails, working with lists. (Learning Erlang 2)
  3. Variables, comparisons and dynamic typing. (Learning Erlang 3)
  4. Modules and functions, Erlang building blocks (Learning Erlang 4).
  5. Module attributes and compilation (Learning Erlang 5).
  6. Playing with recursion, (Learning Erlang 6).
  7. Playing with the file module (part 1). (Learning Erlang 7)
  8. The file module (part 2). (Learning Erlang 8)
  9. Records. (Learning Erlang 10)
  10. Functional arrays. (Learning Erlang 11)

Yesterday I was asked how you do string manipulation in Erlang, since strings are represented internally as lists of numbers that stand for characters.

I suspected that there should be such a thing as a string module, but I didn’t really know, so I said as much.

After coming back home I decided to open the Erlang docs and sure enough, there is a string module.

Let’s take a look.

Note of caution. These are my notes while learning Erlang. You are welcome to follow along and use them as a guide. Please make sure to check the Erlang language site.

Some basic operations.

  1> Str = "this is a string".
  "this is a string"
  2> string:len(Str).
  16
  3> length(Str).
  16

We can easily check for the string length using the len/1 function.

Notice that we can also use length on Str since it’s actually a list.
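
To make the "it's actually a list" point concrete, here is a small shell sketch. The literals are only illustrations; the idea is that a double-quoted string and a list of character codes are the same term, so ordinary list operations apply.

```erlang
1> "abc" =:= [97, 98, 99].
true
2> $a.
97
3> [First | Rest] = "this".
"this"
4> First.
116
5> Rest.
"his"
6> lists:reverse("this").
"siht"
```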

We can also count the words with words/1, which uses a space as the word separator, or with words/2, which takes the separator character as its second argument.

  1> string:words("this is a string").
  4
  2> string:words("this is a string", $a).
  2

Building strings.

We can create new strings using a number of functions.

join/2 takes a list of tokens and a separator string and “stitches” them together.

  1> string:join(["this", "is", "a", "string"], " ").
  "this is a string"

concat/2 joins two strings together.

  1> string:concat("this is a string", ", and now is longer.").
  "this is a string, and now is longer."

We can copy a string a given number of times with copies/2.

  1> string:concat("de",string:copies("do", 3)).
  "dedododo"

We can use chars/3 to build a string by repeating the same character a number of times and then appending a tail.

  1> string:concat("this is g", string:chars($r, 10, "eat")).
  "this is grrrrrrrrrreat"

There is another variation without the tail.

  1> string:chars($+, 20).
  "++++++++++++++++++++"

You can use a number of functions to pad a string with either spaces or some character.

  1> Msg = "To be continued".
  "To be continued"
  2> string:left(Msg, string:len(Msg)+3, $.).
  "To be continued..."
  3> string:left(Msg, string:len(Msg)+3).
  "To be continued   "
  4> string:centre(Msg, string:len(Msg)+3, $.).
  ".To be continued.."
  5> string:centre(Msg, string:len(Msg)+4, $.).
  "..To be continued.."
  6> string:right(Msg, string:len(Msg)+3, $.).
  "...To be continued"

Splitting and slicing.

We can easily split a string into a list of tokens using the tokens/2 function.

  1> Tokens = string:tokens("this is a string", " ").
  ["this","is","a","string"]

Notice that tokens/2 returns a list of tokens without the separator.

  1> string:tokens("this is a string", "i").
  ["th","s ","s a str","ng"]
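
One detail worth seeing in the shell: the second argument of tokens/2 is treated as a set of separator characters, and runs of separators do not produce empty tokens. The inputs below are made up for illustration; the last line is a sketch of a tokens/join round trip.

```erlang
1> string:tokens("a,b;;c", ",;").
["a","b","c"]
2> string:tokens("  leading and  double  spaces ", " ").
["leading","and","double","spaces"]
3> string:join(string:tokens("this is a string", " "), "-").
"this-is-a-string"
```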

You can get parts of a string using the familiar substr and sub_string functions.

  1> string:substr("abcdefghijklm", 5).
  "efghijklm"
  2> string:substr("abcdefghijklm", 5, 3).
  "efg"
  3> string:sub_string("abcdefghijklm", 5).
  "efghijklm"
  4> string:sub_string("abcdefghijklm", 5, 8).
  "efgh"

substr/2 and sub_string/2 are equivalent, but substr/3 takes the start position and a length, while sub_string/3 takes the start and end positions. Positions count from 1.
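
The relationship between the two three-argument forms can be checked directly in the shell. This is only a sketch with an arbitrary string: a sub_string/3 call from Start to Stop should match a substr/3 call with length Stop - Start + 1.

```erlang
1> S = "abcdefghijklm".
"abcdefghijklm"
2> string:substr(S, 5, 4).
"efgh"
3> string:sub_string(S, 5, 8).
"efgh"
4> string:sub_string(S, 5, 8) =:= string:substr(S, 5, 8 - 5 + 1).
true
```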

You can find a given word in a string by index with sub_word/2.

  1> string:sub_word("this is a string of characters", 4).
  "string"

Sometimes the string is not natural-language text and its words are delimited by a different character; sub_word/3 takes that separator as its third argument.

  1> string:sub_word("this is a string of characters", 4, $i).
  "ng of characters"

Finding and comparing strings.

String equality is easy: just call equal/2.

  1> string:equal("equality", "equality").
  true
  2> A = "equality".
  "equality"
  3> B = "equality".
  "equality"
  4> string:equal(A, B).
  true

In most cases you may want to normalize the strings before comparing them.

You can use to_lower or to_upper to normalize the capitalization of a string.

  1> A = "EQ".
  "EQ"
  2> B = "Eq".
  "Eq"
  3> string:equal(A, B).
  false
  4> string:equal(string:to_lower(A), string:to_lower(B)).
  true
  5> string:equal(string:to_upper(A), string:to_upper(B)).
  true
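
If this comparison is needed in more than one place, it can be wrapped in a small function. The module and function names below (str_compare, equal_ignore_case/2) are hypothetical and not part of the string module; this is a minimal sketch that assumes both arguments are plain list strings. Note that to_lower/1 only covers the ISO 8859-1 (Latin-1) character set.

```erlang
-module(str_compare).
-export([equal_ignore_case/2]).

%% Hypothetical helper: compare two list strings after lowercasing both.
%% string:to_lower/1 only case-converts Latin-1 characters.
equal_ignore_case(A, B) ->
    string:equal(string:to_lower(A), string:to_lower(B)).
```

After compiling the module, str_compare:equal_ignore_case("EQ", "Eq") returns true in the shell.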

Using the strip functions you can remove spaces or padding characters from either end of the string.

  1> string:strip("    no blank    ").
  "no blank"
  2> string:strip("    no blank    ", left).
  "no blank    "
  3> string:strip("    no blank    ", right).
  "    no blank"
  4> string:strip("    no blank    ", both).
  "no blank"
  5> string:strip("+++++++no plus signs++++++", both, $+).
  "no plus signs"
  6> string:strip("+++++++no plus signs++++++", left, $+).
  "no plus signs++++++"
  7> string:strip("+++++++no plus signs++++++", right, $+).
  "+++++++no plus signs"

We can check whether a character or a string occurs in another string using the chr/2 and str/2 functions and their reverse versions, rchr/2 and rstr/2, which search from the end.

They return the position of the match, counting from 1, or 0 if nothing is found.

  1> Str = "this is a string".
  "this is a string"
  2> string:str(Str, "n").
  15
  3> string:str(Str, "p").
  0
  4> string:rstr(Str, "n").
  15
  5> string:rstr(Str, "s").
  11
  6> string:str(Str, "s").
  4
  7> string:chr(Str, 116).
  1
  8> string:rchr(Str, 116).
  12
  9> string:rchr(Str, 11).
  0
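
Since 0 means "not found", a membership test is just a comparison against 0. The helper below is a hypothetical wrapper (str_find:contains/2 is not a library function), sketched on the assumption that both arguments are list strings and the substring is non-empty.

```erlang
-module(str_find).
-export([contains/2]).

%% Hypothetical helper: true when Sub occurs somewhere in Str.
%% string:str/2 returns a 1-based position, or 0 when there is no match.
contains(Str, Sub) ->
    string:str(Str, Sub) > 0.
```

For example, str_find:contains("this is a string", "str") returns true, while str_find:contains("this is a string", "p") returns false.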

Float and integer conversion.

These functions are very interesting.

If the string starts with an integer, to_integer/1 parses that part of the string and returns the Integer and the Rest in a tuple {Integer, Rest}. If it does not, the result is {error, no_integer}.

  1> string:to_integer("98.87").
  {98,".87"}
  2> {Ia, Irest} = string:to_integer("09.10").
  {9,".10"}
  3> Ia.
  9
  4> Irest.
  ".10"
  5> string:to_integer(Irest).
  {error,no_integer}
  6> {Ic,_} = string:to_integer("+3").
  {3,[]}
  7> {Id,_} = string:to_integer("-3").
  {-3,[]}
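
Because to_integer/1 happily returns leftover text, code that wants "the whole string must be an integer" has to check the Rest itself. One possible sketch follows; the module name, the function name and the trailing_characters tag are all made up for this example, and it assumes the input is a list string, so an empty Rest is [].

```erlang
-module(str_parse).
-export([parse_int/1]).

%% Hypothetical strict parser built on string:to_integer/1.
%% Accepts the input only when nothing is left over after the integer.
parse_int(Str) ->
    case string:to_integer(Str) of
        {error, Reason} -> {error, Reason};      % e.g. no_integer
        {Int, []}       -> {ok, Int};            % whole string consumed
        {_Int, Rest}    -> {error, {trailing_characters, Rest}}
    end.
```

With this helper, str_parse:parse_int("42") gives {ok,42}, while str_parse:parse_int("98.87") gives {error,{trailing_characters,".87"}}.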

The to_float/1 function has a similar behaviour.

  1> string:to_float("2.67").
  {2.67,[]}
  2> string:to_float("2.67 - 10").
  {2.67," - 10"}
  3> string:to_float("-10").
  {error,no_float}
  4> string:to_float("-10.2").
  {-10.2,[]}
  5> string:to_float("-10.").
  {error,no_float}
  6> string:to_float("-10.0").
  {-10.0,[]}
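
As the session shows, to_float/1 rejects input without a fractional part, such as "-10". If either kind of number is acceptable, one option is to try to_float/1 first and fall back to to_integer/1. The helper below is a hypothetical sketch (str_number:parse_number/1 is not a library function) and assumes list strings.

```erlang
-module(str_number).
-export([parse_number/1]).

%% Hypothetical helper: parse a leading float, or failing that a leading
%% integer. Returns {Number, Rest} or {error, Reason}, like the functions
%% it wraps.
parse_number(Str) ->
    case string:to_float(Str) of
        {Float, Rest} when is_float(Float) -> {Float, Rest};
        {error, no_float} -> string:to_integer(Str);
        {error, Reason} -> {error, Reason}
    end.
```

For example, str_number:parse_number("-10") returns {-10,[]} and str_number:parse_number("2.67 - 10") returns {2.67," - 10"}.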

This covers the string module, which gives us most of the tools we are used to having in other languages.