String manipulation in Erlang. (Learning Erlang 9)
Published on Nov 23, 2012
Other articles in the Learning Erlang series.
- Learning Erlang.
- Heads and tails, working with lists. (Learning Erlang 2)
- Variables, comparisons and dynamic typing. (Learning Erlang 3)
- Modules and functions, Erlang building blocks. (Learning Erlang 4)
- Module attributes and compilation. (Learning Erlang 5)
- Playing with recursion. (Learning Erlang 6)
- Playing with the file module (part 1). (Learning Erlang 7)
- The file module (part 2). (Learning Erlang 8)
- Records. (Learning Erlang 10)
- Functional arrays. (Learning Erlang 11)
Yesterday I was asked how you do string manipulation in Erlang, since strings are represented internally as lists of integers, each one the code of a character.
I suspected there would be such a thing as a string module, but I didn’t really know, so I said as much.
When I got back home I opened the Erlang docs and, sure enough, there is a string module.
Let’s take a look.
Note of caution: these are my notes while learning Erlang. You are welcome to follow along and use them as a guide, but please make sure to check the Erlang language site.
Some basic operations.
1> Str = "this is a string".
"this is a string"
2> string:len(Str).
16
3> length(Str).
16
We can easily get the length of a string using the len/1 function.
Notice that we can also use the built-in length/1 on Str, since a string is actually a list.
We can also count the words in a string using words/1, which uses a space as the word separator, or words/2, which takes the separator character as its second argument.
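To make the "a string is just a list" point concrete, here is a small Erlang sketch. The module str_basics and the function my_len/1 are hypothetical names made up for illustration; they are not part of the string module. my_len/1 counts characters with plain list recursion, which is why length/1 and len/1 agree.

```erlang
-module(str_basics).
-export([my_len/1, same_as_list/0]).

%% Hypothetical helper, not part of the string module:
%% counts the elements of a string with plain list recursion.
my_len([]) -> 0;
my_len([_Char | Rest]) -> 1 + my_len(Rest).

%% A string literal is shorthand for a list of character codes,
%% so all three spellings below denote the same term.
same_as_list() ->
    "abc" =:= [$a, $b, $c] andalso "abc" =:= [97, 98, 99].
```

For example, str_basics:my_len("this is a string") gives the same 16 as length/1 above.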
1> string:words("this is a string").
4
2> string:words("this is a string", $a).
2
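A word counter can be sketched with the same kind of recursion. This is an illustration, not the library's own code: the module word_count and its count_words/1,2 functions are hypothetical. The sketch counts runs of characters that are not the separator, so repeated separators do not create extra words; its behaviour on edge cases such as the empty string may differ from string:words/2.

```erlang
-module(word_count).
-export([count_words/1, count_words/2]).

%% Hypothetical helper: counts words separated by spaces.
count_words(Str) -> count_words(Str, $\s).

%% Counts runs of characters that differ from Sep.
count_words(Str, Sep) -> count(Str, Sep, outside, 0).

%% State tracks whether we are currently inside a word.
count([], _Sep, _State, N) -> N;
count([Sep | Rest], Sep, _State, N) -> count(Rest, Sep, outside, N);
count([_C | Rest], Sep, outside, N) -> count(Rest, Sep, inside, N + 1);
count([_C | Rest], Sep, inside, N) -> count(Rest, Sep, inside, N).
```

The second clause uses Sep twice in its head, which is Erlang's way of saying "the first character equals the separator".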
Building strings.
We can create new strings using a number of functions.
join/2 takes a list of strings and a separator string and “stitches” them together.
1> string:join(["this", "is", "a", "string"], " ").
"this is a string"
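Here is a sketch of what joining amounts to on lists, under the assumption of OTP 19 or later for lists:join/2. The module join_sketch is hypothetical: join/2 is a hand-written recursive version, and join_with_lists/2 builds the same result from lists:join/2 (which interleaves the separator) and lists:append/1 (which flattens one level).

```erlang
-module(join_sketch).
-export([join/2, join_with_lists/2]).

%% Hand-written join: puts Sep between consecutive strings.
join([], _Sep) -> "";
join([Last], _Sep) -> Last;
join([First | Rest], Sep) -> First ++ Sep ++ join(Rest, Sep).

%% Same result with the lists module (lists:join/2 needs OTP 19+):
%% ["this", "is"] becomes ["this", " ", "is"], then gets flattened.
join_with_lists(Strings, Sep) ->
    lists:append(lists:join(Sep, Strings)).
```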
concat/2 joins two strings together.
1> string:concat("this is a string", ", and now is longer.").
"this is a string, and now is longer."
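Since strings are lists, concatenating two of them is the same as appending two lists with the ++ operator. A minimal sketch; the module concat_sketch is a hypothetical name used only here.

```erlang
-module(concat_sketch).
-export([concat/2]).

%% Appending lists: A is copied and B becomes its tail unchanged.
concat(A, B) -> A ++ B.
```

Because ++ copies its left operand, building a long string by repeatedly appending to the end of it does a lot of copying; join/2 or lists:append/1 over a list of pieces avoids that.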
We can copy a string a given number of times with copies/2.
1> string:concat("de",string:copies("do", 3)).
"dedododo"
We can use chars/3 to build a string using the same character multiple times.
1> string:concat("this is g", string:chars($r, 10, "eat")).
"this is grrrrrrrrrreat"
chars/2 does the same without the tail argument.
1> string:chars($+, 20).
"++++++++++++++++++++"
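The repeating functions can be sketched with lists:duplicate/2, which builds a list containing the same element N times. The module repeat_sketch and its functions are hypothetical re-implementations written for illustration, not the library source.

```erlang
-module(repeat_sketch).
-export([copies/2, chars/2, chars/3]).

%% copies(S, N): N copies of the string S, back to back.
copies(S, N) -> lists:append(lists:duplicate(N, S)).

%% chars(C, N): the character C repeated N times.
chars(C, N) -> lists:duplicate(N, C).

%% chars(C, N, Tail): the same, followed by Tail.
chars(C, N, Tail) -> lists:duplicate(N, C) ++ Tail.
```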
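You can use a number of functions to pad a string with either spaces or some character: left keeps the text on the left and pads on the right, right does the opposite, and centre pads on both sides. Without the third argument they pad with spaces.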
1> Msg = "To be continued".
2> string:left(Msg, string:len(Msg)+3, $.).
"To be continued..."
3> string:left(Msg, string:len(Msg)+3).
"To be continued   "
4> string:centre(Msg, string:len(Msg)+3, $.).
".To be continued.."
5> string:centre(Msg, string:len(Msg)+4, $.).
"..To be continued.."
6> string:right(Msg, string:len(Msg)+3, $.).
"...To be continued"
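Padding is again just list building. The sketch below is a hypothetical module, pad_sketch, and it assumes the target length is at least the length of the string (lists:duplicate/2 fails on a negative count). For centre/3 it puts an odd extra character on the right; that is this sketch's own choice and may not match string:centre/3 in every case.

```erlang
-module(pad_sketch).
-export([left/3, right/3, centre/3]).

%% Assumes Len >= length(S) in all three functions.

%% left/3: text stays on the left, padding goes on the right.
left(S, Len, C) -> S ++ lists:duplicate(Len - length(S), C).

%% right/3: text stays on the right, padding goes on the left.
right(S, Len, C) -> lists:duplicate(Len - length(S), C) ++ S.

%% centre/3: split the padding; an odd extra character goes right.
centre(S, Len, C) ->
    Pad = Len - length(S),
    LeftPad = Pad div 2,
    lists:duplicate(LeftPad, C) ++ S ++ lists:duplicate(Pad - LeftPad, C).
```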
Splitting and slicing.
We can easily split a string into a list of tokens using the tokens/2 function.
1> Tokens = string:tokens("this is a string", " ").
["this","is","a","string"]
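Notice that tokens/2 returns the tokens without the separator. The second argument is a list of separator characters, not a substring: every character in it splits the string, and consecutive separators do not produce empty tokens.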
1> string:tokens("this is a string", "i").
["th","s ","s a str","ng"]
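To see how tokenizing works on a plain list, here is a recursive sketch. The module tokens_sketch is hypothetical; it treats every character in Seps as a separator and drops empty tokens, matching the outputs shown above, but it is an illustration rather than the library's implementation.

```erlang
-module(tokens_sketch).
-export([tokens/2]).

%% Splits Str on any character in Seps, skipping empty tokens.
tokens(Str, Seps) -> tokens(Str, Seps, [], []).

%% Cur is the token being built (reversed); Acc holds the finished
%% tokens (also reversed), so everything is flipped once at the end.
tokens([], _Seps, [], Acc) -> lists:reverse(Acc);
tokens([], _Seps, Cur, Acc) -> lists:reverse([lists:reverse(Cur) | Acc]);
tokens([C | Rest], Seps, Cur, Acc) ->
    case lists:member(C, Seps) of
        true when Cur =:= [] -> tokens(Rest, Seps, [], Acc);
        true -> tokens(Rest, Seps, [], [lists:reverse(Cur) | Acc]);
        false -> tokens(Rest, Seps, [C | Cur], Acc)
    end.
```

Accumulating in reverse and reversing once is the usual Erlang idiom, since prepending to a list is cheap while appending is not.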
You can get parts of a string using the familiar substr and sub_string functions.
1> string:substr("abcdefghijklm", 5).
"efghijklm"
2> string:substr("abcdefghijklm", 5, 3).
"efg"
3> string:sub_string("abcdefghijklm", 5).
"efghijklm"
4> string:sub_string("abcdefghijklm", 5, 8).
"efgh"
substr/2 and sub_string/2 are equivalent, but substr/3 takes a start position and a length, while sub_string/3 takes a start and an end position (inclusive). Positions count from 1.
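The same slices can be taken with the lists module, which makes the difference between "length" and "end position" explicit. The module slice_sketch is a hypothetical name; lists:nthtail/2 and lists:sublist/3 are standard functions.

```erlang
-module(slice_sketch).
-export([substr/2, substr/3, sub_string/3]).

%% Everything from position Start (1-based) to the end.
substr(S, Start) -> lists:nthtail(Start - 1, S).

%% Len characters starting at position Start.
substr(S, Start, Len) -> lists:sublist(S, Start, Len).

%% Characters from Start to Stop, both inclusive.
sub_string(S, Start, Stop) -> lists:sublist(S, Start, Stop - Start + 1).
```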
You can find a given word in a string by index with sub_word/2.
1> string:sub_word("this is a string of characters", 4).
"string"
Sometimes the words are not separated by spaces; sub_word/3 takes the separator character as a third argument.
1> string:sub_word("this is a string of characters", 4, $i).
"ng of characters"
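Picking the Nth word amounts to tokenizing on the separator and taking the Nth token. The module word_sketch is hypothetical, and returning "" when N is out of range is this sketch's own choice, not a documented property of string:sub_word.

```erlang
-module(word_sketch).
-export([sub_word/2, sub_word/3]).

%% Nth word, using a space as the separator.
sub_word(S, N) -> sub_word(S, N, $\s).

%% Split on the separator character, then pick the Nth token.
%% Out-of-range N returns "" (a choice made for this sketch).
sub_word(S, N, Sep) ->
    Words = string:tokens(S, [Sep]),
    case N >= 1 andalso N =< length(Words) of
        true -> lists:nth(N, Words);
        false -> ""
    end.
```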
Finding and comparing strings.
String equality is easy: just call equal/2.
1> string:equal("equality", "equality").
true
2> A = "equality".
3> B = "equality".
4> string:equal(A, B).
true
Often you will want to normalize the strings before comparing them.
You can use to_lower/1 or to_upper/1 to normalize the capitalization of a string.
1> A = "EQ".
2> B = "Eq".
3> string:equal(A, B).
false
4> string:equal(string:to_lower(A), string:to_lower(B)).
true
5> string:equal(string:to_upper(A), string:to_upper(B)).
true
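Because strings are lists, the exact-equality operator =:= also compares them character by character and gives the same answers as equal/2 for the examples above. A small sketch with a case-insensitive variant; the module compare_sketch and its function names are hypothetical.

```erlang
-module(compare_sketch).
-export([same/2, same_ignore_case/2]).

%% Exact comparison: element-by-element equality of two lists.
same(A, B) -> A =:= B.

%% Case-insensitive comparison: normalize both sides first.
same_ignore_case(A, B) -> string:to_lower(A) =:= string:to_lower(B).
```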
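Using the strip functions you can remove leading and/or trailing spaces, or any character you pass as the third argument. The second argument says which side to strip: left, right or both (the default).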
1> string:strip(" no blank ").
"no blank"
2> string:strip(" no blank ", left).
"no blank "
3> string:strip(" no blank ", right).
" no blank"
4> string:strip(" no blank ", both).
"no blank"
5> string:strip("+++++++no plus signs++++++", both, $+).
"no plus signs"
6> string:strip("+++++++no plus signs++++++", left, $+).
"no plus signs++++++"
7> string:strip("+++++++no plus signs++++++", right, $+).
"+++++++no plus signs"
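Stripping can be sketched with lists:dropwhile/2: dropping from the left is direct, and dropping from the right is the same operation on the reversed list. The module strip_sketch is a hypothetical re-implementation for illustration.

```erlang
-module(strip_sketch).
-export([strip/3]).

%% left: drop leading characters equal to C.
strip(S, left, C) -> lists:dropwhile(fun(X) -> X =:= C end, S);
%% right: reverse, strip from the left, reverse back.
strip(S, right, C) -> lists:reverse(strip(lists:reverse(S), left, C));
%% both: strip the left side, then the right side.
strip(S, both, C) -> strip(strip(S, left, C), right, C).
```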
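We can check whether a character or a substring occurs in another string using chr/2 and str/2, or their reverse versions rchr/2 and rstr/2, which search from the end. Note that chr/2 and rchr/2 take a character code: 116 below is $t.
They return the 1-based position of the first match (the last one for the reverse versions), or 0 if there is no match.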
1> Str = "this is a string".
2> string:str(Str, "n").
15
3> string:str(Str, "p").
0
4> string:rstr(Str, "n").
15
5> string:rstr(Str, "s").
11
6> string:str(Str, "s").
4
7> string:chr(Str, 116).
1
8> string:rchr(Str, 116).
12
9> string:rchr(Str, 11).
0
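Both searches can be sketched as a walk down the list that keeps a 1-based position counter; for substrings, lists:prefix/2 checks whether the search string starts at the current position. The module search_sketch is hypothetical and only covers the forward versions.

```erlang
-module(search_sketch).
-export([chr/2, str/2]).

%% chr/2: position of the first occurrence of character C, or 0.
chr(S, C) -> chr(S, C, 1).

chr([], _C, _I) -> 0;
chr([C | _], C, I) -> I;
chr([_ | Rest], C, I) -> chr(Rest, C, I + 1).

%% str/2: position where Sub first starts in S, or 0.
str(S, Sub) -> str(S, Sub, 1).

str([], _Sub, _I) -> 0;
str([_ | Rest] = S, Sub, I) ->
    case lists:prefix(Sub, S) of
        true -> I;
        false -> str(Rest, Sub, I + 1)
    end.
```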
Float and integer conversion.
These functions parse a number from the beginning of a string.
If the string starts with an integer, to_integer/1 parses that part of the string and returns the Integer and the Rest in a tuple {Integer, Rest}; otherwise it returns {error,no_integer}.
1> string:to_integer("98.87").
{98,".87"}
2> {Ia, Irest} = string:to_integer("09.10").
{9,".10"}
3> Ia.
9
4> Irest.
".10"
5> string:to_integer(Irest).
{error,no_integer}
6> {Ic,_} = string:to_integer("+3").
{3,[]}
7> {Id,_} = string:to_integer("-3").
{-3,[]}
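A common use is accepting a string only if the whole of it is an integer. Matching on an empty Rest does that. This is a sketch; the module int_sketch and the {ok, N} | error return shape are hypothetical choices, not part of the string module.

```erlang
-module(int_sketch).
-export([parse_int/1]).

%% Accept the input only if it is entirely an integer.
%% {error, no_integer} and any non-empty Rest both fall through.
parse_int(Str) ->
    case string:to_integer(Str) of
        {Int, []} when is_integer(Int) -> {ok, Int};
        _ -> error
    end.
```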
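The to_float/1 function behaves similarly, returning {Float, Rest} or {error,no_float}. As the examples below show, the number needs a decimal point followed by at least one digit, so "-10" and "-10." are rejected.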
1> string:to_float("2.67").
{2.67,[]}
2> string:to_float("2.67 - 10").
{2.67," - 10"}
3> string:to_float("-10").
{error,no_float}
4> string:to_float("-10.2").
{-10.2,[]}
5> string:to_float("-10.").
{error,no_float}
6> string:to_float("-10.0").
{-10.0,[]}
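Since to_float/1 rejects input without a decimal point, a parser that accepts either kind of number can try to_float/1 first and fall back to to_integer/1. A minimal sketch; the module number_sketch and its return shape are hypothetical.

```erlang
-module(number_sketch).
-export([parse_number/1]).

%% Try a float first, then an integer; the whole string must match.
parse_number(Str) ->
    case string:to_float(Str) of
        {F, []} when is_float(F) -> {ok, F};
        _ ->
            case string:to_integer(Str) of
                {I, []} when is_integer(I) -> {ok, I};
                _ -> error
            end
    end.
```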
This covers the string module, which gives us most of the string tools we are used to having in other languages.