Difference between revisions of "String"

 

Latest revision as of 01:24, 8 April 2011

A string is a primitive datatype in most programming languages that represents an ordered sequence of characters. In most languages strings are delimited with double quotes ("this is a string"), but in Python strings can be delimited by double or single quotes ('this is also a string in Python').

To a computer, strings are conceptually different from numbers, so, for example, the string "222" is not the same value as the integer 222. To convert between strings and numbers in Python, use type conversion functions such as int(), float(), and str().
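
As a minimal sketch of the distinction between the string "222" and the integer 222, the following uses Python's built-in int(), float(), and str() conversions. The variable names are illustrative only.

```python
text = "222"      # a string of three characters
number = 222      # an integer

# A string and an integer never compare equal, even if they look alike.
same = (text == number)

# Convert explicitly before mixing the two kinds of value.
as_int = int(text)          # string -> integer
as_str = str(number)        # integer -> string
as_float = float("3.5")     # string -> floating-point number

total = as_int + 1          # arithmetic on numbers
joined = as_str + "1"       # concatenation on strings

print(same, total, joined)
```

Calling int() on a string that does not look like an integer, such as int("abc"), raises a ValueError.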

String Operations

Some Python operations have special behaviors on strings. Notably,

  • + combines two strings in a process called concatenation. For example, 'Stan'+'ford' will evaluate to 'Stanford'.
  • * repeats a string a (non-negative integral) number of times. For example, 'spam'*3 will evaluate to 'spamspamspam'. Multiplying by zero or by a negative integer gives the empty string.
  • x in s will test if x is a substring of s. That is, x in s will return True iff some contiguous slice of s is equal to x.
  • s[i] will return the character at index i of string s. Note that, like other sequences, strings are indexed from zero, so s[0] is the first character.
  • s[-i] will return the i'th character from the end of string s, so s[-1] is the last character of s, s[-2] is the second-to-last character, etc.
  • Strings can be sliced.
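
The operations listed above can be tried out in a short sketch like the one below. The example string and variable names are chosen here for illustration.

```python
s = 'Stanford'

combined = 'Stan' + 'ford'   # concatenation
repeated = 'spam' * 3        # repetition
nothing = 'spam' * 0         # repeating zero times gives ''

has_tan = 'tan' in s         # substring test: 'tan' occurs in 'Stanford'
has_fan = 'fan' in s         # 'fan' does not occur

first = s[0]                 # indices start at zero
last = s[-1]                 # negative indices count from the end
second_to_last = s[-2]

middle = s[1:4]              # slice: indices 1, 2 and 3

print(combined, repeated, has_tan, has_fan, first, last, middle)
```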

Useful Functions

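Some of these functions work on other types of sequence as well: len works on any sequence, and index and count also exist for lists and tuples. They're repeated here because their conceptual behavior on strings might seem slightly different. The remaining functions are specific to strings.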

  • len(s) returns the number of characters in the string s.
  • s.lower() returns a copy of s, but with all uppercase characters converted to lowercase. This is useful when you don't want a test to be case-sensitive.
  • Similarly, s.upper() returns a copy of s with all lowercase characters converted to uppercase.
  • s.split([sep]) returns a list whose elements are the words in the string. The parameter sep is optional; by default, strings are split on runs of whitespace (spaces, tabs, newlines). For example, 'To be, or not to be.'.split() will evaluate to ['To', 'be,', 'or', 'not', 'to', 'be.']. Note that split doesn't remove punctuation.
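
A brief sketch of these functions on one example sentence follows. The variable names and the comma separator are illustrative choices, not part of the original text.

```python
s = 'To be, or not to be.'

length = len(s)              # number of characters, including spaces

lowered = s.lower()
raised = s.upper()

# A case-insensitive comparison: normalize both sides first.
same_word = 'SPAM'.lower() == 'Spam'.lower()

words = s.split()            # split on whitespace; punctuation stays attached
parts = s.split(',')         # split on an explicit separator instead

print(length, lowered, raised, same_word, words, parts)
```

Because strings are immutable, lower() and upper() leave s itself unchanged and return new strings.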

Searching

  • s.index(x) returns the index of the first occurrence of the substring x in s. If x isn't a substring, a ValueError is raised.
  • s.find(x) does the same thing as index, but returns -1 if x isn't a substring, rather than raising an error.
  • s.rindex(x) and s.rfind(x) (short for right index and right find) return the index of the last occurrence of the substring x in s. When x isn't a substring, rindex raises a ValueError and rfind returns -1.
  • s.count(x) returns the number of times the substring x appears in s. Note that it only counts non-overlapping occurrences, so '000000'.count('000') will return 2.
  • s.replace(x, y) returns a copy of s in which every occurrence of the substring x is replaced with y; s itself is not changed. For example, 'batCatRat'.replace('at', 'oy') will return 'boyCoyRoy'.
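
The searching functions above behave as in the following sketch. The substring 'dog' is simply an example of something that does not occur in s.

```python
s = 'batCatRat'

first_at = s.index('at')     # index of the first 'at'
last_at = s.rindex('at')     # index of the last 'at'

# find and rfind signal "not found" with -1 ...
missing = s.find('dog')
missing_right = s.rfind('dog')

# ... whereas index (and rindex) raise ValueError.
try:
    s.index('dog')
    index_raised = False
except ValueError:
    index_raised = True

# count only counts non-overlapping occurrences.
zeros = '000000'.count('000')

# replace returns a new string; s is left unchanged.
replaced = s.replace('at', 'oy')

print(first_at, last_at, missing, index_raised, zeros, replaced)
```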

Stripping

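  • s.strip() returns a copy of s with any leading and trailing whitespace removed. This is sometimes useful when working with files, where each line typically ends with a newline.
  • s.lstrip() returns a copy of s with any leading whitespace removed. Similarly, s.rstrip() removes only trailing whitespace.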
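
A small sketch of stripping, using a made-up line such as one read from a file. Note that strip() removes whitespace from both ends of the string; rstrip(), the standard counterpart of lstrip(), is included for comparison.

```python
line = '  hello world \n'    # illustrative input, e.g. one line of a file

both = line.strip()          # whitespace removed from both ends
left = line.lstrip()         # only leading whitespace removed
right = line.rstrip()        # only trailing whitespace removed

# repr() makes the remaining whitespace visible when printed.
print(repr(both), repr(left), repr(right))
```

Whitespace inside the string (the space between the two words) is never touched.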


For many more, less common string methods, see the Python documentation below.

See Also