REXML Tutorial

Why REXML?

To Include, or Not to Include?

REXML is a module. To use it, you must require it:

require 'rexml' # => true

If you do not also include it, you must fully qualify references to REXML:

REXML::Document # => REXML::Document

If you also include the module, you may optionally omit the REXML:: prefix:

include REXML
Document # => REXML::Document
REXML::Document # => REXML::Document

Preliminaries

All examples here assume that the following code has been executed:

require 'rexml'
include REXML

The source XML for many examples here is from file books.xml at w3schools.com. You may find it convenient to open that page in a new tab (Ctrl-click in some browsers).

Note that your browser may display the XML with modified whitespace and without the XML declaration, which in this case is:

<?xml version="1.0" encoding="UTF-8"?>

For convenience, we capture the XML into a string variable:

require 'open-uri'
source_string = URI.open('https://www.w3schools.com/xml/books.xml').read

And into a file:

File.write('source_file.xml', source_string)

Throughout these examples, variable doc will hold only the document derived from these sources:

doc = Document.new(source_string)

Parsing XML Source

Parsing a Document

Use method REXML::Document::new to parse XML source.

The source may be a string:

doc = Document.new(source_string)

Or an IO stream:

doc = File.open('source_file.xml', 'r') do |io|
  Document.new(io)
end

Method URI.open returns an IO-like object (a StringIO for a small response such as this one), so the source can be a web page:

require 'open-uri'
io = URI.open("https://www.w3schools.com/xml/books.xml")
io.class # => StringIO
doc = Document.new(io)
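The string and IO variants can also be tried without network access. The sketch below assumes a small, hypothetical excerpt of books.xml held in a string, and parses it both directly and through an in-memory StringIO; both should yield a document whose root element is bookstore.

```ruby
require 'rexml/document'
require 'stringio'

# Hypothetical excerpt standing in for books.xml (no network needed).
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <bookstore>
    <book category="cooking"><title lang="en">Everyday Italian</title></book>
  </bookstore>
XML

from_string = REXML::Document.new(xml)               # parse a String
from_io     = REXML::Document.new(StringIO.new(xml)) # parse an IO-like object

[from_string, from_io].each do |d|
  p [d.class, d.root.name, d.root.elements.size]
end
```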

For any of these sources, the returned object is an REXML::Document:

doc       # => <UNDEFINED> ... </>
doc.class # => REXML::Document

Note: 'UNDEFINED' is the “name” displayed for a document, even though doc.name returns an empty string "".

A parsed document may produce REXML objects of many classes, but the two that are likely to be of greatest interest are REXML::Document and REXML::Element. These two classes are covered in great detail in this tutorial.

Context (Parsing Options)

The context for parsing a document is a hash that influences the way the XML is read and stored.

The context entries are:

See Element Context.
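As an illustration of one context entry, the sketch below parses a hypothetical XML string twice: once with the default (empty) context, and once with :ignore_whitespace_nodes set to :all, which should keep whitespace-only text nodes out of the tree.

```ruby
require 'rexml/document'

xml = "<root>\n  <a/>\n  <b/>\n</root>"

# Default context: whitespace-only text nodes are kept as children.
default_doc = REXML::Document.new(xml)
p default_doc.root.children.map(&:class)

# Context that ignores whitespace-only text nodes in every element.
compact_doc = REXML::Document.new(xml, { ignore_whitespace_nodes: :all })
p compact_doc.root.children.map(&:class)
```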

Exploring the Document

An REXML::Document object represents an XML document.

The object inherits from its ancestor classes:

This section covers only those properties and methods that are unique to a document (that is, not inherited or included).

Document Properties

A document has several properties (other than its children):

Document Type

A document may have a document type:

my_xml = '<!DOCTYPE foo>'
my_doc = Document.new(my_xml)
doc_type = my_doc.doctype
doc_type.class # => REXML::DocType
doc_type.to_s  # => "<!DOCTYPE foo>"
Node Type

A document also has a node type (always :document):

doc.node_type # => :document
Name

A document has a name (always an empty string):

doc.name # => ""
Document

Method REXML::Document#document returns self:

doc.document == doc # => true

An object of a different class (REXML::Element or REXML::Child) may have a document, which is the document to which the object belongs; if so, that document will be an REXML::Document object.

doc.root.document.class # => REXML::Document
XPath

Method REXML::Element#xpath returns the XPath string for the element, relative to its most distant ancestor:

doc.root.class             # => REXML::Element
doc.root.xpath             # => "/bookstore"
doc.root.texts.first       # => "\n\n"
doc.root.texts.first.xpath # => "/bookstore/text()"

If the element has no ancestor, the method returns the element's expanded name:

Element.new('foo').xpath # => "foo"
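A short sketch of how the path changes once an element tree is attached to a document. The element names a, b, and c are hypothetical, and each name is unique among its siblings, so no positional predicates are involved.

```ruby
require 'rexml/document'

# Build a detached chain of elements: a > b > c.
a = REXML::Element.new('a')
b = a.add_element('b')
c = b.add_element('c')
detached_path = c.xpath   # no document above, so no leading slash

# Attach the chain to a document; the path now starts at the document.
doc = REXML::Document.new
doc.add_element(a)
attached_path = c.xpath

p [detached_path, attached_path]
```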

Document Children

A document may have children of these types:

XML Declaration

A document may have an XML declaration, which is stored as an REXML::XMLDecl object:

doc.xml_decl       # => <?xml ... ?>
doc.xml_decl.class # => REXML::XMLDecl

Document.new('').xml_decl # => <?xml ... ?>

my_xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
my_doc = Document.new(my_xml)
xml_decl = my_doc.xml_decl
xml_decl.to_s  # => "<?xml version='1.0' encoding='UTF-8' standalone='yes'?>"

The version, encoding, and stand-alone values may be retrieved separately:

my_doc.version      # => "1.0"
my_doc.encoding     # => "UTF-8"
my_doc.stand_alone? # => "yes"
Root Element

A document may have a single element child, called the root element, which is stored as an REXML::Element object; it may be retrieved with method root:

doc.root           # => <bookstore> ... </>
doc.root.class     # => REXML::Element

Document.new('').root # => nil
Text

A document may have text passages, each of which is stored as an REXML::Text object:

doc.texts.each {|t| p [t.class, t] }

Output:

[REXML::Text, "\n"]
Processing Instructions

A document may have processing instructions, which are stored as REXML::Instruction objects:

Output:

[REXML::Instruction, <?p-i my-application ...?>]
[REXML::Instruction, <?p-i my-application ...?>]
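A minimal sketch that produces output of this kind, assuming a hypothetical source with two processing instructions whose target is my-application, placed in the prolog before a root element:

```ruby
require 'rexml/document'

# Hypothetical source: two processing instructions in the prolog,
# followed by the root element.
my_xml = <<~EOT
  <?my-application first?>
  <?my-application second?>
  <root/>
EOT
my_doc = REXML::Document.new(my_xml)
my_doc.instructions.each {|i| p [i.class, i] }
```

Each REXML::Instruction also exposes its target and content through the target and content accessors.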
Comments

A document may have comments, which are stored as REXML::Comment objects:

my_xml = <<-EOT
  <!--foo-->
  <!--bar-->
EOT
my_doc = Document.new(my_xml)
my_doc.comments.each {|c| p [c.class, c] }

Output:

[REXML::Comment, #<REXML::Comment: @parent=<UNDEFINED> ... </>, @string="foo">]
[REXML::Comment, #<REXML::Comment: @parent=<UNDEFINED> ... </>, @string="bar">]
CDATA

A document may have CDATA entries, which are stored as REXML::CData objects:

my_xml = <<-EOT
  <![CDATA[foo]]>
  <![CDATA[bar]]>
EOT
my_doc = Document.new(my_xml)
my_doc.cdatas.each {|cd| p [cd.class, cd] }

Output:

[REXML::CData, "foo"]
[REXML::CData, "bar"]

The payload of a document is a tree of nodes, descending from the root element:

doc.root.children.each do |child|
  p [child, child.class]
end

Output:

[REXML::Text, "\n\n"]
[REXML::Element, <book category='cooking'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='children'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
[REXML::Text, "\n\n"]

Exploring an Element

An REXML::Element object represents an XML element.

The object inherits from its ancestor classes:

This section covers methods:

Inside the Element

Brief String Representation

Use method REXML::Element#inspect to retrieve a brief string representation.

doc.root.inspect # => "<bookstore> ... </>"

The ellipsis (...) indicates that the element has children. When there are no children, the ellipsis is omitted:

Element.new('foo').inspect # => "<foo/>"

If the element has attributes, those are also included:

doc.root.elements.first.inspect # => "<book category='cooking'> ... </>"
Extended String Representation

Use inherited method REXML::Child#bytes to retrieve an extended string representation.

doc.root.bytes # => "<bookstore>\n\n<book category='cooking'>\n  <title lang='en'>Everyday Italian</title>\n  <author>Giada De Laurentiis</author>\n  <year>2005</year>\n  <price>30.00</price>\n</book>\n\n<book category='children'>\n  <title lang='en'>Harry Potter</title>\n  <author>J K. Rowling</author>\n  <year>2005</year>\n  <price>29.99</price>\n</book>\n\n<book category='web'>\n  <title lang='en'>XQuery Kick Start</title>\n  <author>James McGovern</author>\n  <author>Per Bothner</author>\n  <author>Kurt Cagle</author>\n  <author>James Linn</author>\n  <author>Vaidyanathan Nagarajan</author>\n  <year>2003</year>\n  <price>49.99</price>\n</book>\n\n<book category='web' cover='paperback'>\n  <title lang='en'>Learning XML</title>\n  <author>Erik T. Ray</author>\n  <year>2003</year>\n  <price>39.95</price>\n</book>\n\n</bookstore>"
Node Type

Use method REXML::Element#node_type to retrieve the node type (always :element):

doc.root.node_type # => :element
Raw Mode

Use method REXML::Element#raw to retrieve whether (true or nil) raw mode is set.

doc.root.raw # => nil
Context

Use method REXML::Element#context to retrieve the context hash (see Element Context):

doc.root.context # => {}

Relationships

An element may have:

Ancestors

Containing Document

Use method REXML::Element#document to retrieve the containing document, if any:

ele = doc.root.elements.first   # => <book category='cooking'> ... </>
ele.document                    # => <UNDEFINED> ... </>
ele = Element.new('foo')        # => <foo/>
ele.document                    # => nil
Root Element

Use method REXML::Element#root to retrieve the root element:

ele = doc.root.elements.first   # => <book category='cooking'> ... </>
ele.root                        # => <bookstore> ... </>
ele = Element.new('foo')        # => <foo/>
ele.root                        # => <foo/>
Root Node

Use method REXML::Element#root_node to retrieve the most distant ancestor, which is the containing document, if any, otherwise the root element:

ele = doc.root.elements.first   # => <book category='cooking'> ... </>
ele.root_node                   # => <UNDEFINED> ... </>
ele = Element.new('foo')        # => <foo/>
ele.root_node                   # => <foo/>
Parent

Use inherited method REXML::Child#parent to retrieve the parent:

ele = doc.root                # => <bookstore> ... </>
ele.parent                    # => <UNDEFINED> ... </>
ele = doc.root.elements.first # => <book category='cooking'> ... </>
ele.parent                    # => <bookstore> ... </>

Use included method REXML::Node#index_in_parent to retrieve the index of the element among all of its parent's children (not just the element children). Note that the returned index is 1-based, like the index n in doc.root.elements[n], but it counts every child node, so it is one greater than the element's 0-based position in doc.root.children:

doc.root.children # =>
  # ["\n\n",
  #  <book category='cooking'> ... </>,
  #  "\n\n",
  #  <book category='children'> ... </>,
  #  "\n\n",
  #  <book category='web'> ... </>,
  #  "\n\n",
  #  <book category='web' cover='paperback'> ... </>,
  #  "\n\n"]
ele = doc.root.elements[1] # => <book category='cooking'> ... </>
ele.index_in_parent # => 2
ele = doc.root.elements[2]  # => <book category='children'> ... </>
ele.index_in_parent# => 4
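A small sketch, using a hypothetical XML string, that compares Parent#index (0-based over all children) with index_in_parent for each element child; as in the example output above, index_in_parent comes out one greater.

```ruby
require 'rexml/document'

doc = REXML::Document.new('<root> <a/> <b/> </root>')
root = doc.root

root.each_element do |e|
  # Parent#index is 0-based over all children (text nodes included);
  # index_in_parent is that value plus one.
  p [e.name, root.index(e), e.index_in_parent]
end
```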

Siblings

Next Element

Use method REXML::Element#next_element to retrieve the first following sibling that is itself an element (nil if there is none):

ele = doc.root.elements[1]
while ele do
  p [ele.class, ele]
  ele = ele.next_element
end
p ele

Output:

[REXML::Element, <book category='cooking'> ... </>]
[REXML::Element, <book category='children'> ... </>]
[REXML::Element, <book category='web'> ... </>]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
nil
Previous Element

Use method REXML::Element#previous_element to retrieve the first preceding sibling that is itself an element (nil if there is none):

ele = doc.root.elements[4]
while ele do
  p [ele.class, ele]
  ele = ele.previous_element
end
p ele

Output:

[REXML::Element, <book category='web' cover='paperback'> ... </>]
[REXML::Element, <book category='web'> ... </>]
[REXML::Element, <book category='children'> ... </>]
[REXML::Element, <book category='cooking'> ... </>]
nil
Next Node

Use included method REXML::Node#next_sibling_node (or its alias next_sibling) to retrieve the first following node regardless of its class:

node = doc.root.children[0]
while node do
  p [node.class, node]
  node = node.next_sibling
end
p node

Output:

[REXML::Text, "\n\n"]
[REXML::Element, <book category='cooking'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='children'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
[REXML::Text, "\n\n"]
nil
Previous Node

Use included method REXML::Node#previous_sibling_node (or its alias previous_sibling) to retrieve the first preceding node regardless of its class:

node = doc.root.children[-1]
while node do
  p [node.class, node]
  node = node.previous_sibling
end
p node

Output:

[REXML::Text, "\n\n"]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='children'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='cooking'> ... </>]
[REXML::Text, "\n\n"]
nil

Children

Child Count

Use inherited method REXML::Parent#size to retrieve the count of nodes (of all types) in the element:

doc.root.size # => 9
Child Nodes

Use inherited method REXML::Parent#children to retrieve an array of the child nodes (of all types):

doc.root.children # =>
                  # ["\n\n",
                  #  <book category='cooking'> ... </>,
                  #  "\n\n",
                  #  <book category='children'> ... </>,
                  #  "\n\n",
                  #  <book category='web'> ... </>,
                  #  "\n\n",
                  #  <book category='web' cover='paperback'> ... </>,
                  #  "\n\n"]
Child at Index

Use method REXML::Element#[] to retrieve the child at a given numerical index, or nil if there is no such child:

doc.root[0]  # => "\n\n"
doc.root[1]  # => <book category='cooking'> ... </>
doc.root[7]  # => <book category='web' cover='paperback'> ... </>
doc.root[8]  # => "\n\n"

doc.root[-1] # => "\n\n"
doc.root[-2] # => <book category='web' cover='paperback'> ... </>

doc.root[50] # => nil
Index of Child

Use method REXML::Parent#index to retrieve the zero-based child index of the given object; if there is no such child, the method returns #size - 1 (not nil):

ele = doc.root     # => <bookstore> ... </>
ele.index(ele[0])  # => 0
ele.index(ele[1])  # => 1
ele.index(ele[7])  # => 7
ele.index(ele[8])  # => 8

ele.index(ele[-1]) # => 8
ele.index(ele[-2]) # => 7

ele.index(ele[50]) # => 8
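Because Parent#index cannot distinguish the last child from a node that is not a child at all, a membership-aware lookup can be useful. The helper child_index_or_nil below is hypothetical (not part of REXML); it searches the children array by object identity and returns nil when the node is absent.

```ruby
require 'rexml/document'

# Hypothetical helper: 0-based index of +node+ among +parent+'s children,
# or nil if +node+ is not one of them (compared by object identity).
def child_index_or_nil(parent, node)
  parent.children.index { |child| child.equal?(node) }
end

parent   = REXML::Element.new('root')
first    = parent.add_element('a')
second   = parent.add_element('b')
stranger = REXML::Element.new('c') # never added to parent

p parent.index(stranger)               # falls back to size - 1
p child_index_or_nil(parent, second)
p child_index_or_nil(parent, stranger)
```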
Element Children

Use method REXML::Element#has_elements? to retrieve whether the element has element children:

doc.root.has_elements?                  # => true
REXML::Element.new('foo').has_elements? # => false

Use method REXML::Element#elements to retrieve the REXML::Elements object containing the element children:

eles = doc.root.elements
eles      # => #<REXML::Elements:0x000001ee2848e960 @element=<bookstore> ... </>>
eles.size # => 4
eles.each {|e| p [e.class, e] }

Output:

[REXML::Element, <book category='cooking'> ... </>]
[REXML::Element, <book category='children'> ... </>]
[REXML::Element, <book category='web'> ... </>]
[REXML::Element, <book category='web' cover='paperback'> ... </>]

In this example, all the element children of the root element have the same name, 'book', but that is not true of all documents; a root element (or any other element) may have any mixture of child elements.

CDATA Children

Use method REXML::Element#cdatas to retrieve a frozen array of CDATA children:

my_xml = <<-EOT
  <root>
    <![CDATA[foo]]>
    <![CDATA[bar]]>
  </root>
EOT
my_doc = REXML::Document.new(my_xml)
cdatas = my_doc.root.cdatas
cdatas.frozen?              # => true
cdatas.map {|cd| cd.class } # => [REXML::CData, REXML::CData]
Comment Children

Use method REXML::Element#comments to retrieve a frozen array of comment children:

my_xml = <<-EOT
  <root>
    <!--foo-->
    <!--bar-->
  </root>
EOT
my_doc = REXML::Document.new(my_xml)
comments = my_doc.root.comments
comments.frozen?            # => true
comments.map {|c| c.class } # => [REXML::Comment, REXML::Comment]
comments.map {|c| c.to_s }  # => ["foo", "bar"]
Processing Instruction Children

Use method REXML::Element#instructions to retrieve a frozen array of processing instruction children:

my_xml = <<-EOT
  <root>
    <?target0 foo?>
    <?target1 bar?>
  </root>
EOT
my_doc = REXML::Document.new(my_xml)
instrs = my_doc.root.instructions
instrs.frozen?            # => true
instrs.map {|i| i.class } # => [REXML::Instruction, REXML::Instruction]
instrs.map {|i| i.to_s }  # => ["<?target0 foo?>", "<?target1 bar?>"]
Text Children

Use method REXML::Element#has_text? to retrieve whether the element has text children:

doc.root.has_text?                  # => true
REXML::Element.new('foo').has_text? # => false

Use method REXML::Element#texts to retrieve a frozen array of text children:

my_xml = '<root><a/>text<b/>more<c/></root>'
my_doc = REXML::Document.new(my_xml)
texts = my_doc.root.texts
texts.frozen?            # => true
texts.map {|t| t.class } # => [REXML::Text, REXML::Text]
texts.map {|t| t.to_s }  # => ["text", "more"]
Parenthood

Use inherited method REXML::Parent#parent? to retrieve whether the element is a parent; for an element it always returns true. Nodes that cannot have children, such as REXML::Text objects, return false.

doc.root.parent? # => true

Element Attributes

Use method REXML::Element#has_attributes? to return whether the element has attributes:

ele = doc.root           # => <bookstore> ... </>
ele.has_attributes?      # => false
ele = ele.elements.first # => <book category='cooking'> ... </>
ele.has_attributes?      # => true

Use method REXML::Element#attributes to return the REXML::Attributes object (a Hash subclass) containing the attributes for the element. Each hash key is a string attribute name; each hash value is an REXML::Attribute object.

ele = doc.root                  # => <bookstore> ... </>
attrs = ele.attributes          # => {}

ele = ele.elements.first        # => <book category='cooking'> ... </>
attrs = ele.attributes          # => {"category"=>category='cooking'}
attrs.size                      # => 1
attr_name = attrs.keys.first    # => "category"
attr_name.class                 # => String
attr_value = attrs.values.first # => category='cooking'
attr_value.class                # => REXML::Attribute

Use method REXML::Element#[] to retrieve the string value for a given attribute, which may be given as either a string or a symbol:

ele = doc.root.elements.first # => <book category='cooking'> ... </>
attr_value = ele['category']  # => "cooking"
attr_value.class              # => String
ele['nosuch']                  # => nil

Use method REXML::Element#attribute to retrieve the REXML::Attribute object for a named attribute, optionally in a given namespace:

my_xml = "<root xmlns:a='a' a:x='a:x' x='x'/>"
my_doc = REXML::Document.new(my_xml)
my_doc.root.attribute("x")      # => x='x'
my_doc.root.attribute("x", "a") # => a:x='a:x'

Whitespace

Use method REXML::Element#ignore_whitespace_nodes to determine whether whitespace nodes were ignored when the XML was parsed; returns true if so, nil otherwise.

Use method REXML::Element#whitespace to determine whether whitespace is respected for the element; returns true if so, false otherwise.
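A sketch of how the parsing context drives both methods, using a hypothetical XML string: whitespace is respected by default, :compress_whitespace set to :all turns it off, and :ignore_whitespace_nodes set to :all turns whitespace-node ignoring on.

```ruby
require 'rexml/document'

xml = '<root><a/></root>'

default_doc = REXML::Document.new(xml)
compressed  = REXML::Document.new(xml, { compress_whitespace: :all })
ignoring    = REXML::Document.new(xml, { ignore_whitespace_nodes: :all })

p default_doc.root.whitespace            # respected by default
p compressed.root.whitespace             # turned off by :compress_whitespace
p ignoring.root.ignore_whitespace_nodes  # turned on by :ignore_whitespace_nodes
```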

Namespaces

Use method REXML::Element#namespace to retrieve the string namespace URI for the element, which may derive from one of its ancestors:

xml_string = <<-EOT
  <root>
     <a xmlns='1' xmlns:y='2'>
       <b/>
       <c xmlns:z='3'/>
     </a>
  </root>
EOT
d = Document.new(xml_string)
b = d.elements['//b']
b.namespace      # => "1"
b.namespace('y') # => "2"
b.namespace('nosuch') # => nil

Use method REXML::Element#namespaces to retrieve a hash of all defined namespaces in the element and its ancestors:

xml_string = <<-EOT
  <root>
     <a xmlns:x='1' xmlns:y='2'>
       <b/>
       <c xmlns:z='3'/>
     </a>
  </root>
EOT
d = Document.new(xml_string)
d.elements['//a'].namespaces # => {"x"=>"1", "y"=>"2"}
d.elements['//b'].namespaces # => {"x"=>"1", "y"=>"2"}
d.elements['//c'].namespaces # => {"x"=>"1", "y"=>"2", "z"=>"3"}

Use method REXML::Element#prefixes to retrieve an array of the string prefixes (names) of all defined namespaces in the element and its ancestors:

xml_string = <<-EOT
  <root>
     <a xmlns:x='1' xmlns:y='2'>
       <b/>
       <c xmlns:z='3'/>
     </a>
  </root>
EOT
d = Document.new(xml_string, {compress_whitespace: :all})
d.elements['//a'].prefixes # => ["x", "y"]
d.elements['//b'].prefixes # => ["x", "y"]
d.elements['//c'].prefixes # => ["x", "y", "z"]

Traversing

You can use certain methods to traverse children of the element. Each child that meets given criteria is yielded to the given block.

Traverse All Children

Use inherited method REXML::Parent#each (or its alias each_child) to traverse all children of the element:

doc.root.each {|child| p [child.class, child] }

Output:

[REXML::Text, "\n\n"]
[REXML::Element, <book category='cooking'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='children'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
[REXML::Text, "\n\n"]
Traverse Element Children

Use method REXML::Element#each_element to traverse only the element children of the element:

doc.root.each_element {|e| p [e.class, e] }

Output:

[REXML::Element, <book category='cooking'> ... </>]
[REXML::Element, <book category='children'> ... </>]
[REXML::Element, <book category='web'> ... </>]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
Traverse Element Children with Attribute

Use method REXML::Element#each_element_with_attribute with the single argument attr_name to traverse each element child that has the given attribute:

my_doc = Document.new '<a><b id="1"/><c id="2"/><d id="1"/><e/></a>'
my_doc.root.each_element_with_attribute('id') {|e| p [e.class, e] }

Output:

[REXML::Element, <b id='1'/>]
[REXML::Element, <c id='2'/>]
[REXML::Element, <d id='1'/>]

Use the same method with a second argument value to traverse each element child that has the given attribute and value:

my_doc.root.each_element_with_attribute('id', '1') {|e| p [e.class, e] }

Output:

[REXML::Element, <b id='1'/>]
[REXML::Element, <d id='1'/>]

Use the same method with a third argument max to traverse no more than the given number of element children:

my_doc.root.each_element_with_attribute('id', '1', 1) {|e| p [e.class, e] }

Output:

[REXML::Element, <b id='1'/>]

Use the same method with a fourth argument xpath to traverse only those element children that match the given xpath:

my_doc.root.each_element_with_attribute('id', '1', 2, '//d') {|e| p [e.class, e] }

Output:

[REXML::Element, <d id='1'/>]
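For comparison, the attribute-and-value filter can also be expressed as an XPath predicate with REXML::XPath.match. This sketch rebuilds the same small document as above and checks that both approaches select the same children.

```ruby
require 'rexml/document'

my_doc = REXML::Document.new '<a><b id="1"/><c id="2"/><d id="1"/><e/></a>'

via_method = []
my_doc.root.each_element_with_attribute('id', '1') { |e| via_method << e.name }

# Same filter as an XPath predicate, evaluated relative to the root element.
via_xpath = REXML::XPath.match(my_doc.root, "*[@id='1']").map(&:name)

p via_method
p via_xpath
```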
Traverse Element Children with Text

Use method REXML::Element#each_element_with_text with no arguments to traverse those element children that have text:

my_doc = Document.new '<a><b>b</b><c>b</c><d>d</d><e/></a>'
my_doc.root.each_element_with_text {|e| p [e.class, e] }

Output:

[REXML::Element, <b> ... </>]
[REXML::Element, <c> ... </>]
[REXML::Element, <d> ... </>]

Use the same method with the single argument text to traverse those element children that have exactly that text:

my_doc.root.each_element_with_text('b') {|e| p [e.class, e] }

Output:

[REXML::Element, <b> ... </>]
[REXML::Element, <c> ... </>]

Use the same method with additional second argument max to traverse no more than the given number of element children:

my_doc.root.each_element_with_text('b', 1) {|e| p [e.class, e] }

Output:

[REXML::Element, <b> ... </>]

Use the same method with additional third argument xpath to traverse only those element children that also match the given xpath:

my_doc.root.each_element_with_text('b', 2, '//c') {|e| p [e.class, e] }

Output:

[REXML::Element, <c> ... </>]
Traverse Element Children’s Indexes

Use inherited method REXML::Parent#each_index to traverse all children’s indexes (not just those of element children):

doc.root.each_index {|i| print i }

Output:

012345678
Traverse Children Recursively

Use included method REXML::Node#each_recursive to traverse all descendant elements recursively, in document order; text and other non-element nodes are not yielded:

doc.root.each_recursive {|child| p [child.class, child] }

Output:

[REXML::Element, <book category='cooking'> ... </>]
[REXML::Element, <title lang='en'> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <year> ... </>]
[REXML::Element, <price> ... </>]
[REXML::Element, <book category='children'> ... </>]
[REXML::Element, <title lang='en'> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <year> ... </>]
[REXML::Element, <price> ... </>]
[REXML::Element, <book category='web'> ... </>]
[REXML::Element, <title lang='en'> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <year> ... </>]
[REXML::Element, <price> ... </>]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
[REXML::Element, <title lang='en'> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <year> ... </>]
[REXML::Element, <price> ... </>]
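A sketch that uses each_recursive to tally descendant elements by name, and find_first_recursive to stop at the first match. The XML string is hypothetical; note that its text node is skipped by both methods.

```ruby
require 'rexml/document'

doc = REXML::Document.new('<a><b><c/></b>text<b/></a>')

# each_recursive yields descendant elements only; the text node is skipped.
counts = Hash.new(0)
doc.root.each_recursive { |e| counts[e.name] += 1 }
p counts

# find_first_recursive returns the first element the block accepts.
first_c = doc.root.find_first_recursive { |e| e.name == 'c' }
p first_c
```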

Searching

You can use certain methods to search among the descendants of an element.

Use method REXML::Element#get_elements to retrieve an array of the elements that match the given xpath:

xml_string = <<-EOT
<root>
  <a level='1'>
    <a level='2'/>
  </a>
</root>
EOT
d = Document.new(xml_string)
d.root.get_elements('//a') # => [<a level='1'> ... </>, <a level='2'/>]

Use method REXML::Element#get_text with no argument to retrieve the first text node that is a child of the element:

my_doc = Document.new "<p>some text <b>this is bold!</b> more text</p>"
text_node = my_doc.root.get_text
text_node.class # => REXML::Text
text_node.to_s  # => "some text "

Use the same method with an argument (an xpath string or an integer index) to retrieve the first text node of the first element child that matches:

my_doc.root.get_text(1) # => "this is bold!"

Use method REXML::Element#text with no argument to retrieve, as a string, the value of the first text node that is a child of the element:

my_doc = Document.new "<p>some text <b>this is bold!</b> more text</p>"
text_node = my_doc.root.text
text_node.class # => String
text_node       # => "some text "

Use the same method with an argument (an xpath string or an integer index) to retrieve the value of the first text node of the first element child that matches:

my_doc.root.text(1) # => "this is bold!"
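One practical difference between the two methods shows up with entities. In this sketch (hypothetical XML), the REXML::Text node returned by get_text keeps the escaped form in to_s, while its value, and the string returned by text, are unescaped.

```ruby
require 'rexml/document'

d = REXML::Document.new('<p>a &amp; b</p>')

text_node = d.root.get_text
p text_node.to_s    # the text as stored, with the entity still escaped
p text_node.value   # the unescaped value
p d.root.text       # Element#text also returns the unescaped value
```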

Use included method REXML::Node#find_first_recursive to retrieve the first descendant element for which the given block returns a truthy value, or nil if none:

doc.root.find_first_recursive do |ele|
  ele.name == 'price'
end # => <price> ... </>
doc.root.find_first_recursive do |ele|
  ele.name == 'nosuch'
end # => nil

Editing

Editing a Document

Creating a Document

Create a new document with method REXML::Document::new:

doc = Document.new(source_string)
empty_doc = REXML::Document.new
Adding to the Document

Add an XML declaration with method REXML::Document#add and an argument of type REXML::XMLDecl:

my_doc = Document.new
my_doc.xml_decl.to_s # => ""
my_doc.add(XMLDecl.new('2.0'))
my_doc.xml_decl.to_s # => "<?xml version='2.0'?>"

Add a document type with method REXML::Document#add and an argument of type REXML::DocType:

my_doc = Document.new
my_doc.doctype.to_s # => ""
my_doc.add(DocType.new('foo'))
my_doc.doctype.to_s # => "<!DOCTYPE foo>"

Add a node of any other REXML type with method REXML::Document#add and an argument that is not of type REXML::XMLDecl or REXML::DocType:

my_doc = Document.new
my_doc.add(Element.new('foo'))
my_doc.to_s # => "<foo/>"

Add an existing element as the root element with method REXML::Document#add_element:

ele = Element.new('foo')
my_doc = Document.new
my_doc.add_element(ele)
my_doc.root # => <foo/>

Create and add an element as the root element with method REXML::Document#add_element:

my_doc = Document.new
my_doc.add_element('foo')
my_doc.root # => <foo/>
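Putting these methods together, the sketch below builds a small document from scratch and serializes it with to_s. The element names and values are illustrative only (they echo books.xml); the exact serialized layout is not asserted beyond the fragments checked.

```ruby
require 'rexml/document'

# Build a tiny document from scratch (names and values are illustrative).
doc = REXML::Document.new
doc.add(REXML::XMLDecl.new('1.0', 'UTF-8'))
bookstore = doc.add_element('bookstore')
book = bookstore.add_element('book', { 'category' => 'cooking' })
book.add_element('title', { 'lang' => 'en' }).add_text('Everyday Italian')
book.add_element('price').add_text('30.00')

xml = doc.to_s
puts xml
```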

Editing an Element

Creating an Element

Create a new element with method REXML::Element::new:

ele = Element.new('foo') # => <foo/>

Setting Element Properties

Set the context for an element with method REXML::Element#context= (see Element Context):

ele.context # => nil
ele.context = {ignore_whitespace_nodes: :all}
ele.context # => {:ignore_whitespace_nodes=>:all}

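Set the parent for an element with inherited method REXML::Child#parent= (this sets only the element's parent reference; it does not add the element to the new parent's children):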

ele.parent # => nil
ele.parent = Element.new('bar')
ele.parent # => <bar/>
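A sketch contrasting parent= with add_element (element names are hypothetical): assigning the parent leaves the new parent's children unchanged, while add_element both sets the parent and appends the child.

```ruby
require 'rexml/document'

bar = REXML::Element.new('bar')
ele = REXML::Element.new('foo')

# Assigning the parent sets only the child's back-reference...
ele.parent = bar
before = bar.children.size

# ...whereas add_element sets the parent and appends the child.
bar.add_element(ele)
after = bar.children.size

p [before, after, ele.parent.equal?(bar)]
```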

Set the text for an element with method REXML::Element#text=:

ele.text # => nil
ele.text = 'bar'
ele.text # => "bar"

Adding to an Element

Add a node as the last child with inherited method REXML::Parent#add (or its alias push):

ele = Element.new('foo') # => <foo/>
ele.push(Text.new('bar'))
ele.push(Element.new('baz'))
ele.children # => ["bar", <baz/>]

Add a node as the first child with inherited method REXML::Parent#unshift:

ele = Element.new('foo') # => <foo/>
ele.unshift(Element.new('bar'))
ele.unshift(Text.new('baz'))
ele.children # => ["baz", <bar/>]

Add an element as the last child with method REXML::Element#add_element:

ele = Element.new('foo') # => <foo/>
ele.add_element('bar')
ele.add_element(Element.new('baz'))
ele.children # => [<bar/>, <baz/>]

Add a text node as the last child with method REXML::Element#add_text:

ele = Element.new('foo') # => <foo/>
ele.add_text('bar')
ele.add_text(Text.new('baz'))
ele.children # => ["bar", "baz"]
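The argument type matters here: when add_text is given a String and the last child is already a text node, the string is appended to that node instead of creating a new one. A sketch with hypothetical names:

```ruby
require 'rexml/document'

merged = REXML::Element.new('foo')
merged.add_text('bar')
merged.add_text('baz')                     # String: appended to the trailing text node
p merged.children

separate = REXML::Element.new('foo')
separate.add_text('bar')
separate.add_text(REXML::Text.new('baz'))  # Text object: added as a new child
p separate.children
```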

Insert a node before a given node with method REXML::Parent#insert_before:

ele = Element.new('foo') # => <foo/>
ele.add_text('bar')
ele.add_text(Text.new('baz'))
ele.children    # => ["bar", "baz"]
target = ele[1] # => "baz"
ele.insert_before(target, Text.new('bat'))
ele.children    # => ["bar", "bat", "baz"]

Insert a node after a given node with method REXML::Parent#insert_after:

ele = Element.new('foo') # => <foo/>
ele.add_text('bar')
ele.add_text(Text.new('baz'))
ele.children    # => ["bar", "baz"]
target = ele[0] # => "bar"
ele.insert_after(target, Text.new('bat'))
ele.children    # => ["bar", "bat", "baz"]

Add an attribute with method REXML::Element#add_attribute:

ele = Element.new('foo') # => <foo/>
ele.add_attribute('bar', 'baz')
ele.add_attribute(Attribute.new('bat', 'bam'))
ele.attributes # => {"bar"=>bar='baz', "bat"=>bat='bam'}

Add multiple attributes with method REXML::Element#add_attributes:

ele = Element.new('foo') # => <foo/>
ele.add_attributes({'bar' => 'baz', 'bat' => 'bam'})
ele.add_attributes([['ban', 'bap'], ['bah', 'bad']])
ele.attributes # => {"bar"=>bar='baz', "bat"=>bat='bam', "ban"=>ban='bap', "bah"=>bah='bad'}

Add a namespace with method REXML::Element#add_namespace:

ele = Element.new('foo') # => <foo/>
ele.add_namespace('bar')
ele.add_namespace('baz', 'bat')
ele.namespaces # => {"xmlns"=>"bar", "baz"=>"bat"}

Deleting from an Element

Delete a specific child object with inherited method REXML::Parent#delete:

ele = Element.new('foo') # => <foo/>
ele.add_element('bar')
ele.add_text('baz')
ele.children             # => [<bar/>, "baz"]
target = ele[1]          # => "baz"
ele.delete(target)       # => "baz"
ele.children             # => [<bar/>]
target = ele[0]          # => <bar/>
ele.delete(target)       # => <bar/>
ele.children             # => []

Delete a child at a specific index with inherited method REXML::Parent#delete_at:

ele = Element.new('foo') # => <foo/>
ele.add_element('bar')
ele.add_text('baz')
ele.children             # => [<bar/>, "baz"]
ele.delete_at(1)
ele.children             # => [<bar/>]
ele.delete_at(0)
ele.children             # => []

Delete all children meeting a specified criterion with inherited method REXML::Parent#delete_if:

ele = Element.new('foo') # => <foo/>
ele.add_element('bar')
ele.add_text('baz')
ele.add_element('bat')
ele.add_text('bam')
ele.children             # => [<bar/>, "baz", <bat/>, "bam"]
ele.delete_if {|child| child.instance_of?(Text) }
ele.children # => [<bar/>, <bat/>]

Delete an element at a specific 1-based index with method REXML::Element#delete_element:

ele = Element.new('foo') # => <foo/>
ele.add_element('bar')
ele.add_text('baz')
ele.add_element('bat')
ele.add_text('bam')
ele.children             # => [<bar/>, "baz", <bat/>, "bam"]
ele.delete_element(2)    # => <bat/>
ele.children             # => [<bar/>, "baz", "bam"]
ele.delete_element(1)    # => <bar/>
ele.children             # => ["baz", "bam"]

Delete a specific element with the same method:

ele = Element.new('foo')   # => <foo/>
ele.add_element('bar')
ele.add_text('baz')
ele.add_element('bat')
ele.add_text('bam')
ele.children               # => [<bar/>, "baz", <bat/>, "bam"]
target = ele.elements[2]   # => <bat/>
ele.delete_element(target) # => <bat/>
ele.children               # => [<bar/>, "baz", "bam"]

Delete an element matching an xpath using the same method:

ele = Element.new('foo')    # => <foo/>
ele.add_element('bar')
ele.add_text('baz')
ele.add_element('bat')
ele.add_text('bam')
ele.children                # => [<bar/>, "baz", <bat/>, "bam"]
ele.delete_element('./bat') # => <bat/>
ele.children                # => [<bar/>, "baz", "bam"]
ele.delete_element('./bar') # => <bar/>
ele.children                # => ["baz", "bam"]
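When several children match an XPath argument, a single delete_element call leaves the other matches in place. To remove every match, Elements#delete_all (reached through REXML::Element#elements) takes the same kind of XPath expression and returns the removed elements as an array. A short sketch:

```ruby
require 'rexml/document'
include REXML

ele = Element.new('foo')
3.times { ele.add_element('bat') }
ele.add_element('bar')

ele.delete_element('bat')                  # removes only the first <bat/>
after_first = ele.elements.size
p after_first                              # => 3

removed = ele.elements.delete_all('bat')   # removes all remaining <bat/> children
p removed.size                             # => 2
p ele.to_s                                 # => "<foo><bar/></foo>"
```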

Delete an attribute by name with method REXML::Element#delete_attribute:

ele = Element.new('foo') # => <foo/>
ele.add_attributes({'bar' => 'baz', 'bam' => 'bat'})
ele.attributes           # => {"bar"=>bar='baz', "bam"=>bam='bat'}
ele.delete_attribute('bam')
ele.attributes           # => {"bar"=>bar='baz'}
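Deleting an attribute that is not present does no harm. Assigning nil through the element's attributes hash is another way to remove an attribute. A minimal sketch:

```ruby
require 'rexml/document'
include REXML

ele = Element.new('foo')
ele.add_attributes({'bar' => 'baz', 'bam' => 'bat'})

ele.delete_attribute('nope')    # no such attribute: nothing happens
ele.attributes['bar'] = nil     # assigning nil removes the attribute
p ele.to_s                      # => "<foo bam='bat'/>"
```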

Delete a namespace with method REXML::Element#delete_namespace:

ele = Element.new('foo') # => <foo/>
ele.add_namespace('bar')
ele.add_namespace('baz', 'bat')
ele.namespaces           # => {"xmlns"=>"bar", "baz"=>"bat"}
ele.delete_namespace('xmlns')
ele.namespaces           # => {"baz"=>"bat"}
ele.delete_namespace('baz')
ele.namespaces           # => {}
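Called with no argument, delete_namespace removes the default (`xmlns`) namespace. Because namespaces declared on an ancestor are visible from its descendants, the deletion also changes what a child element reports. A sketch with an illustrative child element:

```ruby
require 'rexml/document'
include REXML

ele = Element.new('foo')
ele.add_namespace('bar')           # default namespace: xmlns='bar'
ele.add_namespace('baz', 'bat')    # prefixed namespace: xmlns:baz='bat'
child = ele.add_element('child')

before = child.namespaces          # inherited from ele
p before                           # => {"xmlns"=>"bar", "baz"=>"bat"}

ele.delete_namespace               # no argument: removes the default namespace
p child.namespaces                 # => {"baz"=>"bat"}
```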

Remove an element from its parent with inherited method REXML::Child#remove:

ele = Element.new('foo')    # => <foo/>
parent = Element.new('bar') # => <bar/>
parent.add_element(ele)     # => <foo/>
parent.children.size        # => 1
ele.remove                  # => <foo/>
parent.children.size        # => 0
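Because remove returns the detached element, you can attach that element somewhere else afterward, which moves it. A short sketch, with illustrative parent names:

```ruby
require 'rexml/document'
include REXML

old_parent = Element.new('old')
new_parent = Element.new('new')
ele = old_parent.add_element('foo')   # given a name, add_element creates and returns the child

new_parent.add_element(ele.remove)    # detach from old_parent, then attach to new_parent
p old_parent.to_s                     # => "<old/>"
p new_parent.to_s                     # => "<new><foo/></new>"
```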

Replacing Nodes

Replace the node at a given 0-based index with inherited method REXML::Parent#[]=:

ele = Element.new('foo') # => <foo/>
ele.add_element('bar')
ele.add_text('baz')
ele.add_element('bat')
ele.add_text('bam')
ele.children             # => [<bar/>, "baz", <bat/>, "bam"]
ele[2] = Text.new('bad') # => "bad"
ele.children             # => [<bar/>, "baz", "bad", "bam"]
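Parent#[]= indexes the children list directly. Assuming that list behaves like a plain Ruby Array, a negative index counts from the end. A sketch that replaces the last child:

```ruby
require 'rexml/document'
include REXML

ele = Element.new('foo')
ele.add_element('bar')
ele.add_text('baz')
ele.add_element('bat')
ele.add_text('bam')

ele[-1] = Text.new('end')   # replaces the last child, the text "bam"
p ele.to_s                  # => "<foo><bar/>baz<bat/>end</foo>"
```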

Replace a given node with another node with inherited method REXML::Parent#replace_child:

ele = Element.new('foo') # => <foo/>
ele.add_element('bar')
ele.add_text('baz')
ele.add_element('bat')
ele.add_text('bam')
ele.children             # => [<bar/>, "baz", <bat/>, "bam"]
target = ele[2]          # => <bat/>
ele.replace_child(target, Text.new('bah'))
ele.children             # => [<bar/>, "baz", "bah", "bam"]
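The replacement can also be an element, and the replaced node is detached from the parent. This sketch looks up the target by name with Element#elements and swaps in a hypothetical `<qux/>`:

```ruby
require 'rexml/document'
include REXML

ele = Element.new('foo')
ele.add_element('bar')
ele.add_element('bat')

target = ele.elements['bat']                    # find the child by name (an XPath)
ele.replace_child(target, Element.new('qux'))
p ele.to_s                                      # => "<foo><bar/><qux/></foo>"
p target.parent                                 # => nil
```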

Replace the receiver node itself with a given node with inherited method REXML::Child#replace_with:

ele = Element.new('foo') # => <foo/>
ele.add_element('bar')
ele.add_text('baz')
ele.add_element('bat')
ele.add_text('bam')
ele.children             # => [<bar/>, "baz", <bat/>, "bam"]
target = ele[2]          # => <bat/>
target.replace_with(Text.new('bah'))
ele.children             # => [<bar/>, "baz", "bah", "bam"]
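REXML::XPath.match returns an array of matches. Combined with replace_with, it can substitute every matching node in a parsed document. Because the matches are collected first, the tree is not changed while it is being searched. A sketch with made-up element names:

```ruby
require 'rexml/document'
include REXML

doc = Document.new('<root><bat/><x/><bat/></root>')

# Collect all <bat/> elements, then replace each one.
XPath.match(doc, '//bat').each do |node|
  node.replace_with(Element.new('qux'))
end
p doc.root.to_s   # => "<root><qux/><x/><qux/></root>"
```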

Cloning

Create a shallow clone of an element with method REXML::Element#clone. The clone contains the name and attributes, but not the parent or children:

ele = Element.new('foo')
ele.add_attributes({'bar' => 0, 'baz' => 1})
ele.clone # => <foo bar='0' baz='1'/>
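The clone has its own attribute set, so changing it does not affect the original. The clone also starts with no parent and no children. A minimal sketch:

```ruby
require 'rexml/document'
include REXML

ele = Element.new('foo')
ele.add_attribute('bar', '0')
ele.add_element('child')

copy = ele.clone
p copy.to_s                    # => "<foo bar='0'/>" (the child is not copied)
copy.add_attribute('bar', '9') # change the clone only
p ele.attributes['bar']        # => "0"
```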

Create a shallow clone of a document with method REXML::Document#clone. The XML declaration is copied; the document type and root element are not cloned:

my_xml = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE foo><root/>'
my_doc = Document.new(my_xml)
clone_doc = my_doc.clone

my_doc.xml_decl         # => <?xml ... ?>
clone_doc.xml_decl      # => <?xml ... ?>

my_doc.doctype.to_s     # => "<!DOCTYPE foo>"
clone_doc.doctype.to_s  # => ""

my_doc.root             # => <root/>
clone_doc.root          # => nil
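For a full copy of a document, inherited method REXML::Parent#deep_clone also copies the children, so the document type and root element come along. The sketch below repeats the `my_xml` string from above so that it stands alone. It assumes the deep copy rebuilds the document through Document#add, as current REXML appears to do.

```ruby
require 'rexml/document'
include REXML

my_xml = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE foo><root/>'
my_doc = Document.new(my_xml)
full_copy = my_doc.deep_clone   # copies children, unlike Document#clone

p full_copy.doctype.name        # => "foo"
p full_copy.root                # => <root/>
```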

Create a deep clone of a document or element with inherited method REXML::Parent#deep_clone. All descendant nodes and their attributes are copied:

doc.to_s.size       # => 825
doc_clone = doc.deep_clone
doc_clone.to_s.size # => 825
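Because the children are copied as well, changes made inside a deep clone stay there. A small sketch that modifies a child of the copy:

```ruby
require 'rexml/document'
include REXML

ele = Element.new('foo')
ele.add_element('bar')
ele.add_text('baz')

copy = ele.deep_clone
copy.elements['bar'].add_attribute('x', '1')   # modify the copy's child only
p copy.to_s   # => "<foo><bar x='1'/>baz</foo>"
p ele.to_s    # => "<foo><bar/>baz</foo>"
```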

Writing the Document

Write a document to an IO stream (defaults to $stdout) with method REXML::Document#write:

doc.write

Output:

<?xml version='1.0' encoding='UTF-8'?>
<bookstore>

<book category='cooking'>
  <title lang='en'>Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
</book>

<book category='children'>
  <title lang='en'>Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>

<book category='web'>
  <title lang='en'>XQuery Kick Start</title>
  <author>James McGovern</author>
  <author>Per Bothner</author>
  <author>Kurt Cagle</author>
  <author>James Linn</author>
  <author>Vaidyanathan Nagarajan</author>
  <year>2003</year>
  <price>49.99</price>
</book>

<book category='web' cover='paperback'>
  <title lang='en'>Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>

</bookstore>
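Document#write can also send its output to a String, because the default formatter only appends with `<<`. For re-indented output, you can use REXML::Formatters::Pretty instead. The sketch below turns on the formatter's compact mode, which keeps short text-only elements on one line. It uses a small made-up document rather than books.xml.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'
include REXML

doc = Document.new('<root><a>1</a><b/></root>')

plain = +''
doc.write(plain)                        # default formatting, written into a String
p plain                                 # => "<root><a>1</a><b/></root>"

formatter = Formatters::Pretty.new(2)   # indent nested elements by 2 spaces
formatter.compact = true                # keep <a>1</a> on a single line
pretty = +''
formatter.write(doc, pretty)
puts pretty
```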