lxml Cheatsheet

Get Data

lxml Version
Get from a string:

from lxml import etree

xml_root = etree.fromstring(data)
lxmldata = xml_root  # alias for the same tree; the XPath examples below use this name

or from a file:

with open(opffile, 'rb') as f:  # binary mode lets lxml honor the file's own encoding declaration
    tree = etree.parse(f)
xml_root = tree.getroot()
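A minimal sketch of the file route, assuming a throwaway file written to a temporary directory (the file name and its contents are made up for illustration). It also shows that etree.parse accepts a path directly, so the explicit open() is optional:

```python
import os
import tempfile

from lxml import etree

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical sample file; any well-formed XML file works the same way.
    opffile = os.path.join(tmpdir, "sample.xml")
    with open(opffile, "wb") as f:
        f.write(b"<?xml version='1.0' encoding='UTF-8'?><library><book id='1'/></library>")

    # etree.parse takes a file name as well as an open file object.
    tree = etree.parse(opffile)
    xml_root = tree.getroot()

print(xml_root.tag)  # library
```

Keep the tree object around if you plan to write the file back later, since tree.write() lives on the tree rather than on the root element.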

Soup Version
from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'xml')  # the 'xml' parser needs lxml installed

Target element from data
variable = lxmldata.xpath(INSERT_COMMAND)  # INSERT_COMMAND is a placeholder for an XPath expression string

Serialize XML

xml_string = etree.tostring(xml_root, pretty_print=True).decode()
with open(opffile, 'w', encoding='utf-8') as f:
    f.write(xml_string)

Serialize HTML

html_string = etree.tostring(html_root, pretty_print=True, method="html").decode()
print(html_string)

Or, preferably, just write the xml tree directly:

with open(opffile, 'wb') as f:
    tree.write(f, pretty_print=True, xml_declaration=True, encoding='UTF-8')
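The serialization calls above differ mainly in what they return and in how empty tags are written. The following sketch illustrates this on small inline documents; the sample markup is made up for illustration:

```python
from lxml import etree

root = etree.fromstring("<library><book id='1'><title>Python Programming</title></book></library>")

# Default: tostring returns bytes.
as_bytes = etree.tostring(root, pretty_print=True)

# encoding="unicode" returns a str directly, so no .decode() is needed.
as_text = etree.tostring(root, pretty_print=True, encoding="unicode")

# Ask for an XML declaration explicitly when producing a standalone file.
with_decl = etree.tostring(root, xml_declaration=True, encoding="UTF-8")

# method="html" writes void elements the HTML way (<br>), not the XML way (<br/>).
html_root = etree.HTML("<p>Hello<br>world</p>")
html_out = etree.tostring(html_root, method="html")
xml_out = etree.tostring(html_root)

print(type(as_bytes).__name__, type(as_text).__name__)  # bytes str
print(html_out)  # b'<html><body><p>Hello<br>world</p></body></html>'
```

Note that etree.HTML wraps fragments in html and body elements, which is why they appear in the output.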

There are several ways to extract elements: XPath expressions, find()/findall(), and CSS selectors.

Extract all 'title' elements from the XML

titles = lxmldata.xpath('//title/text()')
print(titles)  # Output: ['Python Programming', 'Mastering XML']

Extract the title of the book with id="1"

book_title = lxmldata.xpath('//book[@id="1"]/title/text()')
print(book_title)  # Output: ['Python Programming']

Get all 'book' ids

book_ids = lxmldata.xpath('//book/@id')
print(book_ids)  # Output: ['1', '2']
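The XPath snippets above never show their input. Here is a self-contained version, assuming a sample document reconstructed from the outputs listed; the library root element and the exact layout are guesses, not the original data:

```python
from lxml import etree

# Hypothetical sample data, inferred from the outputs shown in the snippets.
data = b"""<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book id="1">
    <title>Python Programming</title>
    <author>John Smith</author>
  </book>
  <book id="2">
    <title>Mastering XML</title>
    <author>Jane Doe</author>
  </book>
</library>"""

lxmldata = etree.fromstring(data)

titles = lxmldata.xpath('//title/text()')
book_title = lxmldata.xpath('//book[@id="1"]/title/text()')
book_ids = lxmldata.xpath('//book/@id')

print(titles)      # ['Python Programming', 'Mastering XML']
print(book_title)  # ['Python Programming']
print(book_ids)    # ['1', '2']
```

xpath() always returns a list for node and text queries, even when there is only one match, so index with [0] when a single value is wanted.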

Find the first 'book' element

first_book = xml_root.find('book')
print(first_book.find('title').text)  # Output: Python Programming

Find all 'book' elements

all_books = xml_root.findall('book')
for book in all_books:
    print(book.find('author').text)  # Output: John Smith, Jane Doe
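One thing to watch with find() and findall(): a bare tag name such as 'book' only matches direct children of the element you call it on. The sketch below uses a made-up document with an extra shelf level to show the difference, plus findtext() as a way to avoid calling .text on None:

```python
from lxml import etree

# Hypothetical document where <book> is nested one level deeper.
xml_root = etree.fromstring(
    "<library><shelf><book id='1'><title>Python Programming</title></book></shelf></library>"
)

direct = xml_root.find('book')       # None: book is not a direct child of library
nested = xml_root.find('.//book')    # './/' searches all descendants
all_nested = xml_root.findall('.//book')

# findtext returns the default instead of raising when nothing matches.
subtitle = xml_root.findtext('.//book/subtitle', default='(none)')

print(direct)            # None
print(nested.get('id'))  # 1
print(subtitle)          # (none)
```

If find() might return None, check the result before using .text; otherwise an AttributeError is raised.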

from lxml.cssselect import CSSSelector

Select all 'p' elements with class 'highlight'

sel = CSSSelector('p.highlight')
highlighted_elements = sel(html_root)

Select element by ID

header = CSSSelector('#main')(html_root)

Select elements by attribute

links = CSSSelector('a[href]')(html_root)