Structured Markup Processing Tools¶
Python supports a variety of modules to work with various forms of structured data markup. This includes modules to work with the Standard Generalized Markup Language (SGML) and the Hypertext Markup Language (HTML), and several interfaces for working with the Extensible Markup Language (XML).
html
— HyperText Markup Language supporthtml.parser
— Simple HTML and XHTML parserHTMLParser
- Example HTML Parser Application
HTMLParser
MethodsHTMLParser.feed()
HTMLParser.close()
HTMLParser.reset()
HTMLParser.getpos()
HTMLParser.get_starttag_text()
HTMLParser.handle_starttag()
HTMLParser.handle_endtag()
HTMLParser.handle_startendtag()
HTMLParser.handle_data()
HTMLParser.handle_entityref()
HTMLParser.handle_charref()
HTMLParser.handle_comment()
HTMLParser.handle_decl()
HTMLParser.handle_pi()
HTMLParser.unknown_decl()
- Examples
html.entities
— Definitions of HTML general entities- XML Processing Modules
xml.etree.ElementTree
— The ElementTree XML API- Tutorial
- XPath support
- Reference
- XInclude support
- Reference
- Functions
- Element Objects
Element
Element.tag
Element.text
Element.tail
Element.attrib
Element.clear()
Element.get()
Element.items()
Element.keys()
Element.set()
Element.append()
Element.extend()
Element.find()
Element.findall()
Element.findtext()
Element.insert()
Element.iter()
Element.iterfind()
Element.itertext()
Element.makeelement()
Element.remove()
- ElementTree Objects
- QName Objects
- TreeBuilder Objects
- XMLParser Objects
- XMLPullParser Objects
- Exceptions
xml.dom
— The Document Object Model API- Module Contents
- Objects in the DOM
- DOMImplementation Objects
- Node Objects
Node.nodeType
Node.parentNode
Node.attributes
Node.previousSibling
Node.nextSibling
Node.childNodes
Node.firstChild
Node.lastChild
Node.localName
Node.prefix
Node.namespaceURI
Node.nodeName
Node.nodeValue
Node.hasAttributes()
Node.hasChildNodes()
Node.isSameNode()
Node.appendChild()
Node.insertBefore()
Node.removeChild()
Node.replaceChild()
Node.normalize()
Node.cloneNode()
- NodeList Objects
- DocumentType Objects
- Document Objects
- Element Objects
Element.tagName
Element.getElementsByTagName()
Element.getElementsByTagNameNS()
Element.hasAttribute()
Element.hasAttributeNS()
Element.getAttribute()
Element.getAttributeNode()
Element.getAttributeNS()
Element.getAttributeNodeNS()
Element.removeAttribute()
Element.removeAttributeNode()
Element.removeAttributeNS()
Element.setAttribute()
Element.setAttributeNode()
Element.setAttributeNodeNS()
Element.setAttributeNS()
- Attr Objects
- NamedNodeMap Objects
- Comment Objects
- Text and CDATASection Objects
- ProcessingInstruction Objects
- Exceptions
- Conformance
xml.dom.minidom
— Minimal DOM implementationxml.dom.pulldom
— Support for building partial DOM treesxml.sax
— Support for SAX2 parsersxml.sax.handler
— Base classes for SAX handlersContentHandler
DTDHandler
EntityResolver
ErrorHandler
LexicalHandler
feature_namespaces
feature_namespace_prefixes
feature_string_interning
feature_validation
feature_external_ges
feature_external_pes
all_features
property_lexical_handler
property_declaration_handler
property_dom_node
property_xml_string
all_properties
- ContentHandler Objects
ContentHandler.setDocumentLocator()
ContentHandler.startDocument()
ContentHandler.endDocument()
ContentHandler.startPrefixMapping()
ContentHandler.endPrefixMapping()
ContentHandler.startElement()
ContentHandler.endElement()
ContentHandler.startElementNS()
ContentHandler.endElementNS()
ContentHandler.characters()
ContentHandler.ignorableWhitespace()
ContentHandler.processingInstruction()
ContentHandler.skippedEntity()
- DTDHandler Objects
- EntityResolver Objects
- ErrorHandler Objects
- LexicalHandler Objects
xml.sax.saxutils
— SAX Utilitiesxml.sax.xmlreader
— Interface for XML parsersXMLReader
IncrementalParser
Locator
InputSource
AttributesImpl
AttributesNSImpl
- XMLReader Objects
XMLReader.parse()
XMLReader.getContentHandler()
XMLReader.setContentHandler()
XMLReader.getDTDHandler()
XMLReader.setDTDHandler()
XMLReader.getEntityResolver()
XMLReader.setEntityResolver()
XMLReader.getErrorHandler()
XMLReader.setErrorHandler()
XMLReader.setLocale()
XMLReader.getFeature()
XMLReader.setFeature()
XMLReader.getProperty()
XMLReader.setProperty()
- IncrementalParser Objects
- Locator Objects
- InputSource Objects
- The
Attributes
Interface - The
AttributesNS
Interface
xml.parsers.expat
— Fast XML parsing using ExpatExpatError
error
XMLParserType
ErrorString()
ParserCreate()
- XMLParser Objects
xmlparser.Parse()
xmlparser.ParseFile()
xmlparser.SetBase()
xmlparser.GetBase()
xmlparser.GetInputContext()
xmlparser.ExternalEntityParserCreate()
xmlparser.SetParamEntityParsing()
xmlparser.UseForeignDTD()
xmlparser.buffer_size
xmlparser.buffer_text
xmlparser.buffer_used
xmlparser.ordered_attributes
xmlparser.specified_attributes
xmlparser.ErrorByteIndex
xmlparser.ErrorCode
xmlparser.ErrorColumnNumber
xmlparser.ErrorLineNumber
xmlparser.CurrentByteIndex
xmlparser.CurrentColumnNumber
xmlparser.CurrentLineNumber
xmlparser.XmlDeclHandler()
xmlparser.StartDoctypeDeclHandler()
xmlparser.EndDoctypeDeclHandler()
xmlparser.ElementDeclHandler()
xmlparser.AttlistDeclHandler()
xmlparser.StartElementHandler()
xmlparser.EndElementHandler()
xmlparser.ProcessingInstructionHandler()
xmlparser.CharacterDataHandler()
xmlparser.UnparsedEntityDeclHandler()
xmlparser.EntityDeclHandler()
xmlparser.NotationDeclHandler()
xmlparser.StartNamespaceDeclHandler()
xmlparser.EndNamespaceDeclHandler()
xmlparser.CommentHandler()
xmlparser.StartCdataSectionHandler()
xmlparser.EndCdataSectionHandler()
xmlparser.DefaultHandler()
xmlparser.DefaultHandlerExpand()
xmlparser.NotStandaloneHandler()
xmlparser.ExternalEntityRefHandler()
- ExpatError Exceptions
- Example
- Content Model Descriptions
- Expat error constants
codes
messages
XML_ERROR_ASYNC_ENTITY
XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF
XML_ERROR_BAD_CHAR_REF
XML_ERROR_BINARY_ENTITY_REF
XML_ERROR_DUPLICATE_ATTRIBUTE
XML_ERROR_INCORRECT_ENCODING
XML_ERROR_INVALID_TOKEN
XML_ERROR_JUNK_AFTER_DOC_ELEMENT
XML_ERROR_MISPLACED_XML_PI
XML_ERROR_NO_ELEMENTS
XML_ERROR_NO_MEMORY
XML_ERROR_PARAM_ENTITY_REF
XML_ERROR_PARTIAL_CHAR
XML_ERROR_RECURSIVE_ENTITY_REF
XML_ERROR_SYNTAX
XML_ERROR_TAG_MISMATCH
XML_ERROR_UNCLOSED_TOKEN
XML_ERROR_UNDEFINED_ENTITY
XML_ERROR_UNKNOWN_ENCODING
XML_ERROR_UNCLOSED_CDATA_SECTION
XML_ERROR_EXTERNAL_ENTITY_HANDLING
XML_ERROR_NOT_STANDALONE
XML_ERROR_UNEXPECTED_STATE
XML_ERROR_ENTITY_DECLARED_IN_PE
XML_ERROR_FEATURE_REQUIRES_XML_DTD
XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING
XML_ERROR_UNBOUND_PREFIX
XML_ERROR_UNDECLARING_PREFIX
XML_ERROR_INCOMPLETE_PE
XML_ERROR_XML_DECL
XML_ERROR_TEXT_DECL
XML_ERROR_PUBLICID
XML_ERROR_SUSPENDED
XML_ERROR_NOT_SUSPENDED
XML_ERROR_ABORTED
XML_ERROR_FINISHED
XML_ERROR_SUSPEND_PE
XML_ERROR_RESERVED_PREFIX_XML
XML_ERROR_RESERVED_PREFIX_XMLNS
XML_ERROR_RESERVED_NAMESPACE_URI
XML_ERROR_INVALID_ARGUMENT
XML_ERROR_NO_BUFFER
XML_ERROR_AMPLIFICATION_LIMIT_BREACH