The document must conform to the syntax of XML. If it does not conform, it is said to have illegal syntax (in XML terminology, it is not well-formed).
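As a rough illustration of what "conforming to the syntax" means in practice, the sketch below asks a parser whether a piece of text is syntactically legal XML. It uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the two sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser rejects any document that violates XML syntax.
        return False


# Hypothetical sample documents, not taken from a real file.
good = "<note>This is a <strong>very</strong> large file</note>"
bad = "<note>This is a <strong>very</note>"  # <strong> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

This only checks syntax. It says nothing about whether the tags used make sense for a given application.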
The basic idea behind a markup language is the ability to tag, or mark, certain portions of a document and give them meaning.
If a document has no markup, then it is just a collection of characters. No more, no less.
Plain ASCII/Unicode files are a good example of documents without markup.
The individual words and sentences have no meaning of their own. They are just a pile of characters.
e.g. This is a very large file
A program displaying this sentence cannot tell anything about the role of the individual words. It can only display the text.
Marking a piece of a document has 2 purposes
HTML files are a good example of marked-up documents.
e.g. This is a <STRONG>very</STRONG> large file
Now a program displaying this sentence can give an informative clue or extra information. It knows that the author has put strong emphasis on the adjective "very".
The program could, for example, color the adjective "very" in red, pop up an exclamation mark when "very" is first displayed, or make it blink.
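A minimal sketch of such a program, assuming Python and its standard html.parser module: it reads the marked-up sentence and renders anything inside STRONG in upper case. Upper-casing is only a stand-in for "color it red" or "make it blink", and the names StrongHighlighter and render are invented for this example.

```python
from html.parser import HTMLParser


class StrongHighlighter(HTMLParser):
    """Collect the text of a document, upper-casing text inside <STRONG>."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <strong> elements are currently open
        self.parts = []  # rendered pieces of text, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "strong":  # HTMLParser reports tag names in lower case
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "strong" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # The markup tells us which characters the author emphasized.
        self.parts.append(data.upper() if self.depth else data)


def render(text: str) -> str:
    parser = StrongHighlighter()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(render("This is a <STRONG>very</STRONG> large file"))
# prints: This is a VERY large file
```

Given the unmarked sentence, the same program has nothing to act on and returns the text unchanged.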
The character-oriented nature of XML is one of the characteristics that make it cross-platform and powerful.
Because an XML document is, technically speaking, an acyclic graph (a tree structure), it is possible to create binary XML documents. This would, however, render those binary documents platform-specific, because byte order (big- or little-endian?) and word lengths (is an integer 1, 2, 4 or 8 bytes?) are platform-specific.
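The byte order and word length problem can be made concrete with a small sketch. It uses Python's standard struct module to write the same integer in several binary layouts and compares that with the character representation. The value 258 is arbitrary and chosen only for illustration.

```python
import struct

value = 258  # arbitrary example value (0x0102)

# The same number as a 4-byte integer in the two common byte orders.
big_endian = struct.pack(">i", value)
little_endian = struct.pack("<i", value)
print(big_endian)     # b'\x00\x00\x01\x02'
print(little_endian)  # b'\x02\x01\x00\x00'

# The same number stored with different word lengths (2, 4 and 8 bytes).
sizes = [len(struct.pack(fmt, value)) for fmt in ("<h", "<i", "<q")]
print(sizes)          # [2, 4, 8]

# As characters, the number looks the same regardless of byte order
# or integer word length.
as_text = str(value).encode("ascii")
print(as_text)        # b'258'
```

A reader of the binary form has to know both the byte order and the word length in advance, whereas the character form can be read without that knowledge.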