java.lang.Object
  org.millscript.commons.xml.tokenizer.AbstractXmlTokenizerImpl
public abstract class AbstractXmlTokenizerImpl
This class provides an XmlTokenizer implementation that breaks
an XML document into tokens, such as a start tag, end tag, character data,
etc. This tokenizer will only perform a minimum number of well-formedness
checks, such as for illegal characters, attributes, etc. This tokenizer does
not perform checks such as for matching start/end tags, or that a DTD
appears at the start of a document.
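
To make the scope of these checks concrete, here is a minimal, self-contained sketch of a tokenizer that splits input into start tags, end tags and character data without checking that tags match. It is not the millscript implementation: the class ToyXmlTokenizer, its string-labelled tokens and the tokenize helper are hypothetical names chosen for illustration, and only the hasNextToken()/nextToken() method names echo the API documented here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the millscript tokenizer.
final class ToyXmlTokenizer {
    private final String input;
    private int pos = 0;

    ToyXmlTokenizer(String input) {
        this.input = input;
    }

    // True while unread input remains.
    boolean hasNextToken() {
        return pos < input.length();
    }

    // Returns the next token as a labelled string (START:, END: or CHARS:).
    String nextToken() {
        if (input.charAt(pos) != '<') {
            // Character data runs up to the next '<' or the end of input.
            int end = input.indexOf('<', pos);
            if (end < 0) {
                end = input.length();
            }
            String text = input.substring(pos, end);
            pos = end;
            return "CHARS:" + text;
        }
        int close = input.indexOf('>', pos);
        if (close < 0) {
            throw new IllegalStateException("unterminated tag at offset " + pos);
        }
        boolean isEnd = input.startsWith("</", pos);
        String name = input.substring(pos + (isEnd ? 2 : 1), close).trim();
        pos = close + 1;
        return (isEnd ? "END:" : "START:") + name;
    }

    // Collects every token; note that no start/end tag matching is done.
    static List<String> tokenize(String xml) {
        ToyXmlTokenizer t = new ToyXmlTokenizer(xml);
        List<String> out = new ArrayList<>();
        while (t.hasNextToken()) {
            out.add(t.nextToken());
        }
        return out;
    }
}
```

In this sketch, tokenizing the mismatched input `<a>hi</b>` still succeeds and yields a start tag, character data and an end tag, which is the kind of check the description says is left to later stages.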
| Field Summary | |
|---|---|
| protected int | columnNumber - The number of the current character on the current line. |
| protected int | lineNumber - The current line number. |
| Constructor Summary | |
|---|---|
| protected | AbstractXmlTokenizerImpl(AbstractXmlTokenizerImpl axti) - Constructs a new XML tokenizer which will copy its state from the specified existing tokenizer. |
| protected | AbstractXmlTokenizerImpl(AbstractXmlTokenizerImpl axti, java.io.Reader rr) - Constructs a new XML tokenizer which will copy its state from the specified existing tokenizer, but will use the specified reader instead of the one from the existing tokenizer. |
| protected | AbstractXmlTokenizerImpl(java.io.InputStream is, java.nio.charset.Charset cs, boolean namespaceAware) - Constructs a new XML tokenizer to read from the specified input stream, using the specified character set, with optional namespace support. |
| protected | AbstractXmlTokenizerImpl(java.io.Reader r, boolean namespaceAware) - Constructs a new XML tokenizer to read from the specified reader, with optional namespace support. |
| Method Summary | |
|---|---|
| void | appendCurrentTokenData(char ch) - Appends the specified char to the current token. |
| void | dropS() - Drops any characters from the input stream that match the S production in the XML specification. |
| char | getChar() - Returns the next character from the input stream, throwing an Alert if the end of file is reached. |
| int | getIntChar() - Returns the raw int version of the next char, handling any push back characters and XML version dependencies. |
| int | getLineNumber() - Returns the current one-based line number in the source document. |
| char | getQuoteChar() - Returns the next char, checking that it is a legal quote character. |
| abstract int | handleIntChar(int ch) - Handles the specified character, performing any XML version dependent line break conversions and checks on its validity. |
| boolean | hasNextToken() - Indicates if this XML tokenizer has any more tokens to return. |
| abstract boolean | isChar(int ch) - Tests if the specified character matches the Char production in the XML specification. |
| abstract boolean | isNameChar(char ch) - Tests if the specified character matches the NameChar production in the XML specification. |
| abstract boolean | isNameStartChar(char ch) - Tests if the specified character matches the NameStartChar production in the XML specification. |
| boolean | isNCNameChar(char ch) - Tests if the specified character matches the NCNameChar production in the XML namespace specification. |
| boolean | isNCNameStartChar(char ch) - Tests if the specified character matches the NCNameStartChar production in the XML namespace specification. |
| boolean | isS(int ch) - Tests if the specified character matches the S production in the XML specification. |
| void | mustRead(char testch) - Tests that the next character is the specified one, otherwise it throws an Alert. |
| void | mustReadEq() - Tests if the next input sequence matches the Eq production in the XML specification, otherwise it throws an Alert. |
| void | mustReadS() - Tests if the next input sequence matches the S production in the XML specification, otherwise it throws an Alert. |
| Token | nextToken() - Returns this tokenizer's next token. |
| boolean | peekRead(char testch) - Tests that the next character is the specified one. |
| boolean | peekS() - Tests if the next available character matches the S production in the XML specification. |
| void | pushBack(char ch) - Pushes back the specified character so it will be the next one returned by the getChar() method. |
| void | pushBack(java.lang.String s) - Pushes back all the characters in the string, so they will be returned by subsequent calls to the getChar() method. |
| AttListDeclToken | readAttlistDecl() - Returns the next input sequence as an attribute list declaration token. |
| java.lang.String | readAttValue() - Returns the next input sequence as an attribute value string. |
| CharDataToken | readCDSect() - Returns the next input sequence as a CDATA section. |
| CharDataToken | readCharData() - Returns the next input sequence as a character data token. |
| CommentToken | readComment() - Returns the next input sequence as a comment token. |
| DTDToken | readDoctypeDecl() - Returns the next input sequence as a document type declaration token. |
| ElementDeclToken | readElementDecl() - Returns the next input sequence as an element declaration token. |
| java.lang.String | readEncodingDecl() - Returns the next input sequence as an encoding declaration. |
| EntityDeclToken | readEntityDecl() - Returns the next input sequence as an entity declaration token. |
| EndTagToken | readETag() - Returns the next input sequence as an end tag token. |
| void | readIntSubset() - Reads the next input sequence as the internal subset of a document type declaration. |
| java.lang.String | readNmtoken() - Returns the next input sequence as an nmtoken. |
| NotationDeclToken | readNotationDecl() - Returns the next input sequence as a notation declaration token. |
| PIToken | readPI() - Returns the next input sequence as a processing instruction token. |
| java.lang.String | readPubidLiteral() - Returns the next input sequence as a public literal. |
| EntityImpl | readReference() - Returns the next input sequence as an entity reference. |
| java.lang.String | readSDDecl() - Returns the next input sequence as a standalone declaration. |
| StartTagToken | readSTag() - Returns the next input sequence as a start tag token. |
| java.lang.String | readSystemLiteral() - Returns the next input sequence as a system literal. |
| java.lang.String | readVersionInfo() - Returns the next input sequence as a version declaration. |
| void | setNamespaces(org.millscript.commons.util.IMap<java.lang.String,java.lang.String> spaces) - Sets the mapping of namespace prefix to namespace IRI for tokenizing subsequent prefixed and unprefixed names. |
| boolean | tryRead(char testch) - Tests that the next character is the specified one. |
| boolean | tryRead(char testch, char testch2) - Tests that the next characters match the two character sequence. |
| boolean | tryReadS() - Tests if the next available characters match the S production in the XML specification. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected int columnNumber
The number of the current character on the current line.

protected int lineNumber
The current line number.
| Constructor Detail |
|---|
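protected AbstractXmlTokenizerImpl(java.io.InputStream is,
java.nio.charset.Charset cs,
boolean namespaceAware)
Constructs a new XML tokenizer to read from the specified input stream, using the specified character set, with optional namespace support.
Parameters:
is - the InputStream to read from
cs - the Charset to decode the InputStream with
namespaceAware - indicates if the tokenizer should be namespace aware

protected AbstractXmlTokenizerImpl(java.io.Reader r,
boolean namespaceAware)
Constructs a new XML tokenizer to read from the specified reader, with optional namespace support.
Parameters:
r - the Reader to obtain characters from
namespaceAware - indicates if the tokenizer should be namespace aware

protected AbstractXmlTokenizerImpl(AbstractXmlTokenizerImpl axti)
Constructs a new XML tokenizer which will copy its state from the specified existing tokenizer.
Parameters:
axti - the existing tokenizer to copy state from

protected AbstractXmlTokenizerImpl(AbstractXmlTokenizerImpl axti,
java.io.Reader rr)
Constructs a new XML tokenizer which will copy its state from the specified existing tokenizer, but will use the specified reader instead of the one from the existing tokenizer.
Parameters:
axti - the existing tokenizer to copy state from
rr - the new reader this tokenizer should read characters from

| Method Detail |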
|---|
public void appendCurrentTokenData(char ch)
Appends the specified char to the current token.
Parameters:
ch - the char to append

public void dropS()
Drops any characters from the input stream that match the S production in the XML specification.
[3] S ::= (#x20 | #x9 | #xD | #xA)+
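
As a rough illustration of production [3], the sketch below tests single characters against S and skips a run of them. It works on a CharSequence rather than the tokenizer's input stream, and the class WhitespaceSketch and the index-returning dropS signature are assumptions made for this example, not the library's code.

```java
// Hypothetical helper; mirrors production [3] S, not the millscript source.
final class WhitespaceSketch {
    // [3] S ::= (#x20 | #x9 | #xD | #xA)+  (one character of the run)
    static boolean isS(int ch) {
        return ch == 0x20 || ch == 0x9 || ch == 0xD || ch == 0xA;
    }

    // Returns the index of the first character at or after 'from' that is not S.
    static int dropS(CharSequence in, int from) {
        int i = from;
        while (i < in.length() && isS(in.charAt(i))) {
            i++;
        }
        return i;
    }
}
```

Note that only these four code points count as S; other Unicode spaces such as the no-break space (#xA0) do not.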
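
public char getChar()
Returns the next character from the input stream, throwing an Alert if the end of file is reached.
Returns:
the next char from the input stream

public int getIntChar()
Returns the raw int version of the next char, handling any push back characters and XML version dependencies. This method accounts for the set of legal characters in an XML document.
Returns:
the raw int version of the next char or -1 if there are no more characters

public int getLineNumber()
Description copied from interface: XmlTokenizer
Returns the current one-based line number in the source document.
Specified by:
getLineNumber in interface XmlTokenizer
Returns:
the int value for the one-based line number in the source document
See Also:
XmlTokenizer.getLineNumber()

public char getQuoteChar()
Returns the next char, checking that it is a legal quote character.
Returns:
the next char, if it is a legal quote character

public abstract int handleIntChar(int ch)
Handles the specified character, performing any XML version dependent line break conversions and checks on its validity.
Parameters:
ch - the character to test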
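
For orientation, the line break conversion that XML 1.0 requires (section 2.11 of the specification) folds the pair #xD #xA, and any #xD not followed by #xA, into a single #xA. The sketch below shows only that XML 1.0 rule over a java.io.PushbackReader; the class LineBreakSketch and its methods are hypothetical, and how a concrete subclass actually implements handleIntChar is not documented here.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch of XML 1.0 end-of-line handling; not the library code.
final class LineBreakSketch {
    // Reads one character, folding "\r\n" and a lone "\r" into "\n".
    static int readNormalized(PushbackReader in) throws IOException {
        int ch = in.read();
        if (ch == '\r') {
            int next = in.read();
            // Keep whatever follows a lone '\r' for the next call.
            if (next != '\n' && next != -1) {
                in.unread(next);
            }
            return '\n';
        }
        return ch;
    }

    // Convenience wrapper that normalizes a whole string.
    static String normalize(String s) {
        try (PushbackReader in = new PushbackReader(new StringReader(s))) {
            StringBuilder out = new StringBuilder();
            for (int ch = readNormalized(in); ch != -1; ch = readNormalized(in)) {
                out.append((char) ch);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

XML 1.1 adds further line-end characters (such as #x85 and #x2028), which is presumably why the method is abstract and version dependent; this sketch does not cover them.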

public boolean hasNextToken()
Description copied from interface: XmlTokenizer
Indicates if this XML tokenizer has any more tokens to return.
Specified by:
hasNextToken in interface XmlTokenizer
Returns:
true if this tokenizer has any more tokens to return
See Also:
XmlTokenizer.hasNextToken()

public abstract boolean isChar(int ch)
Tests if the specified character matches the Char production in the XML specification.
Parameters:
ch - the character to test
Returns:
true if the character is a Char and false otherwise

public abstract boolean isNameChar(char ch)
Tests if the specified character matches the NameChar production in the XML specification.
Parameters:
ch - the character to test
Returns:
true if the character is a NameChar and false otherwise

public abstract boolean isNameStartChar(char ch)
Tests if the specified character matches the NameStartChar production in the XML specification.
Parameters:
ch - the character to test
Returns:
true if the character is a NameStartChar and false otherwise

public boolean isNCNameChar(char ch)
Tests if the specified character matches the NCNameChar production in the XML namespace specification.
Parameters:
ch - the character to test
Returns:
true if the character is an NCNameChar and false otherwise

public boolean isNCNameStartChar(char ch)
Tests if the specified character matches the NCNameStartChar production in the XML namespace specification.
Parameters:
ch - the character to test
Returns:
true if the character is an NCNameStartChar and false otherwise

public boolean isS(int ch)
Tests if the specified character matches the S production in the XML specification.
[3] S ::= (#x20 | #x9 | #xD | #xA)+
Parameters:
ch - the character to test
Returns:
true if the character is an S character and false otherwise

public void mustRead(char testch)
Tests that the next character is the specified one, otherwise it throws an Alert.
Parameters:
testch - the character we must read next

public void mustReadEq()
Tests if the next input sequence matches the Eq production in the XML specification, otherwise it throws an Alert. If the sequence matches, it will be dropped.
[25] Eq ::= S? '=' S?
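
Production [25] can be read as "optional whitespace, an equals sign, optional whitespace". The following sketch matches it against a CharSequence and reports where the match ends. The class EqSketch and the convention of returning -1 on failure are assumptions for this example; the documented method throws an Alert instead.

```java
// Hypothetical matcher for [25] Eq ::= S? '=' S?; not the millscript source.
final class EqSketch {
    private static boolean isS(char ch) {
        return ch == 0x20 || ch == 0x9 || ch == 0xD || ch == 0xA;
    }

    // Returns the index just past the Eq match starting at 'from', or -1 if none.
    static int matchEq(CharSequence in, int from) {
        int i = from;
        // Leading S? (zero or more whitespace characters)
        while (i < in.length() && isS(in.charAt(i))) {
            i++;
        }
        // The mandatory '='
        if (i >= in.length() || in.charAt(i) != '=') {
            return -1;
        }
        i++;
        // Trailing S?
        while (i < in.length() && isS(in.charAt(i))) {
            i++;
        }
        return i;
    }
}
```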

public void mustReadS()
Tests if the next input sequence matches the S production in the XML specification, otherwise it throws an Alert. If the sequence matches, it will be dropped.
[3] S ::= (#x20 | #x9 | #xD | #xA)+
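
public Token nextToken()
Description copied from interface: XmlTokenizer
Returns this tokenizer's next token.
Specified by:
nextToken in interface XmlTokenizer
Returns:
the next Token
See Also:
XmlTokenizer.nextToken()

public boolean peekRead(char testch)
Tests that the next character is the specified one.
Parameters:
testch - the character to test for.
Returns:
true if the character is the required one and false otherwise

public boolean peekS()
Tests if the next available character matches the S production in the XML specification.
[3] S ::= (#x20 | #x9 | #xD | #xA)+
Returns:
true if the next character is an S character and false otherwise

public void pushBack(char ch)
Pushes back the specified character so it will be the next one returned by the getChar() method.
Parameters:
ch - the char to push back

public void pushBack(java.lang.String s)
Pushes back all the characters in the string, so they will be returned by subsequent calls to the getChar() method. The characters are pushed in reverse order, so that the first character in the string will be the first character returned by getChar().
Parameters:
s - the String to push back

public AttListDeclToken readAttlistDecl()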
Returns the next input sequence as an attribute list declaration token. Throws an Alert if the input sequence doesn't match the AttlistDecl production in the XML specification.
[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
[53] AttDef ::= S Name S AttType S DefaultDecl
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' [VC: ID][VC: One ID per Element Type][VC: ID Attribute Default]
| 'IDREF' [VC: IDREF]
| 'IDREFS' [VC: IDREF]
| 'ENTITY' [VC: Entity Name]
| 'ENTITIES' [VC: Entity Name]
| 'NMTOKEN' [VC: Name Token]
| 'NMTOKENS' [VC: Name Token]
[57] EnumeratedType ::= NotationType | Enumeration
[58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')' [VC: Notation Attributes][VC: One Notation Per Element Type][VC: No Notation on Empty Element][VC: No Duplicate Tokens]
[59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')' [VC: Enumeration][VC: No Duplicate Tokens]
[60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue) [VC: Required Attribute][VC: Attribute Default Value Syntactically Correct][WFC: No < in Attribute Values][VC: Fixed Attribute Default]
When this method is called the identifying sequence, i.e. '<!ATTLIST', will have already been processed and should NOT be expected.
Returns:
the AttListDeclToken for the attribute list declaration

public java.lang.String readAttValue()
Returns the next input sequence as an attribute value string. Throws an Alert if the input sequence doesn't match the AttValue production in the XML specification.
Returns:
the String holding the attribute value

public CharDataToken readCDSect()
Returns the next input sequence as a CDATA section. Throws an Alert if the input sequence doesn't match the CDSect production in the XML specification.
[18] CDSect ::= CDStart CData CDEnd
[19] CDStart ::= '<![CDATA['
[20] CData ::= (Char* - (Char* ']]>' Char*))
[21] CDEnd ::= ']]>'
When this method is called the first three characters '<![' will have already been processed and should NOT be expected.
Returns:
the CharDataToken for the CDATA section

public CharDataToken readCharData()
Returns the next input sequence as a character data token. Throws an Alert if the input sequence doesn't match the CharData production in the XML specification.
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
Returns:
the CharDataToken for the character data

public CommentToken readComment()
Returns the next input sequence as a comment token. Throws an Alert if the input sequence doesn't match the Comment production in the XML specification.
[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
When this method is called the first four characters '<!--' will have already been processed and should NOT be expected.
Returns:
the CommentToken for the comment

public DTDToken readDoctypeDecl()
Returns the next input sequence as a document type declaration token. Throws an Alert if the input sequence doesn't match the doctypedecl production in the XML specification.
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>' [VC: Root Element Type] [WFC: External Subset]
[75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
When this method is called the identifying sequence, i.e. '<!DOCTYPE', will have already been processed and should NOT be expected.
Returns:
the DTDToken for the document type declaration

public ElementDeclToken readElementDecl()
Returns the next input sequence as an element declaration token. Throws an Alert if the input sequence doesn't match the elementdecl production in the XML specification.
[45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>' [VC: Unique Element Type Declaration]
When this method is called the identifying sequence, i.e. '<!ELEMENT', will have already been processed and should NOT be expected.
Returns:
the ElementDeclToken for the element declaration

public java.lang.String readEncodingDecl()
Returns the next input sequence as an encoding declaration. Throws an Alert if the input sequence doesn't match the EncodingDecl production in the XML specification.
[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
Returns:
the String holding the value of the encoding declaration

public EntityDeclToken readEntityDecl()
Returns the next input sequence as an entity declaration token. Throws an Alert if the input sequence doesn't match the EntityDecl production in the XML specification.
[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"'
| "'" ([^%&'] | PEReference | Reference)* "'"
[70] EntityDecl ::= GEDecl | PEDecl
[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
[72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
[73] EntityDef ::= EntityValue | (ExternalID NDataDecl?)
[74] PEDef ::= EntityValue | ExternalID
When this method is called the identifying sequence, i.e. '<!ENTITY', will have already been processed and should NOT be expected.
Returns:
the EntityDeclToken for the entity declaration

public EndTagToken readETag()
Returns the next input sequence as an end tag token. Throws an Alert if the input sequence doesn't match the ETag production in the XML specification.
[42] ETag ::= '</' Name S? '>'
When this method is called the first two characters '</' will have already been processed and should NOT be expected.
Returns:
the EndTagToken for the end tag

public void readIntSubset()
Reads the next input sequence as the internal subset of a document type declaration. Throws an Alert if the input sequence doesn't match the intSubset production in the XML specification.
[28a] DeclSep ::= PEReference | S [WFC: PE Between Declarations]
[28b] intSubset ::= (markupdecl | DeclSep)*
[29] markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
[69] PEReference ::= '%' Name ';' [VC: Entity Declared] [WFC: No Recursion] [WFC: In DTD]

public java.lang.String readNmtoken()
Returns the next input sequence as an nmtoken. Throws an Alert if the input sequence doesn't match the Nmtoken production in the XML specification.
[7] Nmtoken ::= (NameChar)+
Returns:
the String holding the Nmtoken

public NotationDeclToken readNotationDecl()
Returns the next input sequence as a notation declaration token. Throws an Alert if the input sequence doesn't match the NotationDecl production in the XML specification.
[82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>' [VC: Unique Notation Name]
[83] PublicID ::= 'PUBLIC' S PubidLiteral
When this method is called the identifying sequence, i.e. '<!NOTATION', will have already been processed and should NOT be expected.
Returns:
the NotationDeclToken for the notation declaration

public PIToken readPI()
Returns the next input sequence as a processing instruction token. Throws an Alert if the input sequence doesn't match the PI production in the XML specification.
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
When this method is called the first two characters '<?' will have already been processed and should NOT be expected.
Returns:
the PIToken for the processing instruction

public java.lang.String readPubidLiteral()
Returns the next input sequence as a public literal. Throws an Alert if the input sequence doesn't match the PubidLiteral production in the XML specification.
[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
Returns:
the String holding the public identifier

public EntityImpl readReference()
Returns the next input sequence as an entity reference. Throws an Alert if the input sequence doesn't match the Reference production in the XML specification.
When this method is called the first character '&' will have already been processed and should NOT be expected.
Returns:
the Entity for the reference

public java.lang.String readSDDecl()
Returns the next input sequence as a standalone declaration. Throws an Alert if the input sequence doesn't match the SDDecl production in the XML specification.
[32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))
Returns:
the String holding the value of the standalone declaration

public StartTagToken readSTag()
Returns the next input sequence as a start tag token. Throws an Alert if the input sequence doesn't match the STag or EmptyElemTag production in the XML specification.
[44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>' [WFC: Unique Att Spec]
[40] STag ::= '<' Name (S Attribute)* S? '>' [WFC: Unique Att Spec]
[41] Attribute ::= Name Eq AttValue [VC: Attribute Value Type] [WFC: No External Entity References] [WFC: No < in Attribute Values]
When this method is called the first character '<' will have already been processed and should NOT be expected.
Returns:
the StartTagToken or EmptyElemToken for the next tag

public java.lang.String readSystemLiteral()
Returns the next input sequence as a system literal. Throws an Alert if the input sequence doesn't match the SystemLiteral production in the XML specification.
[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
Returns:
the String holding the system identifier

public java.lang.String readVersionInfo()
Returns the next input sequence as a version declaration. Throws an Alert if the input sequence doesn't match the VersionInfo production in the XML specification.
[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
Returns:
the String holding the value of the version declaration

public void setNamespaces(org.millscript.commons.util.IMap<java.lang.String,java.lang.String> spaces)
Description copied from interface: XmlTokenizer
Sets the mapping of namespace prefix to namespace IRI for tokenizing subsequent prefixed and unprefixed names.
Specified by:
setNamespaces in interface XmlTokenizer
Parameters:
spaces - an IMap containing the namespace prefix to IRI mapping for subsequent names
See Also:
XmlTokenizer.setNamespaces(org.millscript.commons.util.IMap)

public boolean tryRead(char testch)
Tests that the next character is the specified one.
Parameters:
testch - the character to test for.
Returns:
true if the character is the required one and false otherwise

public boolean tryRead(char testch,
char testch2)
Tests that the next characters match the two character sequence.
Parameters:
testch - the first character to test for.
testch2 - the second character to test for.
Returns:
true if both characters match and false otherwise

public boolean tryReadS()
Tests if the next available characters match the S production in the XML specification. Any sequence of matching characters will be dropped.
[3] S ::= (#x20 | #x9 | #xD | #xA)+
Returns:
true if any characters matched the S production and false otherwise
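
One way to picture how the try* methods and pushBack could fit together is a small stack of pushed-back characters sitting in front of the input. The sketch below is an assumption-laden illustration, not the library's code: the class PushBackSketch is hypothetical, it reads from a String rather than a Reader, and it assumes that tryRead consumes the input on a match and leaves it untouched otherwise, which the documentation above does not state explicitly. Only the reverse-order behaviour of pushBack(String) and the dropping behaviour of tryReadS are taken from the descriptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of push-back based lookahead; not the millscript source.
final class PushBackSketch {
    private final String source;
    private int pos = 0;
    private final Deque<Character> pushed = new ArrayDeque<>();

    PushBackSketch(String source) {
        this.source = source;
    }

    // Returns the next char, preferring pushed-back chars; -1 at end of input.
    int getIntChar() {
        if (!pushed.isEmpty()) {
            return pushed.pop();
        }
        return pos < source.length() ? source.charAt(pos++) : -1;
    }

    void pushBack(char ch) {
        pushed.push(ch);
    }

    // Pushes in reverse order so the first char of s is the next one returned.
    void pushBack(String s) {
        for (int i = s.length() - 1; i >= 0; i--) {
            pushed.push(s.charAt(i));
        }
    }

    // Assumption: consume on a match, otherwise restore the character.
    boolean tryRead(char testch) {
        int ch = getIntChar();
        if (ch == testch) {
            return true;
        }
        if (ch != -1) {
            pushBack((char) ch);
        }
        return false;
    }

    // Two-character variant: both are restored unless both match.
    boolean tryRead(char testch, char testch2) {
        if (!tryRead(testch)) {
            return false;
        }
        if (tryRead(testch2)) {
            return true;
        }
        pushBack(testch);
        return false;
    }

    // Drops a run of S characters; true if at least one was dropped.
    boolean tryReadS() {
        boolean matched = false;
        int ch = getIntChar();
        while (ch == 0x20 || ch == 0x9 || ch == 0xD || ch == 0xA) {
            matched = true;
            ch = getIntChar();
        }
        if (ch != -1) {
            pushBack((char) ch);
        }
        return matched;
    }
}
```

A stack is a natural fit here because the most recently pushed character must be the next one read, which is also why a string has to be pushed last character first.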