|
<< Back
|
Extended Backus-Naur Form (EBNF) is a metalanguage for defining the syntax of a context-free language. It is almost universally used to define programming langues. A context-free grammar can define a language with several kinds of symbols that must occur in matching pairs (like BEGIN and END) and can be nested, to any depth. A context-free language cannot define the parts of a language that depend on the context in which a symbol is written. For example, it cannot define the type-matching rules of C.
A small set of meta-symbols are part of EBNF itself. These are:
- A definition symbol such as "::=" or simply "=" that is used to define the meaning of a nonterminal symbol.
- Grouping symbols, normally large parentheses, that mark the beginning and end of a series of symbols that are to be treated as a unit.
- Large square brackets are use to mark the beginning and end of an optional series of symbols.
- Large braces (curly brackets) are use to delimit a series of symbols that may be omitted altogether or repeated as many times as needed.
- A large vertical bar, written between units, denotes alternatives.
In addition to the metasymbols, an EBNF grammar has four parts:
- A set of terminal symbols, that is, symbols that are part of the final programs that one can derive using the grammar. Terminal symbols are written in boldface, or underlined, or written in single quotes to distinguish them from other symbols.
- A set of nonterminal symbols. These are syntactic categories such as "expression" or "statement" that are used to define the structure of a well-formed program. They are normally written in lower case, not boldface, and often enclosed in angle brackets.
- One non-terminal symbol must be designated as the "starting symbol". In the official grammar for Pascal, that symbol is "program".
- A set derivation rules, formally called "productions". Each rule starts with the non-terminal symbol that is being defined, followed by the definition symbol. After that is a series of terminal symbols, nonterminal symbols, and meta-symbols that define the structure of one small part of the language. In some versions of EBNF, each rule ends in a period.
For example, here is an EBNF grammar for a silly language called "blab":
- Terminal symbols a, b, c, x, z, (, ), and ;
- Nonterminal symbols: shaker, widget, basket, hamper .
- The starting symbol is shaker.
- Derivation rules:
- shaker ::= widget '(' ( basket | hamper ) ')' widget .
- basket ::= 'a' [ basket ] 'z' .
- widget ::= 'b' | 'b x' widget .
- hamper ::= 'c' { '; c ' } .
In rule 1, note the difference between the parentheses that are metasymbols and the ones that are terminal symbols. This rule also illustrates the typical pattern for defining matched pairs of symbols (such as parentheses.)
Rule 2 illustrates repetition by nesting. This rule could produce any number of a's followed by the same number of z's.
Rule 3 illustrates repetition by recursion. Such rules must have two alternatives, a non-recursive case and the recursive case. If the recursion is at the right end of the rule, it is easy for a compiler to parse. Recursions on the left are harder to work with. Here are a few widgets: b b x b b x b x b b x b x b x b
Rule 4 gives us one c or a series of any number of c's separated by semicolons. Rules like this are used very often in practice.
The shortest two blab sentences are: b ( c ) b and b ( a z ) b
A longer sentence is: b ( c; c; c; c; c; c; c; c; c; c; c; c; c ) b
A sentence with nested repetition is: b ( a a a z z z ) b