Remember that a language is a describable set of finite strings, drawn from a fixed alphabet. A grammar is one way to "describe" the language.
The grammar contains terminals and nonterminals. The terminals are the symbols of the alphabet. Nonterminals are other symbols, which act as variables.
The grammar consists of a finite list of rules, where each rule replaces one substring with another. The string on the left must contain at least one nonterminal. The first string "produces" or "generates" the second. Thus a rule is also called a production.
The grammar designates one of the nonterminals as the "start" symbol. This is often denoted s, by convention. Sometimes the terminals are upper case and the nonterminals are lower case, for clarity.
A given string of terminals is in the language determined by the grammar if it can be produced by taking the start symbol and repeatedly replacing substrings with the strings they generate, as defined by the rules of the grammar. The sequence of intermediate strings is the "derivation". Fortunately, a list of rule numbers and offsets conveys the same information, eliminating the need to record each potentially long intermediate string.
Similarly, a given string of terminals is in the language if the start symbol can be recovered by taking the terminal string and repeatedly replacing substrings with the strings that generate them. Here productions are applied in reverse, and are called reductions. Of course, the two definitions are equivalent; simply reverse the derivation.
At this point we desperately need an example. The following grammar generates all binary palindromes.
s → 0s0 | 1s1 | t
t → 0 | 1 | E
In this notation, | delimits alternative replacement strings associated with the string on the left-hand side. Thus the first line actually describes rules 1, 2, and 3, and the second line describes rules 4, 5, and 6. The start symbol is s, and E is the empty string.
Using this grammar, and the [rule,offset] notation, the derivation for the word 1011101 is:
[2,1] [1,2] [2,3] [3,4] [5,4]
In this simple example the offsets need not be specified, since each intermediate string contains exactly one nonterminal. But that is rare; in general an intermediate string contains several nonterminals, and the offset pins down which substring is rewritten.
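As a quick sanity check, here is the derivation above replayed in Python. This is just an illustrative sketch; the rules list and the apply_step helper are my own encoding, with 1-based offsets as in the text.

# the six rules of the palindrome grammar, numbered 1 through 6 in order
rules = [("s", "0s0"), ("s", "1s1"), ("s", "t"),
         ("t", "0"), ("t", "1"), ("t", "")]   # rule 6 is t -> E, the empty string

def apply_step(string, rule, offset):
    # apply rule number `rule` at 1-based `offset` within `string`
    lhs, rhs = rules[rule - 1]
    i = offset - 1
    assert string[i:i + len(lhs)] == lhs, "left side not found at offset"
    return string[:i] + rhs + string[i + len(lhs):]

string = "s"
for rule, offset in [(2, 1), (1, 2), (2, 3), (3, 4), (5, 4)]:
    string = apply_step(string, rule, offset)
print(string)  # prints 1011101

Reading the step list from right to left, and applying each rule backwards, replays the same derivation as a sequence of reductions.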
Set theory generates the finite ordinals, or if you prefer, the natural numbers. Relations and functions can also be built. We can therefore take the first step, and write a formula for "is a string of terminals".
A set w is a string of terminals if there is some number n, such that w is a function from the integers 1 to n into the set of terminals, which is also known as the alphabet. Thus w is a sequence of terminals, w1 w2 w3 etc. If n = 0 then w is the empty sequence, the empty word. Some languages do admit the empty word.
Along with the alphabet, we have a set of nonterminals, and the union, terminals ∪ nonterminals, forms the set of symbols. We can build a sequence of symbols, just as we built a sequence of terminals. There is therefore a formula that asserts IsSequenceOfSymbols(x) for any set x.
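To make this concrete, here is a small Python sketch of the encoding. Modeling a word as a dict from {1..n} mirrors the set-theoretic function; the name is_string_over is mine, purely for illustration.

def is_string_over(w, alphabet):
    # w is a function from the integers 1..n into alphabet, for some n >= 0,
    # modeled as a dict; n = 0 gives the empty word
    n = len(w)
    return set(w) == set(range(1, n + 1)) and all(c in alphabet for c in w.values())

terminals = {"0", "1"}
nonterminals = {"s", "t"}
symbols = terminals | nonterminals

print(is_string_over({1: "1", 2: "0", 3: "1"}, terminals))  # True: the word 101
print(is_string_over({}, terminals))                        # True: the empty word
print(is_string_over({1: "0", 2: "s", 3: "0"}, symbols))    # True: a string of symbols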
A derivation d is a sequence of strings of symbols. Again, each integer from 1 to n is mapped to a set, and this set must be a sequence of symbols, as described above.
To show w is in the language, we write something like this. There is a derivation d, of length n, such that d1 = s (the start symbol), and dn = w (the candidate word), and w is a string of terminals, and for every j from 1 to n-1, there is a k, such that dj produces dj+1 using rule k in the grammar. Obviously the last step is the tricky part.
How can each production in the derivation be translated into logic? One string produces the next if there is some offset m such that the two strings are equal up to m, the substrings beginning at m match the left and right sides of rule k respectively, and the strings are equal thereafter. We can "or" all the rules of the grammar together to say the substrings at m match the left and right sides of rule 1, or rule 2, or rule 3, etc.
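Here is that condition rendered in Python, as a sketch. The inner loop over the rules plays the role of the big "or", and the search over m plays the existential quantifier; the names are mine, and m here is a 0-based Python index rather than a 1-based offset.

rules = [("s", "0s0"), ("s", "1s1"), ("s", "t"),
         ("t", "0"), ("t", "1"), ("t", "")]

def produces(x, y):
    # true if some rule, applied at some offset m, rewrites x into y
    for m in range(len(x) + 1):
        for lhs, rhs in rules:
            if (x[:m] == y[:m]                                  # equal up to m
                    and x[m:m + len(lhs)] == lhs                # left side of the rule in x
                    and y[m:m + len(rhs)] == rhs                # right side of the rule in y
                    and x[m + len(lhs):] == y[m + len(rhs):]):  # equal thereafter
                return True
    return False

def is_derivation(d, w):
    # d starts at the start symbol, ends at the candidate word w,
    # and each string produces the next
    return (d[0] == "s" and d[-1] == w
            and all(produces(d[j], d[j + 1]) for j in range(len(d) - 1)))

d = ["s", "1s1", "10s01", "101s101", "101t101", "1011101"]
print(is_derivation(d, "1011101"))  # True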
I'm glossing over a lot of technical details, but I think you can see that the entire grammar can be encoded in a very large formula in logic. This formula is applied to the set of all finite strings of terminals, and the language defined by the grammar is indeed a set.
If you're a fan of cardinality, you'll notice there are collections of words that are not languages. This is true even if the alphabet is as simple as possible, i.e. the single letter A. Words now look like AA, AAAAAA, AAAAAAAAAAAA, etc, and can be equated with nonnegative numbers, representing word length. There are uncountably many collections of numbers, but the grammars that might generate these collections are countable. Therefore some collections of words are left on the cutting-room floor.
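The countability of grammars can be made vivid with one more sketch: a grammar, written out, is a finite string over some fixed character set, and finite strings can be enumerated shortest first, so every grammar appears at some finite position. The character set and encoding below are mine, chosen only for illustration.

from itertools import count, product

chars = "st01|>"   # enough characters to write out grammars like "s>0s0|1s1|t"

def all_strings():
    # enumerate every finite string over chars, shortest first
    for n in count(1):
        for letters in product(chars, repeat=n):
            yield "".join(letters)

# every grammar text appears at some finite position in this enumeration,
# so the grammars, and the languages they generate, form a countable family
gen = all_strings()
print([next(gen) for _ in range(8)])  # ['s', 't', '0', '1', '|', '>', 'ss', 'st']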
Using a similar argument, there are countably many chains of inference that start with the axioms of set theory and demonstrate that a collection of numbers is indeed a set. Thus countably many collections of integers are sets, and there are lots of collections of strings of A's that are not sets, and hence not languages, since every language is a set.