The abstract multiplicative operator *, when applied to sets, often means a cross product, so A*B might be confused with the direct product of A and B. Similarly, + often indicates a direct product when the underlying groups are abelian, which is the case for modules. Therefore I will use the times symbol (A×B), or juxtaposition (ab), to indicate tensor product.
Don't confuse bilinearity with a group homomorphism on the direct product A*B. They are not the same at all. Bilinearity means f(x1,y) + f(x2,y) = f(x1+x2,y). If this were a group homomorphism on A*B, then f(x1,y) + f(x2,y) would equal f(x1+x2,2y).
Let A and B be the integers, denoted Z. Now integer multiplication is an example of a bilinear map from Z cross Z onto Z. Use the distributive property to show f respects addition in A and in B. Use the associative property of multiplication to show f(xd,y) = f(x,dy) = xdy. This map, integer multiplication, is not a group homomorphism on Z*Z. If it were, f(1+1,1+1) would equal 2, rather than 4.
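The claims about integer multiplication can be spot checked numerically. This is only a sketch over a small sample of integers, not a proof; the names sample and d_values are mine.

```python
from itertools import product

def f(x, y):
    """Integer multiplication, viewed as a map from Z cross Z into Z."""
    return x * y

sample = range(-3, 4)      # a small window of integers, chosen arbitrarily
d_values = range(-2, 3)    # scalars to pass between the components

# f respects addition in each component separately.
additive_left = all(f(x1 + x2, y) == f(x1, y) + f(x2, y)
                    for x1, x2, y in product(sample, repeat=3))
additive_right = all(f(x, y1 + y2) == f(x, y1) + f(x, y2)
                     for x, y1, y2 in product(sample, repeat=3))

# A scalar d can move from one component to the other.
balanced = all(f(x * d, y) == f(x, d * y)
               for x, y in product(sample, repeat=2) for d in d_values)

# A homomorphism on the direct product would need
# f((1,1) + (1,1)) = f(1,1) + f(1,1).
lhs = f(1 + 1, 1 + 1)
rhs = f(1, 1) + f(1, 1)
print(additive_left, additive_right, balanced, lhs, rhs)
```

The two sides of the last comparison differ, which is the failure described above.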
Note that 0 in either component maps to 0. If f(0,x) = z, use linearity to write: z + z = f(0,x) + f(0,x) = f(0+0,x) = z. In an abelian group, this implies z = 0.
For T to be a tensor product it must be universal. Here's what we mean by universal. Consider any other bilinear map g(A,B) into another abelian group U. There is a unique group homomorphism h(T) into U satisfying fh = g.
Suppose U is also universal. There are unique homomorphisms from T to U, and from U back to T, that make the diagrams commute. Combine these homomorphisms to give an endomorphism e on T, such that fe = f. Let I be the identity map on T, and fI = f. Now the function that completes the diagram is supposed to be unique, hence I = e. The composition of the two homomorphisms gives the identity map on T, and the same can be done in reverse to give the identity map on U. Thus T and U are isomorphic, and after relabeling, f and g are essentially the same functions.
So the tensor product is unique, but does it exist? Let's try to build it.
For each x in A and each y in B, let xy be a generator of the free abelian group S. Remember, each of these symbols is independent in S. There is no way to simplify the sum x1y1 + x2y2, even though x1+x2 is a particular element in A, and y1+y2 is a particular element in B. One can however add x1y3 + x1y3, giving 2x1y3. This is what we mean by a free abelian group.
Next, build a collection of relations that will span a subgroup K inside S. For instance, let x1+x2 = x3 in A. This gives the relation x3y-x1y-x2y, a generator in the kernel K. Do this for all pairs x1 x2 in A and all y in B, then do the same for all pairs y1 y2 in B and all x in A. Finally include relations for scaling by d in R. Since (xd)y equals x(dy), we have the generator (xd)y-x(dy). All these generators span a kernel K, and the quotient S/K = T is the tensor product.
Of course we have to prove T is the tensor product, and that is rather technical. First show f(x,y) → xy is a bilinear map. This is straightforward given the relations in the kernel K.
Now let g(A,B) map into an abelian group U. We need a function h that makes the diagram commute. Start by defining h on the generators of S. Specifically, h(xy) = g(x,y). Then extend this definition to all of S. We can do this because S is a free group. Thus h is a group homomorphism from S into U.
What happens to K? Consider a generator of K, and pull back to the corresponding sum in A cross B, then map this forward to U. The result has to be 0. This is because g is a bilinear map. The generators of K are the bilinear relations, and these have to go to 0 in any bilinear map. Thus h(K) = 0.
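Here is a small sketch of this step, assuming A = B = R = Z and a sample bilinear map g(x,y) = 5xy into U = Z. These choices, and the helper names formal_sum and h, are mine, purely for illustration. Elements of S are stored as dictionaries from generators (x,y) to integer coefficients, and the relations are drawn from a small window of integers.

```python
from collections import Counter

def g(x, y):
    """A sample bilinear map from Z cross Z into U = Z (an illustrative choice)."""
    return 5 * x * y

def formal_sum(*terms):
    """Build an element of S from (coefficient, x, y) triples, merging equal generators."""
    total = Counter()
    for coeff, x, y in terms:
        total[(x, y)] += coeff
    return {gen: c for gen, c in total.items() if c != 0}

def h(element):
    """Extend g linearly from the generators (x, y) to a formal sum in S."""
    return sum(coeff * g(x, y) for (x, y), coeff in element.items())

# x1y3 + x1y3 collapses to 2 x1y3, but distinct generators stay separate.
doubled = formal_sum((1, 1, 3), (1, 1, 3))
mixed = formal_sum((1, 1, 1), (1, 2, 2))

sample = range(-2, 3)
relations = []
for a in sample:
    for b in sample:
        for c in sample:
            # (a+b)c - ac - bc : additivity in the first component
            relations.append(formal_sum((1, a + b, c), (-1, a, c), (-1, b, c)))
            # a(b+c) - ab - ac : additivity in the second component
            relations.append(formal_sum((1, a, b + c), (-1, a, b), (-1, a, c)))
            # (ac)b - a(cb) : passing the scalar c between components
            relations.append(formal_sum((1, a * c, b), (-1, a, c * b)))

# Every sampled generator of K should land on 0 in U.
images = {h(r) for r in relations}
print(doubled, mixed, images)
```

Only finitely many generators of K are tested here, so this illustrates h(K) = 0 rather than proving it.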
Two elements in the same coset of K map to the same element in U. Therefore the cosets of K, which are the elements of T, have well defined images in U. Furthermore, the map from T into U is a group homomorphism.
By construction h makes the diagram commute. In other words, fh = g.
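Finally, the elements xy = f(x,y) generate T, so a homomorphism on T is determined by its values on these elements. Any deviation from h would force g(x,y) ≠ h(f(x,y)) for some pair xy, hence h is unique. That completes the proof.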
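A finite example makes the universal property concrete. This sketch assumes A = Z/4 and B = Z/6 as Z modules, and uses the standard fact that their tensor product is Z/2, with f(x,y) = xy mod 2. Since A and B are cyclic, a bilinear map g into U = Z/8 is determined by u = g(1,1), which must be killed by both 4 and 6. The target Z/8 and all names in the code are my own choices.

```python
from math import gcd

m, n, k = 4, 6, 8      # A = Z/4, B = Z/6, U = Z/8 (illustrative choices)
t = gcd(m, n)          # candidate tensor product T = Z/2

def f(x, y):
    """Candidate universal bilinear map from A cross B into T."""
    return (x * y) % t

def is_biadditive(g, modulus):
    """Check additivity per component on every pair of representatives."""
    left = all(g((x1 + x2) % m, y) == (g(x1, y) + g(x2, y)) % modulus
               for x1 in range(m) for x2 in range(m) for y in range(n))
    right = all(g(x, (y1 + y2) % n) == (g(x, y1) + g(x, y2)) % modulus
                for x in range(m) for y1 in range(n) for y2 in range(n))
    return left and right

checked = []
for u in range(k):
    # u plays the role of g(1,1); it must be killed by both m and n in U.
    if (m * u) % k != 0 or (n * u) % k != 0:
        continue
    g = lambda x, y, u=u: (x * y * u) % k
    h = lambda s, u=u: (s * u) % k      # the induced map from T into U
    commutes = all(h(f(x, y)) == g(x, y) for x in range(m) for y in range(n))
    checked.append((u, is_biadditive(g, k), commutes))

f_ok = is_biadditive(f, t)
print(f_ok, checked)
```

Each admissible u gives a bilinear g, and in each case h(t) = tu is the homomorphism from T into U with fh = g.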
Consider any of the generators of K, and apply the action of c in R. The result is another generator of K. Thus K is an R submodule of S, and T is a quotient R module. We didn't have to do anything to T to make it an R module; it just is.
Apply c to x, then evaluate f(cx,y). This is the same as c times f(x,y). In other words, our bilinear map f respects the action of R.
Let U be an R module, and let g be a bilinear map from A cross B into U that respects the action of R. Let h be the unique group homomorphism from T into U. Now h(c*xy) = h((cx)y) = g(cx,y) = c*g(x,y) = c*h(xy). The unique group homomorphism has become an R module homomorphism.
This suggests another category, where an object O is an R module and an R bilinear homomorphism from A cross B into O. Morphisms are R module homomorphisms between objects that cause the diagram to commute, and the tensor product is the universal object in this category.
The tensor product is constructed as before. It becomes an R module in a natural way, and the resulting bilinear map respects the action of R per component. Furthermore, the unique group homomorphism from T into U is an R module homomorphism, hence T remains universal in its new category.
At this point the word bilinear becomes ambiguous. A bilinear function f respects addition per component, and allows the action of R to pass between components; but what happens if you scale a component by an element in R? If the function scales accordingly, I call it R bilinear, but some books use the word bilinear for this situation. This is understandable, since a linear function, e.g. on a vector space, is a module homomorphism, so perhaps a bilinear function should, by default, respect addition and scaling per component. Perhaps - I don't have a good answer to this one. Most of the time it doesn't matter. When it does, I'll try to say bilinear for the basic definition and R bilinear for a bilinear function that respects the action of R per component.
Think of A as the free group/module on its generators, mod some kernel KA. The relations in KA become relations in the kernel K, which leads to the tensor product. For instance, let w be a relation in KA, hence w is a linear combination of generators of A that yields 0. Perhaps w = 17x3+9x5-8x9. Select any y in B and cross y with w, giving 17x3y+9x5y-8x9y. This becomes a relation in K.
We don't have to bring in every relation in KA, i.e. every linear combination that yields 0; a set of generating relations will do. Bring in x1-x2, and x2-x3, and x1-x3 is implied. After all, these relations wind up acting as generators for K, so all of KA is implied. Similar results hold for KB.
We don't have to join a relation in KA with each y in B; we can restrict attention to the generators of B. Similarly, the generating relations of KB join with the generators of A.
The relations that pass the action of R between A and B, (xd)y-x(dy), are still in K, as they were before. This time we only need include relations for the generators of A and B.
Finally, K is a subgroup or submodule of S. The quotient is T, which becomes the tensor product.
Once again we must prove T is the tensor product, but the proof is similar to that given above. Consider the question of linearity. Let w1 + w2 = w3 in A. Each is a linear combination of generators in A, and w3-w1-w2 produces a linear combination that yields 0. This in turn is a linear combination of relations in KA. Cross this with y and apply f. The result lies in K, which is 0 in T. Thus f(w1,y) + f(w2,y) = f(w3,y).
You can verify the rest of the proof yourself.
It follows that the tensor of finitely generated modules is finitely generated, and the tensor of finitely presented modules is finitely presented.
With R commutative, let's have a look at associativity. Choose your favorite generators for the three modules A, B, and C. Now A×B can be expressed using generators and relations. In fact, that is how A×B was constructed. When this is tensored with C, the resulting generators are the generators of A, cross the generators of B, cross the generators of C. These would be the generators whether we derive (A×B)×C, or A×(B×C). In other words, the result is symbolically symmetric.
What can we say about the kernel of (A×B)×C? To assure linearity in C, cross xy with the relations that generate KC. This is done for all x in A and for all y in B. Similarly, the relations that lead to T are crossed with each z in C. But what are the relations that produce T? They are the relations of K, as defined in the construction of A tensor B. These are the relations of KA, crossed with every y in B, and the relations of KB, crossed with every x in A. Put this all together and find triples with two variables, and one relation drawn from KA, KB, or KC. This is symbolically symmetric.
Finally we need to pass the action of R between the two operands of T×C. This produces relations of the form (d(xy))z-(xy)(dz). We are interpreting T as an R module, hence d(xy) is the same as dx cross y or x cross dy. So we have the relations (dx)yz-xy(dz) and x(dy)z-xy(dz). In addition, (dx)y-x(dy) is a relation that builds T, and this must be crossed with z, giving (dx)yz-x(dy)z. Put this all together and find relations that look like xyz-xyz, but one variable is scaled by d in the first term and another variable is scaled by d in the second term. This is symbolically symmetric. As an abelian group, (A×B)×C is the same as A×(B×C).
Finally look at the action of R. Given a triple xyz, which acts as one of our generators, and an element d in R, scale one of the variables by d. This is the action of R. It is well defined, because scaling x by d is the same as scaling y by d, is the same as scaling z by d. Therefore we obtain the same R module, and tensor product is associative.
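For free modules the symmetry can be seen in coordinates. The sketch below assumes A = Z^2, B = Z^3, and C = Z^2 over R = Z with their standard bases, and writes x cross y as the list of products of coordinates. This covers only a special case, and the function names tensor and scale are mine.

```python
def tensor(x, y):
    """Coordinates of x cross y in the basis of pairs (e_i, e_j), row-major order."""
    return [a * b for a in x for b in y]

def scale(d, v):
    """Apply the scalar d to every coordinate of v."""
    return [d * a for a in v]

x, y, z = [1, 2], [3, -1, 4], [5, 6]   # elements of Z^2, Z^3, Z^2
d = 7

left = tensor(tensor(x, y), z)     # computed as (A×B)×C
right = tensor(x, tensor(y, z))    # computed as A×(B×C)

# Scaling any one of the three variables by d gives the same element.
scaled = [tensor(tensor(scale(d, x), y), z),
          tensor(tensor(x, scale(d, y)), z),
          tensor(tensor(x, y), scale(d, z))]
print(left == right, len(left))
```

Both groupings give the same 12 coordinates, and it does not matter which variable absorbs the scalar.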
Notice that f(x,y,z) → xyz defines an R trilinear map from A cross B cross C into the tensor product. This map is linear in A, B, and C (per component), respects the action of R (per component), and allows the action of R to be passed between components. The same description applies to multilinear maps acting on finitely many modules.
Assume a trilinear map carries A cross B cross C into an R module M. Let T equal A×B. Ignoring C for the moment, the trilinear map becomes bilinear from A cross B into M. Factor through T, so that A cross B maps onto T, and into M. (Remember, we can always do this because T is universal.) Combine this with a second bilinear map from T cross C into M. Set A = B = 0 to see how C maps into M, and that, along with the homomorphism from T into M, determines the second bilinear map, which must indeed be bilinear, since the original map was trilinear. The image of A cross B cross C in M is the same either way. Here again, T×C becomes a universal object. A unique homomorphism from T×C into M completes the diagram, and is compatible with the original trilinear map. In summary, given a trilinear map from A cross B cross C into M, a unique module homomorphism carries A×B×C into M and completes the diagram.
All this generalizes to the tensor product of finitely many R modules. The resulting module T is universal in a category of multilinear functions and compatible homomorphisms. I'm skipping past the details, but I think it's pretty intuitive.
Let B be the direct sum of component modules Bi. The free group S is generated by the generators of A cross the generators of B. The kernel K is generated by the relations that produce A, cross each generator of B, and the relations that yield each component module Bi, cross the generators of A. We also have relations that pass the action of R between the generators of A and the generators of each Bi.
Let Ti be the submodule of T spanned by the generators of A cross the generators of Bi. Clearly the submodules Ti span T. We want to show T is the direct sum over Ti.
Let a sum of elements wi from two or more submodules Ti produce 0. Pulling back to S, the sum over wi lies in K. The sum is equal to a linear combination of relations drawn from K. Let Si be the free group on the generators of A cross the generators of Bi. Since S is the direct sum over Si, and B is the direct sum over Bi, each wi is spanned by the relations of K that come from Bi. Thus each wi already lies in K, and is 0 in T. The submodules Ti are linearly independent, and T is the direct sum over Ti. Therefore, direct sum commutes with tensor product.
Write f(R,M) = R*M, i.e. the action of R on M. Verify that this is a bilinear map from R cross M onto M.
Let g(R,M) be some other bilinear map into U. We need a function h that completes the diagram. Given c in M, pull back to xy in R cross M. Since R contains 1, this is the same as 1(xy), which is the same as 1c. All the preimages of c are equivalent to 1c, and must map to the same element in U, which we will call h(c). Thus h is well defined, and unique, and the diagram commutes. Pull c+d back through R cross M, and forward to U, to show h is a group homomorphism; and if R is commutative h is a module homomorphism. Therefore R tensor M = M, with a canonical bilinear map.
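As a sanity check, take R = Z and M = Z/6, and let g(r,c) = 3rc mod 9 be a sample bilinear map into U = Z/9. These are my choices; any u with 6u = 0 in U would do in place of 3. The sketch builds h by pulling c back to the pair (1,c), as in the argument above, and tests that the diagram commutes for r in a small window of integers.

```python
n, k, u = 6, 9, 3     # M = Z/6, U = Z/9, and u = g(1,1); needs n*u = 0 in U

def f(r, c):
    """The action of R = Z on M = Z/6."""
    return (r * c) % n

def g(r, c):
    """A sample bilinear map from Z cross Z/6 into Z/9 (hypothetical choice)."""
    return (r * c * u) % k

def h(c):
    """Pull c back to the pair (1, c) and push forward through g."""
    return g(1, c)

rs = range(-10, 11)   # a finite window of integers standing in for R
commutes = all(h(f(r, c)) == g(r, c) for r in rs for c in range(n))
additive = all(h((c + e) % n) == (h(c) + h(e)) % k
               for c in range(n) for e in range(n))
print(commutes, additive)
```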
Let W be a free R module with rank j. In other words, W = R^j. Since tensor and direct sum commute, W×M = M^j, i.e. the direct sum of j copies of M.
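In coordinates this is easy to picture. The sketch below assumes R = Z, W = Z^3, and M = Z^2, so that W×M should be three copies of M. The map tensor, a name of my own, sends (x,y) to the copies of y scaled by the coordinates of x, and the checks confirm it is bilinear on a few sample vectors.

```python
def tensor(x, y):
    """Send (x, y) in Z^j cross Z^k to j stacked copies of y, the i-th scaled by x[i]."""
    return [xi * yj for xi in x for yj in y]

def add(u, v):
    """Add two coordinate lists of equal length."""
    return [a + b for a, b in zip(u, v)]

x1, x2 = [1, 2, 3], [4, 0, -1]    # elements of W = Z^3, so j = 3
y1, y2 = [5, -2], [7, 1]          # elements of M = Z^2
d = 3

image = tensor(x1, y1)
# Split the image into j blocks, one per copy of M.
blocks = [image[i:i + len(y1)] for i in range(0, len(image), len(y1))]

additive_in_w = tensor(add(x1, x2), y1) == add(tensor(x1, y1), tensor(x2, y1))
additive_in_m = tensor(x1, add(y1, y2)) == add(tensor(x1, y1), tensor(x1, y2))
balanced = tensor([d * a for a in x1], y1) == tensor(x1, [d * b for b in y1])
print(blocks, additive_in_w, additive_in_m, balanced)
```

Each block is y1 scaled by one coordinate of x1, which is the direct sum of j copies of M described above.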